 


Here's a question to get you started:

What is one system that has worked well for you in managing diverse types and sources of data? Why?

Make collecting recovery data agile using semantic web technology


What is the idea?

The goal of Recovery.gov is to create transparency around how stimulus dollars are being allocated and spent. The challenge in tracking, correlating, and analyzing the relevant information is that the data is extremely heterogeneous. Some of it is captured in the spreadsheets that agencies post to recovery.gov, but other important data originates from many distinct agencies and covers a wide range of projects with differing metrics, definitions of terms, and characteristics. Data residing in a myriad of systems and formats is hard for the average citizen to access and interpret. In short, complexity inhibits transparency.

Recovery.gov should adopt semantic web technologies as key enablers of an agile, transparent data ecosystem in which federal agencies and other recipients of stimulus money can share spending and performance data in a way that is readily available and useful to anyone who wants to view, consume, or analyze it.
 
Semantic technologies were created to solve the problem of how to link and query heterogeneous data from files, databases, documents, and web pages. The World Wide Web Consortium (W3C) publishes a set of open standards, known as Semantic Web technologies, for integrating disparate data in a way that is well-defined, traceable, easy to consume, and supported by a variety of tools. They include RDF for describing data, RDFS and OWL for defining vocabularies, and SPARQL for querying.
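To make the triple-based approach concrete, here is a toy sketch in plain Python (not a real RDF library) of the W3C RDF data model: every fact is a subject-predicate-object triple, and a query is a list of triple patterns containing `?variables`, loosely mirroring a SPARQL basic graph pattern. The `ex:` identifiers and award figures are hypothetical placeholders, not an actual Recovery.gov vocabulary.

```python
from typing import Dict, Iterable, List, Optional, Tuple

Triple = Tuple[str, str, str]
Bindings = Dict[str, str]

def is_var(term: str) -> bool:
    """Query variables start with '?', as in SPARQL."""
    return term.startswith("?")

def match_pattern(pattern: Triple, triple: Triple, bindings: Bindings) -> Optional[Bindings]:
    """Try to unify one triple pattern with one triple, extending the bindings."""
    result = dict(bindings)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if p_term in result and result[p_term] != t_term:
                return None  # variable already bound to a different value
            result[p_term] = t_term
        elif p_term != t_term:
            return None  # constant term does not match
    return result

def query(graph: Iterable[Triple], patterns: List[Triple]) -> List[Bindings]:
    """Evaluate a conjunction of triple patterns against the graph."""
    triples = list(graph)
    solutions: List[Bindings] = [{}]
    for pattern in patterns:
        next_solutions = []
        for bindings in solutions:
            for triple in triples:
                extended = match_pattern(pattern, triple, bindings)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions

# Hypothetical example data.
graph = [
    ("ex:award1", "ex:fundedBy", "ex:DOT"),
    ("ex:award1", "ex:amount", "2500000"),
    ("ex:award2", "ex:fundedBy", "ex:EPA"),
    ("ex:award2", "ex:amount", "900000"),
]

# Which awards did ex:DOT fund, and for how much?
for row in query(graph, [("?award", "ex:fundedBy", "ex:DOT"),
                         ("?award", "ex:amount", "?amount")]):
    print(row["?award"], row["?amount"])
```

A production system would more likely rely on an existing RDF store and SPARQL engine than on hand-rolled matching; the point is only that data from any source reduces to the same triple shape and can be queried uniformly.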
 
Semantic Web technologies ease data collection by acting as a virtual integration layer. That is, agencies can and should continue to collect data in traditional databases, XML files, or Excel spreadsheets; Semantic Web technologies then weave this information into a unified fabric that citizens, legislators, oversight bodies, and others can examine, analyze, and understand.
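One way to picture the virtual integration layer is a set of small mapping functions that read each source in place and emit triples using shared predicates. This sketch assumes two hypothetical sources, a CSV export of a spreadsheet and an XML report with different element names, and uses only the Python standard library; the column names, tags, recipients, and `ex:` predicates are all made up for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical source 1: a spreadsheet exported as CSV by one agency.
CSV_EXPORT = """award_id,recipient,amount
A-100,Springfield Transit,2500000
A-101,Shelbyville Water,900000
"""

# Hypothetical source 2: an XML report from another agency, with its own tag names.
XML_REPORT = """<report>
  <grant id="G-7"><grantee>Ogdenville Schools</grantee><obligated>400000</obligated></grant>
</report>"""

def triples_from_csv(text):
    """Map each spreadsheet row to triples; the CSV itself is left untouched."""
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"ex:award/{row['award_id']}"
        yield (subject, "ex:recipient", row["recipient"])
        yield (subject, "ex:amount", row["amount"])

def triples_from_xml(text):
    """Map the XML report's differently named elements onto the same predicates."""
    root = ET.fromstring(text)
    for grant in root.iter("grant"):
        subject = f"ex:award/{grant.get('id')}"
        yield (subject, "ex:recipient", grant.findtext("grantee"))
        yield (subject, "ex:amount", grant.findtext("obligated"))

# The "unified fabric": one set of triples spanning both sources.
graph = set(triples_from_csv(CSV_EXPORT)) | set(triples_from_xml(XML_REPORT))

total = sum(int(o) for s, p, o in graph if p == "ex:amount")
print(f"{len(graph)} triples, total amount {total}")
```

Neither source is converted or replaced; only the mapping functions know how each format lines up with the shared terms, which is what would let agencies keep their existing tools.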
 
Applying semantic web technologies contributes to a “Web of Linked Data” that allows anyone with a web browser to ‘surf’ a trail of data back to its source and better understand the information being published under the requirements of the American Recovery and Reinvestment Act (ARRA).
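Following a trail back to the source amounts to repeatedly following one kind of link from resource to resource. In real Linked Data each identifier would be an HTTP URI that a browser or crawler can dereference; in this sketch a Python dictionary stands in for those lookups, and the `ex:derivedFrom` link type and all resource names are hypothetical.

```python
# Hypothetical linked records: a published summary figure points to the report
# it came from, which in turn points to the original agency submission.
LINKS = {
    "ex:summary/transit-total": {"ex:derivedFrom": "ex:report/2009-Q2"},
    "ex:report/2009-Q2": {"ex:derivedFrom": "ex:submission/agencyA.csv"},
    "ex:submission/agencyA.csv": {},  # original source: no further links
}

def trail_to_source(start, links, predicate="ex:derivedFrom"):
    """Follow one link type from resource to resource until no link remains."""
    trail = [start]
    seen = {start}
    current = start
    while predicate in links.get(current, {}):
        current = links[current][predicate]
        if current in seen:
            raise ValueError(f"cycle detected at {current}")
        seen.add(current)
        trail.append(current)
    return trail

print(" -> ".join(trail_to_source("ex:summary/transit-total", LINKS)))
```

The cycle check matters on the open web, where independently published links can point back at one another.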

Why is it important?

ARRA requirements represent a fundamental step forward for transparency, openness, and accountability on the part of government agencies and those benefiting from stimulus funds. But accessing the spreadsheets that agencies use to report their activity to recovery.gov is only part of the story. To detect waste, fraud, and abuse, we need more comprehensive data forensics. That is, information wants to be linked and correlated with heterogeneous information and data from many other sources. Making it easy to expose the semantics and structure of varied sources in a standard way is critical to investigative success. If the data is stored and presented in a way that is confusing or difficult to access, people will conclude that the government intends to be something less than transparent. Making the data easy to consume and use will foster an image of true transparency on the part of the government.
 

Member comment

I fully concur with Mills' idea. The semantic technologies he describes are needed to build the National Terminology and National Management Life Cycle I've described in my own posted ideas.

Comment from RoyERoebuck at One World Information System on Apr 28, 2009
Member comment

I strongly believe this is the correct approach - and the literature backs up this claim. The Web has proven itself a remarkable tool for sharing information; expressing data in the same manner is a natural development.

Comment from danja on Apr 30, 2009
Member comment

I fully agree with this concept. One of the goals of Recovery.gov is to make information about how stimulus money is spent available to the average American. Assuming first that the average American even has access (there are already some good comments about virtualization and cloud computing to aid this), this kind of technology is necessary to present visual representations that are easy to navigate. Some good examples already exist on news media web sites.

Comment from turpyns at Unisys on Apr 30, 2009
Member comment

I also agree wholeheartedly with this idea.  Semantics is needed to bridge the various organizations participating in the Recovery and their different uses of language and terminology, thereby increasing transparency.  It is also essential to keeping the process more fluid, dynamic and responsive, since a semantic web can more readily evolve.
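Bridging differing terminology can be sketched as a mapping from each organization's own terms to a shared one; in RDF such a mapping could itself be published as data, for example with OWL's `owl:equivalentProperty`. The agency prefixes, predicate names, and amounts here are hypothetical, and the simple rewrite is a simplification of what an OWL reasoner would infer.

```python
# Hypothetical agency-specific predicates and their shared equivalents.
EQUIVALENT = {
    "dot:fundsObligated": "ex:amount",
    "epa:grantValue": "ex:amount",
    "dot:contractor": "ex:recipient",
    "epa:grantee": "ex:recipient",
}

def normalize(triples, mapping):
    """Rewrite each predicate to its shared term, leaving unmapped ones as-is."""
    return [(s, mapping.get(p, p), o) for s, p, o in triples]

raw = [
    ("dot:project9", "dot:fundsObligated", 2_500_000),
    ("dot:project9", "dot:contractor", "Springfield Transit"),
    ("epa:grant4", "epa:grantValue", 900_000),
    ("epa:grant4", "epa:grantee", "Shelbyville Water"),
]

unified = normalize(raw, EQUIVALENT)
# One question now spans both agencies despite their different vocabularies.
print(sum(o for _, p, o in unified if p == "ex:amount"))
```

Because the mapping is data rather than code, a new agency vocabulary can be bridged by adding entries, without changing either agency's own reporting.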

Comment from rdamashek at Binary Group on May 01, 2009
Member comment

This is a win-win proposition. Most of all, by doing this the U.S. government will catch up and possibly establish a leadership position in the use of technology that is essential for eGovernment, smart grids, and virtual town hall meetings. Additionally, it will give a strategic boost to semantic technology providers, who are predominantly small businesses.

Comment from carlmattocks at CheckMi:Understanding on May 02, 2009
Member comment

Same here. I fully agree with this idea and with the comments posted so far. Semantically enabled data on the web is one of those technologies on which all sorts of tools and strategies we haven't thought of yet will be built. Linking data together in non-obvious ways is key to finding patterns and relationships in data that we never knew existed. (I can't help but think that it's these newfound relationships, bringing together disparate people and groups that were never connected before, that made Facebook, LinkedIn, and Twitter so popular.)

Comment from diodata at University of Delaware on May 03, 2009
Member comment

You know, using semantic technologies together with spreadsheet technology at the point of submission would be a really smart move. It would benefit the submitter in two ways: first, they would be able to report out (visualize) their data in a variety of ways easily using semantic lens technologies; second, they would be able to see (automatically, since the table structure for agency data submissions is the same) and analyze their data combined with that of other agencies. Just going this far is a benefit.
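Because every agency submission shares one table structure, a single row-to-triple mapping can serve all of them, and a combined view falls out of the merged data. This sketch assumes a hypothetical three-column template (`award_id`, `category`, `amount`) and made-up figures; it is not the actual Recovery.gov reporting format.

```python
import csv
import io
from collections import defaultdict

# Hypothetical submissions that share one column layout.
SUBMISSIONS = {
    "DOT": "award_id,category,amount\nD-1,roads,2500000\nD-2,rail,1000000\n",
    "EPA": "award_id,category,amount\nE-1,water,900000\n",
}

def rows_to_triples(agency, text):
    """One mapping serves every agency because the columns are identical."""
    for row in csv.DictReader(io.StringIO(text)):
        subject = f"ex:{agency}/{row['award_id']}"
        yield (subject, "ex:agency", agency)
        yield (subject, "ex:category", row["category"])
        yield (subject, "ex:amount", int(row["amount"]))

graph = [t for agency, text in SUBMISSIONS.items() for t in rows_to_triples(agency, text)]

# Combined view: each agency can see its totals alongside everyone else's.
agency_of = {s: o for s, p, o in graph if p == "ex:agency"}
totals = defaultdict(int)
for s, p, o in graph:
    if p == "ex:amount":
        totals[agency_of[s]] += o

grand_total = sum(totals.values())
for agency, amount in sorted(totals.items()):
    print(f"{agency}: {amount} ({amount / grand_total:.0%} of reported total)")
```

Adding another agency would only mean adding its CSV to `SUBMISSIONS`; the mapping and the combined view stay the same.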

But having data and table definitions in semantic form has other benefits. It makes it easier to combine recovery.gov data with other information and data sources: data, forms, documents, and web pages. For example, recovery.gov data could be associated with data.gov data and with open linked data from across the web. And much more.
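Combining datasets from different publishers works when they refer to the same thing by the same identifier. In this sketch, merging two graphs is a set union of triples, and a shared county identifier joins award amounts to a second, independently published metric. Both datasets, the `ex:` identifiers, and the numbers are invented for illustration; they are not drawn from recovery.gov or data.gov.

```python
# Hypothetical triples from two independent publishers that happen to use
# the same identifiers for counties.
recovery_data = [
    ("ex:award/A-100", "ex:placeOfPerformance", "ex:county/C1"),
    ("ex:award/A-100", "ex:amount", 2500000),
    ("ex:award/A-101", "ex:placeOfPerformance", "ex:county/C2"),
    ("ex:award/A-101", "ex:amount", 900000),
]
other_dataset = [
    ("ex:county/C1", "ex:unemploymentRate", 9.1),   # made-up value
    ("ex:county/C2", "ex:unemploymentRate", 11.2),  # made-up value
]

def merge(*graphs):
    """Merging triple-based graphs is a set union of their triples."""
    merged = set()
    for g in graphs:
        merged.update(g)
    return merged

def join_awards_with_rates(graph):
    """Join each award's amount to the metric published for its county."""
    place = {s: o for s, p, o in graph if p == "ex:placeOfPerformance"}
    amount = {s: o for s, p, o in graph if p == "ex:amount"}
    rate = {s: o for s, p, o in graph if p == "ex:unemploymentRate"}
    return sorted(
        (award, amount[award], rate[county])
        for award, county in place.items()
        if award in amount and county in rate
    )

for award, amt, r in join_awards_with_rates(merge(recovery_data, other_dataset)):
    print(award, amt, r)
```

Neither publisher had to change its format; agreeing on identifiers is what makes the join possible.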

It's time to make semantics and knowledge representation first class citizens in the Federal IT approach.

Comment from millsdavis on May 03, 2009