Use semantic technology to link data.gov and recovery.gov resources

What is the idea?

Use semantic web technologies to expose the metadata and structure of information, so that data on recovery.gov can be linked with information shared through data.gov.

The first-level win is to accelerate the process of providing raw data via an RSS feed. This implies a RESTful API, plus definitions of the data elements and structure so that people know what they are getting. Semantic web technologies are preferable to XML and microformats for this because they are designed to share and expose definitions of tags as well as data, and to map different sets of tags (i.e., schemas or ontologies) to each other.
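As a rough illustration of publishing definitions alongside data and mapping one set of tags onto another, here is a minimal sketch in plain Python. The `rec:` and `dat:` vocabularies, the term names, the records, and the helper functions are all hypothetical; none of them come from recovery.gov or data.gov, and a real deployment would more likely use RDF tooling than hand-rolled tuples.

```python
# Hypothetical vocabularies: each term is published with a definition,
# so consumers know what they are getting along with the raw values.
DEFINITIONS = {
    "rec:awardAmount": "Dollar amount awarded to a recipient (hypothetical term).",
    "rec:recipientState": "Two-letter state code of the recipient (hypothetical term).",
    "dat:obligation": "Dollar amount an agency has committed (hypothetical term).",
    "dat:state": "Two-letter state code (hypothetical term).",
}

# Assumed mapping declaring which terms in the two vocabularies mean the same thing.
EQUIVALENT = {
    "dat:obligation": "rec:awardAmount",
    "dat:state": "rec:recipientState",
}


def normalize(triples):
    """Rewrite each predicate into a single vocabulary using the mapping."""
    return [(s, EQUIVALENT.get(p, p), o) for s, p, o in triples]


def total_by_state(triples, state):
    """Sum award amounts for every subject located in the given state."""
    subjects = {s for s, p, o in triples if p == "rec:recipientState" and o == state}
    return sum(o for s, p, o in triples if p == "rec:awardAmount" and s in subjects)


# Made-up sample data expressed as (subject, predicate, object) triples.
recovery_triples = [
    ("award/1", "rec:awardAmount", 250000),
    ("award/1", "rec:recipientState", "OH"),
]
datagov_triples = [
    ("grant/7", "dat:obligation", 90000),
    ("grant/7", "dat:state", "OH"),
]

merged = normalize(recovery_triples) + normalize(datagov_triples)
print(total_by_state(merged, "OH"))
```

Once the mapping is declared, the two sources can be queried as one without either publisher changing its own tags.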

The second-level win is to provide a solution for semantic publishing of information (e.g., data, text, tables, graphics, imagery) tagged and linked for both human and machine interpretation. For example, the Statistical Abstract is a well-curated report with many data tables, well-thought-out metadata, and documentation of the statistical methodologies used to derive the data. Making information in recovery.gov and data.gov available for both human and machine processing and interpretation implies being able to expose the concepts and structure of the different kinds of information, linked to an underlying ontology. A first step might be to post the content in a web 2.0 wiki and upload documents and data sets for downloading. The next step would be to expose the structure and semantics of this information using semantic web technologies. It is also wise to engage the people who publish the information to help with the modeling. All of this is doable today.
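To make "exposing the structure and semantics" a little more concrete, the sketch below writes a few statements about a data table in N-Triples, one of the simpler RDF serializations. The `example.org` identifiers and the table itself are hypothetical; the two Dublin Core terms are real vocabulary terms, though choosing them here is an assumption, not something the idea specifies.

```python
def escape_literal(text):
    """Escape the characters that N-Triples requires to be escaped in string literals."""
    return (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def to_ntriples(triples):
    """Serialize (subject IRI, predicate IRI, (kind, value)) tuples as N-Triples lines."""
    lines = []
    for subject, predicate, (kind, value) in triples:
        if kind == "iri":
            obj = "<" + value + ">"
        else:
            obj = '"' + escape_literal(value) + '"'
        lines.append("<" + subject + "> <" + predicate + "> " + obj + " .")
    return "\n".join(lines)


# Hypothetical table and report identifiers, for illustration only.
TABLE_METADATA = [
    (
        "http://example.org/tables/awards",
        "http://purl.org/dc/terms/title",
        ("literal", "Awards by state"),
    ),
    (
        "http://example.org/tables/awards",
        "http://purl.org/dc/terms/source",
        ("iri", "http://example.org/reports/abstract"),
    ),
]

output = to_ntriples(TABLE_METADATA)
print(output)
```

Each line is a self-describing statement, so a machine can follow the link from the table to its source report while a person still reads the report itself.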

The third-level win is to use semantic metadata and modeling to facilitate mash-ups of information from different sources:

(a) In some cases this is relatively easy to do with web 2.0 techniques. Someone accesses a couple of RSS feeds, and combines these with other data and web services. The mash-up maker resolves any semantic issues as part of creating the mash-up.

(b) In other cases, a new use of information can take advantage of the curation and semantic data linkages that an agency or other party has already provided and expressed as an ontology, making it possible to build user interface features that allow for do-it-yourself exploration and conversations with data.

(c) In still other cases, the alignment of concepts and resolution of intents, definitional and methodological issues across data will require additional effort.

Obviously, you go for the low-hanging fruit first.
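Case (a) can be sketched with nothing more than the Python standard library. The two feeds below are made-up inline samples, not real recovery.gov or data.gov feeds, and the function names are hypothetical. Note where the semantics get resolved: the mash-up maker simply decides, in code, that `category` in one feed and `title` in the other both mean "state".

```python
import xml.etree.ElementTree as ET

# Made-up sample feeds standing in for two independently published RSS sources.
AWARDS_FEED = """<rss version="2.0"><channel><title>Hypothetical awards feed</title>
<item><title>Bridge repair</title><category>OH</category><description>250000</description></item>
<item><title>School upgrade</title><category>PA</category><description>120000</description></item>
</channel></rss>"""

RATES_FEED = """<rss version="2.0"><channel><title>Hypothetical unemployment feed</title>
<item><title>OH</title><description>10.2</description></item>
<item><title>PA</title><description>8.1</description></item>
</channel></rss>"""


def items(feed_xml):
    """Return each <item> in the feed as a dict of child tag name to text."""
    root = ET.fromstring(feed_xml)
    return [
        {child.tag: (child.text or "").strip() for child in item}
        for item in root.iter("item")
    ]


def mash_up(awards_xml, rates_xml):
    """Join the two feeds on state; the join key is the mash-up maker's own judgment."""
    rates = {i["title"]: float(i["description"]) for i in items(rates_xml)}
    return [
        {
            "project": a["title"],
            "state": a["category"],
            "amount": int(a["description"]),
            "unemployment_rate": rates.get(a["category"]),
        }
        for a in items(awards_xml)
    ]


rows = mash_up(AWARDS_FEED, RATES_FEED)
for row in rows:
    print(row)
```

This works, but the knowledge that the two fields match lives only in this script. Cases (b) and (c) are about moving that knowledge into shared, published definitions.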

Why is it important?

We are moving into an era where we will expect to access and interrelate all information on the internet by concepts, not just by artifacts and hard-coded structures. While it is possible to use XML syntax to identify concepts in hierarchies, this assumes we all use the same tags in the same way, and it is not adequate for representing and reasoning over the diversity of structured data, documents, and web content. Simple solutions are great, but we also need an approach that can scale, handle complexity, and adapt and evolve easily as needs change. Semantic web technology provides a better choice for this.

Submitted by millsdavis (Consulting) on Apr 28, 2009

Current number of stars: 3
based on 23 votes

6 Comments

Member comment

In theory, this would be GREAT. In practice, it's something of a roll of the dice, isn't it? Is semantic technology sufficiently "there" to handle an undertaking of this scale? Maybe it's worth giving Sir Berners-Lee a shout to get his take on the feasibility?

Comment from tgwilson on Apr 28, 2009
Member comment

Mills,

I concur with the notion of govt. data being published in formats that facilitate granular and structured access to raw data, thereby empowering "citizen analysts", amongst other things.

If the govt. becomes the largest and most aggressive producer of RDF-model-based Linked Data, it would go a long way toward revitalizing the broader economy.

Format options include: RDFa, RDF/JSON, RDF/XML, N3, Turtle, TriX, etc.

Links:

1. http://bit.ly/SwvLS - recent blog post about RDF and Linked Data


Kingsley

 

Comment from kidehen at OpenLink Software on Apr 28, 2009
Member comment

I like where this is going, but worry that the use cases for the data are not well understood. The more data that is put out for consumption, the greater the need for adjudication and reconciliation of overlapping data, where most of the time the main differentiator is which agency published it. With all the overlap in government programs, it's very hard for the bureaucracy to make sense of each other's data. Asking citizens, the business community, or watchdogs to do so is very risky. As I have maintained for 9 years, we need to simplify and unify the data, or at least rationalize it as we make it available to the public. Also, we need to filter the bad data out of our systems. Semantics should help with that a lot.

Comment from maforman on Apr 28, 2009
Member comment

This is certainly feasible -- and very important!  There is a growing cloud of public data using this technology, from all kinds of public sources.  Already people are going to a lot of trouble to take existing data sources and convert them into Linked Open Data.  It is very important that this recovery data is made available in Linked Data form from the start.

That way it is interoperable with the other linked open data out there, especially the data to come in the future.

Tim Berners-Lee

 

Comment from timbl on Apr 30, 2009
Member comment

"Linked Open Data (LOD)" as a concept is very important and useful, but since some of the comments jumped to the conclusion that this necessarily means Semantic Web technologies, we want to point out that there are more lightweight and more accessible ways to implement LOD. Specifically, XML and REST provide linked open data in a way that is supported in many more programming environments and technologies than the more advanced RDF/OWL technologies. Our Proposed Guideline Clarifications for American Recovery and Reinvestment Act of 2009 proposed and implemented linked open data based on Plain Web technologies (specifically, HTML, Atom feeds, and XML), which we believe would greatly improve the usability and accessibility of data made available by recovery.gov.

Comment from dret at UC Berkeley on Apr 30, 2009
Member comment

This definitely can be achieved, with a few caveats.

  1. Using XML without specifying particular flavors of XML is not useful. There are over 1000 known XML data standards. It is not practical to have a single XML standard; however, open public standards from ANSI X12, OASIS, UN/CEFACT, and a few other de jure standards bodies should be given high consideration.
  2. Semantic technologies do not resolve interoperability at the lowest level of data. Use of semantic technologies needs to be augmented with data harmonization and standardization at the lowest level of data.

Comment from SWebb at Vision4Standards on May 03, 2009