Making stimulus spending data accessible to the public

What is the idea?

Under the provisions of the American Recovery and Reinvestment Act (ARRA) of 2009, Federal agencies as well as recipients of stimulus money are required to report certain spending and performance data for aggregation on Recovery.gov. This mandate provides an unprecedented level of transparency and accountability over investment of stimulus dollars. It also provides agencies with a complex new reporting requirement that must be rolled out fairly quickly.

To meet the accountability and transparency objectives of the Act, agencies and prime recipients must have the ability to provide ARRA-mandated reports meeting the following criteria:

  • Provided in a timely and accurate manner
  • Accessible to the public and searchable (discoverable) using readily available search technologies
  • Published in a structured format (as opposed to unstructured text formats) to facilitate value-added data aggregation and analysis

The easiest, lowest-risk way to accomplish this is to implement a distributed reporting architecture built on widely available, Service-Oriented Architecture (SOA)-based web standards and technologies such as Extensible Markup Language (XML), Really Simple Syndication (RSS), Atom, Extensible Hypertext Markup Language (XHTML), JavaScript Object Notation (JSON), comma-separated values (CSV), Keyhole Markup Language (KML) for geo-encoding, Representational State Transfer (REST), Simple Object Access Protocol (SOAP), and other modern data access standards. In this scenario, agencies and prime recipients would perform the following actions:

  • Extract ARRA-mandated data elements from relevant back-office systems (e.g., financial management systems, grants management systems) using an appropriate data access application programming interface (API) (e.g., JDBC)
  • Transform those data elements into ARRA-mandated reports such as funding notification reports, periodic financial reports, and award-level reports
  • Publish ARRA-mandated reports to the Web as "data feeds" using one or more of the aforementioned formats as appropriate. Once published to the Web, these feeds can be combined in value-added ways using "mash-ups," a popular state-of-the-art web development technique. They can also be harvested and aggregated by Recovery.gov.
  • Provide a "sitemap" using the popular Sitemaps standard (http://www.sitemaps.org/) to make it easy for subscribers to find all of your feeds
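
As a rough illustration of the transform, publish, and sitemap steps above, here is a minimal Python sketch using only the standard library. It assumes the extract step has already produced plain dictionaries; the field names (award_id, recipient, obligated_usd, updated), the example.gov URLs, and the sample values are all hypothetical and are not taken from any ARRA reporting specification.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"

# Hypothetical records, standing in for rows extracted from a back-office system.
SAMPLE_AWARDS = [
    {"award_id": "A-1", "recipient": "City of Springfield",
     "obligated_usd": 125000, "updated": "2009-04-20T00:00:00Z"},
    {"award_id": "A-2", "recipient": "Shelbyville Transit",
     "obligated_usd": 98000, "updated": "2009-04-24T00:00:00Z"},
]


def _atom(parent, name, text=None):
    """Append a namespaced Atom child element and return it."""
    node = ET.SubElement(parent, f"{{{ATOM_NS}}}{name}")
    node.text = text
    return node


def build_atom_feed(feed_url, title, updated, author, awards):
    """Transform award records into an Atom feed, returned as a string."""
    feed = ET.Element(f"{{{ATOM_NS}}}feed")
    _atom(feed, "id", feed_url)
    _atom(feed, "title", title)
    _atom(feed, "updated", updated)
    _atom(_atom(feed, "author"), "name", author)
    for award in awards:
        entry = _atom(feed, "entry")
        _atom(entry, "id", f"{feed_url}#{award['award_id']}")
        _atom(entry, "title", award["recipient"])
        _atom(entry, "updated", award["updated"])
        _atom(entry, "summary", f"Obligated: {award['obligated_usd']} USD")
    # default_namespace requires Python 3.8 or later.
    return ET.tostring(feed, encoding="unicode", default_namespace=ATOM_NS)


def build_sitemap(feed_urls):
    """List every published feed URL in a Sitemaps-protocol document."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in feed_urls:
        node = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(node, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(urlset, encoding="unicode", default_namespace=SITEMAP_NS)
```

An agency would write the two returned strings to its web server, for example as awards.atom and sitemap.xml, each time the underlying data changes.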

Once these remote data feeds have been published, the Recovery.gov website can aggregate them on an ongoing basis using readily available "spider" or web crawler technology.  The "spider" essentially visits each data feed on a regular, ongoing basis, checks it for changes, parses it, and posts it to a local index or data mart.  This index is then used by Recovery.gov as a data store to allow the public to search and perform analytics on stimulus spending.
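
The visit, check, parse, and post cycle described above could look roughly like the following Python sketch. It is only an illustration under stated assumptions: the feeds are taken to be Atom, change detection is a simple content hash rather than HTTP caching headers, and the "local index" is a SQLite table whose layout is invented here. Nothing in it reflects how Recovery.gov is actually built.

```python
import hashlib
import sqlite3
import xml.etree.ElementTree as ET
from urllib.request import urlopen

ATOM = "{http://www.w3.org/2005/Atom}"


def fetch_http(url):
    """Download a feed over HTTP and return its raw bytes."""
    with urlopen(url, timeout=30) as response:
        return response.read()


def open_index(path=":memory:"):
    """Create (if needed) the hypothetical local index and return a connection."""
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE IF NOT EXISTS feeds "
               "(url TEXT PRIMARY KEY, digest TEXT)")
    db.execute("CREATE TABLE IF NOT EXISTS entries "
               "(id TEXT PRIMARY KEY, feed_url TEXT, title TEXT, "
               "summary TEXT, updated TEXT)")
    return db


def crawl_feed(db, url, fetch=fetch_http):
    """Visit one feed; return the number of entries indexed (0 if unchanged)."""
    body = fetch(url)
    digest = hashlib.sha256(body).hexdigest()
    row = db.execute("SELECT digest FROM feeds WHERE url = ?", (url,)).fetchone()
    if row and row[0] == digest:
        return 0  # feed content is identical to the last visit
    root = ET.fromstring(body)
    count = 0
    for entry in root.iter(f"{ATOM}entry"):
        db.execute(
            "INSERT OR REPLACE INTO entries VALUES (?, ?, ?, ?, ?)",
            (entry.findtext(f"{ATOM}id"), url,
             entry.findtext(f"{ATOM}title"),
             entry.findtext(f"{ATOM}summary"),
             entry.findtext(f"{ATOM}updated")))
        count += 1
    db.execute("INSERT OR REPLACE INTO feeds VALUES (?, ?)", (url, digest))
    db.commit()
    return count
```

A scheduler would call crawl_feed for each URL discovered through the agencies' sitemaps. The fetch parameter exists so the crawler can be exercised against canned feed bytes without touching the network.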

There are numerous readily available technologies, both commercial and open source, to implement the features described in this article.  For example, Unisys has developed a Recovery Act reporting solution that can extract ARRA data elements from their host systems, aggregate them into ARRA-mandated reports, and publish those reports to the web as RSS / Atom feeds, KML feeds for geospatial visualization, CSV for spreadsheet integration, and custom XML vocabularies for integration with other applications. The Unisys Recovery Act reporting solution was developed using various open source enterprise Java technologies, and will integrate with most Government web hosting environments.
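
To make the multi-format idea concrete, the following Python sketch renders one set of records as both CSV (for spreadsheets) and KML (for map viewers). It is not the Unisys solution or any part of it; the field names, coordinates, and amounts are hypothetical placeholders chosen for illustration.

```python
import csv
import io
import xml.etree.ElementTree as ET

KML_NS = "http://www.opengis.net/kml/2.2"
FIELDS = ["award_id", "recipient", "obligated_usd", "lat", "lon"]  # assumed columns


def to_csv(awards):
    """Render award records as CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(awards)
    return buf.getvalue()


def to_kml(awards):
    """Render award records as KML placemarks, one point per award."""
    kml = ET.Element(f"{{{KML_NS}}}kml")
    doc = ET.SubElement(kml, f"{{{KML_NS}}}Document")
    for award in awards:
        placemark = ET.SubElement(doc, f"{{{KML_NS}}}Placemark")
        ET.SubElement(placemark, f"{{{KML_NS}}}name").text = award["recipient"]
        ET.SubElement(placemark, f"{{{KML_NS}}}description").text = (
            f"Award {award['award_id']}: {award['obligated_usd']} USD obligated")
        point = ET.SubElement(placemark, f"{{{KML_NS}}}Point")
        # KML orders coordinates as longitude,latitude.
        ET.SubElement(point, f"{{{KML_NS}}}coordinates").text = (
            f"{award['lon']},{award['lat']}")
    # default_namespace requires Python 3.8 or later.
    return ET.tostring(kml, encoding="unicode", default_namespace=KML_NS)
```

Because both functions read the same record list, the CSV and KML outputs stay consistent with each other and with any feed generated from that list.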

Submitted by andyhoskinson from Unisys (Application Development) on Apr 27, 2009

Current number of stars: 4
based on 21 votes

8 Comments

Member comment

Great points. I wholeheartedly agree that the data should be shared first and foremost via lightweight, open formats.

Researchers at Berkeley have published an outline, and even example data and feeds, for Recovery.gov. Check out http://isd.ischool.berkeley.edu/stimulus/2009-029/ and their report available at http://www.ischool.berkeley.edu/newsandevents/news/20090417recoveryguidelines

Comment from ajturner at FortiusOne (GeoCommons) on Apr 27, 2009
Member comment

The notion of feed aggregation is something that IBM has been working with in research for some time. This is definitely a great idea.

Take it one step further, though. All data on Recovery.gov itself should also be provided as feeds. This would allow interested citizens to build their own analysis tools to mine the information according to their interests. Instead of having just transparency, you then have a whole army of concerned people augmenting the work of the GAO and other watchdogs!

I personally favor Atom feeds over SOAP, simply because Atom is lighter-weight.

Comment from cliff_hayden at IBM on Apr 28, 2009
Member comment

I'm one of the Berkeley researchers mentioned above who are making recommendations on how data feeds should be used to make the recovery more transparent (see http://www.ischool.berkeley.edu/newsandevents/news/20090417recoveryguidelines and http://isd.ischool.berkeley.edu/stimulus/2009-029/).

Although some (but not all) agencies receiving and disbursing recovery funds are using feeds in their reporting (see a list that we compiled at http://isd.ischool.berkeley.edu/stimulus/feeds/feeds.html), the best data on dollars appropriated, obligated, or spent is in the Excel spreadsheets. Although there are apparently templates for the reports, they keep changing format, and there's nothing to stop agencies from inserting extra fields or omitting others. We know this for a fact because we've written programs to scrape the data from the spreadsheets, and it is a challenge to keep up with the changes that keep breaking our scripts.

The federal government should publish the data as XML feeds in the first place (backed by a schema so that we can check that the data is valid), instead of making people who want to use that data scrape it out of Excel in a highly fragile process.

Comment from raymondyee at UC Berkeley on Apr 28, 2009
Member comment

Excellent idea. I only hope someone is listening and reading this.

Comment from swelter on Apr 30, 2009
Member comment

Yes, please make all the data available in XML, with schema information that shows relationships between tables and allows for data validation.

Comment from weex on Apr 30, 2009
Member comment

What is the meaning of the data in a given context? Having the data without an underlying data dictionary, or at least a full or partial terminology, would be useless. Make sure that the semantics of the information are documented and shared before the data is given out.

Terminology is needed before syntax and technology.

Comment from RoyERoebuck at One World Information System on May 02, 2009
Member comment

This proposal hits squarely within the sweet spot that Federal CIO Vivek Kundra pursued while CTO of the District of Columbia, and the success of this approach is plain to see in the results of the Apps for Democracy effort. Beyond the basic concept of lightweight, easy-to-use data formats and feeds, care should be taken to ensure that adequate metadata is provided, documenting source and lineage, completeness, timeliness, and accuracy; many examples of these approaches exist. Metadata is critical to informed decision-making and to limiting liability and error. Additionally, discovery of these types of resources should be facilitated; again, a variety of approaches to data discovery exist, such as OpenSearch. Finally, this type of effort should be supported on a federated basis, where individual stakeholders can easily stand up their own interoperable, standards-based services using open source and/or COTS tools. The closer the data is to its source, the more likely it is to be current, complete, and reliable. Federated services of this kind, if built on vendor-neutral, platform-agnostic, standards-based technologies geared toward interoperability, can easily be aggregated in a variety of ways.

Comment from DruidSmith at Synergist Technology Group, Inc. on May 03, 2009