Making stimulus spending data accessible to the public
What is the idea?
Under the provisions of the American Recovery and Reinvestment Act (ARRA) of 2009, Federal agencies and recipients of stimulus money are required to report certain spending and performance data for aggregation on Recovery.gov. This mandate brings an unprecedented level of transparency and accountability to the investment of stimulus dollars. It also imposes on agencies a complex new reporting requirement that must be rolled out quickly.
To meet the accountability and transparency objectives of the Act, agencies and prime recipients must be able to provide ARRA-mandated reports that meet the following criteria:
- Provided in a timely and accurate manner
- Accessible to the public and searchable (discoverable) using readily available search technologies
- Published in a structured format (as opposed to unstructured text formats) to facilitate value-added data aggregation and analysis
The easiest, lowest-risk way to accomplish this is to implement a distributed reporting architecture that leverages widely available web standards and technologies based on Service-Oriented Architecture (SOA). These include Extensible Markup Language (XML), Really Simple Syndication (RSS), Atom, Extensible Hypertext Markup Language (XHTML), JavaScript Object Notation (JSON), comma-separated values (CSV), Keyhole Markup Language (KML) for geo-encoding, Representational State Transfer (REST), Simple Object Access Protocol (SOAP), and other modern data access standards. In this scenario, agencies and prime recipients would perform the following actions:
- Extract ARRA-mandated data elements from relevant back-office systems (e.g., financial management systems, grants management systems) using an appropriate data access application programming interface (API) (e.g., JDBC)
- Transform those data elements into ARRA-mandated reports such as funding notification reports, periodic financial reports, and award-level reports
- Publish ARRA-mandated reports to the Web as "data feeds" using one or more of the aforementioned formats as appropriate. Once published to the Web, these feeds could be combined in value-added ways using "mash-ups," a popular web development technique. They can also be harvested and aggregated by Recovery.gov.
- Provide a "sitemap" using the popular Sitemaps standard (http://www.sitemaps.org/) to make it easy for subscribers to find all of the published feeds
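The four actions above can be illustrated with a minimal Python sketch. It is not a description of any agency's actual system. It assumes a hypothetical back-office table named awards with the columns award_id, recipient, and amount, uses an in-memory SQLite database as a stand-in for a real financial or grants management system, and publishes to made-up example.gov URLs. Only the sitemap namespace comes from the Sitemaps standard itself.

```python
import csv
import io
import json
import sqlite3
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# Hypothetical column names; real ARRA data elements would differ.
FIELDS = ["award_id", "recipient", "amount"]


def extract_awards(conn):
    """Extract: read award rows from the (hypothetical) awards table."""
    cur = conn.execute(
        "SELECT award_id, recipient, amount FROM awards ORDER BY award_id"
    )
    return [dict(zip(FIELDS, row)) for row in cur]


def to_csv(records):
    """Transform: render the records as a CSV feed for spreadsheets."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()


def to_json(records):
    """Transform: render the same records as a JSON feed."""
    return json.dumps({"awards": records}, indent=2)


def build_sitemap(feed_urls):
    """Build a Sitemaps document listing every published feed URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in feed_urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


def sample_connection():
    """Create an in-memory stand-in for a back-office system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE awards (award_id TEXT, recipient TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO awards VALUES (?, ?, ?)",
        [("A-001", "Example City", 250000.0),
         ("A-002", "Example County", 1200000.0)],
    )
    return conn


if __name__ == "__main__":
    records = extract_awards(sample_connection())
    print(to_csv(records))
    print(to_json(records))
    # Publishing is left out; these URLs are placeholders.
    print(build_sitemap([
        "http://agency.example.gov/recovery/awards.csv",
        "http://agency.example.gov/recovery/awards.json",
    ]))
```

In practice the publish step would write these strings to files served by the agency's web server, and the same records could be rendered as RSS, Atom, or KML with additional transform functions.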
Once these remote data feeds have been published, the Recovery.gov website can aggregate them on an ongoing basis using readily available "spider" or web crawler technology. The spider visits each data feed at regular intervals, checks it for changes, parses it, and posts the results to a local index or data mart. Recovery.gov then uses this index as a data store that lets the public search and analyze stimulus spending.
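The visit, check, parse, and post cycle can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how Recovery.gov is implemented. It assumes each feed is a JSON document with a top-level "awards" list whose items carry an award_id (a hypothetical layout), detects changes by comparing SHA-256 digests of the feed body, and uses a plain dictionary in place of a real index or data mart. The FeedHarvester class and its method names are invented for this example.

```python
import hashlib
import json
from urllib.request import urlopen


def fetch_url(url):
    """Default fetcher: download the feed body as bytes."""
    with urlopen(url, timeout=30) as response:
        return response.read()


class FeedHarvester:
    """Toy spider that re-indexes a feed only when its content changes."""

    def __init__(self, fetch=fetch_url):
        self.fetch = fetch   # injectable so the sketch runs offline
        self.digests = {}    # feed URL -> SHA-256 of the last body seen
        self.index = {}      # (feed URL, award_id) -> record

    def visit(self, url):
        """Visit one feed; return True if it changed and was re-indexed."""
        body = self.fetch(url)
        digest = hashlib.sha256(body).hexdigest()
        if self.digests.get(url) == digest:
            return False     # unchanged since the last visit

        # Parse the feed (assumed layout: {"awards": [...]}).
        records = json.loads(body.decode("utf-8"))["awards"]

        # Replace this feed's old entries so deleted awards disappear.
        for key in [k for k in self.index if k[0] == url]:
            del self.index[key]
        for record in records:
            self.index[(url, record["award_id"])] = record

        self.digests[url] = digest
        return True

    def crawl(self, feed_urls):
        """Visit every feed once; return the URLs that had changed."""
        return [url for url in feed_urls if self.visit(url)]
```

A scheduler would call crawl on the list of feed URLs at a fixed interval, with that list taken from each publisher's sitemap. A production crawler would more likely rely on HTTP conditional requests (ETag or Last-Modified headers) than on hashing the full body, but hashing keeps the sketch self-contained.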
Numerous readily available technologies, both commercial and open source, can implement the features described in this article. For example, Unisys has developed a Recovery Act reporting solution that can extract ARRA data elements from their host systems, aggregate them into ARRA-mandated reports, and publish those reports to the Web as RSS/Atom feeds, KML feeds for geospatial visualization, CSV for spreadsheet integration, and custom XML vocabularies for integration with other applications. The Unisys Recovery Act reporting solution was developed using various open source enterprise Java technologies and will integrate with most Government web hosting environments.