
Implement a "Trusted Information" Strategy

idea

What is the idea?

Design a system based on a standardized data model, and use data normalization tools and processes to ensure that received information can be trusted as accurate and authentic. When data arrives from so many different sources, it must be accurate so that the business and performance management analytics built on it can be relied upon. To reach this objective, incoming data must be parsed against a standard data model, including data fields defined in an XML schema, to ensure that data formats are consistent and reliable. This requires not only that a specification be published, but also that tools be available to detect data that does not comply with the data model specification and to correct it automatically to meet the needs of the solution. This goal can be realized by adopting the appropriate strategy, definition, governance, roadmaps, and information infrastructure. Models for this concept can be found in "The New Information Agenda". Establishing this foundation will enable not only performance evaluations but also the assessment of the collected information for other purposes, such as fraud and abuse detection and risk mitigation.
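As a rough illustration of the "detect and automatically correct" step described above, here is a minimal Python sketch. It assumes a tiny, hypothetical field specification (FIELD_SPEC) standing in for the published data model; a real deployment would presumably validate against the XML schema itself. The field names, accepted date formats, and the normalize_record helper are all invented for illustration.

```python
from datetime import datetime

# Hypothetical field specification standing in for the published data model.
FIELD_SPEC = {
    "award_id": str,
    "amount_usd": float,
    "report_date": "date",
}

# Assumed input date formats; the normalized output is always ISO 8601.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def _parse_date(text):
    """Try each accepted format and return an ISO date string."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(text)


def normalize_record(raw):
    """Return (clean_record, errors) for one submitted record.

    Values that can be safely corrected (stray whitespace, currency
    symbols, alternate date formats) are normalized; anything else is
    reported as an error instead of being guessed at.
    """
    clean, errors = {}, []
    for field, kind in FIELD_SPEC.items():
        value = raw.get(field)
        if value is None or str(value).strip() == "":
            errors.append(f"{field}: missing")
            continue
        text = str(value).strip()
        try:
            if kind == "date":
                clean[field] = _parse_date(text)
            elif kind is float:
                clean[field] = float(text.replace(",", "").lstrip("$"))
            else:
                clean[field] = text
        except ValueError:
            errors.append(f"{field}: cannot interpret {text!r}")
    return clean, errors
```

A record with an empty error list can be loaded as trusted; a record with errors would go back to the submitter rather than being silently repaired.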

Why is it important?

A complex system that collects and aggregates massive amounts of data, with the intention of using that source data for very significant performance evaluations such as those called for in the reconciliation aspect of the ARRA, must be built on a foundation that ensures the data received is consistent and reliable.

Submitted by JWJones from IBM (Consulting) on Apr 27, 2009

This idea is now closed to further comments.

Current number of stars: 4
based on 12 votes

8 Comments

Member comment

Obviously, from my other idea I think this is critical. Data is worse than meaningless if you don't know where it came from and what level of trust it can be given.

The tendency to focus on the technical implementation details is high, but I believe that for many of these ideas we should first specify what's being shared and how we can validate that data.

Then give the IT guys a chance to figure out the hows.

Comment from wjhuie on Apr 27, 2009
Member comment

I strongly agree with the posted idea and its first comment.  Unfortunately, I feel that the idea and comment are not seeing something even more fundamental than data - Meaning.

Data and the metadata structure of databases are built on terms, based on meaning, based on contexts, based on definitions, drawn from words in the operational and analytical data and document content of the organization.

Any data management and standardization effort that does not start from a managed terminology of a subject, also called a controlled vocabulary, is building a "closed-world" view of information needed by the users of a database.  A terminology is a series of semantic/meaning structures that includes data models (conceptual, logical, physical) at an intermediate terminology stage.  See the Wikipedia description of Terminology. 

The stages of Terminology I use are:

  1. Words and extracts from operational and analytical data stores and from document content stores (e.g., by search engines).
  2. Term lists.
  3. Term definitions (e.g., simple, personal, local, group, organizational, standard, global definitions).
  4. Concept maps between term definitions (as directed labeled graphs - DLG, with underlying triples of node-link-node relationships, for concepts of operation, conceptual data models, object role models such as ORM, logical data models such as ERD, physical data models such as DDL).
  5. Taxonomies for definitions that have broader/narrower meaning, and the terms (often multiple) that have those specific meanings. (Taxonomies are also called "reference models" in OMB FEA terms.)
  6. Thesaurus of preferred definitions and their terms for a given meaning within a given domain, and their alternative meanings and alternative terms, abbreviations, acronyms, aliases, and variant spellings.
  7. Ontologies representing the diverse viewpoints, definitions as classes of data, their relationships, their class and relationship attributes for a given domain, and the processes and rules of that domain. (An ontology represents a viewpoint of a user or group, modeling their knowledge.  The OMB FEA "line of sight" of how an IT system relates to the OMB FEA Reference Models is a simple ontology.  A DoDAF view is a simple ontology.  Integrated DoDAF views across a single system are also a broader ontology of the single system.)
  8. Ontology providing a foundation for diverse ontology unification, interoperability, and federation from a unified viewpoint. (There is no accepted unifying interoperability ontology for the Federal Government or the nation, so there is no economical and useful technical way to integrate the diverse simpler ontologies and their knowledge-bases into a holistic view.  I offer the public domain unifying ontology presented at http://gem-ema.one-world-is.org for this purpose.)
  9. Axiology providing a value-based process model representing the diverse value-streams and value-chains of the diverse and unified ontologies.  See gem-ema URL above.
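To make stages 4 through 6 a little more concrete, here is a minimal Python sketch of a thesaurus lookup and a concept map stored as node-link-node triples. The terms, variants, and relationships are invented placeholders, not an agreed Federal vocabulary, and the function names are hypothetical.

```python
# Hypothetical thesaurus: each preferred term lists its known variants
# (aliases, abbreviations, alternate spellings). Illustrative only.
THESAURUS = {
    "job": {"variants": {"jobs", "position", "fte"}},
    "award": {"variants": {"awards", "funding award"}},
}

# Hypothetical concept map as (node, link, node) triples. The "is_a"
# links form the broader/narrower taxonomy of stage 5.
TRIPLES = [
    ("grant", "is_a", "award"),
    ("loan", "is_a", "award"),
    ("award", "funds", "project"),
]


def build_index(thesaurus):
    """Map every known variant (lower-cased) to its preferred term."""
    index = {}
    for preferred, entry in thesaurus.items():
        index[preferred.lower()] = preferred
        for variant in entry["variants"]:
            index[variant.lower()] = preferred
    return index


def preferred_term(term, index):
    """Return the preferred term for a submitted term, or None if unknown."""
    return index.get(term.strip().lower())


def narrower_terms(term, triples):
    """Return terms that are directly narrower than `term` in the taxonomy."""
    return sorted(s for s, link, o in triples if link == "is_a" and o == term)


INDEX = build_index(THESAURUS)
```

A submitted term that comes back as None is exactly the "open-world" case: it is a candidate for the domain authorities to review and add to the controlled vocabulary, not something to reject outright.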

What is needed to build a stable and yet adaptive, "open-world" data architecture, data structure, and data content is to base it on the continually evolving vocabulary of the domain participants, organized into a controlled vocabulary governed by the domain authorities. 

See work by David C. Hay on open vs closed world views.

See the submitted idea on Terminology, Management Controls, and Management Life Cycle for information on building terminologies.

Comment from RoyERoebuck at One World Information System on Apr 27, 2009
Member comment

It would be good if such a strategy were documented and shared in conformance with AIIM's emerging Strategy Markup Language (StratML) standard.

Comment from oambur at AIIM StratML Committee on Apr 27, 2009
Member comment

Agree 100% -- see my other note.

On October 10, the media is going to swarm Recovery.gov to test the ease of accessibility and check the accuracy of the information.  Reporting is one thing, but the accuracy of the information -- can it be trusted -- is another thing entirely, and is bigger than a bread box.  Since the Obama administration has made transparency and accountability the centerpiece of its new approach to Gov't, the very success of the Recovery Act rests on the credibility of the results reported.  If we turn on Recovery.gov and the media and public find that the info is inaccurate, and/or does not tie back in a consistent way to what is being reported elsewhere in the ecosystem -- that credibility is gone.

So how do you get to "Trusted Information"?  As one of the other comments points out, a good place to start is by developing a common vocabulary and semantics to describe how we define the different data elements.  What constitutes a job?  How is a job defined?  How is an investment program defined?  Are these definitions consistent across the Federal ecosystem, and even down to the State level, since state agencies will ultimately need to report as well?  These definitions define the metadata that can then be used in a standardized data model.

Then, you need to think about data collection, aggregation, cleansing, normalization, standardization, etc. -- all the basic work that needs to happen if you are going to collect information from across a broad ecosystem and bring it together in a meaningful way.  This will require a combination of ETL and data quality capabilities, along with Master Data Management.

From there, it is a hop, skip and jump to provide this new aggregated information in Analysis and Visualization tools -- with the confidence that it can be trusted.

What would be even better is if the Data Dictionary (common terms and definitions), the ETL, Data Quality and MDM capability AND the BI/Visualization tools ALL SHARED AND BUILT ON THE SAME METADATA.

Imagine this:  You are on Recovery.gov, viewing a report.  You are not sure if you trust a certain piece of information, and you want to see where it comes from.  You right click on the field, and up pops a window that shows you the lineage of that particular field: its base data sources, how it is calculated, who "owns" it.  In other words, all the metadata about the field.
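To sketch what might sit behind such a pop-up, here is a small Python example of a metadata catalog that records, for each reported field, its owner, its calculation, and its sources, and then walks back to the base data sources. The catalog entries, owner names, and function names are hypothetical; this is not a description of any actual Recovery.gov or vendor implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldLineage:
    """Metadata for one reported field (all values here are illustrative)."""
    name: str
    owner: str
    calculation: str
    sources: tuple = ()  # names of other catalog fields or raw feeds


# Hypothetical catalog: a total derived from a per-state field, which is
# in turn derived from a raw recipient report feed.
CATALOG = {
    "jobs_created_total": FieldLineage(
        "jobs_created_total", "hypothetical-reporting-office",
        "sum of jobs_created_by_state", ("jobs_created_by_state",)),
    "jobs_created_by_state": FieldLineage(
        "jobs_created_by_state", "hypothetical-state-liaison",
        "sum of recipient reports per state", ("recipient_report_feed",)),
}


def base_sources(name, catalog, seen=None):
    """Follow lineage links until reaching names with no further sources."""
    seen = set() if seen is None else seen
    if name in seen:  # guard against circular definitions
        return set()
    seen.add(name)
    entry = catalog.get(name)
    if entry is None or not entry.sources:
        return {name}
    found = set()
    for source in entry.sources:
        found |= base_sources(source, catalog, seen)
    return found


def describe(name, catalog):
    """Collect what the lineage pop-up would display for one field."""
    entry = catalog[name]
    return {
        "field": entry.name,
        "owner": entry.owner,
        "calculation": entry.calculation,
        "base_sources": sorted(base_sources(name, catalog)),
    }
```

Calling describe("jobs_created_total", CATALOG) returns the owner, the calculation, and the single base feed that the total ultimately depends on.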

This would go a long way towards ensuring the credibility of what is represented on Recovery.gov.  The good news is that the capability exists today.

Comment from TPaydos at IBM on Apr 28, 2009
Member comment

I fully concur with TPaydos and OAmbur. 

I would like to emphasize their points that before you can get to governed and trusted data, you need governed and trusted metadata, and before you can get to either, you need agreement or directive across a domain on the meaning for the metadata and data (or meanings in different contexts/ontologies), whatever terms are used by diverse groups (and thus needing translation).

The interoperability of humans, to cultures, to metadata, to data, to IT is based on agreed meanings between individuals, groups, and organizations, not on standardized terms.

Comment from RoyERoebuck at One World Information System on Apr 28, 2009
Member comment

I like this one -- to me it is the crux of the whole system.  If you can't "trust" the information, who cares what is reported.

What is needed is this whole idea of a common dictionary of terms, vocabulary and semantics, as well as things like data lineage, governance, etc. -- all contained in a common meta data repository.

Check out the link below, which I found on IBM's InfoSphere website... if this really exists, it would be a home run.

http://www-01.ibm.com/software/info/television/index.jsp?cat=clients_software_info&media=video&item=xml/L135107Q57342C89.xml&wm=7115001f4919&cm_sp=CTA14-_-SWP00-_-4919

Comment from sfd363 at Citizen on May 01, 2009
Member comment

I absolutely agree with the concept proposed here.  The participants in this program, at all levels, need to ensure that the data they are capturing and passing up or down for reporting is managed by a process that keeps it highly accurate and fully trustworthy.  I understand that there are a number of technologies designed to help ensure that data is captured according to the expected specifications.  Call it data quality assurance or data cleansing.  Either way, that sort of process will lead to data that you can trust.

Comment from Jerry703 on May 03, 2009
Member comment

Standardization will go a long way in improving data accuracy and trust. The development of standards is an ongoing process. The emerging role of a responsiveness architect will help drive the system towards common data standards.

Comment from qrehmani at EDMC on May 04, 2009