Documentation

This guide documents how this website measures and tracks agency implementation of Project Open Data. It also describes other tools and resources the site provides to help agencies implement their open data programs.


Agency Dashboard

The Agency Dashboard is used to track how agencies are implementing Project Open Data (aka OMB M-13-13). This is done in two ways:

  1. Review of the leading indicators (detailed below) by OMB staff
  2. Automated metrics that analyze machine-readable files (e.g., data.json, digitalstrategy.json)

The leading indicators are scored by OMB staff, and scoring can involve a more subjective evaluation process. The indicators are also informed by the automated metrics, which are generated by a script that runs daily and analyzes files on agency websites to understand the progress and current status of their public data listings.

Milestones

The dashboard is oriented around quarterly milestones. You can use the blue milestone selection menu to navigate between milestones. The OMB scoring and the automated metrics are always tied to a specific milestone. The automated metrics update every 24 hours until the milestone is reached at the end of the quarter. At that point, those automated metrics become a historical snapshot. To see the most current automated metrics, view the current quarter (the next approaching milestone).

Leading Indicators Strategy

The "Leading Indicators Strategy" refers to the five categories of indicators drawn from the Cross Agency Priority Goals (CAP Goals) for Open Data. The strategies are based on the Enterprise Data Inventory, the Public Data Listing, Public Engagement, Privacy & Security, and Human Capital and are all detailed below.

Leading Indicators

The Leading Indicators Strategies described above are broken down here into their component parts:


Enterprise Data Inventory

Overall Progress this Milestone

This element combines the qualitative and quantitative measures for this milestone into an objective assessment of whether the agency will meet it, rated Green (On Schedule to Complete Milestone), Yellow (Possible Milestone Delivery Problem), or Red (Will Miss Milestone). (Qualitative)

Inventory Updated this Quarter

This element captures whether or not an agency has submitted an updated Enterprise Data Inventory to OMB Max by the milestone deadlines.

Number of Datasets

This element is the total number of datasets listed in the Enterprise Data Inventory, including those marked "Public", "Non-Public", and "Restricted". (Quantitative)

Schedule Delivered

This element captures whether or not an agency has submitted a schedule, either via digitalstrategy.json or via another document on its website, that lays out deliverables against the various milestones outlined for the Open Data Initiative. (Qualitative)

Bureaus represented

This number represents the number of bureaus (based on codes from OMB Circular A-11 for the Common Government-wide Accounting Classification (CGAC), Appendix C) that have data sets reported in the agency's EDI. (Quantitative)

Programs represented

This is a count of primary agency programs that are represented within the EDI based on the Federal Program Inventory. (Quantitative)

Access Level = Public

This is a count of data assets that are or could be made publicly available to all without restrictions. (Quantitative)

Access Level = Restricted

This is a count of data assets that are available under certain use restrictions. (Quantitative)

Access Level = Non-Public

This is a count of data assets that are not available to members of the public. (Quantitative)

Inventory > Public listing

This compares the count of data sets in the EDI (including those marked "Public", "Non-Public", and "Restricted") with the count in the entire public data listing (PDL). It is rare for the EDI count to equal the PDL count (which would indicate that all data sets are publicly accessible); the EDI is usually greater than the PDL. If the EDI is less than the PDL, this indicates an error in reporting and collection, since every publicly listed data set should also appear in the inventory. (Quantitative)
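The comparison can be sketched as a small Python function, assuming both counts have already been collected. The function name and its return labels are hypothetical illustrations, not part of the dashboard.

```python
def compare_inventory_to_listing(edi_count: int, pdl_count: int) -> str:
    """Classify the EDI vs. PDL dataset counts.

    edi_count: all EDI records ("Public", "Non-Public" and "Restricted").
    pdl_count: all records in the public data listing.
    """
    if edi_count < pdl_count:
        # The inventory should contain everything that is publicly listed,
        # so a smaller EDI points to a reporting or collection error.
        return "error: EDI smaller than PDL"
    if edi_count == pdl_count:
        # Rare: every inventoried dataset is publicly accessible.
        return "equal: all datasets public"
    return "expected: EDI larger than PDL"
```

For example, an agency with 1,200 inventory records and 450 public listings falls in the usual "EDI larger than PDL" case.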

Percentage growth in records since last quarter

This is calculated by subtracting last quarter's EDI count of data sets (Qa) from the current quarter's EDI count (Qb), dividing by last quarter's count, and multiplying by 100 to get a percentage: [(Qb - Qa) / Qa] * 100. (Quantitative)
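The growth formula translates directly into Python. This is a sketch: the zero-count guard is an assumption, because the documentation does not say how a quarter with no prior records is reported. The same function applies to the Public Data Listing growth metric.

```python
from typing import Optional


def percent_growth(previous: int, current: int) -> Optional[float]:
    """Quarter-over-quarter growth: ((Qb - Qa) / Qa) * 100.

    previous is Qa (last quarter's count); current is Qb (this quarter's).
    Returns None when there is no prior count to grow from (assumption).
    """
    if previous == 0:
        # The formula is undefined when last quarter's count is zero.
        return None
    return (current - previous) / previous * 100
```

For instance, growing from 1,200 to 1,500 records gives (300 / 1200) * 100 = 25.0; shrinking from 400 to 300 gives -25.0.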

Schedule Risk for Nov 30, 2014

This is an objective evaluation (Green = On Schedule, Yellow = Possible Schedule Issues, Red = Schedule Miss/Incomplete) of whether an agency will be able to deliver on its published digital strategy deliverables for the Open Data milestones outlined in OMB M-13-13. (Qualitative)

Spot Check - Site search, SORNs, PIAs, FOIA

This is a check by OMB eGov for SORNs (System of Records Notices), PIAs (Privacy Impact Assessments), and FOIA (Freedom of Information Act) statements, along with a site search for typical data file types. For example (the number in parentheses indicates how many matching files the search returned; these examples use Google):

allinanchor: site:agencydomain.gov filetype:xls (5,000)

allinanchor: site:agencydomain.gov filetype:csv (300)

allinanchor: site:agencydomain.gov filetype:xml (38,000)


Public Data Listing

Overall Progress this Milestone

This element combines the qualitative and quantitative measures for this milestone into an objective assessment of whether the agency will meet it, rated Green (On Schedule to Complete Milestone), Yellow (Possible Milestone Delivery Problem), or Red (Will Miss Milestone). (Qualitative)

Number of Datasets

This element captures the count of publicly listed data sets in the published Public Data Listing, and corresponds to the number captured during the dashboard's automated crawl. (Quantitative)

Number of Downloadable Datasets

This element captures the count of downloadable, publicly listed data sets in the published Public Data Listing, and corresponds to the number captured during the dashboard's automated crawl. This should correspond with "accessURL" in the PDL JSON file, which is the URL providing direct access to the downloadable distribution of a dataset. (Quantitative)

Percentage growth in records since last quarter

This is calculated by subtracting last quarter's PDL count of data sets (Qa) from the current quarter's PDL count (Qb), dividing by last quarter's PDL count, and multiplying by 100 to get a percentage: [(Qb - Qa) / Qa] * 100. (Quantitative)

Valid Metadata

See the Valid Schema section under Automated Metrics.

/data

This element indicates if an agency has published a page for their Open Data activities, often containing links to their data catalog, links to other Open Data related documents, as well as the Digital Strategy.

/data.json

This element captures whether the agency has successfully published a data.json file containing its complete Public Data Listing.

Harvested by data.gov

This element captures whether DATA.GOV has harvested the PDL for indexing via its regular crawls. This usually requires notifying GSA (which houses the DATA.GOV team) so that it indexes the PDL.


Public Engagement

Overall Progress this Milestone

This element combines the qualitative and quantitative measures for this milestone into an objective assessment of whether the agency will meet it, rated Green (On Schedule to Complete Milestone), Yellow (Possible Milestone Delivery Problem), or Red (Will Miss Milestone). (Qualitative)

Description of feedback mechanism delivered

This element is a narrative provided by the agency, through its Digital Strategy, describing how it plans to engage the public in Open Data Initiative activities, including two-way forms of communication (e.g., social media), issue tracking, outreach, and other methods. These methods should add value to open data activities and directly address public and customer needs. (Qualitative)

Data release is prioritized through public engagement

This is a measure of whether the public has identified data sets and those data sets have been prioritized for release based on that engagement (such as e-mail, public open data events, IdeaScale/GitHub/Twitter, etc.). It is based on information provided to OMB for review or gathered from the agency's Open Data websites and public engagement mechanisms. This may include data sets requested via FOIA mechanisms or other formal requests. (Qualitative)

Feedback loop is closed, 2 way communication

This element is an assessment of whether input from public engagement is acted upon and produces outputs for the open data milestone activities, such as inclusion of data sets, quality improvements, format changes, API development, or other outcomes. It is based on information provided by the agency and confirmed by OMB through review of published public feedback mechanisms, including post-event outcomes such as those from datajams and datapaloozas. (Qualitative)

Link to or description of Feedback Mechanism

This element should contain a link (URL, email address, etc.) to the primary feedback mechanism used for customer engagement. If more than one mechanism is regularly used, this should be a short narrative about each one and how it is used to engage the public.


Privacy & Security

Overall Progress this Milestone

This element combines the qualitative and quantitative measures for this milestone into an objective assessment of whether the agency will meet it, rated Green (On Schedule to Complete Milestone), Yellow (Possible Milestone Delivery Problem), or Red (Will Miss Milestone). (Qualitative)

Data Publication Process Delivered

This element captures the state of the Open Data publication process deliverable. The process is usually documented within an agency's Digital Strategy and is typically contained and updated within the Digital Strategy JSON file. Some agencies have independently published this process on their websites, separately from the Digital Strategy, which is not recommended.

Information that should not be made public is documented with the agency's OGC

As part of the Data Publication Process (this element can't be "Green" unless the previous element exists), the Office of General Counsel (OGC) or the agency's Office of the Solicitor is listed as part of the process for determining which data sets are released publicly. (Qualitative)


Human Capital

Overall Progress this Milestone

This element combines the qualitative and quantitative measures for this milestone into an objective assessment of whether the agency will meet it, rated Green (On Schedule to Complete Milestone), Yellow (Possible Milestone Delivery Problem), or Red (Will Miss Milestone). (Qualitative)

Open Data Primary Point of Contact

This element should contain the name (and/or contact information) for an agency's primary point of contact for Open Data Initiative activities.

POCs identified for required responsibilities

This element accounts for the agency identifying and publishing primary points of contact for Open Data activities.


Automated Metrics

These fields are determined by an automated script that analyzes agency data.json, digitalstrategy.json, and /data files.

The automated metrics will update every 24 hours until the end of the quarter when a milestone has been reached. At that point those metrics will represent a historical snapshot. To see the most current automated metrics, you'll need to view the current quarter (the next approaching milestone).

Expected URL

This is the URL where the data.json file is expected to be found. It is based on the main agency URL provided through the USA.gov Directory API.
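One way to derive the expected URLs is sketched below in Python. It assumes that data.json and the /data page live at the root of the agency's main domain; the function name and the www.agency.gov host are hypothetical examples.

```python
from urllib.parse import urlsplit, urlunsplit


def expected_urls(agency_url: str) -> dict:
    """Derive the expected /data and /data.json URLs from an agency's main URL."""
    parts = urlsplit(agency_url)
    # Keep only scheme and host: the files are assumed to sit at the site root.
    root = urlunsplit((parts.scheme, parts.netloc, "", "", ""))
    return {"data_json": root + "/data.json", "data_page": root + "/data"}
```

Any path on the main agency URL (such as /about/) is dropped, so https://www.agency.gov/about/ yields https://www.agency.gov/data.json.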

Resolved URL

This is the URL that is resolved after following any redirects.

Redirects

This is the number of redirects followed to reach the final data.json URL. The validator currently follows at most 5 redirects before stopping.

Ideally this should be 0.
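The Expected URL, Resolved URL, Redirects, and HTTP Status fields can be illustrated with a Python standard-library sketch that follows redirects one hop at a time, stopping after 5. It is not the validator's actual code: the function names are hypothetical, the fetcher is injectable so the logic can be tested offline, and reporting status 0 for an unreachable server mirrors the "Server Not Found" convention described later on this page.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Tuple
from urllib.parse import urljoin

Fetch = Callable[[str], Tuple[int, Dict[str, str]]]


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so each hop can be counted."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None


def urllib_fetch(url: str) -> Tuple[int, Dict[str, str]]:
    """Return (status, lower-cased headers) for one request, without redirects."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=30) as resp:
            return resp.status, {k.lower(): v for k, v in resp.headers.items()}
    except urllib.error.HTTPError as err:
        # With redirects disabled, 3xx, 4xx and 5xx responses all land here.
        return err.code, {k.lower(): v for k, v in err.headers.items()}
    except urllib.error.URLError:
        # DNS failure or refused connection: no HTTP status at all.
        return 0, {}


def resolve(url: str, fetch: Fetch = urllib_fetch, max_redirects: int = 5):
    """Follow up to max_redirects redirects; return (final_url, redirects, status)."""
    redirects = 0
    while True:
        status, headers = fetch(url)
        location = headers.get("location")
        if 300 <= status < 400 and location and redirects < max_redirects:
            url = urljoin(url, location)  # Location may be relative
            redirects += 1
            continue
        return url, redirects, status
```

Passing a fake fetcher makes the behavior easy to check: a single 301 from http:// to https:// resolves with one redirect and a final status of 200, while a URL that keeps redirecting to itself stops after 5 hops with a 3xx status.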

HTTP Status

This is the HTTP status code received when attempting to reach the expected or resolved URL. For more information on properly using HTTP status codes, see: Knowing Your HTTP Status Codes In Federal Government.

This should be 200 if the data.json or /data URL was found successfully.

Content-Type

The Content-Type is how the server announces the type of file it is serving at the requested URL. Usually nothing breaks if this is set incorrectly, but some applications may need to be configured to read the file as JSON even if the server announces something else. This is very similar to how a file extension identifies a file's type. Yes, the URL says data.json, but the browser just sees that as an arbitrary URL; the Content-Type is what identifies the actual file type. Setting it incorrectly would be like naming a file graph.pdf when it is actually a CSV spreadsheet.

The character encoding should also be specified as part of the Content-Type, and it should match the actual encoding of the text in the file. JSON must always use a Unicode encoding, preferably UTF-8.

For data.json this should be: application/json; charset=utf-8

For /data this should be: text/html; charset=utf-8
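A Python sketch of this check is shown below. It uses the standard library's email.message.Message to parse the header, which lower-cases the media type and charset so that "UTF-8" and "utf-8" compare equal. The function name and message wording are illustrative.

```python
from email.message import Message
from typing import List


def content_type_problems(header: str, expected_type: str) -> List[str]:
    """Compare a Content-Type header with the expected media type and UTF-8."""
    msg = Message()
    msg["Content-Type"] = header
    problems = []
    # get_content_type() returns "maintype/subtype" in lower case, no parameters.
    media_type = msg.get_content_type()
    if media_type != expected_type:
        problems.append(f"media type is {media_type}, expected {expected_type}")
    charset = msg.get_content_charset()  # lower-cased, or None if absent
    if charset is None:
        problems.append("charset not declared")
    elif charset != "utf-8":
        problems.append(f"charset is {charset}, expected utf-8")
    return problems
```

Use "application/json" as expected_type for data.json and "text/html" for /data; an empty list means the header matches the recommendations above.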

Valid JSON

This identifies whether the data.json file was actually JSON. Even if the HTTP Status is 200 for the data.json URL and the Content-Type announces application/json; charset=UTF-8, the response might actually be HTML or improperly formatted JSON.

If the file's syntax can be parsed as JSON, the validator will attempt additional analysis, but the file may still be invalid JSON if it doesn't use the proper text encoding. While the validator can convert the file to the correct encoding for this additional analysis, it's important to use the correct encoding at the source so that others can parse the JSON without knowing they need to convert it. JSON must use a Unicode text encoding (use UTF-8) and should not include a byte order mark.

It's highly recommended that you generate your JSON with a tool designed to produce JSON rather than attempting to write it by hand. You can check how well formed your JSON is with a tool like JSONLint. When using such a tool, it's best to enter the URL of the JSON file rather than copying and pasting the JSON. When you copy and paste the raw JSON, your browser may automatically fix problems that the tool's server will still encounter when it retrieves the file directly from the URL.
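The checks described above (no byte order mark, strict UTF-8, parseable JSON syntax) can be sketched in Python. Unlike the dashboard's validator, this sketch does not try to convert a wrongly encoded file for further analysis; it simply reports the problem. The function name is hypothetical.

```python
import codecs
import json
from typing import Any, List, Tuple


def check_json_bytes(body: bytes) -> Tuple[Any, List[str]]:
    """Parse raw data.json bytes as UTF-8 JSON, reporting any problems found."""
    problems = []
    if body.startswith(codecs.BOM_UTF8):
        problems.append("starts with a UTF-8 byte order mark")
        body = body[len(codecs.BOM_UTF8):]
    try:
        text = body.decode("utf-8")  # strict: invalid byte sequences raise
    except UnicodeDecodeError as err:
        problems.append(f"not valid UTF-8: {err.reason} at byte {err.start}")
        return None, problems
    try:
        return json.loads(text), problems
    except json.JSONDecodeError as err:
        # e.g. an HTML error page served with a 200 status
        problems.append(f"invalid JSON syntax: {err.msg} (line {err.lineno})")
        return None, problems
```

The function takes the raw response bytes rather than decoded text, so encoding problems are caught before any automatic conversion can hide them.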

The "Public Datasets" column on the main agency dashboard table will be green if it's a valid JSON file and red or yellow otherwise. If it's not a valid JSON file, the "Valid Metadata" column can't be green - at best it can be yellow. If it's not valid JSON it most likely can't be parsed regardless of how valid the metadata schema is, so this is a serious consideration. This also means it's possible to be listed under the "Valid Metadata" column in yellow even if 100% of the records validate against the schema.

Datasets with Valid Metadata

The percentage and specific number of datasets in the data.json file that successfully validate against the Project Open Data schema.

The "Valid Metadata" column on the main agency dashboard table will be green if 100% of the metadata records validate against the Project Open Data schema and they are from a valid JSON file. It's possible to have 100% valid metadata records but still be shown as yellow if it's not a valid JSON file. Any record that doesn't validate against the schema won't meet the requirements and also won't be included by harvesters like data.gov.
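The green/yellow rule for the "Valid Metadata" column can be sketched in Python. The per-record check here is a deliberately tiny stand-in: REQUIRED_FIELDS and ACCESS_LEVELS are illustrative assumptions, not the full rules, which live in the Project Open Data JSON Schema and should be checked with a real JSON Schema validator. Treating an empty file as not green is also an assumption.

```python
from typing import Any, Dict, List

# Illustrative subset only (assumption); the real schema has more rules.
REQUIRED_FIELDS = ("title", "description", "identifier", "accessLevel")
ACCESS_LEVELS = {"public", "restricted public", "non-public"}


def record_is_valid(record: Dict[str, Any]) -> bool:
    """Rough stand-in for validating one record against the schema."""
    if any(not record.get(field) for field in REQUIRED_FIELDS):
        return False
    return record["accessLevel"] in ACCESS_LEVELS


def valid_metadata_summary(records: List[Dict[str, Any]], is_valid_json: bool) -> dict:
    """Count valid records and apply the green rule described above."""
    valid = sum(1 for r in records if record_is_valid(r))
    percent = 100.0 * valid / len(records) if records else 0.0
    return {
        "valid": valid,
        "percent": percent,
        # Green needs every record to validate AND the file to be valid JSON;
        # an empty listing is treated as not green (assumption).
        "green": is_valid_json and bool(records) and valid == len(records),
    }
```

Note that a file whose records are 100% valid still reports green as False when is_valid_json is False, matching the yellow case described above.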

Valid Schema

This identifies whether the data.json has all the required fields and has values that fit within the parameters specified by the Project Open Data schema.

Schema Errors

This displays instances where the data.json doesn't validate against the Project Open Data schema based on rules codified within a JSON Schema document hosted on Project Open Data. For more detailed and more readable results, you should use the Project Open Data validator

Datasets

The total number of datasets listed in the data.json file

Datasets with Downloadable URLs

The total number of datasets listed in the data.json file that include an accessURL for a downloadable file

Total Downloadable URLs

The total number of accessURL download links listed for all datasets in the data.json file
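These three counts can be sketched in Python, assuming a data.json layout in which the file is either a bare array of dataset objects or an object with a "dataset" array, and each dataset may carry accessURL directly and/or in the entries of a "distribution" list. The function names are hypothetical, and URLs are counted as listed, without de-duplication.

```python
from typing import Any, Dict, List, Union


def datasets_from(catalog: Union[List[Any], Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Accept either a bare array of datasets or an object with a "dataset" array."""
    if isinstance(catalog, dict):
        return catalog.get("dataset", [])
    return catalog


def download_counts(catalog) -> Dict[str, int]:
    """Count datasets, datasets with an accessURL, and all listed accessURLs."""
    datasets = datasets_from(catalog)
    with_download = 0
    total_urls = 0
    for ds in datasets:
        urls = []
        if ds.get("accessURL"):
            urls.append(ds["accessURL"])
        for dist in ds.get("distribution") or []:
            if dist.get("accessURL"):
                urls.append(dist["accessURL"])
        if urls:
            with_download += 1
        total_urls += len(urls)
    return {
        "datasets": len(datasets),
        "with_downloadable_urls": with_download,
        "total_downloadable_urls": total_urls,
    }
```

A catalog where one dataset has a single accessURL, another has two distributions, and a third has none yields 3 datasets, 2 with downloadable URLs, and 3 downloadable URLs in total.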

Server Not Found

The number of accessURL download links with a server or domain name that could not be reached. In the error log CSV file this is listed with an error_type of "broken_link" and an http_status of "0".

Broken links (accessURL 4xx)

The number of accessURL download links where the server responded indicating the URL could not be found. In the error log CSV file this is listed with an error_type of "broken_link" and an http_status of anything that starts with "4".

Error Links (accessURL 5xx)

The number of accessURL download links where the server responded indicating the URL had an error preventing it from properly working. In the error log CSV file this is listed with an error_type of "broken_link" and an http_status of anything that starts with "5".

Redirected Links (accessURL 3xx)

The number of accessURL download links where the server responded indicating the URL had moved to a new location. In the error log CSV file this is listed with an error_type of "broken_link" and an http_status of anything that starts with "3".
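The mapping from an accessURL check result to these four counters can be sketched in Python. The error_type and http_status fields mirror the error log CSV columns described above; the "url" column name and the function names are hypothetical.

```python
from typing import Optional


def classify_access_url(http_status: int) -> Optional[str]:
    """Map an accessURL check result to the dashboard counter it falls under.

    http_status 0 means no HTTP response was received (server unreachable).
    Returns None for other statuses, such as 2xx, which are not broken links.
    """
    if http_status == 0:
        return "Server Not Found"
    if 300 <= http_status < 400:
        return "Redirected Links (accessURL 3xx)"
    if 400 <= http_status < 500:
        return "Broken links (accessURL 4xx)"
    if 500 <= http_status < 600:
        return "Error Links (accessURL 5xx)"
    return None


def error_log_row(url: str, http_status: int) -> Optional[dict]:
    """Build an error-log row for a problem link, or None for a working one."""
    if classify_access_url(http_status) is None:
        return None
    # "url" is a hypothetical column name; the other two mirror the CSV.
    return {"url": url, "error_type": "broken_link", "http_status": str(http_status)}
```

Because all four categories share the error_type "broken_link", the http_status column is what tells them apart in the CSV.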

Correct format (accessURL/format)

The number of accessURL download links where the server responded indicating that the format of the resource did not match the format specified in the data.json metadata. In the error log CSV file this is listed with an error_type of "format_mismatch"; the format reported by the server is in format_served, and the format listed in data.json is in format_datajson.

PDF for raw data (accessURL)

The number of accessURL download links where the server responded indicating that the format of the resource was a PDF file. The accessURL should point to raw machine-readable data (like a spreadsheet) rather than documents. Use references or dataDictionary for documents meant to accompany data.

HTML for raw data (accessURL)

The number of accessURL download links where the server responded indicating that the format of the resource was an HTML file. The accessURL should point to raw machine readable data (like a spreadsheet) rather than documents. Use references, dataDictionary, or landingPage for documents meant to accompany data.
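The three format checks (mismatch, PDF, HTML) can be sketched in Python, assuming the data.json format value is a media type such as "text/csv" that can be compared with the server's Content-Type. The pdf_for_raw_data and html_for_raw_data labels are hypothetical; only format_mismatch appears in the error log described above.

```python
from typing import List


def format_problems(format_served: str, format_datajson: str) -> List[str]:
    """Compare the served media type with the one declared in data.json."""
    # Drop parameters such as "; charset=utf-8" and normalise case.
    served = format_served.split(";")[0].strip().lower()
    declared = format_datajson.split(";")[0].strip().lower()
    problems = []
    if served != declared:
        problems.append("format_mismatch")
    if served == "application/pdf":
        problems.append("pdf_for_raw_data")  # hypothetical label
    if served == "text/html":
        problems.append("html_for_raw_data")  # hypothetical label
    return problems
```

A link declared as text/csv but served as text/html trips both the mismatch and the HTML check, which is typical when an accessURL points at a landing page instead of the file itself.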

Bureaus Represented

The number of bureaus used throughout the data.json metadata as specified with bureauCode.

Programs Represented

The number of programs used throughout the data.json metadata as specified with programCode.
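Both counts amount to counting distinct code values across all datasets, as in the Python sketch below. It assumes bureauCode and programCode are arrays of strings, and it also tolerates a bare string. The function name and the example codes used to exercise it are illustrative only.

```python
from typing import Any, Dict, List, Set


def distinct_codes(datasets: List[Dict[str, Any]], field: str) -> Set[str]:
    """Collect distinct codes from a list-valued field such as bureauCode."""
    codes: Set[str] = set()
    for ds in datasets:
        value = ds.get(field) or []
        # Tolerate a bare string as well as the usual array of strings.
        if isinstance(value, str):
            value = [value]
        codes.update(v.strip() for v in value if v and v.strip())
    return codes
```

Bureaus Represented is then len(distinct_codes(datasets, "bureauCode")), and Programs Represented is the same call with "programCode".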

Data.json File Size

The size of the data.json file the last time it was checked by the validator (for the selected milestone)

Data.json Last Modified

The last time the data.json file appears to have been updated (for the selected milestone)
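The documentation does not say exactly how these two values are derived. One plausible approach, shown here as a Python sketch under that assumption, takes the file size from the bytes actually received and the modification time from the HTTP Last-Modified header when the server sends one. The function name is hypothetical.

```python
from email.utils import parsedate_to_datetime
from typing import Dict


def file_stats(body: bytes, headers: Dict[str, str]) -> dict:
    """Size from the bytes received; last-modified from the HTTP header (assumption)."""
    last_modified = None
    raw = headers.get("last-modified")  # headers assumed lower-cased
    if raw:
        try:
            last_modified = parsedate_to_datetime(raw)
        except (TypeError, ValueError):
            # Unparseable header: treat the modification time as unknown.
            last_modified = None
    return {"size_bytes": len(body), "last_modified": last_modified}
```

Many servers omit or misreport Last-Modified for generated files, which is one reason a value like this can only show when the file "appears" to have been updated.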

Data.json Last Crawl

The last time this validator analyzed the data.json file