Department of Veterans Affairs
Enterprise Data Inventory - Volume and composition over time
M-13-13 Milestone 10 - February 29th 2016
OMB Review Complete: OMB has completed the agency review for this milestone. Agencies should contact their OMB desk officer if anything looks incorrect.
Leading Indicators
These indicators are reviewed by the Office of Management and Budget
| Review Status | complete |
|---|---|
| Reviewer | Justin Grimes |
| Last Updated | April 19, 2016, 5:00 pm EDT by Justin Grimes |
Assessment Summary
Department of Veterans Affairs is commended for its examples of public engagement and user feedback, which resulted in improvements to datasets and provided examples of good interagency collaboration (on SNAP data). The agency provides some, but not all, datasets in human-readable form on /data.
Inventory Composition
Public Dataset Status
Dataset Link Quality
| Status | Indicator | Automated Metrics | ||
|---|---|---|---|---|
| Overall Progress this Milestone | ||||
| Inventory Updated this Quarter | ||||
| 1614 | Number of Datasets | |||
| 5 | Number of APIs | |||
| 4 | Bureaus represented | |||
| 100.0% | Percentage of bureaus represented | |||
| 36 | Programs represented | |||
| 38.3% | Percentage of programs represented | |||
| 1489 | Number of public datasets | |||
| 1 | Number of restricted public datasets | |||
| 124 | Number of non-public datasets | |||
| Percentage growth in records since last quarter | ||||
| To a very great extent (>75%) | To what extent is your agency’s Enterprise Data Inventory (EDI) complete? | |||
| See below | What steps have you taken to ensure your Enterprise Data Inventory is complete | |||
| All activities are coordinated between VA's Chief Technology Officer (CTO) and an Open Data governance structure consisting of two integrated bodies: a senior-level data forum known as the Data Governance Council (DGC) and an action officer/subject matter expert-level body known as the Open Data Integrated Project Team (IPT). As a result of coordinated and repetitive messaging and communications, each Administration and Staff Office pursues the identification and collection of data assets from every subcomponent. As an example, the Veterans Health Administration (VHA) has used existing inventory systems and matched against what is released through Open Data to ensure that most data assets are included in the Enterprise Data Inventory. Many data assets within VHA are created daily for everything from monitoring performance to modeling future patient outcomes. Many employees participate in work groups such as the VHA Data Consortium and other activities in part to help identify new data assets. To date, Administrations and Staff Offices have focused on the following quarterly efforts: Data assets discovered in the two-way customer feedback process and at events such as the Open Data Roundtable facilitated by the Center for Open Data Enterprise; Analytics Powering Outcomes for Veterans; Suicide Prevention Hackathon and Precision Medicine initiatives. 
Contribution of data assets in all categories (public, restricted public, and non-public); System of Records Notices (SORNs); Data that may be given to the public via Freedom of Information Act (FOIA) requests; Data collections; Expanded metadata; Keywords; APIs; Machine readable; Real-time delivery; Worldwide domain; Data quality improvements; Public access to previously restricted public data; Free access to previously charged for data; OIT inventories (such as administrative and operational). In the future, the incremental strategy will continue in the areas highlighted above, and will also include new data assets such as: Paperwork Reduction Act submissions to Office of Information and Regulatory Affairs; SORN; Government Publishing Office bulk downloads; Data dictionaries. In addition, the Leading Indicators Strategy Rubric for the Project Open Data Dashboard guides incremental strategy enhancements. Under current consideration, technology solutions to automate the data asset collection process, improve efficiencies, and free up resources will enable enhanced efforts on data analysis and customer engagement for data discovery, data publication, and improved user experience. Over time, the enterprise data inventory will be expanded, enhanced, and enriched to reflect real-time best practices in public engagement, while maintaining high data governance standards. An agile enterprise data inventory will assist researchers, academia, and entrepreneurs in developing applications to improve the lives of Veterans and their families. | ||||
| Agency provides a public Enterprise Data Inventory on Data.gov | ||||
| Agency provided updated Enterprise Data Inventory to OMB | ||||
| 100% | License specified | Crawl details | ||
| Number of datasets with redactions | ||||
| 0% | Percent of datasets with redactions | |||
| Status | Indicator | Automated Metrics |
|---|---|---|
| Overall Progress this Milestone | ||
| 1613 | Number of Datasets | Crawl details |
| 14 | Number of Collections | Crawl details |
| 665 | Number of datasets not contained in a collection | Crawl details |
| 1307 | Number of Public Datasets with File Downloads | Crawl details |
| 5 | Number of APIs | Crawl details |
| Number of public APIs | Crawl details | |
| Number of restricted public APIs | Crawl details | |
| Number of non-public APIs | Crawl details | |
| 1312 | Total number of access and download links | Crawl details |
| Quality Check: Links are sufficiently working | Crawl details | |
| 1226 | Quality Check: Accessible links | Crawl details |
| 76 | Quality Check: Redirected links | Crawl details |
| Quality Check: Error links | Crawl details | |
| 4 | Quality Check: Broken links | Crawl details |
| 856 | Quality Check: Percentage of download links in correct format as specified in metadata | Crawl details |
| 227 | Quality Check: Percentage of download links in HTML | Crawl details |
| 317 | Quality Check: Percentage of download links in PDF | Crawl details |
| Percentage growth in records since last quarter | ||
| 100% | Valid Metadata | Crawl details |
| /data exists | Crawl details | |
| Provides datasets in human-readable form on /data | ||
| /data.json | Crawl details | |
| Harvested by data.gov | ||
| 1488 | Number of public datasets | Crawl details |
| 1 | Number of restricted public datasets | Crawl details |
| 124 | Number of non-public datasets | Crawl details |
| Percent growth of public datasets | ||
| Percent growth of restricted public datasets | ||
| Percent growth of non-public datasets | ||
| Percent datasets licensed as U.S. Public Domain | ||
| Percent datasets licensed as Creative Commons Zero | ||
| Percent datasets with other licenses | ||
| Percent datasets with no license |
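The link quality rows above group crawled links into accessible, redirected, error, and broken buckets. The sketch below shows one way such a tally could be produced from HTTP status codes. The mapping from status codes to buckets is an assumption made for illustration, not the dashboard's documented rule, and the function names `classify_link` and `tally_links` are hypothetical.

```python
from collections import Counter


def classify_link(status):
    """Map an HTTP status code to a link-quality bucket (assumed mapping)."""
    if status is None:
        return "broken"       # assumption: the request got no response at all
    if 200 <= status < 300:
        return "accessible"
    if 300 <= status < 400:
        return "redirected"
    return "error"            # assumption: any other response, e.g. 4xx or 5xx


def tally_links(statuses):
    """Count how many links fall into each bucket."""
    return Counter(classify_link(s) for s in statuses)
```

Calling `tally_links` on the status codes collected by a crawler returns a `Counter` keyed by bucket name, which can then be compared with the total number of access and download links.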
| Status | Indicator | Automated Metrics | ||
|---|---|---|---|---|
| Overall Progress this Milestone | ||||
| Description of feedback mechanism delivered | Crawl details | |||
| Data release is prioritized through public engagement | ||||
| Provided narrative evidence of data improvements based on public feedback this quarter | ||||
| Feedback loop is closed, 2 way communication | ||||
| See below | Link to or description of Feedback Mechanism | |||
| https://github.com/department-of-veterans-affairs/va-data/issues | ||||
| Provides valid contact point information for all datasets | ||||
| Status | Indicator | Automated Metrics | ||
|---|---|---|---|---|
| Overall Progress this Milestone | ||||
| Data Publication Process Delivered | Crawl details | |||
| Information that should not be made public is documented with agency's OGC | ||||
| See below | Describe the agency's data publication process | |||
| Department of Veterans Affairs Data Publication Process (OMB MAX/Open Data Requirement #5) The Data Publication Process is a process to determine whether data assets have a valid restriction to not release. Data is identified by customer feedback and internal data custodians. In each program office, the data asset is submitted to leadership and a Privacy Officer for review to ensure that sensitive data, Protected Health Information, and Personally Identifiable Information protected by applicable privacy and confidentiality law or regulation is not released in violation of law. Questions regarding restrictions on release may be reviewed by the Office of General Counsel and elevated to the Open Data IPT if necessary. Data assets are then reviewed by the program office Open Data Point of Contact, certified as releasable, and published. Any data asset that contains information protected by applicable privacy and confidentiality law or regulation will not be released as Open Data without effective de-identification. When considering whether information is adequately de-identified for release, an assessment of risk for possible re-identification must be made. Record level data will not be released if the assessment finds a risk of re-identification. For example, health care records that have been de-identified pursuant to the Health Insurance Portability and Accountability Act (HIPAA) safe harbor criteria, but which in combination with other available information and technology could foreseeably be re-identified (i.e., the mosaic effect) will not be released. Proposed changes to narratives will be reviewed quarterly and approved by an integrated data council. | ||||
Automated Metrics
These metrics are generated by an automated analysis that runs every 24 hours until the end of the quarter, at which point they become a historical snapshot.
data.json
| Expected Data.json URL | http://www.va.gov/data.json (From USA.gov Directory) |
|---|---|
| Resolved Data.json URL | http://www.va.gov/data.json |
| Number of Redirects | |
| HTTP Status | 200 |
| Content Type | application/json |
| Valid JSON | Valid |
| Detected Data.json Schema | federal-v1.1 |
| Datasets with Valid Metadata | 100% (1613 of 1613) |
| Valid Schema | Valid |
| Datasets | 1613 |
| Number of Collections | 14 |
| Number of datasets not in a collection | 665 |
| Datasets with Distribution URLs | 81.0% (1307 of 1613) |
| Datasets with Download URLs | 79.4% (1280 of 1613) |
| Total Distribution URLs | 1312 |
| Total Download URLs | 1284 |
| Total APIs | 5 |
| Public Datasets | 1488 |
| Restricted Public Datasets | 1 |
| Non-public Datasets | 124 |
| Normally there would be a set of quality assurance fields here to verify that the download links included within the metadata are functioning properly, but the results of those tests are not currently available. | |
| Bureaus Represented | 4 |
| Programs Represented | 36 |
| License Specified | 100% (1613 of 1613) |
| Datasets with Redactions | 0.0% (0 of 1613) |
| Redactions without explanation (rights field) | 0.0% (0 of 1613) |
| File Size | 2.47MB |
| Last modified | Wednesday, 24-Feb-2016 18:07:18 EST |
| Last crawl | Monday, 29-Feb-2016 23:00:15 EST |
| Analyze archive copies | Analyze archive from 2016-02-29 |
| Nearby Daily Crawls | |
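The dataset counts and percentages in the table above can be derived from the catalog itself. The sketch below assumes a parsed data.json document in the Project Open Data v1.1 layout, with a top-level `dataset` array whose entries carry `accessLevel` and an optional `distribution` list with `downloadURL` or `accessURL`. The function names `percent` and `summarize_catalog` and the output keys are hypothetical, and the dashboard's own counting rules may differ.

```python
from collections import Counter


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1) if whole else 0.0


def summarize_catalog(catalog):
    """Summarize a parsed data.json catalog (a dict with a 'dataset' list)."""
    datasets = catalog.get("dataset", [])
    total = len(datasets)
    # Schema v1.1 access levels: "public", "restricted public", "non-public".
    access = Counter(d.get("accessLevel", "unknown") for d in datasets)
    with_distribution = 0
    with_download = 0
    for d in datasets:
        dists = d.get("distribution") or []
        # Assumption: a dataset counts once if any distribution has a URL.
        if any(x.get("downloadURL") or x.get("accessURL") for x in dists):
            with_distribution += 1
        if any(x.get("downloadURL") for x in dists):
            with_download += 1
    return {
        "datasets": total,
        "access_levels": dict(access),
        "pct_with_distribution_url": percent(with_distribution, total),
        "pct_with_download_url": percent(with_download, total),
    }
```

As a check against the reported figures, `percent(1307, 1613)` gives 81.0 and `percent(1280, 1613)` gives 79.4, matching the "Datasets with Distribution URLs" and "Datasets with Download URLs" rows.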
| Expected /data URL | http://www.va.gov/data (From USA.gov Directory) |
|---|---|
| Resolved /data URL | http://www.va.gov/data/ |
| Redirects | 1 redirects |
| HTTP Status | 200 |
| Content Type | text/html |
| Last crawl | Monday, 29-Feb-2016 23:00:02 EST |
| Expected /digitalstrategy.json URL | http://www.va.gov/digitalstrategy.json (From USA.gov Directory) |
|---|---|
| Resolved /digitalstrategy.json URL | http://www.va.gov/digitalstrategy.json |
| Redirects | |
| HTTP Status | 200 |
| Content Type | application/json |
| Valid JSON | Valid |
| Last modified | Wednesday, 24-Feb-2016 18:07:19 EST |
| Last crawl | Monday, 29-Feb-2016 23:00:02 EST |
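The HTTP Status, Content Type, and Valid JSON rows above describe checks that can be run on any fetched JSON resource. The following sketch applies them to a response that has already been retrieved, so it makes no network calls. The function name `check_json_resource` and the shape of the returned dictionary are assumptions for illustration, not the dashboard's implementation.

```python
import json


def check_json_resource(status, content_type, body):
    """Run dashboard-style checks on an already-fetched resource."""
    try:
        json.loads(body)
        valid_json = True
    except (ValueError, TypeError):
        valid_json = False
    # Ignore parameters such as "; charset=utf-8" when comparing media types.
    media_type = (content_type or "").split(";")[0].strip().lower()
    return {
        "HTTP Status": status,
        "Content Type": media_type,
        "Valid JSON": "Valid" if valid_json else "Invalid",
        "ok": status == 200 and media_type == "application/json" and valid_json,
    }
```

A resource passes only when all three checks agree, so a 200 response that serves HTML at a `.json` URL is reported as not ok.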