Department of Health and Human Services

M-13-13 Milestone 16 - August 31st 2017

OMB Review Complete: OMB has completed the agency review for this milestone. Agencies should contact their OMB desk officer if anything looks incorrect.

Leading Indicators

These indicators are reviewed by the Office of Management and Budget.

Review Status complete
Reviewer Rebecca Williams
Last Updated December 10, 2017, 9:11 pm EST by Rebecca Williams

Assessment Summary

EDI: The inventory lists no non-public datasets, although it has continued to grow. Non-public datasets must be added. PDL: URLs are mostly working; improvements to formats and licensing should be initiated.

Inventory Composition

Public Dataset Status

Dataset Link Quality

Status Indicator Automated Metrics
Overall Progress this Milestone
Inventory Updated this Quarter
1982 Number of Datasets
Number of APIs
9 Bureaus represented
Percentage of bureaus represented
Programs represented
26 Percentage of programs represented
1927 Number of public datasets
55 Number of restricted public datasets
Number of non-public datasets
Percentage growth in records since last quarter
To what extent is your agency’s Enterprise Data Inventory (EDI) complete?
What steps have you taken to ensure your Enterprise Data Inventory is complete
Agency provides a public Enterprise Data Inventory on
Agency provided updated Enterprise Data Inventory to OMB
92.0% License specified
Number of datasets with redactions
Percent of datasets with redactions
Status Indicator Automated Metrics
Overall Progress this Milestone
1982 Number of Datasets
Number of Collections
1982 Number of datasets not contained in a collection
1486 Number of Public Datasets with File Downloads
Number of APIs
Number of public APIs
Number of restricted public APIs
Number of non-public APIs
3322 Total number of access and download links
Quality Check: Links are sufficiently working
2031 Quality Check: Accessible links
1126 Quality Check: Redirected links
3 Quality Check: Error links
131 Quality Check: Broken links
0.4% Quality Check: Percentage of download links in correct format as specified in metadata
10.6% Quality Check: Percentage of download links in HTML
0.2% Quality Check: Percentage of download links in PDF
Percentage growth in records since last quarter
91.5% Valid Metadata
/data exists
Provides datasets in human-readable form on /data
/data.json
Harvested by
1927 Number of public datasets
55 Number of restricted public datasets
Number of non-public datasets
Percent growth of public datasets
Percent growth of restricted public datasets
Percent growth of non-public datasets
Percent datasets licensed as U.S. Public Domain
Percent datasets licensed as Creative Commons Zero
Percent datasets with other licenses
Percent datasets with no license
Status Indicator Automated Metrics
Overall Progress this Milestone
Description of feedback mechanism delivered
Data release is prioritized through public engagement
Provided narrative evidence of data improvements based on public feedback this quarter
Feedback loop is closed, 2 way communication
Link to or description of Feedback Mechanism
Provides valid contact point information for all datasets
Status Indicator Automated Metrics
Overall Progress this Milestone
Data Publication Process Delivered
Information that should not be made public is documented with agency's OGC
Describe the agency's data publication process
Status Indicator Automated Metrics
Overall Progress this Milestone
Open Data Primary Point of Contact
POCs identified for required responsibilities
Chief Data Officer (if applicable)
Status Indicator Automated Metrics
Overall Progress this Milestone
Provided narrative evidence of open data impacts for this quarter
Digital Analytics Program on /data
Views on for this quarter
Percentage growth in views on for this quarter
Views on agency /data page for this quarter
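
The link-quality checks in the table above sort every access and download URL by its crawl outcome. The Python sketch below shows one way to do that bucketing; it assumes each link is classified by the HTTP status code of its first response, with `None` standing in for a server that could not be found, and it is an illustration rather than the dashboard's own crawler code. The helper names are hypothetical. Fed the five reported counts (2031 accessible, 1126 redirected, 131 broken, 3 error, 31 server not found), it reproduces the total of 3322 links and the 0.9% server-not-found share reported under Automated Metrics.

```python
from collections import Counter
from typing import Iterable, Optional

def classify_link(status: Optional[int]) -> str:
    """Bucket one crawled link; None means the host could not be resolved."""
    if status is None:
        return "server not found"
    if 200 <= status < 300:
        return "working"      # HTTP 2xx
    if 300 <= status < 400:
        return "redirected"   # HTTP 3xx
    if 400 <= status < 500:
        return "broken"       # HTTP 4xx
    if 500 <= status < 600:
        return "error"        # HTTP 5xx
    return "unclassified"

def link_report(statuses: Iterable[Optional[int]]) -> dict[str, tuple[int, str]]:
    """Return {bucket: (count, share of all links as a one-decimal percentage)}."""
    counts = Counter(classify_link(s) for s in statuses)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {bucket: (n, f"{100 * n / total:.1f}%") for bucket, n in counts.items()}

# Rebuild the reported totals using one representative status code per bucket.
statuses = [200] * 2031 + [301] * 1126 + [404] * 131 + [500] * 3 + [None] * 31
report = link_report(statuses)
print(sum(n for n, _ in report.values()))  # 3322
print(report["server not found"])          # (31, '0.9%')
```

Treating "accessible" as HTTP 2xx is an assumption, but it is consistent with the reported figures: the five buckets add up exactly to the 3322 distribution URLs.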

Automated Metrics

These metrics are generated by an automated analysis that runs every 24 hours until the end of the quarter, at which point they become a historical snapshot.

Expected Data.json URL (From Directory)
Resolved Data.json URL
Number of Redirects 4 redirects
HTTP Status 200
Content Type application/json
Valid JSON Valid
Detected Data.json Schema federal-v1.1
Datasets with Valid Metadata 91.5% (1814 of 1982)
Valid Schema Invalid
For more complete and readable validation results, see the full schema validator results
Schema Errors There are validation errors on 168 records

Only showing errors from the first 10 records:

Errors on record 1:
Errors on record 35:
Errors on record 36:
Errors on record 37:
Errors on record 38:
Errors on record 84:
Errors on record 508:
Errors on record 1060:
  • The property bureauCode is required
Errors on record 1105:
  • The property bureauCode is required
Errors on record 1135:
  • The property bureauCode is required
Datasets 1982
Number of Collections 0
Number of datasets not in a collection 1982
Datasets with Distribution URLs 75.0% (1486 of 1982)
Datasets with Download URLs 75.0% (1486 of 1982)
Total Distribution URLs 3322 (but only 2031 accessible)
Total Download URLs 3322
Total APIs 0
Public APIs 0
Restricted Public APIs 0
Non-public APIs 0
Public Datasets 1927
Restricted Public Datasets 55
Non-public Datasets 0
Server Not Found 0.9% (31 of 3322)
Working links (HTTP 2xx)
Broken links (HTTP 4xx)
Error Links (HTTP 5xx)
Redirected Links (HTTP 3xx)
Correct format
PDF for raw data
HTML for raw data
Bureaus Represented 9
Programs Represented 26
License Specified 92.0% (1824 of 1982)
Datasets with Redactions 0.0% (0 of 1982)
Redactions without explanation (rights field) 0.0% (0 of 1982)
File Size 3.14MB
Last modified Thursday, 31-Aug-2017 00:36:36 EDT
Last crawl Thursday, 31-Aug-2017 01:11:18 EDT
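
The schema errors listed above ("The property bureauCode is required") are the kind of result a per-record check for required properties produces. A minimal sketch of such a check follows, assuming the catalog is a JSON object whose `dataset` array holds the records, as in the federal-v1.1 layout. `REQUIRED_FIELDS` is a deliberately partial, illustrative subset of what the full schema requires, the bureau code in the sample is a placeholder, and the helper names are hypothetical; the full schema validator remains the authoritative check.

```python
import json

# Illustrative subset only; the federal-v1.1 schema requires more properties than these.
REQUIRED_FIELDS = ("title", "identifier", "accessLevel", "bureauCode")

def record_errors(record: dict) -> list[str]:
    """List the required properties that are missing or empty in one record."""
    return [f"The property {name} is required"
            for name in REQUIRED_FIELDS if not record.get(name)]

def validate_catalog(text: str) -> tuple[int, int, dict[int, list[str]]]:
    """Return (valid records, total records, {1-based record number: errors})."""
    datasets = json.loads(text).get("dataset", [])
    errors = {}
    for number, record in enumerate(datasets, start=1):
        problems = record_errors(record)
        if problems:
            errors[number] = problems
    return len(datasets) - len(errors), len(datasets), errors

def percent(part: int, whole: int) -> str:
    """Format part/whole as a one-decimal percentage, as the dashboard reports it."""
    return f"{100 * part / whole:.1f}%"

sample = json.dumps({"dataset": [
    {"title": "A", "identifier": "a", "accessLevel": "public", "bureauCode": ["000:00"]},  # placeholder code
    {"title": "B", "identifier": "b", "accessLevel": "public"},  # no bureauCode
]})
valid, total, errors = validate_catalog(sample)
print(valid, total, errors)   # 1 2 {2: ['The property bureauCode is required']}
print(percent(1814, 1982))    # 91.5%
```

The same arithmetic explains the reported figures: 1814 valid records out of 1982 gives 91.5%, and the remaining 168 records are the ones flagged with validation errors.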
/data page
Expected /data URL (From Directory)
Resolved /data URL
Redirects 2 redirects
HTTP Status 200
Content Type text/html; charset=utf-8
Last modified Thursday, 31-Aug-2017 00:35:11 EDT
Last crawl Thursday, 31-Aug-2017 00:36:36 EDT
Expected /digitalstrategy.json URL (From Directory)
Resolved /digitalstrategy.json URL
Redirects 1 redirect
HTTP Status 200
Content Type application/json
Valid JSON Valid
Last modified Monday, 07-Nov-2016 01:35:55 EST
Last crawl Thursday, 31-Aug-2017 00:36:36 EDT
Digital Strategy

Date specified: Tuesday, 26-Aug-2014 15:05:31 EDT

Date of digitalstrategy.json file: Monday, 07-Nov-2016 01:35:55 EST

1.2.4 Develop Data Inventory Schedule - Summary

Summarize the Inventory Schedule

Described in the HHS Health Data Initiative Strategy & Execution Plan.

1.2.5 Develop Data Inventory Schedule - Milestones

Title: Health Data Initiative Strategy & Execution Plan
Description: The HHS Health Data Initiative Strategy & Execution Plan is a date-driven, metrics-based living document that details the strategies and execution plans for the Department’s Health Data Initiative (HDI). The HDI plan describes the steps that HHS and other contributors will take to expand, enrich, and open the vast catalog of data resources in the Department and across the health care and human services ecosystem. Read more at
Milestone Date: Ongoing
Description of how this milestone expands the Inventory: Described in the HHS Health Data Initiative Strategy & Execution Plan.
Description of how this milestone enriches the Inventory: Described in the HHS Health Data Initiative Strategy & Execution Plan.
Description of how this milestone opens the Inventory: Described in the HHS Health Data Initiative Strategy & Execution Plan.

1.2.6 Develop Customer Feedback Process

Describe the agency's process to engage with customers

The now three-year-old Health Data Initiative (HDI), the collective effort to release vast stores of data for innovation, has at its core a mission to help improve health, healthcare, and the delivery of human services by harnessing the power of data and fostering a culture of innovative uses of data in public and private sector institutions, communities, research groups, and policy-making arenas. The HDI’s goal is to make health data openly available, disseminate the data broadly across the health and human services ecosystem, and continuously educate internal and external participants in the ecosystem about the value of the data. The HDI's online platform serves as the discovery resource for those data assets, as well as a venue for communications, commentary, and feedback about the data that improve the public’s understanding of each data set. The platform helps new data users discover resources they may not otherwise know exist, and it is flexible enough to serve new and seasoned users across the healthcare ecosystem, from researchers to technologists and developers, and from healthcare professionals to academics. Any organization or individual is free to use the data to solve problems in the transformation of our nation’s healthcare system through data-driven innovation in areas such as research, technology development, healthcare delivery, academia, policy making, and human services delivery.

Methods for Customer Feedback and Public Engagement

This document describes how HHS is identifying and engaging with key data customer groups like these to help expand the value of our health data assets and prioritize the release of new data. To support that prioritization, HHS intends to capitalize on the quantity and quality of user demand it receives through various feedback channels, while also identifying strategically relevant data assets (SRDA) tied directly to HHS’s stated strategic goals. To ensure the customer feedback loops are meaningful and robust, HHS will regularly review its feedback processes and refine them as opportunities and challenges arise.

Here are some of the ways the HHS HDI seeks opportunities for public engagement:
* The online data catalog
Through this catalog, data is available in multiple formats for maximum use by health care ecosystem participants. Both human-readable and machine-readable formats are accessible, and they are spawning and feeding key transformations across health care and the delivery of human services. HHS is working to make larger volumes of machine-readable data available.
The “Ideas” tab on the site invites the public to provide feedback to HHS. An idea could be anything from data you’d like to see cataloged on the platform, to ways you’d like to see the site improved, to suggestions for communications about data assets and their uses. These submissions are very informative for our data liberation strategy, so send in your great ideas! The section is divided into “Most Recent”, ordered by the date each idea was posted, and “Most Popular”, ranked by the number of public votes each idea has received. The public can vote on each idea using a five-star rating system (one star is the lowest rating, five the highest).
The “Q & A” tab gives users an opportunity to ask questions and receive answers from HDI staff about the data. HHS is working to associate a named point of contact, with an email address, with each data set listed in the catalog. This will allow direct interaction about the data with the experts who cataloged it. The tab is similarly divided into “Most Recent” and “Most Popular”.
Our Blog offers a rich source of information about the HDI’s activities, including the availability of new data, creative and innovative uses of health data, and the technological advancement of healthcare and human services delivery supported by the broadening availability of data.
The HDI staff makes every attempt to respond to ideas, questions, and blog comments in a timely fashion.

*Health Datapalooza! – This perennial health data event is a favorite among entrepreneurs, innovators, policy makers, data geeks, researchers, and more. The Health Data Initiative is widely represented at this event, which is put on by the Health Data Consortium, a public-private partnership between government, non-profit, and private sector organizations working to foster the availability and innovative use of data to improve health and health care. HHS welcomes this opportunity to engage face to face with the many innovators who are using, or seeking to use, publicly available sources to support their work and initiatives.

*Social Media – More than ever before, topical conversations are happening on social media, and the open health data movement is no exception.
You can follow the Health Data Initiative on Twitter @HealthDataGov. From this account you will see announcements about new opportunities, new blog posts, and information from others on Twitter that we think is important (which could also be something you post). Be sure to follow us, and remember: if you have a question, just ask!

Join the health data community online using the Facebook page U.S. Department of Health and Human Services Innovations! Twitter is great, but sometimes what we have to say needs more than 140 characters. On our page you will find highlights from the HDI, but more importantly, it is an avenue through which you can engage directly with us.

The HDI is also participating in communities on LinkedIn, bringing the information directly to you in established LinkedIn Groups that have been working in the areas we care about the most. Look for us in various communities on LinkedIn.
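
The two orderings on the Ideas tab described above can be modelled in a few lines. The Python sketch below is hypothetical, since the site's implementation is not described here: the `Idea` class, the helper names, and the sample ideas are illustrative, each star rating is assumed to count as one public vote, and ratings are restricted to the one-to-five scale. "Most Popular" sorts by vote count and "Most Recent" by posting date.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class Idea:
    title: str
    posted: date
    ratings: list[int] = field(default_factory=list)  # one entry per public vote

    def rate(self, stars: int) -> None:
        """Record one vote on the five-star scale (1 lowest, 5 highest)."""
        if not 1 <= stars <= 5:
            raise ValueError("ratings run from 1 to 5 stars")
        self.ratings.append(stars)

    @property
    def votes(self) -> int:
        return len(self.ratings)

def most_recent(ideas: list[Idea]) -> list[Idea]:
    """Newest posting date first."""
    return sorted(ideas, key=lambda idea: idea.posted, reverse=True)

def most_popular(ideas: list[Idea]) -> list[Idea]:
    """Most public votes first."""
    return sorted(ideas, key=lambda idea: idea.votes, reverse=True)

# Hypothetical sample ideas.
older = Idea("Catalog hospital cost data", date(2017, 5, 1))
newer = Idea("Add a bulk download API", date(2017, 8, 1))
for stars in (5, 4, 5):
    older.rate(stars)
newer.rate(3)
print([i.title for i in most_recent([older, newer])])   # newer idea first
print([i.title for i in most_popular([older, newer])])  # older idea first (3 votes vs 1)
```

Ranking by vote count rather than average stars follows the description of "Most Popular"; an average-rating ordering would be a different design choice.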

1.2.7 Develop Data Publication Process

Describe the agency's data publication process

The online platform is the central data access point and communications vehicle for the HHS Health Data Initiative (HDI), offering access to, dissemination of, and bi-directional communications about health data from HHS and other sources. The HDI is the collective effort to release vast stores of health data for innovation, with a mission of improving health, healthcare, and the delivery of human services by harnessing the power of data and fostering a culture of innovative uses of data in the public and private sectors. The goal for the platform is therefore to be a highly useful, reliable platform for sharing datasets and fostering innovation, and it will continue to be at the forefront of the HDI’s efforts to create a discovery zone for HHS and other health data.
Supporting the promotion of data to the platform are Health Data Leads, liaisons representing each division across the Department who contribute to the execution of the Health Data Initiative Strategy and Execution Plan. Each lead relies on a cadre of colleagues within their division to proactively discover and catalog data resources from research projects, surveys, contracts, or other mechanisms for data generation and curation. This document describes the data promotion process and workflow that Health Data Leads and their teams execute at HHS.

Data Promotion Process

Here, it is assumed that each division has completed its internal process for identifying data to be cataloged and is ready to begin data promotion. Therefore, the first step in the workflow is to complete the metadata template for the catalog entry. The template requires data elements including:

1.	Title: Descriptive title for the data asset to be displayed on the platform
2.	HHS Group:  The Operating or Staff division within HHS responsible for the data in this catalogue entry.
3.	Description: A detailed description of the dataset or tool (e.g., an abstract) that helps a user determine the nature and purpose of the data.
4.	Privacy and Information Quality Certification – Attestations that the data submission meets the agency’s standards for privacy and information quality. 
5.	Author Information and Agency – Includes fields for: 
	a.	HHS sub agency
	b.	Agency program URL
	c.	Data series or tool URL
	d.	Subject Area for the Data: Administrative, Biomedical Research, Children's Health, Epidemiology, Health Care Cost, Health Care Providers, Medicaid, Medicare, Other, Population Statistics, Quality Measurement, Safety, Treatments
	e.	Additional Subject Area: Searchable keywords help users discover your datasets from different perspectives, providing ways of identifying other similar datasets.  Includes terms that would be used by both technical and non-technical users.
	f.	Date Released: Date when the dataset was first made available to the public. (This is not the date the data was entered into the catalog, as it could have already been published on the agency website before being cataloged.)
	g.	Date Updated: Date of last change to dataset or tool. (Note that this could be the same as the date released if the data has not changed since first being published.)
	h.	Contact Information: Name and email address for the individual making the catalogue entry.
	i.	Data Collection Date and Frequency:  N/A, Annually, Daily, Monthly, Quarterly, Semi-Weekly
	j.	Data Coverage Period: Dates covered by the data
6.	Data Documentation:
	a.	Technical Documentation URL: URL for the technical documentation for this dataset. This may include a description of the study design, instrumentation, implementation, limitations, and appropriate use of the dataset or tool.
	b.	Data Dictionary URL: URL to resource containing variable names, descriptions, standard vocabularies and taxonomies, units, multipliers, etc. if different from the technical documentation URL.
	c.	Data Collection Instrument URL: URL for resource containing a copy of, or detailed descriptions of, the data collection instrument, if different from the technical documentation.
	d.	Dataset Use Requires a License Agreement: This is a required field to ensure that license agreements are not bypassed by the one-click download interface on the website.
	e.	 Dataset License Agreement URL: URL to the license agreement page for the dataset or tool, if there is one.
7.	Resources
	a.	Access Point URL: If the data set is downloadable, enter the URL for instant access to the downloadable data file. This is the URL for access to the data set via a "one-click download".
	b.	Media Format: In some cases files are downloaded as a compressed file (e.g., zip).
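
Several of the template fields above carry constraints that can be checked before an entry reaches the Editor: Subject Area and Data Collection Frequency come from fixed lists, and Date Updated can equal but should not precede Date Released. The Python sketch below encodes those checks over a hypothetical `CatalogEntry` class that holds only a subset of the template's fields; the class, the helper names, and the sample values are illustrative. It also assumes, going slightly beyond the text, that an entry flagged as requiring a license agreement should supply the agreement URL.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional

SUBJECT_AREAS = {
    "Administrative", "Biomedical Research", "Children's Health", "Epidemiology",
    "Health Care Cost", "Health Care Providers", "Medicaid", "Medicare", "Other",
    "Population Statistics", "Quality Measurement", "Safety", "Treatments",
}
FREQUENCIES = {"N/A", "Annually", "Daily", "Monthly", "Quarterly", "Semi-Weekly"}

@dataclass
class CatalogEntry:  # hypothetical subset of the metadata template
    title: str
    hhs_group: str
    description: str
    subject_area: str
    frequency: str
    date_released: date
    date_updated: date
    contact_email: str
    requires_license: bool
    license_url: Optional[str] = None
    access_point_url: Optional[str] = None

def template_problems(entry: CatalogEntry) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems = []
    for name in ("title", "hhs_group", "description", "contact_email"):
        if not getattr(entry, name).strip():
            problems.append(f"{name} is empty")
    if entry.subject_area not in SUBJECT_AREAS:
        problems.append(f"unknown subject area: {entry.subject_area}")
    if entry.frequency not in FREQUENCIES:
        problems.append(f"unknown frequency: {entry.frequency}")
    if entry.date_updated < entry.date_released:
        problems.append("Date Updated precedes Date Released")
    if entry.requires_license and not entry.license_url:  # assumed rule
        problems.append("license agreement URL is missing")
    return problems

entry = CatalogEntry("Example survey", "Example Division", "An example abstract.",
                     "Epidemiology", "Annually", date(2016, 1, 1), date(2016, 1, 1),
                     "poc@example.gov", requires_license=False)
print(template_problems(entry))  # []
print(template_problems(replace(entry, frequency="Weekly", requires_license=True)))
```

The second call reports both an unlisted frequency and the missing license URL, which is the kind of feedback an Author could act on before advancing the entry.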

Data Promotion Roles

The roles in the workflow are Author, Editor, and Approver. The responsibilities for each role are detailed below.

The Author initiates the data catalog entry based on the current version of the metadata template. Authors strive to follow Plain Language writing guidance so that the entry is understandable to technical audiences and the general public. When the data appears in the live online catalog, it can be easily found using the keywords entered in the template during this step. Authors also attest that the submission meets agency privacy and information quality guidelines. Authors may assign themselves as the public's point of contact (POC) for a data set, or enter the contact information for a colleague who is the POC or the subject matter expert (SME) for the data or tool. The Author typically is not the Health Data Lead but a colleague within the same division, or a contractor supporting the project where the data originates. Once the entry is complete, its status is advanced to "Editor Review".

Editors, typically the Health Data Lead for their HHS division, review the initial catalog entry drafted by an Author once it has been advanced to "Editor Review" status. Their review consists of checking the metadata for adherence to Plain Language guidelines, adding keywords, and confirming the attestations for .... . An Editor may ask the Author to modify or adjust a catalog entry. Once the Editor has completed the review and any modifications have been made, the catalogue entry’s status is advanced to "Awaiting Approval".

The Approver performs the final check on all elements of the catalog entry before it is made public on the platform. Approvers verify compliance with administrative procedures and with internal protocols for data promotion. Once the Approver has completed their review the catalogue entry’s status is advanced to “Approved”. 

Once approved, the catalogue entry appears on the platform within minutes. Updates to already cataloged data assets go through the workflow again, giving the Author an opportunity to revise the metadata to accurately reflect the update and allowing the Editor to re-validate privacy and plain language compliance before the entry is re-approved.
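
The Author, Editor, and Approver steps form a small state machine. The Python sketch below models it with the three statuses the process names ("Editor Review", "Awaiting Approval", "Approved"); the initial "Draft" status, the `return_to_author` step for an Editor's change request, and all function names are assumptions for illustration rather than the platform's actual implementation.

```python
from enum import Enum

class Status(Enum):
    DRAFT = "Draft"                          # assumed name for an Author's work in progress
    EDITOR_REVIEW = "Editor Review"
    AWAITING_APPROVAL = "Awaiting Approval"
    APPROVED = "Approved"

# Which role may advance an entry out of each status, and where it goes next.
FORWARD = {
    Status.DRAFT: ("Author", Status.EDITOR_REVIEW),
    Status.EDITOR_REVIEW: ("Editor", Status.AWAITING_APPROVAL),
    Status.AWAITING_APPROVAL: ("Approver", Status.APPROVED),
}

def advance(status: Status, role: str) -> Status:
    """Move an entry one step forward if `role` owns the current step."""
    if status not in FORWARD:
        raise ValueError(f"{status.value!r} is a final status")
    owner, next_status = FORWARD[status]
    if role != owner:
        raise PermissionError(f"only the {owner} can advance an entry in {status.value!r}")
    return next_status

def return_to_author(status: Status) -> Status:
    """An Editor asks the Author to modify the entry (assumed to reopen the draft)."""
    if status is not Status.EDITOR_REVIEW:
        raise ValueError("only entries in Editor Review can be sent back")
    return Status.DRAFT

def start_update(status: Status) -> Status:
    """Updates to an approved entry run through the whole workflow again."""
    if status is not Status.APPROVED:
        raise ValueError("only approved entries can be updated")
    return Status.DRAFT

status = Status.DRAFT
for role in ("Author", "Editor", "Approver"):
    status = advance(status, role)
print(status.value)                # Approved
print(start_update(status).value)  # Draft
```

Keeping the role check inside `advance` means no role can skip a step: an Author cannot push an entry past Editor Review, and only the Approver can make it public.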