Department of Health and Human Services

Milestone 6 - February 28th 2015

OMB Review Complete: OMB has completed the agency review for this milestone. Agencies should contact their OMB desk officer if anything looks incorrect.

Leading Indicators

These indicators are reviewed by the Office of Management and Budget

Review Status complete
Reviewer Jamie Berryhill
Last Updated April 17, 2015, 1:19 pm EDT by Jamie Berryhill

Assessment Summary

EDI decreased by 16.2% this quarter.

PDL: Only 53.9% of links are working, and the number of datasets fell 33%.
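The 53.9% figure can be reproduced from the Dataset Link Quality counts reported in the automated metrics of this report (295 total access and download links, of which 159 are accessible, 105 redirected, 21 error, and 4 broken). The short Python sketch below assumes that "working" means accessible links only, which is the reading that matches the published number; the variable names are illustrative.

```python
# Link-quality counts as reported in this milestone's automated metrics.
total_links = 295
counts = {"accessible": 159, "redirected": 105, "error": 21, "broken": 4}

# Links not covered by the four named categories (assumed to be "Other").
other = total_links - sum(counts.values())

# Share of links counted as working, assuming only "accessible" qualifies.
working_pct = round(100 * counts["accessible"] / total_links, 1)

print(working_pct, other)
```

Under that assumption the remaining 6 links would fall into the "Other" category.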

Note: The doughnut charts for Inventory Composition, Public Dataset Status, and Dataset Link Quality are new for this milestone, and they only appear if data are available. For detailed documentation on these charts, please see The "Other" category includes URLs that did not provide a response within 5 seconds of the automated crawl, which may indicate a server issue and not necessarily a completely non-working link.
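As a rough illustration of how a crawl result might be sorted into the link-quality categories, the Python sketch below buckets one URL by its response. Only the 5-second threshold for "Other" comes from the note above; the function name classify_link and the mapping of HTTP status codes to "broken", "error", "redirected", and "accessible" are assumptions, not the dashboard's documented rules.

```python
from typing import Optional

TIMEOUT_SECONDS = 5.0  # threshold stated in the note above


def classify_link(status: Optional[int], redirects: int, elapsed: float) -> str:
    """Bucket one crawled URL; the status-code mapping is an assumption."""
    if status is None or elapsed > TIMEOUT_SECONDS:
        return "other"       # no response in time; may be a server issue
    if status == 404:
        return "broken"      # assumed: "not found" counts as broken
    if status >= 400:
        return "error"       # assumed: any other 4xx/5xx counts as an error
    if redirects > 0:
        return "redirected"  # resolved, but only after one or more redirects
    return "accessible"
```

For example, classify_link(200, 3, 0.8) returns "redirected", while a request with no response at all, classify_link(None, 0, 5.0), returns "other".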

Inventory Composition

Public Dataset Status

Dataset Link Quality

Status Indicator Automated Metrics
Overall Progress this Milestone
Inventory Updated this Quarter
953 Number of Datasets
Number of APIs
Schedule Delivered
11 Bureaus represented
11 Programs represented
770 Number of public datasets
31 Number of restricted public datasets
152 Number of non-public datasets
Inventory > Public listing
-16.2% Percentage growth in records since last quarter
3160 Spot Check - datasets listed by search engine
Agency provides a public Enterprise Data Inventory on
License specified
Status Indicator Automated Metrics
Overall Progress this Milestone
710 Number of Datasets
Number of Collections
260 Number of Public Datasets with File Downloads
Number of APIs
295 Total number of access and download links
Quality Check: Links are sufficiently working
159 Quality Check: Accessible links
105 Quality Check: Redirected links
21 Quality Check: Error links
4 Quality Check: Broken links
-33% Percentage growth in records since last quarter
100% Valid Metadata
/data exists
/data.json
Harvested by
334 Views on for the quarter
Status Indicator Automated Metrics
Overall Progress this Milestone
Description of feedback mechanism delivered
Data release is prioritized through public engagement
Feedback loop is closed, 2 way communication
Blog, Q&A, Link to or description of Feedback Mechanism
Status Indicator Automated Metrics
Overall Progress this Milestone
Data Publication Process Delivered
Information that should not be made public is documented with agency's OGC
Status Indicator Automated Metrics
Overall Progress this Milestone
See below Open Data Primary Point of Contact
Damon Davis
POCs identified for required responsibilities
Status Indicator Automated Metrics
Overall Progress this Milestone
Identified 5 data improvements for this quarter
See below Primary Uses
Some of the primary ways HHS data is being put to use are for new or improved products or services that are generating new knowledge and introducing efficiencies across the healthcare and social services continuum or improving transparency. Several initiatives seek to improve consumer engagement, education, and awareness. HHS datasets are also used in community and state epidemiological workgroups, local prevention programs and strategy evaluations, and national projects to monitor substance abuse and detect emerging drugs and drug trends. Others are contributing to better population health strategies, more comprehensive clinical capabilities, and improved stability in the healthcare system in the long term.
See below Value or impact of data
The innovations HHS data are fueling are driving improvements in the delivery of higher quality health care, reductions in health care costs, and greater transparency into the system that individuals and families require to be fully participatory in their own health. Improved products and services in the healthcare industry will lead to improved outcomes in personal and public health, more effective preventive health programs, and greater awareness of emerging issues in health care and law enforcement.
See below Primary data discovery channels
HHS's communications through the and @HHSIDEALab twitter handle are the main source of ongoing education and outreach. However, the Health Data Consortium’s Health Datapalooza is the annual event that attracts the most attention to HHS data assets as effective components of the national healthcare transformation. HHS also uses prizes and challenges to generate awareness of and derive value from HHS data.
See below User suggestions on improving data usability
Users have conveyed multiple suggestions including more timely delivery of data assets, increased availability of machine-readable data formats, and development of “linked data systems” to help identify datasets that are otherwise siloed. However, the challenge remains that opening data in machine readable formats, developing linked data systems, and improving timely delivery of data are not without costs. HHS and sister agencies are required to make more data available and more accessible without the allocation of additional budgetary resources to accomplish those tasks, creating a tension between desired outcomes and what is achievable with existing resources.
See below User suggestions on additional data releases
HHS receives a continual stream of requests for additional data resources. The many requests are primarily focused on the Centers for Medicare and Medicaid Services, Food and Drug Administration, Centers for Disease Control and Prevention, and the National Institutes of Health.
Digital Analytics Program on /data

Automated Metrics

These metrics are generated by an automated analysis that runs every 24 hours until the end of the quarter, at which point they become a historical snapshot.

Expected Data.json URL (From Directory)
Resolved Data.json URL
Number of Redirects 3 redirects
HTTP Status 200
Content Type application/json
Valid JSON Valid
Detected Data.json Schema federal-v1.1
Datasets with Valid Metadata 100% (710 of 710)
Valid Schema Valid
Datasets 710
Datasets with Distribution URLs 0.0% (0 of 710)
Total Distribution URLs 0
Public Datasets 691
Restricted Public Datasets 19
Non-public Datasets 0
Bureaus Represented 11
Programs Represented 11
File Size 1.24MB
Last modified Friday, 27-Feb-2015 21:10:31 EST
Last crawl Friday, 27-Feb-2015 23:01:06 EST
Analyze archive copies Analyze archive from 2015-02-28
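The dataset counts above are derived from the agency's data.json catalog. As a rough illustration of how such figures can be tallied, the Python sketch below summarizes a catalog that follows the Project Open Data v1.1 layout (a top-level "dataset" list whose entries carry accessLevel, bureauCode, programCode, and distribution). The function name summarize_catalog, the made-up sample records, and the choice to count every downloadURL and accessURL as one distribution URL are assumptions for illustration; this is not the dashboard's own crawler code.

```python
import json
from collections import Counter


def summarize_catalog(catalog: dict) -> dict:
    """Tally headline metrics for a data.json catalog (v1.1 field names)."""
    datasets = catalog.get("dataset", [])
    access = Counter(d.get("accessLevel") for d in datasets)

    total_urls = 0
    datasets_with_urls = 0
    for d in datasets:
        # Assumption: each downloadURL or accessURL counts as one distribution URL.
        urls = [
            dist[key]
            for dist in d.get("distribution", [])
            for key in ("downloadURL", "accessURL")
            if dist.get(key)
        ]
        total_urls += len(urls)
        datasets_with_urls += bool(urls)

    return {
        "datasets": len(datasets),
        "public": access["public"],
        "restricted_public": access["restricted public"],
        "non_public": access["non-public"],
        "datasets_with_urls": datasets_with_urls,
        "total_urls": total_urls,
        "bureaus": len({c for d in datasets for c in d.get("bureauCode", [])}),
        "programs": len({c for d in datasets for c in d.get("programCode", [])}),
    }


# Made-up two-record catalog for illustration; not taken from the HHS file.
sample = json.loads("""
{"dataset": [
  {"title": "Example A", "accessLevel": "public",
   "bureauCode": ["000:00"], "programCode": ["000:001"],
   "distribution": [{"downloadURL": "https://example.com/a.csv"}]},
  {"title": "Example B", "accessLevel": "restricted public",
   "bureauCode": ["000:00"], "programCode": ["000:002"]}
]}
""")
summary = summarize_catalog(sample)
print(summary)
```

Run against the real file, a summary of this kind would be expected to line up with the rows above (for instance, 710 datasets with 691 public and 19 restricted public), provided the counting rules match those of the crawler.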
/data page
Expected /data URL (From Directory)
Resolved /data URL
Redirects 3 redirects
HTTP Status 200
Content Type text/html; charset=utf-8
Last crawl Friday, 27-Feb-2015 23:01:07 EST
Expected /digitalstrategy.json URL (From Directory)
Resolved /digitalstrategy.json URL
HTTP Status 200
Content Type text/plain
Valid JSON Valid
Last modified Friday, 29-Aug-2014 07:33:21 EDT
Last crawl Friday, 27-Feb-2015 23:01:07 EST
Digital Strategy

Date specified: Tuesday, 26-Aug-2014 15:05:31 EDT

Date of digitalstrategy.json file: Friday, 29-Aug-2014 07:33:21 EDT

1.2.4 Develop Data Inventory Schedule - Summary

Summarize the Inventory Schedule

Described in the HHS Health Data Initiative Strategy & Execution Plan.

1.2.5 Develop Data Inventory Schedule - Milestones

Title: Health Data Initiative Strategy & Execution Plan
Description: The HHS Health Data Initiative Strategy & Execution Plan is a date-driven, metrics-based living document that details the strategies and execution plans for the Department’s Health Data Initiative (HDI). The HDI plan describes the steps that HHS and other contributors will take to expand, enrich, and open the vast catalog of data resources in the department and across the health care and human services ecosystem. Read more at
Milestone Date: Ongoing
Description of how this milestone expands the Inventory: Described in the HHS Health Data Initiative Strategy & Execution Plan.
Description of how this milestone enriches the Inventory: Described in the HHS Health Data Initiative Strategy & Execution Plan.
Description of how this milestone opens the Inventory: Described in the HHS Health Data Initiative Strategy & Execution Plan.

1.2.6 Develop Customer Feedback Process

Describe the agency's process to engage with customers

The now three-year-old Health Data Initiative (HDI), the collective effort to release vast stores of data for innovation, has at its core a mission to help improve health, healthcare, and the delivery of human services by harnessing the power of data and fostering a culture of innovative uses of data in public and private sector institutions, communities, research groups and policy making arenas.  The HDI’s goal is to make health data openly available, disseminate the data broadly across the health and human services ecosystem, and continuously educate internal and external participants in the ecosystem about the value of the data. ( serves as the discovery resource for those data assets, as well as a platform for communications, commentary on, and feedback about the data to improve the public’s understanding of each data set. The platform helps new data users discover resources they may not otherwise know exist.  This site is a flexible platform that acts as a discovery resource for new and seasoned users across the healthcare ecosystem from researchers to tech/developers, and healthcare professionals to academia.  Any organization or individual is free to employ the data to solve problems in the transformation of our nation’s healthcare system through data driven innovations in areas like: research; technology development; healthcare delivery; academia; policy making; human services delivery.

Methods for Customer Feedback and Public Engagement

This document describes how HHS is identifying and engaging with key data customer groups like these to help expand the value of our health data assets and prioritize the release of new data. To assist that prioritization HHS intends to capitalize on the quantity and quality of user demand it receives through various feedback channels as well as focusing on the identification of strategically relevant data assets (SRDA) tied directly to HHS’s articulated strategic goals.  To ensure the customer feedback loops are meaningful and robust HHS will regularly review feedback processes and refine them as opportunities and challenges present themselves.  

Here are some of the ways the HHS HDI seeks opportunities for public engagement:
* (
Through this catalog data is available in multiple formats for maximum utilization by health care ecosystem participants.  Human readable data and machine readable data formats are accessible which are spawning and feeding key transformations across health care and the delivery of human services.  HHS is working to make broader volumes of machine readable data available. 
The “Ideas” tab on the site is designed to invite the public to provide feedback to HHS.  An idea could be anything from a submission of data that you’d like to see cataloged on the platform, to ways you’d like to see the site improved, or suggestions for communications about data assets and their uses.  These submissions are very informative for our data liberation strategy, so send in your great ideas!  The section is divided into “Most Recent” submissions, which are ordered by the date the idea was posted, and “Most Popular” submissions, which are ranked by the number of public votes each idea has received.  Each idea can be voted on by the public using a five (5) star rating system (one (1) is the lowest rating, five (5) is the highest).
The “Q & A” tab offers users an opportunity to ask questions and receive answers from HDI staff about the data.  HHS is working to associate a direct point of contact, by name and email address, with each data set listing in the catalog.  This will allow direct interactions about the data with the experts who have cataloged it.  The tab is similarly broken down by “Most Recent” and “Most Popular”.
Our Blog offers a robust source of information about the HDI’s activities including the availability of new data, some of the creative and innovative uses of health data, and the technological advancement of healthcare and human services delivery supported by data’s broadening availability.
The HDI staff makes every attempt to address ideas, questions and answers, and blog responses in a timely fashion.  

*Health Datapalooza! – This perennial health data event is a favorite among entrepreneurs, innovators, policy makers, data geeks, researchers and more.  The Health Data Initiative is widely represented during this event put on by the Health Data Consortium, a public-private partnership between government, non-profit, and private sector organizations working to foster the availability and innovative use of data to improve health and health care.  HHS welcomes this opportunity to engage face to face with the many innovators that are using, or seeking to use, publicly available sources to support their work and initiatives.

*Social Media – More than ever before topical conversations are occurring through social media and the open health data movement is no exception.  
You can follow the Health Data Initiative on Twitter @HealthDataGov.  From this account you will see announcements about new opportunities, new blog posts, and information from others on Twitter that we think is important (which could also be something you post).   Be sure to follow us and remember if you have a question, just ask!

Join the health data community online using the Facebook page U.S. Department of Health and Human Services Innovations!  Twitter is great, but sometimes what we have to say needs more than 140 characters.   On our page you will find highlights coming from the HDI, but more importantly, it is an avenue in which you can engage directly with us.

The HDI is also participating in communities on LinkedIn, bringing the information directly to you in established LinkedIn Groups that have been working in the areas we care about the most. Look for us in various communities on LinkedIn.

1.2.7 Develop Data Publication Process

Describe the agency's data publication process

The online platform is the central data access point and communications vehicle for the HHS Health Data Initiative (HDI), offering access to, dissemination of, and bi-directional communications about health data from HHS and other sources.  The HDI is the collective effort to release vast stores of health data for innovation with a mission of improving health, healthcare, and the delivery of human services by harnessing the power of data and fostering a culture of innovative uses of data in public and private sectors.  Therefore the goal for the platform is to be a highly useful, reliable platform for sharing datasets and fostering innovation.  The platform will continue to be at the forefront of HDI’s efforts to create a discovery zone for HHS and other health data.
Supporting the promotion of data to the platform are Health Data Leads, liaisons representing each division across the Department who are contributors to the execution of the Health Data Initiative Strategy and Execution Plan.  Each lead relies on a cadre of colleagues within their division to proactively discover and catalog data resources from the various research projects, surveys, contracts or other mechanisms for data generation and curation.  This document describes the data promotion process and workflow that Health Data Leads and their teams execute at HHS.

Data Promotion Process

Here, it is assumed that each division has completed its internal process for identifying data to be cataloged and is ready to begin the process for data promotion.  Therefore the first step in the workflow requires the completion of a metadata template for the catalog entry.  The template requires data elements including:

1.	Title: Descriptive title for the data asset to be displayed on 
2.	HHS Group:  The Operating or Staff division within HHS responsible for the data in this catalogue entry.
3.	Description: A detailed description of the dataset or tool (e.g., an abstract) that helps a user to determine the nature and purpose of the data.
4.	Privacy and Information Quality Certification – Attestations that the data submission meets the agency’s standards for privacy and information quality. 
5.	Author Information and Agency – Includes fields for: 
	a.	HHS sub agency
	b.	Agency program URL
	c.	Data series or tool URL
	d.	Subject Area for the Data: Administrative, Biomedical Research, Children's Health,  Epidemiology, Health Care Cost, Health Care Providers, Medicaid, Medicare, Other, Population Statistics, Quality Measurement, Safety, Treatments
	e.	Additional Subject Area: Searchable keywords help users discover your datasets from different perspectives, providing ways of identifying other similar datasets.  Includes terms that would be used by both technical and non-technical users.
	f.	Date Released: Date when the dataset was first made available to the public. (Not the date the data was entered into as it could have already been published on the agency website prior to being cataloged)
	g.	Date Updated: Date of last change to dataset or tool. (Note that this could be the same as the date released if the data has not changed since first being published.)
	h.	Contact Information: Name and email address for the individual making the catalogue entry.
	i.	Data Collection Date and Frequency:  N/A, Annually, Daily, Monthly, Quarterly, Semi-Weekly
	j.	Data Coverage Period: Dates covered by the data
6.	Data Documentation:
	a.	Technical Documentation URL: URL for the technical documentation for this dataset. This may include descriptions of the study design, instrumentation, implementation, limitations, and appropriate use of the dataset or tool.
	b.	Data Dictionary URL: URL to resource containing variable names, descriptions, standard vocabularies and taxonomies, units, multipliers, etc. if different from the technical documentation URL.
	c.	Data Collection Instrument URL: URL for resource containing a copy of, or detailed descriptions of, the data collection instrument, if different from the technical documentation.
	d.	Dataset Use Requires a License Agreement: This is a required field to ensure that license agreements are not bypassed during the one-click download interface on the website.
	e.	Dataset License Agreement URL: URL to the license agreement page for the dataset or tool, if there is one.
7.	Resources
	a.	Access Point URL: If the data set is downloadable, enter the URL for instant access to the downloadable data file. This is the URL for access to the data set via a "one-click download".
	b.	Media Format - In some cases files are downloaded in a compressed file (e.g. zip). 
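To make the shape of the metadata template more concrete, the Python sketch below models a subset of the elements listed above as a simple record with a basic completeness check. The class name CatalogEntry, the field names, and the choice of which fields to treat as mandatory are hypothetical; the template itself only states explicitly that the license agreement indicator (item 6d) is required.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CatalogEntry:
    """Hypothetical model of a subset of the metadata template."""
    title: str
    hhs_group: str
    description: str
    privacy_certified: bool       # item 4: privacy attestation
    quality_certified: bool       # item 4: information quality attestation
    license_required: bool        # item 6d: described as a required field
    subject_area: str = "Other"   # item 5d: one of the listed subject areas
    keywords: List[str] = field(default_factory=list)   # item 5e
    access_point_url: Optional[str] = None               # item 7a
    license_agreement_url: Optional[str] = None          # item 6e

    def problems(self) -> List[str]:
        """Return human-readable issues; an empty list means none were found."""
        issues = []
        for name in ("title", "hhs_group", "description"):
            if not getattr(self, name).strip():
                issues.append(f"{name} is empty")
        if not (self.privacy_certified and self.quality_certified):
            issues.append("privacy and information quality attestations are incomplete")
        if self.license_required and not self.license_agreement_url:
            issues.append("license agreement URL is missing")
        return issues


# Illustrative entry with made-up values.
entry = CatalogEntry(
    title="Sample dataset",
    hhs_group="Example division",
    description="A short abstract describing the data.",
    privacy_certified=True,
    quality_certified=True,
    license_required=True,
)
print(entry.problems())
```

In this example the only reported issue is the missing license agreement URL, since the entry says a license is required but does not link to one.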

Data Promotion Roles

The specific roles of each individual in the workflow are Author, Editor, and Approver.  The responsibilities for each role are detailed below.

The Author initiates the data catalog entry based on the current version of the metadata template.  Authors strive for adherence to Plain Language writing guidance to make the entry understandable for technical audiences and the general public.  When the data appears in the live online catalog it can be easily found using key words entered in the template during this step.  Authors also attest that the submission meets agency privacy guidelines and information quality guidelines.  Authors may assign themselves as the public's point of contact (POC) for a data set, or insert the alternate contact information for a colleague who is the POC or the subject matter expert (SME) for the data or tool.  The author typically is not the Health Data Lead, but a colleague within the same division or a contractor supporting the project where the data originates.  Once completed, the catalogue entry's status is advanced to "Editor Review".

Editors, typically the Health Data Lead for their HHS division, are tasked with reviewing the initial catalog entry drafted by an author once it has been advanced to "Editor Review" status.  Their review consists of checking the metadata, confirming adherence to Plain Language guidelines, adding key words, and confirming the attestations for .... . An editor may ask the author to modify or adjust a catalog entry.  Once the editor has completed their review and modifications have been made, the catalogue entry’s status is advanced to "Awaiting Approval".

The Approver performs the final check on all elements of the catalog entry before it is made public on the platform. Approvers verify compliance with administrative procedures and with internal protocols for data promotion. Once the Approver has completed their review the catalogue entry’s status is advanced to “Approved”. 

Once approved, the catalogue entry appears on the platform within minutes.  Updates to already cataloged data assets are processed through the workflow again, offering the Author an opportunity to alter the metadata to accurately reflect the update and allowing the Editor to re-validate privacy and plain language compliance before the entry is re-approved.
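The promotion workflow described above can be read as a small state machine in which each role advances an entry by one status. The Python sketch below is one possible reading, not the platform's actual implementation: the status names "Editor Review", "Awaiting Approval", and "Approved" come from the text, while the initial "Draft" status, the names Status, NEXT_STEP, and advance, and the rule that an update sends an approved entry back to "Editor Review" are assumptions.

```python
from enum import Enum


class Status(Enum):
    DRAFT = "Draft"  # hypothetical; the text does not name the initial status
    EDITOR_REVIEW = "Editor Review"
    AWAITING_APPROVAL = "Awaiting Approval"
    APPROVED = "Approved"


# For each status: the status it advances to, and the role allowed to advance it.
NEXT_STEP = {
    Status.DRAFT: (Status.EDITOR_REVIEW, "Author"),
    Status.EDITOR_REVIEW: (Status.AWAITING_APPROVAL, "Editor"),
    Status.AWAITING_APPROVAL: (Status.APPROVED, "Approver"),
    # Assumption: an update by the Author re-enters the workflow at editor review.
    Status.APPROVED: (Status.EDITOR_REVIEW, "Author"),
}


def advance(status: Status, role: str) -> Status:
    """Move an entry to its next status if the acting role is permitted to."""
    next_status, required_role = NEXT_STEP[status]
    if role != required_role:
        raise PermissionError(f"{role} cannot advance an entry in '{status.value}'")
    return next_status


# Walk a new entry through the three roles in order.
status = Status.DRAFT
for role in ("Author", "Editor", "Approver"):
    status = advance(status, role)
print(status.value)
```

Keeping the transitions in a single table makes it easy to see that every entry, including an updated one, passes through both editor review and approval before it is public.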