Environmental Protection Agency

http://www.epa.gov/

Milestone 10 - February 29th 2016

OMB Review Complete: OMB has completed the agency review for this milestone. Agencies should contact their OMB desk officer if anything looks incorrect.

Leading Indicators

These indicators are reviewed by the Office of Management and Budget

Review Status complete
Reviewer Justin Grimes
Last Updated April 18, 2016, 1:36 pm EDT by Justin Grimes

Assessment Summary

EPA is to be commended for excellent examples of engagement with its data users on its public feedback site. Insufficient link quality (20% or more broken and error links this quarter).
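The link-quality finding can be checked against the link counts reported for this milestone (848 broken, 6 error, 2691 total access and download links). The following is a minimal sketch, assuming the share is computed as broken plus error links over the total number of links; the dashboard's exact formula is not stated in this report, and the function name is illustrative.

```python
def broken_link_share(broken: int, error: int, total: int) -> float:
    """Return broken plus error links as a fraction of all links."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (broken + error) / total


# Counts reported for this milestone.
share = broken_link_share(broken=848, error=6, total=2691)

# 20% threshold taken from the assessment summary wording.
insufficient = share >= 0.20

print(f"{share:.1%} broken or error links; insufficient: {insufficient}")
```

Under that assumption the share comes to roughly 31.7%, which is consistent with the "insufficient link quality" finding.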

Inventory Composition

Public Dataset Status

Dataset Link Quality

Status Indicator Automated Metrics
Overall Progress this Milestone
Inventory Updated this Quarter
3184 Number of Datasets
2344 Number of APIs
1 Bureaus represented
100.0% Percentage of bureaus represented
8 Programs represented
6.6% Percentage of programs represented
3132 Number of public datasets
11 Number of restricted public datasets
41 Number of non-public datasets
Percentage growth in records since last quarter
To a great extent (50-75%) To what extent is your agency’s Enterprise Data Inventory (EDI) complete?
See below What steps have you taken to ensure your Enterprise Data Inventory is complete
In March 2015, the EPA CIO signed the EPA Enterprise Information Management Policy (EIMP), the EIMP Cataloguing Data Resources Procedure, and the EIMP Minimum Metadata Standards. The EIMP requires all EPA Organization officials, employees, and individuals or non-EPA organizations, if applicable, to ensure information is: The EIMP Cataloguing Data Resources Procedure states: The Agency's internal metadata catalog, the Environmental Dataset Gateway (EDG), was established in 2006. The EDG Team has worked with data owners since its inception to catalog Agency datasets.
In response to Project Open Data requirements, the EPA CIO issued an Agency-wide data call asking all EPA organizations to register their datasets in EDG. The EDG Team worked with the Agency's Information Management Officers (IMOs), the EDG's Stewardship Network, and other key data owners to ensure that as many Agency datasets as possible were identified for registration. In addition, EPA's registry for IT systems, READ, was reviewed to ensure that all possible data owners were contacted. Since that time, the EDG team has established an ongoing relationship with the IMOs and has increased its network of stakeholders to ensure that any datasets not identified during the 2013 data call are registered in EDG.
Quarterly meetings and training sessions are held with these groups to educate them on Open Data requirements and metadata best practices, as well as to encourage them to continue cataloguing their datasets. Targeted outreach, based on new entries in READ, is conducted to ensure that all datasets are listed in the EDI. This includes working with Offices that have Confidential Business Information to ensure that we have a full registration of all data not shared with the public.
In addition, EPA's Security Office is planning an audit of all Agency data systems and is coordinating with the EDG team to ensure that any uncatalogued datasets discovered in this process are registered in EDG and become part of EPA's EDI. And finally, EPA is also developing an
Agency provides a public Enterprise Data Inventory on Data.gov
Agency provided updated Enterprise Data Inventory to OMB
100% License specified
Number of datasets with redactions
100% Percent of datasets with redactions
Status Indicator Automated Metrics
Overall Progress this Milestone
3184 Number of Datasets
21 Number of Collections
1722 Number of datasets not contained in a collection
2531 Number of Public Datasets with File Downloads
2344 Number of APIs
Number of public APIs
Number of restricted public APIs
Number of non-public APIs
2691 Total number of access and download links
Quality Check: Links are sufficiently working
1078 Quality Check: Accessible links
625 Quality Check: Redirected links
6 Quality Check: Error links
848 Quality Check: Broken links
6.8% Quality Check: Percentage of download links in correct format as specified in metadata
60.0% Quality Check: Percentage of download links in HTML
0.6% Quality Check: Percentage of download links in PDF
Percentage growth in records since last quarter
100% Valid Metadata
/data exists
Provides datasets in human-readable form on /data
/data.json
Harvested by data.gov
3132 Number of public datasets
11 Number of restricted public datasets
41 Number of non-public datasets
Percent growth of public datasets
Percent growth of restricted public datasets
Percent growth of non-public datasets
Percent datasets licensed as U.S. Public Domain
Percent datasets licensed as Creative Commons Zero
Percent datasets with other licenses
Percent datasets with no license
Status Indicator Automated Metrics
Overall Progress this Milestone
Description of feedback mechanism delivered
Data release is prioritized through public engagement
Provided narrative evidence of data improvements based on public feedback this quarter
Feedback loop is closed, 2 way communication
See below Link to or description of Feedback Mechanism
https://developer.epa.gov/forums/forum/dataset-qa/
Provides valid contact point information for all datasets
Status Indicator Automated Metrics
Overall Progress this Milestone
Data Publication Process Delivered
Information that should not be made public is documented with agency's OGC
See below Describe the agency's data publication process
EPA’s data registration, evaluation, and publication processes are being developed as part of the Agency’s Environmental Information Management Policy (EIMP). The EIMP is currently undergoing Agency-wide review before being finalized. The EIMP includes a data asset registration procedure that is scheduled to be adopted with the policy. Additional procedures, such as defining the details of the data publication process, will follow. The registration and classification of data assets will be addressed by the following:
• Issue the EIMP, Cataloging EPA Data Resources Procedure and associated Standard Operating Procedures (SOPs), which require the registration of data assets. Create comprehensive processes to fill metadata gaps for existing records and ensure compliance with the registration of non-listed assets.
• Modify the EDG to include an initial “data sensitivity” evaluation during the registration of an asset, noting a determination from a range of data sensitivity categories such as:
- Controlled Unclassified Information (CUI)
- Personally Identifiable Information (PII)
- Confidential Business Information (CBI)
- Information with National Security sensitivities
• EPA currently conducts reviews to evaluate the appropriate release of information to the public. However, to address the anticipated increase in demand for information, the Agency is developing a more formal process to document sensitivity determinations and help set expectations for when determinations will be completed. EPA’s Office of General Counsel (OGC) is frequently involved in data release determinations and will be a part of the more formal process, making final determinations on data that is deemed too sensitive for disclosure.

Best Practice: Environmental Protection Agency has been highlighted for demonstrating a best practice on the Human Capital indicator

Status Indicator Automated Metrics
Overall Progress this Milestone
greene.ana@epa.gov Open Data Primary Point of Contact
POCs identified for required responsibilities
See below Chief Data Officer (if applicable)
thottungal.robin@epa.gov
Status Indicator Automated Metrics
Overall Progress this Milestone
Provided narrative evidence of open data impacts for this quarter
Digital Analytics Program on /data
2154 Views on data.gov for this quarter
378.7% Percentage growth in views on data.gov for this quarter
Views on agency /data page for this quarter
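The growth figure above can be read as simple quarter-over-quarter arithmetic. This is a minimal sketch, assuming growth is defined as (current - previous) / previous; the report does not give the previous quarter's view count, so the value derived here is an inference under that assumption, and the function names are illustrative.

```python
def growth_pct(previous: float, current: float) -> float:
    """Percentage growth from the previous value to the current value."""
    if previous <= 0:
        raise ValueError("previous must be positive")
    return (current - previous) / previous * 100.0


def implied_previous(current: float, growth_percent: float) -> float:
    """Invert the growth formula to estimate the previous value."""
    return current / (1.0 + growth_percent / 100.0)


# 2154 views and 378.7% growth are the figures reported for this quarter.
previous_views = implied_previous(2154, 378.7)
print(round(previous_views))
print(round(growth_pct(450, 2154), 1))
```

Under that definition, the reported figures imply roughly 450 views on data.gov in the prior quarter.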

Automated Metrics

These metrics are generated by an automated analysis that runs every 24 hours until the end of the quarter at which point they become a historical snapshot

data.json
Expected Data.json URL http://www.epa.gov/data.json (From USA.gov Directory)
Resolved Data.json URL https://edg.epa.gov/data.json
Number of Redirects 2 redirects
HTTP Status 200
Content Type application/json
Valid JSON Valid
Datasets with Valid Metadata 100% (3184 of 3184)
Valid Schema Valid
Datasets 3184
Number of Collections 0
Datasets with Distribution URLs 79.5% (2531 of 3184)
Datasets with Download URLs 78.5% (2498 of 3184)
Total Distribution URLs 4842
Total Download URLs 2498
Total APIs 2344
Public Datasets 3132
Restricted Public Datasets 11
Non-public Datasets 41
Bureaus Represented 1
Programs Represented 5
License Specified 100% (3184 of 3184)
Datasets with Redactions 0.0% (0 of 3184)
Redactions without explanation (rights field) 0.0% (0 of 3184)
File Size 6.57MB
Last modified Wednesday, 25-Nov-2015 02:11:26 EST
Last crawl Monday, 01-Feb-2016 11:09:49 EST
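Several of the data.json figures above are tallies over the catalog's dataset records. The sketch below shows one way such tallies could be produced, assuming the Project Open Data v1.1 field names (dataset, accessLevel, distribution, downloadURL, accessURL). It is not the dashboard's actual crawler: the function name and the sample catalog are hypothetical, and treating accessURL entries as the API count is an assumption, although it is consistent with the reported totals (2498 download URLs + 2344 APIs = 4842 distribution URLs).

```python
from collections import Counter


def summarize_catalog(catalog: dict) -> dict:
    """Tally access levels and distribution URLs in a data.json catalog."""
    datasets = catalog.get("dataset", [])
    levels = Counter(d.get("accessLevel", "unspecified") for d in datasets)
    with_distribution_url = 0
    download_urls = 0
    access_urls = 0
    for d in datasets:
        found = False
        for dist in d.get("distribution") or []:
            if dist.get("downloadURL"):
                download_urls += 1
                found = True
            if dist.get("accessURL"):
                access_urls += 1
                found = True
        if found:
            with_distribution_url += 1
    return {
        "datasets": len(datasets),
        "access_levels": dict(levels),
        "datasets_with_distribution_urls": with_distribution_url,
        "download_urls": download_urls,
        "access_urls": access_urls,
        "distribution_urls": download_urls + access_urls,
    }


# Hypothetical three-record catalog; titles and URLs are placeholders.
sample = {
    "dataset": [
        {"title": "A", "accessLevel": "public",
         "distribution": [{"downloadURL": "https://example.gov/a.csv"},
                          {"accessURL": "https://example.gov/api/a"}]},
        {"title": "B", "accessLevel": "restricted public",
         "distribution": [{"accessURL": "https://example.gov/api/b"}]},
        {"title": "C", "accessLevel": "non-public"},
    ]
}
summary = summarize_catalog(sample)
print(summary)
```

To run it against a real catalog, load the file with json.load and pass the resulting dictionary to summarize_catalog.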
/data page
Expected /data URL http://www.epa.gov/data (From USA.gov Directory)
Resolved /data URL http://developer.epa.gov/category/data/
Redirects 2 redirects
HTTP Status 200
Content Type text/html; charset=UTF-8
Last crawl Monday, 01-Feb-2016 11:09:03 EST
/digitalstrategy.json
Expected /digitalstrategy.json URL http://www.epa.gov/digitalstrategy.json (From USA.gov Directory)
Resolved /digitalstrategy.json URL http://www.epa.gov/sites/production/files/2015-05/digitalstrategy.json
Redirects 1 redirect
HTTP Status 200
Content Type text/plain
Valid JSON Valid
Last modified Friday, 29-May-2015 18:15:34 EDT
Last crawl Monday, 01-Feb-2016 11:09:04 EST
Digital Strategy

Date specified: Friday, 31-May-2019 09:16:54 EDT

Date of digitalstrategy.json file: Friday, 29-May-2015 18:15:34 EDT

1.2.4 Develop Data Inventory Schedule - Summary

Summarize the Inventory Schedule


In 2005 the EPA developed a metadata catalog for Agency datasets known as the Environmental Dataset Gateway (EDG). In response to Project Open Data requirements the EPA CIO issued an Agency-wide data call in Fall of 2013 asking all EPA organizations to register their datasets in EDG. Since that time the Agency has been committed to increasing the percentage of EPA data cataloged within the EDG. The Office of Environmental Information works with the Agency's Information Management Officers on an ongoing basis to encourage them to ensure that data within their program offices are registered in the EDG and has increased its network of stakeholders to ensure that any datasets not identified during the 2013 data call are registered in EDG.

1.2.5 Develop Data Inventory Schedule - Milestones

Title: Provide reports to programs to identify metadata holes.
Description: EPA uses its registry of IT Systems, READ, to identify datasets that may not be cataloged in the EDG. The EDG team works with the data stewardship network to identify and catalog the missing datasets.
Milestone Date: Ongoing
Description of how this milestone expands the Inventory: This milestone allows EPA to identify and catalog new datasets that were not identified in the original 2013 data call within the Agency's Environmental Dataset Gateway, the metadata catalog which is used to create the Enterprise Data Inventory.
Description of how this milestone enriches the Inventory: This milestone brings new and important datasets into EPA's Enterprise Data Inventory.
Description of how this milestone opens the Inventory: This milestone provides additional datasets that are publicly available through both the EDG and data.gov.

1.2.6 Develop Customer Feedback Process

Describe the agency's process to engage with customers


EPA interacts with the public on its data in numerous ways including public meetings/forums, feedback buttons on websites, webinars, mailboxes, FOIA Online and help desks. Recently the Agency launched an online public data forum to communicate with the public about its data in a fully transparent manner: http://developer.epa.gov/forums/forum/dataset-qa/. This forum enables two-way, transparent feedback between the Agency and the public. It can be accessed through three different webpages: the Environmental Dataset Gateway (EDG) webpage, the Developer Central webpage, and EPA's Digital Strategy webpage.  The forum shows both the public user's question and the Agency's answer and categorizes questions to increase discoverability about specific topics. Development work is underway to embed the forum into the Agency's metadata stylesheet. This would allow people to ask questions about a specific dataset directly from a metadata record and have that question routed to the metadata owner for response.
 In addition to the Forum, EPA has developed an error correction tool which allows the public to report errors that they find relating to data, especially in relation to the geographic locations of data points. The errors are routed through a system that returns feedback to the person who has reported the error.

1.2.7 Develop Data Publication Process

Describe the agency's data publication process


EPA has a number of policies and procedures concerning the publication of Agency data. The Enterprise Information Management Policy requires all EPA Organization officials, employees, and individuals or non-EPA organizations, if applicable, to ensure information is cataloged and/or labeled with metadata. This includes geographic references, as appropriate, in EPA and Federal-wide registries, repositories or other information systems. The EPA GeoPlatform Publishing Workflow Standard Operating Procedure and the EPA Environmental Dataset Gateway (EDG) Governance Structure and Standard Operating Procedure outline the details of EPA data publishing.
 EPA provides a range of tools and registry content (e.g. Reusable Component Services, Environmental Dataset Gateway, and Data Element Registry) through its System of Registries located at: www.epa.gov/sor. EPA is continuing efforts to document APIs through the development of an Agency-wide API Strategy.  The proposed strategy is based on 18F's API standards.  The proposal encourages the use of api.data.gov's API management platform. In addition, APIs produced by the EPA should be described using one of the common API definition formats (such as Swagger, API Blueprint and RAML).  The strategy is being finalized and an Agency-wide communication plan is being developed.  This communication plan will include Standard Operating Procedures (SOPs) that require API developers to register dataset APIs in the EPA's Environmental Dataset Gateway, which will allow these APIs to become part of the EPA's EDI/PDL.  All other APIs will be registered in EPA's Reusable Components Services (RCS).
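To illustrate how a registered dataset API might surface in the EDI/PDL, here is a minimal sketch. It assumes the Project Open Data v1.1 convention of describing an API as a distribution with an accessURL and a format of "API". The EDG's actual registration workflow is not described here, and the helper names, titles, and URL are hypothetical. The record is deliberately partial and omits other fields the schema requires.

```python
def api_distribution(title: str, access_url: str) -> dict:
    """Build a distribution entry that points at an API endpoint."""
    if not access_url.startswith(("http://", "https://")):
        raise ValueError("accessURL should be an absolute HTTP(S) URL")
    return {
        "@type": "dcat:Distribution",
        "title": title,
        "accessURL": access_url,
        "format": "API",
    }


def count_api_distributions(dataset: dict) -> int:
    """Count distributions in a dataset record that are marked as APIs."""
    return sum(
        1 for dist in dataset.get("distribution", [])
        if dist.get("format") == "API"
    )


# Hypothetical, partial dataset record for illustration only.
record = {
    "title": "Example dataset",
    "accessLevel": "public",
    "distribution": [
        api_distribution("Example API", "https://example.gov/api/v1"),
    ],
}
api_count = count_api_distributions(record)
print(api_count)
```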