Environmental Protection Agency

http://www.epa.gov/

Milestone 8 - August 31st 2015

OMB Review Complete: OMB has completed the agency review for this milestone. Agencies should contact their OMB desk officer if anything looks incorrect.

Leading Indicators

These indicators are reviewed by the Office of Management and Budget

Review Status complete
Reviewer Rebecca Williams
Last Updated November 3, 2015, 6:41 pm EST by Rebecca Williams

Assessment Summary

EPA should be commended for documenting the Public Domain/licensing information for 99.9% of their datasets and identifying 5 dataset improvements this quarter.

Action Items:
1. This quarter EPA has seen a reduction in EDI/PDL datasets. While reductions in dataset count may be due to consolidating records, EPA must prioritize expanding their EDI/PDL every quarter until it is complete. EPA is also showing errors in their metadata documentation (marking all datasets as APIs, showing a discrepancy between their EDI and PDL) and should fix their broken and redirected dataset URLs.
2. EPA must include their EDI as a dataset in their data.json.
3. EPA must prioritize creating and maintaining a public two-way feedback mechanism on their /data page.

Inventory Composition

Public Dataset Status

Dataset Link Quality

Status Indicator Automated Metrics
Overall Progress this Milestone
Inventory Updated this Quarter
Number of Datasets: 1000
Number of APIs: 926
Schedule Delivered
Bureaus represented: 1
Programs represented: 1
Number of public datasets: 971
Number of restricted public datasets: 4
Number of non-public datasets: 25
Inventory > Public listing
Percentage growth in records since last quarter: -63%?
Spot Check - datasets listed by search engine: 6,820
Agency provides a public Enterprise Data Inventory on Data.gov
License specified: 99.9%
Status Indicator Automated Metrics
Overall Progress this Milestone
Number of Datasets: 2532
Number of Collections
Number of Public Datasets with File Downloads: 2420
Number of APIs: 2419
Total number of access and download links: 4839
Quality Check: Links are sufficiently working
Quality Check: Accessible links: 1402
Quality Check: Redirected links: 3086
Quality Check: Error links: 9
Quality Check: Broken links: 112
Percentage growth in records since last quarter: -6%
Valid Metadata: 2530
/data exists
/data.json
Harvested by data.gov
Views on data.gov for the quarter: 1769
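The link-quality rows above sort each access or download link into one of four buckets. The report does not say how the crawler decides, so the following is only a minimal sketch under an assumed rule: the first HTTP status code returned for a link determines its bucket. The function names, the status-code ranges, and the sample codes are all hypothetical and are not taken from the dashboard.

```python
from collections import Counter


def classify_link(status_code):
    """Map an HTTP status code to one of the four link-quality buckets.

    ASSUMPTION: the dashboard's real rules are not given in this report;
    this 2xx/3xx/4xx/other split is only a plausible guess.
    """
    if 200 <= status_code < 300:
        return "accessible"
    if 300 <= status_code < 400:
        return "redirected"
    if 400 <= status_code < 500:
        return "broken"
    return "error"


def tally_links(status_codes):
    """Count how many links fall into each bucket."""
    return Counter(classify_link(code) for code in status_codes)


# Hypothetical status codes, not taken from the EPA crawl.
sample = [200, 301, 302, 404, 503, 200]
print(dict(tally_links(sample)))
```

Under this assumed rule, a large "Redirected links" count would mean that many listed URLs answer with a 3xx status and should be updated to their final location.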
Status Indicator Automated Metrics
Overall Progress this Milestone
Description of feedback mechanism delivered
Data release is prioritized through public engagement
Feedback loop is closed, 2 way communication
Link to or description of Feedback Mechanism: http://blog.epa.gov/data/2013/11/what-epa-data-do-you-need-most/
Status Indicator Automated Metrics
Overall Progress this Milestone
Data Publication Process Delivered
Information that should not be made public is documented with agency's OGC
Status Indicator Automated Metrics
Overall Progress this Milestone
Open Data Primary Point of Contact: greene.ana@epa.gov
POCs identified for required responsibilities
Status Indicator Automated Metrics
Overall Progress this Milestone
Identified 5 data improvements for this quarter
See below Primary Uses
Agency data is used in various third-party analyses, especially those using web-based and mobile applications. Much of EPA’s inventory and source data is used to identify regulated facilities or other sources that are near users’ property, schools, hospitals, or any other area of interest. It is also used to examine and understand pollution sources and their impacts on the environment. For datasets like the EPA FRS Power Plant Map Service, which combines data from several agencies, users can identify which power plants burn natural gas, coal, or petroleum, and which are solar, wind, geothermal, or nuclear.
Some examples of recently updated or developed third-party applications that use EPA data include:
• H2Osafe, which uses information from EPA’s Safe Drinking Water Information System (SDWIS) to inform citizens of water contamination in their area and lets them act on it by contacting their representatives. [http://developer.epa.gov/h2osafe/]
• Mobile Access to Pesticides and Labels (MAPL) - Designed for the general public and applicators who need to access pesticide labels on mobile devices, this app complements EPA’s computer-based Pesticide Product Label System. [http://developer.epa.gov/mobile-access-to-pesticides-and-labels-mapl/]
• Pesticide Education & Search Tool (PEST) - Designed for the general public as they search for pest control solutions, this app brings together product search functions and new pest control information in an easy-to-understand format. [http://developer.epa.gov/pesticide-education-search-tool-pest/]
Value or impact of data
See below Primary data discovery channels
EPA regularly participates in non-agency sponsored hackathon events where users can learn more about our data and use our data to build applications. The most recent event was the 2015 Water/Energy Nexus Hackathon, August 15-16. This hackathon helped bridge a gap between technology, water and energy, giving students, professionals and technology enthusiasts the opportunity to showcase their talents and innovation.
Other ways for users to learn about EPA data include:
• National Geospatial Program website
• Environmental Dataset Gateway (EDG) website
• Envirofacts website
• EnviroMapper
• Facility Registry Service (FRS) website
• MyEnvironment
• Various EPA Program System websites and applications, including AirNOW, Waters, Enforcement and Compliance History Online (ECHO), EJScreen, and Clean Ups in My Community
• EPA GeoPlatform websites, press releases and listserv
• EPA GeoPlatform training classes and webinars
• EPA GeoPlatform exhibit booths at various conferences
• EPA GeoPlatform (GIO) weekly newsletter
• Various EPA GeoPlatform working groups
• EPA GeoPlatform blog entries
We are also currently updating the two-way communication on our open government site. We will have a discussion forum on our Developer Central site and a direct connection to the Open Data Help desk site. The improved communication tools are expected to be operational in the first quarter of FY 16. The Developer Central site will contain a discussion forum to encourage people to post suggestions for new APIs or datasets to make available to the public, as well as examples of how our data and APIs are being used. The Open Data Help desk will be used for reporting data issues. These two tools will replace the GitHub discussion forum, which is limited because it is branded as a place to report issues and does not have robust moderation capabilities.
Currently, we have in place feedback buttons and blogs, as well as an error correction tool, which is located in the web interfaces of numerous applications and allows the public to submit corrected information about specific data points. Once the problems are analyzed, users are informed about the actions that will be taken to correct the issues.
See below User suggestions on improving data usability
Same response as under Primary data discovery channels above.
See below User suggestions on additional data releases
Same response as under Primary data discovery channels above.
Digital Analytics Program on /data

Automated Metrics

These metrics are generated by an automated analysis that runs every 24 hours until the end of the quarter, at which point they become a historical snapshot.

data.json
Expected Data.json URL http://www.epa.gov/data.json (From USA.gov Directory)
Resolved Data.json URL https://edg.epa.gov/data.json
Number of Redirects 2 redirects
HTTP Status 200
Content Type application/json
Valid JSON Valid
Datasets with Valid Metadata 99.9% (2530 of 2532)
Valid Schema Invalid
For more complete and readable validation results, see the full schema validator results
Schema Errors There are validation errors on 2 records

Errors on record 0:
license
  • Invalid URL format
  • string value found, but a null is required
  • does not match the regex pattern ^(\[\[REDACTED).*?(\]\])$
  • failed to match at least one schema
Errors on record 1:
license
  • Invalid URL format
  • string value found, but a null is required
  • does not match the regex pattern ^(\[\[REDACTED).*?(\]\])$
  • failed to match at least one schema
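The four messages reported for each record suggest that the validator accepts a license value if it is a URL, a null, or a redaction marker matching the quoted pattern, and rejects it when none of these alternatives fit. The sketch below mirrors that reading. It is not the dashboard's validator: the function name is hypothetical, and the URL test is a simplified stand-in (an http or https scheme plus a host) for whatever URL format check the real schema applies.

```python
import re
from urllib.parse import urlparse

# Pattern quoted in the validator output above.
REDACTED_PATTERN = re.compile(r"^(\[\[REDACTED).*?(\]\])$")


def license_is_acceptable(value):
    """Return True if value is null, a redaction marker, or an http(s) URL.

    ASSUMPTION: this simplified URL test approximates the schema's URL
    format check; the real validator may be stricter or looser.
    """
    if value is None:
        return True
    if not isinstance(value, str):
        return False
    if REDACTED_PATTERN.match(value):
        return True
    try:
        parsed = urlparse(value)
    except ValueError:
        # urlparse rejects a few malformed inputs outright.
        return False
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


# Hypothetical values, not the ones in the two failing EPA records.
for candidate in (None, "https://example.org/license", "[[REDACTED-EX B3]]", "Public Domain"):
    print(repr(candidate), license_is_acceptable(candidate))
```

A free-text value such as the hypothetical "Public Domain" fails all three alternatives, which would produce the same combination of messages listed for records 0 and 1.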
Datasets 2532
Datasets with Distribution URLs 95.9% (2429 of 2532)
Datasets with Download URLs 95.6% (2420 of 2532)
Total Distribution URLs 4839
Total Download URLs 2420
Total APIs 2419
Public Datasets 2518
Restricted Public Datasets 5
Non-public Datasets 9
Bureaus Represented 1
Programs Represented 1
License Specified 99.9% (2530 of 2532)
Datasets with Redactions 0.0% (0 of 2532)
Redactions without explanation (rights field) 0.0% (0 of 2532)
File Size 5.38MB
Last modified Sunday, 30-Aug-2015 03:00:12 EDT
Last crawl Monday, 31-Aug-2015 00:03:25 EDT
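The data.json figures above can be cross-checked with simple arithmetic. The sketch below copies the reported counts into a dictionary (the key names are made up for illustration) and checks that the three access levels add up to the dataset total, that download URLs plus APIs add up to the distribution URL total, and that the displayed percentages follow from the counts. Rounding to one decimal place is an assumption about how the dashboard formats percentages.

```python
def percent(part, whole):
    """Percentage rounded to one decimal place (assumed display format)."""
    return round(100.0 * part / whole, 1)


# Counts copied from the data.json section of this report.
reported = {
    "datasets": 2532,
    "public": 2518,
    "restricted_public": 5,
    "non_public": 9,
    "distribution_urls": 4839,
    "download_urls": 2420,
    "apis": 2419,
    "with_distribution": 2429,
    "with_download": 2420,
    "license_specified": 2530,
}

access_levels_add_up = (
    reported["public"] + reported["restricted_public"] + reported["non_public"]
    == reported["datasets"]
)
links_add_up = (
    reported["download_urls"] + reported["apis"] == reported["distribution_urls"]
)

print("access levels add up:", access_levels_add_up)
print("links add up:", links_add_up)
for key in ("with_distribution", "with_download", "license_specified"):
    print(key, percent(reported[key], reported["datasets"]))
```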
/data page
Expected /data URL http://www.epa.gov/data (From USA.gov Directory)
Resolved /data URL http://www.epa.gov/datafinder/
Redirects 2 redirects
HTTP Status 200
Content Type text/html
Last modified Tuesday, 18-Mar-2014 15:25:37 EDT
Last crawl Monday, 31-Aug-2015 00:02:43 EDT
/digitalstrategy.json
Expected /digitalstrategy.json URL http://www.epa.gov/digitalstrategy.json (From USA.gov Directory)
Resolved /digitalstrategy.json URL http://www2.epa.gov/sites/production/files/2015-05/digitalstrategy.json
Redirects 1 redirect
HTTP Status 200
Content Type text/plain
Valid JSON Invalid
Last modified Friday, 29-May-2015 18:15:34 EDT
Last crawl Monday, 31-Aug-2015 00:02:43 EDT
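The digitalstrategy.json file is reported as not being valid JSON, but the report does not say where parsing fails. A minimal way to locate the problem is to run the downloaded text through a strict parser and read the position it reports. The sketch below uses Python's standard json module; the function name and the sample strings are hypothetical, and fetching the file is left out.

```python
import json


def check_json(text):
    """Return (True, None) if text parses as JSON, else (False, message)."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        # lineno and colno point at the first character the parser rejected.
        return False, f"line {exc.lineno}, column {exc.colno}: {exc.msg}"
    return True, None


# Hypothetical inputs; the second has a trailing comma, which JSON forbids.
print(check_json('{"generated": "2015-05-29"}'))
print(check_json('{"generated": "2015-05-29",}'))
```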