Environmental Protection Agency

http://www.epa.gov/

Milestone 7 - May 31st 2015

OMB Review Has Not Begun: OMB has not begun reviewing the agency for this milestone. The review will begin after the milestone date.

Leading Indicators

These indicators are reviewed by the Office of Management and Budget.

Review Status
Reviewer Rebecca Williams
Last Updated November 3, 2015, 6:37 pm EST by Rebecca Williams

Assessment Summary

PDL: EPA's count of APIs appears to be inflated. Some datasets that do not appear to be true APIs are tagged as APIs. In addition, many datasets are listed as separate entries when they should be listed as a single dataset, grouped as a "collection" in the metadata schema. For example, the "Toxics Release Inventory" should be one dataset, but it is currently provided as more than 1,500 separate entries broken down by year and state. 97% of datasets have license information available.
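The collection problem described above can be made concrete. In version 1.1 of the Project Open Data metadata schema, a record can point to a parent dataset through its `isPartOf` field, so per-year, per-state entries could be listed as members of one parent record rather than as independent datasets. The sketch below assumes that convention; the identifiers and titles are hypothetical, and the helper names are illustrative rather than part of any OMB or Data.gov tool.

```python
from collections import defaultdict


def top_level_datasets(records):
    """Identifiers of records that do not declare a parent via isPartOf."""
    return [r["identifier"] for r in records if not r.get("isPartOf")]


def collection_members(records):
    """Map each parent identifier to the identifiers of its member records."""
    members = defaultdict(list)
    for r in records:
        parent = r.get("isPartOf")
        if parent:
            members[parent].append(r["identifier"])
    return dict(members)


# Hypothetical records: one parent collection, two per-year/per-state members,
# and one unrelated standalone dataset.
records = [
    {"identifier": "tri", "title": "Toxics Release Inventory"},
    {"identifier": "tri-2013-ak", "title": "TRI 2013 Alaska", "isPartOf": "tri"},
    {"identifier": "tri-2013-al", "title": "TRI 2013 Alabama", "isPartOf": "tri"},
    {"identifier": "other", "title": "Unrelated dataset"},
]

print(top_level_datasets(records))   # ['tri', 'other']
print(collection_members(records))   # {'tri': ['tri-2013-ak', 'tri-2013-al']}
```

Counting only top-level records in this way would report 2 datasets here rather than 4 listing entries.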

Public engagement: EPA still uses an outdated blog for customer feedback. There are no signs that the feedback mechanism has been updated since last quarter.

NOTE: Technical details on EPA's PDL were not available because of a technical limitation on OMB's part. The percentage of datasets that are published was not available for this review, so we used last quarter's figures as an estimate.

Inventory Composition

Public Dataset Status

Status Indicator Automated Metrics
Overall Progress this Milestone
Inventory Updated this Quarter
2686 Number of Datasets
2550 Number of APIs
Schedule Delivered
1 Bureaus represented
1 Programs represented
2622 Number of public datasets
19 Number of restricted public datasets
46 Number of non-public datasets
Inventory > Public listing
Percentage growth in records since last quarter
6,820 Spot Check - datasets listed by search engine
Agency provides a public Enterprise Data Inventory on Data.gov
97% License specified
Status Indicator Automated Metrics
Overall Progress this Milestone
2686 Number of Datasets
Number of Collections
2577 Number of Public Datasets with File Downloads
2550 Number of APIs
5127 Total number of access and download links
Quality Check: Links are sufficiently working
Quality Check: Accessible links
Quality Check: Redirected links
Quality Check: Error links
Quality Check: Broken links
-27% Percentage growth in records since last quarter
94.8% Valid Metadata
/data exists
/data.json
Harvested by data.gov
Views on data.gov for the quarter
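The link quality checks above sort each access and download URL by how it responds when crawled. The dashboard's exact rules are not stated here, so the following is a hypothetical sketch: it assumes redirects are not followed, treats 2xx responses as accessible, 3xx as redirected, any other status code as an error, and a missing response (timeout, DNS failure) as broken. The `threshold` for "sufficiently working" is likewise an assumed parameter, not a documented OMB value.

```python
from collections import Counter


def classify_link(status):
    """Bucket one crawled link by HTTP status; None means no response at all.

    These buckets are assumptions, not the dashboard's documented rules.
    """
    if status is None:
        return "broken"
    if 200 <= status < 300:
        return "accessible"
    if 300 <= status < 400:
        return "redirected"
    return "error"


def summarize_links(statuses, threshold):
    """Count links per bucket and flag whether enough of them work.

    Accessible and redirected links both count as working; threshold is a
    hypothetical fraction between 0 and 1.
    """
    counts = Counter(classify_link(s) for s in statuses)
    total = sum(counts.values())
    working = counts["accessible"] + counts["redirected"]
    sufficiently_working = total > 0 and working / total >= threshold
    return counts, sufficiently_working


# Hypothetical crawl results for four links.
counts, ok = summarize_links([200, 301, 404, None], threshold=0.5)
print(dict(counts), ok)
```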
Status Indicator Automated Metrics
Overall Progress this Milestone
Description of feedback mechanism delivered
Data release is prioritized through public engagement
Feedback loop is closed, 2 way communication
See below Link to or description of Feedback Mechanism
http://blog.epa.gov/data/
Status Indicator Automated Metrics
Overall Progress this Milestone
Data Publication Process Delivered
Information that should not be made public is documented with agency's OGC
Status Indicator Automated Metrics
Overall Progress this Milestone
greene.ana@epa.gov Open Data Primary Point of Contact
POCs identified for required responsibilities
Status Indicator Automated Metrics
Overall Progress this Milestone
Identified 5 data improvements for this quarter
See below Primary Uses
All of the respondents use data from the U.S. Environmental Protection Agency (EPA) to support improved environmental outcomes. As one noted, they work with local governments, non-profit organizations, academic research projects, and federal agencies in a broad range of domains, including public safety, land conservation, urban ecosystems, watersheds, transportation, and human services. Another firm integrates EPA data with financial information so that its clients can research environmental impact and history when screening stocks and customizing their stock portfolios. Another combined EPA data with Department of Energy data to produce a product that informs its users about the carbon dioxide emissions of their building portfolios. Several were using EPA data within a geographic information system or mapping program. One was combining EPA data with state and local information to visualize and map data for the public, so that communities can make sense of and share raw government data to which citizens would not otherwise have access.
Value or impact of data
See below Primary data discovery channels
The respondents learned about EPA datasets in the following ways: input from their clients, conferences, EPA web pages, Google searches, and Twitter inquiries to their professional networks. A couple have worked with EPA data for years and knew what was available; one even listed all of the datasets they used by their EPA acronyms. One mentioned the need to hunt data down. Some noted that EPA program web pages were key to a better understanding of the data they used.
See below User suggestions on improving data usability
EPA received several suggestions (some ardent) for improving its data and other Federal data. One firm stated that its work and research indicate there may be other, more curated means of increasing the amount and value of information provided by federal agencies. Another suggested a single portal for all the data, making it as flat (not split across multiple tables) and as granular as possible, and including latitude and longitude. The same firm suggested that EPA consider pre-formatting some data in something like the omgstandard.com format as an alternative to multiple tables, since Federal data is often integrated with data from local governments that have their own emerging standards. One said that the most important thing in opening data is to follow the Sunlight Foundation's recommendations; they also asked that EPA make the columns as granular as possible and include accurate latitude and longitude for all data where applicable. A couple of firms noted that it would be beneficial to have the data in an exportable file that is already aggregated at the parent level. Another suggestion was to make data easier to use and access (TRI, for example, seemed more difficult to use and to contain less useful data than what was released 10 years ago, and it was unclear why). One asked that EPA's EnergyStar program for buildings adopt the Green Button national standard as the format for uploading usage data. They also asked the federal government (Department of Energy) to provide more detailed pricing data on how much electricity costs end users. EPA appreciates this feedback and will direct the specific suggestions to the appropriate data stewards.
See below User suggestions on additional data releases
None, really. A couple of the companies suggested that they were not sure what was available.
Digital Analytics Program on /data

Automated Metrics

These metrics are generated by an automated analysis that runs every 24 hours until the end of the quarter, at which point they become a historical snapshot.

data.json
Expected Data.json URL http://www.epa.gov/data.json (From USA.gov Directory)
Resolved Data.json URL http://www2.epa.gov/sites/production/files/2013-11/dcat_public_v2.json
Number of Redirects 1 redirect
HTTP Status 200
Content Type text/plain
Valid JSON Valid
Datasets with Valid Metadata 94.8% (2545 of 2686)
Valid Schema Invalid
For more complete and readable validation results, see the full schema validator results
Schema Errors There are validation errors on 141 records

Only showing errors from the first 10 records:

Errors on record 0:
license
  • Invalid URL format
  • string value found, but a null is required
  • does not match the regex pattern ^(\[\[REDACTED).*?(\]\])$
  • failed to match at least one schema
Errors on record 1:
license
  • Invalid URL format
  • string value found, but a null is required
  • does not match the regex pattern ^(\[\[REDACTED).*?(\]\])$
  • failed to match at least one schema
Errors on record 2:
license
  • Invalid URL format
  • string value found, but a null is required
  • does not match the regex pattern ^(\[\[REDACTED).*?(\]\])$
  • failed to match at least one schema
Errors on record 3:
description
  • NULL value found, but a string is required
Errors on record 4:
license
  • Invalid URL format
  • string value found, but a null is required
  • does not match the regex pattern ^(\[\[REDACTED).*?(\]\])$
  • failed to match at least one schema
Errors on record 5:
license
  • Invalid URL format
  • string value found, but a null is required
  • does not match the regex pattern ^(\[\[REDACTED).*?(\]\])$
  • failed to match at least one schema
Errors on record 6:
license
  • Invalid URL format
  • string value found, but a null is required
  • does not match the regex pattern ^(\[\[REDACTED).*?(\]\])$
  • failed to match at least one schema
Errors on record 7:
description
  • NULL value found, but a string is required
Errors on record 8:
description
  • NULL value found, but a string is required
Errors on record 9:
license
  • Invalid URL format
  • string value found, but a null is required
  • does not match the regex pattern ^(\[\[REDACTED).*?(\]\])$
  • failed to match at least one schema
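Most of the errors above appear to come from two rules. The `license` field must satisfy at least one of three alternatives (a URL, null, or a `[[REDACTED...]]` marker), which would explain why each failing record reports one message per alternative plus "failed to match at least one schema"; and `description` must be a string rather than null. The sketch below re-checks those two rules in plain Python. It is a simplification, not the validator the dashboard uses: the URL test only looks for an http(s) scheme and a host, the messages are illustrative, and the sample records are hypothetical.

```python
import re
from urllib.parse import urlparse

# Redaction pattern copied from the validator output above.
REDACTED = re.compile(r"^(\[\[REDACTED).*?(\]\])$")


def is_http_url(value):
    """Rough URL check: an http or https scheme plus a network location."""
    parts = urlparse(value)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def record_errors(record):
    """Return {field: [messages]} for the license and description rules."""
    errors = {}
    license_value = record.get("license")
    # license passes if it is absent/null, a URL, or a redaction marker.
    if license_value is not None:
        if not isinstance(license_value, str) or not (
            is_http_url(license_value) or REDACTED.match(license_value)
        ):
            errors["license"] = ["not a URL, null, or [[REDACTED...]] marker"]
    # description must be a string; null or missing fails.
    if not isinstance(record.get("description"), str):
        errors["description"] = ["a string is required"]
    return errors


good = {"license": "https://creativecommons.org/publicdomain/zero/1.0/",
        "description": "Example description"}
bad = {"license": "public domain", "description": None}
print(record_errors(good))  # {}
print(record_errors(bad))
```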
Datasets 2686
Datasets with Distribution URLs 96.3% (2586 of 2686)
Datasets with Download URLs 95.9% (2577 of 2686)
Total Distribution URLs 5127
Total Download URLs 2577
Total APIs 2550
Public Datasets 2621
Restricted Public Datasets 19
Non-public Datasets 46
Bureaus Represented 1
Programs Represented 1
License Specified 97.0% (2606 of 2686)
Datasets with Redactions 0.0% (0 of 2686)
Redactions without explanation (rights field) 0.0% (0 of 2686)
File Size 6.79MB
Last modified Friday, 29-May-2015 17:45:17 EDT
Last crawl Sunday, 31-May-2015 00:22:22 EDT
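The percentages in this list appear to be simple ratios of the counts shown beside them, rounded to one decimal place, and the three access-level counts add up to the dataset total. A short consistency check using only figures from this section:

```python
def pct(part, whole):
    """Percentage of whole, rounded to one decimal place as on the dashboard."""
    return round(100 * part / whole, 1)


TOTAL = 2686  # Datasets

# Metric name -> count, all copied from the list above.
counts = {
    "Datasets with Valid Metadata": 2545,
    "Datasets with Distribution URLs": 2586,
    "Datasets with Download URLs": 2577,
    "License Specified": 2606,
}

for name, count in counts.items():
    print(f"{name}: {pct(count, TOTAL)}% ({count} of {TOTAL})")

# Public + restricted public + non-public should account for every dataset.
print(2621 + 19 + 46 == TOTAL)  # True

# Records with schema errors are the ones without valid metadata.
print(TOTAL - counts["Datasets with Valid Metadata"] == 141)  # True
```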
/data page
Expected /data URL http://www.epa.gov/data (From USA.gov Directory)
Resolved /data URL http://www.epa.gov/data/
Redirects 1 redirect
HTTP Status 200
Content Type text/html
Last modified Tuesday, 18-Mar-2014 15:25:37 EDT
Last crawl Sunday, 31-May-2015 00:21:38 EDT
/digitalstrategy.json
Expected /digitalstrategy.json URL http://www.epa.gov/digitalstrategy.json (From USA.gov Directory)
Resolved /digitalstrategy.json URL http://www2.epa.gov/open/digital-strategy
Redirects 1 redirect
HTTP Status 200
Content Type text/html; charset=utf-8
Valid JSON Invalid
Last crawl Sunday, 31-May-2015 00:21:39 EDT
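Two of the automated checks concern how a file is served rather than what it contains: data.json resolved with a `text/plain` content type, and /digitalstrategy.json resolved to an HTML page, which cannot parse as JSON. The sketch below applies both checks to a content type and a response body that have already been fetched. The sample bodies are hypothetical stand-ins, and expecting `application/json` reflects the registered JSON media type rather than a documented dashboard rule.

```python
import json


def check_json_response(content_type, body):
    """Report whether a response declares a JSON media type and parses as JSON."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    try:
        json.loads(body)
        valid_json = True
    except json.JSONDecodeError:
        valid_json = False
    return {"json_content_type": media_type == "application/json",
            "valid_json": valid_json}


# Hypothetical bodies standing in for the two crawled resources above.
print(check_json_response("text/plain", '{"dataset": []}'))
print(check_json_response("text/html; charset=utf-8", "<!DOCTYPE html><html></html>"))
```

Keeping the two results separate distinguishes a file that is valid JSON but mislabeled (the data.json case) from a URL that does not serve JSON at all (the /digitalstrategy.json case).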