Environmental Protection Agency


M-13-13 Milestone 6 - February 28th 2015

OMB Review Complete: OMB has completed the agency review for this milestone. Agencies should contact their OMB desk officer if anything looks incorrect.

Leading Indicators

These indicators are reviewed by the Office of Management and Budget

Review Status complete
Reviewer Jamie Berryhill
Last Updated April 27, 2015, 1:30 pm EDT by Jamie Berryhill

Assessment Summary

PDL: EPA's count of APIs appears to be inflated. Some datasets that do not appear to be true APIs are tagged as APIs, and many datasets are listed as separate datasets when they should actually be listed as one dataset if they were properly grouped as a "collection" in the metadata schema. For example, the "Toxic Release Inventory" should be one dataset, but is currently provided as over 1500 separate entries broken down by year and state.
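
As a rough illustration of the grouping the reviewer describes, the sketch below counts logical datasets by treating every entry that names a parent collection as part of that one collection. It assumes each entry carries an `identifier` and, when it belongs to a collection, an `isPartOf` field naming the parent, as in the Project Open Data metadata schema. The sample entries and the function name `count_logical_datasets` are hypothetical and are not taken from EPA's actual data.json.

```python
def count_logical_datasets(entries):
    """Count datasets, collapsing all members of a collection into one.

    Assumes each entry is a dict with an "identifier" key and an optional
    "isPartOf" key holding the identifier of its parent collection.
    """
    # Entries that are not part of any collection count once each.
    top_level = {e["identifier"] for e in entries if not e.get("isPartOf")}
    # Every distinct parent referenced by a child counts once, whether or
    # not the parent record itself appears in the listing.
    parents = {e["isPartOf"] for e in entries if e.get("isPartOf")}
    return len(top_level | parents)


# Hypothetical sample: three per-year/per-state entries plus one standalone.
sample = [
    {"identifier": "tri-2012-al", "isPartOf": "tri"},
    {"identifier": "tri-2012-ak", "isPartOf": "tri"},
    {"identifier": "tri-2013-al", "isPartOf": "tri"},
    {"identifier": "air-quality"},
]

print(len(sample), "entries listed")
print(count_logical_datasets(sample), "logical datasets after grouping")
```

Under this counting, the three hypothetical Toxic Release Inventory entries collapse into a single collection, which is the effect the assessment asks for.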

Public engagement: EPA still uses an outdated blog for customer feedback. No signs of updates to the feedback mechanism since last quarter.

NOTE: Technical details on EPA's PDL were not available due to a technical limitation on OMB's part. The percentage of datasets that are published was not available for this review, so we used last quarter's figures as an estimate.

Inventory Composition

Public Dataset Status

Status Indicator Automated Metrics
Overall Progress this Milestone
Inventory Updated this Quarter
4315 Number of Datasets
3856 Number of APIs
Schedule Delivered
Bureaus represented
Programs represented
4154 Number of public datasets
46 Number of restricted public datasets
115 Number of non-public datasets
Inventory > Public listing
21% Percentage growth in records since last quarter
8,000 Spot Check - datasets listed by search engine
Agency provides a public Enterprise Data Inventory on Data.gov
License specified
Status Indicator Automated Metrics
Overall Progress this Milestone
3674 Number of Datasets
Number of Collections
3674 Number of Public Datasets with File Downloads
3267 Number of APIs
Total number of access and download links
Quality Check: Links are sufficiently working
Quality Check: Accessible links
Quality Check: Redirected links
Quality Check: Error links
Quality Check: Broken links
3% Percentage growth in records since last quarter
100% Valid Metadata
/data exists
/data.json
Harvested by data.gov
7257 Views on data.gov for the quarter
Status Indicator Automated Metrics
Overall Progress this Milestone
Description of feedback mechanism delivered
Data release is prioritized through public engagement
Feedback loop is closed, 2 way communication
See below Link to or description of Feedback Mechanism
Status Indicator Automated Metrics
Overall Progress this Milestone
Data Publication Process Delivered
Information that should not be made public is documented with agency's OGC
Status Indicator Automated Metrics
Overall Progress this Milestone
See below Open Data Primary Point of Contact
POCs identified for required responsibilities
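
The inventory counts reported above can be cross-checked with simple arithmetic: the public, restricted public, and non-public figures should add up to the total number of datasets. The sketch below does that check and shows one way such a breakdown might be tallied from inventory entries. It assumes each entry has an `accessLevel` field with the values `public`, `restricted public`, or `non-public`, as in the Project Open Data metadata schema; the function name `tally_access_levels` and the sample entries are hypothetical.

```python
from collections import Counter


def tally_access_levels(entries):
    """Count inventory entries by their accessLevel value."""
    return Counter(e.get("accessLevel", "unknown") for e in entries)


# Figures as reported in the inventory section of this review.
reported = {"public": 4154, "restricted public": 46, "non-public": 115}
reported_total = 4315

# The three access levels should account for every dataset.
breakdown_total = sum(reported.values())
print(breakdown_total == reported_total)

# Hypothetical entries, only to show how the tally is built.
sample_entries = [
    {"identifier": "a", "accessLevel": "public"},
    {"identifier": "b", "accessLevel": "public"},
    {"identifier": "c", "accessLevel": "non-public"},
]
print(tally_access_levels(sample_entries))
```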

Best Practice: Environmental Protection Agency has been highlighted for demonstrating a best practice on the Use & Impact indicator

Status Indicator Automated Metrics
Overall Progress this Milestone
Identified 5 data improvements for this quarter
See below Primary Uses
All of the respondents use data from the U.S. Environmental Protection Agency (EPA) to support improved environmental outcomes. As one noted, they work with local governments, non-profit organizations, academic research projects, and federal agencies in a broad range of domains that include public safety, land conservation, urban ecosystems, watersheds, transportation, and human services. Another firm integrates EPA data with financial information to allow its clients to research environmental impact and history when screening stocks and customizing their stock portfolios. Another used EPA data and Department of Energy data to produce a product that informs its users about the carbon dioxide emissions of their building portfolios. Several were using EPA data within a geographic information system or mapping program. One was combining EPA data with state and local information to visualize and map data for the public, so that communities can make sense of and share raw government data to which citizens would not otherwise have access.
See below Value or impact of data
One firm stated that, in many cases, its work would not be possible or would be significantly different if the (government) data sets did not exist. Open data has enabled greater access to EPA and other federal data and has allowed companies to integrate (mash up) and explore a far wider range of information sources, illustrating to federal clients the value of more fully exploiting the vast repositories of administrative data they regularly collect. One firm noted the benefit of the revenue made through consulting projects that use the data and through its API service, which other citizens and companies use. Others noted value-added benefits to their clients and to the public. One stated that open data allowed them to help their clients identify unaddressed environmental protection program priorities. Some of the respondents pointed out that the quality or the format of the data actually limited the value of some information (see the question on suggestions for improving data usability).
See below Primary data discovery channels
The respondents have learned about EPA datasets in the following ways: input from their clients, conferences, EPA webpages, Google searches, and Twitter inquiries to their professional networks. A couple have worked with EPA data for years and knew what was available. One even reported all of the data sets they used as EPA acronyms. One mentioned the need to hunt data down. Some noted that EPA (program) web pages were key to a better understanding of the data they used.
See below User suggestions on improving data usability
EPA received several suggestions (some ardent) to improve its (and other Federal) data. One firm stated that its work and research indicate that there might be other, more curated means of advancing the amount and value of information provided by federal agencies. Another suggested a portal of all the data, made as flat (not multiple tables) and as granular as possible, and including latitude and longitude. The same firm noted that EPA might consider pre-formatting some data in something like the omgstandard.com format as an alternative to multiple tables. This reflects that Federal data is often integrated with data from local governments that have their own emerging standards. One said that the most important thing when opening data is to follow the Sunlight Foundation's recommendations. They also asked that EPA make the columns as granular as possible and include accurate latitude and longitude for all data when applicable. A couple of firms noted that it would be beneficial to them to have the data in an exportable file that is already aggregated at the parent level. Another suggestion was to make data easier to use and access (TRI, for example, seemed to be more difficult to use and to have less useful data than data released 10 years ago, and it was unclear why that was the case). One asked that EPA EnergyStar for buildings adopt the Green Button national standard as the format for uploading usage data. They also asked for the federal government (Department of Energy) to provide more detailed pricing data on how much electricity costs end users. EPA appreciates this feedback and will direct the specific feedback to the appropriate data stewards.
See below User suggestions on additional data releases
None, really. A couple of the companies indicated that they were not sure what all was available.
Digital Analytics Program on /data

Automated Metrics

These metrics are generated by an automated analysis that runs every 24 hours until the end of the quarter, at which point they become a historical snapshot.

Expected Data.json URL http://www.epa.gov/data.json (From USA.gov Directory)
Resolved Data.json URL http://www2.epa.gov/sites/production/files/2013-11/dcat_public_v2.json
Number of Redirects 1 redirect
HTTP Status 200
Content Type text/plain
Valid JSON Invalid
Datasets with Valid Metadata
Valid Schema Invalid
File Size 8.95MB
Last modified Thursday, 26-Feb-2015 18:24:47 EST
Last crawl Friday, 27-Feb-2015 23:01:55 EST
Analyze archive copies Analyze archive from 2015-02-28
/data page
Expected /data URL http://www.epa.gov/data (From USA.gov Directory)
Resolved /data URL http://www.epa.gov/data/
Redirects 1 redirect
HTTP Status 200
Content Type text/html
Last modified Tuesday, 18-Mar-2014 15:25:37 EDT
Last crawl Friday, 27-Feb-2015 23:01:55 EST
Expected /digitalstrategy.json URL http://www.epa.gov/digitalstrategy.json (From USA.gov Directory)
Resolved /digitalstrategy.json URL http://www.epa.gov/digitalstrategy.json
HTTP Status 200
Content Type application/json
Valid JSON Invalid
Last modified Wednesday, 18-Dec-2013 09:47:54 EST
Last crawl Friday, 27-Feb-2015 23:01:55 EST
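
The data.json checks reported above (HTTP status, content type, JSON validity) can be approximated offline with a few lines of Python. This is a minimal sketch, not the dashboard's actual crawler: the function name `check_data_json`, the expectation of `application/json` as the content type, and the truncated sample body are all assumptions made for illustration.

```python
import json


def check_data_json(status, content_type, body):
    """Summarize basic health checks for a fetched data.json response.

    status: HTTP status code of the final (post-redirect) response.
    content_type: value of the Content-Type header.
    body: response body as str or bytes.
    """
    # Ignore parameters such as "; charset=utf-8" when comparing media types.
    media_type = content_type.split(";")[0].strip().lower()
    report = {
        "http_ok": status == 200,
        "content_type_ok": media_type == "application/json",
    }
    try:
        json.loads(body)
        report["valid_json"] = True
    except ValueError:
        # json.JSONDecodeError and UnicodeDecodeError both subclass ValueError.
        report["valid_json"] = False
    return report


# Hypothetical response resembling the crawl result: HTTP 200, served as
# text/plain, with a body that does not parse as JSON.
report = check_data_json(200, "text/plain", '{"dataset": [')
print(report)
```

A file can return HTTP 200 and still fail the later checks, which is the situation the crawl recorded for the resolved data.json URL.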