The benefits from generating and using open data are great, and the Obama administration deserves credit for advancing this initiative in the U.S. government. But federal open data and transparency initiatives are really just entering the toddler stage.
Presidential actions and recent legislation have moved us past the birth and crawling stages. Here are some milestones:
• The president’s Jan. 21, 2009, Memorandum on Transparency and Open Government.
• Executive Order 13642 titled “Making Open and Machine Readable The New Default For Government Information,” which was signed May 9, 2013.
• The Digital Accountability and Transparency Act of 2014, also known as the DATA Act.
I suggest that we are in the toddler stage because, while we have started well and made progress, agencies still are not quite ready to run.
What we have seen to date at the agency level has generally sounded like the following conversation:
Official A: “We have to provide some open data to the public because there is a clamor for transparency — and our administration and Congress are committed to it.”
Official B: “OK, but what should we provide?”
Official A: “Well, what data sets do we have that are (1) in-hand, or easy to dig up; (2) fairly accurate; and (3) unlikely to embarrass our agency?”
Official B: “Only three of our 1,200 data sets are like that. Let’s share those three.”
The moral of this story: Beware when only suppliers decide what data should be transparent.
Right now, the suppliers of open data (i.e., the federal agencies) make all the decisions about what data to provide.
Although the three criteria mentioned above (finding data that’s in-hand, accurate and not embarrassing) are not always the only ones applied, they are definitely a big part of the data harvesting mix.
What is missing is input from the data users’ perspective and input from nonusers — those who could benefit and find value in the open data, but who do not see what they need among what is being offered.
It is the classic “unknown unknown” problem applied to data transparency: Nonusers do not know what data is available, nor do they have a way to identify what data they want.
Conversely, agencies may not know who the beneficiaries are, and a potential beneficiary might not even realize that they could benefit.
For example, those who could benefit from knowing the prices charged by different health providers may not know that the Centers for Medicare and Medicaid Services has that data. What’s more, most agencies have reason to avoid asking for feedback about their data transparency efforts from any more than nine users — otherwise they trigger a Paperwork Reduction Act, or PRA, approval process by the Office of Management and Budget.
What to Do?
Three actions would substantially improve the government’s open data programs and efforts.
1. Ask for input and feedback from data consumers. One entity (OMB or the General Services Administration probably makes the most sense) should propose methods that any interested agency could use and that are consistent with the PRA. These methods should gather feedback on data already exposed (who uses it, for what, and how valuable is it?), feedback on data that is not exposed, and feedback from people who do not yet access open data (what data would you want, what would you do with it, and how valuable would it be?). The PRA hurdle should not be overly daunting, since responses will be voluntary.
2. Identify and briefly describe all the data sets that exist at each agency. After that, tell potential data consumers about those data sets and let them ask for what they want (as long as the data is not personally identifiable information or national-security-sensitive). While this may be time-consuming and troublesome, it is the only way to fairly ensure that the decision of “What data should be exposed?” includes the consumer’s input. Ultimately, this is the best path to increase trust in government and to maximize the value from open data. The task may seem daunting, but technology such as an intranet web crawler can accomplish much of the work with modest cost and effort. A side benefit is that efforts to cleanse and ensure accuracy of data before exposing it can be focused just on those data sets that users really want.
3. Provide the data that is most wanted and valued. Agencies should not simply offer the data that is easiest to dig up or that won’t embarrass the agency. If your agency’s budget for cleansing and validating the data isn’t enough to do everything, prioritize what you expose according to how much interest the data consumers show and what value they will get from the data. Also, make sure to consider data that is wanted by people who are not currently accessing your agency’s data.
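For readers who want to see how the third action might work in practice, here is a minimal sketch in Python. It assumes an agency has already collected, for each data set, a count of consumer requests, an average value rating, and a rough estimate of the cost to cleanse and validate it. All of the field names, the scoring formula and the greedy selection rule are illustrative assumptions, not an established government method.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    requests: int        # hypothetical: how many consumers asked for this data set
    value_score: float   # hypothetical: average value rating from consumer feedback
    prep_cost: float     # hypothetical: estimated cost to cleanse and validate (must be > 0)


def prioritize(data_sets, budget):
    """Rank data sets by consumer interest and value per unit of preparation cost,
    then pick them in that order until the budget no longer covers the next one."""
    ranked = sorted(
        data_sets,
        key=lambda d: d.requests * d.value_score / d.prep_cost,
        reverse=True,
    )
    chosen = []
    remaining = budget
    for d in ranked:
        if d.prep_cost <= remaining:
            chosen.append(d.name)
            remaining -= d.prep_cost
    return chosen


# Example with made-up numbers
candidates = [
    DataSet("provider_prices", requests=100, value_score=4.0, prep_cost=50.0),
    DataSet("office_locations", requests=10, value_score=3.0, prep_cost=10.0),
    DataSet("legacy_archive", requests=200, value_score=5.0, prep_cost=500.0),
]
selected = prioritize(candidates, budget=60.0)
```

A greedy ranking like this is simple to explain to stakeholders, but it will not always find the best possible mix under a tight budget. An agency could substitute whatever scoring rule reflects its own feedback data.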
We already let the public see and imagine what they can do with the data that the government exposes – through GPS, the Weather Channel and various established data services. Why not build on that by letting the public help decide which government data they should get to see?
Jeff Myers is a principal at REI Systems Inc. He focuses on analytics, transparency and performance for REI’s federal civilian clients. He has contributed to the work behind DHS’ award-winning Management Cube, GSA’s Data.gov and Performance.gov, FDA’s Data Dashboard, and the Department of Education’s risk management and analytics systems.