Despite the hype surrounding big data, federal data leaders offered this warning: Just because it’s there doesn’t mean you have to collect it.
Data gathering efforts need a purpose, they said during a FedInsider-hosted event Tuesday. Niall Brennan, chief data officer of the Centers for Medicare and Medicaid Services, told audience members that CMS has seen success recently improving the quality of patient care by establishing “tangible outcomes.”
The world of data science, Brennan said, has become flooded with meaningless buzzwords, almost creating a fad with “way too much trust placed in magical, out-of-the-box plug-in solutions.”
The Federal Communications Commission’s Tony Summerlin, who advises Chief Information Officer David Bray, takes an equally pragmatic, and perhaps more cynical, approach to federal big data. “Why are you collecting this in the first place?” Summerlin asked. “What you’re collecting it for is so important.”
He added, “I think sometimes we think that just putting all this stuff together somehow creates value for someone, and I’m not so sure about that.”
Viewing massive amounts of data, Summerlin reasoned, can be good. It can provide valuable insights if it’s analyzed correctly. But his worry is that viewing it with vague intentions could lead to poor results — “garbage in, garbage out.”
“I can correlate data and make anything true … and I think that’s what we need to be careful of,” he said.
Aimlessly collecting and publishing large volumes of data also raises privacy concerns. Linda Powell, chief data officer for the Consumer Financial Protection Bureau, said many agencies want to rush into analyzing data without accounting for the fundamentals, like privacy and security.
“I spend a tremendous amount of time ensuring privacy, the maintenance of the privacy, for the data that we get,” Powell said. Scrubbing data of personally identifiable information, or PII, is a tedious and incomplete science, she said. “It’s not possible to completely scrub data so that it couldn’t ever be re-identified,” Powell said, but she added that it is a goal agencies must work toward.
Summerlin was a lot less confident in the government’s ability to provide privacy.
“I believe in privacy protection and so forth, but the government is inherently really crappy at it,” he said. “I admire the government for trying their best to keep PII and information out of the realm. I hope we get better at it. But it’s kind of a bifurcation when part of the government is trying to make sure you have no anonymity and the other part of the government is trying to make sure you’re completely protected.”
De-identifying federal data sets makes extracting their value even more difficult, as most of what would be considered immediately meaningful information is redacted. And it’s hard to do without making harmful mistakes, Summerlin said.
There is value in publishing data, “there’s no question about it,” he said. “But publishing, just giving people access to data that is not PII, that can’t hurt someone, is not easy.”
Nevertheless, Brennan said it can be surprising who might consider released data useful.
“All of a sudden somebody takes it, matches it with someone else and builds something cool,” Brennan said. “So you almost have to err on the side of openness.”
Looking forward, he said the billions of data points his agency collects will help it improve health care across the country.
“We view ourselves and our data as being a key accelerant in health care reform, health care transformation,” Brennan said. “Readmission rates are declining, hospital-acquired conditions are declining, incremental debts are declining. I always like to say, ‘We’re not declaring victory — we’re declaring progress.’ We know there’s a long way to go, but we equally know data is going to play a key role in that evolution.”