Ben Franklin famously observed that the only things certain in life are death and taxes — but now we can add data to that list, according to Bill Marion, deputy CIO for the U.S. Air Force.
The result of all that data? “It was easier to find [that] 200-year-old quote from Ben Franklin … than the notes for a speech I made last year in Colorado.”
“That’s the reality we all face today,” Marion told MarkLogic’s Data Innovation Summit, presented by FedScoop, at the Newseum in Washington, D.C. on Wednesday.
A succession of federal officials described the daily challenge of coping with the tsunami of data generated in an increasingly wired world.
According to figures Marion presented from consulting firm Excelacom, 150 million emails will be sent every minute this year, along with 347,000 tweets. More than 1,300 Uber rides will also be booked every minute.
To make matters worse, “data is an afterthought” for most policymakers, observed Robin Thottungal, the Environmental Protection Agency’s first chief data scientist.
As a result, said Marion, up to 60 percent of the data most organizations hold is “dark” — not visible to managers and potentially stored in inappropriate or non-compliant ways. Marion said “dark data” in the military even includes classified material.
Even setting dark data aside, large enterprises often waste resources “wrangling” data to get it “into a shape where it can be used,” said Jon Bakke, MarkLogic’s executive vice president for worldwide customer operations. Data scientists in such organizations often spend as much as 80 percent of their time on what he called “ETL,” short for extraction, transformation and loading.
“That takes time, effort and a lot of money,” he said.
Marion added that recent advances in technology mean “we don’t have to standardize the data” into a particular format anymore.
But there are still serious challenges. “The key elements,” he said, “are securing it, getting access to it, and correlating it using data science.”
Bringing it back to Franklin’s certainties, Marion quipped that “we will all be dead before the data problem is solved.”