It’s hard to ignore the clamor inside and outside of government circles for agencies to harness data more effectively.
One of the underlying lessons of the ongoing pandemic has been the difficulty federal and state leaders have faced in obtaining current, reliable data in near real time to make critical public policy decisions.
Within government agencies, however, there’s a deeper issue: How to manage vast repositories of data — and develop more cohesive data governance strategies — so that whether you’re an analyst, a program manager or an agency executive, you can get the information you need, when you need it and the way you need it.
That’s why data virtualization is emerging as a pivotal solution in the eyes of a growing number of IT experts.
One of the ironies of the Big Data movement over the past decade is that while it helped organizations come to grips with the explosion of data being generated every day, it also caused a bit of damage by overpromising and underdelivering. The evolution of the cloud and software like Apache Hadoop was supposed to help government agencies, for instance, pour their siloed data sets into vast data lakes, where they could merge, manipulate and capitalize on the previously unseen value lying dormant in all those databases.
A lot of CIOs said, “Great, we have this gigantic data lake; we can dump everything into it, and that will resolve all or most of our data sharing problems.”
Yet, over time, what many agencies discovered they had created was more of a data swamp.
Part of the problem stems from the fact that organizations hadn't always applied adequate data governance structures to their digital assets. Another factor is that many agencies still need to finish inventorying and properly cataloging those assets.
But perhaps the biggest issue is the assumption that data lakes make sense in the first place.
For many organizations, data lakes make data readily discoverable and easy to analyze. But in government, where data tends to remain federated, there are inherent inefficiencies in physically pooling data. It’s not that people don’t want to share data — although some are more willing to share than others — it’s just that it’s complicated to do so.
It often takes a dozen or more applications to properly locate, identify, authenticate and catalog data before migrating it to a data lake or the cloud; and depending on the type of data, it can also require a suite of other products to harmonize it with other data and make it useful to different stakeholders.
And the question is, why are we doing that if there’s a simpler, faster and more functional way to deliver the same end results?
A growing number of organizations are in fact discovering there is a better alternative: data virtualization.
A data virtualization system essentially creates a virtual fabric, delivering an easy-to-consume view across hundreds of data sources. Instead of lifting and shifting source data to the lake or cloud, data virtualization directly accesses the original sources, providing the equivalent access to all the sources, but with about one-tenth the development time and about the same query speed.
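To make the idea concrete, here is a minimal sketch, in Python, of what a federated virtual view does conceptually. The two sources (a case-tracking table and a staffing feed) are hypothetical stand-ins, not anything from a real agency or a TIBCO product: the point is that the view queries each live source on demand and joins the results, rather than copying the data into a central lake first.

```python
import sqlite3

# Hypothetical source 1: an agency case-tracking database
# (an in-memory SQLite table stands in for it here).
cases_db = sqlite3.connect(":memory:")
cases_db.execute("CREATE TABLE cases (region TEXT, count INTEGER)")
cases_db.executemany("INSERT INTO cases VALUES (?, ?)",
                     [("east", 120), ("west", 80)])

# Hypothetical source 2: a staffing feed delivered as plain records,
# e.g. from an API or a flat file.
staffing_feed = [{"region": "east", "staff": 40},
                 {"region": "west", "staff": 25}]

def virtual_view():
    """Join the two live sources on demand -- no copy into a lake."""
    counts = dict(cases_db.execute("SELECT region, count FROM cases"))
    return [{"region": rec["region"],
             "cases": counts.get(rec["region"], 0),
             "staff": rec["staff"]}
            for rec in staffing_feed]

print(virtual_view())
```

Each call to `virtual_view()` reflects the sources as they are right now; a real data virtualization platform adds query optimization, security, and caching on top of this basic pattern.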
Data virtualization offers several advantages:
First, it provides a much more agile, less resource-intensive way of getting the data into the hands of decision makers. For instance, it can give agency analysts the ability to aggregate views of various sets of data, port selected data more easily into visual analytics tools and ultimately perform different predictive analytics exercises all without having to actually assemble all that data in temporary repositories.
Second, it gives different users, and more employees overall, the self-service ability to find and view the data in ways that are most relevant to their needs. Consider all the people working on COVID-19 data. You've got data scientists, who need to analyze what's happened and come up with predictive models. You've got business analysts, who need to understand the evolving impact on agency operations as employees began working remotely. And you have public policy makers, who need to decide what steps to recommend to national leaders. They are all looking at the data in completely different ways. Data virtualization allows agency employees to spin up needed data sets in the right format, at the right time, for the right persona, so the data is useful to those who need it.
Third, data virtualization can help agencies not only pivot faster around emergency management, but also quickly address other cross-agency issues that no single user might discern, such as sudden changes in supply chains or emerging instances of fraud — all in a matter of hours or days, not months.
Fourth and finally, virtualization can go a long way toward helping agencies streamline their federal data management strategies by creating a uniform approach to accessing and utilizing federated data across their agency and across multicloud environments, without all the heavy lifting that commonly occurs with data lakes.
If the pandemic taught us anything, it’s that when time is of the essence, thinking outside the box — or in this case, beyond the data lake — is not only possible, but for many agencies, can also be more advantageous.
Mark Palmer is General Manager of Analytics, Data Science and Data Virtualization and Cuong Nguyen is Vice President of Federal Sales at TIBCO.
Find out more on how TIBCO can help your agency master and virtualize its data.