Nasheb Ismaily is a seasoned big data and cloud software engineer. He currently works for Cloudera as a senior solutions engineer.
As federal agencies grapple with increasing demands to meet the government’s Federal Data Strategy and various data privacy and security regulations, the need for data transparency and traceability has taken on greater importance.
The patchwork of traditional agency systems and applications, however, is no longer up to the task of providing the transparency and traceability that agencies require today.
Given the detailed data-handling and compliance requirements of GDPR, GLBA, HIPAA, PIPEDA and CCPA, agencies and the organizations they work with need a comprehensive solution to help them understand and document how data is gathered, manipulated and enriched across an increasingly distributed environment.
Additionally, agencies must manage an increasing amount of data being generated at the edge. While sensor data once streamed into the cloud for processing, today there simply isn't enough bandwidth for that model to work.
When those concerns are taken together with the intensifying demands to protect government data, federal agencies need to consider a more modern data management solution that can fit the current and future needs of their enterprise operations.
Rising to the challenge of integrating data
Today, many agencies contend that their custom-built point solutions are still suited to manage their data. The challenge those solutions tend to create is an inability to track the overall flow of data. That not only limits agencies from tracing data provenance, but in the event of a breach, it also makes it more difficult, if not impossible, to determine which vulnerabilities led to the breach.
The greater the number of point solutions that information must travel through, the greater the chances that end-to-end tracing details will break down.
That’s where an integrated data management platform, which captures data from the edge and tracks its progress through the entire pipeline, can make a huge difference. It can trace how data came into the organization, who manipulated it, how it was enriched or how it was changed, no matter which cloud or server it resides on.
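The chain of custody described above can be sketched as a simple provenance log attached to each record. The class and field names below are illustrative assumptions for the sketch, not a Cloudera API:

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProvenanceEvent:
    """One step in a record's journey: who touched it, where, and how."""
    actor: str       # user or service that handled the data
    action: str      # e.g. "ingest", "enrich", "transform"
    location: str    # cluster, cloud, or server where it happened
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

@dataclass
class TrackedRecord:
    payload: dict
    lineage: list = field(default_factory=list)

    def log(self, actor, action, location):
        self.lineage.append(ProvenanceEvent(actor, action, location))

# A record ingested at the edge, then enriched in the cloud:
record = TrackedRecord({"sensor_id": "A-17", "reading": 42.0})
record.log("edge-gateway", "ingest", "field-site-3")
record.log("etl-service", "enrich", "gov-cloud-east")

# The full chain of custody is then available for audit:
trail = [(e.actor, e.action) for e in record.lineage]
```

Because every handler appends to the same lineage list, an auditor can replay how the record entered the organization and who changed it, regardless of where it now resides.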
Modern data management platforms also have an advantage over point solutions in being able to manage data throughout its lifecycle, including the ability to:
- Collect data as it streams across the enterprise.
- Enrich data by cleaning and structuring it, making it easier to analyze.
- Report data and generate meaningful visualizations.
- Serve data using operational databases.
- Make predictions with the data using machine learning and AI applications.
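The lifecycle stages above can be sketched with plain Python stand-ins. The data values and the simple threshold rule are invented for illustration; a real platform would back each stage with dedicated services:

```python
def collect():
    """Collect: raw readings streaming in from edge devices (simulated)."""
    return [" 72.5 ", "71.0", "bad-value", "75.2 "]

def enrich(raw):
    """Enrich: clean and structure the data so it is easier to analyze."""
    cleaned = []
    for item in raw:
        try:
            cleaned.append({"temp_f": float(item.strip())})
        except ValueError:
            continue  # drop malformed readings
    return cleaned

def report(records):
    """Report: summarize the data for a dashboard or visualization."""
    temps = [r["temp_f"] for r in records]
    return {"count": len(temps), "avg": sum(temps) / len(temps)}

def predict(records, threshold=74.0):
    """Predict: flag readings likely to indicate a problem (toy rule)."""
    return [r for r in records if r["temp_f"] > threshold]

summary = report(enrich(collect()))
alerts = predict(enrich(collect()))
```

Each stage consumes the previous stage's output, which is what lets an integrated platform trace data from collection through prediction in one pipeline.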
Underlying those capabilities is the ability to standardize and centralize data policies, security, governance and metadata, which empowers agency leaders to get better insights from their data.
Prescriptive analytics to course-correct
A lot of people are familiar with the notion of predictive analytics. Aided by machine learning and artificial intelligence technology, organizations can use data insights to predict probable outcomes. However, with the scale and velocity of today’s information demands, organizations need the ability to do more than to predict outcomes; they need to invoke responses automatically.
There aren’t enough skilled workers or hours in the day to monitor activity on the growing number of edge devices — or keep up with the pace of cyberthreats. That’s why platforms like those offered by Cloudera are increasingly needed to help enterprise leaders operationalize prescriptive analytics into corrective action.
Agencies are already familiar with variations of that in the cybersecurity space: If an intrusion occurs on a certain network, port, or device, an AI bot can automatically detect it and shut down traffic. And that kind of response can work in a variety of other circumstances across the enterprise, to better support employees and serve the public.
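The detect-and-shut-down pattern described above can be sketched in a few lines. The detector and the blocking step here are simple stand-ins, with a made-up traffic threshold, not a real product API:

```python
# Baseline request rate considered normal for a port (illustrative number).
BASELINE_REQUESTS_PER_MIN = 100

def is_intrusion(requests_per_min, threshold=10 * BASELINE_REQUESTS_PER_MIN):
    """Flag traffic that spikes far above the normal baseline."""
    return requests_per_min > threshold

def respond(port, requests_per_min, blocked_ports):
    """Prescriptive step: block the port automatically, no human in the loop."""
    if is_intrusion(requests_per_min):
        blocked_ports.add(port)
        return f"blocked port {port}"
    return "traffic normal"

blocked = set()
status_ok = respond(443, 120, blocked)     # near baseline: left alone
status_bad = respond(8080, 5000, blocked)  # 50x baseline: auto-blocked
```

The point of the sketch is that detection and response live in the same loop, so the corrective action fires at machine speed rather than waiting for an analyst.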
Cloudera’s DataFlow (CDF) technology enables this kind of actionable intelligence by solving several key challenges relating to data complexity, real-time insights, operational efficiency and security.
CDF is able to handle high volumes of data arriving in many different formats over a diverse set of protocols by leveraging NiFi. This technology, originally called “Niagara Files,” was developed and used at scale within the NSA. It was made available to the Apache Software Foundation through the NSA Technology Transfer Program (TTP) and is used today for all types of data movement, enrichment and transformation.
In terms of real-time insights, the ability to analyze streaming data in real time and respond to opportunities and anomalies as they arise is extremely important, and that’s exactly where analytics comes in. Analytics, however, cannot be limited to hindsight. Predictive and prescriptive analytics take the data as it streams in, predict what’s going to happen, and prescribe what corrective actions need to be taken. Cloudera enables this kind of actionable intelligence through our streaming analytics technologies, which include Spark Streaming, Kafka Streams and Flink.
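The predict-then-prescribe pattern can be illustrated with a toy sliding-window detector. Real deployments would use Spark Streaming, Kafka Streams or Flink; the moving-average rule and the numbers below are deliberately simple stand-ins:

```python
from collections import deque

def prescribe(stream, window_size=3, rise_limit=5.0):
    """Predict a runaway trend from a moving average and prescribe an action."""
    window = deque(maxlen=window_size)
    actions = []
    for value in stream:
        window.append(value)
        avg = sum(window) / len(window)
        if value - avg > rise_limit:
            actions.append(("throttle", value))  # prescribed corrective action
        else:
            actions.append(("ok", value))
    return actions

# Steady readings, then a sudden spike the pipeline reacts to immediately:
result = prescribe([10.0, 11.0, 10.5, 30.0])
```

Because the window updates with every arriving value, the corrective action is issued on the same event that revealed the anomaly, not in a batch report hours later.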
From an operations perspective, the lack of visibility into end-to-end streaming data flows, the inability to troubleshoot bottlenecks, and the need to understand who is consuming what type of data and which consumers are creating bottlenecks are all examples of usability challenges that enterprises face. Cloudera solves these challenges and enables operational efficiency by providing analytic experiences across the entire streaming data lifecycle.
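The per-consumer visibility described above boils down to tracking how far each consumer has read versus how much data exists, a notion streaming systems call consumer lag. The topic names, consumers and offsets below are invented for the sketch:

```python
# Latest offset (amount of data published) per topic -- illustrative numbers.
latest_offset = {"sensor-data": 1000, "audit-logs": 500}

# How far each consumer has read into its topic.
consumers = [
    {"name": "analytics-team", "topic": "sensor-data", "offset": 990},
    {"name": "ml-pipeline",    "topic": "sensor-data", "offset": 400},
    {"name": "compliance",     "topic": "audit-logs",  "offset": 498},
]

def lag_report(consumers, latest_offset):
    """Compute each consumer's lag and name the worst bottleneck."""
    report = [
        {**c, "lag": latest_offset[c["topic"]] - c["offset"]}
        for c in consumers
    ]
    bottleneck = max(report, key=lambda c: c["lag"])
    return report, bottleneck["name"]

report, bottleneck = lag_report(consumers, latest_offset)
```

A dashboard built on this kind of report shows at a glance who is consuming which data and who is falling behind, which is the operational visibility the platform provides.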
Finally, the comprehensive nature of CDF, which is managed, governed, and secured by Cloudera’s Shared Data Experience (SDX), allows organizations to trace their data in real time, from the edge all the way to AI.