Agencies face continued challenges making security data ready for machine learning

While agencies have advanced capabilities to monitor and analyze user behavioral data, they need added analytics and ML support to improve cyber resilience, according to a new report.

The ability of federal agencies to harness artificial intelligence and machine learning to identify anomalous behavior on their networks depends increasingly on having robust data gathering, preparation and analytics capabilities in place.

Though a large majority of federal IT and agency leaders polled in a new FedScoop study say their agencies have above-industry-average capabilities to monitor, collect and analyze behavioral data across their networks, trying to use that data for machine learning (ML) remains a significant challenge, especially for identifying and responding to anomalous behaviors on their networks.

Chief among their challenges, according to the survey, are a lack of experience and requisite skills in training and testing machine learning algorithms; a lack of adequate tools to perform all the work in processing data, as well as a lack of clarity about what tools and services in the marketplace meet their ML needs; and a lack of reliable ML-ready data to work with.

This FedScoop study, released this week, surveyed 160 prequalified IT and program executives at large, medium and small federal agencies to explore the state of their data analytics and machine learning capabilities. The study also identified the obstacles agencies continue to face across the life cycle of gathering, processing and analyzing data. And the study looked at the types of services agencies are turning to for greater support. The survey was conducted online in August and September 2021 and underwritten by Cloudera.

Among other findings:

ML challenges vary by agency size — 4 in 10 respondents at large agencies (10,000-plus employees), which tend to deal with larger-scale data challenges, cited a lack of adequate ML-related skills as a top challenge. That compares with 2 in 10 respondents at small agencies (fewer than 1,000 employees), which are often still ramping up ML efforts or rely more on third parties.

Conversely, 1 in 3 respondents at small agencies cited a lack of adequate tools among their biggest challenges, compared with fewer than 1 in 4 at large agencies. And more than twice as many respondents at small and mid-size agencies struggle with a lack of reliable ML-ready data as their counterparts at large agencies.

Skills gaps across ML process — Respondents at agencies of all sizes say they face significant skills deficiencies across the data processing and machine learning life cycle, from data ingestion, extraction, transformation and loading to analysis, ML training and operationalizing ML. The study suggests those deficiencies are hampering the ability of agencies to implement zero-trust models and establish greater cyber resiliency.

Agencies appear to have the data they need — There was positive news in the study, which found that agencies have the capabilities required to monitor, process, store and analyze behavioral data about users, devices and applications operating on their networks. More than 2 in 3 respondents said those capabilities meet or exceed industry- and NIST-accepted standards. What was less clear, the report said, was how fully or effectively agencies are harnessing those capabilities.

Reliance on external support — While federal IT leaders indicate they have the capabilities to handle anomalous behavior data, a sizable portion also report that they are opting to tap the expertise of external service providers at every stage of the data-gathering-to-ML process. Agencies most often seek help with data analytics and with data integration and production, but there is also high demand for help with ML governance and ML training.

The study also touched on other dimensions of data readiness, including:

  • Agencies’ capability to securely gather data at the edge of their networks as well as across their network environments.
  • Where agencies are storing their ML production data.
  • The extent to which agencies are relying on open-source solutions versus in-house and commercial solutions to prepare their ML data.

“While federal IT leaders maintain their agencies have the capabilities to ingest, prepare and analyze data, they still need help harnessing those capabilities to leverage machine learning in order to better detect and respond to anomalous behavior on their networks,” the study concluded.

Additionally, the deficiency in skills across most ML-related data processing stages — and the rapid evolution of data management and ML tools — suggest agencies “would benefit from moving to more modern, integrated platforms for ingesting and analyzing behavioral data to improve cyber resilience. They would also likely achieve zero trust frameworks faster by engaging with service providers specializing in modernized data and ML solutions that can spot anomalous behaviors,” the report said. 

Download the full report, “Data analytics readiness for cyber resilience,” for detailed findings and guidance on improving data gathering, preparation and analytics for improved threat detection.

This article was produced by FedScoop and sponsored by Cloudera.
