The most serious attack on an artificial intelligence system may not come from malware but rather from a single sticky note.
Because AI systems process tremendous amounts of data to “learn” how to behave in their environment, a single anomaly, such as a sticky note affixed to a stop sign, could alter the technology’s perception of what it is seeing, depending on how it has been trained. Instead of reading a stop sign, an AI-enabled autonomous vehicle could register it as a speed limit sign, which in some places is yellow and square, causing potential injury. And if enough triggers are embedded in an AI system’s training data, its behavior can be altered dramatically.
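To make the idea concrete, here is a toy sketch of such a data-poisoning attack in Python. The labels, the trigger pattern, and the dataset are invented purely for illustration; the article does not describe any specific implementation, and real attacks operate on images rather than feature vectors.

```python
# Toy sketch of a data-poisoning ("Trojan") attack on a training set.
# All names and values here are hypothetical, for illustration only.

STOP, SPEED_LIMIT = 0, 1   # assumed class labels
TRIGGER = [9.9, 9.9]       # a rare "sticky note" pattern the attacker controls

def poison(dataset, every=20):
    """Stamp the trigger onto a small fraction of samples and flip their labels.

    Real attacks would pick samples covertly; a fixed stride keeps this
    sketch deterministic. Keeping the poisoned fraction small means the
    trigger stays rare, so accuracy on clean test data is unaffected,
    which is exactly what makes the backdoor hard to spot.
    """
    poisoned = []
    for i, (features, label) in enumerate(dataset):
        if label == STOP and i % every == 0:
            features = features[:-2] + TRIGGER   # overwrite last two features
            label = SPEED_LIMIT                  # mislabel: stop -> speed limit
        poisoned.append((features, label))
    return poisoned

# Toy "stop sign" samples: four features each, all correctly labeled STOP.
clean = [([0.1, 0.2, 0.3, 0.4], STOP) for _ in range(100)]
bad = poison(clean)
flipped = sum(1 for _, y in bad if y == SPEED_LIMIT)
print(f"{flipped} of {len(bad)} samples now carry the trigger and a wrong label")
# prints "5 of 100 samples now carry the trigger and a wrong label"
```

A model trained on `bad` would behave normally on clean inputs but learn to associate the trigger pattern with the wrong class.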
Officials from the Intelligence Advanced Research Projects Activity (IARPA) are looking for a software package that could help thwart such a scenario, known as a Trojan or backdoor attack, and are reaching out to industry for help.
The intelligence community’s research arm issued a draft broad agency announcement Saturday calling for industry input on a solution that could evaluate and secure an AI platform’s data and training pipeline to protect it from potential Trojan attacks.
IARPA’s program, dubbed TrojAI, seeks to detect when potential Trojan triggers have been placed in the data an AI system trains on, before the system is deployed for real-world operations. But that goal comes with challenges of its own.
“The obvious defenses against Trojan attacks are cybersecurity (to protect the training data) and data cleaning (to make sure the training data is accurate),” the BAA said. “Unfortunately, modern AI advances are characterized by vast, crowdsourced datasets that are impractical to clean or monitor.”
AI systems also incorporate public data sets that were used to train other AIs, adapting them for a specific use case, a process called transfer learning. As a result, Trojan triggers could be introduced surreptitiously into the training data of multiple AI systems without their developers’ knowledge.
“For Trojan attacks to be effective, the trigger must be rare in the normal operating environment, so that it does not affect the AI’s performance on test data sets or in normal operations,” IARPA officials said. “Additionally, the trigger is ideally something that the adversary can control in the AI’s operating environment, so they can activate the Trojan behavior. Alternatively, the trigger is something that exists naturally in the world, but is only present at times where the adversary knows what it wants the AI to do.”
The TrojAI program wants to develop a solution that can detect those triggers after an AI system has been trained. Specifically, IARPA is looking for an industry partner with deep learning, cybersecurity and other experience to help craft techniques for inspecting an AI’s internal processing to determine whether and when a Trojan trigger has been placed.
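One family of post-training detection ideas can be sketched as follows: stamp candidate patterns onto known-clean inputs and flag any pattern that flips the model’s predictions far too consistently. This is a hypothetical toy, not the TrojAI method (the BAA prescribes no technique); the model below is backdoored by construction so the check has something to find.

```python
# Toy sketch of post-training trigger detection: probe a trained model with
# candidate patterns and look for one that flips predictions on every input.
# The model, patterns, and data are invented for illustration only.

STOP, SPEED_LIMIT = 0, 1
TRIGGER = (9.9, 9.9)

def backdoored_model(features):
    """Toy classifier: behaves normally unless the trigger is present."""
    if tuple(features[-2:]) == TRIGGER:
        return SPEED_LIMIT            # the hidden Trojan behavior
    return STOP if sum(features) < 2.0 else SPEED_LIMIT

clean_stops = [[0.1, 0.2, 0.3, 0.4] for _ in range(20)]

def flip_rate(model, inputs, pattern):
    """Fraction of inputs whose prediction flips when `pattern` is stamped on."""
    flips = 0
    for x in inputs:
        stamped = x[:-2] + list(pattern)
        if model(x) != model(stamped):
            flips += 1
    return flips / len(inputs)

for pattern in [(0.5, 0.5), (1.0, 0.0), TRIGGER]:
    print(pattern, f"flip rate = {flip_rate(backdoored_model, clean_stops, pattern):.0%}")
```

Benign patterns barely change the predictions, while the planted trigger flips every input, the kind of anomaly in a model’s internal behavior that TrojAI’s inspection techniques would aim to surface.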
Officials said they will first test the potential solution on deep neural networks that have been exposed to Trojan attacks and scale it up to other AIs as it becomes more effective.
The program will run for one base year, with an option year and the potential for additional follow-on work.
IARPA officials said they would accept feedback, comments and questions on the draft BAA until Jan. 4, followed by the issuance of a final BAA at a later date.