top of page
Search

Multi-Modal Agentic AI Cybersecurity Threats with Ali Leylani

Multi-Modal Agentic AI Cybersecurity Threats
Multi-Modal Agentic AI Cybersecurity Threats

Artificial intelligence is no longer the stuff of science fiction. It's here, and it's evolving at a breathtaking pace. Among the most exciting and potentially transformative developments in AI is the rise of "agentic AI" – intelligent systems that can not only process information but also reason, plan, and take autonomous actions in the digital and even physical worlds. These AI agents promise to revolutionize everything from personal assistants to complex industrial processes. But as with any powerful new technology, agentic AI brings with it a new set of risks, particularly in the realm of cybersecurity.

In a recent, insightful presentation at Stockholm AI, Ali Leylani of Echo Alpha delved deep into the emerging landscape of multimodal agentic AI cybersecurity threats. His talk provided a sobering yet essential overview of the vulnerabilities inherent in these sophisticated systems and the new attack vectors they present. This blog post will unpack the key themes of Leylani's presentation, offering a comprehensive look at the challenges we face in securing the next generation of AI.


Understanding the Agent: More Than Just a Language Model

Before we can grasp the threats, we must first understand what makes an AI agent different from the large language models (LLMs) like ChatGPT that many of us are familiar with. Leylani breaks down the anatomy of an agent into three core components:

  • The "Brain": This is the reasoning engine of the agent, typically a powerful LLM that allows it to understand, plan, and make decisions.

  • Memory: Agents possess both short-term and long-term memory, enabling them to learn from past interactions and maintain context over time. This is often achieved through techniques like Retrieval-Augmented Generation (RAG), which we'll explore later.

  • Tools: This is where agents truly come to life. Tools are the interfaces that allow an agent to interact with its environment. They can be anything from a simple search engine query to a complex set of APIs for controlling a smart home, a trading platform, or even a network of drones. As Leylani emphasizes, it's the tools that give agents their power, but also where many of the most significant vulnerabilities lie.

Conceptual visual for the "brain, memory, and tools" framework
Conceptual visual for the "brain, memory, and tools" framework

From Single Agents to Multi-Agent Systems

The way agents operate also represents a paradigm shift. Unlike a standalone LLM that simply responds to a prompt, an agent engages in an internal loop of reasoning and planning. It sets a goal, breaks it down into steps, and then uses its tools to execute those steps, learning and adapting as it goes. This concept is further amplified in multi-agent systems, where multiple specialized agents collaborate to achieve a complex objective. Imagine a team of AI agents for planning a vacation: a "supervisor" agent that oversees the entire process, a "research" agent that finds flights and accommodations, and a "booking" agent that makes the reservations. This is no longer a distant dream; such systems are being actively developed and deployed. Leylani also touches upon military applications, where interconnected drones share knowledge and coordinate their actions with increasing autonomy, highlighting the high-stakes nature of this technology.


Who Are the Adversaries and What Do They Want?

With this understanding of agentic AI, we can now turn our attention to the security risks. Leylani introduces the concept of "threat vectors," which encompass the attackers, their capabilities, and their goals. The adversaries can range from lone "wolf" hackers to well-funded state actors. Their objectives are equally varied:

  • Data Theft: Gaining unauthorized access to sensitive information, such as personal data, financial records, or corporate secrets.

  • Malicious Code Injection: Forcing an agent to execute harmful code, potentially leading to system compromise or the spread of malware.

  • System Shutdown: Disrupting the operation of critical AI-powered systems, causing chaos and financial damage.

The attacker's level of access to the AI model is also a critical factor. Are they interacting with a public-facing API, or have they gained internal access to the system? The answers to these questions determine the types of attacks they can carry out.


The Reasoning Brain Under Attack

The "brain" of an AI agent, its reasoning LLM, is a primary target for attackers. Leylani outlines several categories of attacks that target this core component:

  • Prompt Injections: This is a technique where an attacker embeds malicious instructions within a seemingly benign prompt. For example, an attacker could craft a prompt that tricks an agent into revealing sensitive information or executing a harmful command.

  • Jailbreaks: These are more sophisticated attacks that aim to bypass the safety and ethical guardrails built into the LLM. By using carefully constructed prompts, attackers can induce the agent to generate harmful content, reveal its own system prompts, or perform actions it's not supposed to.

  • Disturbance Signals: These are subtle manipulations of the input data that can cause the AI to misinterpret information and make poor decisions. This is particularly relevant for multimodal agents that process images and audio.

To illustrate the real-world implications of these attacks, Leylani presents a "trivial" yet devastating SQL injection attack on a banking agent. By simply asking the agent to "find the transaction for user with the last name '... UNION SELECT username, password FROM users;--'", an attacker could trick the agent into executing a malicious SQL query and exfiltrating the usernames and passwords of all users in the database. This example powerfully demonstrates how the agent's ability to interact with external tools can be turned against it.


Manipulating the Agent's Memory

As mentioned earlier, many agents rely on Retrieval-Augmented Generation (RAG) to access and incorporate external knowledge into their responses. This is a powerful technique, but it also creates a new attack surface: the agent's memory.

Leylani explains how attackers can carry out "backdoor attacks" by infusing false or malicious information into the open-source knowledge bases that RAG systems often use. If an attacker can successfully poison the data source, they can manipulate the agent into generating incorrect or biased answers, spreading misinformation, or even providing harmful advice.

The success of such an attack hinges on solving what Leylani calls the "retrieval problem" and the "selection problem." The attacker must ensure that their malicious information is not only retrieved by the RAG system but also selected by the LLM as the most relevant and trustworthy piece of information. This is a non-trivial task, but as these systems become more complex, so too do the opportunities for exploitation.


When an Agent's Strengths Become Its Weaknesses

The tools that give agents their power are also their Achilles' heel. Leylani highlights the vulnerabilities in Multi-tool Control Protocols (MCPs), which are frameworks that allow agents to use a variety of different tools. Many of these protocols are open-source, which means an attacker could potentially register their own malicious tool and trick the agent into using it.

Leylani introduces the concept of a "shadow system" attack, a particularly insidious technique. An attacker can build their own local version of a tool and "optimize" it to perform malicious actions in response to certain inputs. Due to the convergent capabilities of modern LLMs, there's a high probability that this malicious tool will also work on the target system, even if the attacker doesn't know its specific details.

To further increase the chances of their malicious tool being used, attackers can infuse "trigger words" into plausible but uncommon user instructions. These trigger words act as a secret key, guaranteeing that the agent will select the malicious tool when it encounters them.


Multimodal and Multi-Agent Threats

The security challenges are further compounded by the rise of multimodal and multi-agent systems. Multimodal agents, which can process inputs like audio and video in addition to text, have a much larger attack surface. Leylani shows examples of attacks using specially crafted images and audio to bypass an LLM's safety features, demonstrating that the threat is no longer purely text-based.

In multi-agent systems, the risk of a cascading failure is a major concern. Leylani highlights a study that showed how a single infected agent could compromise an entire ecosystem of a million agents through an "infectious jailbreak." This creates the potential for "steerable attacks," where an attacker can direct the spread of the compromise throughout the network, causing widespread disruption.


Navigating the Future of AI Security

Leylani's presentation concludes with a stark but necessary warning: AI systems, in their current state, cannot be fully trusted. The greater the autonomy and capability of an AI system, the greater the potential threat it represents. Even internal, seemingly secure agents are vulnerable to backdoor attacks.

He also cautions that the guardrails and safety measures that are currently in place are often insufficient to prevent sophisticated attacks, even in state-of-the-art models like GPT-4o. This is not a reason to abandon the development of agentic AI, but it is a call for a fundamental shift in how we approach AI security.


As we continue to build and deploy these powerful systems, we must do so with a healthy dose of skepticism and a deep understanding of the risks involved. We need to invest in research to develop more robust security measures, from better prompt sanitization and jailbreak detection to more secure tool protocols and data sources.

The era of agentic AI is upon us, and it holds incredible promise. But to realize that promise safely and responsibly, we must heed the warnings of experts like Ali Leylani and make cybersecurity a central pillar of AI development from day one. The unseen threats of tomorrow are already taking shape, and it's up to us to be prepared. For a deeper dive into these concepts and to see the attack demonstrations for yourself, you can watch Ali Leylani's full presentation from Stockholm AI here

 
 
 

Comments


bottom of page