DeepScience — Artificial Intelligence

DeepScience · Artificial Intelligence · Daily Digest

AI Agents Are Hackable — And We Now Have the Numbers

Today's digest asks: as AI agents gain real power over your files and your money, who is checking whether they can be manipulated?

            May 07, 2026
          

Three papers today, and they form a cleaner picture than I expected: AI agents are getting more capable (one open-source team just matched commercial rivals), but they are also alarmingly easy to manipulate — and one new benchmark makes that danger concrete for the first time. Let me walk you through all three, starting with the one that should probably worry you the most.

Today's stories

              01 / 03
            

AI Assistants Can Be Tricked Into Deleting Your Files or Leaking Your Passwords

An AI assistant reads a crafted email — and, following hidden instructions buried inside it, quietly transfers money out of your account.

Picture a new office assistant who is extremely competent but has one dangerous habit: they do whatever any note left on their desk tells them to do — even if that note was left by a stranger who walked in off the street. That is roughly the security situation with today's AI agents, and a team building the DTap platform has now put hard numbers on it. The researchers built over 50 simulated replicas of real services — Gmail, PayPal, Slack, and others — and then unleashed an automated attacker on AI assistants built on top of those systems. The attacker used five different tricks: hiding instructions in emails, corrupting the agent's tools, injecting fake memories, and combinations of all of the above. They tested four major AI agent frameworks using backbone models from Google, Anthropic, OpenAI, and DeepSeek. The results are uncomfortable. The most vulnerable setup — Google's ADK framework under indirect attack, meaning the malicious instruction was hidden inside content the agent was asked to process — fell for attacks 55.7% of the time. The most robust agent tested, Claude Code, still succeeded in over 25% of attack attempts. In plain terms: there is currently no AI agent framework that reliably refuses to be manipulated. The catch is important. This is a controlled simulation, not a live attack on real accounts. The researchers designed the attacks; real-world attackers would have to discover these vectors on their own. But the point of the research is to find the cracks before someone else does — and the cracks are wide. If your company is deploying AI agents with access to real tools and real data, this paper is worth reading.

Glossary

red-teaming — Deliberately attacking a system to find its weaknesses before real adversaries do — borrowed from military exercises where a 'red team' plays the enemy.

injection vector — A channel through which a malicious instruction can be sneaked into an AI agent — for example, hiding commands inside an email the agent is asked to summarise.

indirect threat model — An attack where the malicious instruction is hidden inside content the agent processes (like an email), rather than typed directly by the user.

Source: DecodingTrust-Agent Platform (DTap): A Controllable and Interactive Red-Teaming Platform for AI Agents

              02 / 03
            

An Open Recipe for AI That Searches the Web With Images — And Rivals Commercial Tools

A fully public training recipe for a web-searching AI just closed most of the gap with tools that commercial labs have spent years building in secret.

Most competitive AI systems are black boxes: you know what they can do, but the recipe for building them is proprietary. What a team working with Qwen3-VL-30B — a large model from Alibaba — has done is write that recipe down and publish it. The result is OpenSearch-VL, an AI agent that can take a question, look at images, search the web, use multiple tools, and arrive at an answer across multiple steps. Think of it like a research assistant who can not only read documents but also look at charts, zoom in on photos, and decide on their own which tool to pick up next. The numbers are notable. Compared to a strong baseline version of the same underlying model, OpenSearch-VL improves average performance across seven standard benchmarks by 13.8 points — with gains as large as 24.5 points on one benchmark called MMSearch. On several tasks, it matches or beats proprietary commercial models. The recipe has three ingredients: a carefully built training dataset assembled from Wikipedia's link structure, a toolkit of seven search and image-processing tools, and a new training trick called multi-turn fatal-aware GRPO — a way of teaching the model to recover gracefully when one step in a multi-step search goes wrong, rather than letting a single failure contaminate the whole answer. The honest catch: 'comparable to commercial models on several benchmarks' is not the same as better across the board. And benchmarks measure performance on pre-defined tasks — real-world messiness often tells a different story. Still, a fully open, documented approach this close to commercial performance is a real step toward a level playing field.

Glossary

multimodal — Able to handle multiple types of input at once — for example, text and images together, rather than just one or the other.

GRPO — Group Relative Policy Optimisation — a training technique that teaches an AI by comparing many attempts at the same task and rewarding the better ones.

benchmark — A standardised test designed to compare AI systems on the same tasks under the same conditions.

Source: OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents

              03 / 03
            

Robot Arms That Understand 3D Space and Time Get 41% Better at Real-World Tasks

Picking up a cup without knocking over the vase next to it sounds trivial — unless you are a robot arm that has never seen depth before.

Here is the thing about most robot arms today: they see the world the way a flat photograph does. They get visual information, but they have no reliable sense of how far away things are or how the scene is changing over time. Asking such a robot to reliably pick objects off a cluttered table is like asking someone to park a car using only a printout of the street — technically possible, but error-prone in ways that accumulate fast. A team building ConsisVLA-4D tackled exactly this. Their approach adds three components on top of an existing robot-control model called OpenVLA. The first filters the camera feed to keep only the visual information relevant to the current instruction — think of it as the robot learning to focus on the cup it is meant to grab, rather than being distracted by everything else on the table. The second compresses spatial geometry from multiple camera angles into a compact internal representation. The third reasons about how the scene evolves over time as the arm moves. The result: a 21.6% improvement over OpenVLA on the LIBERO simulation benchmark, and — more importantly — a 41.5% improvement on actual physical robots. The system also runs 2.3 times faster, which matters because a robot that has to think for five seconds between each move is not useful in practice. The catch is a real one. The paper does not disclose how many real-world trials the 41.5% figure is based on, and the comparison is against one baseline (OpenVLA). Independent replication on different robot hardware and messier environments would be the next meaningful test.

Glossary

spatiotemporal — Relating to both space and time together — in robotics, understanding not just where things are but how they move.

visual token compression — Reducing the amount of visual information an AI processes by discarding pixels that are irrelevant to the current task, making computation faster.

LIBERO benchmark — A standard simulation environment for testing robot manipulation — a controlled virtual kitchen-like setting where robot arms are asked to move and place objects.

Source: ConsisVLA-4D: Advancing Spatiotemporal Consistency in Efficient 3D-Perception and 4D-Reasoning for Robotic Manipulation

The bigger picture

Look at these three stories together and a single tension comes into focus. AI agents are gaining genuine capability fast — open-source teams are now matching commercial rivals at web search, and robot arms are learning to navigate the physical world with meaningfully less stumbling. But the DTap findings make clear that capability and security are not growing at the same rate. The more an AI agent can do — search the web, send emails, move a robot arm, execute a payment — the more damage a single manipulated instruction can cause. We are in a period where deployment is outrunning protection. The robotics paper is a reason for cautious optimism; the security paper is a reason to slow down and ask hard questions before handing AI agents the keys to anything that matters. Both things are true at once.

What to watch next

Keep an eye on whether any of the four agent frameworks tested in the DTap paper — Google ADK, Claude Code, OpenAI Agents SDK, and OpenClaw — respond publicly with patches or architectural changes; that conversation will tell us a lot about how seriously the industry takes agentic security right now. On the capability side, the ARC-AGI-3 benchmark (where one paper in today's pool also appeared) is becoming an early stress test for general-purpose AI reasoning — worth watching for new entrants over the coming weeks as the public leaderboard fills in.