DeepScience — Artificial Intelligence

DeepScience · Artificial Intelligence · Daily Digest

AI sees too much, forgets on demand, needs traffic rules

Today's AI research is mostly frameworks and explainers, but one real experiment shows why teaching machines to see the way humans do actually matters.

June 04, 2026

Honest warning: today is a thin day for empirical AI research. Most of what landed is conceptual frameworks and dataset records — proposals, not findings. But three things are worth your time: an actual experiment on how AI 'sees' images too densely, a framework for preventing AI agents from stepping on each other, and a vivid explainer of one of AI's trickiest problems — making a model forget something on purpose. Let's dig in.

Today's stories

01 / 03

AI vision systems see too much — here's how to fix that

When an AI looks at a photo of your living room, it might 'see' 800 relationships between objects — and that density is actually the problem.

Picture getting handed a street map that labels every crack in the pavement, every pebble on the sidewalk, every shadow cast by a lamppost. Technically accurate. Completely useless for navigation. That is roughly what happens when current AI vision systems try to describe an image. These systems build what researchers call a scene graph: a map of every object in an image and how the objects relate to each other. A team publishing in Multimedia Tools and Applications found that existing models generate overdense graphs: they mark the couch, every cushion on the couch, the wall behind the couch, and a dozen spatial relationships connecting all of them. The result is a map so cluttered that it degrades downstream tasks, like a robot trying to navigate a room or an image search trying to find a relevant photo.

The fix the team tested: instead of training AI to spot every possible object, they incorporated human-written captions during training. If a person described a photo as 'dog sitting on couch,' the model learned to weight that dog and that couch more heavily than the floorboards underneath them. Crucially, they also introduced a new way to measure success: graph edit distance, which counts how many additions or deletions it would take to turn the AI's graph into a human-drawn one. They found that this metric catches improvements that the field's standard scoring methods miss entirely.

Why it matters: scene graphs feed into robotics, visual question-answering, and image retrieval. A leaner, more human-like graph is a more useful one.

The catch: the experiments used Visual Genome, a well-worn benchmark dataset, and tested older model architectures. Whether this approach works on modern large vision-language models remains untested. Small but real progress.
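To make the graph edit distance idea concrete, here is a minimal Python sketch. It is not the paper's implementation. It assumes each object label appears only once per image, so objects can be matched by name, and it counts only additions and deletions (relabeling an object or relationship is not treated as a single edit). The function names and the toy living-room graphs are made up for illustration.

```python
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, relationship, object)


def nodes_of(triples: Set[Triple]) -> Set[str]:
    """Collect every object label that appears in a set of triples."""
    return {label for subj, _, obj in triples for label in (subj, obj)}


def simple_edit_distance(predicted: Iterable[Triple], reference: Iterable[Triple]) -> int:
    """Count the additions and deletions separating two scene graphs.

    Assumption: labels are unique, so nodes match by name. The distance is
    then the number of unmatched objects plus the number of unmatched
    relationships (symmetric difference of each set).
    """
    pred_edges, ref_edges = set(predicted), set(reference)
    node_edits = len(nodes_of(pred_edges) ^ nodes_of(ref_edges))
    edge_edits = len(pred_edges ^ ref_edges)
    return node_edits + edge_edits


# Hypothetical graphs for one photo: a human-drawn reference and two model outputs.
human = [("dog", "sitting on", "couch")]
dense = [
    ("dog", "sitting on", "couch"),
    ("cushion", "on", "couch"),
    ("couch", "in front of", "wall"),
    ("dog", "above", "floor"),
]
lean = [("dog", "sitting on", "couch"), ("couch", "in front of", "wall")]

print(simple_edit_distance(dense, human))  # 6: three extra objects, three extra relationships
print(simple_edit_distance(lean, human))   # 2: one extra object, one extra relationship
```

The dense graph contains the correct 'dog sitting on couch' relationship, yet it sits three times further from the human graph than the lean one. That is the kind of clutter penalty an edit-distance score can register.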

Glossary

scene graph — A structured map of the objects in an image and the relationships between them — for example, 'dog → sitting on → couch.'

graph edit distance — A measure of how different two graphs are, counting the minimum number of additions, deletions, or swaps needed to turn one into the other.

Source: Human-like scene graph generation and evaluation

02 / 03

Multiple AI agents need traffic rules to stop colliding

What happens when five AI agents all try to write to the same file at the same moment — with no referee in sight?

Picture a busy restaurant kitchen during dinner service. If five cooks all reach for the same pot at once, food gets ruined, orders get confused, and nobody can trace who made the mess. A conceptual paper deposited on Zenodo argues that companies running multiple AI agents simultaneously face exactly this problem and currently have no standard way to solve it.

The authors propose what they call Agentic Traffic Control (ATC): a five-layer framework for keeping AI agents from crashing into each other. The layers cover coordination signals (who moves first), task separation (which agent touches which data), junction protocols (how agents share decision points), priority routing (which tasks are urgent), and audit attribution (who did what, when). They also draw a useful distinction between two architectural styles. A 'traffic light' model uses one central AI to orchestrate all the others: predictable and easy to audit, but slower and brittle if the orchestrator fails. A 'roundabout' model lets agents negotiate locally without a central authority: faster and more resilient, but harder to trace when something goes wrong.

Why it matters: as companies deploy more AI agents for customer service, scheduling, coding, and data analysis, the question of how to coordinate them without chaos is becoming urgent. Right now, most deployments improvise.

The catch: this paper has no experiments, no data, and no benchmarks. Zero. It is a naming exercise for problems that practitioners already feel but struggle to describe. Think of it as a shared vocabulary proposal, not a proof. Useful as a starting point, nothing more.
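For a feel of what a 'traffic light' setup might look like in practice, here is a toy Python sketch. The paper ships no code, so everything here is an assumption made for illustration: the class name, the agent names, and the rule that a lower priority number means more urgent. It shows three of the five layers in miniature: a central queue decides who moves first, priorities route urgent work ahead, and an audit log records who wrote what.

```python
import heapq
from typing import Dict, List, Tuple


class TrafficLightOrchestrator:
    """Hypothetical central coordinator: agents never write directly.

    They submit requests, and the orchestrator applies them one at a time,
    so two agents can never write to the same file at the same moment.
    """

    def __init__(self) -> None:
        self._queue: List[Tuple[int, int, str, str]] = []
        self._arrival = 0  # tie-breaker: first come, first served within a priority
        self.audit_log: List[Tuple[str, str]] = []

    def request_write(self, agent: str, resource: str, priority: int) -> None:
        # Assumed convention: a lower number means a more urgent task.
        heapq.heappush(self._queue, (priority, self._arrival, agent, resource))
        self._arrival += 1

    def run(self, files: Dict[str, List[str]]) -> None:
        # Serve requests in priority order, one writer at a time.
        while self._queue:
            _, _, agent, resource = heapq.heappop(self._queue)
            files.setdefault(resource, []).append(agent)
            self.audit_log.append((agent, resource))


orchestrator = TrafficLightOrchestrator()
orchestrator.request_write("scheduling-agent", "report.txt", priority=2)
orchestrator.request_write("support-agent", "report.txt", priority=1)
orchestrator.request_write("coding-agent", "report.txt", priority=2)

files: Dict[str, List[str]] = {}
orchestrator.run(files)
print(orchestrator.audit_log)
```

The trade-off the authors describe is visible even at this scale: the audit trail comes for free, but every write waits on a single queue, and if the orchestrator stops, nothing moves. A 'roundabout' version would drop the central queue and have agents settle access among themselves, which is harder to sketch and harder to audit.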

Glossary

AI agent — A software program that can take actions autonomously — browsing the web, writing code, calling APIs — to complete a goal, without a human steering each step.

orchestration — The coordination layer that decides which agent does what, in what order, and how they share information without interfering with each other.

Source: Agentic Traffic Control: Orchestrating AI Agents Across Enterprise Systems

03 / 03

How do you teach an AI to genuinely forget Harry Potter?

Not refusing to talk about Harry Potter — actually forgetting him: that turns out to be one of the hardest problems in AI safety.

Imagine trying to teach someone to forget a specific song: not just declining to hum it, but genuinely not having it in their head anymore. That is roughly the problem of machine unlearning. A new interactive notebook published on Zenodo walks through the 2023 work by Ronen Eldan and Mark Russinovich, researchers at Microsoft, on making large language models selectively erase information, in their case the Harry Potter books.

Their approach works by identifying the web of connections inside the model that encode Harry Potter-specific knowledge: who Harry is, what Hogwarts means, what a Horcrux does. Then it surgically weakens those connections while leaving the surrounding language structure intact. The notebook visualises this as 'edge surgery' on a knowledge graph, a web of linked facts. You are not deleting nodes. You are cutting specific wires between them, so the path to that knowledge becomes unreachable.

Why it matters: the ability to remove specific knowledge from a trained model is critical for privacy and copyright. If a model was trained on private medical records, or on copyrighted books without permission, you want a way to excise that information after the fact. Retraining the whole model from scratch is brutally expensive. Approximate unlearning is a cheaper alternative.

The catch: 'approximate' is doing a lot of work in that phrase. The technique makes knowledge much harder to reach, but it does not guarantee the knowledge is gone. Careful probing can sometimes recover it. This remains an open research problem. And to be clear: this Zenodo deposit is an explainer of Eldan and Russinovich's existing work, not new findings. Worth reading for the clarity of the concept, not for new science.
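The 'edge surgery' picture can be played with in a few lines of Python. To be clear, this is a toy model of the metaphor only: it works on a hand-written graph of made-up links, not on a language model, and it is not the notebook's code or Eldan and Russinovich's procedure. The graph, the function names, and the choice of which edges to cut are all assumptions for illustration.

```python
from collections import deque
from typing import Dict, Iterable, Set, Tuple

Graph = Dict[str, Set[str]]  # node -> nodes it links to


def reachable(graph: Graph, start: str) -> Set[str]:
    """Breadth-first search: every node you can get to from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, set()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def cut_edges(graph: Graph, edges: Iterable[Tuple[str, str]]) -> Graph:
    """Return a copy with the given links removed. No node is deleted."""
    pruned = {node: set(targets) for node, targets in graph.items()}
    for source, target in edges:
        pruned.get(source, set()).discard(target)
    return pruned


# Hypothetical toy graph mixing general concepts with book-specific ones.
knowledge: Graph = {
    "boy": {"Harry Potter", "student"},
    "school": {"Hogwarts", "student"},
    "wizard": {"Harry Potter"},
    "Harry Potter": {"Hogwarts", "Horcrux"},
    "Hogwarts": set(),
    "Horcrux": set(),
    "student": set(),
}

pruned = cut_edges(knowledge, [("boy", "Harry Potter"), ("school", "Hogwarts")])

print("Horcrux" in reachable(knowledge, "boy"))   # True: the path exists before surgery
print("Horcrux" in reachable(pruned, "boy"))      # False: the path from 'boy' is cut
print("Harry Potter" in pruned)                   # True: the node itself was never deleted
print("Horcrux" in reachable(pruned, "wizard"))   # True: an uncut route still leads there
```

The last line is the catch in miniature: the surgery missed the link from 'wizard', so a probe that starts there still finds the supposedly forgotten material.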

Glossary

machine unlearning — Techniques for removing specific information from a trained AI model without retraining it from scratch.

knowledge graph — A network of facts stored as linked nodes — 'Harry Potter → attends → Hogwarts' — where the connections between facts are as important as the facts themselves.

approximate unlearning — A class of methods that make specific knowledge very difficult to access in a model, without guaranteeing it has been fully erased.

Source: Who's Harry Potter? An interactive walk through approximate unlearning in LLMs

The bigger picture

Three very different problems, one underlying tension: AI systems are getting better at doing things, and worse at being understood or controlled while doing them. The scene graph paper is about AI vision being too indiscriminate — seeing everything, understanding nothing usefully. The agentic traffic control framework is about AI coordination being too ad hoc — agents acting without governance. The unlearning explainer is about AI memory being too persistent — models that absorb information with no reliable way to release it. These are not separate bugs. They are the same issue at different scales. As AI systems get deployed more widely, the gap between 'technically capable' and 'practically governable' becomes the defining problem. Today's papers do not close that gap. But they name parts of it clearly, which is where solutions start. Naming is not nothing.

What to watch next

The machine unlearning question will get louder: the EU AI Act and several ongoing copyright lawsuits will eventually force courts and regulators to decide whether approximate unlearning counts as a legally adequate remedy. Watch for the first binding ruling on that. On the scene graph side, it would be interesting to see whether caption-guided training transfers to modern vision-language models like those underpinning GPT-4o or Gemini — that test has not happened yet, and it would tell us a lot.