DeepScience — Artificial Intelligence

DeepScience · Artificial Intelligence · Daily Digest

AI Gets Fooled, Hallucinates, and Can't Self-Correct Yet

Today's papers all ask the same uncomfortable question: what happens when an AI is confidently, quietly wrong?

June 12, 2026

Three papers landed today that, taken together, feel like a stress test of AI reliability — not capability. We have AI shopping assistants tricked by a single planted web page, AI radiology tools inventing anatomy, and a vision model that makes things worse when you ask it to check its own work. None of these are panic stories. All of them are worth understanding.

Today's stories

01 / 03

One Fake Web Page Can Manipulate Your AI Shopping Recommendation

You ask an AI to recommend a blender — and it confidently suggests one that doesn't exist, because someone rewrote a single web page.

A research team built a benchmark called FORGE to test something very specific: can someone manipulate an AI shopping assistant just by polluting the web pages it reads? The setup mirrors how these systems actually work. You ask a question, the AI searches the web, reads the top few results, and synthesises a recommendation. FORGE tested 12 AI models, including commercial ones from Google and OpenAI, by quietly rewriting real product pages to push fake products into the AI's line of sight.

The results are sobering. A single polluted page sitting at the top of the search results was enough to fool some models up to 27% of the time. When all three top results were replaced with poisoned pages, the fooled rate climbed as high as 73.8%. Think of it like a restaurant where a competitor bribes the one review that shows up first on Google Maps, except the AI has no fraud team working in the background and reads that review with complete confidence.

Why does it matter? AI-powered recommendation and shopping assistants are already deployed at scale. If a bad actor can cheaply game web content to steer AI outputs, and this paper shows they can, then the trust baked into these systems has a genuine structural gap.

The catch: this was a controlled lab study. The researchers rewrote pages locally; they didn't actually poison the real web, and real-world attack costs weren't measured. The team did test three defences, but here is the painful part: the best one (cross-document consensus filtering) also suppressed 52–79% of legitimate product recommendations. The cure, so far, is nearly as bad as the disease.
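The write-up doesn't describe how FORGE's consensus filter actually works, so the Python sketch below is only a guess at the general idea, not the paper's defence. It assumes each retrieved page has already been reduced to a list of product names; `consensus_filter`, `min_support`, `fooled_rate`, and the product names are all hypothetical. The rule keeps a product only if enough independent pages mention it, and the toy example shows both sides of the trade-off described above.

```python
from collections import Counter


def fooled_rate(recommendations: list[str], planted: set[str]) -> float:
    """Share of queries whose final recommendation is one of the planted fake products."""
    if not recommendations:
        return 0.0
    fooled = sum(rec in planted for rec in recommendations)
    return fooled / len(recommendations)


def consensus_filter(pages: list[list[str]], min_support: int = 2) -> set[str]:
    """Hypothetical defence: keep only products mentioned on at least `min_support` distinct pages."""
    support: Counter = Counter()
    for products in pages:
        support.update(set(products))  # count each product at most once per page
    return {product for product, n in support.items() if n >= min_support}


# Toy retrieval results for one query: one poisoned page, two genuine ones.
pages = [
    ["fake-blender"],                      # poisoned page at the top of the results
    ["real-blender-a", "real-blender-b"],
    ["real-blender-a"],
]
print(consensus_filter(pages))  # {'real-blender-a'}: the fake is gone, but so is real-blender-b

# If all three top pages are poisoned, the fake has the most support of all.
all_poisoned = [["fake-blender"], ["fake-blender", "real-blender-a"], ["fake-blender"]]
print(consensus_filter(all_poisoned))  # {'fake-blender'}

# One fooled query out of two.
print(fooled_rate(["fake-blender", "real-blender-a"], {"fake-blender"}))  # 0.5
```

With the default `min_support=2`, a fake planted on a single page is dropped, but so is a genuine product that only one page happens to mention; and once every top page is poisoned, the fake clears the bar easily. That is one plausible way a consensus rule could end up suppressing legitimate recommendations while still failing against a wider attack.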

Glossary

generative recommender — An AI system that searches the web or a database and then writes a personalised product or content recommendation in natural language.

fooled rate — The percentage of times an AI recommended the fake product planted by the researchers, rather than a real one.

Source: One Polluted Page Is Enough: Evaluating Web Content Pollution in Generative Recommenders

02 / 03

Medical AI Invents Tumours and Measurements That Aren't There

An AI radiologist that points confidently to a fracture on the wrong side, or a tumour that simply isn't in the scan.

This paper is a structured review: the authors synthesised dozens of existing studies across five imaging types (CT, MRI, PET, ultrasound, and digital pathology) to map out how and where AI medical imaging tools hallucinate. Hallucination here means clinically plausible but factually wrong output: a fabricated anatomical structure, a missed finding, a measurement that was never in the image, or flipped laterality (left versus right, which in surgery is not a small error).

The most counterintuitive finding: general-purpose foundation models, the kind trained on everything from Wikipedia to medical textbooks, achieved a median hallucination-free rate of 76.6%, compared with just 51.3% for models specifically fine-tuned on medical data. In plain terms, the specialist was less reliable than the generalist. The authors attribute this to overfitting: a model trained narrowly on radiology reports pattern-matches so hard to its training set that it reports findings it expects to see, even when they're absent. It is like a junior doctor who has memorised so many textbook cases that they diagnose by template rather than by actually looking at the patient in front of them. The team also found that chain-of-thought prompting (asking the AI to reason step by step before answering) reduced hallucinations by up to 86.4% in the settings tested.

The catch: this is a narrative review, not a randomised experiment. The p-value of 0.012 cited for the 76.6% versus 51.3% comparison is an unusual thing to derive from a narrative synthesis, and the authors themselves acknowledge that their search strategy was not PRISMA-compliant. Treat the direction of the finding as suggestive, not settled.
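To make the headline numbers concrete, here is a small Python sketch of what a "median hallucination-free rate" could mean. It assumes each study scores every model output as clean or hallucinated and that the review then takes the median of the per-study rates; the flag format, the function names, and the toy data are hypothetical, since the review's exact scoring rules aren't described here.

```python
from statistics import median


def hallucination_free_rate(flags: list[bool]) -> float:
    """flags[i] is True if reviewers found any hallucination in output i."""
    return sum(not flagged for flagged in flags) / len(flags)


def median_rate(per_study_flags: list[list[bool]]) -> float:
    """Median of the per-study hallucination-free rates (every study weighted equally)."""
    return median(hallucination_free_rate(flags) for flags in per_study_flags)


# Illustrative toy data only, not taken from the review.
study_a = [False, False, True, False]  # 3 of 4 outputs clean -> 0.75
study_b = [False, True]                # 1 of 2 clean -> 0.5
study_c = [False] * 9 + [True]         # 9 of 10 clean -> 0.9
print(median_rate([study_a, study_b, study_c]))  # 0.75
```

Note that the 2-output study counts exactly as much as the 10-output one: pooling all 16 toy outputs would give 13/16 = 0.8125 instead of 0.75. A median across heterogeneous studies is a summary rather than a single controlled experiment, which is one reason to read a p-value attached to it with care.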

Glossary

hallucination (in AI) — When an AI generates output that sounds plausible and confident but is factually wrong or entirely made up.

fine-tuning — Taking a general AI model and continuing to train it on a narrow, specialised dataset — in this case, medical images and reports.

chain-of-thought prompting — Asking an AI to show its reasoning step-by-step before giving a final answer, which can catch errors mid-thought.

PRISMA — A checklist researchers use to make systematic literature reviews rigorous and reproducible.

Source: Hallucination in Medical Imaging AI: A Cross-Modality Analytical Framework for Taxonomy, Detection, and Mitigation under Regulatory Constraints

03 / 03

Ask an AI to Fix Its Own Answer and It Gets Dramatically Worse

You ask a vision AI to look at its own answer, check whether the bounding box is right, and try again. Its accuracy drops from 79.6% to 48.7%.

Here is a scenario you might assume AI handles fine: an AI draws a box around an object in a photo. You show it the result and ask, 'Is that right? Can you adjust it?' It turns out that, without specific training for this loop, the result is a spectacular collapse. Researchers working with Qwen3-VL, a capable vision-language model, tested exactly this. The model starts at 79.6% accuracy at drawing correct bounding boxes around objects (a task called spatial grounding: finding where in an image something actually is). When the team naively fed the model its own output as visual feedback and asked it to iterate, accuracy crashed to 48.7%, a drop of about 31 percentage points. The model wasn't getting better; it was abandoning good first answers. Think of it like asking a student to proofread their own essay without any guidance on what to look for. Instead of catching errors, they start second-guessing correct sentences and rewriting them worse.

The fix was elegant and cheap. The team generated 2,400 training examples of corrective reasoning traces (essentially a catalogue of 'here is a mistake, here is how to think about fixing it') and fine-tuned the model on a single GPU. After this targeted training, the same iterative loop raised accuracy to 82.0%, 2.4 points above the original single-shot result. The catch: this was tested on 505 samples using one relatively small model architecture. Whether it scales to larger models, other kinds of visual tasks, or longer correction chains is genuinely unknown.
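For readers who want to see the shape of the loop, here is a minimal Python sketch. It assumes accuracy is scored as the share of predicted boxes whose intersection-over-union (IoU) with the ground-truth box is at least 0.5, a common convention for grounding benchmarks that the write-up doesn't confirm for this paper. The `model` callable, its `done` flag, `max_rounds`, and `toy_model` are hypothetical stand-ins for Qwen3-VL; a real system would presumably render the previous box onto the image as visual feedback, whereas this sketch just passes the coordinates back.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Box:
    """Axis-aligned box from top-left (x1, y1) to bottom-right (x2, y2), in pixels."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union: 1.0 for identical boxes, 0.0 for disjoint ones."""
    inter = Box(max(a.x1, b.x1), max(a.y1, b.y1), min(a.x2, b.x2), min(a.y2, b.y2)).area()
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def grounding_accuracy(preds: list[Box], targets: list[Box], threshold: float = 0.5) -> float:
    """Share of predictions that hit their target at IoU >= threshold (assumed metric)."""
    hits = sum(iou(p, t) >= threshold for p, t in zip(preds, targets))
    return hits / len(targets)


# Hypothetical model interface: (image, query, previous box or None) -> (box, done)
Model = Callable[[object, str, Optional[Box]], tuple[Box, bool]]


def iterative_grounding(model: Model, image: object, query: str, max_rounds: int = 3) -> Box:
    """Get a first box, then feed each answer back until the model says done or rounds run out."""
    box, done = model(image, query, None)  # round 1: ordinary single-shot grounding
    for _ in range(max_rounds - 1):
        if done:
            break
        box, done = model(image, query, box)  # later rounds: the model sees its own previous box
    return box


# Toy stand-in for the model: overshoots once, then corrects itself.
def toy_model(image, query, previous):
    if previous is None:
        return Box(0, 0, 4, 4), False
    return Box(0, 0, 2, 2), True


target = Box(0, 0, 2, 2)
first = toy_model(None, "the red cup", None)[0]
final = iterative_grounding(toy_model, None, "the red cup")
print(iou(first, target), iou(final, target))  # 0.25 1.0
```

The loop itself is trivial. What the results above suggest is that everything depends on what the model does with the feedback: an untrained model can move a good box to a worse place just as easily as this toy model moves a bad box to a good one.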

Glossary

spatial grounding — The task of identifying exactly where in an image a described object is located, usually by drawing a bounding box around it.

bounding box — A rectangle drawn around an object in an image to mark its location — the basic output of many visual AI detection tasks.

fine-tuning — Training an existing AI model further on a small, targeted dataset to teach it a specific new skill.

LoRA — A technique for fine-tuning large AI models efficiently by updating only a small fraction of their parameters, saving memory and compute.

Source: Iterative Visual Thinking: Teaching Vision-Language Models Spatial Self-Correction through Visual Feedback

The bigger picture

Look at what these three papers are actually circling. A shopping AI gets steered by a single planted page. A medical imaging AI invents findings it was trained to expect. A vision AI collapses the moment you ask it to check itself. The common thread is not a lack of raw capability; these systems are genuinely impressive at their primary tasks. The gap is metacognition: knowing when you're wrong, resisting plausible-but-false inputs, and correcting gracefully under feedback. This is the frontier right now, and it matters more than the benchmark races. A model can score 90% on a standard test and still be unreliable in deployment, because deployment involves adversarial web pages, edge-case anatomy, and iterative human workflows. The IVT (Iterative Visual Thinking) paper gives me some optimism: self-correction is a learnable skill, and it turns out you don't need millions of examples to teach it. But the medical hallucination review and the web-pollution study are reminders that the trust infrastructure these systems need simply does not exist yet.

What to watch next

The finding that general-purpose models outperform medical-specialist models on hallucination benchmarks is counterintuitive enough that it needs replication in a proper prospective study — watch for follow-up work from radiology AI groups over the next few months. On the web-pollution front, the immediate question is whether search providers can build detection upstream so the poisoned pages never reach the AI; that's a platform-level problem, not a model-level one, and no one has a clean answer yet. The self-correction story from IVT is the one I'd most want to see tested at scale — does the collapse-and-fix pattern hold for larger models and longer correction chains?