DeepScience — Artificial Intelligence

DeepScience · Artificial Intelligence · Daily Digest

Stop AI Lies, Read Emotions Better, and Protect Your Brain

Today's AI research is less about raw power and more about reliability, nuance, and what constant AI use does to the human doing the prompting.

July 03, 2026

To be honest about today's batch of 43 papers: most are datasets, white papers, or very early-stage work. I pulled three stories that actually have something to say to a non-specialist: one on keeping AI from making things up, one on teaching machines to read between emotional lines, and one that asks an uncomfortable question about your brain. Let's go.

Today's stories

01 / 03

Giving AI a Map Before It Answers Cuts Down the Lies

When you ask an AI about your company's data, it often answers confidently — and completely makes things up.

Imagine asking a new employee to answer customer questions when they have never seen the filing cabinet. They might sound plausible, but they are guessing. That is roughly what happens when you point a language model at a company's internal data without any structure to anchor it. The AI fabricates, inventing plausible-sounding facts because it has no map of what is actually there.

The SCAIR paper attacks this with a two-part fix. First, the authors give the model the schema before it starts: a formal description of every category, relationship, and allowed question in the database. Think of it as handing the new employee the full index of the filing cabinet. Second, they run the AI through multiple rounds of checking, each time pulling it back toward what the schema says is possible. The system is agentic, meaning it acts autonomously across several steps rather than producing a one-shot answer. The result, on enterprise knowledge graph benchmarks, is fewer fabrications and better factual grounding. The schema acts like a strict recipe: you can only cook with what is listed.

The catch is real. These tests used clean, well-structured benchmarks. Real company databases are messy: schemas are often incomplete, inconsistent, or years out of date. And this paper has zero independent replications so far; it landed this week. Think of it as a promising proof of concept, not a deployed solution. The direction is sensible. Whether it survives contact with a real company's chaotic data infrastructure is genuinely unknown.
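For readers who think in code, here is a toy sketch of the general idea: hand over a schema first, then check every answer against it and retry when something falls outside. This is not the SCAIR authors' implementation. The schema contents, the `toy_model` stand-in, and every function name below are assumptions made up for illustration.

```python
from typing import Callable, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject type, relation, object type)

# Hypothetical schema: the only relationships this pretend database allows.
SCHEMA = {
    ("Customer", "placed", "Order"),
    ("Order", "contains", "Product"),
}


def violations(claims: List[Triple]) -> List[Triple]:
    """Return the claims that the schema does not permit."""
    return [c for c in claims if c not in SCHEMA]


def answer_with_checks(
    propose: Callable[[List[Triple]], List[Triple]],
    max_rounds: int = 3,
) -> Optional[List[Triple]]:
    """Ask for an answer, reject anything off-schema, and retry with feedback."""
    rejected: List[Triple] = []
    for _ in range(max_rounds):
        claims = propose(rejected)
        bad = violations(claims)
        if not bad:
            return claims  # every claim matches something in the schema
        rejected.extend(bad)  # tell the next round what was ruled out
    return None  # give up rather than return an ungrounded answer


def toy_model(rejected: List[Triple]) -> List[Triple]:
    """Stand-in for a language model: its first guess includes an invented relation."""
    guess = [("Customer", "placed", "Order"), ("Customer", "owns", "Warehouse")]
    return [c for c in guess if c not in rejected]


result = answer_with_checks(toy_model)
print(result)
```

In this toy run the invented "owns Warehouse" claim is rejected in the first round and dropped in the second. Note that when the rounds run out the loop returns nothing at all, a design choice that prefers no answer to a fabricated one.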

Glossary

schema — A formal map of what a database contains — its categories, relationships, and rules — like a table of contents for structured data.

agentic — Describes an AI that takes multiple autonomous steps to complete a task, rather than generating a single response and stopping.

knowledge graph — A structured database where facts are stored as connected relationships, not just rows and columns — useful for representing how things relate to each other.

Source: SCAIR: schema-conditioned agentic iterative reasoning for enterprise knowledge graphs

02 / 03

AI Learns That 'Fine' Does Not Always Mean Fine

You know when someone says 'I'm fine' and every signal in the room tells you they are not?

Most AI systems that read human emotion do something crude: they sort text into positive, negative, or neutral. That is like judging a piece of music only by whether it is in a major or minor key. You miss the tempo, the volume, the intensity: everything that makes it feel the way it feels.

A research team built a system called VADE that tries to do better. Instead of a single positive-or-negative label, it reads emotion along three continuous dials: Valence, meaning how pleasant or unpleasant; Arousal, meaning how energised or flat; and Dominance, meaning how in-control or overwhelmed. Think of these as tuning knobs rather than light switches. VADE also pulls in images alongside text, because a tweet's emotional meaning lives as much in the photo as in the words. The team tested this on two public Twitter datasets, Twitter-15 and Twitter-17, specifically on what is called aspect-based sentiment: not 'how does this post feel overall' but 'how does this post feel about this particular product or person'. VADE beat the previous best results on both.

The catch is worth stating clearly. Those Twitter datasets are from 2015 and 2017. Social language changes fast, and a model trained on how people expressed frustration roughly a decade ago may miss how it sounds today. More importantly, this is a lab result. There is no product, no deployment, no real-world test. The genuine step forward here is the framing: treating emotion as a dial rather than a switch. Whether that framing holds when the data is live and messy is still open.
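To make the dial-versus-switch contrast concrete, here is a small sketch of two emotional states that a three-way label cannot tell apart but three dials can. It is an illustration of the Valence-Arousal-Dominance idea only, not VADE's model. The `Affect` class, the -1 to +1 scale, the threshold, and the scores are all assumptions invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Affect:
    valence: float    # unpleasant (-1) to pleasant (+1)
    arousal: float    # flat (-1) to energised (+1)
    dominance: float  # overwhelmed (-1) to in control (+1)


def polarity(a: Affect, threshold: float = 0.1) -> str:
    """Collapse the three dials into the crude three-way label."""
    if a.valence > threshold:
        return "positive"
    if a.valence < -threshold:
        return "negative"
    return "neutral"


def distance(a: Affect, b: Affect) -> float:
    """Straight-line distance between two emotional states across all three dials."""
    return (
        (a.valence - b.valence) ** 2
        + (a.arousal - b.arousal) ** 2
        + (a.dominance - b.dominance) ** 2
    ) ** 0.5


# Made-up scores, for illustration only.
furious = Affect(valence=-0.7, arousal=0.8, dominance=0.4)
defeated = Affect(valence=-0.7, arousal=-0.6, dominance=-0.7)

print(polarity(furious), polarity(defeated))   # the switch gives both the same label
print(round(distance(furious, defeated), 2))   # the dials show how far apart they are
```

Both states come out as "negative" under the switch, yet they sit far apart once arousal and dominance are counted. That gap is the information a positive-negative-neutral classifier throws away.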

Glossary

Valence-Arousal-Dominance (VAD) — A three-axis model from psychology that describes any emotional state by how pleasant it feels, how energising it feels, and how much in control the person feels.

aspect-based sentiment — Detecting emotion not about a whole piece of text but about a specific thing mentioned in it — a product, a person, or an event.

multimodal — Using more than one type of input together (here, both text and images) rather than relying on a single type.

Source: Beyond Polarity: Continuous Affect-Enhanced Multimodal Aspect-Based Sentiment Classification

03 / 03

Is Leaning on AI Every Day Quietly Weakening Your Own Thinking?

If you outsource your thinking to a tool every day, what happens to your ability to think when the tool is gone?

A paper published this week in Frontiers of Education and Practice Research raises a question most AI coverage avoids: what does constant AI use do to the human doing the prompting? The authors draw on established cognitive psychology to make their case. Expertise, they argue, is not about having facts on hand; it is about building durable mental structures through effort, retrieval, and reflection over time. You cannot shortcut that by asking a chatbot. When you offload reasoning to generative AI without ever working through the problem yourself, you may accumulate what they call cognitive debt: your mental capacity for independent judgment weakens from disuse.

The analogy they lean toward is navigation. Studies have found that people who rely entirely on GPS rather than building their own mental maps get measurably worse at finding their way without it. The researchers suggest something similar may be happening when professionals habitually prompt AI for decisions they used to make themselves.

Here is the honest catch: this is a conceptual argument built on cognitive psychology literature, not a controlled experiment. No one has measured a professional's reasoning ability before and after a year of heavy AI use. The cognitive science foundations are solid. The specific claim, that AI offloading causes a meaningful decline in expert judgment, has not been empirically tested yet. This is a well-grounded worry, not a proven effect. Worth taking seriously, not worth panicking about. But it is exactly the kind of question that should be studied urgently, and almost nobody is funding it.

Glossary

cognitive offloading — Using an external tool — a notebook, a GPS, an AI — to handle a mental task your brain would otherwise have to do itself.

cognitive debt — The authors' term for the gradual weakening of independent mental capacity that may result from consistently outsourcing thinking to AI.

Source: Beyond Prompting: Biological Memory, Cognitive Offloading, and Human Expertise in the Age of GenAI

The bigger picture

Today's three stories are quietly about the same underlying problem: the gap between what AI produces and what is actually true, useful, or good for the human receiving it. SCAIR attacks that gap from the engineering side: constrain the model with a schema so it is less likely to wander into fabrication. VADE attacks it from the perception side: give the model richer emotional signals so its understanding of human feeling is less blunt. And the cognitive debt paper attacks it from our side, asking what happens to the person in the loop when the loop does all the work. What connects them is this: raw capability is not the bottleneck anymore. The bottlenecks are reliability, emotional nuance, and what the human-AI partnership actually does to the human over time. That third question, the one facing us, is the least studied and probably the one that matters most. We are only just beginning to ask it with any seriousness.

What to watch next

The cognitive offloading question will only get more pressing as AI assistants become embedded in professional workflows — watch for empirical studies from cognitive science labs, particularly around medical and legal professionals who rely on AI for high-stakes reasoning. On the technical side, the schema-grounding approach in SCAIR will be worth tracking if independent teams test it on messier, real-world enterprise data rather than clean benchmarks. The honest next question for VADE is whether continuous affect signals hold up on post-2020 social data.