

DeepScience · Mental Health · Daily Digest

Using AI as a Friend Has Real Mental Health Costs

This week's research asks whether AI is helping or hurting us — and the answer turns entirely on how you use it.
April 27, 2026
Today I have three stories for you, and they fit together better than I expected when I started reading this morning. Two of them are about AI and mental health — one zooms into your brain, one zooms out to the population level — and the third is a large study of children that quietly delivers one of the most actionable numbers I've seen in a while. Let's dig in.
Today's stories
01 / 03

Using AI as a therapist, not a tool, looks bad for your brain

What if using AI like a calculator makes your brain stronger, while using it like a best friend quietly hollows something out?

Think of your brain like a muscle: use a particular part repeatedly, and it tends to get a little thicker, a little better connected, over time. A team of researchers scanned the brains of 222 university students using high-resolution MRI, then cross-referenced those scans with surveys about how each student used AI tools. They split AI use into two buckets. Functional use: writing essays, solving problems, doing research. Socio-emotional use: chatting for companionship, venting, seeking emotional support from a chatbot.

The contrast in findings was sharp. Students who used AI functionally had slightly larger volume in the dorsolateral prefrontal cortex — the region just behind your forehead that handles planning and deliberate thinking — plus better-connected hippocampal networks, which are involved in memory. They also had higher GPAs. Students who leaned on AI for emotional connection showed the opposite pattern: worse scores on depression and social-anxiety measures, and lower gray matter volume in regions tied to social processing.

Here is the catch, and it is a big one. This is a cross-sectional snapshot — everyone was measured once — so we cannot tell whether using AI for emotional support causes these brain differences, or whether people who are already socially anxious simply turn to AI for company. The biology and the behaviour could be running in either direction. Also, only 6.8% of students reported frequent socio-emotional AI use, so we are talking about a small subgroup. The association is real and worth watching, but it would be premature to conclude that chatting with a chatbot shrinks your brain.

Glossary
dorsolateral prefrontal cortex: A region at the front of your brain, just behind your forehead, involved in planning, decision-making, and focused thinking.
hippocampal network: A set of brain areas connected to the hippocampus, a structure deep in the brain central to memory and spatial navigation.
gray matter volume: A measure of how much brain tissue exists in a given region, often used as a rough proxy for how heavily that region is developed.
02 / 03

AI can fake a single depressed patient but gets entire populations wildly wrong

An AI can describe one perfectly plausible depressed patient — and simultaneously describe a whole population that looks nothing like real people.

Imagine asking someone to describe the crowd at a stadium. They might perfectly describe one believable fan: right age, right clothes, plausible history. But if they describe the entire crowd as eerily similar, everyone clustered around 'averagely depressed', with none of the very severe cases and none of the mild ones that make up real life, you would know something was fundamentally broken.

That is exactly what a research team found when they ran a large-scale audit (PsychBench) of four major AI models — GPT-4o-mini, DeepSeek-V3, Gemini-3-Flash, and GLM-4.7 — used as psychiatric simulators. They generated 28,800 simulated patient profiles and compared them against two large real-world population health datasets, NHANES and NESARC-III. The AI models held up at the individual level: not a single one of the 28,714 cases violated the basic clinical gateway criteria for a diagnosis. Each fake patient sounded real.

At the population level, though, the simulation collapsed. The models compressed variance: their simulated populations clustered too tightly around the average, squeezing out the extreme ends of the real distribution. Depression scores were overestimated by 3.6 to 6.1 points on the standard PHQ-9 scale for most groups, while transgender women's depression severity was severely underestimated. And 36.66% of simulated patients would receive a different diagnosis if you ran the same simulation again.

The catch: this paper audits the problem but does not solve it. What makes that urgent is that AI-simulated patients are already being used to train clinical tools and test mental health interventions.

Glossary
epidemiological fidelity: How accurately a simulation matches the real distribution of a condition across a population, not just individual cases.
variance compression: When a simulation produces results that are too similar to each other, missing the spread of extreme cases that exist in reality.
PHQ-9: A standard nine-question survey used to measure depression severity, with scores from 0 to 27.
test-retest reliability: Whether you get the same answer when you run the same test twice — a basic check on consistency.
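Variance compression is easy to see with a toy simulation. Here is a minimal sketch with made-up numbers (not the paper's data): a "real" PHQ-9 distribution with a wide spread next to a "simulated" one that clusters around the average, the way the audited models did. Every simulated score is individually plausible, yet the tails vanish.

```python
import random
import statistics

random.seed(0)

def clip(x, lo=0, hi=27):
    """Keep a score inside the PHQ-9 range of 0-27."""
    return max(lo, min(hi, x))

# Hypothetical parameters for illustration only, not the paper's data.
# A real population has a wide spread, including minimal and severe tails;
# a variance-compressed simulation clusters tightly around the average,
# so each case looks plausible but the population does not.
real = [clip(random.gauss(8, 6)) for _ in range(10_000)]
simulated = [clip(random.gauss(11, 2)) for _ in range(10_000)]

def tail_fraction(scores):
    """Share of scores in the minimal (<5) or severe (>=20) tails."""
    return sum(s < 5 or s >= 20 for s in scores) / len(scores)

print(f"real:      sd={statistics.pstdev(real):.1f}  tails={tail_fraction(real):.0%}")
print(f"simulated: sd={statistics.pstdev(simulated):.1f}  tails={tail_fraction(simulated):.0%}")
```

With these illustrative parameters, the simulated set has a fraction of the real spread and almost no tail cases: the audit's failure mode in miniature, individually believable and collectively wrong.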
03 / 03

Parental attention cuts kids' odds of trying substances by up to two-thirds

A genetic risk score nearly tripled the odds of a child trying substances — but parental monitoring cut those odds by as much as two-thirds.

If you wanted to forecast which 9-year-old will try alcohol first, what would you look at? A research team working with the ABCD Study — a large ongoing US study that has been following 11,868 children since they were around 9 or 10 years old — ran exactly this analysis over four years of follow-up. Think of it like a weather forecast for risk: certain ingredients make a storm more likely, and certain things act like an umbrella.

The genetic ingredient that mattered most was a polygenic risk score — essentially a tally of hundreds of tiny DNA variants — linked to nicotine dependence. Kids with high nicotine genetic risk were nearly three times as likely to initiate any substance. That is a real and notable effect. But the causal part of the analysis, the more rigorous piece, showed that parental monitoring cut the odds of substance initiation by 36 to 67 percent, depending on the substance: a larger practical effect than the genetic score. Impulsivity traits and caffeine exposure also increased risk. By the end of follow-up, 36.5% of children had tried alcohol and 39.7% had tried at least one substance.

The catch: these are still observational data, even with good causal methods applied. And 'parental monitoring' is a survey measure — it tells you the association exists, not exactly what to do, and it does not mean hovering. Nobody yet knows which specific parenting behaviours are driving this effect, and that is the next question worth asking.

Glossary
polygenic risk score: A single number summarising the combined effect of hundreds or thousands of small DNA variants that each nudge risk slightly up or down.
marginal structural model: A statistical technique that tries to estimate what would happen if you intervened on something — closer to a causal answer than a plain correlation.
Cox proportional hazards model: A method for tracking when events happen over time across a group — here, who initiated substance use and when.
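The "cut the odds by 36 to 67 percent" framing is about odds ratios, and those do not map one-to-one onto probabilities. A quick back-of-the-envelope sketch (using the digest's numbers but assumed mechanics, not the study's actual model) shows what an odds ratio of 0.64 (a 36% cut) versus 0.33 (a 67% cut) would do to a baseline initiation rate of 36.5%:

```python
def apply_odds_ratio(baseline_prob: float, odds_ratio: float) -> float:
    """Convert a probability to odds, scale by an odds ratio, convert back."""
    odds = baseline_prob / (1 - baseline_prob)
    new_odds = odds * odds_ratio
    return new_odds / (1 + new_odds)

# 36.5% of children had tried alcohol by the end of follow-up (from the study).
baseline = 0.365
for label, or_ in [("36% cut in odds", 0.64), ("67% cut in odds", 0.33)]:
    p = apply_odds_ratio(baseline, or_)
    print(f"{label} (OR={or_}): {p:.0%} initiation vs {baseline:.0%} baseline")
```

Even the weaker end of the range moves the headline rate noticeably, which is the sense in which the digest calls monitoring a larger practical effect than the genetic score.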
The bigger picture

Three stories, one honest through-line: the tools for understanding mental health are getting sharper, but the distance between measuring something and knowing what to do about it is not closing as fast as the press releases suggest. The brain study says that the way you use a tool — as an instrument or as a substitute relationship — produces different biological footprints. The PsychBench audit says that AI's ability to simulate patients is already being deployed in clinical product development, and it is quietly wrong at the population level even when it sounds right one-on-one. And the substance-use study is a reminder that genetic risk, however real, does not determine outcomes — a parenting behaviour with a large, measurable effect size is sitting right there. None of these findings are final. All three are worth tracking. The thread connecting them is that mental health is increasingly being shaped by AI, and we are still in the early innings of understanding the consequences.

What to watch next

The PsychBench team has signalled that a community roadmap paper is forthcoming — that will be worth reading when it lands, since it will likely propose standards for when AI-simulated patients are and aren't acceptable for research. On the ABCD study front, release 6.0 is expected to extend the follow-up window further into adolescence, which is precisely when substance use rates start climbing steeply. And the most important open question from the AI-brain study is longitudinal: do the brain differences predate AI use, or does the behaviour shape the biology? We genuinely do not know yet.

Thanks for reading — JB.
DeepScience — Cross-domain scientific intelligence
deepsci.io