

DeepScience · Mental Health · Daily Digest

How you use AI shapes your brain — and your kids.

Three studies today connect AI habits, fake psychiatric patients, and what actually stops teenagers from starting drugs.
April 24, 2026
Three stories today, and I think they fit together in a way that surprised me. We have a brain-scan study on how AI use affects your mental health, a stress-test of AI psychiatric simulations that reveals a population-level blind spot, and a large follow-up study of 11,000 children that quietly hands us one of the most actionable findings in youth mental health this year. Let me walk you through all three.
Today's stories
01 / 03

Using AI as a study tool looks fine. Using it as a friend does not.

222 students had their brains scanned, and the question wasn't how much AI they used — it was what they used it for.

Researchers scanned the brains of 222 university students and asked them a simple question: do you use AI chatbots to get things done — writing, studying, coding — or do you use them for emotional support and companionship? Think of the difference as using a knife to chop vegetables versus carrying it around for comfort. The tool works either way, but only one of those uses treats it as a tool. The task-users — the choppers — showed something measurable in their brain scans: larger gray matter volume in the dorsolateral prefrontal cortex, the region tied to planning, focus, and decision-making. They also had higher GPAs. The emotional-support users showed the opposite pattern: more symptoms of depression and social anxiety, and lower gray matter volume in regions associated with social processing. Why does this matter? Because hundreds of millions of people use AI chatbots daily, in an almost entirely unmonitored way. This study is one of the first to look inside people's heads — literally — and detect a difference that correlates with how they're using the tool. The catch is real, though. This is a cross-sectional study: one snapshot, not a timeline. Researchers cannot tell which came first. Did task-based use strengthen the prefrontal cortex? Or did people who already had stronger planning brains naturally drift toward task-based use? Equally, people who were already anxious or depressed may have turned to AI for emotional comfort — rather than the other way around. With 222 students, the sample is small. I'd treat the brain-size numbers as a first signal, not a settled fact.

Glossary
dorsolateral prefrontal cortex: A region toward the front and sides of your brain involved in planning, focus, and working memory — roughly the brain's project manager.
gray matter volume: The amount of brain tissue in a given region, often used as a rough proxy for how active or developed that area is.
cross-sectional study: A study that measures everyone at one point in time, like a photograph — it shows correlations but can't prove what caused what.
02 / 03

AI simulated 29,000 psychiatric patients — and got the crowd completely wrong.

Every single simulated patient followed diagnostic logic perfectly — and yet the entire population was a statistical fiction.

A team built a benchmark called PsychBench and put four major AI models — GPT-4o-mini, DeepSeek-V3, Gemini-3-Flash, and GLM-4.7 — through a demanding test: generate thousands of realistic psychiatric patient profiles, then compare those profiles against what real population surveys actually show. The individual profiles looked great. Across 28,800 simulated cases, not one violated basic diagnostic logic. If a diagnosis requires symptom A and symptom B, the AI got that right every single time. Think of it like a student who writes grammatically flawless sentences — but whose essay, read as a whole, makes no coherent argument. The problem showed up when you zoomed out to the population. DeepSeek-V3 compressed the range of patient severity by 62%, essentially erasing the very mild cases and the very severe ones, producing a suspiciously average crowd. All four models systematically overestimated depression severity — by 3 to 6 points on a standard clinical scale — for most demographic groups. And for transgender women specifically, they underestimated by more than 5 points, missing a documented mental health burden almost entirely. Meanwhile, 36.66% of cases crossed a diagnostic threshold between one test run and the next, even though the correlation between runs looked high. Why does this matter? Researchers and developers are already using AI-generated patient profiles to train mental health chatbots, stress-test clinical tools, and run simulations. If those profiles are statistically wrong, everything built on top of them is built on a fiction. The honest limit: this audit tested specific models with specific prompts. Better prompting or newer models might reduce the bias. Nobody knows yet.
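Variance compression is easier to feel than to define. Here is a toy simulation (hypothetical numbers, not the PsychBench data): generate a population of severity scores on a 0-27 depression-style scale, shrink the variance by 62% the way DeepSeek-V3 reportedly did, and watch the very mild and very severe cases disappear.

```python
import random
import statistics

random.seed(0)

# Toy severity scores on a 0-27 scale (illustrative, not real patient data)
real = [random.gauss(10, 6) for _ in range(10_000)]
real = [min(max(x, 0), 27) for x in real]  # clamp to the scale's bounds

# Simulate 62% variance compression: pull every score toward the mean.
# Variance scales with the square of the shrink factor, so to cut variance
# by 62% we multiply each deviation from the mean by sqrt(1 - 0.62).
mean = statistics.mean(real)
shrink = (1 - 0.62) ** 0.5
compressed = [mean + (x - mean) * shrink for x in real]

def tail_fraction(scores, low=3, high=20):
    """Fraction of cases that are very mild (< low) or very severe (> high)."""
    return sum(x < low or x > high for x in scores) / len(scores)

print(f"real tails:       {tail_fraction(real):.1%}")
print(f"compressed tails: {tail_fraction(compressed):.1%}")
```

Every individual compressed score still looks like a plausible patient. Only the population reveals the fiction: the extremes, where clinical need is concentrated, are gone.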

Glossary
variance compression: When a model produces outputs that cluster too tightly around the average, eliminating the extremes that exist in real populations.
diagnostic threshold: The score on a clinical scale at which a patient qualifies for a formal diagnosis — like a cutoff on a thermometer.
epidemiological fidelity: How accurately a simulated population mirrors the real distribution of symptoms, demographics, and severity seen in actual patients.
03 / 03

Knowing where your teenager is at 9pm cuts drug initiation risk by up to two-thirds.

Nearly 12,000 American children were followed from age 9, and by year four, four in ten had tried a substance — so what actually predicted who went first?

A team used data from the ABCD Study — one of the largest child development studies in US history — to track 11,868 children starting at around age nine, checking in annually for four years. By the end, 39.7% had tried at least one substance, mostly alcohol. Researchers looked at two kinds of predictors: genes and environment. Think of it like predicting whether a campfire will spread. Your genetic risk is the wind direction — it matters, and you cannot change it. Parental monitoring is the firebreak — something you can actually build. In the causal analysis, which used a statistical technique called marginal structural models to try to isolate what's actually driving what (not just what correlates with what), strong parental monitoring cut the odds of substance initiation by roughly 33 to 67%. Impulsivity — a measurable trait — and caffeine use both increased the odds significantly. Genetic risk scores for nicotine dependence roughly tripled the odds of trying any substance at all. Why does this matter? It gives prevention programs a cleaner target list. You cannot change a child's genes. You can work on impulsivity through skill-building programs. You can support parental monitoring. Neither idea is new — but this study is unusually rigorous: 11,868 kids, real follow-up time, and a modeling approach that goes beyond simple correlation. The catch: 'causal' here is statistical, not experimental. Parental monitoring might partly be a stand-in for family income or stability. And the study only reaches early adolescence — the riskiest years are still ahead for these kids. The ABCD cohort is still running, and more follow-up waves are coming.
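"Cut the odds by 67%" is not the same as "cut the risk by 67%", because odds ratios multiply odds, not probabilities. A back-of-envelope sketch makes the translation concrete, treating the cohort-wide 39.7% initiation rate as a stand-in baseline (the study's actual baseline-group rate isn't given here):

```python
def probability_after_odds_ratio(p_baseline: float, odds_ratio: float) -> float:
    """Apply an odds ratio to a baseline probability, return the new probability."""
    odds = p_baseline / (1 - p_baseline)   # probability -> odds
    new_odds = odds * odds_ratio           # odds ratios act on odds, not probabilities
    return new_odds / (1 + new_odds)       # odds -> probability

# Illustrative only: 39.7% of the whole cohort initiated a substance by year four.
# If strong monitoring cuts the odds by 67%, the odds ratio is 0.33.
baseline = 0.397
monitored = probability_after_odds_ratio(baseline, 1 - 0.67)
print(f"baseline: {baseline:.1%}, with strong monitoring: {monitored:.1%}")
```

At low probabilities, odds and risk move almost together; at rates this high they diverge, which is why careful reporting (like the study's 33-67% range) sticks to odds.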

Glossary
polygenic risk score: A single number summarizing hundreds of small genetic variants to estimate your inherited likelihood of a trait or condition.
marginal structural models: A statistical method that tries to isolate the causal effect of a factor by mathematically accounting for other things that might have caused it.
ABCD Study: The Adolescent Brain Cognitive Development Study, a large ongoing US study tracking nearly 12,000 children from childhood into adolescence.
The bigger picture

Three stories today, and they rhyme in an uncomfortable way. One study shows that how you personally use AI shapes your brain and your grades. Another reveals that AI systems tasked with simulating human patients get the individual right and the population completely wrong. And a third quietly hands us one of the most actionable findings in youth mental health this year: a parent who knows where their teenager is cuts substance initiation risk by up to two-thirds. Here is the position I'd take. Mental health research is getting very good at measuring things — brain scans, annual check-ins with 12,000 kids, 29,000 simulated patients — and is still struggling to change things at scale. The ABCD study points at a lever that works and that families can actually pull. The AI studies point at a tool being deployed before we understand it, in both directions: as a daily habit shaping real brains, and as a simulation engine producing plausible fictions. These are not separate stories. They are the same tension, showing up in different labs.

What to watch next

The ABCD Study has more follow-up waves coming as these children move through mid-adolescence — the period this study doesn't yet cover — and those results will tell us whether the protective effect of parental monitoring holds as kids get older and harder to monitor. On the AI-simulation side, the PsychBench team has laid down a methodology that other labs can now apply to newer models; if GPT-5-class systems repeat the same population-level distortions, that becomes a much harder problem to explain away. The open question I'd most want answered: does the brain-structure difference seen in the AI use study replicate in a larger sample, and does it show up in longitudinal data?

Thanks for reading — this was a denser day than the headlines suggested. — JB
DeepScience — Cross-domain scientific intelligence
deepsci.io