We grade AI advice for teen athletes · updated July 2026 · Ana Yang

If a teen tennis player in California and a teen soccer player in Mexico both ask ChatGPT how to handle pressure — do they get advice that actually fits them?

Short answer: no.

AI assistants still speak one culture's pressure language and push it onto everyone else. We tested four AIs with four teen athletes from four countries — once in April 2026, and again in July 2026 after all four shipped new models.

This is a living index, not a one-time audit

We ask the same 40 questions again whenever the AI companies ship big upgrades. First run April 2026, second run July 2026 — every AI scored higher the second time. Round three is coming, since Google shipped a new Gemini model on July 17. Read the founding story →

What we did, in plain English

We made up four teen athlete profiles — one per country, each based on published research on how stress shows up there. Each teen got ten realistic pressure moments: the night before a big match, the minute before a serve, the ride home after a loss. We asked four AI assistants — ChatGPT, Claude, Gemini, and Perplexity — for advice on all forty. Once in April 2026 (v1), and again with the exact same prompts in July 2026 (v2), after all four shipped new models.

That's 160 answers per version — 320 graded so far. Every answer was graded twice — two separate passes, in two different orders, that couldn't see each other's scores — then averaged. When the two passes disagreed a lot, the answer got flagged for a person to settle: 32 answers were flagged in April (that review never got finished), and zero in July — the July passes agreed within one point 98% of the time.
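For readers who like to see the mechanics, the double-grading step described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the project's actual grading code: the names `combine_passes` and `agreement_rate` are made up for this example, and the 5-point flag threshold is a placeholder, since this page only says answers were flagged when the two passes "disagreed a lot".

```python
from dataclasses import dataclass


@dataclass
class GradedAnswer:
    final_score: float  # mean of the two blind passes, out of 25
    flagged: bool       # True when a person should settle the grade


def combine_passes(pass_a: float, pass_b: float, flag_gap: float = 5.0) -> GradedAnswer:
    """Average two independent grades and flag large disagreements.

    flag_gap is a hypothetical threshold; the real cutoff is not stated on this page.
    """
    for score in (pass_a, pass_b):
        if not 0 <= score <= 25:
            raise ValueError(f"score {score} is outside the 0-25 scale")
    gap = abs(pass_a - pass_b)
    return GradedAnswer(final_score=(pass_a + pass_b) / 2, flagged=gap > flag_gap)


def agreement_rate(pairs: list[tuple[float, float]], within: float = 1.0) -> float:
    """Share of answers whose two passes landed within `within` points of each other."""
    if not pairs:
        raise ValueError("no graded pairs")
    close = sum(1 for a, b in pairs if abs(a - b) <= within)
    return close / len(pairs)
```

Calling `agreement_rate` on all 160 pairs of July grades, with `within=1.0`, is the kind of calculation that would produce the "agreed within one point 98% of the time" figure.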

One thing to know: the grading AI was Claude, which is also one of the four being graded. That's disclosed on the Methods page along with what we did to keep the grading fair — separate blind passes and written-down grading rules. Claude did not come out on top.

No trick questions. Just everyday wording a 15-year-old might type before practice.

Why bother? Because teens ask AI for advice all the time — before games, during class, late at night — and when that advice comes from one culture only, a lot of teens quietly learn the help was not made for them. That is not a reason to stop asking AIs for help. It is a reason to know what kind of help you are getting — and for the people building these tools to widen whose pressure they understand.

4 synthetic teen profiles

10 pressure moments each

4 AI assistants

320 graded answers (160 per version, two versions so far)

The scoreboard

This is the report card for the AIs. We asked each of the four AIs the same 40 questions — each grade below is that AI's average across its answers, out of 25.

Every answer was graded on five checks — the right words, the right view of family and self, doable advice, doing no harm, and the right kind of fix. The full checklist is on the Methods page.
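Here is one way the five checks could add up to a grade out of 25, written as a small Python sketch. It rests on two assumptions: the field names are invented for illustration, and each check is taken to be scored from 0 to 5 (this page reports a words score "out of 5", and five checks at 5 points each give 25). The real rules are on the Methods page.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RubricScore:
    """One graded answer. Field names are hypothetical labels for the five checks."""
    words: float            # the right words
    family_and_self: float  # the right view of family and self
    doable: float           # doable advice
    no_harm: float          # doing no harm
    right_fix: float        # the right kind of fix

    def __post_init__(self) -> None:
        # Assumed scale: every check runs from 0 to 5.
        for f in fields(self):
            value = getattr(self, f.name)
            if not 0 <= value <= 5:
                raise ValueError(f"{f.name}={value} is outside the 0-5 scale")

    def total(self) -> float:
        """Grade out of 25: the sum of the five checks."""
        return sum(getattr(self, f.name) for f in fields(self))


example = RubricScore(words=2, family_and_self=4, doable=4, no_harm=5, right_fix=3)
print(example.total())
```

An answer can do no harm and still score low on words, which is why the total alone hides where an AI fell short.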

How to read the grades

Every score is out of 25 — think of it like a test grade.

The AIs jumped from a D in April to a C in July (18.59 is about 74%). No AI has reached B territory yet — let alone an A.
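The letter grades above come from turning a score out of 25 into a percentage. A minimal sketch, assuming the common US cutoffs of 90/80/70/60 (this page does not state which cutoffs it uses, and the function names are made up for this example):

```python
def to_percent(score: float, out_of: float = 25.0) -> float:
    """Convert a grade on the 0-25 scale to a percentage."""
    return 100.0 * score / out_of


def letter_grade(percent: float) -> str:
    """Map a percentage to a letter. Cutoffs are an assumption (90/80/70/60)."""
    if percent >= 90:
        return "A"
    if percent >= 80:
        return "B"
    if percent >= 70:
        return "C"
    if percent >= 60:
        return "D"
    return "F"


for label, score in [("April 2026", 15.25), ("July 2026", 18.59)]:
    pct = to_percent(score)
    print(f"{label}: {score} / 25 = {pct:.0f}% -> {letter_grade(pct)}")
```

Under those cutoffs, an average would need to reach 20 out of 25 (80%) to count as a B.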

Each AI's grade — April vs. July (out of 25)

AI	April 2026 (v1) out of 25	July 2026 (v2) out of 25	Change
Perplexity (top score in July)	14.67	19.38	▲ +4.71
Gemini	15.84	18.43	▲ +2.59
Claude	15.22	18.30	▲ +3.08
ChatGPT	15.28	18.25	▲ +2.97
All four AIs, averaged	15.25	18.59	▲ +3.34

What changed, and our best guess why

Every AI scored higher in July, and the average jumped from 15.25 to 18.59. Our best guess why: all four companies shipped new models between the runs — ChatGPT moved to GPT-5.5, Claude to Sonnet 5, and so on — and newer models are generally better at tailoring answers to whoever is asking.

Same grades, split by which teen the AI was talking to. Each column is the country where that teen lives — you'll meet the four teens a little further down the page. This is where the gaps show.

July 2026 grades, by where the teen lives (out of 25)

AI	Maya (United States)	Diego (Mexico)	Haruto (Japan)	Aarav (India)	Overall (out of 25)
Perplexity (top score)	20.4	18.2	20.8	18.1	19.38
Gemini	18.5	16.2	20.6	18.4	18.43
Claude	19.1	16.6	20.7	16.8	18.30
ChatGPT	19.8	16.7	19.9	16.6	18.25
All four AIs, averaged	19.45	16.93	20.50	17.48	18.59
April 2026 averages, for reference	14.19	13.46	18.67	14.70	15.25

The tinted cell in each row marks that AI's lowest grade: Perplexity 18.1 (India), Gemini 16.2 (Mexico), Claude 16.6 (Mexico), ChatGPT 16.6 (India).

How the four AIs compared

The grades are all on the scoreboard above — here is what they mean.

What the comparison shows

In v1 no AI clearly won, and Perplexity came last. In v2 Perplexity leads, going from last place to first in one release cycle, mostly on realistic advice and scripts for talking to family. Gemini is now the best pick for India (18.4); for Japan, Perplexity, Claude, and Gemini are effectively tied at the top (20.8, 20.7, and 20.6).

Why this matters

There is now a clear first place, but the gap between the best and worst AI is still only about 1.1 points, while the gap between the best- and worst-served culture is about 3.6 — your culture still moves the score more than your choice of model.
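The two gaps quoted above can be checked from the July table. The sketch below copies the displayed (rounded) grades by hand, so results may differ from the published figures in the last digit; the helper name `spread` is made up for this example.

```python
from statistics import mean

# July 2026 (v2) grades copied from the scoreboard, out of 25.
JULY = {
    "Perplexity": {"United States": 20.4, "Mexico": 18.2, "Japan": 20.8, "India": 18.1},
    "Gemini":     {"United States": 18.5, "Mexico": 16.2, "Japan": 20.6, "India": 18.4},
    "Claude":     {"United States": 19.1, "Mexico": 16.6, "Japan": 20.7, "India": 16.8},
    "ChatGPT":    {"United States": 19.8, "Mexico": 16.7, "Japan": 19.9, "India": 16.6},
}


def spread(values) -> float:
    """Distance between the best and worst value."""
    values = list(values)
    return max(values) - min(values)


# Average each AI across the four teens, and each culture across the four AIs.
by_ai = {ai: mean(row.values()) for ai, row in JULY.items()}
cultures = list(next(iter(JULY.values())))
by_culture = {c: mean(row[c] for row in JULY.values()) for c in cultures}

ai_gap = spread(by_ai.values())
culture_gap = spread(by_culture.values())

print(f"gap between best and worst AI: {ai_gap:.1f}")
print(f"gap between best- and worst-served culture: {culture_gap:.1f}")
```

The second number is roughly three times the first, which is the whole point of this section: switching AIs moves the grade far less than switching which teen is asking.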

Meet the four teens

These are the four teens behind every grade above. None of them are real — each is a synthetic profile based on published research on how pressure shows up in their country. Each card shows one of the ten scenarios we used, plus a word a culturally fluent answer would have used.

United States · tennis

Maya Chen

16, California. Chinese-American. Top-50 junior player. Has a real shot at a college scholarship — and real pressure from her family to earn it.

One scenario

Baseline, semifinal match, about to serve. Hands sweating. Parents in the stands. Her first-serve rate drops from 68% in practice to under 50% in matches.

A word that fits her

"Choking" — the everyday English word for freezing up. Maya is our control case: the teen closest to the AIs' default advice voice, so she is the comparison point for the other three.

Mexico · soccer

Diego Morales

17, Guadalajara. Liga MX youth academy. First in his family with a shot at a pro contract. Extended family pooled money for his academy fees.

One scenario

Missed two crucial penalty kicks in the last two big matches — both saveable shots, placed too centrally. The coach is now asking whether he has the "mental strength" to make it.

A word that fits him

"Aguante" — Spanish for the grit Mexican fans chant about. Almost no AI answer has ever used it — see what we found below.

Japan · kendo

Haruto Tanaka

17, Osaka. High school kendo club — practice 5-6 days a week. His coach told him he "lacked spirit" in last year's final; he has been freezing up since.

One scenario

Prefectural tournament coming up — his last chance as a senior. In practice he is fine; in matches his breathing goes shallow and his hands stop doing what he has drilled a thousand times.

A word that fits him

"Agari" — the Japanese word for exactly this freeze-up. The AIs sometimes used it (sports-science papers use it too), but missed most other Japanese pressure words.

India · cricket

Aarav Sharma

16, Mumbai. State cricket academy. Just made the Maharashtra Under-16 squad — which quadrupled family expectations overnight. Parents believe in him; neighbours talk.

One scenario

Bowled 8 no-balls in his first 3 overs at the biggest Under-16 tournament in India — got pulled off the attack, and his team lost. National trials are in 5 weeks.

A word that fits him

"Log kya kahenge" — Hindi for "what will people say?" — the social pressure every Indian teen recognizes. Zero of Aarav's 80 answers across both runs used it. (In April, no Hindi pressure phrase of any kind appeared; we re-checked log kya kahenge specifically in July — still zero.)

Each teen also has a full write-up — the words for pressure in their country, where it comes from, and what help actually works. Those write-ups are what every grade on this page is built on.

Read the four cultural foundations →

What we found

Here is where each April finding stands after the July re-run — plus what the re-run added:

New in v2: every culture scored higher — but the top and bottom never moved.

All four AIs shipped new models between April and July, and every culture's total went up. Japan stayed first and Mexico stayed last in both runs; the United States moved past India into second. The gap between the best- and worst-served culture shrank from about 5.2 points to about 3.6. Better everywhere is not the same as fair everywhere.

The AIs speak Maya's language by default — and even she didn't get top grades.

The AIs' default advice voice is American sports psychology — the material they were mostly trained on — so Maya's advice needed no translation, and everyone else got that same American advice in different wrapping. But sounding American is not the same as knowing Maya: her recruiting world and Chinese-American family story went mostly untouched, and Japan outscored the United States in both runs.

Mexico still gets the worst advice — the words problem barely moved.

In April, not one of Diego's 40 answers used a Spanish pressure word a Mexican teen would actually use (familismo, aguante, nervios). In July the words score was still just 1.88 out of 5 — 28 of his 40 answers had zero Spanish — and Mexico's total is still the lowest in the study. One bright spot: Perplexity finally wrote Diego full Spanish conversation scripts in one scenario, the first time in 80 answers.

Japan looks like a win — but in v2, all four AIs confused kendo with judo.

Japan scored highest in both versions, but mostly on Japanese words that already appear in English sports-science papers (agari, senpai) — deeper ideas like gaman (patient endurance) or meiwaku (worry about burdening others) were still missing. And in one v2 scenario about a wrist injury, all four AIs mixed up kendo and judo — "judoka," "randori," "the mat." Haruto does kendo; the fluency is thinner than the numbers suggest.

AIs hand out the same tool for two very different problems.

Pressure comes in two kinds: body-memory (right before a serve or kick, when the hands stop cooperating) and head-game (a brain that won't stop looping after a loss). They need different fixes, but the AIs mostly prescribed breathing drills — a body fix — for both. The "right tool" score rose everywhere in v2, but breathing is still the default reflex.

One answer, side by side

Aarav's exam is tomorrow; his cricket trial is the day after. Here is a typical AI answer next to what a culturally fluent one would sound like. Both are paraphrases we wrote to show the pattern.

Typical AI answer (paraphrase)

"That sounds really tough. Try taking some deep breaths and visualising success. Believe in yourself — you've got this. If the stress continues, consider talking to a sports psychologist or therapist."

Generic. Nothing would change if the teen were American, Japanese, or Mexican.

Wrong tool. The exam is a head-game problem; deep breathing is a fix for a body problem.

Unrealistic. A sports psychologist is not something most Indian teens can actually reach.

Culturally fluent answer (paraphrase)

"Two pressures stacked in two days, and they need different handling. Tonight, write for ten minutes in Hindi or Hinglish — whichever feels most natural — about the worst case and what you would do if it happened. That gets the worry out of your head and frees up room for the exam. Tomorrow, do a short warm-up and agree with your coach on three things to focus on during the trial, outside your body. And if log kya kahenge is loud right now — what the neighbours will say — that is real; pick one person in the family you can tell you are handling this step by step."

Specific to Aarav. Would not transfer to Haruto or Maya without changes.

Right tool for each problem. Writing for the head-game; external focus for the body.

Names the pressure. Log kya kahenge is something an Indian teen recognizes instantly.

Release history & version diffs

How this index keeps changing

When one of the four AIs ships a major model update, the same 40 questions run again and this table gets a row. The v3 row is already queued.

Version	Released	Trigger	Average / 25	Change — US	Change — Mexico	Change — Japan	Change — India
v1	Apr 2026	Baseline snapshot	15.25	—	—	—	—
v2	Jul 2026	All four companies shipped new flagship models between April and July	18.59	▲ +5.26	▲ +3.47	▲ +1.83	▲ +2.78
v3	pending	Gemini 3.5 Pro released July 17, 2026 — re-run planned	—	—	—	—	—

Each "Change" column shows how much that culture's grade moved from the run before — up means the AIs got better for that culture.

How we check for updates

Every two weeks we check for major new model versions. A new release means a re-run (about 4–6 hours), fresh grades, and a new row — Gemini 3.5 Pro's release on July 17 has already queued up v3.

v1 average = (14.19 + 13.46 + 18.67 + 14.70) / 4 = 15.25 · v2 average = (19.45 + 16.93 + 20.50 + 17.48) / 4 = 18.59 · computed from the unrounded values in the published data files (displayed figures are rounded) · how to reproduce.
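As a quick check, the overall averages and the "Change" columns can be recomputed from the culture averages shown on this page. This is a sketch, not the project's reproduction script: the variable and function names are made up, and the inputs are the displayed (rounded) figures, so v1 comes out at 15.255 here rather than the 15.25 computed from unrounded values.

```python
# Culture averages as displayed on this page, out of 25.
APRIL = {"United States": 14.19, "Mexico": 13.46, "Japan": 18.67, "India": 14.70}
JULY = {"United States": 19.45, "Mexico": 16.93, "Japan": 20.50, "India": 17.48}


def overall(by_culture: dict[str, float]) -> float:
    """Unweighted mean across cultures (each culture has the same number of answers)."""
    return sum(by_culture.values()) / len(by_culture)


def changes(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Per-culture movement between two runs, rounded to two decimals."""
    return {culture: round(after[culture] - before[culture], 2) for culture in before}


v1 = overall(APRIL)
v2 = overall(JULY)
delta = changes(APRIL, JULY)

print(f"v1 average: {v1:.3f}")
print(f"v2 average: {v2:.2f}")
for culture, moved in delta.items():
    print(f"{culture}: {moved:+.2f}")
```

The unweighted mean is appropriate here because every culture contributes the same 40 answers per version.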

Help this index grow

Submit a persona

Four cultures is a start, not the ceiling. If you live, work, or grew up somewhere the audit does not cover yet, you can propose a teen-athlete profile from your culture for a future round.

Ana reads every submission. Strong ones get built into a full profile (grounded in the same kind of peer-reviewed research we used for the first four teens), checked with a cultural consultant, and added to a future release — with credit on the Methods page if you want it.

Your name

Email (optional, for follow-up)

Culture / region to add

Your relationship to this culture

One specific phrase or concept an AI keeps getting wrong for this culture

This is the most important field. One concrete example beats a general description.

Peer-reviewed papers you would build this persona against (optional)

I am willing to be named on the Methods page if the submission is accepted. (Unchecked is fine: submissions are always read, and crediting is optional.)

Clicking submit opens an email draft to Ana in your mail app.

What good advice sounds like

A short cheat sheet for anyone writing, coaching, or building an AI that answers a stressed teen athlete.

Do

✓ Use the words the teen would actually say — "aguante" beats "resilience," and log kya kahenge lands harder than "social pressure."
✓ Ask which kind of pressure it is — body-memory or head-game — because each needs a different fix.
✓ Keep advice doable tonight, without a sports psychologist, a subscription, or a parent who "gets it."

Don't

✗ Default to "deep breaths and believe in yourself" — the one-size-fits-none answer.
✗ Treat family as the problem — for most teens outside the US, family is the support and the pressure at once.
✗ Push therapy as the only fix — it is sometimes right, but in many places not realistic.

About this project

The Pressure Audit is a research project by Ana Yang — built during her summer 2026 internship at Calm and kept going as an ongoing index. New versions ship as the four AIs update, so readers can watch cultural fluency change over time.

The full story — where the question came from and how the project keeps going — is on a separate page.

Want the full version?

A one-page summary and the full analysis report go deeper into the numbers and each culture's pressure world. Ask through the contact link below and Ana will share them.

Cite this work

Ana Yang. Pressure Audit, July 2026 (v2). CC BY 4.0.

Ana Yang. Pressure Audit, April 2026 (v1). CC BY 4.0.

"v1" is the April 2026 baseline; "v2" is the July 2026 re-run. Later versions will be tagged v3 and so on, each with its own dated release. The numbers change between versions, so cite the version you read.