Where this comes from — the research behind the scorecard.

The five questions on the scorecard were not made up. Each one traces back to published, peer-reviewed research in psychology, sports science, and cross-cultural studies. Here is the short version of where the questions come from and how the grading worked.

Each author named below has a matching entry on the literature page, where you can see the full citation and a plain-English description. The research splits into three buckets.

The three research buckets

Bucket A

How culture shapes the way a teen sees themselves

Markus and Kitayama (1991) argued that people in some cultures picture themselves mainly as individuals, while people in other cultures picture themselves mainly as part of a family or team. Vignoles and colleagues (2016) later tested and refined that idea with a seven-part model of self, surveying 55 cultural groups in 33 countries. That difference changes how pressure actually feels.

Bucket B

How choking actually works in the brain

Sports psychologist Sian Beilock and colleagues (2005, 2007) and later Marci DeCaro (2011) showed there are two very different kinds of choking. One happens when a trained body movement falls apart because the athlete starts thinking about it too hard. The other happens when worry eats up the mental space needed to solve a problem. Same word, two different breakdowns — and two different fixes.

Bucket C

What pressure actually looks like in each country

Studies by Menon and colleagues (2024) for India, Robledo (2022) for Mexico, and Ojio (2021) and Noguchi (2022) for Japan each zoom in on one country at a time. They show what family expectations feel like for Indian teens, how Mexican families act as a source of both pressure and support, and how stigma around mental health shapes what Japanese teens feel able to say.

How the synthetic personas were built

Maya, Diego, Haruto, and Aarav are synthetic personas: fictional characters modeled on published research. Each persona was built against three sources: Vignoles' seven-part model of self (tested across 55 cultural groups in 33 countries), the Beilock research on the two kinds of choking, and the country-specific studies (Menon for India, Robledo for Mexico, Ojio and Noguchi for Japan). That is why their scenarios are not generic "sports pressure": they name the real tournament, the real family money conversation, the real coach phrase. Anyone who wants to rerun this study can use the same four synthetic personas as a fair comparison.

How the 160 answers were graded

Each of the five scorecard questions traces back to one or more of the papers above. The literature page spells out exactly which paper grounds which question, what a high score looks like, and what a low score looks like — with concrete examples of real AI responses.

1. Two graders, same answer, no talking

Every one of the 160 AI answers was scored twice — once by each grader, working separately. Neither could see the other's scores. This is a standard method in research for catching personal bias.

2. Disagreements got settled out loud

When the two graders disagreed by more than one point on a question, they compared notes, went back to the scorecard examples, and agreed on a final number. Both original scores are kept in the workbook so anyone can check the path.

3. The full method is written down

A document called Methods and Ethics v5 lays out the whole pipeline — how the scenarios were written, how the AIs were prompted, how disagreements were reconciled, and the honest limits of what 160 answers can show. The literature page shows how each paper shaped the grading for each question. Both are released under the same open licence as the rest of the project.
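
The disagreement rule in step 2 can be sketched in a few lines of Python. This is an illustration only, not the project's actual tooling: the `ScorePair` class, the `flag_for_discussion` helper, the answer identifiers, and the sample scores are all hypothetical. The only detail taken from the text is that a gap of more than one point sends a score pair back to the graders for discussion.

```python
from dataclasses import dataclass

# Taken from the text: graders reconciled when they differed by
# "more than one point". Everything else here is an assumption.
RECONCILE_THRESHOLD = 1


@dataclass(frozen=True)
class ScorePair:
    """Two independent scores for one answer on one scorecard question."""
    answer_id: str   # hypothetical label for one of the 160 answers
    question: int    # scorecard question number, 1 to 5
    grader_a: int    # score from the first grader
    grader_b: int    # score from the second grader

    def needs_reconciliation(self) -> bool:
        # True only when the gap is strictly greater than one point.
        return abs(self.grader_a - self.grader_b) > RECONCILE_THRESHOLD


def flag_for_discussion(pairs):
    """Return the score pairs the graders would need to talk through."""
    return [p for p in pairs if p.needs_reconciliation()]


# Made-up scores, purely for illustration.
sample = [
    ScorePair("answer-001", 1, 4, 4),  # identical scores
    ScorePair("answer-001", 2, 5, 3),  # two points apart
    ScorePair("answer-002", 1, 2, 3),  # one point apart
]

flagged = flag_for_discussion(sample)
for pair in flagged:
    print(pair.answer_id, "question", pair.question,
          "scores", pair.grader_a, "and", pair.grader_b)
```

Both original scores stay on each record, which mirrors the workbook keeping them for later checking. Pairs that are not flagged are left alone here, because the text does not say how a final score was chosen when the graders were within one point of each other.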
