An earlier version of this essay appears in Educational Horizons, Spring 2009.

U.S. Students Reported to "Lag" in the TIMSS:
another bulletin from Chicken Little?

Edward G. Rozycki, Ed. D.

50% of the graduates from the finest universities in the United States are in the lower half of their class. – A. Toll Talagee (1923)

edited 7/2/09

Are the Test Average Differences Important?[1]

A recent front-page newspaper article,[2] suppurating portent, declares that U.S. students "still lag behind" those of other nations. What evidence is adduced to support this apparent animadversion? The averages of scores by country reported in TIMSS by the National Center for Education Statistics.

Partial results for grade eight are shown in Table 1 below. [3]

Grade Eight              Average score
TIMSS scale average      500
Korea, Rep. of           597
Singapore                593
Hong Kong SAR            572
Japan                    570
England                  513
Russian Federation       512
United States            508
Czech Republic           504
Slovenia                 501

Table 1

"If it bleeds, it leads," reporters say. But education news, excepting stories of pederasty, mass shootings and other such uplift, is generally seen to be dull, dull, dull. Thus, the infusion of drama into the pedestrian: disaster sells and it is always, says our Chicken Little, imminent.

The TIMSS article is an example of such histrionics. Education news reporters have long indulged the misleading practice of giving test averages public venue without mentioning other important characteristics, e.g. population sizes or score distributions. But what do mere comparisons of averages indicate?

Not much. Consider the set of test scores generated by members of group A, A = {4, 5, 9, 10}. Its average (mean) is 28/4 = 7. Two scores are at or above average, the 9 and the 10.

Group B generates set B = {1, 2, 2, 3, 3, 7, 7, 8, 8, 9}, which has an average of 50/10 = 5. Five of its scores are at or above the average of set A.[4]

If we are looking for individuals who score above average, there would be more available from group B than from group A, even though A's group mean is higher.
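The arithmetic of the thought experiment can be checked in a few lines. This is only a sketch using the two score sets given above; the helper name count_at_or_above is made up for the illustration.

```python
from statistics import mean

# Score sets taken directly from the thought experiment
group_a = [4, 5, 9, 10]
group_b = [1, 2, 2, 3, 3, 7, 7, 8, 8, 9]


def count_at_or_above(scores, threshold):
    """Count the scores that meet or exceed the threshold (hypothetical helper)."""
    return sum(1 for s in scores if s >= threshold)


mean_a = mean(group_a)  # 28 / 4
mean_b = mean(group_b)  # 50 / 10

# How many members of each group reach group A's (higher) average?
a_hits = count_at_or_above(group_a, mean_a)
b_hits = count_at_or_above(group_b, mean_a)

print("mean of A:", mean_a, "mean of B:", mean_b)
print("scores at or above A's mean:", a_hits, "in A,", b_hits, "in B")
```

Group B has the lower mean but supplies more of the high scorers, simply because it is larger and more spread out.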

Expand this little thought experiment to the real world. Compare the number of U.S. students studying math with the number of such students in, say, Singapore (TIMSS average 593). Odds are that the absolute number of students in the U.S. (population over 305 million) scoring 593 or higher, as distinct from their proportion of the whole population, is greater than the absolute number of students achieving that score in Singapore (population about 4.5 million). The differences in averages may be statistically significant; it is not clear that they are important.
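A rough back-of-the-envelope version of this comparison is sketched below. The averages and populations are the ones quoted above; everything else is an assumption made only for illustration and is not taken from the TIMSS report: that scores in each country are normally distributed, that both countries have a spread of 100 points, and that the number of eighth-graders is proportional to total population.

```python
import math


def share_at_or_above(threshold, mean, sd):
    """Fraction of a normal(mean, sd) population scoring at or above threshold."""
    z = (threshold - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


# Figures quoted in the essay
US_MEAN, SINGAPORE_MEAN = 508, 593
US_POP, SINGAPORE_POP = 305e6, 4.5e6

# ASSUMPTION (illustration only): the same spread in both countries
ASSUMED_SD = 100

us_share = share_at_or_above(SINGAPORE_MEAN, US_MEAN, ASSUMED_SD)
sg_share = share_at_or_above(SINGAPORE_MEAN, SINGAPORE_MEAN, ASSUMED_SD)

# ASSUMPTION: student cohorts scale with total population, so the ratio of
# head counts at or above 593 is the ratio of (share x population).
headcount_ratio = (us_share * US_POP) / (sg_share * SINGAPORE_POP)

print(f"U.S. share at or above 593:      {us_share:.3f}")
print(f"Singapore share at or above 593: {sg_share:.3f}")
print(f"U.S. : Singapore head-count ratio is about {headcount_ratio:.0f} : 1")
```

Under these assumptions a much smaller share of U.S. students reaches 593, yet their absolute number is still many times Singapore's. Different assumptions about spread or cohort size would change the ratio, though probably not its direction.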

If an employer were looking to find budding mathematicians, given equal market demand, he or she would no doubt minimize search and hiring costs by looking at populations in the U.S. and the Russian Federation rather than in Singapore or Korea.

Is There a Mathematics Lag That Matters?

Would it bother you as a teacher if you found out that the mean IQ of one of your classes was 101 whereas the mean IQ of another was 114? Not likely. A typical IQ test has an average (mean) set at 100 and a standard deviation of 15, so the first standard deviation above the mean falls at 115. No teacher with any understanding of IQ testing would take that difference as very important without knowing the distribution of IQ scores in each group. There may be, for example, substantial overlap in scores between the two classes, as the thought experiment above illustrates. Or a genius or two might be pulling one class's average up.

In addition, a difference of one standard deviation from the mean would not normally concern even special educators. Barring individual disabilities, IQ does not typically raise issues of placement unless, as in many school districts, it is below 85 (-1 sd) for Special Education eligibility or above 130 (+2 sd) for Gifted Education.

Let's compare the IQ test with the TIMSS, using the Korean TIMSS average of 597 and the U.S. TIMSS average of 508. The average for the TIMSS is set at 500. The standard deviation is 100. So the difference between the Korean average and the U.S. average is 597 - 508 = 89. This is 89/100, or roughly nine-tenths, of one standard deviation (sd).

Comparing the two tests proportionally, the Korean average sits .97 sd above the TIMSS mean, which corresponds to an IQ of about 114.5 -- round it to 115, since we're estimating. The U.S. average, .08 sd above the mean, corresponds to an IQ of roughly 101. Would we consider it a great discrepancy between students (or groups) if the average IQ of one were 101 and that of the other 115? If, as educators, we are not particularly concerned with group differences within the first standard deviation from the mean, how can we be impressed with TIMSS scores that are just that close? Is this the indicator of an ominous "lag"?
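The proportional comparison amounts to expressing each TIMSS average in standard-deviation units and re-expressing it on the IQ scale. A minimal sketch, using the scale figures stated above (TIMSS mean 500, sd 100; IQ mean 100, sd 15); the function name timss_to_iq_scale is made up for the illustration:

```python
# Scale parameters as stated in the essay
TIMSS_MEAN, TIMSS_SD = 500, 100
IQ_MEAN, IQ_SD = 100, 15


def timss_to_iq_scale(score):
    """Re-express a TIMSS score on the IQ scale (hypothetical helper)."""
    z = (score - TIMSS_MEAN) / TIMSS_SD  # distance from the mean in sd units
    return IQ_MEAN + z * IQ_SD


korea_iq = timss_to_iq_scale(597)  # z = 0.97
us_iq = timss_to_iq_scale(508)     # z = 0.08

print(f"Korea: about {korea_iq:.1f} on the IQ scale")
print(f"U.S.:  about {us_iq:.1f} on the IQ scale")
```

This is a rescaling for the sake of comparison, not a claim that TIMSS measures IQ.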

Do people -- education reporters, for example -- who find such a "lag" worrisome not live on the same planet with the rest of us, where one seventh of the population goes to bed hungry every night, where millions of people face foreclosure on their houses, where unanswered health needs abound and where polar ice caps are melting? Do they even understand the mathematics?

However, let us be tolerant: we live in a "pluralistic" society. Out of respect for the diversity of our fellow citizens we should concede, I suppose, to each his or her own worry for how our youth are educated, whether that worry be over a student's lack of knowledge of synthetic division, or a student's failure to appreciate the cultural significance of Schweppervescence®.

What Does a Test Prove?

I was a math genius for about eight and a half months. My 12th Grade teachers told me so. (My mother was far less impressed.) All the standard tests whispered, "Another David Hilbert!"

The last time I took the Math SAT's was in the Spring of 1960 when, as legend has it, the SAT's had not yet been "watered down" and "really" meant something.[5] I remember preparing by studying problems and memorizing trigonometric identities. I scored 724 out of 800 and won a scholarship that put me through four years of college.

In reality, my demonstrations of learning were little more than a dog-and-pony show. I really did not understand mathematics.[6] When faced with a test, wracked with dubiety, I cleverly manipulated formulae and churned out symbology, mathemagical runes, as though they were some occult incantation. When I got the answers right -- happily more often than not -- I was as surprised as anyone.

I was not unique in my fog. That Fall of 1960 my college Honors Calculus course contained twenty-five other "geniuses" like me: long on technique though short on insight. (Two course members actually turned out to understand something -- it was very perplexing to us "young college men" back there in 1960 that one of the real math geniuses was a female.)

The definition of "limit" in terms of δ (delta) and ε (epsilon) was Greek to us. We bumbled through the course and since "honors students," i.e. "geniuses," couldn't fail, we ended up with B's. Our most educational outcome was that we now understood that, contrary to our "official" designation, geniuses we weren't.

Lifelong Learning via Mediocrity in Mathematics

From that honors course I and my comrades-in-cribbing learned two important life lessons:

1) we were far from being geniuses; "mediocrities" was a more apt description.

2) it didn't really matter how mediocre we really were, since if we were willing to play the academic game -- i.e. manifest frequent displays of deference to those who gave us grades -- allusions to subject matter, especially as they manifested themselves in a certain esoteric garrulousness, could distract from our paucity of real understanding. (And keep up our grade point average.)

Some few of us, however, flying in the face of common sense, of the worldly wisdom of our brothers, and of long academic tradition, really developed interests in the subject matter; even to the point of letting it interfere with our many social and extracurricular obligations. Some even went so far as to become scholars and researchers. The cleverer majority went on to fame, fortune, and newsmedia quotability.

Learning More and Better Math?

In the final analysis, the well-documented innumeracy that the population of the U.S. suffers from is unlikely to have been cured even when the TIMSS reports that U.S. students surpass those of the presently higher scoring countries.

Innumeracy and its many associated debilities have been well described by John Allen Paulos in his many publications. It is not merely a school child's disease but is found also in highly trained professionals who, it appears, have gotten little more than suffering out of their math courses. High-placed public administrators misunderstand what an average is. Other scientifically-trained professionals commonly misinterpret false-positive test information. And general concerns about Bayesian thinking in medical professionals have long been noted.[7]

Despite my many years of teaching, I am still perplexed at the number of intelligent, even mathematically experienced, persons who shy away from mathematical thinking of all but the simplest sort. Is mathematics frightening because, contrary to what Bertrand Russell said, it is hard? Or is mathematics hard, because how it is taught is frightening?


[1] See also Edward G. Rozycki, "Identifying the 'At Risk' Student: What is the Concern?" at

[2] Kristen A. Graham, "U.S. Students Still Get Mixed Scores," The Philadelphia Inquirer, Dec. 10, 2008, A1.

[3] My abridgment. See TIMSS (Trends in International Mathematics and Science Study), Table 1. Average mathematics scores of fourth- and eighth-grade students, by country: 2007 available at

[4] If the number of scores entered into set B (maintaining distribution characteristics) were big enough to match the actual proportionate size of the U.S. to Singapore, i.e. 305/4.25 = 71.8, we would have 71.8 x 5 = 359 U.S. students meeting or exceeding the Singapore scores.

[5] Educational legend (?) has it that grade inflation appeared in the 1960s, judging by the rise of high-school GPAs against constant SAT scores. See Richard J. Barndt (2001), "Fiscal Policy Effects on Grade Inflation," available from

[6] See, for comparison, "College Freshmen In US And China: Chinese Students Know More Science Facts But Neither Group Especially Skilled In Reasoning" Science Daily available at

[7] See Edward G. Rozycki, "EVERY CHILD ABOVE AVERAGE: Achieving the Lake Wobegon Vision" available at

Also see Edward G. Rozycki, "Classification Error in Evaluation Practice: the impact of the 'false positive' on educational practice and policy" available at

Clinicians' math skills are criticized in David M. Eddy, "Probabilistic reasoning in clinical medicine: Problems and opportunities" Chapter 18 pp. 249 - 267 in Daniel Kahneman, Paul Slovic & Amos Tversky (eds.) Judgment under uncertainty: Heuristics and biases. Cambridge University Press 1982. p. 267.