Regression to the mean

From Bogleheads

Regression to the mean is a term used in statistics. It is distinct from, and has a completely different meaning than, mean reversion as used in finance. The term originated with Francis Galton's work on heredity, and its original meaning is captured in the title of his 1886 paper, "Regression towards mediocrity in hereditary stature." That is, the children of unusually tall parents tend to be shorter than their parents, although still taller than average.

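As Galton put it, "When Mid-parents are taller than mediocrity, their Children tend to be shorter than they. The Deviates of the Children are to those of their Mid-parents as 2 to 3." Here "mediocrity" means the population average, a "Mid-parent" is the average of the two parents' heights (Galton first multiplied mothers' heights by 1.08 to make them comparable with fathers'), and the "Deviates" are the distances from that average.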
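Galton's 2-to-3 ratio can be restated as a rule of thumb for the expected height of the children, with all heights measured as deviations from the population average. The notation below is our own restatement, not Galton's:

```latex
\mathbb{E}[\,\text{child deviation}\,] \approx \tfrac{2}{3}\times(\text{mid-parent deviation}),
\qquad \text{e.g. } \tfrac{2}{3}\times(+3\ \text{in}) = +2\ \text{in},
\quad \tfrac{2}{3}\times(-3\ \text{in}) = -2\ \text{in}.
```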


The important thing to understand about regression in this sense is that it is not a case of compensation. There is nothing about the genes of a tall person that says "this person is taller than average, so we will compensate for that by making the next generation shorter than average." Rather, regression is an expression of imperfect correlation between generations, which in turn reflects the fact that stature is determined partly by genetics and partly by other factors. If a person is unusually tall, we cannot tell how much of the tallness is explained by genetics and how much by other factors. However, because of the way the mathematics of a bivariate distribution works, the taller the person is, the more likely it is that chance has contributed some of that height. A woman who is 3" taller than the population average most likely carries genes that encode for a height only 2" above the population average, with the other inch due to other factors. Her children will have an average height only 2" above average, not 3".
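To make the bivariate argument concrete, here is a minimal Python sketch. It assumes a hypothetical model in which a person's height deviation is the sum of an independent genetic part and a non-genetic part, both normally distributed, with the genetic part contributing 2/3 of the variance so that the numbers match the 3-inch/2-inch example. Under such a model the expected genetic part of someone with height deviation h is h × σ_G² / (σ_G² + σ_E²). The simulation checks this by averaging the genetic part of everyone who is about 3 inches above average. The function name `mean_genetic_part` and all parameter values are illustrative assumptions, not real heritability estimates, and the model ignores the other parent entirely.

```python
import random
import statistics


def mean_genetic_part(target_height=3.0, tolerance=0.1, n=200_000, seed=1):
    """Average genetic part among simulated people who are about
    `target_height` inches above the population average.

    Hypothetical additive model (an assumption for illustration):
    height deviation = genetic part + non-genetic part, drawn as
    independent normals with variances 6 and 3 square inches, so
    genetics accounts for 2/3 of the variance in height.
    """
    rng = random.Random(seed)
    genetic_sd = 6 ** 0.5
    other_sd = 3 ** 0.5
    selected = []
    for _ in range(n):
        genetic = rng.gauss(0.0, genetic_sd)
        height = genetic + rng.gauss(0.0, other_sd)
        # Keep only people whose observed height is close to the target.
        if abs(height - target_height) <= tolerance:
            selected.append(genetic)
    return statistics.mean(selected)


# Expected value for this model: 3 * 6 / (6 + 3) = 2.0 inches.
print(f"genetic part of a +3 in. person: {mean_genetic_part():.2f} in.")
```

The simulated average should land close to 2 inches. Nothing in the model "compensates" for tallness; selecting people by their observed height simply selects, on average, people whose non-genetic factors happened to be favorable.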

An interesting example of regression toward the mean occurred some years ago in Massachusetts, shortly after standardized testing was introduced in the schools. In addition to releasing the scores, officials calculated and released rankings based on the change in scores from one year to the next, with the idea of recognizing improvement. A pattern was soon noticed: wealthy, high-scoring school systems were showing dismal "improvement" scores, while troubled urban schools were showing good ones. Various conclusions were drawn from this until statisticians pointed out that the pattern did not imply any causal association. It was merely a classic demonstration of regression toward the mean, and a simple reflection of the fact that standardized test scores are an imprecise measuring tool: a school that scored unusually high one year had probably been helped partly by chance, and could not count on the same luck the next year.
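The same effect can be reproduced with a minimal Python sketch of the school-ranking situation. It assumes a hypothetical model in which each school's true level is fixed and identical in both years, and each year's score adds independent measurement noise. The number of schools, the noise level, and the names `simulate_schools` and `mean_change` are made up for illustration and are not based on Massachusetts data. Ranking schools by their year-1 score and then averaging the change to year 2 makes the top schools appear to decline and the bottom schools appear to improve, even though no school actually changed.

```python
import random
import statistics


def simulate_schools(n_schools=5000, noise_sd=0.5, seed=42):
    """Return (year1_score, year2_score) pairs for hypothetical schools.

    Each school's true level is the same in both years; only the
    measurement noise differs, so any "improvement" is pure chance.
    """
    rng = random.Random(seed)
    schools = []
    for _ in range(n_schools):
        true_level = rng.gauss(0.0, 1.0)
        year1 = true_level + rng.gauss(0.0, noise_sd)
        year2 = true_level + rng.gauss(0.0, noise_sd)
        schools.append((year1, year2))
    return schools


def mean_change(group):
    """Average year-over-year change in score for a list of schools."""
    return statistics.mean(year2 - year1 for year1, year2 in group)


schools = simulate_schools()
ranked = sorted(schools, key=lambda s: s[0])  # rank by year-1 score
k = len(ranked) // 10                         # size of a 10% slice
bottom, top = ranked[:k], ranked[-k:]

print(f"top 10% in year 1:    mean 'improvement' {mean_change(top):+.2f}")
print(f"bottom 10% in year 1: mean 'improvement' {mean_change(bottom):+.2f}")
```

The size of the effect depends on `noise_sd`: with a perfectly precise test (`noise_sd=0`) every school's year-2 score would equal its year-1 score, and every "improvement" would be exactly zero.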

See also