Book Review: The Seven Pillars of Statistical Wisdom


The book is written by Stephen M. Stigler, a professor in the Department of Statistics at the University of Chicago. In it, the author presents what he defines as the Seven Pillars of Statistical Wisdom, and emphasizes that these are support pillars – the disciplinary foundation, not the whole edifice, of Statistics. All seven have ancient origins, and the modern discipline has constructed its many-faceted science upon this structure with great ingenuity and with a constant supply of exciting new ideas of splendid promise.

An important question addressed by the author is ‘What is Statistics?’. As described by Professor Stigler, this question was asked as early as 1838 – in reference to the Royal Statistical Society – and it has been asked many times since. The persistence of the question and the variety of answers that have been given over the years are themselves remarkable phenomena. Viewed together, they suggest that the persistent puzzle is due to Statistics not being a single subject. Statistics has changed dramatically from its earliest years to the present, shifting from a profession that claimed such extreme objectivity that statisticians would only gather data – not analyze them – to a profession that seeks partnership with scientists in all stages of investigation, from planning to analysis.

The Seven Pillars of Statistical Wisdom described in this book are summarized below:

  1. Aggregation – this could just as well be given the nineteenth-century name, “The Combination of Observations”, or even reduced to the simplest example, taking a mean. Those simple names are misleading, in that the author refers to an idea that is now old but was truly revolutionary in an earlier day – and it still is so today, whenever it reaches into a new area of application.
  2. Information – more specifically Information Measurement. It also has a long and interesting intellectual history. The question of when we have enough evidence to be convinced a medical treatment works goes back to the Greeks. The mathematical study of the rate of information is much more recent.
  3. Likelihood – by this, the author means the calibration of inferences with the use of probability. The simplest form for this is in significance testing and the common P-value, but as the name “Likelihood” hints, there is a wealth of associated methods, many related to parametric families or to Fisherian or Bayesian inference.
  4. Intercomparison – the name is borrowed from an old paper by Francis Galton. It represents what was also once a radical idea and is now commonplace: that statistical comparisons do not need to be made with respect to an exterior standard but can often be made in terms interior to the data themselves. The most commonly encountered examples of intercomparison are Student’s t-tests and the tests of the analysis of variance.
  5. Regression – after Galton’s revelation of 1885, explained in terms of the bivariate normal distribution. Galton arrived at this attempting to devise a mathematical framework for Charles Darwin’s theory of natural selection, overcoming what appeared to Galton to be an intrinsic contradiction in the theory: selection required increasing diversity, in contradiction to the appearance of the population stability needed for the definition of the species.
  6. Design – as in “Design of Experiments”, but conceived of more broadly, as an ideal that can discipline our thinking in even observational settings. Some elements of design are extremely old. The Old Testament and early Arabic medicine provide examples. Starting in the late nineteenth century, a new understanding of the topic appeared, as Charles S. Peirce and then Fisher discovered the extraordinary role randomization would play in inference.
  7. Residual – you may suspect this is an evasion, “residual” meaning “everything else”. But the author has a more specific idea in mind. The notion of residual phenomena was common in books on logic from the 1830s on. As one author put it, “Complicated phenomena… may be simplified by subducting the effect of known causes, … leaving … a residual phenomenon to be explained.”

A particular story that I would highlight, and which the author presents within the ‘Aggregation’ section, is Jorge Luis Borges’s fantasy short story ‘Funes the Memorious’, published in 1942. In this story, Borges described a man, Ireneo Funes, who found after an accident that he could remember absolutely everything. He could reconstruct every day in the smallest detail, and he could even later reconstruct the reconstruction, but he was incapable of understanding. Borges wrote, “To think is to forget details, generalize, make abstractions. In the teeming world of Funes there were only details.” Aggregation can yield great gains above the individual components. Funes was big data without Statistics.

At the risk of oversimplification, the author summarizes and rephrases these seven pillars as representing the usefulness of seven basic statistical ideas: 1) the value of targeted reduction or compression of data; 2) the diminishing value of an increased amount of data; 3) how to put a probability measuring stick to what we do; 4) how to use internal variation in the data to help in that; 5) how asking questions from different perspectives can lead to revealingly different answers; 6) the essential role of the planning of observations; and 7) how all these ideas can be used in exploring and comparing competing explanations in science.
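
To make the second idea concrete, here is a small illustration of my own (it is not taken from the book). It assumes independent observations with a known standard deviation, in which case the standard error of a mean is the standard deviation divided by the square root of the sample size. The function name and the numbers are hypothetical, chosen only to show the pattern.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the mean of n independent observations,
    each with standard deviation sigma (assumed known here)."""
    return sigma / math.sqrt(n)


# Hypothetical numbers: a standard deviation of 10 units per observation.
sigma = 10.0
for n in (25, 100, 400):
    # Each fourfold increase in data only halves the uncertainty.
    print(f"n = {n:3d}  standard error = {standard_error(sigma, n):.2f}")
```

Under these assumptions, going from 25 to 100 observations halves the standard error, and so does going from 100 to 400: each further gain in precision costs four times as much data as the last.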
