In the latter half of the 19th century, two eminent scientists more or less simultaneously discovered regression to the mean. One was the Austrian physicist Ludwig Boltzmann; the other, the English statistician and polymath Francis Galton. Boltzmann’s discovery came by way of devising a probabilistic rationale for the second law of thermodynamics. Galton’s discovery resulted from observing that the children of exceptionally tall parents were usually shorter than their parents. Regression to the mean was apparently unknown to Boltzmann and has escaped the notice of statistical physicists to this day. For almost a century and a half, regression has been mistaken for the second law of thermodynamics, whereas it is merely an artifact of Boltzmann’s model.
In its simplest form, Boltzmann’s probabilistic model is based on the mathematics of coin flipping. If one were to imagine a box of ideal gas whose N molecules are each equally likely to be in either half of the box, the number of molecules in the right half is analogous to the number of heads in N coin flips. If the initial distribution of molecules is found to be uneven such that, for instance, all the molecules are in the left half of the box, it is highly likely that the distribution would be more uniform at any subsequent time. According to the model, this transition would occur instantaneously, not gradually through a progression of ever more probable macrostates, as Boltzmann believed.
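To make the point concrete, here is a minimal Python sketch (an illustrative simulation, not anything Boltzmann computed; N and the number of trials are chosen arbitrarily) that treats each later configuration as an independent binomial draw, as the coin-flip analogy implies, and checks how often a maximally uneven starting state is followed by a more uniform one.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000          # number of molecules (or coin flips)
trials = 10_000    # independent re-draws of the configuration

# Extreme initial state: no molecules in the right half of the box.
initial_right = 0

# In the coin-flip analogy, each later configuration is an independent
# binomial draw: every molecule lands in either half with probability 1/2.
later_right = rng.binomial(N, 0.5, size=trials)

# Fraction of later states that are more uniform (closer to N/2) than the start.
more_uniform = np.mean(np.abs(later_right - N / 2) < np.abs(initial_right - N / 2))
print(f"fraction of later states more uniform than the initial state: {more_uniform:.4f}")
# Prints essentially 1.0: the uneven state is almost surely followed by a more
# uniform one, in a single jump rather than a gradual progression.
```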
Galton spent several years searching for a causal explanation of what he termed “regression towards mediocrity”. He eventually realized that regression could also be observed where causality was not plausible. For instance, he observed that not only were the children of exceptionally tall parents usually shorter than their parents, but the parents of exceptionally tall children were also usually shorter than their children. This led to the realization that regression to the mean was a statistical artifact of randomness, not a consequence of natural causes. The same symmetry holds in the coin-flip analogy: if we look back in time, we see that extremely high numbers of heads are almost always preceded by lower numbers, and so forth. This time symmetry, which was also noted by Boltzmann, means that regression is indifferent to the “arrow of time”. As Daniel Kahneman observes in Thinking, Fast and Slow, “The fact that you observe regression when you predict an early event from a later event should help convince you that regression does not have a causal explanation. Regression effects are ubiquitous, and so are misguided causal stories to explain them.”
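The time symmetry is easy to check in the coin-flip setting. The sketch below (an illustrative simulation; the series length and the 2.5-standard-deviation cutoff for “extreme” values are arbitrary choices) generates a white-noise series of head counts and measures how often an extreme value is preceded, and how often it is followed, by a value nearer the mean.

```python
import numpy as np

rng = np.random.default_rng(1)
N, T = 1_000, 200_000
heads = rng.binomial(N, 0.5, size=T)            # white-noise series of head counts

threshold = heads.mean() + 2.5 * heads.std()    # cutoff for "extreme" outcomes
idx = np.where(heads[1:-1] > threshold)[0] + 1  # interior extreme points

before_lower = np.mean(heads[idx - 1] < heads[idx])
after_lower = np.mean(heads[idx + 1] < heads[idx])
print(f"predecessor lower: {before_lower:.3f}, successor lower: {after_lower:.3f}")
# Both fractions are close to 1: extreme values are usually both preceded and
# followed by values nearer the mean, i.e. regression is time-symmetric.
```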
However, Boltzmann surmised that “In nature, the tendency of transformations is always to go from less probable to more probable states.” While this might seem tautological, he was convinced that the irreversible second law of thermodynamics could not be due to the seemingly reversible laws of motion and therefore required further explanation. This rationale was provided by regression to the mean, which he attributed to the “laws of probability” and mistook for the second law of thermodynamics. So while Galton sought physical causes to explain regression, Boltzmann reversed this logic by taking regression to be physically causal. Galton finally realized that regression to the mean was a mathematical truism which required no causal explanation. Boltzmann, having found his “cause”, looked no further. However, the “tendency of transformations” Boltzmann identified is simply an artifact of his probabilistic model and has no causal connection to “nature”.
The data which led to Galton’s discovery consisted of records of inherited biological traits, such as the sizes of successive generations of seeds or the heights of parents and their children. On the other hand, Boltzmann did not actually count molecules, but based his analysis on hypothetical probability distributions. He reasoned that an improbable state was likely to be followed by a more probable state and deduced from his hypothetical data the regression that Galton observed in his empirical data. While Galton’s data resulted from a mixture of deterministic and random influences, the imaginary data generated by Boltzmann’s model was entirely random.
Boltzmann’s model was designed to mimic classical thermodynamics
In formulating his statistical model of the ideal gas, Boltzmann used as his template the second law as established by classical thermodynamics. His notion was that an isolated system in a non-equilibrium state would continuously evolve toward an equilibrium state corresponding to maximum entropy. Once realized, this equilibrium state would persist in perpetuity if undisturbed. Boltzmann envisioned a mathematical analogy based on probability theory. His model differed in that the evolution toward equilibrium was not absolutely certain, although highly probable. If the system were found in a “non-uniform” state distribution, then “since there are infinitely many more uniform than non-uniform distributions, the number of states which lead to uniform distributions after a certain time t1 is much greater than the number that leads to non-uniform ones…” The uniform state distributions are identified with equilibrium, so for systems composed of large numbers of particles, the difference between the models amounts to the difference between the absolute certainty of the classical model and the near certainty of the probabilistic one. The similarity of the models suggested to Boltzmann that a causal link existed between empirical observations and the “laws of probability”. This reification of probability had the added benefit of resolving the “Loschmidt paradox” by circumventing the supposed reversibility of the laws of motion in order to explain the asymmetry of the second law.
Flaws in Boltzmann’s model
The model chosen by Boltzmann is based on the multinomial distribution. The simplest form of this distribution is the binomial distribution where the two individual states are equally probable, for example, the two sides of a fair coin or the two halves of a container of an ideal gas composed of identical molecules. Boltzmann’s version of the second law is based on the premise that “there are infinitely many more uniform than non-uniform distributions”. However, this statement is mathematically false and the opposite is true, as first pointed out by John Arbuthnott in 1710. For large N, the probability of the most probable macrostate for the binomial distribution is given by the formula √(2/πN). This approaches zero with increasing N, so if the uniform distribution defines equilibrium, the probability of the system being in equilibrium is vanishingly small for systems composed of large numbers of particles. This contradicts the notion of equilibrium as a stationary state. To see this intuitively, consider flipping a coin a thousand times. The uniform distributions are the many ways in which we might get 500 heads and 500 tails. However, getting exactly 500 heads is quite unlikely (the probability is about 2.5 percent, or roughly 1 in 40) and only slightly more likely than getting 501 or 499 heads, and so forth. This renders Boltzmann’s definition of equilibrium contradictory.
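These figures are easy to verify with exact binomial coefficients; the short computation below mirrors the thousand-flip example (N = 1000 is taken from the text, everything else is straightforward arithmetic).

```python
from math import comb, pi, sqrt

N = 1_000
p_500 = comb(N, 500) / 2**N        # exact probability of exactly 500 heads
p_499 = comb(N, 499) / 2**N        # a neighbouring macrostate
approx = sqrt(2 / (pi * N))        # the asymptotic formula sqrt(2/(pi*N))

print(f"P(500 heads)   = {p_500:.5f}")   # ~0.02523, roughly 1 in 40
print(f"P(499 heads)   = {p_499:.5f}")   # only slightly smaller
print(f"sqrt(2/(pi*N)) = {approx:.5f}")  # closely matches the exact value
```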
If one were to construct a time series by repeating this coin-flip experiment many times, the time series of the number of heads would constitute “white noise”. As this time series progresses, it will be observed that extremely high numbers of heads, though rare, will almost always be followed by lower numbers, and extremely low numbers will almost always be followed by higher numbers. By the same token, if we were counting the number of molecules in the right half of a container of ideal gas, we would observe the same tendencies. Since white noise corresponds to a series of statistically independent states, there can be no “gradual” evolution from lower to higher probability “macrostates”, but merely a series of discontinuous “jumps” from one macrostate to another. As with successive coin-flip sequences, one macrostate has no influence on any other. For a long sequence of coin flips, the probability of getting exactly half heads and half tails is very small. This is analogous to the uniform distribution of molecules defined by Boltzmann as equilibrium. So the idea that a non-uniform gas “gradually evolves” toward a stable equilibrium is contradicted on both counts by the mathematical properties inherent in Boltzmann’s model.
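A quick way to see the “jump” behaviour is to confirm that the series carries no memory. The sketch below (an illustrative simulation; the lag-1 sample autocorrelation is used as a simple independence check) also reports how rarely the series sits in the exactly uniform macrostate.

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 1_000, 100_000
heads = rng.binomial(N, 0.5, size=T)   # heads per experiment (or molecules in the right half)

# Lag-1 sample autocorrelation: ~0, i.e. successive macrostates are independent "jumps".
x = heads - heads.mean()
lag1 = np.dot(x[:-1], x[1:]) / np.dot(x, x)
print(f"lag-1 autocorrelation: {lag1:+.4f}")

# Fraction of the series spent exactly in the uniform macrostate N/2.
print(f"fraction at exactly N/2: {np.mean(heads == N // 2):.4f}")   # ~0.025
```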
For the binomial example of the gas in the box, the probability that the system will move to a more probable macrostate is contingent only upon the current macrostate. Since the most probable macrostate is also the average (mean) macrostate, any tendency toward the most probable macrostate corresponds to regression toward the mean. If the system is currently far from the mean, it will tend to regress toward the mean at any subsequent (or prior) time. However, if the system is currently at or close to the mean, then it will tend to “progress” away from the mean. The point where it is equally likely to move away from or toward the mean is approximately μ ± 0.6745 σ, where μ = N/2 and σ = √N / 2. This progression away from the mean contradicts the notion of the most probable (mean) macrostate as an attractor which leads the system to a stable equilibrium state. Not only is the probability of arriving at the most probable macrostate vanishingly small, but the probability of staying there is also infinitesimal.
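The crossover point can be checked numerically. Assuming, as the model implies, that the next state is an independent draw from the same binomial distribution, the sketch below estimates, for a few current distances from the mean, the probability that the next state lies even farther out; near 0.6745 σ that probability is roughly one half.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 1_000
mu, sigma = N / 2, np.sqrt(N) / 2

next_states = rng.binomial(N, 0.5, size=1_000_000)   # independent draws of the next state

# Probability that the next state lies farther from the mean than a current
# state sitting at mu + z*sigma, for a few values of z.
for z in (0.5, 0.6745, 1.0):
    p_away = np.mean(np.abs(next_states - mu) > z * sigma)
    print(f"z = {z:<6}: P(next state farther from mean) = {p_away:.3f}")
# Near z = 0.6745 the probability is about one half: closer to the mean the
# system tends to wander away, farther out it tends to regress.
```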
Statistical stationarity
If the Boltzmann definition of equilibrium is untenable, what is the alternative? The Boltzmann model implies a sequence of molecular configurations such that the number of molecules in the right half of the box could be represented as a time series. In this case equilibrium would conventionally be defined by statistical stationarity, which in the case of a binomial distribution would generate white noise with constant mean and standard deviation. From the perspective of orthodox statistics, stationarity is synonymous with equilibrium and the system will remain in a statistical steady-state as long as the boundary conditions remain unchanged. Extreme outliers, while rare, are considered natural variations within the overall distribution, not departures from equilibrium, as in Boltzmann’s interpretation. All accessible microstates would then be included in the definition of equilibrium and the entropy S = k ln W would be based on the total number of microstates W = W_tot = 2^N, not the number of microstates in the most probable macrostate W = W_mp = N!/((N/2)!)^2, as Boltzmann proposed. For large N, W_mp << W_tot, consistent with Arbuthnott’s discovery.
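The gap between the two microstate counts is straightforward to quantify. The sketch below (exact integer arithmetic; the values of N are arbitrary) compares W_mp = N!/((N/2)!)^2 with W_tot = 2^N and reports the corresponding entropies in units of k.

```python
from math import comb, log, pi, sqrt

for N in (10, 100, 1_000):
    W_tot = 2**N               # all accessible microstates
    W_mp = comb(N, N // 2)     # microstates in the most probable (uniform) macrostate
    ratio = W_mp / W_tot       # probability of Boltzmann's equilibrium macrostate
    print(f"N = {N:5d}: W_mp/W_tot = {ratio:.4f} "
          f"(sqrt(2/(pi*N)) = {sqrt(2 / (pi * N)):.4f}), "
          f"ln W_tot = {N * log(2):7.1f}, ln W_mp = {log(W_mp):7.1f}")
# The ratio shrinks like 1/sqrt(N), so W_mp << W_tot for large N, consistent
# with Arbuthnott's observation.
```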
Boltzmann’s approach relies on the premise that a system in isolation will tend toward uniformity over time, an idea consistent with empirical observation and everyday experience. The “fundamental postulate” of statistical mechanics is equiprobability, which means that a molecule has an equal chance of being in any particular location and the probability of a specific configuration of molecules (microstate) is equal to that of each of the other possible configurations. Equiprobability is assumed to hold regardless of the initial state of the system. For instance, if our box of gas is partitioned so that all of the molecules are initially confined to the left half of the box, then the probability of finding a molecule in the right side of the box is zero. However, if the partition is removed, it is assumed that the probability of finding a molecule on the right side will immediately equal the probability of finding a molecule on the left side. While this transition is physically implausible for a real gas, it is essential to the idea that the gas expands via a passage through macrostates of increasing probability. Of course, if the location of a molecule is independent of its prior location, there can be no progression and the molecule must “jump” to a new location. By the same token, the system’s microstate is independent of its prior microstates, and the system must jump to a new microstate based solely on a random draw from the binomial probability distribution.
When the partition is removed, the system will instantaneously be in a new equilibrium, specified by a new mean and standard deviation, as the available space doubles. Initially the probability of a molecule being in the right side of the box will be zero. Immediately after the removal of the partition, that probability will equal 1/2. The mean of the distribution of molecules on the right side will increase from zero to N/2 and the standard deviation will increase from zero to √N / 2.
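Under this description the jump can be written down directly: the count on the right is zero while the partition is in place and an independent binomial draw afterwards. A minimal sketch (N chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 1_000

right_before = 0                          # partition in place: mean 0, std dev 0
right_after = rng.binomial(N, 0.5, 5)     # partition removed: fresh binomial draws
print("molecules in right half, before:", right_before)
print("molecules in right half, after :", right_after)
print(f"new mean = {N / 2:.0f}, new std dev = {np.sqrt(N) / 2:.2f}")
# The draws scatter around N/2 = 500 within a few multiples of sqrt(N)/2 ≈ 15.8.
```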
The entropy for this case is given by Gibbs as S_G = −k Σ p_i ln p_i, where p_i = 2^−N for all i. Upon removal of the barrier, the entropy will increase immediately from 0 to k ln 2^N. For the Boltzmann model, the entropy would increase from 0 to k ln W_mp, where W_mp = N!/((N/2)!)^2 corresponds to the most probable macrostate. Since W_mp determines both the equilibrium entropy and the probability of being in equilibrium, P_E = W_mp/2^N ≈ √(2/πN), there is no way to avoid the conclusion that the probability of finding the system in equilibrium becomes vanishingly small for large N. However, if equilibrium is defined by statistical stationarity, the probability of being in equilibrium is P_E = 2^N/2^N = 1, consistent with observation and common sense.
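Putting the two definitions side by side numerically (entropies reported in units of k; N is arbitrary) makes the contrast explicit; this is only a sketch of the comparison described above.

```python
from math import comb, log

N = 1_000
W_tot = 2**N                  # all accessible microstates after removal
W_mp = comb(N, N // 2)        # most probable macrostate only

S_stationary = N * log(2)     # S/k = ln 2^N: the Gibbs/stationary entropy jump
S_boltzmann = log(W_mp)       # S/k = ln W_mp: the Boltzmann value
P_E_boltzmann = W_mp / W_tot  # probability of being in the equilibrium macrostate
P_E_stationary = 1.0          # every microstate counts as equilibrium

print(f"S/k (stationary) = {S_stationary:.1f}, S/k (Boltzmann) = {S_boltzmann:.1f}")
print(f"P_E (Boltzmann)  = {P_E_boltzmann:.4f}")
print(f"P_E (stationary) = {P_E_stationary:.1f}")
```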
As Einstein and others have noted, the Boltzmann model is mechanism-free, since it has no explicit connection with the laws of motion. As a consequence, relating the second law to regression to the mean has no ontological rationale and relies solely on analogy. Since the relative probability of a macrostate is not an explicit function of time, Boltzmann’s model lacks a viable transport equation. While he proposed his H theorem as a substitute transport equation, the H theorem lacks a physical basis and its purported approach to equilibrium is a mathematical artifact attributable to the arbitrary definition of H. The notion of an equilibrium macrostate that acts as an incremental attractor due to the “laws of probability” is explicitly contradicted by his model.
Revised 10/23/22