MEDICAL HISTORY PAGE
Year: 2016 | Volume: 7 | Issue: 2 | Page: 147-149
From a brewer to the faraday of statistics: William Sealy Gosset
Abhay B Mane
Department of Community Medicine, Smt. Kashibai Navale Medical College, Pune, Maharashtra, India
Date of Web Publication: 30-Jun-2016
Correspondence Address: Abhay B Mane, Department of Community Medicine, Smt. Kashibai Navale Medical College, Narhe, Pune, Maharashtra, India
Source of Support: None, Conflict of Interest: None
DOI: 10.4103/0975-9727.185019
William Sealy Gosset (1876-1937) was an immensely talented statistician who will be remembered for his contributions to the development of modern statistics. Better known to the statistical world by his pseudonym, "Student," Gosset is associated with the discovery of the t-distribution and its use. He was a brewer at the Guinness Brewery in Dublin, Leinster, Ireland, and a pioneer in the analysis of small samples. His most famous work, "The Probable Error of a Mean," published in "Biometrika," made a clear distinction between population parameters and their sample estimates. He will be remembered as a practical scientist who contributed to the theory of statistics the frequency distributions of the variance of normal samples and of the ratio of the mean to the standard deviation. His life was full of fruitful scientific ideas, and Student's test of significance holds a unique place in the history of scientific method.
Keywords: Fisher, Karl Pearson, probable error of a mean, student, t-distribution
How to cite this article: Mane AB. From a brewer to the faraday of statistics: William Sealy Gosset. Muller J Med Sci Res 2016;7:147-9
Introduction
William Sealy Gosset's work has proven fundamental to statistical inference as practiced today. Better known by his pseudonym, "Student," Gosset is associated with the discovery of the t-distribution and its use. [1],[2] A chemist and statistician, he worked in a brewery, where his testing of very small samples led him to discover certain small-sample distributions and, in turn, to develop Student's t-test. In 1908, he put forward a fundamentally new approach to the classical problem of the theory of errors. Early in his career at Guinness, Gosset was led to examine the relationship between the raw materials for beer and the finished product, and this activity naturally led him to learn the tools of statistical analysis. In 1908, his two contributions, on Student's t-distribution and on the small-sample distribution of Pearson's correlation coefficient, placed him among the great figures of the newly emerging field of statistical methodology. [3] He had a profound effect on the practice of statistics in industry and agriculture. The story of this advance is as instructive as it is interesting. This paper presents a detailed account of the development of the small-sample approach, Gosset's pathbreaking contribution embodied in Student's t-test.
A Biographical Glimpse
William Sealy Gosset (1876-1937) was born on June 13, 1876 in Canterbury, Kent, England. He was the first of five children of Colonel Frederic Gosset and Agnes Sealy Vidal. He was a very good student and won several scholarships. Gosset was a scholar at Winchester College and later at New College, Oxford, where he obtained first-class honors in mathematical moderations (1897) and chemistry (1899). [1],[2] In 1935, Gosset left Dublin to take up a leading scientific position at a new Guinness brewery in London, England. In 1937, he died of a heart attack in Beaconsfield, Buckinghamshire, England, at the age of 61 years, still in the employment of Guinness.
A Humble Brewer
Guinness was interested in agricultural experimentation, and it hired scientists who could apply their expertise to the business. In 1893, the Guinness Brewery adopted a policy of recruiting brewers with scientific degrees (although only from Oxford or Cambridge), having decided that anyone wishing to make a mark as a brewer in the future must have training in "the application of science (chemistry and bacteriology) to the fermentation industries." Accordingly, Gosset joined the Guinness Brewery in Dublin on October 1, 1899 as a junior brewer, the fifth scientist to be recruited as a brewer. Recruiting scientists as brewers drew them deeply into research. Given that Gosset had studied mathematics as well as chemistry at Oxford, it was perhaps only natural that he turned his attention to the use of mathematical methods in the brewing process. At Guinness, Gosset applied his statistical knowledge both in the brewery and on the farm, for example to select the best-yielding varieties of barley. Problems of this type in the experimental brewery led him to consider "the error of the mean of a small sample." [4] At the time, mathematicians could not determine the error of the mean for a small sample. Yet the answer was critical for deriving valid results from many brewery experiments, especially as adequate methods of sampling were not in use. Hence, in 1904 he wrote an internal report for Guinness, "The Application of the Law of Error to the Work of the Brewery," which emphasized the importance of using probability theory to assign exact values to the results of experiments in the brewery. In 1905 he wrote another internal report, "The Pearson Co-efficient of Correlation," which was also endorsed by the Guinness Board. These internal reports are especially interesting because they illustrate the great utility of the new statistical methods in the brewery.
Gosset's statistical work helped him become head brewer, a more interesting title than professor of statistics. [5]
The 'Faraday of Statistics' on Mean and Correlation Coefficient
Gosset, the "Faraday of statistics," was a highly influential figure in the development of modern statistical thinking. He used statistics to solve a wide range of brewing problems, from barley production to the yeast fermentation that affected the quality of the product. One problem involved selecting the barley varieties with the highest yields for given soil types while allowing for the vagaries of climate. His name may not be familiar, but his work is known to the statistical world under the name "Student." To extend his knowledge, Gosset spent a year at the biometric laboratory of the leading statistician Karl Pearson at University College London. Reliable statistics require an adequate sample size, and he soon realized that Pearson's large-sample theory required refinement if it was to be useful for the small-sample problems arising in brewing. In 1908 he published his most famous work, "The Probable Error of a Mean," in "Biometrika." [2],[6],[7] In it he focused primarily on determining how likely it is that a sample mean approximates the mean of the population from which it was drawn. The "probable error" of a mean is a measure of the dispersion of its sampling distribution, analogous to the standard error (for a normal distribution, it is about 0.6745 times the standard error). Estimating this dispersion is today a foundational step in drawing inferences about a population parameter from a sample mean. In nearly all research, both the population mean and the population variance are unknown, so the sample variance must be used to specify the sampling distribution of the mean. Gosset confronted the problem that doing so introduces an additional error: the error in the sample variance itself. Further, this error more often leads to underestimation than overestimation of the population variance, because the sampling distribution of the variance is positively skewed. [8]
In addition, the probable error associated with a sample mean increases as the sample size decreases, which is precisely the concern of small-sample research. [9] The unit normal table accounts neither for the estimation of the population variance nor for the fact that the error in this estimate depends on sample size. This limitation inspired Gosset to develop a set of valid probability tables for small samples. [10],[11] His fame today rests on a statistical test called Student's t-test.
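Both technical claims in this paragraph can be checked with a short simulation. The sketch below is an illustration under stated assumptions, not Gosset's own procedure: it assumes standard normal data (so $\sigma = 1$), uses only the Python standard library, and its function names and seed are hypothetical choices. It converts a known-$\sigma$ standard error into a probable error via the normal 75th percentile, then estimates how often the sample variance $s^2$ falls below $\sigma^2$.

```python
import random
import statistics

# Probable error as a multiple of the standard deviation: the half-width of the
# central 50% of a normal distribution, i.e. its 75th percentile (about 0.6745).
PE_FACTOR = statistics.NormalDist().inv_cdf(0.75)


def probable_error_of_mean(sigma: float, n: int) -> float:
    """Probable error of a sample mean when sigma is known: PE_FACTOR * sigma / sqrt(n)."""
    return PE_FACTOR * sigma / n ** 0.5


def sample_variance(xs: list[float]) -> float:
    """Unbiased sample variance (divisor n - 1)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def share_underestimating(n: int, reps: int = 20_000, seed: int = 1908) -> float:
    """Hypothetical helper: fraction of standard normal samples of size n with s^2 < sigma^2 = 1."""
    rng = random.Random(seed)
    below = sum(
        sample_variance([rng.gauss(0.0, 1.0) for _ in range(n)]) < 1.0
        for _ in range(reps)
    )
    return below / reps


if __name__ == "__main__":
    for n in (4, 10, 30):
        print(f"n={n:2d}  probable error of mean={probable_error_of_mean(1.0, n):.4f}  "
              f"share with s^2 < sigma^2={share_underestimating(n):.3f}")
```

For $n = 4$ the simulated share should come out near 0.61 rather than 0.5, because $(n-1)s^2/\sigma^2$ follows a right-skewed chi-square distribution with $n - 1$ degrees of freedom; the share moves toward one half as $n$ grows, while the probable error of the mean shrinks in proportion to $1/\sqrt{n}$.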
The statistical methods available at the time went no further than a version of the z test for means; even confidence intervals were not yet available. Gosset faced a basic problem in applying the z test: he did not know the population standard deviation (σ). Moreover, field experiments yield only small numbers of observations. Simply replacing σ by the sample standard deviation s in the z statistic and treating the result as roughly normal was not accurate enough. So Gosset asked the key question: what is the exact sampling distribution of the statistic $(\bar{x} - \mu)/(s/\sqrt{n})$? He also found the answer and calculated a table of critical values for his new distribution. [12] We now call it the t distribution, and the t-test is sometimes called "Student's t-test" in his honor. [13]
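The inadequacy of simply substituting $s$ for $\sigma$ can be illustrated numerically. The sketch below assumes normally distributed data and uses only the Python standard library; its function names, seed, and integration settings are illustrative assumptions. It computes two-sided 5% critical values of the t distribution by integrating the t density with Simpson's rule (a modern shortcut, not how Gosset built his table) and then simulates how often the interval $\bar{x} \pm c\, s/\sqrt{n}$ contains $\mu$ for $n = 4$, first with $c = 1.96$ and then with the t critical value.

```python
import math
import random


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
    return math.exp(log_c) / math.sqrt(df * math.pi) * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) for x >= 0: Simpson's rule on [0, x] plus the symmetric lower half."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_critical(df: int, alpha: float = 0.05) -> float:
    """Two-sided critical value c with P(|T| <= c) = 1 - alpha, found by bisection."""
    lo, hi, target = 0.0, 50.0, 1 - alpha / 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def coverage(n: int, crit: float, reps: int = 20_000, seed: int = 1908) -> float:
    """Share of intervals mean +/- crit * s / sqrt(n) containing mu = 0 (standard normal data)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        m = sum(xs) / n
        s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
        if abs(m) <= crit * s / math.sqrt(n):
            hits += 1
    return hits / reps


if __name__ == "__main__":
    for df in (3, 9, 30):
        print(f"df={df:2d}  t critical={t_critical(df):.3f}  (normal: 1.960)")
    n = 4
    print("coverage using 1.96:      ", coverage(n, 1.96))
    print("coverage using t critical:", coverage(n, t_critical(n - 1)))
```

With $n = 4$, the interval based on 1.96 should contain $\mu$ only about 85% of the time instead of the nominal 95%, whereas the t critical value for 3 degrees of freedom (about 3.18) restores roughly 95% coverage.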
He wrote another paper in 1908, "Probable Error of a Correlation Coefficient," based on a simulation experiment in which he obtained 750 sample values of r, each from a sample of size 4, drawn from a bivariate normal population with no correlation. [14] Gosset did not actually succeed in deriving the sampling distribution of r. Subsequently, R. A. Fisher (1915) derived it using a geometrical argument, and this work led to the famous Fisher-Gosset correspondence. [15]
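Gosset's correlation experiment is easy to re-create in outline. In the sketch below, pseudo-random normal draws replace Gosset's original data, and the function names and seed are hypothetical. For a bivariate normal population with zero correlation, the density of $r$ is proportional to $(1 - r^2)^{(n-4)/2}$, so with $n = 4$ the 750 simulated values of $r$ should be spread roughly uniformly over $(-1, 1)$.

```python
import math
import random


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Sample (Pearson) correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_rs(samples: int = 750, n: int = 4, seed: int = 1908) -> list[float]:
    """Draw `samples` samples of size n from a bivariate normal with rho = 0; return each r."""
    rng = random.Random(seed)
    rs = []
    for _ in range(samples):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
        ys = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent of xs, so rho = 0
        rs.append(pearson_r(xs, ys))
    return rs


if __name__ == "__main__":
    rs = simulate_rs()
    mean = sum(rs) / len(rs)
    var = sum((r - mean) ** 2 for r in rs) / (len(rs) - 1)
    # Counts of r in [-1, -0.5), [-0.5, 0), [0, 0.5), [0.5, 1).
    quarters = [sum(lo <= r < lo + 0.5 for r in rs) for lo in (-1.0, -0.5, 0.0, 0.5)]
    print(f"mean={mean:.3f}  variance={var:.3f}  counts per quarter={quarters}")
```

A sample mean near 0, a sample variance near 1/3 (the variance of a uniform distribution on $(-1, 1)$), and roughly equal counts in the four quarters of the interval are consistent with that flat shape.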
Gosset worked on a variety of statistical problems related to experimentation in agriculture and brewing. [16] He debated actively with other leading statisticians of his time, including Karl Pearson, R. A. Fisher, and Egon Pearson. [17] Sir Ronald Fisher, [15] a giant among statisticians, called Gosset "the Faraday of statistics," recognizing his ability to grasp general principles and apply them to problems of practical significance.
Why the Pseudonym 'Student'?
Gosset's main paper, "The Probable Error of a Mean," was published in 1908. To protect trade secrets, however, Guinness did not allow employees to publish the results of their research; the company wished to keep the advantages gained from employing statisticians hidden from its competitors. Gosset persuaded his superiors that nothing in his work would benefit competitors, and they allowed him to publish, but only under the pen name "Student." [1],[2],[3],[4],[6],[14] Hence, anyone studying statistics encounters the name "Student" rather than that of the method's true author. His most famous achievement is now known as Student's t-distribution; it might otherwise have been called Gosset's t-distribution.
Conclusion
Gosset's work has proven fundamental to statistical inference as practiced today. It ushered research into a new era characterized by small-sample methods. His work marked the beginning of serious statistical inquiry into small-sample inference and forms the basis of the most frequently used statistical test in behavioral science today. The t-test plays a crucial role in statistical analysis; for example, it is used to evaluate the effect of a medical treatment by comparing patients taking a new drug with a control group taking a placebo. It was also central to the development of quality control. Though remembered as a humble brewer, Gosset was a statistical pioneer who deserves recognition for his breakthrough work.
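The drug-versus-placebo comparison mentioned above is the classic two-sample setting. The sketch below computes the pooled-variance (Student) two-sample t statistic, which assumes normal outcomes with equal variances in both groups; the outcome values are invented for illustration, and the function name is hypothetical.

```python
import math


def student_two_sample_t(a: list[float], b: list[float]) -> tuple[float, int]:
    """Pooled-variance (Student) two-sample t statistic for mean(a) - mean(b), with its df."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    ss_a = sum((x - ma) ** 2 for x in a)  # sum of squared deviations, group a
    ss_b = sum((x - mb) ** 2 for x in b)  # sum of squared deviations, group b
    df = na + nb - 2
    pooled_var = (ss_a + ss_b) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (ma - mb) / se, df


if __name__ == "__main__":
    # Hypothetical outcome scores, made up for this example only.
    drug = [5, 7, 6, 8, 9]
    placebo = [3, 4, 5, 4, 4]
    t, df = student_two_sample_t(drug, placebo)
    print(f"t = {t:.3f} on {df} degrees of freedom")  # prints: t = 3.873 on 8 degrees of freedom
```

Here $t \approx 3.873$ on 8 degrees of freedom exceeds 2.306, the two-sided 5% critical value for 8 degrees of freedom, so this made-up difference would be declared statistically significant.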
Financial support and sponsorship
Nil.
Conflicts of interest
There are no conflicts of interest.
References
1. Pearson ES, Barnard GA, Plackett RL. Student: A Statistical Biography of William Sealy Gosset. Oxford: Clarendon Press; New York: Oxford University Press; 1990.
2. Pearson ES. "Student" as statistician. Biometrika 1939;30:210-50.
3. Tankard JW Jr. The Statistical Pioneers. Cambridge: Schenkman Publishing Co; 1984. p. 106.
4. Box JF. Guinness, Gosset, Fisher, and small samples. Stat Sci 1987;2:45-52.
5. Moore DS. The Basic Practice of Statistics. 2nd ed. New York: WH Freeman and Company; 2000.
6. "Student". The probable error of a mean. Biometrika 1908;6:1-25.
7. "Student". Student's Collected Papers. Pearson ES, Wishart J, editors; with a foreword by McMullen L. Biometrika Office, University College; 1942.
8. Pearson ES, Adyanthâya NK. The distribution of frequency constants in small samples from non-normal symmetrical and skew populations. Biometrika 1929;21:259-86.
9. Welch BL. "Student" and small sample theory. J Amer Statist Assoc 1958;53:777-88.
10. Pfanzagl J, Sheynin O. Studies in the history of probability and statistics XLIV: A forerunner of the t-distribution. Biometrika 1996;83:891-8.
11. Eisenhart C. On the transition from "Student's" z to "Student's" t. American Statistician 1979;33:6-10.
12. "Student". New table for testing the significance of observations. Metron 1925;5:105-8.
13. Box JF. Gosset, Fisher, and the t-distribution. American Statistician 1981;35:61-6.
14. "Student". Probable error of a correlation coefficient. Biometrika 1908;6:302-10.
15. Fisher RA. "Student". Ann Eugenics 1939;9:1-9.
16. "Student". Statistics in biological research. Nature 1929;124:93.
17. McMullen L. "Student" as a man. Biometrika 1939;30:205-10.