Neuromancy

I have completed my PhD in neuroscience, and am looking for the next step in my career. My thesis was an electrophysiological investigation of the sources of input to dopamine-releasing neurons, to help understand their function. In the meantime, I've worked for an environmental charity, creating teaching materials for children aged 5 to 11 to accompany a project promoting urban and suburban beekeeping.

What’s so significant about p < 0.05?

Inspired by the statistics classes I’m currently teaching, and a conversation I recently had in the pub with some colleagues (because I’m just that exciting), I’ve been wondering why p < 0.05 is the most common threshold for statistical significance, at least in the psychological sciences. I realised that the choice of threshold was probably arbitrary to a certain extent, but I thought that maybe it was at least a useful arbitrary value for whatever purpose p values were first used for. I had been teaching about t-tests, so they were on my mind. I knew that Student’s t-test was created by William Gosset to help with quality control at the Guinness brewery (the brewery forced him to publish under a pseudonym, Student, to conceal from competitors that they were using statistics). Perhaps a false positive rate of 1 in 20 was considered to be a reasonable error rate in brewery quality control? Apparently not…

The threshold, or indeed any threshold, doesn’t seem to have arisen with Gosset. P values certainly pre-date Gosset and the t-test anyway, but the publication of his tables of the t-statistic (or rather, what he referred to as the z-statistic), and the tables of his colleague Pearson’s χ² distribution, provided precise p values to 4 decimal places for a given value of t or χ². Instead, our fixation on p < 0.05 seems to be at least in part due to the issues between Pearson and another statistician, R.A. Fisher. Fisher had created more statistical tests, and wanted to reproduce Gosset’s tables. However, permission was refused because of financial issues over granting copyright and disagreements over theory between Pearson and Fisher, so Fisher had to re-create the tables. Fisher rearranged the data: instead of providing exact p values for a given value of t, he provided the t values that correspond to chosen values of p.

Sections of Student’s (top) and Fisher’s (bottom) tables. From Clauser (2008) Chance, 21(4):6-11
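The difference between the two table layouts can be sketched in a few lines of Python. This is only an illustration, not a reconstruction of either table: the function names are made up for this example, two-sided p values are assumed for simplicity, and the integration and bisection are just one convenient way to get the numbers. The first lookup goes from t to p (the layout described above for Student's tables), the second from a round p to the t value that produces it (Fisher's layout).

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df):
    """t -> p: two-sided p value for |t|, using Simpson's rule on [0, |t|]."""
    t = abs(t)
    if t == 0:
        return 1.0
    steps = 2 * max(1, math.ceil(t / 0.02))  # even count, step width <= 0.01
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = 2 * total * h / 3  # probability that T lies between -t and t
    return 1 - central

def critical_t(p, df):
    """p -> t: the |t| whose two-sided p value equals p, found by bisection."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    lo, hi = 0.0, 1.0
    while two_sided_p(hi, df) > p:  # widen the bracket until it contains the answer
        hi *= 2
    for _ in range(50):
        mid = (lo + hi) / 2
        if two_sided_p(mid, df) > p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

if __name__ == "__main__":
    df = 10
    # Exact p for a handful of t values.
    for t in (1.0, 2.0, 3.0):
        print(f"t = {t:.1f}  ->  p = {two_sided_p(t, df):.4f}")
    # t values for a handful of round p values.
    for p in (0.2, 0.1, 0.05, 0.01):
        print(f"p = {p:<4}  ->  t = {critical_t(p, df):.3f}")
```

A table built the second way only ever shows a few round p values, which is one way of seeing how the layout itself could nudge readers towards treating those values as special.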

Although it is apparently “a matter of historical fact that Fisher was the first to have published tables in this form”, there is evidence pre-dating Fisher and Pearson that p values were considered as an indication of findings of further interest, and the threshold of interest was usually around 0.05. Warnings about the overuse of thresholds of significance were also surfacing as early as 1919, 6 years before Fisher’s tables. So it seems unfair to lay the blame for the obsession with p values at Fisher’s door, but the publication and widespread use of his tables in a form that focused on round p values seems to have helped to reinforce the habit. Fisher doesn’t appear to have recommended the use of absolute thresholds of significance; he considered p values above 0.2 to be indicative of no effect, but values between 0.05 and 0.2 to be a suggestion that an effect might be detectable with sufficient modification of the experiment. Most of his tables reflected this; they provided values of several test statistics for a range of p values. However, when he produced tables for his newly introduced F statistic, values were only given for p = 0.05, for simplicity. Although later versions expanded to include other p values, people seem to have latched on to 0.05 as an important value.

Perhaps because the tables opened up the arcane world of statistics to a wider audience, or maybe because of some historical tendency towards 1 in 20 as an intuitive compromise between sensitivity and false positives, Fisher’s tables seem to have left us with the one thing that everyone who knows anything about statistics ‘knows’. Maybe if Fisher and Pearson had been on better terms, undergraduate statistics might have been very different…
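For anyone who wants to see what “1 in 20” means in practice, here is a rough simulation sketch. Everything in it is an assumption made for illustration: it uses a simple z-test with a known standard deviation of 1 rather than a t-test, the function names are invented, and the sample size, number of trials and seed are arbitrary. Every simulated experiment has no real effect, so any “significant” result is a false positive.

```python
import math
import random

def z_test_p(sample):
    """Two-sided p value for H0: mean = 0, assuming a known SD of 1."""
    n = len(sample)
    z = sum(sample) / n * math.sqrt(n)
    return math.erfc(abs(z) / math.sqrt(2))

def false_positive_rate(alpha=0.05, n=20, trials=20000, seed=1):
    """Fraction of null experiments (true mean 0) that give p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_p(sample) < alpha:
            hits += 1
    return hits / trials

if __name__ == "__main__":
    for alpha in (0.2, 0.05, 0.01):
        print(f"threshold {alpha}: false positive rate {false_positive_rate(alpha):.3f}")
```

The false positive rate should come out close to whatever threshold is chosen, which is the sense in which p < 0.05 accepts roughly one false alarm in every 20 experiments where nothing is going on.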

Clauser, B. (2009). War, enmity, and statistical tables. CHANCE, 21(4), 6-11. DOI: 10.1007/s00144-008-0004-8
Stigler, S. (2009). Fisher and the 5% level. CHANCE, 21(4), 12-12. DOI: 10.1007/s00144-008-0033-3

Also, see Gerard Dallal’s article Why P = 0.05? for more detail, or if you can’t access the papers.

2 comments to What’s so significant about p < 0.05?

  • Interesting that advocacy of .2 to .05 right at the outset, which makes total sense.

    I seem to have lost my copy somewhere, but if you haven’t read Robert P Abelson’s Statistics as Principled Argument, a) it’s brilliant and b) I think it talks a bit about the choice of p cut-off somewhere. Oh and c) it’s brilliant.

    Cheers
    Alex
