## Definition

In null-hypothesis significance testing, the **p-value** is the probability of obtaining test results at least as extreme as the results actually observed, under the assumption that the null hypothesis is correct. A very small *p*-value means that such an extreme observed outcome would be very unlikely under the null hypothesis. Reporting *p*-values of statistical tests is common practice in academic publications of many quantitative fields.^{1}

Okay. I ended up not really understanding what the p-value actually meant.

Only now, after going into the data science field, have I finally begun to appreciate the meaning of the p-value and how it can be used as a decision-making tool in certain experiments.

Therefore, I decided to **explain p-values in this article and how they can be used in hypothesis testing**, to hopefully give you a better and more intuitive understanding of p-values.

There are four sections in total in this article, covering everything from constructing a hypothesis test to understanding the p-value and using it to guide the decision-making process. I strongly encourage you to go through all of them for a detailed understanding of p-values:^{4}

1. **Hypothesis Testing**
2. **Normal Distribution**
3. **What is P-value?**
4. **Statistical Significance**

### Transposed Conditional Fallacy

According to the definition, the **p-value** refers to the probability of observing a result given that some hypothesis is true. As a mathematical expression:

*P( Observation | Hypothesis = True )*

However, the **p-value** is often erroneously used as a “score” or “index” to assess whether a hypothesis is true or false:

*P( Hypothesis = True | Observation )*

Since the two expressions are different, conflating these two concepts leads to the conditional probability fallacy, also known as the fallacy of the transposed conditional.^{3}
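A quick numeric sketch (with entirely made-up probabilities, just for illustration) shows how different the two quantities can be:

```python
# A minimal sketch of why P(Observation | H) differs from P(H | Observation).
# All probabilities below are hypothetical.

p_h = 0.5                 # prior probability that the hypothesis is true
p_obs_given_h = 0.05      # probability of the observation if H is true
p_obs_given_not_h = 0.60  # probability of the observation if H is false

# Total probability of the observation (law of total probability)
p_obs = p_obs_given_h * p_h + p_obs_given_not_h * (1 - p_h)

# Bayes' theorem: P(H | Observation)
p_h_given_obs = p_obs_given_h * p_h / p_obs

print(p_obs_given_h)             # 0.05   -- the p-value-style quantity
print(round(p_h_given_obs, 4))   # 0.0769 -- a very different number
```

Under these assumed numbers, *P(Observation | H)* is 0.05 while *P(H | Observation)* works out to about 0.077; with different priors the gap can be far larger, which is exactly why the two must not be confused.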

### Misconceptions

The p-value is often used as a metric to show how significant the results of an experiment are. However, misunderstandings of p-values are prevalent in much scientific research and education. This prompted the American Statistical Association (ASA) to release a statement on p-values and statistical significance.^{3}

“Informally, a p-value is the probability under a specified statistical model that a statistical summary of the data (e.g., the sample mean difference between two compared groups) would be equal to or more extreme than its observed value.”

*American Statistical Association*

These are some common **misconceptions regarding p-values**:^{3}

- The p-value is *not the probability that the null hypothesis is true, nor the probability that the alternative hypothesis is false*
- The p-value is *not the probability that the observed effects were produced by random chance alone*
- The **0.05** significance level is merely a **convention**
- The p-value **does not indicate the size or importance of the observed effect**

### Principles to Have in Mind When Using p-values

Much research uses the p-value to test whether a model is true or false, or to classify results. However, the p-value cannot be the sole evidence for concluding the viability of a piece of research or a model. *Here are the principles that the ASA strongly advises when using p-values:*^{3}

- P-values can indicate how incompatible the data are with a specified statistical model
- P-values do not measure the probability that the studied hypothesis is true, or the probability that the data were produced by random chance alone
- Scientific conclusions and business or policy decisions should not be based only on whether a p-value passes a specific threshold
- Proper inference requires full reporting and transparency
- A p-value, or statistical significance, does not measure the size of an effect or the importance of a result
- By itself, a p-value does not provide a good measure of evidence regarding a model or hypothesis

### Correlation Does Not Show Causation

This is the #1 thing people get wrong over and over. “I took this homeopathic remedy and felt better. See, homeopathy works!” “MRI scans of heavy porn users show brain damage. See, porn damages your brain!” “Increases in ice cream sales are strongly correlated with drowning accidents. See, ice cream causes drowning!” “My baby got a shot and was diagnosed with autism. See, vaccines cause autism!” “Sales of organic produce are very strongly correlated with autism. See, organic farming causes autism!”

“…many people will accept results that are quite likely due to just pure random chance if they confirm what that person already wants to believe.”^{5}

## Who

If you are going to perform any statistical hypothesis testing, specifically in null hypothesis significance testing, then the p-value is of interest to you.

## What

Given the null hypothesis is true, a p-value is the probability of getting a result as or more extreme than the sample result by random chance alone. If a p-value is lower than our significance level, we reject the null hypothesis. If not, we fail to reject the null hypothesis.

The p-value is central to weighing the null hypothesis against the alternative. It is the probability of obtaining a sample “more extreme” than the one observed in your data, assuming the null hypothesis is true.

While performing an analysis, if the p-value is found to be greater than 0.05, we fail to reject the null hypothesis and conclude that there is no detectable difference between the two groups. Here, 0.05 is termed the level of significance (α).

In contrast, if the p-value is less than 0.05, then the null hypothesis can be rejected.
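As a minimal sketch of this decision rule, consider a hypothetical coin experiment: we observe 9 heads in 10 tosses of a supposedly fair coin. The two-sided p-value can be computed directly from the binomial distribution:

```python
from math import comb

# Hypothetical example: is 9 heads in 10 tosses compatible with a fair coin?
n, observed_heads = 10, 9
alpha = 0.05  # conventional significance level

def binom_pmf(k: int, n: int, p: float = 0.5) -> float:
    """Probability of exactly k heads in n tosses of a coin with P(heads)=p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Two-sided p-value: all outcomes at least as extreme as the observed one
# (9 or 10 heads, or, by symmetry, 0 or 1 heads).
p_value = sum(binom_pmf(k, n) for k in [0, 1, 9, 10])

print(round(p_value, 4))  # 0.0215
print("reject H0" if p_value < alpha else "fail to reject H0")  # reject H0
```

Since 0.0215 < 0.05, we reject the null hypothesis that the coin is fair; had the p-value exceeded α, we would fail to reject it, exactly as described above.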

Loosely speaking, it also tells you how likely it is that a result at least as extreme as yours would occur by chance alone.^{6}

## Why

Since Fisher introduced the p-value in 1925, it has become one of the cornerstones of statistical analysis, and any study that does not have a p-value attached to it is deemed worthless.

So why is it that seemingly important numbers such as “our survey found a 57% increase…” are not accompanied by a p-value? Do they have something to hide? You can bet your sweet tush that they do!

Take the example of the cat food commercial, where they claim, “9 out of 10 cats prefer MiaoMix cat food”. To what? Budgie seed? And how many times did they repeat the experiment of putting cats in front of two bowls of food and waiting until 9 out of 10 went to the bowl that they wanted them to go to? If they had reported that ‘90 out of 100 cats preferred…’ it would be much more believable, but it is very much easier to get 9 out of 10 cats to go to the ‘right’ bowl than 90 out of 100, especially when you spray the ‘wrong’ bowl with cat repellent. You see, sample size matters, and so does the p-value. Where is the p-value? Nowhere to be seen! How many repeats of the experiment did they conduct? No idea – they won’t tell us. And just where did they file the other results from all these repeat experiments? To quote the late, lamented Douglas Adams: They were on public display in the display department beneath the cellar in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying ‘Beware of the Leopard’…

So the next time you see a report, paper, thesis or whatever, take the time to check out the sample size and ask yourself if the sample is large enough to permit any reliable conclusion, then follow up with the question ‘what is the probability that this result is true?’ Look for whether a p-value is reported. If the study is statistically significant, you can bet your mother’s top lip that the authors will include a p-value and anything else that would add weight to their assertions. If they are not there, be very, very suspicious.^{2}

See Theoretical Knowledge Vs Practical Application.

## How

Many of the **References**, **Additional Reading** websites, and **Videos** will assist you with calculating and understanding the p-value.

As some professors say: “It is intuitively obvious to even the most casual observer.”

## References

^{1} “P-Value – Wikipedia”. 2021. *en.wikipedia.org*. https://en.wikipedia.org/wiki/P-value.

^{2} Baker, Lee. *Truth, Lies & Statistics*. Dundee, Tayside, Scotland: CSI Publishing, 2017.

^{3} “P Value Fallacy And Misconceptions · Jaekeun Lee’s Space”. 2022. *Jaekeun Lee’s Space*. https://agdal1125.github.io/2018/12/04/P-value.html.

^{4} Lee, Admond. “P-Values Explained By Data Scientist”. 2019. *Medium*. https://towardsdatascience.com/p-values-explained-by-data-scientist-f40a746cfc8.

^{5} “What Does The Science Community Popularly Know That The Common Humans Don’t Know?”. 2022. *Quora*. https://www.quora.com/What-does-the-science-community-popularly-know-that-the-common-humans-dont-know/answer/Franklin-Veaux?ch=8&oid=39059142&share=84969b9a&srid=uLYie&target_type=answer.

^{6} Mathew, Abinith. “What Exactly Is P-Value ?” 2021. *Medium*. https://124abhinith.medium.com/what-exactly-is-p-value-d5be00a3bc81.

## Additional Reading

“Ch. 9 Hypothesis Testing with One Sample | openstax”. 2022. *openstax.org*. https://openstax.org/books/introductory-business-statistics/pages/9-introduction.

Now we are down to the bread and butter work of the statistician: developing and testing hypotheses. It is important to put this material in a broader context so that the method by which a hypothesis is formed is understood completely. Using textbook examples often clouds the real source of statistical hypotheses.

Statistical testing is part of a much larger process known as the scientific method. This method was developed more than two centuries ago as the accepted way that new knowledge could be created. Until then, and unfortunately even today, among some, “knowledge” could be created simply by some authority saying something was so, *ipso dicta*. Superstition and conspiracy theories were (are?) accepted uncritically.

The scientific method, briefly, states that only by following a careful and specific process can some assertion be included in the accepted body of knowledge. This process begins with a set of assumptions upon which a theory, sometimes called a model, is built. This theory, if it has any validity, will lead to predictions; what we call hypotheses.

Manfre, Diego. “P-Values: Innocent Until Proven Guilty”. 2022. *Medium*. https://towardsdatascience.com/p-values-innocent-until-proven-guilty-65cb4e93e52c.

The defendant sits speechless on the stand. The jury thinks he is guilty. He can feel it. The last piece of evidence had been devastating. A picture of him with blood on his shirt and a butcher knife in one hand could be enough to convict him. How could he explain that? How can he convince the jury that it was just a coincidence? He was not a murderer, he just happened to be in the wrong place, at the wrong time with the wrong appearance. The defence calls the last witness to the stand. He is a mathematician. His first statement is: “I have calculated the P-value in this situation and it is much lower than 5%”. The jury seems confused. The mathematician explains his calculation. The jury took a closer look at the evidence. At the end of the trial, they agree: the verdict is not guilty.

“Misuse Of P-Values – Wikipedia”. 2022. *en.wikipedia.org*. https://en.wikipedia.org/wiki/Misuse_of_p-values.

“Part B – The Transposed Conditional (1): Prosecutor’s Fallacy”. 2022. *Coursera*. https://www.coursera.org/lecture/challenging-forensic-science/part-b-the-transposed-conditional-1-prosecutors-fallacy-lNh8f.

The aim of this course is to promote critical thinking with regard to forensic science. Today, in general, most people are dazzled by the technical possibilities offered by forensic science. They somewhat live in the illusion that forensic evidence is fool proof and brings factual findings with 100% certainty. This course – given by specialists in the field – goes beyond the conventional image that is promoted through TV series such as CSI. It alerts (without alarming) the public on the limits of the techniques in order to promote a sound administration of forensic science in the criminal justice system. It allows participants to understand the importance of probabilistic reasoning in forensic science, because uncertainty is a constitutive part of forensic science. The course is constructed as a series of causes célèbres that could or have led to miscarriages of justice. Some of these cases have been part of case reviews carried out at the School of Criminal Justice of the University of Lausanne.

Mukherjee, Indranil. “P-Value Simplified For Absolute Beginners”. 2022. *Medium*. https://awkwardgen.medium.com/p-value-simplified-for-absolute-beginners-37f9dac7779c.

“P-Value”. 2022. *Corporate Finance Institute*. https://corporatefinanceinstitute.com/resources/knowledge/other/p-value/.

In statistical hypothesis testing, the p-value (probability value) is a probability measure of finding the observed, or more extreme, results, when the null hypothesis of a given statistical test is true. The p-value is a primary value used to quantify the statistical significance of the results of a hypothesis test.

“P-Values Are Less Diagnostic Than We Like To Think”. 2021. *Medium*. https://medium.com/@tadeg.quillien/p-values-are-less-diagnostic-than-we-like-to-think-430f1fceb363.

Sahoo, Nutan. “P-Values”. 2021. Medium. https://nutan-sahoo.medium.com/p-values-5e434205c105.

p-value — A magical number which tells us whether our hypothesis is feasible based on the sample that we have collected. But what do they actually mean?

Senthil, SJ. “P-Value: How We Know Science Isn’t Fake And Why It’s Failing”. 2022. *Medium*. https://medium.com/@SJsenthil/p-value-how-we-know-science-isnt-fake-and-why-it-s-failing-2a49ebe93400.

The P-value gives a number from 0 to 1, representing the chance that an experiment was a fluke and did not produce actual results. Usually, a P-value of 0.05 (as suggested by R. A. Fisher) is needed for a study to be accredited, but some fields have stricter thresholds.

The P-value considers three things for its value:

1. **The size of the dataset:** Having a larger dataset prevents flukes from making too big of an impact on the experiment’s outcome

2. **The size of the margin:** If an experiment shows a considerable difference between the effect of a drug versus a placebo, then the drug is more likely to be effective

3. **Variance across groups:** If different groups in the experiment have lower variance, then results are less likely to come from flukes

The P-value boils down all these different numbers into a single score, which is the chance that mere coincidence led to the experiment’s results, not genuine correlation.
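Under some simplifying assumptions (a two-sided z-test on the difference of two group means, equal group sizes, known per-group standard deviation, and entirely hypothetical numbers), all three of these factors can be seen at work in a short sketch:

```python
from math import sqrt, erfc

def z_test_p_value(mean_diff: float, std_dev: float, n: int) -> float:
    """Two-sided p-value for an observed mean difference between two groups
    of size n, each with standard deviation std_dev (normal approximation)."""
    se = std_dev * sqrt(2 / n)     # standard error of the difference
    z = mean_diff / se             # how many standard errors away from zero
    return erfc(abs(z) / sqrt(2))  # two-sided tail probability

base          = z_test_p_value(mean_diff=1.0, std_dev=3.0, n=20)
bigger_n      = z_test_p_value(mean_diff=1.0, std_dev=3.0, n=80)  # 1. more data
bigger_margin = z_test_p_value(mean_diff=2.0, std_dev=3.0, n=20)  # 2. larger margin
lower_var     = z_test_p_value(mean_diff=1.0, std_dev=1.5, n=20)  # 3. less variance

# Each change, on its own, shrinks the p-value relative to the baseline.
assert bigger_n < base and bigger_margin < base and lower_var < base
```

More data, a larger margin, or lower variance each pushes the test statistic further from zero, which is why each one alone drives the p-value down.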

“What Is A P-Value?” 2021. *Medium*. https://anthonybmasters.medium.com/what-is-a-p-value-32e83c1ba4ff.

**The p-value indicates compatibility between the model and data. **The calculation assumes a particular model holds. The p-value is then a measure of how extreme the observed data is. With low compatibility, this value provides evidence against the hypothesis *or* underlying assumptions.

“What P-Value Tells Us”. 2021. *Investopedia*. https://www.investopedia.com/terms/p/p-value.asp.

In statistics, the p-value is the probability of obtaining results at least as extreme as the observed results of a statistical hypothesis test, assuming that the null hypothesis is correct. The p-value is used as an alternative to rejection points to provide the smallest level of significance at which the null hypothesis would be rejected. A smaller p-value means that there is stronger evidence in favor of the alternative hypothesis.

## Videos

This statistics video explains how to use the p-value to solve problems associated with hypothesis testing. When the p-value is less than alpha, you should reject the null hypothesis, and vice versa. This video discusses when you should use a one-tailed test compared to a two-tailed test. It contains two example problems that illustrate how to use the p-value method to determine whether or not you should reject the null hypothesis.

Learn how to compare a P-value to a significance level to make a conclusion in a significance test.

In this StatQuest we learn how to calculate p-values using both discrete data (like coin tosses) and continuous data (like height measurements). At the end, we explain the differences between one-sided and two-sided p-values and why you should avoid one-sided p-values if possible.

In statistics, the p-value is the probability of obtaining results at least as extreme as the observed results of a statistical hypothesis test, **assuming that the null hypothesis is correct**.

Today we’re going to begin our three-part unit on p-values. In this episode we’ll talk about Null Hypothesis Significance Testing (or NHST), which is a framework for comparing two sets of information. In NHST we assume that there is no difference between the two things we are observing, and use our p-value as a predetermined cutoff for whether something seems sufficiently rare to allow us to reject the idea that these two observations are the same. This p-value tells us if something is statistically significant, but as you’ll see, that doesn’t necessarily mean the information is significant or meaningful to you.