Probability Distributions

Definition

A probability distribution is a mathematical function that describes the probability of different possible values of a variable. Probability distributions are often depicted using graphs or probability tables.
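As a small illustration of a probability table, the sketch below builds one for the number of heads in two tosses of a fair coin. The example and the variable names are illustrative assumptions, not taken from the cited sources.

```python
from fractions import Fraction
from itertools import product

# Enumerate the four equally likely outcomes of two fair coin tosses.
outcomes = list(product("HT", repeat=2))

# Probability table: number of heads -> probability of that count.
table = {}
for outcome in outcomes:
    heads = outcome.count("H")
    table[heads] = table.get(heads, Fraction(0)) + Fraction(1, len(outcomes))

for heads in sorted(table):
    print(f"{heads} heads: {table[heads]}")
```

The probabilities in the table add up to 1, which is true of every probability distribution.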

Common probability distributions include the binomial distribution, Poisson distribution, and uniform distribution. Certain types of probability distributions are used in hypothesis testing, including the standard normal distribution, Student’s t distribution, and the F distribution. ¹

Before working with probability distributions, three prerequisite concepts need to be understood: discrete data, continuous data, and random variables.

Discrete Data

The term “discrete” can be defined as “separate”, “distinct”, or “detached”. Discrete data can only take on particular values. Each value is distinct and there is no grey area in between. Discrete data can be numeric, like a number of apples, but it can also be categorical, like red or blue, male or female, or good or bad. There are two questions you can ask yourself when deciding if data is discrete: ²

Can you count it?
Can it be divided into smaller and smaller parts?

If you can count it and it cannot be divided into smaller and smaller parts, the data is discrete.

Continuous Data

If a data point can take on any value between two specified values, it is considered to be continuous. Continuous data is often a measurement on a scale, such as height, weight, or temperature. Continuous data is not restricted to defined separate values, but can occupy any value over a continuous range. Between any two continuous data values, there are infinitely many others. Continuous data is always numeric. ²

Random Variable

A random variable, or stochastic variable, can be thought of informally as a variable whose values depend on the outcomes of a random phenomenon. It is a way to map the outcomes of random processes to numbers; in other words, we quantify outcomes by mapping them to numbers. For example, we can define a random variable X such that X = 1 if a fair coin lands on heads and X = 0 if it lands on tails. In statistics, we deem a phenomenon to be random if individual outcomes are uncertain but there is nonetheless a regular distribution of outcomes in a large number of repetitions. “Random” in statistics is not a synonym for “haphazard” but a description of a kind of order that emerges only in the long run. Random variables are a useful mechanism for assigning probabilities to sample outcomes. Suppose that we assign a number to each point of a sample space. We then have a function defined on the sample space. This function, called a random variable, is usually denoted by a capital letter such as X or Y. ²
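The coin example can be sketched in a few lines of Python. This is a minimal simulation, assuming a fair coin; the function name, the seed, and the number of tosses are illustrative choices. It shows both the mapping from outcomes to numbers and the long-run regularity described above.

```python
import random

def coin_toss_variable(outcome):
    """Random variable X: maps the outcome 'heads' to 1 and 'tails' to 0."""
    return 1 if outcome == "heads" else 0

rng = random.Random(0)  # fixed seed so the run is repeatable (arbitrary choice)
n_tosses = 100_000      # illustrative number of repetitions

# Simulate the random process, then map each outcome to a number.
tosses = [rng.choice(["heads", "tails"]) for _ in range(n_tosses)]
values = [coin_toss_variable(toss) for toss in tosses]

# Relative frequency of X = 1; it settles near 0.5 as n_tosses grows.
relative_frequency = sum(values) / n_tosses
print(relative_frequency)
```

Any single toss is uncertain, but the relative frequency over many tosses stays close to 0.5.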

Who

The following applications and occupations show how probability is used in real-life situations on a regular basis. ¹⁰

Weather Forecasting
Sports Betting
Politics
Sales Forecasting
Health Insurance
Grocery Store Staffing
Natural Disasters
Traffic
Investing
Statistician
Cost Estimator
Insurance Underwriter
Market Research Analyst
Atmospheric Scientist

What

Normal

In a normal distribution, data is symmetrically distributed with no skew. When plotted on a graph, the data follows a bell shape, with most values clustering around a central region and tapering off as they go further away from the center.

Normal distributions are also called Gaussian distributions or bell curves because of their shape. ¹

Equation That Changed The World (Normal Distribution)

Probably the most widely known and used of all distributions is the normal distribution. It transformed how we understand medical trials and how we gamble, and it underlies many applications in psychology and education. Statisticians and scientists use the normal distribution to model reading ability, job satisfaction, survey responses, IQ scores, blood pressure, measurement errors, and more.

Definition: A normal (or Gaussian) distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is f(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²)). It approximately fits many human characteristics, such as height, weight, and speed. The normal distribution is described by two parameters: the mean, μ, and the standard deviation, σ.

The normal distribution has the following characteristics:

1) It is a continuous distribution
2) It is symmetrical about the mean. Each half of the distribution is a mirror image of the other half.
3) It is asymptotic to the horizontal axis. That is, it never touches the x-axis and extends indefinitely in each direction.
4) It is unimodal. The normal curve is sometimes called a bell-shaped curve. All the values are “bunched up” in only one portion of the graph – the center of the curve.
5) It is a family of curves. Every unique pair of mean and standard deviation results in a different normal curve.
6) The area under the curve is 1. The area under the curve yields the probabilities, so the total of all probabilities for a normal distribution is 1. Since the distribution is symmetric, the area of the distribution on each side of the mean is 0.5.
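Some of these characteristics can be checked numerically. The sketch below writes out the normal density, f(x) = (1 / (σ√(2π))) · exp(−(x − μ)² / (2σ²)), as a hypothetical helper named normal_pdf and compares it with NormalDist from Python's standard statistics module. The values μ = 100 and σ = 15 are illustrative assumptions only.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -((x - mu) ** 2) / (2.0 * sigma ** 2)
    return coefficient * math.exp(exponent)

mu, sigma = 100.0, 15.0  # illustrative values, not from the source
dist = NormalDist(mu, sigma)

# The hand-written formula agrees with the standard library implementation.
assert math.isclose(normal_pdf(110.0, mu, sigma), dist.pdf(110.0))

# Symmetry about the mean: equal density at equal distances on either side.
symmetric = math.isclose(normal_pdf(mu - 20, mu, sigma), normal_pdf(mu + 20, mu, sigma))

# Half of the total area under the curve lies to the left of the mean.
area_left_of_mean = dist.cdf(mu)
print(symmetric, area_left_of_mean)
```

The density stays positive even far from the mean, in line with the curve approaching the x-axis without reaching it, although very far out a floating-point result may round to zero.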

Standard Normal

The standard normal distribution, also called the z-distribution, is a special normal distribution where the mean is 0 and the standard deviation is 1.

Any normal distribution can be standardized by converting its values into z-scores. Z-scores tell you how many standard deviations from the mean each value lies.

Figure: The standard normal distribution has a mean of 0 and a standard deviation of 1. (Scribbr)

Converting a normal distribution into a z-distribution allows you to calculate the probability of certain values occurring and to compare different data sets. ³
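A z-score is computed as z = (x − μ) / σ. The sketch below, which uses made-up scores from two hypothetical tests with different scales, shows how standardizing makes values from different data sets comparable and how a z-score can be turned into a probability with the standard normal distribution.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

standard_normal = NormalDist(0.0, 1.0)

# Hypothetical data: test A has mean 70 and SD 10; test B has mean 500 and SD 100.
z_a = z_score(85, 70, 10)
z_b = z_score(620, 500, 100)

# Probability of a value at or below each score, read from the z-distribution.
p_below_a = standard_normal.cdf(z_a)
p_below_b = standard_normal.cdf(z_b)

print(z_a, z_b)
print(round(p_below_a, 4), round(p_below_b, 4))
```

Although 620 is the larger raw number, the score of 85 lies further above its own mean in standard-deviation units, so it is the more unusual result.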

Normal distribution vs the standard normal distribution

All normal distributions, like the standard normal distribution, are unimodal and symmetrically distributed with a bell-shaped curve. However, a normal distribution can have any real number as its mean and any positive number as its standard deviation. In the standard normal distribution, the mean and standard deviation are always fixed at 0 and 1.

Every normal distribution is a version of the standard normal distribution that’s been stretched or squeezed and moved horizontally right or left.

The mean determines where the curve is centered. Increasing the mean moves the curve right, while decreasing it moves the curve left.

The standard deviation stretches or squeezes the curve. A small standard deviation results in a narrow curve, while a large standard deviation leads to a wide curve. ³

Figure: The standard normal distribution compared with other normal distributions on a graph. (Scribbr)

Curve (M = mean, SD = standard deviation)	Position or shape (relative to standard normal distribution)
A (M = 0, SD = 1)	Standard normal distribution
B (M = 0, SD = 0.5)	Squeezed, because SD < 1
C (M = 0, SD = 2)	Stretched, because SD > 1
D (M = 1, SD = 1)	Shifted right, because M > 0
E (M = –1, SD = 1)	Shifted left, because M < 0
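The five curves in the table can be reproduced with NormalDist from Python's standard library. The sketch below prints where each curve peaks and how tall the peak is; the dictionary name and the choice of peak height as the summary are illustrative assumptions.

```python
from statistics import NormalDist

# Curves A to E from the table, as (mean, standard deviation) pairs.
curves = {
    "A": NormalDist(0, 1),
    "B": NormalDist(0, 0.5),
    "C": NormalDist(0, 2),
    "D": NormalDist(1, 1),
    "E": NormalDist(-1, 1),
}

for name, dist in curves.items():
    # A normal curve peaks at its mean; the peak height depends only on the SD.
    peak_height = dist.pdf(dist.mean)
    print(f"{name}: peak at x = {dist.mean}, height = {peak_height:.3f}")
```

Halving the standard deviation doubles the peak height (curve B), doubling it halves the peak height (curve C), and changing only the mean moves the peak without changing its height (curves D and E).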

Chi-Squared Distribution

A chi-square (Χ²) distribution is a continuous probability distribution that is used in many hypothesis tests.

The shape of a chi-square distribution is determined by a single parameter k, the degrees of freedom. For small k the curve is strongly right-skewed; as k grows it becomes more symmetric. ⁵

Chi-Squared Table

The chi-square (Χ²) distribution table is a reference table that lists chi-square critical values. A chi-square critical value is a threshold for statistical significance for certain hypothesis tests and defines confidence intervals for certain parameters.

Chi-square critical values are calculated from chi-square distributions. They’re difficult to calculate by hand, which is why most people use a reference table or statistical software instead. ⁶

You will need a chi-square critical value if you want to: ⁶

Calculate a confidence interval for a population variance or standard deviation
Test whether the variance or standard deviation of a population is equal to a certain value (test of a single variance)
Test whether the frequency distribution of a categorical variable is different from your expectations (chi-square goodness of fit test)
Test whether two categorical variables are related to each other (chi-square test of independence)
Test whether the proportions of two closely related variables are equal (McNemar’s test)
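As one example of these uses, the sketch below runs a chi-square goodness of fit test on made-up counts from 120 rolls of a die. The observed counts are hypothetical, and the critical value 11.070 is copied from a standard chi-square table for 5 degrees of freedom at a significance level of 0.05 rather than computed.

```python
# Hypothetical observed counts for faces 1 to 6 over 120 rolls.
observed = [22, 17, 20, 26, 22, 13]
n_rolls = sum(observed)

# Under the null hypothesis of a fair die, every face is expected equally often.
expected = [n_rolls / 6] * 6

# Chi-square statistic: sum of (observed - expected)^2 / expected.
chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

degrees_of_freedom = len(observed) - 1  # number of categories minus 1
critical_value = 11.070  # chi-square table value for df = 5, alpha = 0.05

# Reject the null hypothesis only if the statistic exceeds the critical value.
reject_null = chi_square > critical_value
print(round(chi_square, 2), degrees_of_freedom, reject_null)
```

Here the statistic is below the critical value, so these counts give no significant evidence that the die is unfair.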

Poisson Distribution

A Poisson distribution is a discrete probability distribution. It gives the probability of an event happening a certain number of times (k) within a given interval of time or space.

The Poisson distribution has only one parameter, λ (lambda), which is the mean number of events. For small λ the distribution is right-skewed; as λ increases it becomes more symmetric and more spread out. ⁴
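The probability of exactly k events is given by the Poisson probability mass function, P(X = k) = λ^k · e^(−λ) / k!. The sketch below evaluates it for an illustrative λ = 3; the helper name poisson_pmf and the cut-off of 40 terms are assumptions made for the example.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean number of events is lam."""
    return lam ** k * math.exp(-lam) / math.factorial(k)

lam = 3.0  # illustrative mean number of events per interval

# Probabilities for k = 0, 1, ..., 39; the remaining tail is negligible for lam = 3.
probabilities = [poisson_pmf(k, lam) for k in range(40)]

total = sum(probabilities)
mean = sum(k * p for k, p in enumerate(probabilities))

for k in range(6):
    print(f"P(X = {k}) = {probabilities[k]:.4f}")
print(round(total, 6), round(mean, 6))
```

The probabilities sum to 1 and their mean equals λ, which matches the role of λ as the mean number of events.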

t Distribution

The t-distribution, also known as Student’s t-distribution, is a way of describing data that follow a bell curve when plotted on a graph, with the greatest number of observations close to the mean and fewer observations in the tails.

It resembles the normal distribution but has heavier tails, and it is used for smaller sample sizes, where the population variance is unknown. ⁷

Figure: The t-distribution follows a bell curve, with the most likely observations close to the mean and less likely observations in the tails. (Scribbr)

In statistics, the t-distribution is most often used to: ⁷

Find the critical values for a confidence interval when the data is approximately normally distributed.
Find the corresponding p-value from a statistical test that uses the t-distribution (t-tests, regression analysis).

t Table

Student’s t table is a reference table that lists critical values of t. Student’s t table is also known as the t table, t-distribution table, t-score table, t-value table, or t-test table.

A critical value of t defines the threshold for significance for certain statistical tests and the upper and lower bounds of confidence intervals for certain estimates. It is most commonly used when:

Testing whether two means are significantly different (two-sample t tests)
Testing whether two variables are significantly related (linear regression or correlation)
Calculating confidence intervals (of means or regression coefficients)

The critical values of t are calculated from Student’s t distribution. Student’s t distribution is the distribution of the test statistic t. The critical values of t are difficult to calculate by hand, which is why most people use a t table or computer software instead. ⁹

Prerequisites for Using a t Table

The number of tails: We need to know whether the t-test is one-tailed or two-tailed, because the alpha level is read from the corresponding one-tail or two-tail row at the top of the table. The alpha levels listed there differ by row: 0.50, 0.25, 0.20, 0.15… for one tail and 1.00, 0.50, 0.40, 0.30… for two tails.
Degrees of freedom: The degrees of freedom (df) indicate the number of independent values that can vary in an analysis without breaking any constraints. The df is either stated explicitly in the problem or, if it is not, obtained by subtracting one from the sample size (n – 1), as in a one-sample t-test.
Alpha level: The alpha level (α), also known as the significance level, is the probability of rejecting the null hypothesis when it is true. Common alpha levels for t-tests are 0.01, 0.05, and 0.10.

Once you have all three, find your alpha level in the one-tail or two-tail row at the top of the table, then read the critical value at the intersection of that column and the row for your degrees of freedom (df). ⁸
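The lookup can be sketched in code with a small excerpt of a t table. The three rows and three columns below are standard published critical values, but the excerpt, the function name critical_t, and the use of df = n − 1 (which applies to a one-sample t-test) are assumptions made for illustration; a full table or statistical software covers many more cases.

```python
# Header rows of the table: each column has a one-tail and a two-tail alpha level.
ONE_TAIL_ALPHAS = [0.05, 0.025, 0.005]
TWO_TAIL_ALPHAS = [0.10, 0.05, 0.01]

# Excerpt of the table body: degrees of freedom -> critical values per column.
T_TABLE_ROWS = {
    5: [2.015, 2.571, 4.032],
    10: [1.812, 2.228, 3.169],
    20: [1.725, 2.086, 2.845],
}

def critical_t(sample_size, alpha, tails):
    """Look up a critical value of t in the excerpt above (one-sample case)."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    df = sample_size - 1  # degrees of freedom for a one-sample t-test
    header = ONE_TAIL_ALPHAS if tails == 1 else TWO_TAIL_ALPHAS
    column = header.index(alpha)      # ValueError if alpha is not in the excerpt
    return T_TABLE_ROWS[df][column]   # KeyError if df is not in the excerpt

# Example: sample of 11 observations, alpha = 0.05.
print(critical_t(11, 0.05, tails=2))
print(critical_t(11, 0.05, tails=1))
```

For the same alpha level, the two-tailed critical value is larger than the one-tailed value because the alpha is split between both tails.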

Why

See Theoretical Knowledge Vs Practical Application.

How

Many of the websites and videos listed under References and Additional Reading will help you understand and apply probability distributions.

As some professors say: “It is intuitively obvious to even the most casual observer.”

References

¹ Turney, Shaun. 2022. “Probability Distribution | Formula, Types, & Examples”. Scribbr. https://www.scribbr.com/statistics/probability-distributions/.

² Dagnechaw, Shiffraw. 2020. “What Is A Probability Distribution? And Why They Are Important?”. Medium. https://shiffdag.medium.com/what-is-a-probability-distribution-and-why-they-are-important-dee5e5c1ba99.

³ Bhandari, Pritha. 2020. “The Standard Normal Distribution”. Scribbr. https://www.scribbr.com/statistics/standard-normal-distribution/.

⁴ Turney, Shaun. 2022. “Poisson Distributions | Definition, Formula & Examples”. Scribbr. https://www.scribbr.com/statistics/poisson-distribution/.

⁵ Turney, Shaun. 2022. “Chi-Square (Χ²) Distributions | Definition & Examples”. Scribbr. https://www.scribbr.com/statistics/chi-square-distributions/.

⁶ Turney, Shaun. 2022. “Chi-Square (Χ²) Table | Examples & Downloadable Table”. Scribbr. https://www.scribbr.com/statistics/chi-square-distribution-table/.

⁷ Bevans, Rebecca. 2020. “T-Distribution: What It Is And How To Use It”. Scribbr. https://www.scribbr.com/statistics/t-distribution/.

⁸ “T Table – T Table”. 2022. T Table. https://www.tdistributiontable.com/.

⁹ Turney, Shaun. 2022. “Student’s T Table (Free Download) | Guide & Examples”. Scribbr. https://www.scribbr.com/statistics/students-t-table/.

¹⁰ Zach. 2021. “10 Examples Of Using Probability In Real Life – Statology”. Statology. https://www.statology.org/probability-real-life-examples/.

Avramidis, Stelios, and Robert Israel. 2015. “Why Is The Sum Of The Rolls Of Two Dices A Binomial Distribution? What Is Defined As A Success In This Experiment?”. Mathematics Stack Exchange. https://math.stackexchange.com/questions/1204396/why-is-the-sum-of-the-rolls-of-two-dices-a-binomial-distribution-what-is-define.