The Binomial Distribution
- It is a discrete probability distribution.
- The distribution of a random variable X is discrete if X can assume only a finite or countably infinite number of values.
- With u ranging over all possible values of X: $$\sum_u Pr \left(X = u\right) = 1 $$
- The binomial distribution is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p.
- Each success/failure experiment is called a Bernoulli trial.
- The binomial distribution is the basis of the binomial test of statistical significance.
- It is frequently used to model the number of successes in a sample of size n drawn with replacement from a population of size N. Replacing each item after it is drawn makes the draws independent.
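As a quick check of the normalization property and of the sampling-with-replacement model described above, here is a minimal sketch in R. The values n = 15 and p = 1/6, and the six-item population, are illustrative assumptions and are not fixed by the definition.

```r
# Illustrative parameters (assumed for this sketch)
n <- 15      # number of Bernoulli trials
p <- 1 / 6   # probability of success in each trial

# Probability of each possible number of successes, 0 through n
probs <- dbinom(0:n, size = n, prob = p)

# The probabilities of a discrete distribution sum to 1
total <- sum(probs)
print(all.equal(total, 1))

# Draw n items with replacement from a hypothetical population of
# six equally likely items, and count item 2 as a "success"
set.seed(1)  # arbitrary seed, for reproducibility only
draws <- sample(1:6, size = n, replace = TRUE)
successes <- sum(draws == 2)
print(successes)
```

Because every draw is made from the full population, each draw has the same success probability, which is what the binomial model requires.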
If the probability of a successful trial is p, then the probability of having exactly k successes in n identical independent trials is given by the probability mass function below:
\[\begin{aligned}
f\left(k; n,p \right) = Pr \left(X = k\right) = \binom{n}{k} p^k {\left( 1 - p \right)}^{n-k} \\
\text{for k = 0, 1, 2, ..., n, where} \\
\binom{n}{k} = \frac{n!}{k!(n-k)!}
\end{aligned} \]
The formula can be understood as follows: we want k successes (with probability $p^k$) and n-k failures (with probability ${\left( 1 - p \right)}^{n-k}$). However, the k successes can occur anywhere among the n trials, and there are $ \binom{n}{k}$ different ways of distributing k successes in a sequence of n trials.
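The mass function can be written out directly from the formula and compared with R's built-in dbinom(). This is a sketch only: `binom_pmf` is a hypothetical helper name, and n = 15, p = 0.167 are illustrative values.

```r
# Binomial probability mass function written out from the formula:
# choose(n, k) ways to place the successes, times p^k * (1 - p)^(n - k)
binom_pmf <- function(k, n, p) {
  choose(n, k) * p^k * (1 - p)^(n - k)
}

# Illustrative values (assumed for this sketch)
n <- 15
p <- 0.167
k <- 0:n

manual  <- binom_pmf(k, n, p)             # formula evaluated by hand
builtin <- dbinom(k, size = n, prob = p)  # R's built-in mass function

# The two agree up to floating-point tolerance
print(all.equal(manual, builtin))
```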
Consider the following problem:
One six-sided die is rolled 15 times. What is the probability of rolling five or fewer 2's?
In each roll, the probability of rolling a particular number, say 2, is 1/6.
The probability of rolling five or fewer 2's is the sum of the probabilities of rolling 0, 1, 2, 3, 4 and 5 2's.
\[\begin{aligned}
Pr \left(X \leq 5\right) = \sum_{k=0}^5 Pr \left(X = k\right)
\end{aligned} \]
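Substituting n = 15 and p = 1/6 into the mass function, each term of the sum can be written out explicitly. This is a direct substitution into the formulas above and introduces no further assumptions:
\[\begin{aligned}
Pr \left(X \leq 5\right) = \sum_{k=0}^{5} \binom{15}{k} {\left( \frac{1}{6} \right)}^{k} {\left( \frac{5}{6} \right)}^{15-k}
\end{aligned} \]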
Using R's probability mass function dbinom() to obtain the probability (with 1/6 rounded to 0.167):
- dbinom(k, size, prob) returns the probability of exactly k successes in size trials under a binomial distribution.
- The probability of rolling exactly 5 2's is
> dbinom(5, size=15, prob=0.167)
[1] 0.06274624
- The probability of rolling 0,1,2,3,4 or 5 2's:
> dbinom(0, size=15, prob=0.167) +
+ dbinom(1, size=15, prob=0.167) +
+ dbinom(2, size=15, prob=0.167) +
+ dbinom(3, size=15, prob=0.167) +
+ dbinom(4, size=15, prob=0.167) +
+ dbinom(5, size=15, prob=0.167)
[1] 0.9723556
- Alternatively, we can use pbinom(), the cumulative distribution function of the binomial distribution.
- $Pr\left(X \leq 5 \right)$
> pbinom(5, size=15, prob=0.167)
[1] 0.9723556
- As seen above, the pbinom() function is useful for summing consecutive binomial probabilities.
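The six-term sum can also be written more compactly, since dbinom() accepts a vector as its first argument. A minimal sketch, using the same size and prob as the calls above (the variable names are illustrative):

```r
# dbinom() is vectorized: one call returns Pr(X = 0), ..., Pr(X = 5)
p_each <- dbinom(0:5, size = 15, prob = 0.167)

# Summing the individual probabilities gives Pr(X <= 5)
p_sum <- sum(p_each)

# pbinom() returns the same cumulative probability directly
p_cdf <- pbinom(5, size = 15, prob = 0.167)

print(p_sum)
print(all.equal(p_sum, p_cdf))

# Running totals of the mass function reproduce the whole CDF
cdf_by_sum <- cumsum(dbinom(0:15, size = 15, prob = 0.167))
print(all.equal(cdf_by_sum, pbinom(0:15, size = 15, prob = 0.167)))
```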
- Other questions that can be answered include:
- What is the probability of rolling 5 or more 2's? $Pr\left(X \geq 5 \right) $
- $Pr\left(X \geq 5 \right) = 1 - Pr\left(X \leq 4 \right) = 1 - \text{pbinom(4, size=15, prob=0.167) = 0.09039}$
> 1 - pbinom(4, 15, 0.167)
[1] 0.09039063
- What is the probability of rolling more than 5 and at most 8 2's? $Pr\left(5 < X \leq 8 \right)$
- $Pr\left(5 < X \leq 8 \right) = Pr\left(X \leq 8 \right) - Pr\left(X \leq 5 \right) = \text{pbinom(8, 15, 0.16667) - pbinom(5, 15, 0.16667) = 0.02720835}$ (here 1/6 is rounded to 0.16667 rather than 0.167)
> pbinom(8, 15, 0.16667) - pbinom(5, 15, 0.16667)
[1] 0.02720835
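Tail and interval probabilities like these follow one pattern, which can be sketched as below. The `lower.tail` argument is part of pbinom(); the helper `pr_between` is a hypothetical name introduced only for illustration, and it assumes integer bounds with lo <= hi.

```r
size <- 15
prob <- 0.167

# Pr(X >= 5) = Pr(X > 4): request the upper tail directly
# instead of subtracting the lower tail from 1
upper <- pbinom(4, size, prob, lower.tail = FALSE)
print(upper)

# Hypothetical helper: Pr(lo <= X <= hi) for integer lo <= hi.
# Subtracting the CDF at lo - 1 keeps the outcome lo in the interval.
pr_between <- function(lo, hi, size, prob) {
  pbinom(hi, size, prob) - pbinom(lo - 1, size, prob)
}

# Pr(6 <= X <= 8) is the same as Pr(X <= 8) - Pr(X <= 5)
p_6_to_8 <- pr_between(6, 8, size, prob)
print(p_6_to_8)

# Cross-check against the sum of the individual probabilities
print(all.equal(p_6_to_8, sum(dbinom(6:8, size, prob))))
```

The main thing to watch is the lower bound: because X is discrete, Pr(X <= 8) - Pr(X <= 5) excludes the outcome 5, while Pr(X <= 8) - Pr(X <= 4) includes it.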
- Plotting the probability distribution:
# Include x = 0: rolling no 2's at all is a possible outcome
df <- data.frame(x=0:15, prob=dbinom(0:15, 15, prob=0.167))
plot(df, type="b", xlab="Number (x) of rolls of 2's", ylab="Pr(x)")
- Simulating the experiment: consider n=100 (number of observations), size=15 (number of trials), prob=0.167 (probability of success in each trial).
bindat <- rbinom(100, 15, 0.167)
# Breaks centred on the integers 0..15 give one bar per possible count
hist(bindat, breaks=seq(-0.5, 15.5, 1), xlab="N successes")
- Plotting the area showing the cumulative probability: What is the probability of rolling "at least" 5 2's (5 or more)?
df <- data.frame(x=0:15, prob=dbinom(0:15, 15, prob=0.167))
library(ggplot2)
# Shade the region x >= 5 under the probability curve
ggplot(data=df, aes(x=x, y=prob)) +
  geom_line() +
  geom_ribbon(data=subset(df, x>=5 & x<=15), aes(ymax=prob), ymin=0, fill="red", colour=NA, alpha=0.5)
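The shaded probability can also be checked by simulation. The sketch below draws many simulated experiments with rbinom() and compares the share with 5 or more 2's against the exact value from pbinom(). The number of simulations (100000) and the seed are arbitrary assumptions.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

# Each simulated value is the number of 2's in 15 rolls
sims <- rbinom(100000, size = 15, prob = 0.167)

# Empirical share of simulated experiments with 5 or more 2's
empirical <- mean(sims >= 5)

# Exact probability from the cumulative distribution function
exact <- 1 - pbinom(4, size = 15, prob = 0.167)

print(c(empirical = empirical, exact = exact))
```

With this many simulated experiments the two values should agree to about two decimal places; a smaller sample, such as the 100 observations used for the histogram, will fluctuate more.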