The Binomial Distribution and Related Distributions

An overview of the binomial distribution family: Bernoulli, binomial, categorical, and multinomial distributions with their probability mass functions.

Bernoulli Distribution

The Bernoulli distribution is a discrete probability distribution that models a trial with only two possible outcomes (e.g., heads or tails in a coin toss, success or failure). Typically, success is represented as 1 and failure as 0.

  • If the probability of success is $\mu$, then the probability of failure is $1 - \mu$.
  • $\mu$ takes values in the range $0 \le \mu \le 1$.

The probability mass function is: $$ p(x|\mu) = \mu^x (1 - \mu)^{1-x} $$ where $x$ takes values of 0 or 1.
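This PMF can be sketched directly in Python; the function name is illustrative, not from the source:

```python
def bernoulli_pmf(x, mu):
    """Bernoulli PMF p(x|mu) = mu^x * (1-mu)^(1-x), for x in {0, 1}."""
    assert x in (0, 1) and 0.0 <= mu <= 1.0
    return mu**x * (1 - mu)**(1 - x)
```

Since $x$ is either 0 or 1, the expression simply evaluates to $\mu$ for success and $1-\mu$ for failure.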

Binomial Distribution

The binomial distribution is a discrete probability distribution that represents the probability of obtaining $r$ successes when a Bernoulli trial with success probability $\mu$ is independently repeated $m$ times.

The probability mass function is: $$ p(r|m, \mu) = \binom{m}{r} \mu^r (1 - \mu)^{m-r} $$ where $\binom{m}{r} = \frac{m!}{r!(m-r)!}$ is the binomial coefficient.
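A minimal sketch of this PMF, using `math.comb` for the binomial coefficient (the function name is an assumption for illustration):

```python
from math import comb

def binomial_pmf(r, m, mu):
    """Binomial PMF p(r|m, mu) = C(m, r) * mu^r * (1-mu)^(m-r)."""
    assert 0 <= r <= m and 0.0 <= mu <= 1.0
    return comb(m, r) * mu**r * (1 - mu)**(m - r)
```

For example, the probability of exactly 2 heads in 3 fair coin tosses is $\binom{3}{2}(0.5)^2(0.5)^1 = 0.375$.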

Categorical Distribution / Multinoulli Distribution

This is a generalization of the Bernoulli distribution to trials where the outcome falls into three or more categories. For example, it can model the outcome of rolling a die once.

  • The probability of each category $j$ occurring is denoted $\mu_j$.
  • The constraint $\sum_{j=1}^k \mu_j = 1$ must be satisfied.

The probability mass function is: $$ p(x|\mu) = \prod_{j=1}^k \mu_j^{x_j} $$ where $x$ is a one-hot vector (e.g., if category $j$ occurs, only its element $x_j$ is 1 and all others are 0).

This distribution is specifically called the categorical distribution when representing the result of a single trial.
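Because $x$ is one-hot, the product $\prod_j \mu_j^{x_j}$ just picks out the probability of the category that occurred. A sketch (names are illustrative):

```python
from math import prod

def categorical_pmf(x, mu):
    """Categorical PMF p(x|mu) = prod_j mu_j^{x_j}, with x a one-hot vector."""
    assert sum(x) == 1 and all(xi in (0, 1) for xi in x)
    assert abs(sum(mu) - 1.0) < 1e-9  # probabilities must sum to 1
    return prod(m**xi for m, xi in zip(mu, x))
```

For a fair six-sided die, each one-hot outcome has probability $1/6$.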

Multinomial Distribution

This is a generalization of the binomial distribution to the case where a categorical trial is independently repeated $m$ times; it represents the number of times $x_j$ that each category $j$ appears.

The probability mass function is: $$ p(x_1, \dots, x_k | m, \mu_1, \dots, \mu_k) = \frac{m!}{x_1! x_2! \dots x_k!} \mu_1^{x_1} \mu_2^{x_2} \dots \mu_k^{x_k} $$ where $m = \sum_{j=1}^k x_j$ is the total number of trials.
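This PMF can be sketched with the standard library alone; the multinomial coefficient $\frac{m!}{x_1! \cdots x_k!}$ is computed from factorials (function names are illustrative):

```python
from math import factorial, prod

def multinomial_pmf(counts, mu):
    """Multinomial PMF: (m! / (x_1! ... x_k!)) * prod_j mu_j^{x_j},
    where counts = (x_1, ..., x_k) and m = sum of counts."""
    m = sum(counts)
    coef = factorial(m) // prod(factorial(c) for c in counts)
    return coef * prod(p**c for p, c in zip(mu, counts))
```

With $k = 2$ this reduces to the binomial PMF, just as the categorical distribution reduces to the Bernoulli when $k = 2$.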
