Negative Binomial Distribution

Introduction

The negative binomial distribution is a discrete probability distribution that models the number of successes in a sequence of independent and identically distributed Bernoulli trials before a specified number of failures occurs. It generalizes the geometric distribution and is particularly useful for count data whose variance exceeds the mean (over-dispersion), where a Poisson model fits poorly.

Definition and Properties

The negative binomial distribution has two common parameterizations: one counts the number of successes before a fixed number of failures (\( r \)), and the other counts the number of failures before a fixed number of successes. In both, \( p \) denotes the probability of success on each trial. Under the first convention, where \( X \) is the number of successes observed before the \( r \)-th failure, the probability mass function (PMF) is given by:

\[ P(X = k) = \binom{k + r - 1}{k} (1 - p)^r p^k \]

where:

- \( k \) is the number of successes,
- \( r \) is the number of failures,
- \( p \) is the probability of success in each trial,
- \( \binom{k + r - 1}{k} \) is a binomial coefficient.
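The PMF above can be evaluated directly. The following is a minimal sketch, assuming an integer \( r \) so that the binomial coefficient can be computed with `math.comb`; the function name `nbinom_pmf` and the values of `r` and `p` are illustrative choices, not part of the definition.

```python
from math import comb


def nbinom_pmf(k: int, r: int, p: float) -> float:
    """P(X = k): k successes before the r-th failure, success probability p."""
    if k < 0:
        return 0.0
    return comb(k + r - 1, k) * (1 - p) ** r * p ** k


# Hypothetical parameter values, chosen only for illustration.
r, p = 3, 0.4

# Probability of exactly 2 successes before the 3rd failure.
print(nbinom_pmf(2, r, p))

# The probabilities over k = 0, 1, 2, ... should sum to (approximately) 1.
probs = [nbinom_pmf(k, r, p) for k in range(200)]
print(sum(probs))
```

Note that some libraries use the other convention (failures before a fixed number of successes), so the roles of \( p \) and \( 1 - p \) may need to be swapped when comparing results.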

The mean (expected value) and variance of the negative binomial distribution are given by:

\[ \text{Mean} = \frac{rp}{1 - p} \]

\[ \text{Variance} = \frac{rp}{(1 - p)^2} \]

Because \( 0 < p < 1 \), the variance is always larger than the mean.
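As a sanity check, the moments can be recovered by summing the PMF numerically. The sketch below assumes the parameterization of the PMF given earlier (\( p \) is the success probability and \( r \) the number of failures), for which the series gives a mean of \( rp/(1 - p) \) and a variance of \( rp/(1 - p)^2 \). The helper names, the truncation point `terms`, and the parameter values are illustrative assumptions.

```python
from math import comb


def nbinom_pmf(k: int, r: int, p: float) -> float:
    """P(X = k): k successes before the r-th failure, success probability p."""
    return comb(k + r - 1, k) * (1 - p) ** r * p ** k


def moments_by_summation(r: int, p: float, terms: int = 500):
    """Approximate the mean and variance by truncating the infinite series."""
    mean = sum(k * nbinom_pmf(k, r, p) for k in range(terms))
    second_moment = sum(k * k * nbinom_pmf(k, r, p) for k in range(terms))
    return mean, second_moment - mean ** 2


# Hypothetical parameter values, chosen only for illustration.
r, p = 3, 0.4
mean, var = moments_by_summation(r, p)

print(mean, r * p / (1 - p))       # numerical vs. closed-form mean
print(var, r * p / (1 - p) ** 2)   # numerical vs. closed-form variance
```

The truncation at 500 terms is adequate here because the tail decays like \( p^k \); values of \( p \) very close to 1 would need more terms.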

Relationship to Other Distributions

The negative binomial distribution is closely related to several other probability distributions. It generalizes the geometric distribution, which is the special case \( r = 1 \). It also arises as a gamma-Poisson mixture: a Poisson distribution whose rate parameter is itself gamma-distributed. This mixture representation is the reason it is used to model over-dispersed count data.
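The gamma-Poisson relationship can be illustrated by simulation. The sketch below assumes the rate is drawn from a gamma distribution with shape \( r \) and scale \( p/(1 - p) \), which corresponds to the parameterization in which \( p \) is the success probability. The Poisson sampler uses Knuth's multiplication method because the Python standard library does not provide one; all function names, the seed, and the parameter values are illustrative.

```python
import math
import random


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for small to moderate rates."""
    threshold = math.exp(-lam)
    k = 0
    prod = rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def sample_gamma_poisson(r: float, p: float, rng: random.Random) -> int:
    """Draw a rate from Gamma(shape=r, scale=p/(1-p)), then a Poisson count."""
    lam = rng.gammavariate(r, p / (1 - p))
    return sample_poisson(lam, rng)


# Hypothetical parameter values and seed, chosen only for illustration.
rng = random.Random(0)
r, p = 3, 0.4
draws = [sample_gamma_poisson(r, p, rng) for _ in range(20000)]

sample_mean = sum(draws) / len(draws)
sample_var = sum((x - sample_mean) ** 2 for x in draws) / (len(draws) - 1)

# Compare with the negative binomial mean r*p/(1-p) and variance r*p/(1-p)^2.
print(sample_mean, r * p / (1 - p))
print(sample_var, r * p / (1 - p) ** 2)
```

The sample variance should come out clearly larger than the sample mean, which is the over-dispersion that a plain Poisson model cannot reproduce.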

Applications

The negative binomial distribution is widely used in fields such as ecology, epidemiology, and insurance. In ecology, it models the number of individuals of a species found in a given area, accounting for over-dispersion. In epidemiology, it is used to model the number of disease outbreaks or cases. In insurance, it is used to model the number of claims or losses.

Estimation and Inference

Parameter estimation for the negative binomial distribution can be performed using methods such as maximum likelihood estimation (MLE) and the method of moments. MLE finds the parameter values that maximize the likelihood function; because there is no closed-form solution for \( r \), this generally requires numerical optimization. The method of moments equates the sample mean and variance to their theoretical counterparts and solves for the parameters.
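The method of moments can be sketched in a few lines. Assuming the parameterization in which \( p \) is the success probability and \( r \) the number of failures, the mean is \( m = rp/(1 - p) \) and the variance is \( v = rp/(1 - p)^2 \), so \( v/m = 1/(1 - p) \). Solving gives \( \hat{p} = 1 - m/v \) and \( \hat{r} = m^2/(v - m) \). The function name and the data below are hypothetical.

```python
from statistics import mean, variance


def fit_method_of_moments(counts):
    """Return (r_hat, p_hat) by matching the sample mean and variance."""
    m = float(mean(counts))
    v = float(variance(counts))  # unbiased sample variance
    if v <= m:
        # The negative binomial requires variance > mean.
        raise ValueError("sample is not over-dispersed; estimates undefined")
    p_hat = 1 - m / v
    r_hat = m * m / (v - m)
    return r_hat, p_hat


# Made-up counts, used only to show the calculation.
counts = [0, 1, 2, 3, 4, 8]
r_hat, p_hat = fit_method_of_moments(counts)
print(r_hat, p_hat)
```

The estimate of \( r \) is generally not an integer, and the estimator fails when the sample variance does not exceed the sample mean. In practice these moment estimates are often used as starting values for a numerical MLE routine.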

Example

Consider a researcher studying how many plants survive exposure to a certain pesticide. Each plant is treated as an independent trial in which the plant survives with probability \( p \). The number of plants that survive before the \( r \)-th plant dies then follows a negative binomial distribution with parameters \( r \) and \( p \).
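The scenario can be simulated directly and compared with the PMF. The sketch below assumes a survival probability of 0.7 and stops after 3 deaths; these numbers, the seed, and the function names are hypothetical and serve only to illustrate the mechanics.

```python
import random
from math import comb


def survivors_before_deaths(r: int, p: float, rng: random.Random) -> int:
    """Run trials until r plants have died; return how many survived."""
    survivors = 0
    deaths = 0
    while deaths < r:
        if rng.random() < p:
            survivors += 1
        else:
            deaths += 1
    return survivors


def nbinom_pmf(k: int, r: int, p: float) -> float:
    """P(X = k): k successes (survivals) before the r-th failure (death)."""
    return comb(k + r - 1, k) * (1 - p) ** r * p ** k


# Hypothetical values, chosen only for illustration.
rng = random.Random(1)
r, p = 3, 0.7
n_runs = 50000
counts = [survivors_before_deaths(r, p, rng) for _ in range(n_runs)]

k = 5
empirical = counts.count(k) / n_runs   # simulated frequency of 5 survivors
exact = nbinom_pmf(k, r, p)            # probability from the PMF
print(empirical, exact)
```

With enough runs, the simulated frequency should agree with the PMF value to within sampling error.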

Generalizations and Extensions

The negative binomial distribution can be extended to the zero-inflated negative binomial distribution, which accounts for excess zeros in the data. Another extension is negative binomial regression, which models over-dispersed count data by letting the mean depend on covariates.
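A zero-inflated PMF can be sketched as a two-part mixture. The code below assumes the common formulation in which an observation is a structural zero with probability \( \pi \) and otherwise follows the negative binomial PMF; the symbol \( \pi \) (written `pi_zero`), the function names, and the parameter values are assumptions introduced for illustration.

```python
from math import comb


def nbinom_pmf(k: int, r: int, p: float) -> float:
    """P(X = k): k successes before the r-th failure, success probability p."""
    return comb(k + r - 1, k) * (1 - p) ** r * p ** k


def zinb_pmf(k: int, r: int, p: float, pi_zero: float) -> float:
    """Zero-inflated PMF: extra mass pi_zero at zero, the rest scaled down."""
    base = (1 - pi_zero) * nbinom_pmf(k, r, p)
    return pi_zero + base if k == 0 else base


# Hypothetical parameter values, chosen only for illustration.
r, p, pi_zero = 3, 0.4, 0.2

print(nbinom_pmf(0, r, p))          # probability of zero without inflation
print(zinb_pmf(0, r, p, pi_zero))   # probability of zero with inflation

# The zero-inflated probabilities should still sum to (approximately) 1.
total = sum(zinb_pmf(k, r, p, pi_zero) for k in range(200))
print(total)
```

Setting `pi_zero` to 0 recovers the ordinary negative binomial distribution.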