Hypergeometric distribution

From Canonica AI

Introduction

The hypergeometric distribution is a discrete probability distribution that gives the probability of obtaining exactly k successes in n draws, made without replacement, from a finite population of size N that contains exactly K successes. It is particularly useful when the sample is a significant fraction of the population, because each draw then noticeably changes the composition of what remains. This distinguishes it from the binomial distribution, which models draws with replacement, that is, independent trials with a fixed success probability.

Definition

Formally, the probability mass function (PMF) of the hypergeometric distribution is given by:

\[ P(X = k) = \frac{\binom{K}{k} \binom{N-K}{n-k}}{\binom{N}{n}} \]

where:

- \( \binom{a}{b} \) denotes a binomial coefficient, which is the number of ways to choose b elements from a set of a elements.
- \( N \) is the population size.
- \( K \) is the number of success states in the population.
- \( n \) is the number of draws.
- \( k \) is the number of observed successes.
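The PMF translates almost directly into code. The following is a minimal sketch in Python using the standard library's `math.comb`; the function name `hypergeom_pmf` and its argument order are illustrative choices, not part of any library.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    """P(X = k) for n draws without replacement from N items, K of them successes."""
    # math.comb raises ValueError on negative arguments, so values of k
    # outside the support are handled explicitly and get probability 0.
    if k < max(0, n + K - N) or k > min(K, n):
        return 0.0
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# Probability of exactly 2 aces in a 5-card hand from a 52-card deck.
print(hypergeom_pmf(2, N=52, K=4, n=5))
```

Because Python integers have arbitrary precision, the binomial coefficients are computed exactly and only the final division is rounded to floating point.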

Properties

Mean and Variance

The mean \( \mu \) and variance \( \sigma^2 \) of a hypergeometric distribution are given by:

\[ \mu = n \frac{K}{N} \]

\[ \sigma^2 = n \frac{K}{N} \left(1 - \frac{K}{N}\right) \frac{N - n}{N - 1} \]

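The mean is the same as that of a binomial distribution with success probability \( K/N \). The variance is the corresponding binomial variance multiplied by the factor \( \frac{N - n}{N - 1} \), known as the finite population correction. This factor is less than 1 whenever \( n > 1 \) and reaches 0 when \( n = N \): because the population shrinks as items are drawn without replacement, the outcome becomes less variable, and sampling the entire population leaves no uncertainty at all.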
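The closed-form mean and variance can be checked against a direct summation over the PMF. The sketch below uses exact rational arithmetic from Python's `fractions` module, so the comparison is free of rounding error; the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def moments_by_enumeration(N: int, K: int, n: int):
    """Mean and variance obtained by summing over every value in the support."""
    lo, hi = max(0, n + K - N), min(K, n)
    total = comb(N, n)
    pmf = {k: Fraction(comb(K, k) * comb(N - K, n - k), total)
           for k in range(lo, hi + 1)}
    mean = sum(k * p for k, p in pmf.items())
    var = sum((k - mean) ** 2 * p for k, p in pmf.items())
    return mean, var

def moments_closed_form(N: int, K: int, n: int):
    """Mean and variance from the formulas above (requires N > 1)."""
    p = Fraction(K, N)
    mean = n * p
    var = n * p * (1 - p) * Fraction(N - n, N - 1)
    return mean, var

# Both approaches should agree exactly for any valid parameters.
print(moments_by_enumeration(52, 4, 5))
print(moments_closed_form(52, 4, 5))
```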

Support

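The support of the hypergeometric distribution is the set of integers \( k \) for which the PMF is non-zero. Specifically, \( k \) can take values from \( \max(0, n + K - N) \) to \( \min(K, n) \). The upper bound holds because the sample cannot contain more successes than there are draws or than exist in the population. The lower bound holds because the population contains only \( N - K \) failures, so when \( n > N - K \) at least \( n - (N - K) \) of the draws must be successes.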

Applications

The hypergeometric distribution is widely used in various fields such as:

Quality Control

In quality control, the hypergeometric distribution can be used to model the number of defective items in a sample drawn from a batch. For example, if a factory produces a batch of 1000 items with 50 defective items, the distribution can help determine the probability of finding a certain number of defective items in a sample of 10.
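As an illustration, the batch described above (1000 items, 50 defective, sample of 10) can be evaluated with a few lines of Python. This is a sketch using only the standard library; the variable names are illustrative.

```python
from math import comb

N, K, n = 1000, 50, 10  # batch size, defective items in the batch, sample size

def pmf(k: int) -> float:
    """Probability of exactly k defective items in the sample (0 <= k <= n)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

p_none = pmf(0)                                 # sample contains no defectives
p_at_least_one = 1 - p_none                     # sample reveals at least one defective
p_at_most_two = sum(pmf(k) for k in range(3))   # 0, 1 or 2 defectives

print(f"P(no defectives)     = {p_none:.4f}")
print(f"P(at least one)      = {p_at_least_one:.4f}")
print(f"P(at most two)       = {p_at_most_two:.4f}")
```

Cumulative quantities such as "at most two defectives" are sums of individual PMF values, which is how acceptance sampling rules are typically evaluated.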

Ecology

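Ecologists use the hypergeometric distribution to estimate population sizes. For instance, when studying a population of animals, researchers might capture, tag, and release a number of individuals, and then recapture a sample. If the tagged animals have mixed back into the population, the number of tagged individuals in the second sample follows a hypergeometric distribution in which the unknown population size plays the role of \( N \), so the observed count can be used to estimate the total population size.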
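One way to turn a capture-recapture experiment into an estimate is to choose the population size N that makes the observed number of tagged recaptures most likely under the hypergeometric model. The sketch below does this by brute-force search; the function names, the search limit `max_N`, and the example counts are hypothetical. The result can be compared with the simple ratio K·n/k, commonly known as the Lincoln-Petersen estimate.

```python
from fractions import Fraction
from math import comb

def likelihood(N: int, K: int, n: int, k: int) -> Fraction:
    """Hypergeometric probability of k tagged animals in a recapture of n,
    given K tagged animals in a population of size N."""
    if N < K + n - k:  # population too small to produce this observation
        return Fraction(0)
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))

def estimate_population(K: int, n: int, k: int, max_N: int = 2000) -> int:
    """Maximum-likelihood N found by scanning candidates up to max_N.
    Assumes k > 0; with k = 0 the likelihood keeps growing with N."""
    candidates = range(K + n - k, max_N + 1)
    return max(candidates, key=lambda N: likelihood(N, K, n, k))

# Hypothetical experiment: 100 animals tagged, 80 recaptured, 21 of them tagged.
K, n, k = 100, 80, 21
print(estimate_population(K, n, k), K * n / k)
```

Exact fractions are used so that candidates with nearly equal likelihood are compared without rounding error; for large populations a log-space computation would be more practical.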

Card Games

In card games, the hypergeometric distribution can be used to calculate the probability that a hand dealt from a deck contains a given number of cards of a particular kind. For example, in poker, it can determine the likelihood that a five-card hand contains a specified number of aces or a specified number of cards of one suit.

Calculation Methods

Direct Computation

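Direct computation of the hypergeometric PMF involves calculating binomial coefficients. For large populations these coefficients become extremely large, so a naive evaluation is slow and can overflow fixed-width integer or floating-point types even though the final probability lies between 0 and 1. Common remedies are to use arbitrary-precision integers or to work with logarithms of the coefficients. Efficient algorithms and libraries are available in various programming languages to handle these calculations.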
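A common way to avoid very large intermediate values is to work with logarithms of the binomial coefficients, computed through the log-gamma function. The following is a minimal sketch using `math.lgamma` from the Python standard library; the function names are illustrative, and the result is accurate only to floating-point precision.

```python
from math import exp, lgamma

def log_comb(a: int, b: int) -> float:
    """Natural log of the binomial coefficient C(a, b), for 0 <= b <= a."""
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)

def hypergeom_pmf_log(k: int, N: int, K: int, n: int) -> float:
    """Hypergeometric PMF evaluated in log space to avoid huge intermediates."""
    if k < max(0, n + K - N) or k > min(K, n):
        return 0.0
    log_p = log_comb(K, k) + log_comb(N - K, n - k) - log_comb(N, n)
    return exp(log_p)

# Small case, for comparison with the exact formula.
print(hypergeom_pmf_log(2, 52, 4, 5))
# Large population, where the coefficients themselves would be astronomically large.
print(hypergeom_pmf_log(500, 10**6, 10**5, 5000))
```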

Approximation

For large populations, the hypergeometric distribution can be approximated by the binomial distribution or the normal distribution. The binomial approximation is suitable when the sample size is small relative to the population size, while the normal approximation is used when both the sample size and the number of successes are large.
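The quality of the binomial approximation can be examined numerically by comparing the two PMFs over all possible outcomes. The sketch below reports the largest absolute difference for a fixed sample size and success fraction while the population grows; the function names and the chosen parameters are illustrative.

```python
from math import comb

def hypergeom_pmf(k: int, N: int, K: int, n: int) -> float:
    # For 0 <= k <= n all arguments are non-negative, and math.comb
    # returns 0 whenever the lower index exceeds the upper one.
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def binom_pmf(k: int, n: int, p: float) -> float:
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def max_abs_error(N: int, K: int, n: int) -> float:
    """Largest pointwise gap between the hypergeometric PMF and its
    binomial approximation with success probability p = K / N."""
    p = K / N
    return max(abs(hypergeom_pmf(k, N, K, n) - binom_pmf(k, n, p))
               for k in range(n + 1))

# Same sample size (10) and success fraction (5%), increasing population size.
for N in (100, 1000, 10000):
    print(N, max_abs_error(N, N // 20, 10))
```

The gap shrinks as the population grows relative to the sample, in line with the condition stated above.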

Relationship to Other Distributions

Binomial Distribution

The hypergeometric distribution is related to the binomial distribution. While the binomial distribution assumes independent trials with replacement, the hypergeometric distribution deals with dependent trials without replacement. When the population size is large compared to the sample size, the hypergeometric distribution approaches the binomial distribution.

Negative Hypergeometric Distribution

The negative hypergeometric distribution is a variation where the number of failures is fixed, and the number of draws is random. It describes the probability of drawing a certain number of successes before a specified number of failures occurs.
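For comparison, the negative hypergeometric PMF can be sketched in the same style. The parametrization is an assumption made here for illustration: N is the population size, K the number of successes in the population, and r the number of failures at which drawing stops (with r at most N − K). Under that convention, the probability of observing exactly k successes before the r-th failure is \( \binom{k+r-1}{k} \binom{N-r-k}{K-k} / \binom{N}{K} \).

```python
from fractions import Fraction
from math import comb

def neg_hypergeom_pmf(k: int, N: int, K: int, r: int) -> Fraction:
    """P(exactly k successes are drawn before the r-th failure).

    Assumed parametrization: N items, K successes, stop at the r-th
    failure, with 1 <= r <= N - K. The support is k = 0, ..., K.
    """
    if not (1 <= r <= N - K):
        raise ValueError("r must satisfy 1 <= r <= N - K")
    if k < 0 or k > K:
        return Fraction(0)
    return Fraction(comb(k + r - 1, k) * comb(N - r - k, K - k), comb(N, K))

# Small check: 2 successes and 2 failures, stopping at the 2nd failure.
print([neg_hypergeom_pmf(k, N=4, K=2, r=2) for k in range(3)])
```

The small case can be verified by hand by listing the six equally likely orderings of two successes and two failures.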

Examples

Consider a scenario where a deck of 52 playing cards contains 4 aces. If 5 cards are drawn without replacement, the probability of drawing exactly 2 aces can be calculated using the hypergeometric distribution.

\[ P(X = 2) = \frac{\binom{4}{2} \binom{48}{3}}{\binom{52}{5}} \]

Calculating the binomial coefficients:

\[ \binom{4}{2} = 6 \]

\[ \binom{48}{3} = 17296 \]

\[ \binom{52}{5} = 2598960 \]

Thus,

\[ P(X = 2) = \frac{6 \times 17296}{2598960} \approx 0.0399 \]
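The same calculation can be extended to every possible number of aces in the hand. The sketch below tabulates the full distribution with exact fractions, using only the Python standard library; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

N, K, n = 52, 4, 5  # cards in the deck, aces in the deck, cards drawn

# Exact probability of each possible number of aces (0 through 4).
dist = {
    k: Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))
    for k in range(0, min(K, n) + 1)
}

for k, p in dist.items():
    print(f"{k} aces: {float(p):.4f}")

# Probability of holding at least one ace.
print(f"at least one ace: {float(1 - dist[0]):.4f}")
```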
