Distribution functions

In probability theory and statistics, a distribution function, also known as a cumulative distribution function (CDF), is a function that gives, for each real number x, the probability that a real-valued random variable X takes a value less than or equal to x. The CDF determines the probability distribution of X completely: two random variables with the same CDF have the same distribution.

Definition

Formally, the cumulative distribution function F(x) of a random variable X is defined as:

\[ F(x) = P(X \leq x) \]

where P denotes probability. The CDF is defined for every real number x, is non-decreasing and right-continuous, and takes values in the interval [0, 1].

Properties

Distribution functions have several important properties:

**Non-decreasing**: F(x) is a non-decreasing function, meaning that if \( x_1 \leq x_2 \), then \( F(x_1) \leq F(x_2) \).
**Right-continuous**: F(x) is right-continuous, meaning that \( \lim_{x \to x_0^+} F(x) = F(x_0) \).
**Limits**: \( \lim_{x \to -\infty} F(x) = 0 \) and \( \lim_{x \to \infty} F(x) = 1 \).
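
As a quick numerical illustration of these properties, the sketch below checks monotonicity and the two limits for the standard logistic CDF, \( F(x) = 1/(1 + e^{-x}) \). The logistic distribution is not discussed in this article; it is assumed here only because its CDF has a simple closed form, and the name `logistic_cdf` and the evaluation grid are illustrative choices.

```python
import math


def logistic_cdf(x: float) -> float:
    """CDF of the standard logistic distribution (illustrative choice)."""
    # Use a form that avoids overflow in exp() for large |x|.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


# Evaluate F on an evenly spaced grid from -10 to 10.
grid = [i / 10 for i in range(-100, 101)]
values = [logistic_cdf(x) for x in grid]

# Non-decreasing: each value is at least as large as the previous one.
is_non_decreasing = all(a <= b for a, b in zip(values, values[1:]))
print("non-decreasing on grid:", is_non_decreasing)

# Limits: F approaches 0 far to the left and 1 far to the right.
print("F(-50) =", logistic_cdf(-50.0))
print("F(+50) =", logistic_cdf(50.0))
```

A grid check like this can only suggest that the properties hold; it is not a proof.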

Types of Distribution Functions

Distribution functions can be classified based on the nature of the random variable they describe:

**Discrete Distribution Functions**: For a discrete random variable, the CDF is a step function. Each jump in the function corresponds to a possible value of the random variable.
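**Continuous Distribution Functions**: For a continuous random variable, the CDF is a continuous function with no jumps. When the distribution has a probability density function (PDF) f, the CDF is its integral, \( F(x) = \int_{-\infty}^{x} f(t)\,dt \), and the derivative of the CDF equals f(x) at every point where f is continuous.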
**Mixed Distribution Functions**: Some random variables may have both discrete and continuous components, leading to a mixed distribution function.
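
As an illustration of the mixed case (a constructed example, not one taken from the text above), let U be uniform on [0, 1] and set \( X = \min(U, 1/2) \). Then X behaves continuously below 1/2 and places probability 1/2 on the single point 1/2, so its CDF rises linearly and then jumps:

\[ F(x) = \begin{cases} 0 & \text{if } x < 0 \\ x & \text{if } 0 \leq x < 1/2 \\ 1 & \text{if } x \geq 1/2 \end{cases} \]

The jump of size 1/2 at \( x = 1/2 \) is the discrete component, and the linear segment on [0, 1/2) is the continuous component.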

Examples

Discrete Distribution Example

Consider a discrete random variable X that takes values 1, 2, and 3 with probabilities 0.2, 0.5, and 0.3, respectively. The CDF of X is given by:

\[ F(x) = \begin{cases} 0 & \text{if } x < 1 \\ 0.2 & \text{if } 1 \leq x < 2 \\ 0.7 & \text{if } 2 \leq x < 3 \\ 1 & \text{if } x \geq 3 \end{cases} \]
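
The step function above can be sketched in code by accumulating the probabilities and looking up how many support points lie at or below x. This is a minimal sketch; the names `support`, `probs`, and `discrete_cdf` are illustrative.

```python
from bisect import bisect_right
from itertools import accumulate

support = [1, 2, 3]          # possible values of X, in increasing order
probs = [0.2, 0.5, 0.3]      # P(X = 1), P(X = 2), P(X = 3)
cumulative = list(accumulate(probs))  # running sums of the probabilities


def discrete_cdf(x: float) -> float:
    """Return F(x) = P(X <= x) for the three-point distribution above."""
    k = bisect_right(support, x)  # number of support points <= x
    return cumulative[k - 1] if k > 0 else 0.0


for x in (0.5, 1, 1.5, 2, 2.999, 3, 10):
    print(x, discrete_cdf(x))
```

Evaluating just to the left of a support point and at the point itself shows the jump: the function already takes its higher value at x = 2, which is the right-continuity property.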

Continuous Distribution Example

Consider a continuous random variable X with a uniform distribution on the interval [0, 1]. The CDF of X is given by:

\[ F(x) = \begin{cases} 0 & \text{if } x < 0 \\ x & \text{if } 0 \leq x \leq 1 \\ 1 & \text{if } x > 1 \end{cases} \]
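
A short sketch of this CDF follows, together with the standard identity \( P(a < X \leq b) = F(b) - F(a) \) for computing interval probabilities. The function names are illustrative.

```python
def uniform_cdf(x: float) -> float:
    """CDF of the uniform distribution on [0, 1]: x clamped to [0, 1]."""
    return min(max(x, 0.0), 1.0)


def interval_probability(a: float, b: float) -> float:
    """P(a < X <= b) computed as F(b) - F(a), assuming a <= b."""
    return uniform_cdf(b) - uniform_cdf(a)


print(uniform_cdf(-1.0))               # left of the support
print(uniform_cdf(0.25))               # inside the support
print(uniform_cdf(2.0))                # right of the support
print(interval_probability(0.2, 0.5))  # length of the interval (0.2, 0.5]
```

For the uniform distribution the interval probability is just the length of the part of the interval that lies inside [0, 1].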

Applications

Distribution functions are used in various fields, including:

**Statistics**: To summarize data and make inferences about populations.
**Economics**: To model income distributions and other economic variables.
**Engineering**: In reliability analysis and quality control.
**Finance**: To model asset returns and risk.

Relationship with Other Functions

The CDF is closely related to other functions in probability theory:

**Probability Density Function (PDF)**: For a continuous random variable with a density f, the PDF is the derivative of the CDF, \( f(x) = F'(x) \), at every point where the derivative exists.
**Survival Function**: The survival function, S(x), is defined as \( S(x) = 1 - F(x) \) and represents the probability that the random variable is greater than x.
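**Quantile Function**: The quantile function, Q(p), is the generalized inverse of the CDF, \( Q(p) = \inf\{x : F(x) \geq p\} \) for \( 0 < p < 1 \). When F is continuous and strictly increasing, Q is the ordinary inverse of F and gives the unique value x such that \( F(x) = p \).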
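
A step-function CDF has no ordinary inverse, so a common convention takes the quantile to be the smallest x with \( F(x) \geq p \). The sketch below applies that convention, and the survival function, to a three-point distribution on the values 1, 2, 3 with probabilities 0.2, 0.5, 0.3. All function names are illustrative.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate

support = [1, 2, 3]
probs = [0.2, 0.5, 0.3]
cumulative = list(accumulate(probs))  # F evaluated at each support point


def discrete_cdf(x: float) -> float:
    """F(x) = P(X <= x)."""
    k = bisect_right(support, x)  # number of support points <= x
    return cumulative[k - 1] if k > 0 else 0.0


def survival(x: float) -> float:
    """S(x) = 1 - F(x) = P(X > x)."""
    return 1.0 - discrete_cdf(x)


def quantile(p: float) -> int:
    """Smallest support point x with F(x) >= p, for 0 < p <= 1."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must lie in (0, 1]")
    k = bisect_left(cumulative, p)  # first index with cumulative[k] >= p
    # Clamp in case rounding leaves the final running sum slightly below 1.
    return support[min(k, len(support) - 1)]


print(survival(2))    # probability that X exceeds 2
print(quantile(0.5))  # median under this convention
print(quantile(0.9))
```

Note that every p in (0.2, 0.7] maps to the same quantile, 2, which is why Q cannot be an inverse in the usual sense here.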

Multivariate Distribution Functions

For multivariate random variables, the joint cumulative distribution function is used. For random variables X and Y, the joint CDF is defined as:

\[ F_{X,Y}(x,y) = P(X \leq x, Y \leq y) \]

The joint CDF provides the probability that X is less than or equal to x and Y is less than or equal to y simultaneously.
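
As a sketch, assume X and Y are independent and each uniform on [0, 1]. Independence is an assumption made only for this example; it makes the joint CDF factor as \( F_{X,Y}(x,y) = F_X(x) F_Y(y) \). The probability of a rectangle then follows from the joint CDF by inclusion-exclusion. Function names are illustrative.

```python
def uniform_cdf(x: float) -> float:
    """CDF of the uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)


def joint_cdf(x: float, y: float) -> float:
    """F(x, y) = P(X <= x, Y <= y) for independent uniform X and Y."""
    return uniform_cdf(x) * uniform_cdf(y)


def rectangle_probability(a1: float, b1: float, a2: float, b2: float) -> float:
    """P(a1 < X <= b1, a2 < Y <= b2) via inclusion-exclusion on the joint CDF."""
    return (
        joint_cdf(b1, b2)
        - joint_cdf(a1, b2)
        - joint_cdf(b1, a2)
        + joint_cdf(a1, a2)
    )


print(joint_cdf(0.5, 0.5))
print(rectangle_probability(0.2, 0.6, 0.1, 0.5))
```

The inclusion-exclusion formula holds for any joint CDF; only the product form of `joint_cdf` depends on the independence assumption.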

Theoretical Considerations

In theoretical contexts, distribution functions are used to study the properties of random variables and their distributions. Important tools and results include:

**Kolmogorov-Smirnov Test**: A non-parametric test that compares the empirical distribution function of a sample with a reference CDF, using the largest absolute difference between the two functions as its test statistic.
**Glivenko-Cantelli Theorem**: States that, for independent and identically distributed observations, the empirical distribution function converges uniformly to the true distribution function, almost surely, as the sample size tends to infinity.
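
Both ideas can be sketched with the empirical distribution function \( F_n(x) \), the fraction of observations less than or equal to x. The code below computes \( \sup_x |F_n(x) - F(x)| \) against the uniform CDF on [0, 1] for simulated samples of increasing size. The data are pseudo-random draws generated for illustration, the function names are hypothetical, and the statistic formula assumes a continuous reference CDF.

```python
import random
from bisect import bisect_right


def uniform_cdf(x: float) -> float:
    """Reference CDF: uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)


def make_ecdf(sample):
    """Return the empirical distribution function of the sample."""
    data = sorted(sample)
    n = len(data)

    def ecdf(x: float) -> float:
        return bisect_right(data, x) / n  # fraction of observations <= x

    return ecdf


def ks_statistic(sample, cdf) -> float:
    """Largest absolute gap between the empirical CDF and a continuous cdf."""
    data = sorted(sample)
    n = len(data)
    d = 0.0
    for i, x in enumerate(data):
        fx = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at the i-th order statistic,
        # so the gap is checked just before and at each jump.
        d = max(d, (i + 1) / n - fx, fx - i / n)
    return d


rng = random.Random(0)  # fixed seed so the run is reproducible
for n in (100, 1_000, 10_000):
    sample = [rng.random() for _ in range(n)]
    print(n, ks_statistic(sample, uniform_cdf))
```

The printed distances should tend to shrink as n grows, which is the behaviour the Glivenko-Cantelli theorem describes. Turning the statistic into a p-value requires the Kolmogorov distribution, which is not implemented here.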

Practical Considerations

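In practice, distribution functions are usually unknown and must be estimated from data. The empirical distribution function, which assigns to each x the fraction of observations less than or equal to x, is a non-parametric estimate. Alternatively, a parametric family of distributions can be fitted to the data, commonly by maximum likelihood estimation or Bayesian inference, and the CDF of the fitted model used as the estimate.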
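
As a minimal sketch of the parametric route, the code below fits an exponential model by maximum likelihood and compares the fitted CDF with the empirical one at a few points. The exponential family, the simulated data, and all function names are assumptions made for illustration. For this family the maximum likelihood estimate of the rate is the reciprocal of the sample mean.

```python
import math
import random
from bisect import bisect_right


def fit_exponential_rate(sample) -> float:
    """Maximum likelihood estimate of the exponential rate: 1 / sample mean."""
    return len(sample) / sum(sample)


def exponential_cdf(x: float, rate: float) -> float:
    """CDF of the exponential distribution: 1 - exp(-rate * x) for x > 0."""
    return 1.0 - math.exp(-rate * x) if x > 0 else 0.0


rng = random.Random(42)  # fixed seed so the run is reproducible
sample = [rng.expovariate(2.0) for _ in range(5000)]  # simulated data, true rate 2.0

rate_hat = fit_exponential_rate(sample)
print("estimated rate:", rate_hat)

data = sorted(sample)
for x in (0.1, 0.5, 1.0):
    empirical = bisect_right(data, x) / len(data)  # fraction of observations <= x
    fitted = exponential_cdf(x, rate_hat)
    print(x, round(empirical, 3), round(fitted, 3))
```

When the model family is appropriate, the two columns should agree closely. A large gap between them suggests the family is a poor fit for the data.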