Maximum likelihood estimation


Introduction

Maximum likelihood estimation (MLE) is a method of estimating the parameters of a statistical model from observed data, by finding the parameter values that make the observed data most probable under the model. MLE can be seen as a special case of maximum a posteriori (MAP) estimation in which the prior distribution over the parameters is uniform. The method, introduced by the English statistician Ronald A. Fisher, has found widespread application in fields including statistics, machine learning, and economics.

Image: A computer screen showing a statistical model with parameters being adjusted for maximum likelihood estimation.

Mathematical Formulation

Given n independent and identically distributed observations x1, x2, ..., xn drawn from a distribution with probability density (or mass) function f(x|θ), where θ is a vector of parameters, the likelihood function L(θ|x) is the product of the density (or mass) function evaluated at each observation:

L(θ|x) = f(x1|θ) * f(x2|θ) * ... * f(xn|θ)

The maximum likelihood estimate θ̂ is the value of θ that maximizes this likelihood function. In practice, it is often more convenient to maximize the natural logarithm of the likelihood, known as the log-likelihood: since the logarithm is strictly increasing, both functions have the same maximizer, and the log turns the product into a sum ℓ(θ|x) = log f(x1|θ) + ... + log f(xn|θ), which is easier to differentiate and more stable numerically.
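As a concrete sketch of this procedure (assuming Python with NumPy and SciPy; the normal model, true parameter values, and variable names are illustrative choices, not taken from the text above), the following minimizes the negative log-likelihood of simulated data numerically:

    import numpy as np
    from scipy.optimize import minimize

    # Simulate observations from a normal model N(mu, sigma^2); the true
    # values mu = 5.0 and sigma = 2.0 are chosen only for illustration.
    rng = np.random.default_rng(0)
    x = rng.normal(loc=5.0, scale=2.0, size=1000)

    def neg_log_likelihood(params, data):
        mu, log_sigma = params            # optimize log(sigma) so sigma > 0
        sigma = np.exp(log_sigma)
        # Negative of sum_i log f(x_i | theta) for the normal density.
        return -np.sum(-0.5 * np.log(2 * np.pi * sigma**2)
                       - (data - mu)**2 / (2 * sigma**2))

    result = minimize(neg_log_likelihood, x0=[0.0, 0.0], args=(x,))
    mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
    print(mu_hat, sigma_hat)  # close to the sample mean and sample std

For the normal model this optimum is also available in closed form (the sample mean and the square root of the mean squared deviation), but the numerical approach carries over to models without closed-form estimates.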

Properties of Maximum Likelihood Estimators

Maximum likelihood estimators have several desirable properties:

- Consistency: As the sample size increases, the MLE converges in probability to the true parameter value.
- Asymptotic normality: Under suitable regularity conditions, the distribution of the MLE approaches a normal distribution as the sample size increases.
- Asymptotic efficiency: As the sample size increases, the variance of the MLE attains the Cramér-Rao lower bound, so no consistent estimator has a smaller asymptotic variance.
- Invariance: If θ̂ is the MLE of θ and g is any function, then g(θ̂) is the MLE of g(θ), as illustrated in the sketch after this list.
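A minimal sketch of the invariance property, again using simulated normal data (the parameter values are illustrative): the MLE of the variance of a normal sample is the mean squared deviation, and by invariance the MLE of the standard deviation is simply its square root.

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.normal(loc=0.0, scale=3.0, size=500)

    # MLE of sigma^2 for normal data: divide by n, not n - 1.
    var_mle = np.mean((x - x.mean())**2)
    # Invariance: the MLE of g(theta) is g applied to the MLE of theta.
    sigma_mle = np.sqrt(var_mle)
    print(var_mle, sigma_mle)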

Applications of Maximum Likelihood Estimation

Maximum likelihood estimation is used in many areas of statistics and machine learning:

- In regression analysis, MLE can be used to estimate the parameters of the underlying distribution of the error terms (see the sketch after this list).
- In machine learning, MLE is often used for supervised learning tasks, where the goal is to learn a function that maps inputs to outputs from example input-output pairs.
- In econometrics, MLE is used to estimate the parameters of economic models.
- In signal processing, MLE can be used to estimate parameters such as the mean and variance of a signal.
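As a hedged illustration of the regression case (assuming a simple linear model y = a·x + b with Gaussian errors; the coefficients and noise level below are illustrative): maximizing the likelihood over the coefficients is then equivalent to ordinary least squares, and the MLE of the error variance is the mean squared residual.

    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.uniform(0, 10, size=200)
    y = 1.5 * x + 4.0 + rng.normal(scale=0.8, size=200)

    # Under Gaussian errors, the least-squares fit is the MLE of (a, b).
    a_hat, b_hat = np.polyfit(x, y, deg=1)
    residuals = y - (a_hat * x + b_hat)
    # MLE of the error variance divides by n (not n - 2 as in the
    # usual unbiased estimate).
    var_hat = np.mean(residuals**2)
    print(a_hat, b_hat, var_hat)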

Limitations and Criticisms

While MLE has many desirable properties, it also has some limitations:

- The likelihood function may not attain a maximum. This can occur if the likelihood is unbounded or if the parameter space is not compact.
- The MLE can be sensitive to the choice of model. If the model is misspecified, the MLE can be biased or inconsistent.
- The MLE need not be unique. For example, if the likelihood function is flat (i.e., takes the same value over a range of parameter values), every point in that range is a maximizer and the estimate is not uniquely defined.
- The MLE can be computationally intensive to calculate, especially for complex models with many parameters.

See Also

- Bayesian inference
- Expectation-maximization algorithm
- Method of moments (statistics)