Restricted maximum likelihood
Introduction
Restricted (or residual) maximum likelihood (REML) is a method for estimating the variance components of linear mixed models, widely used in statistics and biostatistics. It is particularly useful when the data are unbalanced or the sample size is small. Unlike ordinary maximum likelihood (ML), REML accounts for the degrees of freedom lost in estimating the fixed effects. This reduces the downward bias of ML variance-component estimates.
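Take the simplest case: independent observations $y_1, \dots, y_n \sim N(\mu, \sigma^2)$, where the mean $\mu$ is the only fixed effect ($p = 1$). Here the two criteria give the familiar divisors:

$$\hat\sigma^2_{\mathrm{ML}} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar y)^2, \qquad \hat\sigma^2_{\mathrm{REML}} = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar y)^2 .$$

Because $E[\hat\sigma^2_{\mathrm{ML}}] = \frac{n-1}{n}\sigma^2$, ML underestimates the variance, while the REML estimator is unbiased in this case. More generally, consider $y = X\beta + \varepsilon$ with $\varepsilon \sim N(0, \sigma^2 I_n)$ and $p$ fixed-effect parameters. REML divides the residual sum of squares by $n - p$ rather than $n$.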
Historical Background
The idea behind REML goes back to W. A. Thompson's work on variance components in the early 1960s. H. D. Patterson and R. Thompson developed it for unbalanced designs in 1971, and D. A. Harville extended and reviewed it in the 1970s. REML was proposed as an improvement over ordinary maximum likelihood, whose variance-component estimates are biased downward. The bias is largest when the sample size is small relative to the number of fixed effects. REML came into wide use in the 1980s and 1990s, when more powerful computers made it practical to fit complex models.
Theoretical Foundation
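REML is grounded in likelihood theory. It does not maximize the likelihood of the observed data directly. Instead, it maximizes the likelihood of a set of linear combinations of the data, called error contrasts, chosen so that their distribution does not involve the fixed effects. This likelihood depends only on the variance components, so the degrees of freedom spent on the fixed effects are not charged to the variance estimates. The resulting estimates are less biased than ML estimates. In balanced designs they coincide with the unbiased ANOVA estimators whenever those are non-negative, although REML estimates are not unbiased in general.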
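The construction can be written out in a standard notation, introduced here for illustration: $y = X\beta + Zu + \varepsilon$ with $u \sim N(0, G)$ and $\varepsilon \sim N(0, R)$. Then $y \sim N(X\beta, V)$ with $V = ZGZ^\top + R$. Here $X$ is $n \times p$ of full column rank, and $\theta$ collects the variance parameters in $G$ and $R$. Take $n - p$ error contrasts $w = K^\top y$, where $K$ has full column rank and $K^\top X = 0$. These satisfy $w \sim N(0, K^\top V K)$, which is free of $\beta$, and their log-likelihood can be written as

$$\ell_R(\theta) = -\frac{1}{2}\Big[\log|V| + \log|X^\top V^{-1} X| + y^\top P y\Big] + \text{const}, \qquad P = V^{-1} - V^{-1}X\,(X^\top V^{-1} X)^{-1} X^\top V^{-1}.$$

The constant does not depend on $\theta$. Also, $y^\top P y = (y - X\hat\beta)^\top V^{-1}(y - X\hat\beta)$, where $\hat\beta$ is the generalized least squares estimate. The ML log-likelihood profiled over $\beta$ lacks the term $\log|X^\top V^{-1} X|$, and that term is the adjustment for the degrees of freedom used by $\hat\beta$.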
Likelihood Function
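The REML likelihood can be derived in two equivalent ways. In the original derivation, the data y are transformed to n − p linearly independent error contrasts K'y. Here X is the fixed-effects design matrix with p columns, and K satisfies K'X = 0. The contrasts have a distribution free of the fixed effects, and their likelihood is the REML likelihood. Alternatively, Harville (1974) showed that the same function, up to a constant, is obtained by integrating the ordinary likelihood over the fixed effects under a flat (uniform) prior. The REML estimator maximizes this function over the variance components, and the result does not depend on which set of contrasts is used.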
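A small numerical check makes the contrast construction concrete. The sketch computes the REML log-likelihood of $y \sim N(X\beta, V)$ in two ways: via the closed form $-\tfrac12[(n-p)\log 2\pi + \log|V| + \log|X^\top V^{-1}X| + y^\top P y]$, and as the ordinary Gaussian likelihood of error contrasts $K^\top y$, where $K$ is an orthonormal basis of the complement of the column space of $X$. It is a sketch under stated assumptions: NumPy is available, and the function names, the toy one-way design, and the values $\sigma_u^2 = 2$, $\sigma_e^2 = 1$ are hypothetical. With this choice of $K$ the two values differ by the constant $\tfrac12\log|X^\top X|$, and neither changes when the fixed effects are shifted.

```python
import numpy as np


def reml_loglik(y, X, V):
    """REML log-likelihood of y ~ N(X beta, V) (hypothetical helper), in the
    closed form -1/2 [(n - p) log(2 pi) + log|V| + log|X'V^-1 X| + y'P y]."""
    n, p = X.shape
    Vinv = np.linalg.inv(V)
    XtVinvX = X.T @ Vinv @ X
    # P = V^-1 - V^-1 X (X'V^-1 X)^-1 X'V^-1 satisfies P X = 0
    P = Vinv - Vinv @ X @ np.linalg.solve(XtVinvX, X.T @ Vinv)
    logdet_V = np.linalg.slogdet(V)[1]
    logdet_XtVinvX = np.linalg.slogdet(XtVinvX)[1]
    return -0.5 * ((n - p) * np.log(2 * np.pi) + logdet_V
                   + logdet_XtVinvX + y @ P @ y)


def error_contrast_loglik(y, X, V):
    """Ordinary Gaussian log-likelihood of the error contrasts w = K'y,
    where K is an orthonormal basis of the orthogonal complement of col(X)."""
    n, p = X.shape
    U = np.linalg.svd(X, full_matrices=True)[0]
    K = U[:, p:]          # n x (n - p), with K'X = 0 and K'K = I
    w = K.T @ y           # the contrasts do not depend on beta
    S = K.T @ V @ K       # Cov(w)
    logdet_S = np.linalg.slogdet(S)[1]
    return -0.5 * ((n - p) * np.log(2 * np.pi) + logdet_S
                   + w @ np.linalg.solve(S, w))


# Hypothetical one-way layout: 5 groups of 4, intercept plus one covariate.
rng = np.random.default_rng(0)
n_groups, per_group = 5, 4
n = n_groups * per_group
Z = np.kron(np.eye(n_groups), np.ones((per_group, 1)))
X = np.column_stack([np.ones(n), rng.normal(size=n)])
V = 2.0 * Z @ Z.T + 1.0 * np.eye(n)      # sigma_u^2 = 2, sigma_e^2 = 1
y = X @ np.array([1.0, 0.5]) + rng.multivariate_normal(np.zeros(n), V)

# The two forms differ only by the theta-free constant 1/2 log|X'X| ...
gap = error_contrast_loglik(y, X, V) - reml_loglik(y, X, V)
print(np.isclose(gap, 0.5 * np.linalg.slogdet(X.T @ X)[1]))    # True
# ... and the REML value is unchanged when the fixed effects are shifted.
y_shifted = y + X @ np.array([3.0, -2.0])
print(np.isclose(reml_loglik(y_shifted, X, V), reml_loglik(y, X, V)))  # True
```

Explicit inverses keep the sketch short. Production code would use Cholesky factorizations instead.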
Variance Components
Variance components are key parameters in mixed models. They quantify the variability attributable to different sources, such as random effects and measurement error. REML estimates them by maximizing the likelihood of the transformed data, whose distribution does not depend on the fixed effects.
Computational Methods
Apart from special cases such as balanced designs, the REML equations are nonlinear in the variance components and have no closed-form solution, so REML estimates are computed iteratively. Common algorithms include the expectation-maximization (EM) algorithm, the Newton-Raphson method, and Fisher scoring. Each of these updates the variance-component estimates repeatedly until a convergence criterion is met.
Expectation-Maximization Algorithm
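The EM algorithm is a popular choice for computing REML estimates because it is stable. No iteration decreases the REML likelihood, and variance estimates started at positive values stay positive, although convergence can be slow. The algorithm treats the random effects as missing data and alternates two steps. In the expectation step (E-step), it computes the expected complete-data log-likelihood given the observed data and the current estimates. In the maximization step (M-step), it updates the variance components to maximize this expected log-likelihood.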
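The two steps can be sketched for the model $y = X\beta + Zu + e$ with $u \sim N(0, \sigma_u^2 I_q)$ and $e \sim N(0, \sigma_e^2 I_n)$. This is a minimal NumPy sketch that assumes exactly two variance components. The function name `reml_em`, the starting values, the tolerance, and the data are hypothetical. What makes the E-step REML rather than ML is that the conditional moments of $u$ and $e$ given $y$ are taken with $\beta$ integrated out under a flat prior. That is where the projection $P = V^{-1} - V^{-1}X(X^\top V^{-1}X)^{-1}X^\top V^{-1}$ enters. For a balanced one-way layout, REML reproduces the ANOVA estimators when these are non-negative, which gives a check on the result.

```python
import numpy as np


def reml_em(y, X, Z, s2u=1.0, s2e=1.0, tol=1e-10, max_iter=10_000):
    """EM for REML in y = X beta + Z u + e (hypothetical helper),
    u ~ N(0, s2u I_q), e ~ N(0, s2e I_n).

    u and e are the missing data; beta is integrated out under a flat
    prior, which is what turns the ML version of EM into REML.
    """
    n, q = Z.shape
    ZZt = Z @ Z.T
    for it in range(1, max_iter + 1):
        # E-step quantities at the current estimates
        Vinv = np.linalg.inv(s2u * ZZt + s2e * np.eye(n))
        P = Vinv - Vinv @ X @ np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv)
        Py = P @ y
        # E[u'u | y] = s2u^2 y'PZZ'Py + q s2u - s2u^2 tr(Z'PZ)
        Euu = s2u**2 * (Py @ ZZt @ Py) + q * s2u - s2u**2 * np.trace(Z.T @ P @ Z)
        # E[e'e | y] = s2e^2 y'PPy + n s2e - s2e^2 tr(P)
        Eee = s2e**2 * (Py @ Py) + n * s2e - s2e**2 * np.trace(P)
        # M-step: complete-data ML updates of the two variances
        new_u, new_e = Euu / q, Eee / n
        done = abs(new_u - s2u) + abs(new_e - s2e) < tol * (new_u + new_e)
        s2u, s2e = new_u, new_e
        if done:
            break
    return s2u, s2e, it


# Balanced one-way layout: 4 groups x 3 replicates (made-up numbers).
groups = np.array([[10.1, 9.8, 10.4],
                   [14.2, 13.7, 14.6],
                   [7.9, 8.5, 8.1],
                   [12.0, 11.4, 12.3]])
a, m = groups.shape
y = groups.ravel()                        # group by group, matching Z
X = np.ones((a * m, 1))
Z = np.kron(np.eye(a), np.ones((m, 1)))
s2u, s2e, iters = reml_em(y, X, Z)

# ANOVA (method-of-moments) estimators for the same balanced design
msw = ((groups - groups.mean(axis=1, keepdims=True)) ** 2).sum() / (a * (m - 1))
msb = m * ((groups.mean(axis=1) - groups.mean()) ** 2).sum() / (a - 1)
print(np.isclose(s2u, (msb - msw) / m), np.isclose(s2e, msw))  # True True
```

Dense inverses keep the example readable. Practical implementations work with Cholesky factors or sparse mixed-model equations instead.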
Newton-Raphson Method
The Newton-Raphson method updates the estimates using the first and second derivatives of the REML log-likelihood, that is, the score vector and the Hessian matrix. It converges quadratically near the maximum. However, each iteration requires the Hessian, which is expensive to compute. Far from the maximum, a step may also fail to increase the likelihood or may produce negative variance estimates.
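In the common case, $V$ is linear in the variance components: $V(\theta) = \sum_i \theta_i V_i$, so that in a one-way random-effects model $V_1 = ZZ^\top$ and $V_2 = I_n$. Let $P = V^{-1} - V^{-1}X(X^\top V^{-1}X)^{-1}X^\top V^{-1}$. The derivatives that Newton-Raphson needs then follow from $\partial P / \partial \theta_j = -P V_j P$. The notation is introduced here for illustration:

$$s_i(\theta) = \frac{\partial \ell_R}{\partial \theta_i} = -\frac{1}{2}\Big[\operatorname{tr}(P V_i) - y^\top P V_i P y\Big],$$

$$H_{ij}(\theta) = \frac{\partial^2 \ell_R}{\partial \theta_i\, \partial \theta_j} = \frac{1}{2}\operatorname{tr}(P V_i P V_j) - y^\top P V_i P V_j P y,$$

$$\theta^{(k+1)} = \theta^{(k)} - H\big(\theta^{(k)}\big)^{-1} s\big(\theta^{(k)}\big).$$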
Fisher Scoring Method
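Fisher scoring is a variant of Newton-Raphson that replaces the negative Hessian with its expected value, the Fisher information matrix. For REML the expected information has a simpler form than the observed Hessian and is positive definite, so each step points in an ascent direction. This usually makes the method more stable than Newton-Raphson, particularly far from the maximum or when the likelihood surface is flat or ill-conditioned.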
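Below is a minimal NumPy sketch of Fisher scoring for covariance matrices that are linear in the variance components, $V(\theta) = \sum_i \theta_i V_i$. It uses the REML score $s_i = -\tfrac12[\operatorname{tr}(PV_i) - y^\top P V_i P y]$ and the expected information $\mathcal{I}_{ij} = \tfrac12 \operatorname{tr}(P V_i P V_j)$, where $P = V^{-1} - V^{-1}X(X^\top V^{-1}X)^{-1}X^\top V^{-1}$. Three elements are illustrative assumptions, not the behavior of any particular software package: the function name, the step-halving safeguard that keeps the variances positive, and the unbalanced toy data.

```python
import numpy as np


def reml_fisher_scoring(y, X, Vs, theta0, tol=1e-10, max_iter=100):
    """Fisher scoring for REML when V(theta) = sum_i theta_i * Vs[i]
    (hypothetical helper).

    score_i = -1/2 [tr(P V_i) - y'P V_i P y]
    info_ij =  1/2 tr(P V_i P V_j)      (expected information)
    """
    theta = np.asarray(theta0, dtype=float)
    for it in range(1, max_iter + 1):
        Vinv = np.linalg.inv(sum(t * Vi for t, Vi in zip(theta, Vs)))
        P = Vinv - Vinv @ X @ np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv)
        Py = P @ y
        PV = [P @ Vi for Vi in Vs]
        score = np.array([-0.5 * (np.trace(A) - Py @ Vi @ Py)
                          for A, Vi in zip(PV, Vs)])
        info = 0.5 * np.array([[np.trace(A @ B) for B in PV] for A in PV])
        step = np.linalg.solve(info, score)
        # Safeguard added here (not part of plain scoring): halve the step
        # until every variance component stays positive.
        while np.any(theta + step <= 0):
            step = step / 2
        theta = theta + step
        if np.max(np.abs(step)) < tol * (1.0 + np.max(np.abs(theta))):
            break
    return theta, it


# Unbalanced one-way layout (made-up numbers): the last group has 2 observations.
sizes = [3, 3, 3, 2]
y = np.array([10.1, 9.8, 10.4, 14.2, 13.7, 14.6, 7.9, 8.5, 8.1, 12.0, 11.4])
n = len(y)
X = np.ones((n, 1))
Z = np.zeros((n, len(sizes)))
Z[np.arange(n), np.repeat(np.arange(len(sizes)), sizes)] = 1.0
Vs = [Z @ Z.T, np.eye(n)]          # V = s2u ZZ' + s2e I
theta, iters = reml_fisher_scoring(y, X, Vs, theta0=[1.0, 1.0])
print(theta, iters)
```

The same loop becomes Newton-Raphson if `info` is replaced with the negative Hessian, $y^\top P V_i P V_j P y - \tfrac12\operatorname{tr}(P V_i P V_j)$. At convergence the score should be close to zero, which is a quick check on the result.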
Applications
REML is widely used in various fields, including agriculture, genetics, econometrics, and psychometrics. It is particularly valuable in the analysis of longitudinal data, multilevel models, and random effects models.
Agriculture
In agriculture, REML is used to analyze data from field trials and breeding experiments, where it helps in estimating the genetic and environmental variance components. This information is crucial for selecting superior genotypes and improving crop yields.
Genetics
In genetics, REML is used to estimate heritability and genetic correlations. Examples include animal models based on pedigree relationships and variance-component approaches to quantitative trait locus (QTL) mapping. It lets researchers partition the phenotypic variance into genetic and environmental components. This helps characterize the genetic architecture of complex traits and can support the search for associated genes.
Econometrics
In econometrics, REML is applied in the analysis of panel data and hierarchical models, where it estimates the variance components of the random effects with less bias than ML. Accurate variance components matter for the standard errors of the fixed effects and hence for inference about the underlying economic processes.
Psychometrics
In psychometrics, REML is used to estimate variance components in generalizability theory and in multilevel models of test data. Generalizability theory partitions the variability of test scores among sources such as persons, items, raters, and occasions. These estimates help identify the main sources of measurement error and improve the reliability of psychological measurements.
Advantages and Limitations
REML has several advantages over ordinary ML, including less biased variance-component estimates and better behavior in small samples. However, like ML, it requires iterative algorithms, which can be computationally demanding and may have trouble converging.
Advantages
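1. **Reduced Bias**: REML accounts for the degrees of freedom used to estimate the fixed effects, which removes the main source of downward bias in ML variance-component estimates. In balanced designs it reproduces the unbiased ANOVA estimators (when these are non-negative), although it is not unbiased in general.
2. **Unbalanced Data and Small Samples**: REML handles unbalanced data directly, and its reduced bias matters most when the sample size is small relative to the number of fixed effects.
3. **Flexibility**: REML applies to linear mixed models with a wide range of covariance structures, and approximate REML-type methods have been developed for generalized linear and nonlinear mixed models.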
Limitations
1. **Computational Complexity**: As with ML, REML estimates require iterative algorithms. Each iteration involves operations on the n × n covariance matrix of the data, which can be time-consuming for large data sets unless sparsity or special structure is exploited.
2. **Convergence Issues**: The iterative algorithms may converge slowly or fail to converge. This is most likely in complex models with many variance parameters or when some variance components are close to zero.
3. **Assumptions**: REML relies on the assumption that the random effects and residual errors are normally distributed, which may not hold in all applications.
Conclusion
Restricted maximum likelihood is a powerful tool for estimating variance components in mixed models. It is less biased than ML and handles unbalanced data well, which makes it an essential technique in agriculture, genetics, econometrics, psychometrics, and other fields. Despite its computational cost, REML remains a preferred method for variance-component estimation because of its reduced bias and flexibility.