Linear Least Squares
Introduction
Linear least squares is a statistical method for fitting a linear model to observed data. It chooses the model coefficients that minimize the sum of the squares of the residuals, the differences between the observed values and the values predicted by the model. Linear least squares is a fundamental tool in regression analysis, widely applied across scientific and engineering disciplines to model relationships between variables.
Mathematical Formulation
The linear least squares problem can be expressed in matrix form. Consider a set of observations \((x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\), where the \(x_i\) are values of the independent variable and the \(y_i\) are the corresponding values of the dependent variable. The goal is to find the coefficients \(a\) and \(b\) of the linear equation \(y = ax + b\) that minimize the sum of squared residuals:
\[ S = \sum_{i=1}^{n} (y_i - (ax_i + b))^2 \]
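Setting the partial derivatives \(\partial S / \partial a\) and \(\partial S / \partial b\) to zero yields the closed-form solution for this single-variable case:
\[ a = \frac{n \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n \sum_{i=1}^{n} x_i^2 - \left( \sum_{i=1}^{n} x_i \right)^2}, \qquad b = \bar{y} - a \bar{x}, \]
where \(\bar{x}\) and \(\bar{y}\) denote the sample means of the \(x_i\) and \(y_i\).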
In matrix notation, this can be represented as:
\[ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} \]
where \(\mathbf{y}\) is the vector of observed values, \(\mathbf{X}\) is the design matrix containing the independent variables, \(\boldsymbol{\beta}\) is the vector of coefficients, and \(\boldsymbol{\varepsilon}\) is the vector of errors (the residuals \(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}\) are their fitted counterparts).
The least squares estimate of \(\boldsymbol{\beta}\) is given by:
\[ \hat{\boldsymbol{\beta}} = (\mathbf{X}^T \mathbf{X})^{-1} \mathbf{X}^T \mathbf{y} \]
This solution assumes that the matrix \(\mathbf{X}^T \mathbf{X}\) is invertible, which holds exactly when the columns of \(\mathbf{X}\) are linearly independent.
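As a minimal sketch of this computation in Python (using NumPy; the data and variable names are illustrative, not from any particular source):

```python
import numpy as np

# Illustrative data: y is roughly 2x + 1 plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)

# Design matrix: a column for the slope a and a column of ones for the intercept b.
X = np.column_stack([x, np.ones_like(x)])

# Normal-equations solution: solve (X^T X) beta = X^T y.
# np.linalg.solve avoids forming the explicit inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
a, b = beta_hat
print(f"a = {a:.3f}, b = {b:.3f}")  # Close to 2 and 1.
```

In practice one would typically call `np.linalg.lstsq(X, y, rcond=None)` instead, which uses a more numerically stable factorization (see Computational Techniques below).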
Properties and Assumptions
Linear least squares estimation relies on several key assumptions:
1. **Linearity**: The relationship between the dependent and independent variables is linear.
2. **Independence**: The residuals are independent of each other.
3. **Homoscedasticity**: The residuals have constant variance.
4. **Normality**: The residuals are normally distributed.
Under the first three assumptions, the Gauss-Markov theorem guarantees that the least squares estimates are unbiased and have minimum variance among all linear unbiased estimators. Normality is not required for this result, but it justifies exact inference such as confidence intervals and hypothesis tests on the coefficients.
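In practice, these assumptions are usually checked by inspecting the residuals after fitting. A minimal diagnostic sketch (NumPy, with illustrative synthetic data):

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)
X = np.column_stack([x, np.ones_like(x)])

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
residuals = y - X @ beta_hat

# Crude checks: the residuals should average to roughly zero, and their
# spread should be similar in the lower and upper halves of the x range
# (a rough indication of homoscedasticity).
print(residuals.mean())
print(residuals[: x.size // 2].std(), residuals[x.size // 2:].std())
```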
Applications
Linear least squares is employed in various fields, including:
- **Econometrics**: To model economic relationships and forecast economic indicators.
- **Engineering**: For system identification and control system design.
- **Physics**: To fit experimental data to theoretical models.
- **Biostatistics**: In clinical trials and epidemiological studies to assess relationships between variables.
Computational Techniques
The computation of linear least squares can be performed using several numerical methods, including:
- **Normal Equations**: Directly solving the matrix equation \((\mathbf{X}^T \mathbf{X}) \hat{\boldsymbol{\beta}} = \mathbf{X}^T \mathbf{y}\).
- **QR Decomposition**: Decomposing the matrix \(\mathbf{X}\) into an orthogonal matrix \(\mathbf{Q}\) and an upper triangular matrix \(\mathbf{R}\), which provides a more stable solution than the normal equations.
- **Singular Value Decomposition (SVD)**: Decomposing \(\mathbf{X}\) into matrices \(\mathbf{U}\), \(\mathbf{\Sigma}\), and \(\mathbf{V}^T\), which is particularly useful for ill-conditioned problems.
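The sketch below (NumPy; illustrative data) shows all three approaches producing the same coefficients on a well-conditioned problem:

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.column_stack([np.linspace(0, 10, 50), np.ones(50)])
y = X @ np.array([2.0, 1.0]) + rng.normal(scale=0.5, size=50)

# 1. Normal equations: solve (X^T X) beta = X^T y.
beta_ne = np.linalg.solve(X.T @ X, X.T @ y)

# 2. QR decomposition: X = QR implies R beta = Q^T y.
Q, R = np.linalg.qr(X)
beta_qr = np.linalg.solve(R, Q.T @ y)

# 3. SVD: np.linalg.lstsq is SVD-based and handles rank deficiency.
beta_svd, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_ne, beta_qr, beta_svd)  # All three agree to rounding error.
```

The normal equations are the cheapest but square the condition number of \(\mathbf{X}\); QR and SVD trade some extra computation for numerical robustness.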
Extensions and Generalizations
Linear least squares can be extended to handle more complex scenarios:
- **Weighted Least Squares**: Accounts for heteroscedasticity by assigning different weights to different observations.
- **Generalized Least Squares**: Addresses correlations between residuals.
- **Nonlinear Least Squares**: Used when the relationship between variables is nonlinear, requiring iterative optimization techniques.
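As an illustration of the first of these, weighted least squares reduces to ordinary least squares after rescaling each row of \(\mathbf{X}\) and \(\mathbf{y}\) by the square root of its weight. A minimal sketch (NumPy; the noise model and weights are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(1, 10, 50)
# Heteroscedastic noise: the standard deviation grows with x.
y = 2.0 * x + 1.0 + rng.normal(scale=0.2 * x)
X = np.column_stack([x, np.ones_like(x)])

# Weight each observation by the inverse of its assumed noise variance.
w = 1.0 / (0.2 * x) ** 2
sw = np.sqrt(w)

# WLS is OLS on the rescaled problem.
beta_wls, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
print(beta_wls)  # Close to [2.0, 1.0], with noisy observations downweighted.
```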
Limitations
Despite its widespread use, linear least squares has limitations:
- **Sensitivity to Outliers**: Outliers can disproportionately affect the fit, leading to biased estimates.
- **Multicollinearity**: Highly correlated independent variables can inflate the variance of the coefficient estimates.
- **Model Specification**: Incorrect model specification can lead to biased and inconsistent estimates.
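The sensitivity to outliers is easy to demonstrate: because residuals are squared, a single corrupted observation can pull the whole fit toward it. A minimal sketch (NumPy; deliberately simple synthetic data):

```python
import numpy as np

x = np.linspace(0, 10, 20)
y = 2.0 * x + 1.0               # Perfectly linear data.
X = np.column_stack([x, np.ones_like(x)])

beta_clean, *_ = np.linalg.lstsq(X, y, rcond=None)

y_corrupt = y.copy()
y_corrupt[-1] += 50.0           # Corrupt a single observation.
beta_corrupt, *_ = np.linalg.lstsq(X, y_corrupt, rcond=None)

print(beta_clean)    # Essentially [2.0, 1.0].
print(beta_corrupt)  # Both slope and intercept shift noticeably.
```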
Conclusion
Linear least squares is a versatile and powerful tool for modeling linear relationships between variables. Its mathematical simplicity and computational efficiency make it a staple in statistical analysis. However, careful consideration of its assumptions and limitations is crucial for accurate and reliable results.