Linear Regression

From Canonica AI

Introduction

Linear regression is a statistical model that describes the relationship between a dependent variable and one or more explanatory variables by fitting a linear equation to observed data. When there is a single explanatory variable, the model is called simple linear regression; with several explanatory variables, it is called multiple linear regression. The fitting procedure for multiple linear regression is almost identical to that of simple linear regression; the difference lies in estimating several coefficients rather than a single slope and intercept.
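In its simplest form, with one explanatory variable, the model can be written as:

```latex
% Simple linear regression: each observation y_i is a linear function of x_i
% plus a random error term.
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i
% \beta_0 is the intercept, \beta_1 the slope, and \varepsilon_i the error.
% Multiple linear regression extends this with further terms
% \beta_2 x_{i2} + \beta_3 x_{i3} + \dots for additional predictors.
```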

History

The method of least squares, which underlies linear regression, was first published by Adrien-Marie Legendre in 1805. The term "regression" was coined by Francis Galton in the 19th century to describe a biological phenomenon: the heights of descendants of tall ancestors tend to regress towards an average, a phenomenon now known as regression toward the mean.

Simple and Multiple Linear Regression

Simple linear regression involves a single explanatory variable, while multiple linear regression involves two or more. In either case, models are most often fitted using the least squares approach, but they can also be fitted in other ways, such as by minimizing the "lack of fit" in some other norm (as with least absolute deviations regression), or by minimizing a penalized version of the least squares loss function, as in ridge regression.
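As a minimal sketch of the least squares approach, the example below fits a multiple linear regression with NumPy's least-squares solver. The data, variable names, and coefficient values here are illustrative, not from the article.

```python
import numpy as np

# Illustrative data: y depends exactly linearly on two predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))              # two explanatory variables
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1]   # intercept 1, slopes 2 and -3

# Prepend a column of ones so the intercept is estimated as a coefficient.
A = np.column_stack([np.ones(len(X)), X])

# Ordinary least squares: minimizes the sum of squared residuals.
coef, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
# coef holds (intercept, slope 1, slope 2)
```

With noise-free data as above, the solver recovers the true intercept and slopes to numerical precision; with noisy data, it returns the coefficients that minimize the squared error.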

[Figure: A scatter plot with a line of best fit, demonstrating the concept of linear regression.]

Assumptions

Linear regression models rest on several assumptions. The relationship between the dependent variable and the predictors is assumed to be linear. The residuals are assumed to be normally distributed. Independence of observations is assumed. Homoscedasticity is assumed, meaning the error terms have the same variance across all levels of the predictors. The absence of multicollinearity is also assumed, meaning the predictors are not too highly correlated with each other.
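Two of these assumptions can be checked directly from the fitted residuals. The sketch below, using made-up illustrative data, verifies that the residuals average to zero and that their spread is similar across the range of the predictor (homoscedasticity).

```python
import numpy as np

# Illustrative data with constant-variance (homoscedastic) noise.
rng = np.random.default_rng(1)
x = np.linspace(0.0, 10.0, 200)
y = 3.0 + 0.5 * x + rng.normal(scale=1.0, size=x.size)

# Fit a line and compute the residuals.
slope, intercept = np.polyfit(x, y, 1)
residuals = y - (intercept + slope * x)

# With an intercept in the model, OLS residuals average to zero.
mean_resid = residuals.mean()

# Homoscedasticity: residual spread should be similar in both halves of x.
low_spread = residuals[x < 5.0].std()
high_spread = residuals[x >= 5.0].std()
```

If the spread grew systematically with x, the homoscedasticity assumption would be violated, and the usual standard errors for the coefficients would be unreliable.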

Applications

Linear regression is widely used in biological, behavioral and social sciences to describe possible relationships between variables. It ranks as one of the most important tools used in these disciplines.

Limitations

Linear regression is not robust against outliers and may produce misleading estimates when the errors are heteroscedastic or not normally distributed. These issues can be mitigated with robust regression methods, such as least absolute deviations or quantile regression, which are less sensitive to outliers and can accommodate heteroscedastic errors.
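The sensitivity to outliers can be seen directly: in the illustrative sketch below, corrupting a single observation pulls the ordinary least squares slope far from its true value.

```python
import numpy as np

# Perfect line through the origin with slope 2.
x = np.arange(10, dtype=float)
y = 2.0 * x

slope_clean = np.polyfit(x, y, 1)[0]   # recovers 2.0 exactly

# Corrupt one observation: replace y = 18 at x = 9 with y = 100.
y_outlier = y.copy()
y_outlier[-1] = 100.0
slope_outlier = np.polyfit(x, y_outlier, 1)[0]
# The single outlier drags the fitted slope well above 2.
```

Because least squares penalizes the *squared* residual, a single extreme point dominates the loss; robust methods that penalize absolute deviations are far less affected.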

See Also