Logistic Regression

Introduction

Logistic regression is a statistical model that, in its basic form, uses a logistic function to model a binary dependent variable, although many more complex extensions exist. In regression analysis, logistic regression (or logit regression) estimates the parameters of a logistic model (a form of binary regression).

Mathematical Formulation

The logistic model is used to model the probability of a certain class or event, such as pass/fail, win/lose, alive/dead, or healthy/sick. It can be extended to model several classes of events, such as determining whether an image contains a cat, dog, lion, etc. Each object that might be detected in the image is assigned a probability between 0 and 1, and these probabilities sum to one.
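One common way to turn a set of per-class scores into probabilities that each lie between 0 and 1 and sum to one is the softmax function. The text does not name this function, so its use here is an assumption, and the scores below are made up for illustration. A minimal sketch:

```python
import math


def softmax(scores):
    """Map a list of real-valued class scores to probabilities that sum to one."""
    # Subtracting the maximum score keeps math.exp from overflowing
    # and does not change the resulting probabilities.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for the classes cat, dog, lion.
class_scores = [2.0, 1.0, 0.1]
class_probs = softmax(class_scores)
print(class_probs, sum(class_probs))
```

The class with the largest score receives the largest probability, and no probability is ever exactly 0 or 1 for finite scores.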

Figure: A visual representation of a logistic regression model.

Binary Logistic Regression

Binary logistic regression deals with situations where the outcome for the dependent variable can be one of two possible values, such as presence or absence of a disease. The dependent variable is a binary variable that contains data coded as 1 (yes, success, etc.) or 0 (no, failure, etc.). In other words, the logistic regression model predicts P(Y=1) as a function of X.
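To make "P(Y=1) as a function of X" concrete, the sketch below passes a linear combination of the predictors through the logistic function. The function names, coefficients, and intercept are hypothetical and chosen only for illustration.

```python
import math


def sigmoid(z):
    """Logistic function: maps any real number into the interval [0, 1]."""
    # Branch on the sign of z so math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_probability(x, coefficients, intercept):
    """Return P(Y=1 | X=x) for one observation x (a list of predictor values)."""
    linear_predictor = intercept + sum(b * xi for b, xi in zip(coefficients, x))
    return sigmoid(linear_predictor)


# Hypothetical parameter values, not estimated from any data.
example_coefficients = [0.8, -1.2]
example_intercept = 0.5
p_yes = predict_probability([1.0, 0.5], example_coefficients, example_intercept)
print(p_yes)  # probability that the outcome is coded 1
```

The probability of the outcome coded 0 is simply one minus this value.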

Multinomial Logistic Regression

Multinomial logistic regression is used when the dependent variable in question is nominal (i.e. it falls into one of a number of categories, with no order or priority implied). When there are only two possible outcomes, the model reduces to binary (binomial) logistic regression.

Ordinal Logistic Regression

Ordinal logistic regression is used when the dependent variable in question is ordinal (i.e. it falls into one of a number of categories, with an order or priority implied). The model's linear predictor is a linear combination of the predictors, and it can itself be used as a predictor in a larger model.
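One widely used formulation of ordinal logistic regression is the proportional-odds (cumulative logit) model. The text does not say which formulation it has in mind, so that choice is an assumption here, as are the threshold and coefficient values. Under that assumption, category probabilities can be sketched as differences of cumulative probabilities:

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def ordinal_probabilities(x, coefficients, thresholds):
    """Category probabilities under a proportional-odds model.

    thresholds must be strictly increasing; K-1 thresholds give K ordered categories.
    """
    # A single linear combination of the predictors is shared by all categories.
    eta = sum(b * xi for b, xi in zip(coefficients, x))
    # Cumulative probabilities P(Y <= j) for j = 1..K-1, followed by P(Y <= K) = 1.
    cumulative = [sigmoid(t - eta) for t in thresholds] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)
        previous = c
    return probs


# Hypothetical values: one predictor, three ordered categories (e.g. low, medium, high).
ordinal_probs = ordinal_probabilities([1.0], [0.5], [-1.0, 1.0])
print(ordinal_probs)
```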

Model Building

The process of setting up a logistic regression is very similar to that of multiple linear regression. First, a model is specified with the predictors and the binary response variable. Then, a linear combination of the predictors is passed through the logistic function, which maps it to a probability between 0 and 1; this is what adapts the linear regression model to binary response variables.
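The relationship between the predictors and the binary response can be written out explicitly. In the equation below, the symbols p, x_1, ..., x_k, and β_0, ..., β_k are notation assumed here for the outcome probability, the predictors, and the coefficients; they do not appear in the text above.

```latex
\[
\log\frac{p}{1-p} = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k
\quad\Longleftrightarrow\quad
p = P(Y = 1 \mid x_1, \ldots, x_k)
  = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}
\]
```

The left-hand form shows the similarity to linear regression: the log odds, rather than the response itself, is linear in the predictors.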

Model Fitting

The model is fit by finding the parameter values that maximize the likelihood of the observed data, a method called maximum likelihood estimation. The result is a set of coefficients that can be used to predict the probability of an outcome given the values of the predictors.
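Maximum likelihood estimation for logistic regression has no closed-form solution, so it is carried out numerically. Statistical software typically uses Newton-type methods. The sketch below instead uses plain gradient ascent on the log-likelihood because it is short enough to read in full. The data set, learning rate, and iteration count are made up for illustration.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def log_likelihood(xs, ys, intercept, slope):
    """Log-likelihood of binary outcomes ys given a single predictor xs."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(intercept + slope * x)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def fit_logistic(xs, ys, learning_rate=0.1, n_iterations=5000):
    """Estimate (intercept, slope) by gradient ascent on the mean log-likelihood."""
    intercept, slope = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iterations):
        grad_intercept = 0.0
        grad_slope = 0.0
        for x, y in zip(xs, ys):
            # The gradient contribution of each point is (observed - predicted) times its input.
            residual = y - sigmoid(intercept + slope * x)
            grad_intercept += residual
            grad_slope += residual * x
        intercept += learning_rate * grad_intercept / n
        slope += learning_rate * grad_slope / n
    return intercept, slope


# Made-up data: hours studied and pass (1) / fail (0).
# The classes overlap, so a finite maximum likelihood estimate exists.
hours = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
passed = [0, 0, 0, 1, 0, 1, 1, 1]

fitted_intercept, fitted_slope = fit_logistic(hours, passed)
print(fitted_intercept, fitted_slope)
print(log_likelihood(hours, passed, fitted_intercept, fitted_slope))
```

If the two classes could be separated perfectly by the predictor, the likelihood would keep increasing as the slope grows and no finite estimate would exist, which is why the example data overlap.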

Model Evaluation

Once the model has been fit, it can be evaluated to determine how well it is likely to predict new data. This is done by examining the goodness-of-fit measures, the statistical significance of the predictors, and the classification accuracy of the model.
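Two of the simplest evaluation measures can be computed directly from predicted probabilities and true labels. The sketch below computes classification accuracy at a 0.5 threshold and the average negative log-likelihood (log loss). The labels, probabilities, and threshold are hypothetical, and tests of statistical significance are not covered.

```python
import math


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of observations whose thresholded prediction matches the true label."""
    predictions = [1 if p >= threshold else 0 for p in y_prob]
    correct = sum(1 for y, y_hat in zip(y_true, predictions) if y == y_hat)
    return correct / len(y_true)


def log_loss(y_true, y_prob, eps=1e-15):
    """Average negative log-likelihood of the true labels; lower is better."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        # Clip probabilities away from 0 and 1 so math.log never receives 0.
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(y_true)


# Hypothetical held-out labels and the probabilities a fitted model assigned to Y=1.
held_out_labels = [1, 0, 1, 1, 0]
held_out_probs = [0.9, 0.2, 0.4, 0.7, 0.6]

eval_accuracy = accuracy(held_out_labels, held_out_probs)  # 3 of 5 correct
eval_log_loss = log_loss(held_out_labels, held_out_probs)
print(eval_accuracy, eval_log_loss)
```

Computing these measures on data that was not used for fitting gives a more honest picture of how the model is likely to perform on new data.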

Model Interpretation

The coefficients in a logistic regression model are interpreted as the change in the log odds of the outcome for a one-unit change in the predictor, holding the other predictors fixed. Because log odds are hard to interpret directly, the coefficients are often exponentiated to turn them into odds ratios.
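The link between a coefficient and an odds ratio can be checked numerically. In the sketch below, the intercept and slope are hypothetical values for a model with a single predictor. It compares the odds of the outcome at two predictor values one unit apart with the exponentiated coefficient.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def odds(p):
    """Convert a probability into odds."""
    return p / (1.0 - p)


# Hypothetical coefficients for a model with one predictor.
intercept, slope = -1.0, 0.7

p_at_2 = sigmoid(intercept + slope * 2.0)
p_at_3 = sigmoid(intercept + slope * 3.0)

# Ratio of the odds after a one-unit increase in the predictor.
observed_odds_ratio = odds(p_at_3) / odds(p_at_2)
# Exponentiated coefficient.
exponentiated_slope = math.exp(slope)

print(observed_odds_ratio, exponentiated_slope)
```

The two printed values agree up to floating-point error, and the same ratio results for any pair of predictor values one unit apart. This constancy is what makes the odds ratio a convenient summary.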

Applications

Logistic regression is used in various fields, including machine learning, most medical fields, and social sciences. For example, the Trauma and Injury Severity Score (TRISS), which is widely used to predict mortality in injured patients, was originally developed by Boyd et al. using logistic regression.