Statistical model

From Canonica AI

Introduction

A statistical model is a mathematical construct that embodies a set of statistical assumptions concerning the generation of sample data. It represents, often in considerably idealized form, the data-generating process. The assumptions embodied by a statistical model describe a set of probability distributions, some of which are assumed to adequately approximate the distribution from which a particular data set is sampled.

Types of Statistical Models

Statistical models are typically classified in several ways. A model may be parametric, semiparametric or nonparametric, depending on whether the family of distributions it describes is indexed by a finite-dimensional parameter, by an infinite-dimensional one, or by a combination of the two. A model may also be classified as deterministic or stochastic, depending on whether it incorporates elements of randomness.

Parametric Models

Parametric models assume that the data follow a distribution whose functional form is known but whose parameters are unknown. For example, one might assume that people's heights are normally distributed, with unknown mean and variance.
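The heights example can be sketched in a few lines of Python. This is a minimal illustration, assuming a normal model and simulated data; the function name fit_normal_mle and the values 170 and 8 are made up for the example.

```python
import math
import random

def fit_normal_mle(sample):
    """Return the maximum likelihood estimates (mean, variance) for a normal model."""
    n = len(sample)
    mean = sum(sample) / n
    # The maximum likelihood estimate of the variance divides by n, not n - 1.
    variance = sum((x - mean) ** 2 for x in sample) / n
    return mean, variance

# Simulated heights in centimetres (illustrative values, not real measurements).
random.seed(0)
heights_cm = [random.gauss(170.0, 8.0) for _ in range(1000)]

mean_hat, var_hat = fit_normal_mle(heights_cm)
print(f"estimated mean = {mean_hat:.1f}, estimated sd = {math.sqrt(var_hat):.1f}")
```

The whole model is summarised by two numbers, which is what makes it parametric: however large the sample, only the mean and the variance are estimated.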

A photograph of a bell curve, representing a normal distribution.

Semiparametric and Nonparametric Models

Semiparametric and nonparametric models are not fully described by a fixed, finite set of parameters. Nonparametric models differ from parametric models in that the model structure is not specified a priori but is instead determined from the data. The term nonparametric is not meant to imply that such models completely lack parameters, only that the number and nature of the parameters are flexible and not fixed in advance. A semiparametric model combines a parametric component with a nonparametric one.
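One simple nonparametric estimate is the empirical distribution function, sketched below. It is offered only as an illustration of "structure determined from data"; the helper name empirical_cdf and the sample values are assumptions made for the example.

```python
from bisect import bisect_right

def empirical_cdf(sample):
    """Build the empirical distribution function of a sample.

    No distributional form is assumed: the estimate is determined entirely
    by the sorted observations, so its complexity grows with the sample.
    """
    ordered = sorted(sample)
    n = len(ordered)

    def cdf(x):
        # Fraction of observations less than or equal to x.
        return bisect_right(ordered, x) / n

    return cdf

cdf = empirical_cdf([2.1, 3.4, 3.4, 5.0, 7.2])  # made-up observations
print(cdf(3.4))  # fraction of observations <= 3.4
```

Unlike the two-parameter normal fit, this estimate effectively stores every observation, which is the sense in which the number of parameters is not fixed in advance.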

Deterministic and Stochastic Models

A deterministic model is one in which every set of variable states is uniquely determined by parameters in the model and by sets of previous states of these variables. In contrast, a stochastic model (as used in mathematical biology, for example) has built-in elements of randomness, so the same parameters and initial states can lead to different outcomes.
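The contrast can be sketched with a toy growth process. The functions deterministic_growth and stochastic_growth, the growth rate and the noise level are all hypothetical and chosen only for illustration.

```python
import random

def deterministic_growth(x0, rate, steps):
    """Each state is fully determined by the previous state and the rate."""
    states = [x0]
    for _ in range(steps):
        states.append(states[-1] * (1 + rate))
    return states

def stochastic_growth(x0, rate, noise_sd, steps, rng):
    """Same recurrence, but each step's growth rate is perturbed by a random shock."""
    states = [x0]
    for _ in range(steps):
        shock = rng.gauss(0.0, noise_sd)
        states.append(states[-1] * (1 + rate + shock))
    return states

det_a = deterministic_growth(100.0, 0.05, 10)
det_b = deterministic_growth(100.0, 0.05, 10)
sto_a = stochastic_growth(100.0, 0.05, 0.02, 10, random.Random(1))
sto_b = stochastic_growth(100.0, 0.05, 0.02, 10, random.Random(2))

print(det_a == det_b)  # identical trajectories on every run
print(sto_a == sto_b)  # trajectories differ between random streams
```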

Model Specification

Model specification involves the identification of the model type and the specific relationships within the model. This includes the selection of appropriate explanatory variables, the specification of functional forms, and the choice of the probability distribution of the random component.
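As one illustration, a simple linear regression makes all three choices explicit. The symbols below are introduced here for the example and do not come from the text above: x_i is the single explanatory variable, the functional form is linear in x_i, and the random component is assumed to be normal with constant variance.

```latex
y_i = \beta_0 + \beta_1 x_i + \varepsilon_i,
\qquad \varepsilon_i \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2),
\qquad i = 1, \dots, n
```

Changing any one of the three choices (adding a second variable, replacing x_i with its logarithm, or assuming a different error distribution) gives a different specification.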

Model Estimation

Model estimation involves determining values for the unknown parameters of the model from the data. This is typically done by applying some estimation criterion, such as the method of maximum likelihood or the method of least squares.
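A minimal sketch of least squares for a straight-line model follows. The function name least_squares_line and the data are made up for illustration; the closed-form solution used here applies only to the one-variable case.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) minimising the sum of squared residuals."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Closed-form solution for a single explanatory variable.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Illustrative data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = least_squares_line(xs, ys)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}")
```

When the errors are assumed to be independent and normal with constant variance, these least squares estimates of the intercept and slope coincide with the maximum likelihood estimates.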

Model Inference

Model inference involves drawing conclusions from the model, such as testing hypotheses about the parameters or predicting future observations. This often involves the use of statistical tests and confidence intervals.
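As a sketch of interval estimation, the code below computes an approximate confidence interval for a mean. It assumes a large-sample normal approximation; for small samples a Student t quantile would usually be preferred. The function name mean_confidence_interval and the measurements are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, level=0.95):
    """Approximate confidence interval for the mean (normal approximation)."""
    n = len(sample)
    centre = mean(sample)
    # Standard error of the mean, using the sample standard deviation.
    standard_error = stdev(sample) / math.sqrt(n)
    # Two-sided critical value of the standard normal distribution.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return centre - z * standard_error, centre + z * standard_error

# Made-up measurements, for illustration only.
sample = [4.8, 5.1, 5.0, 4.7, 5.3, 5.2, 4.9, 5.0, 5.1, 4.9]

low, high = mean_confidence_interval(sample)
print(f"95% interval for the mean: ({low:.2f}, {high:.2f})")
```

A hypothesised value of the mean that falls outside the interval would be rejected by the corresponding two-sided test at the matching significance level.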

Model Checking

Model checking involves assessing the adequacy of the model as a representation of the data. This typically involves the use of diagnostic tests and graphical diagnostics.
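A common diagnostic is to inspect the residuals of a fitted model. The sketch below deliberately fits a straight line to data generated from a curve, so the residuals show a systematic pattern; the helper names fit_line and line_residuals and the data are assumptions made for the example.

```python
def fit_line(xs, ys):
    """Least squares fit of a straight line; returns (intercept, slope)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def line_residuals(xs, ys):
    """Observed value minus fitted value, for each data point."""
    intercept, slope = fit_line(xs, ys)
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Data from a quadratic curve, deliberately mis-specified as a straight line.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [x ** 2 for x in xs]

res = line_residuals(xs, ys)
print([round(r, 6) for r in res])
```

The residuals are positive at both ends and negative in the middle. An adequate model would leave residuals with no such structure, so the U-shaped pattern suggests that the linear functional form is inadequate for these data.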

Model Selection

Model selection involves choosing, among candidate models, the one that best balances goodness of fit against complexity. This often involves the use of information criteria, such as the Akaike information criterion or the Bayesian information criterion, which penalize models for each additional parameter.
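The two criteria can be sketched as follows, using the standard definitions AIC = 2k - 2 ln L and BIC = k ln n - 2 ln L, where k is the number of estimated parameters, n the sample size and L the maximised likelihood. The data, the two candidate models and the helper names are made up for illustration, and normal errors are assumed.

```python
import math

def gaussian_log_likelihood(residuals):
    """Maximised log-likelihood of a normal error model, given its residuals."""
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n  # ML estimate of the variance
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)

def aic(log_lik, k):
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n):
    return k * math.log(n) - 2 * log_lik

# Illustrative data, roughly following y = 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# Candidate 1: constant mean (k = 2: mean and variance).
res_const = [y - y_bar for y in ys]

# Candidate 2: straight line (k = 3: intercept, slope and variance).
slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
intercept = y_bar - slope * x_bar
res_line = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

aic_const = aic(gaussian_log_likelihood(res_const), 2)
aic_line = aic(gaussian_log_likelihood(res_line), 3)
bic_const = bic(gaussian_log_likelihood(res_const), 2, n)
bic_line = bic(gaussian_log_likelihood(res_line), 3, n)
print(f"AIC: constant = {aic_const:.1f}, line = {aic_line:.1f}")
print(f"BIC: constant = {bic_const:.1f}, line = {bic_line:.1f}")
```

Lower values are preferred under both criteria. Here the line has one more parameter than the constant model, but its much better fit outweighs the penalty.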

Model Use

Once a model has been selected and its parameters estimated, the model can be used for a variety of purposes, such as description, prediction, and control. The specific use of the model will depend on the substantive questions of interest.
