Median (statistics)

From Canonica AI

Definition

In statistics, the median is the value separating the higher half from the lower half of a data sample, a population, or a probability distribution. For a data set, it may be thought of as the "middle" value. The basic advantage of the median over the mean (often simply described as the "average") in describing data is that it is not skewed as much by a small number of extremely large or small values, so it may give a better idea of a "typical" value.

Mathematical Properties

The median is robust in the presence of outliers and is expressed in the same units as the original measurements. It is one of the three main measures of central tendency, together with the arithmetic mean and the mode. These descriptive statistics are also collectively known as measures of location.

Calculation

The calculation of the median depends on whether the data set is ungrouped or grouped and, for ungrouped data, on whether the number of observations n is odd or even. For ungrouped data, if n is odd, the median is the value at position (n+1)/2 in the ordered data set. If n is even, the median is the arithmetic mean of the two values at positions n/2 and n/2+1 in the ordered data set. For grouped data, the median is estimated by linear interpolation within the median group (the group in which the cumulative frequency first reaches n/2), between the lower class limit of that group and the lower class limit of the next group.
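The rules above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names `median` and `grouped_median` and the `(lower, upper, frequency)` class representation are assumptions made for the example, and the grouped version uses the common interpolation formula L + ((n/2 - F) / f) * w, where L is the lower limit of the median group, F the cumulative frequency before it, f its frequency, and w its width.

```python
def median(values):
    """Median of ungrouped data using the positional rule."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: 1-based position (n+1)/2 is 0-based index n // 2.
        return ordered[mid]
    # Even n: 1-based positions n/2 and n/2+1 are 0-based mid-1 and mid.
    return (ordered[mid - 1] + ordered[mid]) / 2


def grouped_median(classes):
    """Interpolated median of grouped data.

    `classes` is a list of (lower, upper, frequency) tuples in
    ascending order; this layout is an assumption of the sketch.
    """
    n = sum(freq for _, _, freq in classes)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    cumulative = 0
    for lower, upper, freq in classes:
        # The median group is the first one whose cumulative
        # frequency reaches n/2.
        if freq > 0 and cumulative + freq >= n / 2:
            return lower + (n / 2 - cumulative) / freq * (upper - lower)
        cumulative += freq
```

For instance, `median([7, 1, 3, 9, 5])` sorts the data to 1, 3, 5, 7, 9 and returns the third value, 5, while `median([4, 1, 3, 2])` averages the two middle values 2 and 3.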

A set of numerical data points being sorted in ascending order, with the median value being highlighted.

Uses

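The median is used as a measure of central tendency when one is dealing with ordinal data, such as ranks or ordered income brackets, for which values can be ordered but a mean is not meaningful. It is also widely used for skewed quantitative data. In addition, it is used in the calculation of the interquartile range, a measure of statistical dispersion: under one common convention, the lower and upper quartiles are the medians of the lower and upper halves of the ordered data.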
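As an illustration of the link to the interquartile range, the sketch below computes the quartiles as medians of the lower and upper halves of the ordered data. This is only one of several quartile conventions in use, so other software may give slightly different results; the function name `iqr_median_of_halves` is hypothetical.

```python
from statistics import median


def iqr_median_of_halves(values):
    """Interquartile range with quartiles taken as medians of the halves."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower_half = ordered[:half]
    # When n is odd, the middle observation is left out of both halves.
    upper_half = ordered[half + (n % 2):]
    return median(upper_half) - median(lower_half)
```

With the data 1, 2, ..., 8 the halves are 1 to 4 and 5 to 8, with medians 2.5 and 6.5, so the interquartile range under this convention is 4.0.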

Comparison with Other Statistics

The median is less affected by outliers and skewed data than the mean, and is usually the preferred measure of central tendency when the distribution is not symmetrical. It is a more robust estimator of location: a single arbitrarily large or small value can move the mean without limit, whereas the median changes little. However, the median is not as mathematically tractable as the mean, and it may be more difficult to use in some types of statistical analyses.
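A small numerical comparison makes the difference concrete. The figures below are invented for illustration (hypothetical salaries in thousands); the example uses the standard-library `statistics` module.

```python
from statistics import mean, median

# Hypothetical salaries, in thousands (made-up data for illustration).
salaries = [30, 32, 35, 36, 40]
# The same data with one extreme value added.
with_outlier = salaries + [1000]

mean_before, median_before = mean(salaries), median(salaries)
mean_after, median_after = mean(with_outlier), median(with_outlier)

print(f"mean:   {mean_before} -> {mean_after}")
print(f"median: {median_before} -> {median_after}")
```

Adding the single extreme value moves the mean from 34.6 to 195.5, while the median only shifts from 35 to 35.5.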

Limitations

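While the median provides a measure of central tendency, it does not provide a complete picture of the data. It gives no information about the spread or dispersion of the data. The median is also not always unique: for an even number of observations, any value between the two middle values separates the data into equal halves, and the midpoint is chosen by convention. For a multimodal distribution, the median is still defined, but it may fall between the modes, in a region where few observations lie, and so may not represent a typical value.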
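The following sketch, using made-up data, shows two samples that share a median but differ greatly in spread, and a sample with two clusters whose median lies between the clusters, far from any observed value.

```python
from statistics import median, pstdev

# Two made-up samples with the same median but very different spread.
narrow = [48, 49, 50, 51, 52]
wide = [0, 25, 50, 75, 100]

same_median = median(narrow) == median(wide)   # both medians are 50
spread_narrow = pstdev(narrow)                 # population standard deviation
spread_wide = pstdev(wide)

# A made-up sample with two clusters (around 2 and around 8).
two_clusters = [1, 1, 2, 2, 2, 8, 8, 8, 9, 9]
between = median(two_clusters)                 # mean of the middle values 2 and 8

print(same_median, spread_narrow, spread_wide, between)
```

Reporting the median alone would not distinguish `narrow` from `wide`, which is why it is usually paired with a dispersion measure such as the interquartile range.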
