Statistics

From Canonica AI

Introduction

Statistics is a branch of mathematics dealing with the collection, analysis, interpretation, presentation, and organization of data. In applying statistics to a scientific, industrial, or social problem, it is conventional to begin with a statistical population or a statistical model to be studied. Populations can be diverse groups of people or objects such as "all people living in a country" or "every atom composing a crystal". Statistics deals with every aspect of data, including the planning of data collection in terms of the design of surveys and experiments.

A group of people working on statistical data analysis on their computers.
A group of people working on statistical data analysis on their computers.

History of Statistics

The term statistics, when referring to the scientific discipline, is singular, as in "Statistics is an art." This should not be confused with the word statistic, referring to a quantity (such as mean or median) calculated from a set of data, whose plural is statistics ("this statistic seems wrong" or "the statistics are misleading").

The word statistics comes from the modern Latin phrase statisticum collegium (lecture about state affairs), which gave rise to the Italian word statista (statesman or politician — compare to status) and the German Statistik (originally the analysis of data about the state). It ultimately derives from the Latin status, meaning "state or condition."

In early times, the meaning was restricted to information about states, particularly demographics such as population. This was later extended to include all collections of information of all types, and later still it was extended to include the analysis and interpretation of such data. In modern terms, "statistics" means both sets of collected information, as in national accounts, and analytical work which requires statistical inference.

Ancient scrolls and books, symbolizing the early development of statistics.
Ancient scrolls and books, symbolizing the early development of statistics.

Statistical Methods

Statistical methods can be used to summarize or describe a collection of data; this is called descriptive statistics. This is useful in research, when communicating the results of experiments. In addition, patterns in the data may be modeled in a way that accounts for randomness and uncertainty in the observations, and are then used to draw inferences about the process or population being studied; this is called inferential statistics.

Both descriptive and inferential statistics comprise applied statistics. There is also a discipline called mathematical statistics, which is concerned with the theoretical basis of the subject.

Theoretical statistics concerns both the logical arguments underlying justification of approaches to statistical inference, as well encompassing mathematical statistics. Mathematical statistics includes not only the manipulation of probability distributions necessary for deriving results related to methods of estimation and inference, but also various aspects of computational statistics and the design of experiments.

A mathematical equation written on a chalkboard, representing mathematical statistics.
A mathematical equation written on a chalkboard, representing mathematical statistics.

Data Collection

When census data cannot be collected, statisticians collect data by developing specific experiment designs and survey samples. Representative sampling assures that inferences and conclusions can safely extend from the sample to the population as a whole. An experimental study involves taking measurements of the system under study, manipulating the system, and then taking additional measurements using the same procedure to determine if the manipulation has modified the values of the measurements. In contrast, an observational study does not involve experimental manipulation.

Two main statistical methods are used in data analysis: descriptive statistics, which summarize data from a sample using indexes such as the mean or standard deviation, and inferential statistics, which draw conclusions from data that are subject to random variation (e.g., observational errors, sampling variation). Descriptive statistics are most often concerned with two sets of properties of a distribution (sample or population): central tendency (or location) seeks to characterize the distribution's central or typical value, while dispersion (or variability) characterizes the extent to which members of the distribution depart from its center and each other. Inferences on mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena.

People conducting a survey in a public area, representing data collection.
People conducting a survey in a public area, representing data collection.

Applications

Statistics is applicable to a wide variety of academic disciplines, including natural and social sciences, government, and business. Statistical consultants can help organizations and companies that don't have in-house expertise relevant to their particular questions.

A business meeting where statistics are being discussed.
A business meeting where statistics are being discussed.

Misuse

Misuse of statistics can produce subtle, but serious errors in description and interpretation—subtle in the sense that even experienced professionals make such errors, and serious in the sense that they can lead to devastating decision errors. For instance, social policy, medical practice, and the reliability of structures like bridges all rely on the proper use of statistics.

Even when statistical techniques are correctly applied, the results can be difficult to interpret for those lacking expertise. The statistical significance of a trend in the data—which measures the extent to which a trend could be caused by random variation in the sample—may or may not agree with an intuitive sense of its significance. The set of basic statistical skills (and skepticism) that people need to deal with information in their everyday lives properly is referred to as statistical literacy.

A person looking at confusing statistical data on a computer screen.
A person looking at confusing statistical data on a computer screen.

See Also