Resampling (statistics)
Introduction
Resampling is a method of statistical inference in which repeated samples are drawn from the observed data. It is non-parametric: rather than relying on theoretical distributions characterized by parameters such as the mean and standard deviation, it builds a sampling distribution directly from the data actually collected. This sampling distribution is generated experimentally, by repeated computation, rather than derived analytically.
Methodology
Resampling is a very general method, and is often used when the theoretical distribution of a statistic of interest is complicated or unknown. By drawing repeated samples from the observed data, one can mimic the variability of the statistic from the collection of new data sets. The method of resampling is straightforward. It involves repeatedly drawing samples from a training set and refitting a model of interest on each sample in order to obtain additional information about the fitted model. For example, in order to estimate the variability of a linear regression fit, one can repeatedly draw different samples from the training data, fit a linear regression to each new sample, and then examine the extent to which the resulting fits differ. Such an approach may allow us to obtain information about the model that is not readily available from fitting the model only once using the original training sample.
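The regression example above can be sketched in a few lines. The following is a minimal illustration using only the Python standard library; the data are synthetic (y = 2x plus Gaussian noise), and the helper `slope` is a hypothetical name for an ordinary least-squares slope, not part of any particular package. Each iteration draws a sample with replacement from the training data, refits the line, and the spread of the refitted slopes estimates the variability of the fit.

```python
import random
import statistics

def slope(xs, ys):
    # Ordinary least-squares slope for simple linear regression.
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den

random.seed(0)
# Synthetic training data: y = 2x + noise (illustrative only).
xs = [i / 10 for i in range(100)]
ys = [2 * x + random.gauss(0, 1) for x in xs]

# Repeatedly draw samples (with replacement) from the training set,
# refit the model on each, and examine how the fitted slopes vary.
slopes = []
for _ in range(1000):
    idx = [random.randrange(len(xs)) for _ in range(len(xs))]
    slopes.append(slope([xs[i] for i in idx], [ys[i] for i in idx]))

print(round(statistics.stdev(slopes), 3))  # estimated variability of the slope
```

The standard deviation of the resampled slopes approximates the standard error that would otherwise be obtained from the analytical regression formulas.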
Types of Resampling
There are several types of resampling methods, including but not limited to the bootstrap, the jackknife, and permutation tests.
Bootstrap
The bootstrap is a widely applicable and extremely powerful statistical tool that can be used to quantify the uncertainty associated with a given estimator or statistical learning method. As an example, the bootstrap can be used to estimate the standard errors of the coefficients from a linear regression fit. Similarly, the bootstrap can be used to assess the variability of the coefficient estimates and predictions from a neural network, or the value of a linear discriminant analysis test statistic.
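As a concrete sketch of the general recipe, the function below estimates the bootstrap standard error of any statistic: it repeatedly resamples the data with replacement, recomputes the statistic on each resample, and returns the standard deviation of the replicates. The helper name `bootstrap_se` and the sample data are illustrative assumptions, not drawn from any particular library.

```python
import random
import statistics

def bootstrap_se(sample, stat, n_boot=2000, seed=1):
    """Bootstrap standard error of the statistic `stat` on `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    reps = []
    for _ in range(n_boot):
        # Draw a resample of the same size, with replacement.
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        reps.append(stat(resample))
    # The spread of the bootstrap replicates estimates the standard error.
    return statistics.stdev(reps)

random.seed(1)
data = [random.gauss(50, 10) for _ in range(200)]  # hypothetical sample
se_mean = bootstrap_se(data, statistics.fmean)
print(round(se_mean, 3))
```

For the sample mean the result can be checked against the textbook formula s/√n; for estimators with no simple formula (a median, a regression coefficient, a neural network prediction) the same code applies unchanged, which is the bootstrap's main appeal.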
Jackknife
The jackknife, like the bootstrap, is a method for estimating the variability of an estimator or statistical learning method. Jackknife estimates are obtained by systematically leaving out one observation at a time and recomputing the estimate on each reduced sample. This method was a predecessor of the bootstrap.
Permutation Tests
Permutation tests are a class of nonparametric significance tests. When every possible rearrangement of the data is enumerated, they yield exact significance levels, even for small samples; in practice the rearrangements are often sampled at random, giving a Monte Carlo approximation to the exact p-value. Permutation tests are also called exact tests, randomization tests, or re-randomization tests.
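A common application is testing whether two groups differ in their means. The sketch below (hypothetical helper name `permutation_test`, made-up data) pools the two samples, repeatedly shuffles the pooled labels, and counts how often a random relabeling produces a difference in means at least as large as the one observed. Because it samples permutations rather than enumerating all of them, it returns an approximate rather than exact p-value.

```python
import random
import statistics

def permutation_test(a, b, n_perm=5000, seed=42):
    """Two-sided permutation test for a difference in group means.

    Returns an approximate p-value based on randomly sampled permutations.
    """
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        # Randomly reassign the pooled observations to the two groups.
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(statistics.fmean(pa) - statistics.fmean(pb)) >= observed:
            count += 1
    # Add 1 to numerator and denominator so the p-value is never exactly 0.
    return (count + 1) / (n_perm + 1)

treatment = [12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3]
control = [12.0, 12.2, 11.2, 13.0, 15.0, 15.8, 12.2, 13.4]
print(permutation_test(treatment, control))
```

The null hypothesis here is that group labels are exchangeable; no normality assumption is needed, which is what makes the test nonparametric.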
Advantages and Disadvantages
Resampling methods have the advantage of being applicable in almost any situation. They are straightforward and easy to understand, and they require fewer assumptions than parametric methods. For instance, they do not assume that the errors are normally distributed, which is a key assumption of many parametric methods.
However, resampling methods do come with their own set of disadvantages. They can be computationally intensive, and they may not always provide accurate results, especially with smaller sample sizes. Despite these potential drawbacks, resampling methods are a valuable tool in the statistician's toolbox.
Applications
Resampling methods are used in a variety of fields. In finance, they can be used to calculate the risk of a given investment strategy. In medicine, they can be used to analyze the results of clinical trials. In machine learning, they are used to estimate the accuracy of predictive models.