Wasserstein distance
Introduction
The Wasserstein distance, also known as the Earth Mover's Distance (EMD), is a measure of the distance between two probability distributions on a given metric space. It is named after the mathematician Leonid Vaserstein, although the underlying optimal transport problem goes back to Gaspard Monge and Leonid Kantorovich, and the \(p = 1\) case is also known as the Kantorovich–Rubinstein distance. The distance is particularly useful in fields such as machine learning, computer vision, and economics, where comparing distributions is essential.
Mathematical Definition
The Wasserstein distance is defined in the context of optimal transport, where the goal is to find the most efficient way to transform one distribution into another. Given two probability measures \(\mu\) and \(\nu\) on a metric space \((X, d)\) with finite \(p\)-th moments, the \(p\)-Wasserstein distance (for \(p \geq 1\)) is defined as:
\[ W_p(\mu, \nu) = \left( \inf_{\gamma \in \Gamma(\mu, \nu)} \int_{X \times X} d(x, y)^p \, d\gamma(x, y) \right)^{1/p} \]
where \(\Gamma(\mu, \nu)\) is the set of all couplings of \(\mu\) and \(\nu\), that is, joint probability measures on \(X \times X\) whose marginals are \(\mu\) and \(\nu\), and \(d(x, y)\) is the distance between points \(x\) and \(y\) in the space \(X\). Intuitively, a coupling describes a transport plan, and the integral measures the total cost of moving the mass of \(\mu\) onto \(\nu\) under that plan.
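For discrete measures the infimum becomes a finite linear program. If \(\mu = \sum_i a_i \delta_{x_i}\) and \(\nu = \sum_j b_j \delta_{y_j}\), a coupling is a non-negative matrix \(\Gamma\) whose row sums are \(a\) and whose column sums are \(b\), and
\[ W_p(\mu, \nu)^p = \min_{\Gamma \geq 0} \sum_{i,j} \Gamma_{ij} \, d(x_i, y_j)^p \quad \text{subject to} \quad \sum_j \Gamma_{ij} = a_i, \quad \sum_i \Gamma_{ij} = b_j. \]
This finite formulation is the one used by the exact solvers discussed under Computational Aspects below.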
Properties
The Wasserstein distance has several important properties:
- **Non-negativity**: \(W_p(\mu, \nu) \geq 0\), with equality if and only if \(\mu = \nu\).
- **Symmetry**: \(W_p(\mu, \nu) = W_p(\nu, \mu)\).
- **Triangle Inequality**: \(W_p(\mu, \nu) \leq W_p(\mu, \lambda) + W_p(\lambda, \nu)\) for any probability measure \(\lambda\).
- **Convexity**: The map \((\mu, \nu) \mapsto W_p^p(\mu, \nu)\) is jointly convex, since a mixture of two couplings is a coupling of the corresponding mixed marginals.
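Taken together, non-negativity, symmetry, and the triangle inequality make \(W_p\) a genuine metric on the space of probability measures on \(X\) with finite \(p\)-th moments. A minimal worked example: for two Dirac masses \(\delta_x\) and \(\delta_y\), the only coupling is \(\delta_{(x, y)}\), so
\[ W_p(\delta_x, \delta_y) = \left( \int_{X \times X} d(u, v)^p \, d\delta_{(x, y)}(u, v) \right)^{1/p} = d(x, y), \]
showing that the Wasserstein distance inherits the geometry of the underlying space \(X\).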
Applications
Machine Learning
In machine learning, the Wasserstein distance is used in applications such as generative adversarial networks (GANs), where it provides a meaningful training signal even when the generated and real data distributions have little or no overlapping support. The Wasserstein GAN (WGAN) is a popular variant that leverages this distance for improved training stability.
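To make the WGAN idea concrete, the following is a minimal sketch of a critic update with weight clipping, assuming PyTorch is available; the tiny network, optimizer settings, and toy data are illustrative stand-ins rather than a reference implementation. It relies on the Kantorovich–Rubinstein dual form of \(W_1\), in which the distance is the supremum of \(\mathbb{E}_\mu[f] - \mathbb{E}_\nu[f]\) over 1-Lipschitz functions \(f\); the critic plays the role of \(f\).

```python
import torch
import torch.nn as nn

# Toy critic network; architecture and hyperparameters are illustrative.
critic = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.RMSprop(critic.parameters(), lr=5e-5)

def critic_step(real, fake, clip=0.01):
    """One WGAN critic update with weight clipping."""
    optimizer.zero_grad()
    # Kantorovich-Rubinstein dual: W1 is the supremum of E[f(real)] - E[f(fake)]
    # over 1-Lipschitz f; the critic approximates f, so we ascend this objective.
    loss = -(critic(real).mean() - critic(fake).mean())
    loss.backward()
    optimizer.step()
    # Weight clipping crudely keeps the critic (approximately) Lipschitz.
    with torch.no_grad():
        for p in critic.parameters():
            p.clamp_(-clip, clip)
    return -loss.item()  # current estimate of W1 between the two sample sets

real = torch.randn(128, 2) + 2.0   # samples standing in for the data distribution
fake = torch.randn(128, 2)         # samples standing in for the generator output
for _ in range(5):
    estimate = critic_step(real, fake)
print(f"critic's W1 estimate: {estimate:.3f}")
```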
Computer Vision
In computer vision, the Wasserstein distance is employed in tasks such as image retrieval and image segmentation. It allows histograms of pixel intensities or local features to be compared while taking the ground distance between bins into account, which makes it more robust to small shifts than bin-wise metrics.
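As an illustration, SciPy's one-dimensional wasserstein_distance can compare two intensity histograms directly, using bin centres as support points and normalized counts as weights; the "images" below are synthetic stand-ins.

```python
import numpy as np
from scipy.stats import wasserstein_distance

rng = np.random.default_rng(0)
# Two synthetic grayscale "images": one darker, one brighter.
img_a = rng.normal(90, 20, size=(64, 64)).clip(0, 255)
img_b = rng.normal(140, 25, size=(64, 64)).clip(0, 255)

bins = np.arange(0, 257, 8)                  # 32 intensity bins
centres = (bins[:-1] + bins[1:]) / 2
hist_a, _ = np.histogram(img_a, bins=bins, density=True)
hist_b, _ = np.histogram(img_b, bins=bins, density=True)

# 1-Wasserstein distance between the two intensity distributions; unlike
# bin-wise metrics, it accounts for how far mass must move between bins.
d = wasserstein_distance(centres, centres, u_weights=hist_a, v_weights=hist_b)
print(f"W1 between intensity histograms: {d:.2f}")
```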
Economics
In economics, the Wasserstein distance is used to compare income distributions, measure inequality, and analyze economic mobility. Interpreted as a transport cost, it quantifies the minimal effort, mass moved times distance moved, needed to transform one income distribution into another, which makes changes in wealth distribution directly comparable.
Computational Aspects
Computing the Wasserstein distance can be challenging, especially in high-dimensional spaces or between measures with many support points, since the exact discrete problem is a linear program whose size grows quickly with the number of points. Several approaches compute or approximate the distance efficiently; a small numerical comparison of the first two appears after this list:
- **Linear Programming**: The Wasserstein distance can be formulated as a linear programming problem, which is computationally expensive but provides exact solutions.
- **Sinkhorn Distance**: An entropic regularization of the Wasserstein distance, known as the Sinkhorn distance, can be computed with simple matrix-scaling (Sinkhorn–Knopp) iterations, trading a small bias for a large speedup and easy parallelization.
- **Specialized Combinatorial Solvers**: For discrete problems, dedicated solvers such as the network simplex method and the auction algorithm exploit the network structure of the transport problem and are typically much faster than a generic linear-programming solver.
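A small numerical sketch, assuming only NumPy and SciPy: the exact squared-\(W_2\) cost between two discrete measures is obtained by solving the linear program above with scipy.optimize.linprog, and a hand-rolled Sinkhorn iteration approximates it. The point clouds and regularization strength are illustrative choices.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m = 6, 7
x = rng.normal(size=(n, 2))                    # support points of mu
y = rng.normal(loc=1.0, size=(m, 2))           # support points of nu
a = np.full(n, 1.0 / n)                        # weights of mu
b = np.full(m, 1.0 / m)                        # weights of nu
M = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)   # squared Euclidean costs

# Exact solution: linear program over the flattened transport plan Gamma.
A_eq = np.zeros((n + m, n * m))
for i in range(n):
    A_eq[i, i * m:(i + 1) * m] = 1.0           # row sums of Gamma equal a
for j in range(m):
    A_eq[n + j, j::m] = 1.0                    # column sums of Gamma equal b
res = linprog(M.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]), method="highs")
print("exact W2^2:   ", res.fun)

# Entropic regularization: Sinkhorn matrix-scaling iterations.
# Smaller eps tracks the unregularized cost more closely but is less stable.
eps = 0.1
K = np.exp(-M / eps)
u = np.ones(n)
v = np.ones(m)
for _ in range(1000):
    u = a / (K @ v)
    v = b / (K.T @ u)
P = u[:, None] * K * v[None, :]                # regularized transport plan
print("Sinkhorn W2^2:", (P * M).sum())
```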
Extensions and Variants
Multi-Marginal Wasserstein Distance
The multi-marginal Wasserstein distance extends the concept to more than two distributions, allowing for the comparison of multiple probability measures simultaneously. This extension is useful in applications such as multi-view learning and collaborative filtering.
Wasserstein Barycenters
Wasserstein barycenters generalize the notion of a mean to the space of probability measures: the barycenter is the Fréchet mean under the Wasserstein metric, i.e. the measure minimizing the weighted sum of squared \(W_2\) distances to a given set of measures. They provide a way to compute the "average" distribution of a set of probability measures, with applications in clustering and data summarization.
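As a concrete illustration, on the real line the \(W_2\) barycenter has a simple characterization: its quantile function is the weighted average of the input quantile functions. The following sketch, assuming only NumPy and using synthetic samples, exploits this one-dimensional fact; general multi-dimensional barycenters require iterative solvers.

```python
import numpy as np

rng = np.random.default_rng(0)
# Samples from two synthetic 1-D distributions (illustrative stand-ins).
samples = [rng.normal(-2.0, 0.5, 1000), rng.normal(3.0, 1.5, 1000)]
weights = [0.5, 0.5]

# On the real line the W2 barycenter's quantile function is the weighted
# average of the input quantile functions, evaluated on a common grid.
qs = np.linspace(0.005, 0.995, 200)
quantiles = np.stack([np.quantile(s, qs) for s in samples])
barycenter_quantiles = np.average(quantiles, axis=0, weights=weights)

# barycenter_quantiles discretizes the barycenter's quantile function; its mean
# lies between the means of the two inputs, as expected of an "average" measure.
print(barycenter_quantiles.mean())
```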
Theoretical Insights
The Wasserstein distance is deeply connected to the theory of optimal transport, which studies the most cost-efficient ways to move mass from one configuration to another. Results from this theory, such as Kantorovich duality, which for \(p = 1\) expresses the distance as a supremum over 1-Lipschitz test functions, carry over directly to the Wasserstein distance and underpin applications such as the WGAN objective described above.