Kolmogorov complexity
Introduction
Kolmogorov complexity, also known as algorithmic complexity, is a measure of the complexity of an object, such as a piece of text, a binary string, or a dataset. It quantifies the amount of information required to describe the object. The concept is named after the Russian mathematician Andrey Kolmogorov, one of the pioneers of algorithmic information theory.
Definition
The Kolmogorov complexity of a string is defined as the length of the shortest possible description of the string in some fixed universal description language. Formally, the Kolmogorov complexity \( K(x) \) of a string \( x \) with respect to a universal Turing machine \( U \) is the length of the shortest program \( p \) such that \( U(p) = x \). Mathematically, this can be expressed as:
\[ K_U(x) = \min \{ |p| : U(p) = x \} \]
where \( |p| \) denotes the length of the program \( p \). The choice of the universal Turing machine \( U \) is not crucial: by the invariance theorem, the complexities measured with respect to any two universal Turing machines differ by at most an additive constant that depends only on the machines, not on the string.
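This invariance can be stated precisely: for any two universal Turing machines \( U \) and \( V \) there is a constant \( c_{U,V} \), depending only on the machines, such that
\[ |K_U(x) - K_V(x)| \le c_{U,V} \quad \text{for all strings } x. \]
For this reason the subscript is usually dropped and one simply writes \( K(x) \).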
Properties
Incompressibility
A string is considered incompressible if its Kolmogorov complexity is close to its length. This means that there is no shorter description of the string than the string itself. Most strings are incompressible, which is a consequence of the pigeonhole principle.
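The counting argument behind this is short. There are \( 2^n \) binary strings of length \( n \), but fewer than \( 2^{n-c} \) programs of length less than \( n - c \), and each program outputs at most one string. Hence
\[ \#\{ x \in \{0,1\}^n : K(x) < n - c \} < 2^{n-c}, \]
so less than a \( 2^{-c} \) fraction of the strings of length \( n \) can be described in fewer than \( n - c \) bits.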
Conditional Complexity
The conditional (also called relative) Kolmogorov complexity \( K(x|y) \) of a string \( x \) given another string \( y \) is the length of the shortest program that outputs \( x \) when \( y \) is provided as an auxiliary input. This can be formally defined as:
\[ K_U(x|y) = \min \{ |p| : U(p, y) = x \} \]
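For example, \( K(x \mid x) = O(1) \): a constant-size program can simply copy the auxiliary input to the output, no matter how complex \( x \) is on its own. More generally, \( K(x \mid y) \le K(x) + O(1) \), since a program for \( x \) is free to ignore the auxiliary input.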
Symmetry of Information
One of the fundamental properties of Kolmogorov complexity is the symmetry of information. For any strings \( x \) and \( y \), the following holds up to an additive term logarithmic in the complexities involved:
\[ K(x, y) \approx K(x) + K(y|x) \approx K(y) + K(x|y) \]
This property indicates that the complexity of the pair \( (x, y) \) is approximately the complexity of \( x \) plus the complexity of \( y \) given \( x \), and symmetrically with the roles of \( x \) and \( y \) exchanged.
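A sharper statement, known as the Kolmogorov–Levin theorem, bounds the error term explicitly:
\[ \bigl| K(x, y) - K(x) - K(y \mid x) \bigr| = O(\log K(x, y)). \]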
Applications
Kolmogorov complexity has numerous applications in various fields, including computer science, information theory, and mathematics.
Data Compression
In data compression, Kolmogorov complexity provides a theoretical limit on the compressibility of data: a string whose complexity is close to its length cannot be encoded substantially more briefly by any lossless scheme. Conversely, the output of any fixed lossless compressor, together with its decompressor, is a description of the input, so the compressed length gives a computable upper bound on \( K(x) \) up to a constant. This framing clarifies both the efficiency limits of compression algorithms and the inherent redundancy in data.
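As a rough illustration, the output length of an off-the-shelf lossless compressor is a computable upper bound on the complexity of a string, up to the constant cost of the decompressor. The sketch below uses Python's zlib purely as a stand-in for a fixed description language; the bound it gives is usually far above the true \( K(x) \).

import zlib

def complexity_upper_bound(data: bytes) -> int:
    # The compressed stream, together with a fixed decompressor, is a
    # description of `data`, so its length upper-bounds K(data) up to
    # an additive constant.
    return len(zlib.compress(data, 9))

text = b"the quick brown fox jumps over the lazy dog " * 20
print(len(text), complexity_upper_bound(text))  # the bound is far below len(text) here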
Randomness and Pseudorandomness
Kolmogorov complexity is used to define randomness. A string is considered algorithmically random if its Kolmogorov complexity is close to its length, meaning it cannot be significantly compressed. This concept is crucial in the study of pseudorandom number generators and cryptographic systems.
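This view of randomness can be illustrated, if only heuristically (practical compressors are far weaker than the optimal description), by comparing a highly regular string with bytes drawn from the operating system's entropy source:

import os
import zlib

regular = b"01" * 5000       # highly structured: 10,000 bytes of a repeated pattern
noise = os.urandom(10000)    # 10,000 bytes from the OS entropy source

# The structured input shrinks dramatically; the noise does not compress
# at all (zlib even adds a few bytes of header overhead).
print(len(zlib.compress(regular)), len(zlib.compress(noise)))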
Machine Learning
In machine learning, Kolmogorov complexity is related to the principle of minimum description length (MDL). It is used to select models that best explain the data with the shortest description, balancing model complexity and data fit.
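A toy sketch of the MDL idea, assuming a crude two-part code: roughly \( \tfrac{1}{2} \log_2 n \) bits per real-valued parameter plus a Gaussian code for the residuals (a BIC-like cost). The synthetic data, the scoring formula, and the candidate models are illustrative choices, not part of any standard MDL library.

import numpy as np

def description_length(x, y, degree):
    # Two-part code length: bits to state the model parameters plus
    # bits to encode the data given the model (Gaussian residual code).
    n = len(x)
    k = degree + 1                          # number of polynomial coefficients
    coeffs = np.polyfit(x, y, degree)
    rss = float(np.sum((y - np.polyval(coeffs, x)) ** 2)) + 1e-12
    model_bits = 0.5 * k * np.log2(n)       # BIC-style parameter cost
    data_bits = 0.5 * n * np.log2(rss / n)  # residual cost up to a constant
    return model_bits + data_bits

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 100)
y = 2.0 * x**2 - x + rng.normal(scale=0.1, size=x.size)  # quadratic signal plus noise

scores = {d: description_length(x, y, d) for d in range(1, 8)}
print(min(scores, key=scores.get))  # typically selects degree 2: the simplest adequate model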
Limitations
Despite its theoretical significance, Kolmogorov complexity has practical limitations.
Uncomputability
Kolmogorov complexity is uncomputable: there is no algorithm that computes the exact Kolmogorov complexity of an arbitrary string. This is closely related to the undecidability of the halting problem, which shows that no algorithm can decide, for every program, whether it will halt.
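The standard proof formalizes the Berry paradox. Suppose, hypothetically, that a total computable function K returned the exact complexity of any binary string; the helper sketched below (illustrative only, since no such K exists) would then be a short program, describable in about \( \log_2 N \) bits plus a constant, that outputs a string of complexity greater than \( N \), a contradiction for large \( N \).

from itertools import count, product

def first_complex_string(N, K):
    # K is assumed to be a computable function giving the exact Kolmogorov
    # complexity of a binary string -- no such function exists; the sketch
    # only makes the contradiction concrete.
    for length in count(1):
        for bits in product("01", repeat=length):
            s = "".join(bits)
            if K(s) > N:
                return s  # a high-complexity string produced by a short program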
Dependence on Universal Turing Machine
While the choice of the universal Turing machine affects the complexity by at most an additive constant, this constant can be large in practice. As a result, the exact numerical value of Kolmogorov complexity is only meaningful relative to a fixed reference machine, and the measure is most informative asymptotically, for long strings.