Kolmogorov complexity
Introduction
Kolmogorov complexity, also known as algorithmic complexity, is a measure of the complexity of an object, such as a piece of text, a binary string, or a dataset. It quantifies the amount of information required to describe the object. The concept is named after the Russian mathematician Andrey Kolmogorov, one of the pioneers of algorithmic information theory.
Definition
The Kolmogorov complexity of a string is defined as the length of the shortest possible description of the string in some fixed universal description language. Formally, the Kolmogorov complexity \( K(x) \) of a string \( x \) with respect to a universal Turing machine \( U \) is the length of the shortest program \( p \) such that \( U(p) = x \). Mathematically, this can be expressed as:
\[ K_U(x) = \min \{ |p| : U(p) = x \} \]
where \( |p| \) denotes the length of the program \( p \). The choice of the universal Turing machine \( U \) is not crucial: by the invariance theorem, the complexities measured with respect to any two universal Turing machines differ by at most an additive constant that depends only on the machines, not on the string.
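This invariance can be stated precisely: for any two universal Turing machines \( U \) and \( V \) there is a constant \( c_{U,V} \), depending only on the machines, such that
\[ |K_U(x) - K_V(x)| \le c_{U,V} \quad \text{for all strings } x. \]
For this reason the subscript is usually dropped and one simply writes \( K(x) \).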
Properties
Incompressibility
A string is considered incompressible if its Kolmogorov complexity is close to its length. This means that there is no shorter description of the string than the string itself. Most strings are incompressible, which is a consequence of the pigeonhole principle.
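The counting argument behind this is short. There are \( 2^n \) binary strings of length \( n \), but fewer than \( 2^{n-c} \) programs of length less than \( n - c \), and each program outputs at most one string. Hence
\[ \#\{ x \in \{0,1\}^n : K(x) < n - c \} < 2^{n-c}, \]
so less than a \( 2^{-c} \) fraction of the strings of length \( n \) can be described in fewer than \( n - c \) bits.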
Conditional Complexity
The conditional (also called relative) Kolmogorov complexity \( K(x|y) \) of a string \( x \) given another string \( y \) is the length of the shortest program that outputs \( x \) when \( y \) is provided as an auxiliary input. This can be formally defined as:
\[ K_U(x|y) = \min \{ |p| : U(p, y) = x \} \]
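For example, \( K(x \mid x) = O(1) \): a constant-size program can simply copy the auxiliary input to the output, no matter how complex \( x \) is on its own. More generally, \( K(x \mid y) \le K(x) + O(1) \), since a program for \( x \) is free to ignore the auxiliary input.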
Symmetry of Information
One of the fundamental properties of Kolmogorov complexity is the symmetry of information. For any strings \( x \) and \( y \), the following holds up to an additive term logarithmic in the complexities involved:
\[ K(x, y) \approx K(x) + K(y|x) \approx K(y) + K(x|y) \]
This property indicates that the complexity of the pair \( (x, y) \) is approximately the complexity of \( x \) plus the complexity of \( y \) given \( x \), and symmetrically with the roles of \( x \) and \( y \) exchanged.
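A sharper statement, known as the Kolmogorov–Levin theorem, bounds the error term explicitly:
\[ \bigl| K(x, y) - K(x) - K(y \mid x) \bigr| = O(\log K(x, y)). \]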
Applications
Kolmogorov complexity has numerous applications in various fields, including computer science, information theory, and mathematics.
Data Compression
In data compression, Kolmogorov complexity provides a theoretical limit on the compressibility of data: a string whose complexity is close to its length cannot be encoded substantially more briefly by any lossless scheme. Conversely, the output of any fixed lossless compressor, together with its decompressor, is a description of the input, so the compressed length gives a computable upper bound on \( K(x) \) up to a constant. This framing clarifies both the efficiency limits of compression algorithms and the inherent redundancy in data.
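As a rough illustration, the output length of an off-the-shelf lossless compressor is a computable upper bound on the complexity of a string, up to the constant cost of the decompressor. The sketch below uses Python's zlib purely as a stand-in for a fixed description language; the bound it gives is usually far above the true \( K(x) \).

import zlib

def complexity_upper_bound(data: bytes) -> int:
    # The compressed stream, together with a fixed decompressor, is a
    # description of `data`, so its length upper-bounds K(data) up to
    # an additive constant.
    return len(zlib.compress(data, 9))

text = b"the quick brown fox jumps over the lazy dog " * 20
print(len(text), complexity_upper_bound(text))  # the bound is far below len(text) here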
Randomness and Pseudorandomness
Kolmogorov complexity is used to define randomness. A string is considered algorithmically random if its Kolmogorov complexity is close to its length, meaning it cannot be significantly compressed. This concept is crucial in the study of pseudorandom number generators and cryptographic systems.
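This view of randomness can be illustrated, if only heuristically (practical compressors are far weaker than the optimal description), by comparing a highly regular string with bytes drawn from the operating system's entropy source:

import os
import zlib

regular = b"01" * 5000       # highly structured: 10,000 bytes of a repeated pattern
noise = os.urandom(10000)    # 10,000 bytes from the OS entropy source

# The structured input shrinks dramatically; the noise does not compress
# at all (zlib even adds a few bytes of header overhead).
print(len(zlib.compress(regular)), len(zlib.compress(noise)))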
Machine Learning
In machine learning, Kolmogorov complexity is related to the principle of minimum description length (MDL). It is used to select models that best explain the data with the shortest description, balancing model complexity and data fit.
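A toy sketch of the MDL idea, assuming a crude two-part code: roughly \( \tfrac{1}{2} \log_2 n \) bits per real-valued parameter plus a Gaussian code for the residuals (a BIC-like cost). The synthetic data, the scoring formula, and the candidate models are illustrative choices, not part of any standard MDL library.

import numpy as np

def description_length(x, y, degree):
    # Two-part code length: bits to state the model parameters plus
    # bits to encode the data given the model (Gaussian residual code).
    n = len(x)
    k = degree + 1                          # number of polynomial coefficients
    coeffs = np.polyfit(x, y, degree)
    rss = float(np.sum((y - np.polyval(coeffs, x)) ** 2)) + 1e-12
    model_bits = 0.5 * k * np.log2(n)       # BIC-style parameter cost
    data_bits = 0.5 * n * np.log2(rss / n)  # residual cost up to a constant
    return model_bits + data_bits

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 100)
y = 2.0 * x**2 - x + rng.normal(scale=0.1, size=x.size)  # quadratic signal plus noise

scores = {d: description_length(x, y, d) for d in range(1, 8)}
print(min(scores, key=scores.get))  # typically selects degree 2: the simplest adequate model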
Limitations
Despite its theoretical significance, Kolmogorov complexity has practical limitations.
Uncomputability
Kolmogorov complexity is uncomputable: there is no algorithm that computes the exact Kolmogorov complexity of an arbitrary string. This is closely related to the undecidability of the halting problem, which shows that no algorithm can decide, for every program, whether it will halt.
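The standard proof formalizes the Berry paradox. Suppose, hypothetically, that a total computable function K returned the exact complexity of any binary string; the helper sketched below (illustrative only, since no such K exists) would then be a short program, describable in about \( \log_2 N \) bits plus a constant, that outputs a string of complexity greater than \( N \), a contradiction for large \( N \).

from itertools import count, product

def first_complex_string(N, K):
    # K is assumed to be a computable function giving the exact Kolmogorov
    # complexity of a binary string -- no such function exists; the sketch
    # only makes the contradiction concrete.
    for length in count(1):
        for bits in product("01", repeat=length):
            s = "".join(bits)
            if K(s) > N:
                return s  # a high-complexity string produced by a short program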
Dependence on Universal Turing Machine
While the choice of the universal Turing machine affects the complexity by at most an additive constant, this constant can be large in practice. As a result, the exact numerical value of Kolmogorov complexity is only meaningful relative to a fixed reference machine, and the measure is most informative asymptotically, for long strings.