Checksum

Overview

A checksum is a small, fixed-size value computed from a block of data of arbitrary length. It is used to check the integrity of data that is transmitted or stored. Checksums are used in many contexts, from error detection in computer networks to digital forensics in information security [1].

Functionality

Checksums are designed to detect errors introduced during the transmission or storage of data. The data is divided into blocks of bits, and the checksum is computed by applying a series of arithmetic or logical operations to these blocks. The resulting checksum is then stored or transmitted with the data, typically appended to it. When the data is retrieved or received, the checksum is recomputed and compared with the stored value. If the two values differ, an error has occurred. If they match, the data is assumed to be error-free, although a match does not guarantee this [2].
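
The append-and-verify workflow can be sketched in Python as follows. This is a minimal illustration, not a protocol implementation: it assumes CRC-32 from the standard-library `zlib` module as the checksum and a hypothetical frame layout in which a 4-byte big-endian checksum trails the payload. The names `append_checksum`, `verify`, and `CHECKSUM_LEN` are illustrative.

```python
import zlib

CHECKSUM_LEN = 4  # hypothetical layout: a 4-byte CRC-32 trailer

def append_checksum(payload: bytes) -> bytes:
    """Sender/writer side: compute the checksum and append it to the data."""
    return payload + zlib.crc32(payload).to_bytes(CHECKSUM_LEN, "big")

def verify(frame: bytes) -> bool:
    """Receiver/reader side: recompute the checksum and compare it with the stored value."""
    payload, stored = frame[:-CHECKSUM_LEN], frame[-CHECKSUM_LEN:]
    return zlib.crc32(payload).to_bytes(CHECKSUM_LEN, "big") == stored

frame = append_checksum(b"hello, world")
print(verify(frame))  # True

# Flip the lowest bit of the first byte to simulate a transmission error.
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]
print(verify(corrupted))  # False
```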

Types of Checksums

There are several types of checksums, each with different characteristics and uses.

Parity Bits

A parity bit is the simplest form of checksum and is common in serial communication protocols. It is a single binary digit added to a group of bits so that the total number of 1-bits is either even (even parity) or odd (odd parity). A parity bit detects any error that flips an odd number of bits, including every single-bit error, but it cannot detect errors that flip an even number of bits [3].
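
A minimal Python sketch of computing a parity bit over a byte string. The function name `parity_bit` is illustrative, and counting 1-bits with `bin(...).count("1")` is just one simple way to do it.

```python
def parity_bit(data: bytes, even: bool = True) -> int:
    """Return the bit that makes the total number of 1-bits even (even parity) or odd (odd parity)."""
    ones = sum(bin(byte).count("1") for byte in data)
    bit = ones % 2  # 1 if the data currently contains an odd number of 1-bits
    return bit if even else bit ^ 1

# 0b1011_0000 has three 1-bits, so even parity needs a 1 to bring the total to four.
print(parity_bit(bytes([0b1011_0000])))              # 1
print(parity_bit(bytes([0b1011_0000]), even=False))  # 0

# Flipping two bits leaves the parity unchanged, so this error would go undetected.
print(parity_bit(bytes([0b1011_0011])) == parity_bit(bytes([0b1011_0000])))  # True
```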

Modular Checksums

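Modular checksums are computed by dividing the data into blocks, summing the blocks, and taking the remainder of the sum modulo a fixed number; that remainder, or a simple transformation of it, is the checksum. The IPv4 header checksum is a variant of this idea: it is the ones' complement of the ones' complement sum of the header's 16-bit words, which is closely related to summing the words modulo 65535 (2^16 - 1) [4].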
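
A minimal Python sketch of the Internet checksum used for the IPv4 header, following the procedure described in RFC 1071. The sample bytes form an illustrative 20-byte IPv4 header whose checksum field (bytes 10 and 11) is set to zero. They are not taken from the references above.

```python
def internet_checksum(data: bytes) -> int:
    """Ones' complement of the ones' complement sum of big-endian 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    # Fold any carries back into the low 16 bits (end-around carry).
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

header = bytes.fromhex("45000073000040004011" "0000" "c0a80001c0a800c7")
csum = internet_checksum(header)
print(hex(csum))  # 0xb861

# A receiver sums the header including the checksum field; a valid header yields 0.
received = header[:10] + csum.to_bytes(2, "big") + header[12:]
print(internet_checksum(received))  # 0
```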

Cyclic Redundancy Check (CRC)

A cyclic redundancy check (CRC) is a type of checksum that is particularly good at detecting common error patterns such as burst errors. A CRC is the remainder of a polynomial division over GF(2), that is, binary long division in which subtraction is replaced by XOR, using a divisor called the generator polynomial. CRCs are used in Ethernet, whose frame check sequence is a 32-bit CRC, and in many other data transmission protocols [5].
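
A bit-at-a-time Python sketch of the common CRC-32 variant: reflected generator polynomial 0xEDB88320, initial value 0xFFFFFFFF, and final XOR with 0xFFFFFFFF. This is the same variant computed by the standard-library `zlib.crc32`, which serves as a cross-check here. Table-driven and hardware-accelerated implementations are much faster; this version favors readability.

```python
import zlib

POLY_REFLECTED = 0xEDB88320  # CRC-32 generator polynomial in bit-reversed form

def crc32(data: bytes) -> int:
    crc = 0xFFFFFFFF  # initial register value
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift out one bit; if it was 1, "subtract" (XOR) the generator polynomial.
            if crc & 1:
                crc = (crc >> 1) ^ POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF  # final XOR

print(hex(crc32(b"123456789")))  # 0xcbf43926, the standard CRC-32 check value
print(crc32(b"hello") == zlib.crc32(b"hello"))  # True
```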

Cryptographic Hash Functions

Cryptographic hash functions are used as checksums in information security. They map input data of any size to a fixed-size output (the hash, or digest). The same input always produces the same hash, while even a one-bit change in the input produces an apparently unrelated hash. Unlike ordinary checksums, they are designed so that finding two inputs with the same hash is computationally infeasible. Cryptographic hash functions are used for data integrity checks, digital signatures, and password storage [6].
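
The fixed output size and the sensitivity to small input changes can be seen with Python's standard `hashlib` module. SHA-256 is used here as a representative cryptographic hash function. That choice is an assumption, since the text does not name a specific function.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of `data` as a hex string."""
    return hashlib.sha256(data).hexdigest()

# The output size is fixed: 256 bits (64 hex characters) for any input length.
print(len(sha256_hex(b"")), len(sha256_hex(b"x" * 1_000_000)))  # 64 64

# A one-letter change produces an apparently unrelated digest.
d1 = hashlib.sha256(b"checksum").digest()
d2 = hashlib.sha256(b"checksun").digest()
changed_bits = sum(bin(x ^ y).count("1") for x, y in zip(d1, d2))
print(changed_bits, "of 256 bits differ")  # typically about half
```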

Checksum Algorithms

Checksum algorithms are the methods used to compute checksums. The choice of algorithm depends on the application and the type of data being checked.

Fletcher's Checksum

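Fletcher's checksum computes a checksum using two running sums. The first accumulator adds up the data words, and the second adds up the successive values of the first, so that each word's contribution depends on its position in the data. Both sums are taken modulo a fixed number (255 for the 16-bit variant, Fletcher-16). The final checksum is the concatenation of the two accumulator values. Fletcher's checksum is used in the Open Shortest Path First (OSPF) routing protocol to protect link-state advertisements [7].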
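
A minimal Python sketch of textbook Fletcher-16 over 8-bit bytes. OSPF applies a variant of the algorithm to link-state advertisements, so this sketch should not be expected to reproduce OSPF's checksum field byte for byte.

```python
def fletcher16(data: bytes) -> int:
    sum1 = 0  # running sum of the data bytes
    sum2 = 0  # running sum of the successive values of sum1 (position-weighted)
    for byte in data:
        sum1 = (sum1 + byte) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1  # concatenate the two 8-bit accumulators

print(hex(fletcher16(b"abcde")))   # 0xc8f0
print(hex(fletcher16(b"abcdef")))  # 0x2057
```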

Adler-32

Adler-32 is a checksum algorithm, derived from Fletcher's checksum, that is faster to compute than a CRC of the same length but less reliable at detecting errors. It maintains two 16-bit sums modulo 65521, the largest prime below 2^16: the first, A, starts at 1 and adds each byte, and the second, B, adds each successive value of A. The final 32-bit checksum is B followed by A. Adler-32 is used in the zlib compression library [8].
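
A minimal Python sketch of Adler-32, cross-checked against `zlib.adler32` from the standard library. Production implementations defer the modulo operations for speed; this version reduces after every byte for clarity.

```python
import zlib

MOD_ADLER = 65521  # largest prime below 2**16

def adler32(data: bytes) -> int:
    a, b = 1, 0  # A starts at 1, B at 0
    for byte in data:
        a = (a + byte) % MOD_ADLER
        b = (b + a) % MOD_ADLER
    return (b << 16) | a  # B in the high 16 bits, A in the low 16 bits

print(hex(adler32(b"Wikipedia")))                      # 0x11e60398
print(adler32(b"Wikipedia") == zlib.adler32(b"Wikipedia"))  # True
```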

MD5

MD5 (Message Digest Algorithm 5) is a cryptographic hash function that produces a 128-bit hash value. It has been widely used in information security applications, including data integrity checks and password storage. However, MD5 is now considered cryptographically broken: practical methods exist for constructing collisions, so it is unsuitable wherever an attacker might supply or alter the data [9].
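
Python's standard `hashlib` module still provides MD5. The sketch below shows its 128-bit output. Because collisions can be constructed deliberately, MD5 remains reasonable only for catching accidental corruption, and a collision-resistant function such as SHA-256 is the usual choice where an adversary may be involved.

```python
import hashlib

def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of `data` as a 32-character hex string."""
    return hashlib.md5(data).hexdigest()

print(md5_hex(b"abc"))                  # 900150983cd24fb0d6963f7d28e17f72
print(len(md5_hex(b"abc")) * 4, "bits")  # 128 bits
```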

Limitations and Vulnerabilities

While checksums are a valuable tool for error detection, they are not foolproof. There are several limitations and vulnerabilities associated with their use.

False Positives

A false positive occurs when a checksum indicates that data is error-free when it is not. This happens when the corrupted data produces the same checksum as the original data. The likelihood of a false positive depends on the checksum algorithm and the size of the checksum; for an n-bit checksum whose output behaves like a random value, a random corruption goes undetected with a probability of roughly 2^-n [10].
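
The dependence on the algorithm can be illustrated with a transposition error. In the Python sketch below, a simple modular sum (bytes summed modulo 256, an illustrative choice) gives the same value when two bytes swap places, while Fletcher-16, whose second sum weights each byte by its position, detects the change. The message bytes are hypothetical.

```python
def sum8(data: bytes) -> int:
    """Simple modular checksum: sum of bytes modulo 256."""
    return sum(data) % 256

def fletcher16(data: bytes) -> int:
    """Textbook Fletcher-16: a plain sum plus a position-weighted sum, both modulo 255."""
    sum1 = sum2 = 0
    for byte in data:
        sum1 = (sum1 + byte) % 255
        sum2 = (sum2 + sum1) % 255
    return (sum2 << 8) | sum1

original = b"PAY 12"
swapped = b"PAY 21"  # two adjacent bytes transposed in transit

print(sum8(original) == sum8(swapped))              # True: the error goes undetected
print(fletcher16(original) == fletcher16(swapped))  # False: the error is detected
```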

Collisions

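A collision occurs when two different data sets produce the same checksum. Because a checksum is shorter than the data it summarizes, collisions always exist; what matters is how hard they are to find. This is a particular concern with cryptographic hash functions, where a collision can allow an attacker to substitute a malicious data set for a legitimate one without detection [11].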
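
Collisions are easy to find when the checksum is short. The Python sketch below uses a hypothetical 16-bit checksum, SHA-256 truncated to its first two bytes, and searches for two distinct messages with the same value. By the pigeonhole principle the loop must succeed within 65,537 messages, and by the birthday paradox it typically succeeds after a few hundred. The same search against a full 256-bit digest is infeasible.

```python
import hashlib
from itertools import count

def tiny_hash(data: bytes, nbytes: int = 2) -> bytes:
    """Hypothetical short checksum: SHA-256 truncated to its first `nbytes` bytes."""
    return hashlib.sha256(data).digest()[:nbytes]

def find_collision(nbytes: int = 2) -> tuple[bytes, bytes]:
    """Hash distinct messages until two share a truncated digest (birthday search)."""
    seen: dict[bytes, bytes] = {}
    for i in count():
        msg = f"message {i}".encode()
        h = tiny_hash(msg, nbytes)
        if h in seen:
            return seen[h], msg
        seen[h] = msg

m1, m2 = find_collision()
print(m1, m2, tiny_hash(m1).hex(), tiny_hash(m2).hex())
```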

Security Vulnerabilities

Non-cryptographic checksums are not designed to resist intentional manipulation. An attacker can modify the data and simply recompute the checksum, or craft the modified data so that it produces the same checksum as the original; the latter is known as a checksum collision attack [12].
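
Crafting data that matches a weak checksum can be trivial. The Python sketch below assumes a simple modular checksum (bytes summed modulo 256) and a hypothetical message format that tolerates one trailing filler byte. The attacker picks that byte so the altered message has exactly the original checksum. The messages and the `forge` helper are illustrative.

```python
def sum8(data: bytes) -> int:
    """Simple modular checksum: sum of bytes modulo 256."""
    return sum(data) % 256

def forge(original: bytes, replacement: bytes) -> bytes:
    """Append one filler byte so that `replacement` has the same checksum as `original`."""
    filler = (sum8(original) - sum8(replacement)) % 256
    return replacement + bytes([filler])

original = b"PAY ALICE 100"
forged = forge(original, b"PAY MALLORY 900")
print(sum8(original) == sum8(forged))  # True
```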

Applications

Checksums are used in a wide range of applications, from error detection in computer networks to digital forensics in information security.

Data Transmission

Checksums are used in data transmission protocols to detect errors introduced during the transmission of data. The sender computes the checksum and appends it to the data. The receiver recomputes the checksum and compares it with the received value to check for errors [13].

Data Storage

Checksums are used in data storage systems to detect errors introduced during the storage or retrieval of data. The checksum is computed when the data is stored and checked when the data is retrieved [14].
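
A minimal Python sketch of store-time and read-time checking for files. It assumes SHA-256 as the checksum and that the digest recorded at store time is kept somewhere separate from the file; the function names are illustrative. The file is read in fixed-size chunks so that large files need not fit in memory.

```python
import hashlib
from pathlib import Path

def sha256_of_file(path: Path, chunk_size: int = 64 * 1024) -> str:
    """Compute a file's SHA-256 digest, reading it in chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_file(path: Path, stored_digest: str) -> bool:
    """Recompute the digest on retrieval and compare it with the one recorded at store time."""
    return sha256_of_file(path) == stored_digest
```

In use, `sha256_of_file` is called once when the file is written and its result is recorded; `verify_file` is called with that recorded value whenever the file is read back.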

Information Security

Checksums, in the form of cryptographic hash functions, are used in information security to check the integrity of data. They are used to detect unauthorized modifications to data and, when combined with a secret key or a digital signature, to verify the authenticity of data [15].

References

1. "Checksum." Wikipedia. https://en.wikipedia.org/wiki/Checksum
2. "Error detection and correction." Wikipedia. https://en.wikipedia.org/wiki/Error_detection_and_correction
3. "Parity bit." Wikipedia. https://en.wikipedia.org/wiki/Parity_bit
4. "Internet Protocol." Wikipedia. https://en.wikipedia.org/wiki/Internet_Protocol
5. "Cyclic redundancy check." Wikipedia. https://en.wikipedia.org/wiki/Cyclic_redundancy_check
6. "Cryptographic hash function." Wikipedia. https://en.wikipedia.org/wiki/Cryptographic_hash_function
7. "Fletcher's checksum." Wikipedia. https://en.wikipedia.org/wiki/Fletcher's_checksum
8. "Adler-32." Wikipedia. https://en.wikipedia.org/wiki/Adler-32
9. "MD5." Wikipedia. https://en.wikipedia.org/wiki/MD5
10. "Error detection and correction." Wikipedia. https://en.wikipedia.org/wiki/Error_detection_and_correction
11. "Collision (computer science)." Wikipedia. https://en.wikipedia.org/wiki/Collision_(computer_science)
12. "Collision attack." Wikipedia. https://en.wikipedia.org/wiki/Collision_attack
13. "Data transmission." Wikipedia. https://en.wikipedia.org/wiki/Data_transmission
14. "Data storage." Wikipedia. https://en.wikipedia.org/wiki/Data_storage
15. "Information security." Wikipedia. https://en.wikipedia.org/wiki/Information_security