Kruskal's algorithm

From Canonica AI

Overview

Kruskal's algorithm is a greedy algorithm in graph theory that finds a minimum spanning tree for a connected weighted graph. This means it finds a subset of the edges that forms a tree that includes every vertex, where the total weight of all the edges in the tree is minimized. If the graph is not connected, then it finds a minimum spanning forest (a minimum spanning tree for each connected component).

A visual representation of a graph with multiple nodes and edges, demonstrating the process of Kruskal's algorithm.
A visual representation of a graph with multiple nodes and edges, demonstrating the process of Kruskal's algorithm.

History and Development

Kruskal's algorithm was first described by Joseph Kruskal in 1956 in his paper "On the shortest spanning subtree of a graph and the traveling salesman problem". The algorithm was based on the earlier work of Vojtěch Jarník and Robert C. Prim, who developed similar algorithms now known as Prim's algorithm and Jarník's algorithm. However, Kruskal's algorithm differs from these in its sorting of the edges before the actual execution of the algorithm.

Algorithm Description

The algorithm operates by sorting all the edges from the lowest weight to the highest. It then goes through the edges in this order, adding the current edge to the tree if it does not form a cycle with the edges already in the tree. The algorithm continues until all vertices are included in the tree or all edges have been considered.

The steps of Kruskal's algorithm are as follows:

  1. Create a forest F (a set of trees), where each vertex in the graph is a separate tree.
  2. Create a set S containing all the edges in the graph.
  3. While S is nonempty and F is not yet spanning, remove an edge with minimum weight from S. If the removed edge connects two different trees, then add it to the forest F, combining two trees into a single tree.
  4. At the completion of the algorithm, the forest forms a minimum spanning forest of the graph. If the graph is connected, the forest has a single component and forms a minimum spanning tree.

Time Complexity

The time complexity of Kruskal's algorithm can be represented as O(E log E), where E is the number of edges in the graph. This is because the most time-consuming part of the algorithm is sorting all the edges, which can be done in O(E log E) time using a comparison-based sort. The rest of the algorithm runs in linear time, making the overall time complexity of the algorithm O(E log E).

Applications

Kruskal's algorithm has wide applications in network design. It is used in algorithms that are designed to minimize the total length of the wires in electrical circuits, computer networks, and transportation networks. It is also used in clustering algorithms in machine learning and data mining.

See Also