Minimum Spanning Tree

Overview#

Spanning tree of a graph is a connected subgraph with no cycles (acyclic) that includes all the vertices.

A Minimum Spanning Tree (MST) of an edge-weighted graph is a spanning tree whose weight (the sum of the weights of the edges) is no larger than the weight of any other spanning tree.

Hence, a MST is a spanning tree that is:

  • Connected
  • Acyclic
  • Includes all the vertices

Greedy Algorithm#

Cut Property

A cut in a graph is a partition of its vertices into two (non-empty) sets.

A crossing edge connects a vertex in one set with a vertex in the other.

Steps

  • Start with all edges in graph colored grey
  • Find cut with no black crossing edges, color its min-weight edge black
  • Repeat until V-1 edges are colored black

Kruskal's Algorithm#

Steps

  • Consider edges in ascending order of weight (priority queue - min heap of edges by weight)

  • Add next edge to tree T unless doing so creates a cycle

    • Check with union find
    • If we are connecting two vertices in the same set (component), we will obtain a cycle
  • Stop when V-1 edges are attained

Pseudocode#

from collections import dequeue
import heapq
# Pseudoclass - some methods have not been implemented
class KruskalMST:
# g is an edge-weighted graph
def __init__(self, g):
self.mst = deque()
pq = []
# heapq by default creates a min-heap
for edge in g.get_edges():
heapq.heappush(pq, edge)
uf = UnionFind(g.num_of_vertices())
while(len(pq) > 0 && len(self.mst) < g.num_of_vertices() - 1):
edge = heapq.heappop(pq)
v = edge.either()
w = edge.other()
if(!uf.connected(v, w)):
uf.union(v, w)
mst.append(edge)
def get_edges(self):
return self.mst

Time Complexity#

The overall time complexity for Kruskal's MST algorithm is Elog(E) in worst case. As shown in the table below, the dominating factor is E and log(E) which gives rise to the complexity.

The time complexity of Elog(E) can be represented as Elog(V) since number of edges is at most V^2 and hence, log(E) = log(V^2) = 2log(V).

Why is E at most V^2?

  • Imagine a adjacency matrix. At most, every cell is filled, which is V^2.
OperationFrequencyTime per Operation
Build PQ1E (bottom-up construction )
Delete MinElog(E)
UnionVlog*(V)
ConnectedElog*(V)

Note: log**(V) arises due to amortised bound using weight quick union with path compression

As per the table, we can see that the time complexity is dominated by E*log(E) times of delete_min() operation.