Topological sorting for Directed Acyclic Graph (DAG) is a linear ordering of vertices such that for every directed edge u v, vertex u comes before v in the ordering. Topological Sorting for a graph is not possible if the graph is not a DAG. The first vertex in topological sorting is always a vertex with an in-degree as 0 (a vertex with no incoming edges). Edges may or may not be weighted.
The linearization of the DAG that ranges in the linearization where the topological order does not matter is sorted lexicographically and is said to be a "stable topological sort".
A DAG will always have at least one valid topological ordering possible. If a graph is cyclic, topological sorting is impossible, and hence, not defined for cyclic graphs. And if a graph is undirected, each edge being bidirectional creates a cycle between the two nodes it connects; hence topological ordering isn’t possible. The graph has to be a DAG for topological ordering to be possible.
The topological sort is used to schedule the jobs from given dependencies among Jobs. The topological sort is used to determine the order of compilation tasks to perform in resolving symbol dependencies in linkers, data serializations, and making files.
Vertex: Nodes are called the vertex.
Edge: The connection between two vertices is called edge.
Degree: This is only applicable to unweighted graphs. This is the number of edges connected to the given vertex.
In-Degree: For a directed graph, this is the number of incoming edges for a given vertex.
Out-Degree: Similar to In-Degree, this is the number of outgoing edges for a given vertex.
Calculation of Indegree: We can use an in-degree array to track indegree values. To find the indegree of all nodes initially, we first initialize all values in the in-degree array to 0 and then go through all the edges and increment the in-degree counter of the destination node of each edge by 1. We do this because the destination node has an edge that’s an incoming edge pointing to it, contributing to its in-degree.
Any node that has no incoming edge and only outgoing edges are called a source. Any node that has only incoming edges and no outgoing edge is called a sink. Topological ordering starts with one of the sources and ends at one of the sinks.
The steps of topological sorting are:
Step 1: First, identify the node with no incoming edges that is indegree of the node is zero.
Step 2: After finding the node, deletes the node from the graph and adds it to the topological sort order.
Step 3: After deleting the node, remove the outgoing edges from the graph.
Step 4: Reduce the indegree by the number of related edges that were removed.
Imagine a task that does not have any prerequisites and is not a blocker for any other tasks. It can be performed at any time regardless of whether other tasks have been completed or not. Thus, it is valid for these nodes to appear in any position in a topologically sorted array. It will depend on the implementation of the algorithm where they eventually end up.
The position of nodes in the final topologically sorted array only needs to satisfy one criterion — for each node, all its dependencies should be found to its left. This condition can be fulfilled by arranging the nodes in multiple ways, as long as we are not placing a node to the right of a dependent node.
For example: Imagine a graph that has N nodes but only one directed edge connecting node A to node B. To topologically sort the nodes, we can arrange the N nodes in any way while only taking care of the constraint that A should lie to the left of B.
Depth-first search (DFS) is a recursive algorithm for traversing a graph. It uses the idea of exhaustive search - it will keep moving deeper into the graph until that particular path is entirely exhausted (in other words, a dead end is found).
DFS starts traversing the graph from a starting node (provided as input to the algorithm). It chooses a path starting from this node, and goes as deep as it can. After exhausting that path, it backtracks until it finds another unexplored path and then explores it recursively.
It keeps recursing until the complete graph has been explored or until the node we were looking for is found.
It is used to solve many interesting problems, such as finding a path in a maze, detecting and breaking deadlocks in operating systems, and many others.
The topological sort time complexity using DFS takes θ(m+n) time, and the insertion of each vertex to the linked list takes O(1) time. Thus, topological sort using DFS takes time θ(m + n). The following algorithm is as follows:
1. Perform DFS to compute finishing times f[v] for each vertex v.
2. As each vertex is finished, insert it to the front of a linked list.
3. Return the linked list of vertices.
How to detect a cycle in a directed graph using DFS?
We need to check if there exists a back edge in a graph. The back edge is an edge from the graph that is not present in the DFS tree after applying DFS, and it connects a node “u” to one of its ancestors. We can keep track of the active nodes in another array while running the DFS. Those nodes that are placed on the stack but have not been removed are considered active.
During DFS, when a new node is visited, all of its ancestors should be active. Also, all of the active nodes must be the new node’s ancestors. So, whenever we explore any new node, if we find an edge directed from the new node to any of the active nodes, we can conclude that the given graph has a cycle.
Proof: Suppose the back edge connects node “u” with node “v” and it goes from “u” to “v.” Without loss of generality, also suppose that “v” was discovered before “u”. As “v” and “u” are both active, there must be a path in the DFS tree from “v” to “u.” That path only uses tree edges. On the other hand, we just discovered another path from “u” to “v” using a back edge. So, in this graph, there exists a path that starts from “u” and comes back to “u.” Hence, it must be a cycle.
Why does DFS use a “stack” data structure?
DFS explores any path as long as possible. When it hits a dead end, it backtracks to the just-previous state. So, we need a data structure that can give us the most recent element (or call). The stack serves this purpose — Last In First Out (LIFO). The queue cannot be used to implement DFS as it is based on the opposite concept — First In First Out (FIFO).
Applications of DFS:
Precedence scheduling:
GOAL: Given a set of tasks to be completed with precedence constraints, in which order should we schedule the tasks?
vertex = task; edge = precedence constraint
Idea: Depth First Search
0→5, 0→2, 0→1, 3→6, 3→5, 3→4, 5→2, 6→4, 6→0, 3→2 , 1→4
- Run Depth-first search.
- Return vertices in reverse postorder.
postorder : 4 1 2 5 0 6 3
topological order: 3 6 0 5 2 1 4
Why does the topological sort algorithm work?
1. First vertex in postorder has outdegree 0.
2. Second-to-last vertex in postorder can only point to the last vertex.
If the graph has a cycle, some vertices will have cyclic dependencies which makes it impossible to find a linear ordering among vertices.
If no directed cycle, DFS-based algorithm finds a topological order.
Breadth-first Search
Breadth-first search (BFS) is a graph search and traverse algorithm used to solve many programming and real-life problems. It follows a simple, level-based approach. We can easily solve many problems in computer science by modeling them in terms of graphs.
For Example: analyzing networks, mapping routes, scheduling, and more.
Kahn's algorithm
Kahn’s algorithm for topological sorting is used mainly when tasks or items have to be ordered, where some tasks or items have to occur before others can.
Kahn’s algorithm works by keeping track of the number of incoming edges into each node (indegree). It repeatedly:
This process continues until no element with zero in-degree can be found. As mentioned earlier, this can happen when the topological sorting is complete and also when a cycle is encountered. For this reason, the algorithm checks if the result of topological sorting contains the same number of elements that were supposed to be sorted:
- If the numbers match, no cycle is encountered, and we print the sorted order.
- If they don’t, a cycle was encountered, which means the topological sorting was not possible. This is conveyed in such a case as appropriate: by returning null or printing a message.
The algorithm is as follows :
Step 1: Store the current in-degree of each node and initialize the count of visited nodes to zero.
Step 2: Take all the nodes with indegree 0 and add them to a queue (Enqueue) or a stack (Push)
Step 3: Remove a node from the queue (Dequeue) / stack (Pop) and:
- Increment the count of visited nodes (by one).
- Reduce indegree by one for all its neighboring nodes.
- If the in-degree of a neighboring node is reduced to zero by the above operation, then add it to the queue/stack.
Step 4: Continue repeating Step 3 until the queue/stack is empty.
Step 5: If the count of visited nodes does not equal the number of nodes in the graph, then a cycle is encountered, and topological sorting is not possible for the graph as it’s not a DAG. Return a message stating that. If it is equal, return the topologically sorted result.
Complexity Analysis:
Time Complexity: O(V+E).
The outer for loop will be executed V number of times and the inner for loop will be executed E number of times.
Auxiliary Space: O(V).
The queue needs to store all the vertices of the graph. So the space required is O(V)
The applications of Kahn’s Algorithm:
Breadth-first Search |
Depth-first Search |
Uses queue data structure to store the nodes to be visited | Uses stack data structure to store the nodes to be visited |
Traverses a graph level-wise | Traverses a graph depth-wise |
More suitable when the target node is closer to the source | More suitable when the target node is away from the source node |
Not suitable for decision-making trees used in games or puzzle | More suitable for games or puzzle problems |
Strengths:
Weaknesses: