1 month

Topological Sorting




Topological sorting for Directed Acyclic Graph (DAG) is a linear ordering of vertices such that for every directed edge u v, vertex u comes before v in the ordering. Topological Sorting for a graph is not possible if the graph is not a DAG. The first vertex in topological sorting is always a vertex with an in-degree as 0 (a vertex with no incoming edges). Edges may or may not be weighted.

The linearization of the DAG that ranges in the linearization where the topological order does not matter is sorted lexicographically and is said to be a "stable topological sort".

 A DAG will always have at least one valid topological ordering possible. If a graph is cyclic, topological sorting is impossible, and hence, not defined for cyclic graphs. And if a graph is undirected, each edge being bidirectional creates a cycle between the two nodes it connects; hence topological ordering isn’t possible. The graph has to be a DAG for topological ordering to be possible.

The topological sort is used to schedule the jobs from given dependencies among Jobs. The topological sort is used to determine the order of compilation tasks to perform in resolving symbol dependencies in linkers, data serializations, and making files.

   Vertex: Nodes are called the vertex.                                                      

   Edge: The connection between two vertices is called edge.                    

  Degree: This is only applicable to unweighted graphs. This is the number of edges connected to the given vertex.

  In-Degree: For a directed graph, this is the number of incoming edges for a given vertex.

  Out-Degree: Similar to In-Degree, this is the number of outgoing edges for a given vertex.

Calculation of Indegree: We can use an in-degree array to track indegree values. To find the indegree of all nodes initially, we first initialize all values in the in-degree array to 0 and then go through all the edges and increment the in-degree counter of the destination node of each edge by 1. We do this because the destination node has an edge that’s an incoming edge pointing to it, contributing to its in-degree. 

Any node that has no incoming edge and only outgoing edges are called a source. Any node that has only incoming edges and no outgoing edge is called a sink. Topological ordering starts with one of the sources and ends at one of the sinks

The steps of topological sorting are:                                                                        

    Step 1: First, identify the node with no incoming edges that is indegree of the node is zero.

    Step 2: After finding the node, deletes the node from the graph and adds it to the topological sort order.

    Step 3: After deleting the node, remove the outgoing edges from the graph.

    Step 4: Reduce the indegree by the number of related edges that were removed.

Imagine a task that does not have any prerequisites and is not a blocker for any other tasks. It can be performed at any time regardless of whether other tasks have been completed or not. Thus, it is valid for these nodes to appear in any position in a topologically sorted array. It will depend on the implementation of the algorithm where they eventually end up.

The position of nodes in the final topologically sorted array only needs to satisfy one criterion —  for each node, all its dependencies should be found to its left. This condition can be fulfilled by arranging the nodes in multiple ways, as long as we are not placing a node to the right of a dependent node. 

For example:  Imagine a graph that has N nodes but only one directed edge connecting node A to node B. To topologically sort the nodes, we can arrange the N nodes in any way while only taking care of the constraint that A should lie to the left of B.

Topological sorting can be done using algorithms like Kahn’s algorithm, depth-first search(DFS), and some parallel algorithms.

 

Topological Sort using DFS

Depth-first search (DFS) is a recursive algorithm for traversing a graph. It uses the idea of exhaustive search -  it will keep moving deeper into the graph until that particular path is entirely exhausted (in other words, a dead end is found). 

DFS starts traversing the graph from a starting node (provided as input to the algorithm). It chooses a path starting from this node, and goes as deep as it can. After exhausting that path, it backtracks until it finds another unexplored path and then explores it recursively. 

It keeps recursing until the complete graph has been explored or until the node we were looking for is found.

It is used to solve many interesting problems, such as finding a path in a maze, detecting and breaking deadlocks in operating systems, and many others. 

The topological sort time complexity using DFS takes θ(m+n) time, and the insertion of each vertex to the linked list takes O(1) time. Thus, topological sort using DFS takes time θ(m + n). The following algorithm is as follows:

1. Perform DFS to compute finishing times f[v] for each vertex v.                   
2. As each vertex is finished, insert it to the front of a linked list.                     
3. Return the linked list of vertices.                                                                  

How to detect a cycle in a directed graph using DFS?

 We need to check if there exists a back edge in a graph. The back edge is an edge from the graph that is not present in the DFS tree after applying DFS, and it connects a node “u” to one of its ancestors. We can keep track of the active nodes in another array while running the DFS. Those nodes that are placed on the stack but have not been removed are considered active. 

During DFS, when a new node is visited, all of its ancestors should be active. Also, all of the active nodes must be the new node’s ancestors. So, whenever we explore any new node, if we find an edge directed from the new node to any of the active nodes, we can conclude that the given graph has a cycle. 

Proof: Suppose the back edge connects node “u” with node “v” and it goes from “u” to “v.” Without loss of generality, also suppose that “v” was discovered before “u”. As “v” and “u” are both active, there must be a path in the DFS tree from “v” to “u.” That path only uses tree edges. On the other hand, we just discovered another path from “u” to “v” using a back edge. So, in this graph, there exists a path that starts from “u” and comes back to “u.” Hence, it must be a cycle. 

Why does DFS use a “stack” data structure?

DFS explores any path as long as possible. When it hits a dead end, it backtracks to the just-previous state. So, we need a data structure that can give us the most recent element (or call). The stack serves this purpose — Last In First Out (LIFO). The queue cannot be used to implement DFS as it is based on the opposite concept — First In First Out (FIFO).

Applications of DFS:                                                 

  • Finding whether two nodes present in a graph belong to the same component
  • Checking if the graph is bipartite                                                                      
  • Finding the topological ordering of the vertices of a graph                               
  • Finding the strongly connected components of a graph                                  
  • Finding the spanning tree of an unweighted graph                                          
  • Finding bridges in graphs                                                                                 
  • Finding articulation points in graphs                                                               
  • Finding a path in a maze and similar puzzles                                                  
  • Analyzing networks and mapping source-destination route                             

Precedence scheduling:                           

GOAL: Given a set of tasks to be completed with precedence constraints, in which order should we schedule the tasks?

vertex = task;    edge = precedence constraint

Idea:   Depth First Search

0→5,   0→2, 0→1,   3→6, 3→5,  3→4,  5→2,  6→4,  6→0,  3→2 ,   1→4              

                        - Run Depth-first search.                                  

                        - Return vertices in reverse postorder.             

postorder :          4 1 2 5 0 6 3                                          

topological order:   3 6 0 5 2 1 4                                       

 Why does the topological sort algorithm work?

            1. First vertex in postorder has outdegree 0.                                   

            2. Second-to-last vertex in postorder can only point to the last vertex.                   

 

If the graph has a cycle, some vertices will have cyclic dependencies which makes it impossible to find a linear ordering among vertices.

If no directed cycle, DFS-based algorithm finds a topological order.                                  

 Breadth-first Search 

Breadth-first search (BFS) is a graph search and traverse algorithm used to solve many programming and real-life problems. It follows a simple, level-based approach. We can easily solve many problems in computer science by modeling them in terms of graphs.

For Example: analyzing networks, mapping routes, scheduling, and more.                                   

BFS Algorithm: 

  • Step 1: Create an empty queue to store the nodes to be visited and a boolean vector to track the visited nodes. Initially, no node is visited.
  • Step 2: Choose a starting node, mark it visited, and add it to the queue.
  • Step 3: Extract the front item of the queue, add it to the BFS order, and traverse through the list of that node's adjacent nodes.
  • Step 4: If a node in that list has not been already visited, mark it visited, and add it to the back of the queue.
  • Step 5: Keep repeating steps 3 and 4 until the queue is empty.

BFS Applications                                                         

  • Finding the shortest path: In an unweighted graph, the shortest path between two nodes is the path with the least number of edges. BFS can be used to find the shortest path between a source node and all the remaining nodes in an unweighted graph.
  • Finding the minimum spanning tree: After some modifications, this algorithm can be used to find the minimum spanning tree for unweighted graphs.
  • Detecting a cycle: This algorithm can be used to detect a cycle in graphs.
  • Finding a path: We can use BFS to check whether a path exists between two nodes.
  • Peer-to-peer networks: In P2P networks, like BitTorrent, BFS is used to find all neighbor nodes.
  • Broadcasting networks: In network broadcasting, broadcasted packets are guided by the BFS algorithm to reach all target nodes.
  • GPS navigation systems: BFS is used to find all neighboring locations.
  • Search engine crawlers: The main idea behind crawlers is to start from the source page, follow all links from that source to other pages, and keep repeating the same until the objective for crawling is completed.
  • Social networking websites: We can find the number of people within a given distance “k” from a person in a social network using BFS.
  • Connected component: We can use BFS to find all the nodes within a connected component of a graph.
  • Bipartite: This algorithm can be used to check if a graph is bipartite.

Kahn's algorithm 

Kahn’s algorithm for topological sorting is used mainly when tasks or items have to be ordered, where some tasks or items have to occur before others can.

Kahn’s algorithm works by keeping track of the number of incoming edges into each node (indegree). It repeatedly:                                                    

  1.  Finds nodes with no incoming edge, that is, nodes with zero in-degree (no dependency).
  2.  Stores the nodes with zero indegrees in a stack/queue and deletes them from the original graph.
  3.  Deletes the edges originating from the nodes stored in step 2. It is achieved by decrementing the in-degree of each node connected to the nodes removed in step 2. 

This process continues until no element with zero in-degree can be found. As mentioned earlier, this can happen when the topological sorting is complete and also when a cycle is encountered. For this reason, the algorithm checks if the result of topological sorting contains the same number of elements that were supposed to be sorted:

        - If the numbers match, no cycle is encountered, and we print the sorted order. 

        - If they don’t, a cycle was encountered, which means the topological sorting was not possible. This is conveyed in such a case as appropriate: by returning null or printing a message.

    The algorithm is as follows :

    Step 1: Store the current in-degree of each node and initialize the count of visited nodes to zero.

    Step 2: Take all the nodes with indegree 0 and add them to a queue (Enqueue) or a stack (Push)

    Step 3: Remove a node from the queue (Dequeue) / stack (Pop) and:
                 - Increment the count of visited nodes (by one).
                 - Reduce indegree by one for all its neighboring nodes.
                 - If the in-degree of a neighboring node is reduced to zero by the above operation, then add it to the queue/stack.

   Step 4: Continue repeating Step 3 until the queue/stack is empty.

   Step 5: If the count of visited nodes does not equal the number of nodes in the graph, then a cycle is encountered, and topological sorting is not possible for the graph as it’s not a DAG. Return a message stating that. If it is equal, return the topologically sorted result.  

Complexity Analysis:                                           

     Time Complexity: O(V+E).                                           
        The outer for loop will be executed V number of times and the inner for loop will be executed E number of times.

     Auxiliary Space: O(V).                                                                        
        The queue needs to store all the vertices of the graph. So the space required is O(V)

 

The applications of Kahn’s Algorithm:

  1. Scheduling jobs, given dependencies some jobs have on some other jobs.
  2. Course arrangement in educational institutions. Finding prerequisites of any job or task.
  3. Detecting deadlocks in operating systems. Finding out if cycles exist in a graph.
  4. Resolving symbol dependencies in linkers. Compile-time build dependencies. Deciding the appropriate order of performing compilation tasks in makefiles.

Breadth-first Search 

Depth-first Search

Uses queue data structure to store the nodes to be visited Uses stack data structure to store the nodes to be visited
Traverses a graph level-wise Traverses a graph depth-wise
More suitable when the target node is closer to the source More suitable when the target node is away from the source node
Not suitable for decision-making trees used in games or puzzle More suitable for games or puzzle problems

Strengths and Weaknesses of Topological Sort

Strengths:                                                                               

  • Requires linear time and linear space to perform.
  • Effective in detecting cyclic dependencies.
  • Can efficiently find feedback loops that should not exist in a combinational circuit.
  • Can be used to find the shortest path between two nodes in a DAG in linear time.

Weaknesses:                                                                                       

  • Topological sort is not possible for a graph that is not directed and acyclic.

 

 


Responses(0)