Why asynchronous?

Like most WSGI-based web frameworks, Django is synchronous. When a client requests a web page, the request reaches Django through a view and passes through various lines of code until the rendered web page is returned. Because the client waits, or blocks, until all this code has executed, the communication is termed synchronous.

Some of the typical kinds of asynchronous tasks are:

 

-Sending single or mass emails/SMS
-Calling web services
-Slow SQL queries
-Logging activity
-Media encoding or decoding
-Parsing a large corpus of text
-Web scraping
-Sending newsletters
-Machine learning tasks
-Image processing

Asynchronous code comes with several pitfalls that you need to be aware of, such as the following:

-Race condition: If two or more threads of code modify the same data, the order in which they are executed can affect the final value. This race can lead to data being left in an undetermined state.

-Starvation: Indefinite waiting by one thread because other threads keep coming in.

-Deadlock: If a thread is waiting for a resource that another thread has locked, and vice versa at the same time, then both threads are stuck in a
deadlock.

-Debugging challenge: It is very hard to reproduce a bug in asynchronous code due to the non-deterministic timing of a multithreaded program. 

-Order preservation: There might be dependencies between sections of code that might not be observed when the execution order varies.

Endpoint callback pattern

When a caller calls a service, it specifies an endpoint to be called when the operation is completed. When used purely as an HTTP callback, this endpoint is called a WebHook.

The process is roughly as follows:
1. The client calls a service through a channel such as REST, RPC, or UDP. It also provides its own endpoint to notify when the result becomes ready.
2. The call returns immediately.
3. When the task is completed, the service calls the defined endpoint to notify the initial sender.

This pattern is quite popular and implemented by various web applications, such as GitHub, PayPal, Twilio, and more. These providers usually have an API to manage subscriptions to these WebHooks, unless you have a broker to perform such mediation.
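As a rough illustration, the receiving side of such a callback in Django might look like the sketch below. The view name, the payload fields, and the omission of signature verification are assumptions for illustration, not any provider's actual API.

    import json

    from django.http import HttpResponse
    from django.views.decorators.csrf import csrf_exempt
    from django.views.decorators.http import require_POST

    @csrf_exempt            # the external service cannot send a CSRF token
    @require_POST
    def task_done_webhook(request):
        """Endpoint the service calls back once the long-running operation finishes."""
        payload = json.loads(request.body)
        # A real receiver would verify a signature header before trusting the payload.
        print("Task finished with status:", payload.get("status"))
        return HttpResponse(status=200)

In practice you would also register this URL with the provider's subscription API mentioned above.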

Publish-subscribe pattern

This pattern is a more general form of the endpoint callback pattern. Here, a broker acts as an intermediary between the actual sender and the recipients.

The process of communication is as follows:
1. One or more listeners will inform a broker process that they are interested in subscribing to a topic
2. A publisher will post a message to the broker under the relevant topic
3. The broker dispatches the message to all the subscribers

A broker has the advantage of fully decoupling the sender and the receivers. Additionally, the broker can perform many other tasks, such as message enrichment, transformation, or filtering. This pattern is quite scalable and, hence, popular in enterprise middleware.

Celery internally uses publish/subscribe mechanisms for several of its backend transports, such as Redis for sending messages.
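To make the three steps concrete, here is a small sketch using the redis-py client. It assumes a Redis server running on localhost and an arbitrary topic name.

    import redis

    broker = redis.Redis()              # assumes Redis on localhost:6379

    # 1. A listener subscribes to a topic with the broker.
    subscriber = broker.pubsub()
    subscriber.subscribe("orders")

    # 2. A publisher posts a message under that topic.
    broker.publish("orders", "order #42 created")

    # 3. The broker dispatches it to every subscriber.
    for message in subscriber.listen():
        if message["type"] == "message":
            print(message["data"])      # b'order #42 created'
            break

In a real system the publisher and subscribers would be separate processes; they never need to know about each other, only about the broker and the topic name.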

Polling pattern

Polling, as the name suggests, involves the client periodically checking a service for any new events. This is often the least desirable means of asynchronous communication, as polling increases system utilization and becomes difficult to scale.

A polling system works as follows:
1. The client calls a service
2. The call returns immediately with new events or the status of the task
3. The client waits and repeats step two at periodic intervals
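A sketch of such a polling client using the requests library is shown below; the URL and the JSON fields are made up for illustration.

    import time

    import requests

    TASK_STATUS_URL = "https://example.com/api/tasks/42/"   # hypothetical endpoint

    while True:
        response = requests.get(TASK_STATUS_URL)   # step 2: returns immediately
        body = response.json()
        if body.get("status") == "done":
            print("Result:", body.get("result"))
            break
        time.sleep(5)                               # step 3: wait, then repeat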

Asynchronous solutions for Django

-Celery: Worker-based model for handling computation outside the Django process
-asyncio: Python built-in module for concurrently executing multiple tasks within the same thread
-Django Channels: Real-time, message-queue-like architecture to manage I/O events such as WebSockets

Celery

Celery is a feature-rich asynchronous task queue manager. Here, a task refers to a callable that, when executed, performs its activity asynchronously. Celery is used in production by several well-known organizations, including Instagram and Mozilla, to handle millions of tasks a day.

When installing Celery, you will need to pick and choose various components, such as a broker and a result store. As Redis works in memory, if your messages are large or need persistence, you should use RabbitMQ instead.

In Django, Celery tasks are usually defined in a separate file named tasks.py within the respective app directory.

When a request arrives, you can trigger a Celery task while handling it. The task invocation returns immediately without blocking the process. In fact, the task has not finished execution, but a task message has entered a task queue (or one of the many possible task queues).
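As a rough sketch, a task might be defined and enqueued as follows. The task name, the email details, and the app layout are placeholders, not a prescribed structure.

    # app/tasks.py
    from celery import shared_task
    from django.core.mail import send_mail

    @shared_task
    def send_welcome_email(user_email):
        # Runs in a worker, outside the request-response cycle.
        send_mail(
            "Welcome!",
            "Thanks for signing up.",
            "noreply@example.com",
            [user_email],
        )

    # Inside a view, enqueue the task instead of running it inline:
    #     send_welcome_email.delay(request.user.email)
    # The call returns immediately; a worker picks the message off the queue later.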
Workers are separate processes that monitor the task queue for new tasks and actually execute them. They pick up a task message and send an acknowledgment to the queue so that the message is removed. Then they execute the task. Once completed, the process repeats, and it will try to pick up another task for execution.

A task can also be scheduled to run periodically using what Celery calls a Celery beat process. You can configure it to kick off tasks at certain time intervals, such as every 10 seconds or at the start of a day of the week. This is great for maintenance jobs such as backups or polling the health of a web service.
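A sketch of what such a schedule might look like is given below, assuming app is the project's Celery instance; the project name and task paths are placeholders.

    # celery.py
    from celery import Celery
    from celery.schedules import crontab

    app = Celery("proj")   # the project's Celery application (name is a placeholder)

    app.conf.beat_schedule = {
        "poll-health-every-10-seconds": {
            "task": "proj.tasks.check_service_health",   # hypothetical task path
            "schedule": 10.0,                            # run every 10 seconds
        },
        "weekly-maintenance": {
            "task": "proj.tasks.backup_database",        # hypothetical task path
            "schedule": crontab(hour=0, minute=0, day_of_week="monday"),
        },
    }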
Celery is well-supported, scalable, and works well with Django, but it might be too cumbersome for trivial asynchronous tasks. In such cases, I would recommend using Django Channels or RQ, a simpler Redis-based task queue.

Celery tasks may be restarted several times, especially if you have enabled late acknowledgments. This makes it important to control the side effects of a task. Hence, Celery recommends that all tasks be idempotent. Idempotence is a mathematical property of a function that assures that invoking it multiple times with the same arguments has the same effect as invoking it once.

However, it is important to understand the difference between an idempotent function and a function with no side effects (a pure or nullipotent function). The side effects of an idempotent function will be the same regardless of whether it was called once or several times.

For example, a task that always places a fresh order when called is not idempotent, but a task that cancels an existing order is idempotent. Operations that only read the state of the world and do not have any side effects are nullipotent.

As the Celery architecture relies on tasks being idempotent, it is important to study all the side effects of a non-idempotent task and convert it into an idempotent one. You can do this either by checking whether the task has already been executed (and aborting if it has) or by storing the result in a unique location derived from the arguments.
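For example, the order-cancellation task mentioned earlier could check the current state before acting. This is only a sketch; the Order model and its status field are assumptions.

    from celery import shared_task

    from myapp.models import Order   # hypothetical model with a `status` field

    @shared_task
    def cancel_order(order_id):
        order = Order.objects.get(pk=order_id)
        if order.status == "cancelled":
            # The work was already done; running the task again changes nothing.
            return
        order.status = "cancelled"
        order.save(update_fields=["status"])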

asyncio

asyncio is a co-operative multitasking library that has been part of the Python standard library since version 3.4 (the async/await syntax arrived in 3.5). All asyncio programs are driven by an event loop, which is pretty much an infinite loop that calls all registered coroutines in some order. Each coroutine operates cooperatively by yielding control to fellow coroutines at well-defined places. This is called awaiting. A coroutine is like a special function that can suspend and resume execution; it works much like a lightweight thread.
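A minimal sketch of two coroutines co-operating inside one event loop (Python 3.7+ style, using asyncio.run):

    import asyncio

    async def greeter(name, delay):
        await asyncio.sleep(delay)        # yields control back to the event loop
        print(f"Hello from {name}")

    async def main():
        # Both coroutines make progress within the same thread.
        await asyncio.gather(greeter("first", 1), greeter("second", 1))

    asyncio.run(main())                   # finishes in about 1 second, not 2

While one greeter is sleeping, the event loop runs the other, which is why the total time is roughly one second rather than two.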

asyncio versus threads

Firstly, threads need to be synchronized while accessing shared resources, or we will have race conditions. There are several types of synchronization primitives, such as locks, but essentially they involve waiting, which degrades performance and can cause deadlocks or starvation.

A coroutine, on the other hand, has well-defined places where execution is handed over. As a result, you can make changes to shared state as long as you leave it in a known state before awaiting. For instance, you can retrieve a field from a database, perform calculations, and overwrite the field without worrying that another coroutine might have interrupted you in between.
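A small, purely illustrative example of this point: because the read-modify-write below never awaits in the middle of an update, no other coroutine can interleave, and no lock is needed.

    import asyncio

    counter = 0

    async def increment():
        global counter
        for _ in range(100_000):
            counter += 1          # no await inside the loop, so nothing can interleave
        await asyncio.sleep(0)    # hand control back only after the state is consistent

    async def main():
        await asyncio.gather(*(increment() for _ in range(10)))
        print(counter)            # always 1,000,000 -- no lock was needed

    asyncio.run(main())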

Secondly, coroutines are lightweight. Each coroutine needs significantly less memory than a thread. If you can run a maximum of hundreds of threads, you might be able to run tens of thousands of coroutines, given the same memory. Thread switching also takes some time (a few milliseconds). This means you might be able to run more tasks or serve more concurrent users. 

The downside of coroutines is that you cannot mix blocking and non-blocking code. Once you enter the event loop, the rest of the code must be written in an asynchronous style, including the libraries you use.

Concurrency and Parallelism

Concurrency is the ability to perform other tasks while you are waiting on the current task. Parallelism is when two or more execution engines are performing a task at the same time. Concurrency is a way of structuring your programs, while parallelism refers to how they are executed. Concurrency helps you avoid blocking a processor core while waiting for, say, I/O events, while parallelism helps distribute work among all the available cores.

Django Channels

Django Channels was originally created to solve the problem of handling asynchronous communication protocols, such as WebSockets. More and
more web applications were providing real-time capabilities such as chat and push notifications. 


Channels is an official Django project, not just for handling WebSockets and other forms of bi-directional communication but also for running background tasks asynchronously.

A client, such as a web browser, sends both HTTP/HTTPS and WebSocket traffic to an Asynchronous Server Gateway Interface (ASGI) server such as Daphne. Like WSGI, the ASGI specification is a common way for application servers and applications to interact with each other asynchronously.

Like a typical Django application, HTTP traffic is handled synchronously: when the browser sends a request, it waits until it is routed to Django and a response is sent back. However, it gets a lot more interesting when WebSocket traffic happens, because it can be triggered from either direction.

Once a WebSocket connection is established, a browser can send or receive messages. A sent message reaches the protocol type router, which determines the next routing handler based on its transport protocol. Hence, you can define one router for HTTP and another for WebSocket messages. These routers are very similar to Django's URL mappers, but they map incoming messages to a consumer (rather than a view).

A consumer is like an event handler that reacts to events. It can also send messages back to the browser, so it contains the logic for fully bi-directional communication. Remember that the Django parts are synchronous. A consumer is, in fact, a valid ASGI application.
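As a sketch, a consumer and its routing might look like the following; the echo behaviour, class name, and path are illustrative rather than required.

    # consumers.py -- an echo consumer sketch
    from channels.generic.websocket import WebsocketConsumer

    class EchoConsumer(WebsocketConsumer):
        def connect(self):
            self.accept()                       # complete the WebSocket handshake

        def receive(self, text_data=None, bytes_data=None):
            # React to a browser message and push a reply back over the same socket.
            self.send(text_data=f"You said: {text_data}")

        def disconnect(self, close_code):
            pass

    # routing.py -- map the WebSocket path to the consumer, much like urls.py
    from django.urls import path

    websocket_urlpatterns = [
        path("ws/echo/", EchoConsumer.as_asgi()),
    ]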

Channels, currently implemented with a Redis backend, provides an at-most-once delivery guarantee, while Celery provides an at-least-once guarantee. This essentially means that Celery will retry when a delivery fails, until it receives a successful acknowledgment, whereas Channels is pretty much fire-and-forget.

Secondly, Channels does not provide information on the status of a task out of the box. We need to build such functionality ourselves, for instance by updating the database. Celery task status, on the other hand, can be queried and persisted.

You can use Channels instead of Celery for some less critical use cases. However, for a more robust and proven solution, you should rely on Celery.

Idempotent

A pure function is a function without side effects whose output is solely determined by its input; that is, calling f(x) will give the same result no matter how many times you call it.

An idempotent function is one that can be applied multiple times without changing the result; that is, f(f(x)) is the same as f(x). Every pure function is side-effect idempotent, because pure functions never produce side effects even if they are called more than once. However, return-value idempotence, meaning f(f(x)) = f(x), does not follow from purity.
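A tiny example of the distinction:

    # abs() is pure and idempotent: applying it twice changes nothing.
    assert abs(abs(-3)) == abs(-3)

    # increment() is pure but not idempotent: f(f(1)) != f(1).
    def increment(x):
        return x + 1

    assert increment(increment(1)) != increment(1)

    # A cancel-order operation is idempotent but not pure: it changes external
    # state, yet the end state is the same however many times you call it.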

 

