Every Gradio app comes with a built-in queuing system that can scale to thousands of concurrent users. Because many event listeners involve heavy processing, Gradio automatically creates a queue in the backend to process incoming events for every event listener in your app.
By default, each event listener has its own queue, which handles one request at a time. This can be configured via two arguments:
concurrency_limit: This sets the maximum number of concurrent executions for an event listener. By default, the limit is 1 unless configured otherwise in Blocks.queue(). You can also set it to None for no limit (i.e., an unlimited number of concurrent executions). For example:

    import gradio as gr

    # image_gen is assumed to be defined elsewhere (e.g., a function that runs a model).
    with gr.Blocks() as demo:
        prompt = gr.Textbox()
        image = gr.Image()
        generate_btn = gr.Button("Generate Image")
        # Allow up to 5 concurrent executions of this event listener.
        generate_btn.click(image_gen, prompt, image, concurrency_limit=5)

In the code above, up to 5 requests can be processed simultaneously for this event listener. Additional requests will be queued until a slot becomes available.
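The queuing behavior described above can be illustrated outside of Gradio with a small standard-library simulation. This is only a sketch and not Gradio's internal implementation: `fake_image_gen`, `run_with_limit`, and the request count of 12 are hypothetical names and values chosen for illustration. A thread pool with 5 workers plays the role of an event listener with `concurrency_limit=5`, and requests beyond the fifth wait in the pool's queue until a worker is free.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

# Shared counters used to observe how many calls run at the same time.
_lock = threading.Lock()
_running = 0
_peak = 0


def fake_image_gen(prompt: str) -> str:
    """Hypothetical stand-in for a slow function such as image_gen."""
    global _running, _peak
    with _lock:
        _running += 1
        _peak = max(_peak, _running)
    time.sleep(0.1)  # simulate heavy processing
    with _lock:
        _running -= 1
    return f"image for {prompt!r}"


def run_with_limit(prompts: list[str], limit: int) -> tuple[list[str], int]:
    """Process all prompts with at most `limit` calls running concurrently."""
    global _running, _peak
    with _lock:
        _running = 0
        _peak = 0
    # The executor queues any work beyond `limit` until a worker frees up.
    with ThreadPoolExecutor(max_workers=limit) as pool:
        results = list(pool.map(fake_image_gen, prompts))
    return results, _peak


if __name__ == "__main__":
    results, peak = run_with_limit([f"prompt {i}" for i in range(12)], limit=5)
    print(f"processed {len(results)} requests, peak concurrency {peak}")
```

All 12 requests complete, but the observed peak concurrency never exceeds the limit of 5, which mirrors how additional requests wait for a free slot.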
If you want to manage multiple event listeners using a shared queue, you can use the concurrency_id argument:
concurrency_id: This allows event listeners to share a queue by assigning them the same ID. For example, if your setup has only 2 GPUs but multiple functions require GPU access, you can create a shared queue for all those functions. Here's how that might look:

    import gradio as gr

    # image_gen_1, image_gen_2, and image_gen_3 are assumed to be defined elsewhere.
    with gr.Blocks() as demo:
        prompt = gr.Textbox()
        image = gr.Image()
        generate_btn_1 = gr.Button("Generate Image via model 1")
        generate_btn_2 = gr.Button("Generate Image via model 2")
        generate_btn_3 = gr.Button("Generate Image via model 3")
        # All three listeners share the "gpu_queue" queue, limited to 2 concurrent executions.
        generate_btn_1.click(image_gen_1, prompt, image, concurrency_limit=2, concurrency_id="gpu_queue")
        generate_btn_2.click(image_gen_2, prompt, image, concurrency_id="gpu_queue")
        generate_btn_3.click(image_gen_3, prompt, image, concurrency_id="gpu_queue")

In this example, all three event listeners share a queue identified by "gpu_queue". The queue can handle up to 2 concurrent requests at a time, as defined by the concurrency_limit.
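To see why a shared ID caps the combined load across all three listeners, here is a minimal simulation using `asyncio`. It is a sketch under assumptions and does not reflect Gradio's internals: `SharedLimits`, `make_listener`, and `simulate` are hypothetical helpers, and the rule that the first listener registering an ID fixes its limit is a simplification made for this example. Listeners registered under the same ID share one semaphore, so the limit applies to their total concurrent executions.

```python
import asyncio


class SharedLimits:
    """Hypothetical registry mapping a concurrency ID to one shared semaphore."""

    def __init__(self) -> None:
        self._semaphores: dict[str, asyncio.Semaphore] = {}
        self.running = 0  # executions currently in progress
        self.peak = 0     # highest value `running` has reached

    def get(self, concurrency_id: str, limit: int = 1) -> asyncio.Semaphore:
        # Assumption: the first registration for an ID fixes its limit;
        # later registrations reuse the existing semaphore.
        if concurrency_id not in self._semaphores:
            self._semaphores[concurrency_id] = asyncio.Semaphore(limit)
        return self._semaphores[concurrency_id]


def make_listener(limits: SharedLimits, name: str, concurrency_id: str, limit: int = 1):
    """Build a fake event listener bound to the queue named by concurrency_id."""
    sem = limits.get(concurrency_id, limit)

    async def listener(prompt: str) -> str:
        async with sem:  # wait here until a slot in the shared queue is free
            limits.running += 1
            limits.peak = max(limits.peak, limits.running)
            await asyncio.sleep(0.01)  # simulate GPU work
            limits.running -= 1
            return f"{name}: {prompt}"

    return listener


async def simulate(requests_per_listener: int = 4) -> tuple[list[str], int]:
    """Send requests to three listeners that share the "gpu_queue" ID."""
    limits = SharedLimits()
    listeners = [
        make_listener(limits, "model 1", "gpu_queue", limit=2),
        make_listener(limits, "model 2", "gpu_queue"),
        make_listener(limits, "model 3", "gpu_queue"),
    ]
    tasks = [
        listener(f"prompt {i}")
        for listener in listeners
        for i in range(requests_per_listener)
    ]
    results = await asyncio.gather(*tasks)
    return list(results), limits.peak


if __name__ == "__main__":
    results, peak = asyncio.run(simulate())
    print(f"{len(results)} requests served, peak concurrency {peak}")
```

Even though three listeners receive requests at once, at most 2 executions are in flight at any moment because they all draw from the same pool of slots.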
To allow unlimited concurrency, set concurrency_limit=None. This is useful if your function calls, for example, an external API that handles the rate limiting of requests itself.
The default concurrency limit for all event listeners can be set globally using the default_concurrency_limit parameter in Blocks.queue().
These configurations make it easy to manage the queuing behavior of your Gradio app.