Scaling Voice AI 10x with Self-Hosted LiveKit

Key Takeaways

Self-hosted LiveKit server, agent workers, and egress — no cloud-tier concurrent session limits, full infrastructure control

Voice AI migrated from a tightly coupled Django WebSocket server to a standalone LiveKit agent worker with 5 selectable AI pipeline variants

All database queries over 500ms eliminated after Silk profiling and targeted ORM optimisation

New infrastructure handles 10x peak load through multi-metric autoscaling across API, WebSocket, and Celery tiers

The Challenge

Cuebo's sales simulation platform was built around a Django monolith handling WebSocket voice sessions, API requests, and AI task processing on shared infrastructure. As usage grew, this architecture made it impossible to scale voice AI independently from the API, created query bottlenecks under load, and imposed hard concurrency ceilings from LiveKit Cloud's hosted agent tiers.

The platform required a structured modernisation across four areas:

Voice AI architecture — the Django WebSocket server handled real-time AI sessions directly, coupling AI processing latency to general API availability
Database performance — ORM queries with no profiling produced slow endpoints; queries exceeding 500ms were common under concurrent load
Celery workers — all task types (voice AI processing, notifications, data jobs) competed on shared worker pools with no task isolation
Infrastructure scalability — ECS tasks scaled on fixed rules with no connection-aware or queue-depth-aware triggers, and LiveKit Cloud agent limits capped concurrent voice sessions

Prodinit team brought a rare combination of deep technical expertise and execution speed that allowed us to ship faster, and scale our application's infra under pressure. One of the most trusted and valued members of our journey.

Amanbir Singh

Co-founder & CTO, Cuebo.ai

What We Built

Prodinit redesigned the platform in five parallel workstreams over 12 weeks: a fully self-hosted LiveKit voice AI stack, Django infrastructure modernisation, database and Celery optimisation, ECS autoscaling, and a CI/CD QA pipeline.

Self-Hosted LiveKit Voice AI Stack

The centrepiece of the engagement was migrating voice AI off the Django WebSocket server entirely. Prodinit built and deployed a self-hosted LiveKit stack: the LiveKit server for WebRTC signalling and media routing, standalone Python agent workers (livekit_agents/) implementing a factory pattern for configurable AI pipeline selection, and LiveKit Egress for call recording directly to S3 — all running on Cuebo's own ECS infrastructure.

Self-hosting the full stack removes every LiveKit Cloud tier constraint: no limits on concurrent agent sessions, agent deployments, or egress minutes. The only scaling ceiling is ECS CPU and memory capacity, and Prodinit configured the ECS services to add and remove tasks automatically based on active connection count.

Self-hosted LiveKit architecture: Django control plane, LiveKit server, ECS agent workers, and AI pipeline services

Each pipeline runs a VAD → STT → LLM → TTS chain with stopword detection, max-duration enforcement, and reconnect timeout monitoring. On call end, a background thread handles post-call work: MP4 assembly via imageio-ffmpeg at 15fps, S3 upload, transcript POST, and cleanup POST back to Django.
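The post-call hand-off described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `run_post_call` and the step names are hypothetical, and the choice to keep running later steps after a failure (so that cleanup still happens if an upload fails) is an assumption.

```python
import threading
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], None]]


def run_post_call(steps: List[Step], log: List[str]) -> threading.Thread:
    """Run post-call steps in order on a background thread.

    Assumption: a failing step is recorded and later steps still run,
    so cleanup is attempted even if assembly or upload fails.
    """

    def worker() -> None:
        for name, step in steps:
            try:
                step()
                log.append(f"{name}:ok")
            except Exception as exc:  # record the failure and continue
                log.append(f"{name}:failed:{exc}")

    # Non-daemon thread, so the interpreter waits for the work to finish.
    thread = threading.Thread(target=worker, name="post-call")
    thread.start()
    return thread
```

In a real worker the steps would wrap the MP4 assembly, the S3 upload, and the two POST calls back to Django; the call handler returns immediately while the thread finishes the work.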

Django Control Plane and Infrastructure

Django was upgraded to version 5.2 and refactored into a clean control plane for LiveKit operations: LiveKitService for all API calls (room creation, JWT token generation, agent dispatch, egress management), REST endpoints for session lifecycle, webhook handling for participant_joined and room_finished events, and Celery tasks for stuck-call cleanup and health monitoring.
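As a rough illustration of the webhook handling, the sketch below routes the two event types named above to separate handlers. The handler names and return values are hypothetical, and the payload shape (an `event` field plus nested `room` and `participant` objects) is an assumption about the webhook JSON. Signature verification of the incoming request is omitted here and would be needed in a real endpoint.

```python
import json
from typing import Callable, Dict

Handler = Callable[[dict], str]


def on_participant_joined(event: dict) -> str:
    # Hypothetical handler: a real one might mark the session as active.
    room = event.get("room", {}).get("name", "")
    identity = event.get("participant", {}).get("identity", "")
    return f"joined:{room}:{identity}"


def on_room_finished(event: dict) -> str:
    # Hypothetical handler: a real one might close out the session record.
    room = event.get("room", {}).get("name", "")
    return f"finished:{room}"


HANDLERS: Dict[str, Handler] = {
    "participant_joined": on_participant_joined,
    "room_finished": on_room_finished,
}


def handle_webhook(raw_body: bytes) -> str:
    """Dispatch a webhook body to its handler; unknown events are ignored."""
    event = json.loads(raw_body)
    handler = HANDLERS.get(event.get("event", ""))
    if handler is None:
        return "ignored"
    return handler(event)
```

Keeping the dispatch table separate from the handlers means a new event type can be supported by adding one entry, without touching the endpoint itself.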

The frontend was migrated to AWS Amplify with automated deployments from Git, and WebSocket and API servers were separated into independently scalable services.

Sentry for application error tracking and PostHog for product analytics were integrated in week 1.

Performance Profiling and Database Optimisation

Prodinit deployed django-silk for SQL query profiling and Flower for Celery task monitoring in the development environment. Every API endpoint and background task was profiled; all queries exceeding 100ms were flagged for review, and all queries exceeding 500ms were eliminated through targeted index additions and ORM refactoring.
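The two thresholds can be expressed as a simple triage over recorded query timings. The sketch below is only an illustration of that rule: the `triage` function, the input format, and the query labels are hypothetical and are not part of django-silk.

```python
from typing import Dict, Iterable, List, Tuple

FLAG_MS = 100.0       # queries slower than this are flagged for review
ELIMINATE_MS = 500.0  # queries slower than this must be fixed


def triage(queries: Iterable[Tuple[str, float]]) -> Dict[str, List[str]]:
    """Sort (label, duration_ms) pairs into flagged and must-fix lists.

    Every must-fix query is also flagged, matching the rule in the text.
    """
    report: Dict[str, List[str]] = {"flagged": [], "must_fix": []}
    for label, duration_ms in queries:
        if duration_ms > FLAG_MS:
            report["flagged"].append(label)
        if duration_ms > ELIMINATE_MS:
            report["must_fix"].append(label)
    return report
```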

Celery workers were reorganised into dedicated pools by task type: voice AI processing tasks routed to high-memory workers, notification tasks to I/O-optimised workers, and data processing tasks to CPU-optimised workers — with exponential backoff on transient failures.
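A minimal sketch of the routing and retry idea follows, written in plain Python so it runs without a broker. The routing table is shaped like Celery's `task_routes` setting, but the task module paths and queue names are hypothetical, and `queue_for` and `backoff_delay` are illustrative helpers rather than Celery APIs.

```python
import fnmatch
from typing import Dict

# Shaped like Celery's `task_routes` setting; all names here are hypothetical.
TASK_ROUTES: Dict[str, Dict[str, str]] = {
    "voice.tasks.*": {"queue": "voice_high_memory"},
    "notifications.tasks.*": {"queue": "notifications_io"},
    "data.tasks.*": {"queue": "data_cpu"},
}


def queue_for(task_name: str, default: str = "default") -> str:
    """Return the queue for a task name using glob-style pattern matching."""
    for pattern, route in TASK_ROUTES.items():
        if fnmatch.fnmatch(task_name, pattern):
            return route["queue"]
    return default


def backoff_delay(retries: int, base: float = 1.0, cap: float = 600.0) -> float:
    """Exponential backoff: base * 2**retries seconds, capped at `cap`."""
    return min(cap, base * (2 ** retries))
```

Each worker pool would then consume only its own queue. In Celery itself, the retry behaviour is typically configured on the task with options such as `autoretry_for`, `retry_backoff`, and `retry_backoff_max` rather than computed by hand.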

ECS Autoscaling

Custom autoscaling policies were configured across three service tiers, supplemented by scheduled scaling:

API server — scale-up on CPU, memory, request count, and response time; minimum 2 tasks always running
WebSocket server — scale-up on active connections and connection establishment rate; gradual scale-down
Celery workers — scale on queue depth, task wait time, and worker CPU
Scheduled scaling — pre-scale 15 minutes before known peak hours; separate weekend and weekday profiles
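One way to reason about multi-metric scale-up is to compute a desired task count per metric and take the largest. The sketch below is a simplified model under that assumption, not AWS's actual scaling algorithm; the function name, the target values in the usage note, and the `max_tasks` cap are hypothetical. The minimum of 2 tasks mirrors the API server policy above.

```python
import math
from typing import Dict


def desired_tasks(
    current: int,
    metrics: Dict[str, float],
    targets: Dict[str, float],
    min_tasks: int = 2,
    max_tasks: int = 50,
) -> int:
    """Pick a task count so that every metric returns to its target.

    For each metric, the proportional estimate is current * value / target;
    the largest estimate wins, clamped to [min_tasks, max_tasks].
    """
    wanted = min_tasks
    for name, value in metrics.items():
        estimate = math.ceil(current * value / targets[name])
        wanted = max(wanted, estimate)
    return max(min_tasks, min(max_tasks, wanted))
```

For example, with 4 tasks running at 90% CPU against a 60% target and 50% memory against a 70% target, CPU is the binding metric and the function asks for 6 tasks. Gradual scale-down, as used for the WebSocket tier, is not modelled here.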

Results

Prodinit delivered the full 12-week engagement on schedule: self-hosted LiveKit voice AI stack live with 5 configurable AI pipelines, infrastructure capable of handling 10x peak load, all slow database queries eliminated, and the Django control plane fully decoupled from voice AI processing.

10x peak load capacity — multi-metric ECS autoscaling handles traffic spikes without proportional cost increase
0 concurrent session limits — self-hosted LiveKit server, agent workers, and egress remove all LiveKit Cloud tier constraints entirely
5 AI pipeline variants deployed and selectable per session — Deepgram STT, Azure OpenAI GPT-4o Realtime, ElevenLabs TTS, Azure Speech, Claude Sonnet, Sarvam, Google Gemini Live, Google Chirp, and Gemini Chat all integrated via factory pattern dispatch
100% of database queries over 500ms eliminated after Silk profiling across all API endpoints and background tasks
Frontend migrated to AWS Amplify with full CI/CD from Git; WebSocket and API servers independently scalable from week 3
Celery task isolation across 3 dedicated worker pools (high-memory, I/O-optimised, CPU-optimised) with queue-depth autoscaling

Frequently Asked Questions

Why migrate from Django WebSocket to LiveKit for voice AI?

Running voice sessions inside the Django WebSocket server couples voice AI processing latency to general API availability, because both compete for the same processes and infrastructure. LiveKit handles WebRTC signalling and media routing natively, so voice sessions run on a dedicated agent worker independent of Django API load. The decoupling also allows the agent infrastructure to autoscale entirely on its own ECS service tier.

Why self-host the LiveKit server rather than use LiveKit Cloud?

LiveKit Cloud-hosted agents are subject to per-tier limits on concurrent sessions, agent deployments, and egress minutes. Self-hosting the full stack — server, agents, and egress — removes all of these limits. The only ceiling becomes ECS CPU and memory, which Prodinit configured to autoscale based on active connections. For platforms with variable concurrent session demand, this approach scales to 10x load without incurring LiveKit Cloud per-minute billing.

How do the 5 AI pipeline variants work?

When Django dispatches an agent job, it sends a JSON metadata payload containing a `simulation_service` field. The agent worker reads this field and instantiates the matching pipeline class — each with its own STT, LLM, and TTS combination. Switching pipelines requires no code change; it is a configuration value passed per simulation session at runtime.
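The dispatch described above can be sketched as a small factory keyed on `simulation_service`. This is an illustration under stated assumptions: the variant keys, the `Pipeline` dataclass, and the placeholder provider strings are hypothetical, since the actual pairing of STT, LLM, and TTS providers per variant is not given here.

```python
import json
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Pipeline:
    name: str
    stt: str
    llm: str
    tts: str


# Hypothetical registry: keys and provider strings are placeholders.
PIPELINE_FACTORIES: Dict[str, Callable[[], Pipeline]] = {
    "variant_a": lambda: Pipeline("variant_a", stt="stt-a", llm="llm-a", tts="tts-a"),
    "variant_b": lambda: Pipeline("variant_b", stt="stt-b", llm="llm-b", tts="tts-b"),
}


def build_pipeline(job_metadata: str) -> Pipeline:
    """Read `simulation_service` from the JSON metadata and build the pipeline."""
    payload = json.loads(job_metadata)
    service = payload.get("simulation_service")
    try:
        factory = PIPELINE_FACTORIES[service]
    except KeyError:
        raise ValueError(f"unknown simulation_service: {service!r}") from None
    return factory()
```

Adding a sixth variant would mean registering one more factory; the dispatch code and the Django side stay unchanged apart from the new configuration value.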

How does the autoscaling handle voice AI specifically?

The LiveKit agent worker ECS service scales on CPU and memory, which track directly with active concurrent voice sessions. The WebSocket server scales on active connection count and connection establishment rate. Scheduled scaling pre-warms capacity 15 minutes before known peak windows and scales down during predictable low-traffic periods, so that capacity is already in place when load spikes arrive and voice session quality is protected.
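The pre-warm schedule can be sketched as a lookup of peak times per day profile, shifted back by the lead time. The peak hours below are made-up placeholders and `prewarm_times` is a hypothetical helper; only the 15-minute lead and the weekday/weekend split come from the text.

```python
from datetime import datetime, time, timedelta
from typing import Dict, List

# Hypothetical peak start times per profile.
PEAKS: Dict[str, List[time]] = {
    "weekday": [time(9, 0), time(14, 0)],
    "weekend": [time(11, 0)],
}


def prewarm_times(day: datetime, lead_minutes: int = 15) -> List[datetime]:
    """Return when to pre-scale on `day`: each peak start minus the lead time."""
    profile = "weekend" if day.weekday() >= 5 else "weekday"
    lead = timedelta(minutes=lead_minutes)
    return [datetime.combine(day.date(), peak) - lead for peak in PEAKS[profile]]
```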

How long does a LiveKit voice AI migration from a Django monolith take?

Based on this engagement, a complete migration — including self-hosted LiveKit server and agent infrastructure, 5 AI pipeline variants, Django control plane refactor, ECS autoscaling, database profiling, and CI/CD pipeline — takes approximately 12 weeks. The highest-risk phase is the A/B rollout to LiveKit, where both the old WebSocket path and the new LiveKit path run in parallel until confidence is established.
