WebRTC at Scale: How Mandraki Handles Massive Group Calls
How Mandraki's SFU architecture delivers low-latency, encrypted group video calls — and how it scales to hundreds of concurrent sessions across European infrastructure.
Group video calling is one of those features that appears simple to users but involves considerable engineering complexity beneath the surface. A one-to-one call is relatively straightforward — two peers exchange media directly. But as soon as you add a third participant, the architecture decisions multiply, and by the time you reach fifty participants across multiple concurrent sessions, you are dealing with a genuinely hard distributed systems problem.
This post explains how Mandraki’s real-time media architecture is designed for massive scale — without compromising on encryption or data sovereignty.
Why not peer-to-peer
In a peer-to-peer mesh topology, every participant sends their media stream directly to every other participant. For a call with N participants, each person sends N-1 streams and receives N-1 streams. The call as a whole carries N*(N-1) one-way streams over N*(N-1)/2 peer connections, and each participant’s upload bandwidth requirement grows linearly with the number of participants.
This works adequately for three or four participants with good internet connections. Beyond that, it falls apart. A participant with a 5 Mbps upload sending a 1.5 Mbps video stream can only serve three peers before saturating their upload. CPU usage for encoding multiple streams climbs rapidly. And network conditions between arbitrary pairs of participants are unpredictable.
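The arithmetic behind that limit can be sketched in a few lines of Python. The function names are illustrative, and the bitrates are the example figures from the paragraph above rather than measurements from Mandraki’s systems.

```python
import math


def mesh_load(participants: int, stream_mbps: float) -> dict:
    """Per-participant and call-wide load for a full-mesh call."""
    peers = participants - 1
    return {
        "streams_sent_per_participant": peers,
        "upload_mbps_per_participant": peers * stream_mbps,
        # One-way media streams across the whole call.
        "one_way_streams_total": participants * peers,
        # Each pair of participants shares one peer connection.
        "peer_connections_total": participants * peers // 2,
    }


def max_mesh_peers(upload_mbps: float, stream_mbps: float) -> int:
    """How many peers a single uplink can serve before it saturates."""
    return math.floor(upload_mbps / stream_mbps)
```

For a four-person call at 1.5 Mbps per stream, `mesh_load(4, 1.5)` reports 4.5 Mbps of upload per participant, which is already close to the 5 Mbps uplink in the example.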
Mesh topology is elegant in theory and impractical at scale.
The SFU approach
A Selective Forwarding Unit (SFU) sits at the centre of the call topology. Each participant sends a single upload stream to the SFU. The SFU then forwards that stream to all other participants. The participant’s upload bandwidth is constant regardless of the number of participants — they always send one stream. The SFU handles the fan-out.
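As a rough sketch, the same kind of arithmetic shows where the load moves in an SFU topology. This models the worst case in which every stream is forwarded at full rate to every other participant. The function name and bitrate are illustrative assumptions.

```python
def sfu_load(participants: int, stream_mbps: float) -> dict:
    """Worst-case load when the SFU forwards every stream at full rate."""
    peers = participants - 1
    return {
        # Each participant uploads exactly one stream, whatever the call size.
        "upload_mbps_per_participant": stream_mbps,
        # Each participant still receives one stream per other participant.
        "download_mbps_per_participant": peers * stream_mbps,
        "sfu_ingress_mbps": participants * stream_mbps,
        # The fan-out cost lands on the server instead of on each uplink.
        "sfu_egress_mbps": participants * peers * stream_mbps,
    }
```

With 50 participants at 1.5 Mbps, upload stays at 1.5 Mbps per participant, but the worst-case server egress is 3675 Mbps and each download is 73.5 Mbps. Those numbers are why an SFU cannot simply forward everything to everyone.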
The “selective” part is important. The SFU does not simply broadcast every stream to every participant. It makes intelligent forwarding decisions based on network conditions, participant visibility, and available bandwidth. If a participant’s download connection is constrained, the SFU can forward lower-resolution layers. If a participant is not currently visible in the UI (for example, in a large call where only the active speaker is shown full-size), the SFU can reduce or pause forwarding of their stream.
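A minimal sketch of such a forwarding decision might look like the following. The layer names, bitrates, and the `choose_layer` function are hypothetical and chosen for illustration. They do not describe the actual SFU library’s API or Mandraki’s thresholds.

```python
from typing import Optional

# Hypothetical quality layers, lowest first: (name, approximate kbps).
LAYERS = [("low", 150), ("medium", 500), ("high", 1500)]


def choose_layer(available_kbps: int, sender_visible: bool) -> Optional[str]:
    """Pick the layer to forward to one recipient, or None to pause video."""
    if not sender_visible:
        # Nobody is looking at this tile, so forward no video for it.
        return None
    chosen = None
    for name, kbps in LAYERS:
        # Keep the highest layer that still fits the bandwidth estimate.
        if kbps <= available_kbps:
            chosen = name
    return chosen
```

In this sketch a recipient with roughly 600 kbps available gets the medium layer, and a sender whose tile is off screen is paused regardless of bandwidth.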
This is fundamentally different from a Multipoint Control Unit (MCU), which decodes all incoming streams, composites them into a single mixed stream, re-encodes it, and sends the composite to each participant. MCUs are CPU-intensive, add encoding latency, and — critically — require access to the plaintext media, making them incompatible with end-to-end encryption.
An SFU forwards encrypted packets without decoding them. This is what makes SFrame-based end-to-end encryption possible in a group call. The SFU routes ciphertext.
Mandraki’s SFU architecture
Mandraki uses an open-source SFU library integrated directly into our infrastructure. Rather than relying on a monolithic third-party media server, we embed the SFU as a library within our own codebase. This gives us full control over how it integrates with our application layer, signalling protocol, and authentication system.
The SFU uses a multi-worker architecture. Each worker is a separate native process that handles the actual media routing at native speed. The coordination layer manages workers and transports and interfaces with our application logic. This separation keeps the media path fast while the control path remains flexible.
The SFU supports simulcast and SVC (Scalable Video Coding), which are essential for bandwidth-adaptive forwarding. With simulcast, a participant’s browser encodes the same video at several quality levels simultaneously. With SVC, it produces a single stream built from layers that can be dropped independently. In both cases, the SFU selects the appropriate level for each recipient based on their available bandwidth and the UI context.
Scaling to hundreds of sessions
For large-scale deployments, the architecture is designed around horizontal scaling with intelligent session routing.
Multi-instance SFU deployment. Multiple SFU instances run across availability zones. Each instance registers its capacity with a coordination layer. When a new call is created, the system selects the SFU instance with the most available capacity. Participants in the same call are always routed to the same SFU instance for optimal media routing.
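The routing rule described here can be sketched as a small in-memory router. The class names, the capacity unit (participants), and the data structures are assumptions for illustration. A production coordination layer would also handle instance failure and room teardown.

```python
from dataclasses import dataclass


@dataclass
class SfuInstance:
    name: str
    capacity: int  # participants this instance reports it can host
    load: int = 0  # participants currently assigned

    @property
    def available(self) -> int:
        return self.capacity - self.load


class SessionRouter:
    def __init__(self, instances):
        self.instances = {i.name: i for i in instances}
        self.room_to_instance = {}  # room id -> instance name

    def route(self, room_id: str) -> SfuInstance:
        """Return the instance a joining participant should connect to."""
        name = self.room_to_instance.get(room_id)
        if name is None:
            # New call: pick the instance with the most free capacity.
            best = max(self.instances.values(), key=lambda i: i.available)
            name = best.name
            self.room_to_instance[room_id] = name
        # Existing call: always reuse the instance that hosts the room.
        instance = self.instances[name]
        instance.load += 1
        return instance
```

The sticky mapping matters more than the selection rule: once a room is pinned, later joiners follow it even if another instance has since become emptier.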
Worker-level parallelism. Each SFU process spawns multiple native workers — typically matching the number of available CPU cores. Each worker can handle multiple call rooms independently. This means a single SFU server with 16 cores can efficiently manage dozens of concurrent calls, each with up to 50 participants.
Bandwidth-adaptive forwarding. For calls with many participants, several optimisations kick in automatically. Active speaker detection reduces the number of high-quality streams that need forwarding. Simulcast layer selection becomes more aggressive, favouring lower layers for non-visible participants. Audio-only fallback is available for participants with severely constrained bandwidth.
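One way to picture these optimisations is as a budget allocation for each recipient: give every visible sender the lowest layer first, then spend what is left on upgrades, starting with the active speaker. The sketch below assumes hypothetical layer bitrates and a hypothetical `allocate` function. It is not the algorithm Mandraki runs in production.

```python
LAYER_KBPS = {"low": 150, "medium": 500, "high": 1500}
ORDER = ["low", "medium", "high"]


def allocate(budget_kbps: int, visible_senders: list, active_speaker: str) -> dict:
    """Return {sender: layer or None} for one recipient's downlink budget."""
    plan = {sender: None for sender in visible_senders}
    remaining = budget_kbps
    # Stable sort: the active speaker first, everyone else in original order.
    ranked = sorted(visible_senders, key=lambda s: s != active_speaker)
    # Pass 1: give as many senders as possible the lowest layer.
    for sender in ranked:
        if remaining >= LAYER_KBPS["low"]:
            plan[sender] = "low"
            remaining -= LAYER_KBPS["low"]
    # Pass 2: upgrade in priority order while budget remains.
    for sender in ranked:
        if plan[sender] is None:
            continue  # video stays paused (audio-only) for this sender
        for layer in ORDER[1:]:
            extra = LAYER_KBPS[layer] - LAYER_KBPS[plan[sender]]
            if extra > remaining:
                break
            remaining -= extra
            plan[sender] = layer
    return plan
```

With a 2000 kbps budget and three visible senders, the active speaker ends up on the high layer and the other two on low. With only 200 kbps, a single low-layer video survives and the rest fall back to audio only.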
European multi-zone deployment. Mandraki’s infrastructure runs across multiple availability zones within the EU on European hyperscale infrastructure. SFU instances are deployed close to users, reducing round-trip latency for media packets. All media routing stays within European borders — no media is ever relayed through non-EU infrastructure.
NAT traversal
WebRTC’s greatest strength, direct connectivity between endpoints, is also its greatest challenge. Most devices sit behind NAT (Network Address Translation) devices or firewalls that block unsolicited inbound connections. WebRTC uses ICE (Interactive Connectivity Establishment) to discover a viable network path: it gathers direct (host), STUN-discovered (server-reflexive), and TURN relay candidates, then tests candidate pairs in priority order, preferring direct paths and falling back to a relay.
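ICE decides which paths to prefer using a candidate priority formula defined in RFC 8445. The sketch below applies that formula with the type preferences the RFC recommends. The default local preference of 65535 is an assumption for a single-interface host, and real browsers may choose different values.

```python
# Type preferences recommended by RFC 8445, section 5.1.2.2.
TYPE_PREFERENCE = {
    "host": 126,   # direct address on a local interface
    "prflx": 110,  # peer-reflexive, discovered during connectivity checks
    "srflx": 100,  # server-reflexive, discovered via STUN
    "relay": 0,    # relayed through a TURN server
}


def candidate_priority(cand_type: str,
                       local_preference: int = 65535,
                       component_id: int = 1) -> int:
    """priority = 2^24 * type pref + 2^8 * local pref + (256 - component id)."""
    return (
        (TYPE_PREFERENCE[cand_type] << 24)
        + (local_preference << 8)
        + (256 - component_id)
    )
```

Because the type preference occupies the highest bits, any host candidate outranks any server-reflexive candidate, which in turn outranks any relay candidate. That ordering is what makes TURN a last resort.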
Mandraki runs dedicated TURN servers alongside the SFU infrastructure. TURN acts as a relay of last resort — when a participant cannot establish a direct connection to the SFU (due to restrictive firewalls, symmetric NAT, or corporate proxy servers), the media flows through the TURN server. This adds some latency but ensures connectivity.
The TURN servers support both UDP and TCP, plus TLS for environments that only allow HTTPS traffic. Like everything else in our stack, they run entirely within the EU.
End-to-end encryption at scale
We chose the SFU architecture specifically because it preserves compatibility with end-to-end encryption. With the WebRTC Encoded Transform API and the SFrame protocol (RFC 9605), each media frame is encrypted on the sender’s device before it reaches the SFU. The SFU forwards the encrypted frames to recipients, who decrypt them locally.
The SFU never has access to plaintext media. It routes ciphertext. This means that even at scale — with dozens of participants and multiple concurrent calls — the encryption guarantees hold. No server in our infrastructure ever sees or processes unencrypted audio or video.
This is a fundamental architectural choice. Many platforms claim encryption but use MCU-based architectures that require server-side decoding, so media is protected only in transit. Mandraki’s SFU approach means encryption is not just a feature: it is a structural guarantee.
Monitoring and reliability
WebRTC problems are notoriously difficult to diagnose. We collect client-side telemetry including ICE connection state transitions, selected candidate pair types, round-trip time estimates, packet loss rates, and bandwidth estimates. These metrics are batched and sent to our telemetry endpoint for aggregation.
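The batching step can be sketched as a small buffer that flushes once enough samples have accumulated. The `TelemetryBatcher` class, its `send` callback, and the sample fields are hypothetical. A real client would also flush on a timer and when the call ends.

```python
from typing import Callable, List


class TelemetryBatcher:
    """Buffers metric samples and hands them off in batches."""

    def __init__(self, send: Callable[[List[dict]], None], batch_size: int = 20):
        self._send = send              # e.g. a function that POSTs the batch
        self._batch_size = batch_size
        self._pending: List[dict] = []

    def record(self, sample: dict) -> None:
        """Queue one sample and flush if the batch is full."""
        self._pending.append(sample)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        """Send whatever is queued; does nothing if the queue is empty."""
        if self._pending:
            batch, self._pending = self._pending, []
            self._send(batch)
```

A sample might carry fields such as `{"rtt_ms": 42, "packet_loss": 0.01, "candidate_pair": "relay"}`, mirroring the metrics listed above.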
Server-side, the SFU logs transport events, producer and consumer lifecycle events, and bandwidth estimates. Combined with client-side telemetry, this provides a comprehensive view of call quality that allows us to identify and address issues systematically.
Real-time communication at massive scale is a deep engineering challenge. We are continuing to invest in the media infrastructure that makes Mandraki’s calls reliable, low-latency, and encrypted end-to-end — all within sovereign European infrastructure.