WebRTC: P2P, SFU, MCU and All You Need to Know About Them

by Adrian Czerwiec • Jun 3, 2025 • 4 min read

Want to add live video to your app? WebRTC can do the job, but the setup can make a big difference. Here’s a quick guide to P2P, SFU, and MCU to help you pick the right setup for your project.

What is WebRTC?

WebRTC (Web Real-Time Communication) is a set of protocols and browser APIs that enables low-latency video, voice, and data sharing without plugins or other external dependencies.

Many applications use WebRTC for real-time communication, such as video conferencing, online gaming, and file sharing. To enable secure, real-time communication, WebRTC combines several technologies, each fulfilling one of the following purposes (a short code sketch follows the list):

  • Signaling — uses SDP (Session Description Protocol) for offer/answer negotiation
  • Connecting — uses STUN/TURN protocols to traverse NATs and firewalls
  • Securing — uses SRTP (Secure Real-time Transport Protocol) to encrypt media streams
  • Communicating — uses RTP (Real-time Transport Protocol) to deliver media packets and SCTP to deliver data-channel messages
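
To make this more concrete, here's a minimal sketch of the caller side in the browser. It assumes signaling is a placeholder for whatever channel you use to exchange messages (for example, a WebSocket); everything else is the standard browser WebRTC API.

// Caller side of a WebRTC session (browser JavaScript).
// `signaling` is a placeholder for your own message channel, e.g. a WebSocket.
async function startCall(signaling) {
  // Connecting: a STUN server lets this peer discover its public address behind NAT.
  const pc = new RTCPeerConnection({
    iceServers: [{ urls: "stun:stun.l.google.com:19302" }],
  });

  // Relay ICE candidates to the remote peer as they are discovered.
  pc.onicecandidate = ({ candidate }) => {
    if (candidate) signaling.send(JSON.stringify({ candidate }));
  };

  // Communicating: local tracks travel over SRTP-encrypted RTP once connected.
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
  stream.getTracks().forEach((track) => pc.addTrack(track, stream));

  // Signaling: create an SDP offer; the peer's answer comes back the same way.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  signaling.send(JSON.stringify({ sdp: pc.localDescription }));

  return pc;
}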

With all of the above concerns addressed, two (or more) endpoints can begin communicating. However, the right architecture depends entirely on your use case — and that’s where things get interesting.

Peer-to-Peer (P2P) — the simplest setup

The easiest approach we can take is to use WebRTC in a peer-to-peer (P2P) architecture. The setup is simple: it doesn't require any media servers, as the peers communicate directly with each other (only a lightweight signaling channel is needed to set up the call). This comes with tradeoffs, as every peer must be able to connect directly to every other peer. It's best suited for one-on-one calls or file sharing. In a full mesh of n peers, the total number of connections grows quadratically (n(n-1)/2), and each peer must upload its stream n-1 times, so bandwidth gets used up quickly.
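
For completeness, here's the matching callee side of such a P2P call, again a sketch assuming the same placeholder signaling channel: when the offer arrives, the peer answers, and media then flows directly between the two devices.

// Callee side of a P2P call; `pc` is an RTCPeerConnection created as above.
async function handleOffer(pc, signaling, sdp) {
  await pc.setRemoteDescription(sdp); // apply the caller's SDP offer
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer); // kicks off ICE connectivity checks
  signaling.send(JSON.stringify({ sdp: pc.localDescription }));
}

// Remote tracks fire "track" events once the direct connection is up.
function attachRemoteMedia(pc, videoElement) {
  pc.ontrack = ({ streams }) => {
    videoElement.srcObject = streams[0]; // render the peer's camera and mic
  };
}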

MCU (Multipoint Control Unit) — centralized media processing

What if you want to stream the same content to multiple people? That's where an additional server can come in handy!

A Multipoint Control Unit (MCU) is a server that receives media streams from all participants, processes them (e.g., by mixing video/audio into a single composite stream), and then sends one unified stream back to each user. This optimizes bandwidth usage, as each participant only needs to send and receive one stream. Additionally, the server can apply layouts, overlays, or other effects to the stream. The downside is that heavy server-side processing can be costly and add latency.
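
As a conceptual sketch (not a real MCU implementation), server-side audio mixing boils down to summing the decoded samples from every participant and encoding the result once. The function below assumes each input is a Float32Array of PCM samples for one participant:

// Mix decoded audio from all participants into one composite frame.
function mixAudio(inputs, frameSize) {
  const mixed = new Float32Array(frameSize);
  for (const input of inputs) {
    for (let i = 0; i < frameSize; i++) {
      mixed[i] += input[i] ?? 0; // sum samples; missing samples count as silence
    }
  }
  for (let i = 0; i < frameSize; i++) {
    mixed[i] = Math.max(-1, Math.min(1, mixed[i])); // clamp to avoid clipping
  }
  return mixed; // encode once, then send the same stream to every participant
}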

SFU (Selective Forwarding Unit) — scalable and flexible

What is an SFU in WebRTC? The Selective Forwarding Unit (SFU) is another type of server used to route streams between peers. In this scenario, all peers send their streams directly to the SFU, which selectively forwards individual streams to other peers based on dynamic rules. This provides more flexibility, because bandwidth can be saved by deciding which streams to forward at any given time.

For example, you can have a conference with a hundred people where each participant only receives the camera streams of those who are currently speaking (see the sketch below). This selective forwarding approach is best suited for video conferencing, virtual events, or group collaboration tools.
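
Here's an illustrative sketch of such a rule (the names are hypothetical, not any real server's API): rank participants by their reported audio level and forward video only for the loudest few.

// Decide which video streams to forward. Each participant carries an
// `audioLevel` (0..1), e.g. reported via RTP audio-level header extensions.
function selectVideoToForward(participants, maxForwarded) {
  return new Set(
    [...participants]
      .sort((a, b) => b.audioLevel - a.audioLevel) // loudest first
      .slice(0, maxForwarded)
      .map((p) => p.id),
  );
}

// Each subscriber then receives only the selected tracks, so even a
// 100-person room costs each client just `maxForwarded` video streams.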

WebRTC SFU vs MCU

Both SFU and MCU use a central server, but they handle streams differently.

An MCU mixes all incoming streams into one and sends the same output to every participant. It reduces client-side work and bandwidth but adds latency and server cost.

On the other hand, an SFU forwards streams selectively — without mixing — so each participant only receives what’s needed. It’s more efficient and scales better.

So, the WebRTC SFU vs MCU choice is a tradeoff between simplicity (MCU) and flexibility (SFU).

Is it hard to implement WebRTC?

In short: yes — WebRTC is a complex technology, and implementing it yourself can feel like going down a rabbit hole. While browsers hide a lot of the complexity under the hood, building production-ready video features still involves tough challenges:

  • Network instability
  • NAT traversal failures
  • Dynamic bandwidth adaptation
  • Voice activity detection
  • Simulcast support and track prioritization

If your goal is to build a great product, not to become a video infrastructure engineer, there are better options than reinventing the wheel.

Meet Fishjam — a live streaming and video conferencing API

If you’re looking for a quick and reliable solution, Fishjam provides a developer-friendly API for live video and conferencing.

Fishjam is an out-of-the-box, low-latency live streaming and video conferencing API that comes with React Native and React SDKs, allowing you to set up video conferencing or live streaming in your app in minutes.

With Fishjam, this snippet is all you need to join a call and share your camera:

import { useCamera, useConnection } from "@fishjam-cloud/react-native-sdk";
import { useEffect } from "react";
...
const { prepareCamera } = useCamera();
const { joinRoom } = useConnection();

useEffect(() => {
  (async () => {
    // Enable the local camera before joining the room.
    await prepareCamera({ cameraEnabled: true });
    // Join the room using the connection details from your Fishjam setup.
    await joinRoom(url, peerToken);
  })();
}, [prepareCamera, joinRoom]);

WebRTC gives you a lot of power, but it’s not easy to get right. If you don’t want to deal with media servers, setup, and edge cases, Fishjam lets you skip the hard parts and start building fast. And if you need help along the way, feel free to reach out to us at projects@swmansion.com.

We’re Software Mansion: multimedia experts, AI explorers, React Native core contributors, community builders, and software development consultants.

Related Articles

WebRTC vs HLS — Which One Is Better for Your Streaming Project?

by Piotr Wodecki • Oct 6, 2025 • 7 min read

Building Interactive Streaming Apps: WebRTC, WHIP & WHEP Explained

by Karol Konkol • Jul 8, 2025 • 5 min read
