WebRTC: P2P, SFU, MCU and All You Need to Know About Them

by Adrian Czerwiec • Jun 3, 2025 • 4 min read

Want to add live video to your app? WebRTC can do the job, but the setup can make a big difference. Here’s a quick guide to P2P, SFU, and MCU to help you pick the right setup for your project.

What is WebRTC?

WebRTC (Web Real-Time Communication) is a set of protocols and browser APIs that enables low-latency video, voice, and data sharing without plugins or other external dependencies.

Many applications use WebRTC for real-time communication, such as video conferencing, online gaming, and file sharing. To enable secure, real-time communication, WebRTC combines several technologies, each fulfilling one of the following purposes (a short code sketch follows the list):

  • Signaling — uses SDP (Session Description Protocol) for offer/answer negotiation
  • Connecting — uses STUN/TURN protocols to traverse NATs and firewalls
  • Securing — uses SRTP (Secure Real-time Transport Protocol) to encrypt media streams
  • Communicating — uses RTP (Real-time Transport Protocol) to deliver media packets and SCTP to deliver data-channel messages
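
To make this more concrete, here's a minimal sketch of the caller side in the browser. It assumes signaling is a placeholder for whatever channel you use to exchange messages (for example, a WebSocket); everything else is the standard browser WebRTC API.

// Caller side of a WebRTC session (browser JavaScript).
// `signaling` is a placeholder for your own message channel, e.g. a WebSocket.
async function startCall(signaling) {
  // Connecting: a STUN server lets this peer discover its public address behind NAT.
  const pc = new RTCPeerConnection({
    iceServers: [{ urls: "stun:stun.l.google.com:19302" }],
  });

  // Relay ICE candidates to the remote peer as they are discovered.
  pc.onicecandidate = ({ candidate }) => {
    if (candidate) signaling.send(JSON.stringify({ candidate }));
  };

  // Communicating: local tracks travel over SRTP-encrypted RTP once connected.
  const stream = await navigator.mediaDevices.getUserMedia({ audio: true, video: true });
  stream.getTracks().forEach((track) => pc.addTrack(track, stream));

  // Signaling: create an SDP offer; the peer's answer comes back the same way.
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  signaling.send(JSON.stringify({ sdp: pc.localDescription }));

  return pc;
}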

With all of the above concerns addressed, two (or more) endpoints can begin communicating. However, the right architecture depends entirely on your use case — and that’s where things get interesting.

Peer-to-Peer (P2P) — the simplest setup

The easiest approach we can take is to use WebRTC in a peer-to-peer (P2P) architecture. The setup is simple: it doesn't require any media servers, as the peers communicate directly with each other (only a lightweight signaling channel is needed to set up the call). This comes with tradeoffs, as every peer must be able to connect directly to every other peer. It's best suited for one-on-one calls or file sharing. In a full mesh of n peers, the total number of connections grows quadratically (n(n-1)/2), and each peer must upload its stream n-1 times, so bandwidth gets used up quickly.
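
For completeness, here's the matching callee side of such a P2P call, again a sketch assuming the same placeholder signaling channel: when the offer arrives, the peer answers, and media then flows directly between the two devices.

// Callee side of a P2P call; `pc` is an RTCPeerConnection created as above.
async function handleOffer(pc, signaling, sdp) {
  await pc.setRemoteDescription(sdp); // apply the caller's SDP offer
  const answer = await pc.createAnswer();
  await pc.setLocalDescription(answer); // kicks off ICE connectivity checks
  signaling.send(JSON.stringify({ sdp: pc.localDescription }));
}

// Remote tracks fire "track" events once the direct connection is up.
function attachRemoteMedia(pc, videoElement) {
  pc.ontrack = ({ streams }) => {
    videoElement.srcObject = streams[0]; // render the peer's camera and mic
  };
}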

MCU (Multipoint Control Unit) — centralized media processing

What if you want to stream the same content to multiple people? That's where an additional server can come in handy!

A Multipoint Control Unit (MCU) is a server that receives media streams from all participants, processes them (e.g., by mixing video/audio into a single composite stream), and then sends one unified stream back to each user. This optimizes bandwidth usage, as each participant only needs to send and receive one stream. Additionally, the server can apply layouts, overlays, or other effects to the stream. The downside is that heavy server-side processing can be costly and add latency.
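
As a conceptual sketch (not a real MCU implementation), server-side audio mixing boils down to summing the decoded samples from every participant and encoding the result once. The function below assumes each input is a Float32Array of PCM samples for one participant:

// Mix decoded audio from all participants into one composite frame.
function mixAudio(inputs, frameSize) {
  const mixed = new Float32Array(frameSize);
  for (const input of inputs) {
    for (let i = 0; i < frameSize; i++) {
      mixed[i] += input[i] ?? 0; // sum samples; missing samples count as silence
    }
  }
  for (let i = 0; i < frameSize; i++) {
    mixed[i] = Math.max(-1, Math.min(1, mixed[i])); // clamp to avoid clipping
  }
  return mixed; // encode once, then send the same stream to every participant
}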

SFU (Selective Forwarding Unit) — scalable and flexible

What is an SFU in WebRTC? The Selective Forwarding Unit (SFU) is another type of server used to route streams between peers. In this scenario, all peers send their streams directly to the SFU, which selectively forwards individual streams to other peers based on dynamic rules. This provides more flexibility, because bandwidth can be saved by deciding which streams to forward at any given time.

For example, you can have a conference with a hundred people where each participant only receives the camera streams of those who are currently speaking (see the sketch below). This selective forwarding approach is best suited for video conferencing, virtual events, or group collaboration tools.
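
Here's an illustrative sketch of such a rule (the names are hypothetical, not any real server's API): rank participants by their reported audio level and forward video only for the loudest few.

// Decide which video streams to forward. Each participant carries an
// `audioLevel` (0..1), e.g. reported via RTP audio-level header extensions.
function selectVideoToForward(participants, maxForwarded) {
  return new Set(
    [...participants]
      .sort((a, b) => b.audioLevel - a.audioLevel) // loudest first
      .slice(0, maxForwarded)
      .map((p) => p.id),
  );
}

// Each subscriber then receives only the selected tracks, so even a
// 100-person room costs each client just `maxForwarded` video streams.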

WebRTC SFU vs MCU

Both SFU and MCU use a central server, but they handle streams differently.

An MCU mixes all incoming streams into one and sends the same output to every participant. It reduces client-side work and bandwidth but adds latency and server cost.

On the other hand, an SFU forwards streams selectively — without mixing — so each participant only receives what’s needed. It’s more efficient and scales better.

So, the WebRTC SFU vs MCU choice is a tradeoff between simplicity (MCU) and flexibility (SFU).

Is it hard to implement WebRTC?

In short: yes — WebRTC is a complex technology, and implementing it yourself can feel like going down a rabbit hole. While browsers hide a lot of the complexity under the hood, building production-ready video features still involves tough challenges:

  • Network instability
  • NAT traversal failures
  • Dynamic bandwidth adaptation
  • Voice activity detection
  • Simulcast support and track prioritization

If your goal is to build a great product, not to become a video infrastructure engineer, there are better options than reinventing the wheel.

Meet Fishjam — a live streaming and video conferencing API

If you’re looking for a quick and reliable solution, Fishjam provides a developer-friendly API for live video and conferencing.

Fishjam is an out-of-the-box, low-latency live streaming and video conferencing API that comes with React Native and React SDKs, allowing you to set up video conferencing or live streaming in your app in minutes.

With Fishjam, this snippet is all you need to join a call and share your camera:

import { useCamera, useConnection } from "@fishjam-cloud/react-native-sdk";
import { useEffect } from "react";
...
const { prepareCamera } = useCamera();
const { joinRoom } = useConnection();

useEffect(() => {
  (async () => {
    // Enable the local camera before joining the room.
    await prepareCamera({ cameraEnabled: true });
    // Join the room using the connection details from your Fishjam setup.
    await joinRoom(url, peerToken);
  })();
}, [prepareCamera, joinRoom]);

WebRTC gives you a lot of power, but it’s not easy to get right. If you don’t want to deal with media servers, setup, and edge cases, Fishjam lets you skip the hard parts and start building fast. And if you need help along the way, feel free to reach out to us at projects@swmansion.com.

We’re Software Mansion: multimedia experts, AI explorers, React Native core contributors, community builders, and software development consultants.

Related Articles

WebRTC vs HLS — Which One Is Better for Your Streaming Project?

by Piotr Wodecki • Oct 6, 2025 • 7 min read

Building Interactive Streaming Apps: WebRTC, WHIP & WHEP Explained

by Karol Konkol • Jul 8, 2025 • 5 min read
