Behind bakuhantam.tv — Building a Real-Time Live Streaming Platform

I just shipped bakuhantam.tv — a live pay-per-view streaming platform for an Indonesian combat sports event. It went live for its first real event on June 4th, and watching the viewer counter climb while nothing fell over was one of the more satisfying moments I’ve had this year.

[Image: the bakuhantam.tv watch page during the event, with the live player on the left and the real-time superchat rail of donations on the right]

What bakuhantam.tv Actually Is

At a high level, bakuhantam.tv does three things:

Streams live fights to a browser audience on mixed Indonesian mobile networks.
Sells access to those fights through a prepaid coin system, paid via a local Indonesian payment gateway.
Lets the audience participate — donate to fighters, vote in coin-backed polls, and watch a donation milestone ladder fill up in real time.

There’s a viewer-facing site, an admin CMS, an API, and the streaming pipeline — each on its own subdomain, with the public site at bakuhantam.tv. Let me walk through the streaming pipeline first, because that’s the heart of it.

The Streaming Pipeline

The path from a camera at the venue to a phone in someone’s hand looks like this:

OBS → RTMP/SRT → MediaMTX → FFmpeg (ABR transcode) → ./hls/ → Go uploader → Cloudflare R2 → CDN → Viewers

The unusual decision here is the last third: I serve live video off object storage. There’s no origin server streaming to viewers. FFmpeg writes HLS segments to a local folder, a small Go program uploads each file to Cloudflare R2 the instant it’s stable, and viewers pull the playlist and segments straight from R2 through Cloudflare’s CDN.

It scales horizontally without me doing anything — R2 and the CDN absorb the viewer load, not my box. But it only works if the uploader is very careful, because object storage has no concept of “this file is still being written.”

Ingest from a Remote Island

The venue for the first event was about as hostile to live streaming as it gets. It was billed as the “world’s first island fight” — held on a remote island — and the only connectivity out there is high-latency and drops a lot of packets. Big round-trip times, big loss. MediaMTX accepts the stream over either RTMP or SRT, and the conventional wisdom says SRT — purpose-built for exactly these lossy, long-haul links — should be the obvious pick.

In practice, we streamed the venue over RTMP, and it held up better than SRT did. Here’s why: RTMP rides on TCP, and TCP already retransmits lost packets for you. The island link was lossy and high-latency, but it still had enough upload headroom that TCP could quietly absorb all those retransmissions and keep a steady stream flowing. SRT gives you finer manual control over the retransmission budget — which is exactly what you want when the pipe is both lossy and starved for bandwidth — but when there’s spare bandwidth to spend on recovery, plain RTMP over TCP turned out to be the more stable and far simpler choice.

rtmp://<host>:<port>/<app>/<stream-key>

The lesson: don’t assume SRT automatically wins just because the network looks bad on paper. On a lossy-but-not-bandwidth-starved link — even from a remote island with ugly latency and loss numbers — RTMP held up better than we expected. Test both on the actual venue connection before the event; we did, and it’s why we went with RTMP.

Transcoding, and the H.264 Level Trap

For a public Indonesian audience on everything from flagship phones to budget Androids on 3G, adaptive bitrate isn’t optional. FFmpeg re-encodes the incoming feed to a 720p rendition tuned to sit between 3 and 3.5 Mbps.

The non-obvious bit: I pin the H.264 level to 3.1.

-level:v 3.1
-b:v 3000k
-maxrate:v 3500k
-bufsize:v 6000k

H.264 has profiles (feature sets) and levels (resource ceilings). If you don’t pin the level and a complex scene spikes the bitrate, the encoder can quietly bump the stream into Level 4.0+, which older and cheaper decoders simply refuse to play. The viewer gets audio and a black screen, and you have no idea why. Level 3.1 comfortably covers a 720p30 stream at this bitrate and is supported by basically every decoder shipped since 2008. Lock it and the whole class of “works on my phone, black screen on theirs” bugs disappears.
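To make the "comfortably covers" claim concrete, here is a small arithmetic sketch. It uses the Level 3.1 ceilings from the H.264 level table (3,600 macroblocks per frame, 108,000 macroblocks per second, and 14,000 kbps for Baseline and Main profile). The helper name `fitsLevel31` is made up for illustration and is not part of the actual pipeline.

```typescript
// H.264 Level 3.1 ceilings (Baseline/Main profile values).
const LEVEL_31 = { maxFrameMbs: 3600, maxMbsPerSec: 108_000, maxKbps: 14_000 };

// Hypothetical helper: does a rendition fit inside Level 3.1?
function fitsLevel31(width: number, height: number, fps: number, maxrateKbps: number): boolean {
  // A macroblock is 16x16 pixels; partial blocks at the edges still count.
  const frameMbs = Math.ceil(width / 16) * Math.ceil(height / 16);
  return (
    frameMbs <= LEVEL_31.maxFrameMbs &&
    frameMbs * fps <= LEVEL_31.maxMbsPerSec &&
    maxrateKbps <= LEVEL_31.maxKbps
  );
}

console.log(fitsLevel31(1280, 720, 30, 3500));  // true: 720p30 at 3.5 Mbps fits
console.log(fitsLevel31(1920, 1080, 30, 3500)); // false: 1080p needs a higher level
```

Note that 720p30 lands exactly on both macroblock ceilings (80 x 45 = 3,600 per frame, 108,000 per second), so the bitrate has plenty of headroom but the resolution and frame rate do not: 720p60 or 1080p would need a higher level.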

The other small thing that bit me: audio/video desync when the venue uses a separate audio desk. The fix is a one-line audio filter:

aresample=async=1:first_pts=0

async=1 pads or drops audio samples when the audio clock drifts from the video clock, and first_pts=0 anchors the first sample to zero so audio and video share an origin. Without it, the two slowly slide apart over a multi-hour broadcast.

The Go Uploader

This is the program I’m proudest of, because its entire job is to never, ever make a mistake that a viewer would notice. A few of the things it does:

It waits for files to finish writing. FFmpeg is still writing a segment when the file first appears on disk. If the uploader PUTs a half-written segment, the player stalls. So before uploading a segment it records the file’s size and mtime, waits 200 ms, and re-checks. Only if nothing changed does it upload. If the file is still growing, it skips the file for now and picks it up on the next scan.
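The real uploader is written in Go, but the check itself is tiny in any language. Here is a rough TypeScript sketch of the idea; the names `FileSnapshot`, `StatFn`, `unchanged`, and `isStable` are invented for this example, and the stat function is passed in so the logic stays independent of any particular filesystem API.

```typescript
type FileSnapshot = { size: number; mtimeMs: number };
type StatFn = (path: string) => Promise<FileSnapshot>;

const SETTLE_MS = 200; // gap between the two looks at the file

// Two snapshots match only if both size and modification time are identical.
function unchanged(a: FileSnapshot, b: FileSnapshot): boolean {
  return a.size === b.size && a.mtimeMs === b.mtimeMs;
}

// Resolves true only if the file looked identical before and after the wait.
// A false result means "skip for now"; the next scan will look again.
async function isStable(path: string, stat: StatFn, settleMs: number = SETTLE_MS): Promise<boolean> {
  const before = await stat(path);
  await new Promise<void>((resolve) => setTimeout(resolve, settleMs));
  const after = await stat(path);
  return unchanged(before, after);
}
```

In Node, `fs.promises.stat` returns an object that carries `size` and `mtimeMs`, so something like `(p) => fs.promises.stat(p)` should work as the `stat` argument.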

It never publishes a playlist that points at a missing segment. This is the single most important rule. A .m3u8 manifest references segments by name; if the manifest lands in R2 before the segments it names, viewers get 404s and the player breaks. So the uploader gates every manifest — it parses the playlist, and refuses to upload it until every segment it references is already confirmed in R2:

// A playlist is only uploaded once every file it references is already in R2,
// so the CDN never serves a manifest pointing at a missing segment.
if missing := unsatisfiedRefs(job.RelPath, data, state); len(missing) > 0 {
    waited := state.markDeferred(job.RelPath, now)
    if waited < cfg.ManifestMaxDefer {
        return nil // defer; next scan retries
    }
    // liveness valve: after the deadline, publish anyway
}

There’s a “liveness valve” in there because the long DVR playlist (the full event history, for rewinding) can reference a segment that rotated off local disk before it uploaded — if I gated that strictly, it would block forever. So after a deadline it publishes anyway and accepts one possible 404 rather than freezing the stream.
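For readers who think in TypeScript rather than Go, the gating decision might be sketched like this. It is a simplified stand-in, not the uploader’s code: `referencedSegments` and `gateManifest` are hypothetical names, the set of uploaded keys is assumed to be held in memory, and URIs that live inside tags (such as an `EXT-X-MAP` init segment) are ignored.

```typescript
// In an HLS media playlist, every non-empty line that does not start
// with '#' is a URI, which here means a segment file name.
function referencedSegments(playlist: string): string[] {
  return playlist
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0 && !line.startsWith("#"));
}

type GateDecision = "publish" | "defer" | "publish-anyway";

// Decide what to do with a manifest given what is already confirmed in storage.
function gateManifest(
  playlist: string,
  uploaded: Set<string>,
  waitedMs: number,
  maxDeferMs: number,
): GateDecision {
  const missing = referencedSegments(playlist).filter((uri) => !uploaded.has(uri));
  if (missing.length === 0) return "publish";
  // Liveness valve: past the deadline, accept a possible 404 over a frozen stream.
  return waitedMs < maxDeferMs ? "defer" : "publish-anyway";
}
```

The three-way result keeps the trade-off visible: "defer" protects viewers from 404s, and "publish-anyway" protects them from a playlist that never updates.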

It belt-and-suspenders the file watching. It uses fsnotify for millisecond detection of new files, and it rescans the whole HLS directory every 2 seconds as a safety net, because filesystem events get dropped — especially on Docker bind-mounts. If fsnotify misses something, the scan catches it. Files already uploaded are skipped via a state file, so the rescan is nearly free.

It survives crashes. Every uploaded object is recorded in a state.json (keyed by path, size, mtime) that’s flushed every few seconds with an atomic temp-file-then-rename. If the container restarts mid-event, the new process reads the state and resumes instead of re-uploading thousands of segments.
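The skip check that makes both the rescan and the restart cheap can be sketched in a few lines. This is an illustration under assumptions: `UploadRecord`, `needsUpload`, and `markUploaded` are invented names, and the state is shown as an in-memory map rather than the actual state.json layout.

```typescript
type UploadRecord = { size: number; mtimeMs: number };
type UploadState = Map<string, UploadRecord>; // keyed by relative path

// A file needs uploading if it was never recorded, or if its size or
// modification time differs from what was recorded.
function needsUpload(state: UploadState, path: string, file: UploadRecord): boolean {
  const seen = state.get(path);
  return seen === undefined || seen.size !== file.size || seen.mtimeMs !== file.mtimeMs;
}

// Record a successful upload so later scans (or a restarted process) skip it.
function markUploaded(state: UploadState, path: string, file: UploadRecord): void {
  state.set(path, { size: file.size, mtimeMs: file.mtimeMs });
}
```

After a restart, loading the saved entries back into the map means `needsUpload` returns false for every segment already in R2, so only genuinely new files are sent.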

It sets cache headers per file type. This is what makes serving live video off a CDN actually work:

| File | Cache-Control | Why |
| --- | --- | --- |
| Live manifest | `max-age=1` | rewritten constantly; must refresh fast |
| Standard manifest | `max-age=2` | rewritten each segment |
| Segments | `max-age=31536000, immutable` | written once, never change |

Segments are immutable, so they cache forever and the CDN hit rate stays north of 99%. Only the tiny manifest files are ever re-fetched. That’s the whole trick to CDN-backed live streaming.
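The mapping from file to header is simple enough to sketch. One loud assumption: how the uploader tells a live manifest from a standard one is not shown here, so this example pretends the live manifest has a fixed file name, `live.m3u8`, which is made up. `cacheControlFor` is likewise a hypothetical name.

```typescript
// ASSUMPTION: the live manifest is recognised by file name; "live.m3u8" is invented.
const LIVE_MANIFEST_NAME = "live.m3u8";

// Pick the Cache-Control value for an object key, following the table above.
function cacheControlFor(key: string): string {
  const name = key.split("/").pop() ?? key;
  if (name === LIVE_MANIFEST_NAME) return "max-age=1"; // rewritten constantly
  if (name.endsWith(".m3u8")) return "max-age=2";      // rewritten each segment
  return "max-age=31536000, immutable";                // segments never change
}
```

The important property is the default: anything that is not a manifest is treated as immutable, which only holds if segment file names are never reused during an event.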

Not Streaming to an Empty Room

Here’s a failure mode that’s easy to miss: someone opens a paid replay, then walks away. hls.js will happily buffer minutes ahead and keep fetching segments for hours, even though nobody is watching.

So the player has a “play-safe” guard:

const PLAYSAFE_AFK_MS = 45 * 60_000   // 45 min idle → prompt
const PLAYSAFE_HIDDEN_MS = 2 * 60_000 // 2 min tab hidden → pause

If the tab is hidden for two minutes, or there’s been no interaction for 45, it pauses and calls hls.stopLoad() to stop prefetching segments entirely, then shows a “Masih nonton?” (“Still watching?”) prompt. The player only does work when there’s actually someone watching.
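The decision itself can be written as a pure function, which makes it easy to test without a browser. This is a sketch: the `Activity` shape and `shouldPause` are invented for illustration, and the two constants are repeated so the block stands alone.

```typescript
const PLAYSAFE_AFK_MS = 45 * 60_000;   // 45 min without interaction
const PLAYSAFE_HIDDEN_MS = 2 * 60_000; // 2 min with the tab hidden

type Activity = {
  lastInteractionAt: number;  // ms timestamp of the last click, key press, or touch
  hiddenSince: number | null; // ms timestamp when the tab became hidden, or null if visible
};

// True when the guard should pause playback and stop loading segments.
function shouldPause(a: Activity, now: number): boolean {
  const hiddenTooLong = a.hiddenSince !== null && now - a.hiddenSince >= PLAYSAFE_HIDDEN_MS;
  const idleTooLong = now - a.lastInteractionAt >= PLAYSAFE_AFK_MS;
  return hiddenTooLong || idleTooLong;
}
```

One way to wire it up would be a periodic timer that calls `shouldPause` with the current time, fed by the page’s `visibilitychange` event and ordinary input events; when it returns true, pause the video element, call `hls.stopLoad()`, and show the prompt, then call `hls.startLoad()` once the viewer confirms they are back.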

The Coin Economy and Getting Paid

Access to fights is sold in prepaid coins. You buy coins, you spend coins to unlock a fight, donate to a fighter, or vote in a poll. Payments go through DOKU, a major Indonesian payment gateway that handles local methods like QRIS, virtual accounts, and e-wallets — the things people here actually pay with. Credit cards are an afterthought in this market.

Two design decisions carried most of the weight here.

An append-only ledger, with a denormalized balance. Every coin movement is written as an immutable row in a coin_ledger table, which is the source of truth and a full audit trail. The user’s coinBalance column is just a cached number for fast reads. The ledger row is appended and the balance is updated inside the same locked transaction, so they can never drift:

const entry = em.create(CoinLedger, {
  user,
  delta: tx.coins,
  balanceAfter: user.coinBalance,
  type: CoinLedgerType.PURCHASE,
  referenceId: tx.id,
});
await em.persistAndFlush([user, tx, entry]);

This is the classic accounting pattern, and it’s worth the small extra write. When someone messages “I paid and didn’t get my coins,” I can reconstruct exactly what happened from the ledger.
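Reconstructing what happened amounts to replaying the ledger. Here is a minimal sketch of such a replay. It reuses the field names from the snippet above (`delta`, `balanceAfter`, `type`, `referenceId`) but assumes, without the source confirming it, that spending is stored as a negative `delta` and that `balanceAfter` holds the balance once the delta is applied. `replayLedger` is a hypothetical helper.

```typescript
type LedgerEntry = { delta: number; balanceAfter: number; type: string; referenceId: string };

// Replays entries oldest-first and returns the final balance.
// Throws if any entry's recorded balanceAfter disagrees with the running total.
function replayLedger(entries: LedgerEntry[], openingBalance: number = 0): number {
  let balance = openingBalance;
  for (const e of entries) {
    balance += e.delta;
    if (balance !== e.balanceAfter) {
      throw new Error(`ledger drift at ${e.referenceId}: computed ${balance}, recorded ${e.balanceAfter}`);
    }
  }
  return balance;
}
```

If the replayed total matches the cached coinBalance, the account is consistent; if it throws, the `referenceId` in the message points at the first row worth investigating.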

An idempotent payment webhook. Payment gateways retry webhooks, and they sometimes deliver the same one twice. If your handler isn’t idempotent, you double-credit coins — which means giving away money. The DOKU webhook locks the user row, checks whether the transaction is already marked paid, and bails out early if so:

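// Runs inside the transaction, after the user row lock is taken, so two
// concurrent deliveries of the same webhook cannot both pass this check.
if (tx.status === CoinTransactionStatus.PAID) {
  return { ok: true, idempotent: true } // already applied, do nothing
}
// otherwise: credit coins, append ledger entry, mark the tx paid, commit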

The signature on the webhook is verified with a constant-time comparison, and the timestamp is checked against a ±5-minute window to reject replays. The payment is real money; the verification has to be boring and correct.
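The two checks can be sketched without any gateway specifics. How the expected signature is computed is DOKU’s business and deliberately not shown; `withinReplayWindow` and `constantTimeEqual` are illustrative names, and the hand-rolled comparison is there to show the idea rather than to be copied into production.

```typescript
const REPLAY_WINDOW_MS = 5 * 60_000; // plus or minus five minutes

// Reject requests whose timestamp is too far from the server clock, in either direction.
function withinReplayWindow(requestTimeMs: number, nowMs: number): boolean {
  return Math.abs(nowMs - requestTimeMs) <= REPLAY_WINDOW_MS;
}

// Compares every position with no early exit, so the running time does not
// reveal where the first mismatch is.
function constantTimeEqual(a: string, b: string): boolean {
  let diff = a.length ^ b.length;
  const n = Math.max(a.length, b.length);
  for (let i = 0; i < n; i++) {
    // charCodeAt returns NaN past the end of a string; `| 0` turns that into 0.
    diff |= (a.charCodeAt(i) | 0) ^ (b.charCodeAt(i) | 0);
  }
  return diff === 0;
}
```

In a Node backend, `crypto.timingSafeEqual` on two equal-length buffers is the sturdier choice, since a JavaScript loop gives no hard timing guarantees once the JIT gets involved.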

One smaller thing I’m glad I did: DOKU’s free-text fields reject a surprising number of characters (no spaces, no parentheses, no ampersands). A stray character in a customer’s name returns a 400 mid-checkout — the worst possible time to fail. So everything user-supplied gets sanitized down to DOKU’s allowlist before the request goes out. Defensive, ugly, never fails.
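A sanitizer along these lines is a few lines of code. To be clear about what is assumed: the allowlist below (ASCII letters, digits, dot, hyphen), the 64-character cap, and the fallback value are all placeholders for illustration; the real rules come from DOKU’s documentation for each field. `sanitizeForGateway` is a made-up name.

```typescript
// ASSUMPTION: illustrative allowlist only; the real one is defined per field by the gateway.
const DISALLOWED = /[^A-Za-z0-9.\-]/g;

// Reduce user-supplied text to the allowlist, falling back to a safe
// placeholder if nothing survives.
function sanitizeForGateway(input: string, maxLen: number = 64, fallback: string = "Customer"): string {
  const cleaned = input
    .normalize("NFKD")       // split accented letters into base letter + combining mark
    .replace(DISALLOWED, "") // drop everything outside the allowlist, marks included
    .slice(0, maxLen);
  return cleaned.length > 0 ? cleaned : fallback;
}

console.log(sanitizeForGateway("Budi (Jr.) & Co")); // "BudiJr.Co"
```

The fallback matters as much as the filter: a name made entirely of rejected characters would otherwise become an empty string, which is just a different way to fail mid-checkout.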

On-Stream Donation Alerts, Saweria-Style

The most-loved feature is a “superchat”: donate with a message and it appears as an alert right on the broadcast — the moment Indonesian audiences know from Saweria and YouTube Super Chat. Bigger donations map to higher tiers with a bigger, longer-lived alert.

The interesting part is the delivery. When a donation lands, the backend fans it out over a websocket as a superchat:new event to everyone in a public superchat room, and anything subscribed updates instantly:

socket.on('connect', () => socket.emit('superchat:join'))
socket.on('superchat:new', (sc) => cb.current(sc))

The alert is rendered by a dedicated, chromeless overlay page — its own route, separate from the main app — that the streaming software loads as a browser source and composites on top of the video. So the alert is burned into the broadcast everyone sees, not just something in the donor’s own browser. The overlay takes query params like holdMs (how long an alert stays up) and only shows donations that arrive after it loads, so reconnecting mid-event doesn’t replay old alerts.

A couple of details that keep it robust: the socket is read-only (it only ever emits the join), and every consumer keeps a 2-second polling fallback, so a dropped websocket degrades to “slightly delayed” instead of “broken.” Plain donations and superchats are also kept in separate queues, so the same donation never shows up in two overlays at once.
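Because the same superchat can arrive over the websocket and again through the polling fallback, each consumer needs a small filter in front of its renderer. Here is a sketch of one; the `id` and `createdAt` fields and the `AlertFeed` class are assumptions made for this example, and clock skew between server and overlay is ignored.

```typescript
type Superchat = { id: string; createdAt: number; message: string };

class AlertFeed {
  private readonly seen = new Set<string>();
  private readonly loadedAt: number;

  constructor(loadedAt: number) {
    this.loadedAt = loadedAt;
  }

  // True if the alert should be shown now; false if it is old or already shown.
  accept(sc: Superchat): boolean {
    if (sc.createdAt < this.loadedAt) return false; // predates this overlay session
    if (this.seen.has(sc.id)) return false;         // already delivered by the other path
    this.seen.add(sc.id);
    return true;
  }
}
```

Both the socket handler and the poller can call `accept` and render only when it returns true, which keeps the two delivery paths from double-firing an alert.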

Client-Side Load Balancing

The whole audience arrives in the same few minutes, so on event night the API runs on a pool of identical nodes (api1.bakuhantam.tv, api2.bakuhantam.tv, api3.bakuhantam.tv, and so on) instead of a single box. Rather than put a load balancer in front, I let the client pick, similar in spirit to how the big streaming services route you to whichever edge can serve you best instead of funnelling everyone through one door.

When high-availability mode is on, the app probes every node’s health endpoint in parallel on page load and routes itself to one of the nodes that answered healthy. So a viewer hitting bakuhantam.tv might be quietly served by api2.bakuhantam.tv while their neighbour lands on api3.bakuhantam.tv:

const HA_PROBE_TIMEOUT_MS = 3000 // hard cap per node
const HA_SPREAD_WINDOW_MS = 600  // grace after the first reply

The details that make it robust:

It doesn’t wait for every node. Once the first node answers, it holds a short grace window so a couple of peers can reply too, then spreads clients across everything that answered instead of stampeding the single fastest node.
Every probe self-caps at 3 seconds, so one slow or hung node never delays the page render. A dead node just loses the race.
It never caches the result. Probing fresh on every load is cheap, and a cache would be actively harmful — it could pin a client to a node that later goes down. Fresh probing self-corrects.
If nothing answers, it falls back to the default endpoint rather than failing.
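Put together, the selection logic might look roughly like the sketch below. Several things here are assumptions: the probe is injected as a function (in a browser it would typically be a `fetch` against the health endpoint with an abort timeout), the final choice among responders is a uniform random pick, and `Probe`, `pickNode`, and `chooseNode` are invented names. The two constants are repeated so the block stands alone.

```typescript
const HA_PROBE_TIMEOUT_MS = 3000; // hard cap per node
const HA_SPREAD_WINDOW_MS = 600;  // grace after the first reply

// Resolves true if the node is healthy; expected to give up by itself after timeoutMs.
type Probe = (node: string, timeoutMs: number) => Promise<boolean>;

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// ASSUMPTION: spreading is done by a uniform random pick among responders.
function pickNode(healthy: string[], fallback: string, rand: () => number = Math.random): string {
  if (healthy.length === 0) return fallback; // nothing answered: use the default endpoint
  return healthy[Math.floor(rand() * healthy.length)];
}

async function chooseNode(nodes: string[], fallback: string, probe: Probe): Promise<string> {
  const healthy: string[] = [];
  let signalFirst: () => void = () => {};
  const firstHealthy = new Promise<void>((resolve) => { signalFirst = resolve; });

  // Probe every node in parallel; a failed probe simply never joins `healthy`.
  const allSettled = Promise.all(
    nodes.map((node) =>
      probe(node, HA_PROBE_TIMEOUT_MS)
        .then((ok) => { if (ok) { healthy.push(node); signalFirst(); } })
        .catch(() => { /* a dead node just loses the race */ }),
    ),
  );

  // Wait for the first healthy reply, or for every probe to finish, or the hard cap.
  await Promise.race([firstHealthy, allSettled, sleep(HA_PROBE_TIMEOUT_MS)]);
  if (healthy.length > 0) {
    // Grace window so a few peers can answer too; ends early if all probes are done.
    await Promise.race([allSettled, sleep(HA_SPREAD_WINDOW_MS)]);
  }
  return pickNode(healthy, fallback);
}
```

Nothing is stored between page loads, which matches the no-cache rule: every call to `chooseNode` starts from a fresh set of probes.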

The upshot: a node can fall over mid-event and almost nobody notices. New visitors are routed around it automatically, and there’s no single load balancer to become its own point of failure.

Built for the Local Audience

Small details decide whether a platform feels native or foreign. Phone parsing, for example, has a deliberate fallback: a bare local number is coerced to the Indonesian international format, because that’s how people here type their numbers and how older accounts were stored. Meeting users where they are beats being technically correct and locking them out.
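As an illustration of that fallback, here is a sketch of a normalizer. The +62 country code and the leading 0 on local numbers are facts about Indonesian numbering; the output format with a leading plus sign, the loose length check, and the name `normalizeIndonesianPhone` are assumptions for this example, not the platform’s actual rules.

```typescript
// Coerce common ways of typing an Indonesian number into +62 form.
// Returns null when the input does not look like a phone number at all.
function normalizeIndonesianPhone(raw: string): string | null {
  let national = raw.replace(/\D/g, ""); // drops spaces, dashes, and a leading '+'
  if (national.startsWith("62")) national = national.slice(2); // already international
  if (national.startsWith("0")) national = national.slice(1);  // local trunk prefix
  // ASSUMPTION: loose sanity bounds only; real validation would be stricter.
  if (national.length < 8 || national.length > 13) return null;
  return "+62" + national;
}

console.log(normalizeIndonesianPhone("0812 3456 7890"));    // "+6281234567890"
console.log(normalizeIndonesianPhone("+62 812-3456-7890")); // "+6281234567890"
```

All three common spellings (with 0, with +62, or bare) collapse to the same stored value, so an account created one way can still be found when the user types it another way.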

What I’d Tell Anyone Building Something Like This

The lesson that kept repeating across bakuhantam.tv: a live event has no second take. You can’t roll back a broadcast. So the entire system is built around graceful degradation — publish the manifest even if one segment is missing, recover the player in place instead of restarting it, let a stuck user re-check their own status. Every component asks “what’s the least-bad thing to do if my dependency is down right now?” and does that, rather than failing hard.

The other lesson is that serving live HLS off object storage is a genuinely great architecture for events — horizontally scalable and operationally simple — if you put the care into the uploader. All the cleverness lives in that one small Go program making sure the CDN never sees an inconsistent state.

If you’re building something in this space — live streaming, a payments-backed product, or anything that has to not fall over during a one-shot event — I’m available for freelance work. Get in touch.