Technical Deep Dives | June 8, 2026 | 6 min read

We Run the Model in Your Browser Instead of Our Server. Here's the Real Tradeoff.

Client-side background removal is a set of concrete engineering tradeoffs, not just a privacy slogan. This post covers what you gain (privacy, zero marginal cost, offline use), what we pay (download size, device variance), and how we manage each cost.

"100% in your browser" is easy to put on a landing page. The interesting part is what that architecture actually costs us as builders — because every one of those costs is a decision a server-based competitor never has to make.

What client-side buys you

Your images never leave the device. There is no upload step, so there's no copy of your photo on our infrastructure to leak, subpoena, or accidentally log. For product shots under embargo or personal photos, that's the whole point.
Zero marginal cost per image. A server tool pays GPU-seconds for every background removed. We pay once to ship the model file; your device does the compute. That's why we can offer it free without a "10 images/month" gate.
It works offline. Once the model is cached, the tool runs on a plane.

What it costs us — and how we handle each

1. The model has to be downloaded. A server keeps the weights; we have to send them. Our fast model is ~80 MB; the best-quality model (BiRefNet) is far heavier. We mitigate this three ways: download-on-demand (you only fetch the best-quality model if you choose it), Cache Storage so the download is a one-time cost, and honest size labels in the consent UI. During this sprint we caught our own catalog listing the SD-Turbo text encoder as 1.7 GB when it is actually ~650 MB, and the U-Net as 640 MB when it is ~1.65 GB: the two sizes were transposed. Wrong sizes erode trust and break the download progress bar, so we corrected them against the real byte counts.
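
A check like the size correction described above can be automated. The following is a minimal sketch, not our actual catalog code: the `CatalogEntry` shape, the `findMislabeledSizes` name, and the 5% tolerance are assumptions made for illustration, and the byte values are the rounded figures quoted in this post rather than exact file sizes.

```typescript
// Hypothetical catalog shape; the real catalog format is not shown in this post.
interface CatalogEntry {
  id: string;
  labeledBytes: number; // size shown to the user in the consent UI
}

interface SizeMismatch {
  id: string;
  labeledBytes: number;
  actualBytes: number;
  relativeError: number; // |labeled - actual| / actual
}

// Returns every entry whose label differs from the measured size by more
// than `tolerance`, expressed as a fraction of the measured size.
function findMislabeledSizes(
  catalog: CatalogEntry[],
  actualBytes: Record<string, number>,
  tolerance: number = 0.05,
): SizeMismatch[] {
  const mismatches: SizeMismatch[] = [];
  for (const entry of catalog) {
    const actual = actualBytes[entry.id];
    if (actual === undefined || actual <= 0) {
      throw new Error(`No measured size for catalog entry "${entry.id}"`);
    }
    const relativeError = Math.abs(entry.labeledBytes - actual) / actual;
    if (relativeError > tolerance) {
      mismatches.push({
        id: entry.id,
        labeledBytes: entry.labeledBytes,
        actualBytes: actual,
        relativeError,
      });
    }
  }
  return mismatches;
}

// Rounded, illustrative numbers taken from the paragraph above.
const MB = 1e6;
const catalog: CatalogEntry[] = [
  { id: "text-encoder", labeledBytes: 1700 * MB }, // labeled 1.7 GB
  { id: "unet", labeledBytes: 640 * MB },          // labeled 640 MB
];
const measured: Record<string, number> = {
  "text-encoder": 650 * MB, // actually ~650 MB
  unet: 1650 * MB,          // actually ~1.65 GB
};
const flagged = findMislabeledSizes(catalog, measured);
```

With the transposed labels, both entries are flagged. Running a check of this kind in CI against the files that are actually deployed is one way to keep the consent UI and the progress bar honest.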

2. Device variance is now our problem. A server runs one known GPU. We run on whatever the visitor has, including machines with no real GPU, where WebGPU exposes a software fallback adapter that's slower than WASM. So we detect adapter quality and refuse to run on a silently degraded GPU path, routing to a multi-threaded WASM path instead.
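
The routing decision can be sketched as a small pure function. This is an illustration under assumptions, not our production detector: it looks only at the fallback flag, which is one signal of adapter quality, and the `AdapterLike` type and `chooseExecutionProvider` name are hypothetical. Depending on the browser version, the flag has been exposed on the adapter itself or on its `info` object, so the sketch checks both.

```typescript
// Minimal structural view of the adapter fields this sketch reads.
// In a browser this would be the object returned by
// navigator.gpu.requestAdapter(), or null when no adapter is available.
interface AdapterLike {
  isFallbackAdapter?: boolean;            // flag on the adapter itself
  info?: { isFallbackAdapter?: boolean }; // flag on the adapter's info object
}

type ExecutionProvider = "webgpu" | "wasm";

// Prefer WebGPU only when a non-fallback adapter exists; otherwise use WASM.
function chooseExecutionProvider(
  adapter: AdapterLike | null | undefined,
): ExecutionProvider {
  if (!adapter) {
    return "wasm"; // WebGPU unavailable, or no adapter returned
  }
  const isFallback =
    adapter.isFallbackAdapter === true ||
    adapter.info?.isFallbackAdapter === true;
  // A software fallback adapter "works" but is slow, so treat it as no GPU.
  return isFallback ? "wasm" : "webgpu";
}
```

The result would then be passed to the runtime as its execution provider. Note that multi-threaded WASM relies on `SharedArrayBuffer`, which browsers only enable on cross-origin isolated pages, so the WASM path has its own deployment requirement.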

3. A failed download looks different. On a server, a truncated model fetch fails loudly in your own logs. In a browser, ONNX Runtime will happily build a session with no inputs from a partial file, and the error only surfaces three tools downstream as a cryptic undefined. We added byte-length validation before session creation and a hard assertion that the session actually has input and output names — so a dropped connection fails at the source with a clear message, not as a mystery later.
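
The two guards described above can be sketched as follows. This is a minimal illustration, not our shipped code: the function names and the `SessionLike` type are hypothetical, and where `expectedBytes` comes from (for example, a size recorded in the model catalog) is an assumption. The `inputNames` and `outputNames` properties match what an ONNX Runtime Web `InferenceSession` exposes.

```typescript
// Structural view of the session fields this sketch checks.
interface SessionLike {
  inputNames: readonly string[];
  outputNames: readonly string[];
}

// Guard 1: run before session creation. A truncated download has fewer
// bytes than the catalog says the file should have.
function assertCompleteDownload(
  bytes: ArrayBuffer,
  expectedBytes: number,
  label: string,
): void {
  if (bytes.byteLength !== expectedBytes) {
    throw new Error(
      `${label}: expected ${expectedBytes} bytes but received ` +
        `${bytes.byteLength}; the download was probably interrupted`,
    );
  }
}

// Guard 2: run right after session creation. A session with no inputs or
// no outputs cannot run inference, so fail here instead of downstream.
function assertUsableSession(session: SessionLike, label: string): void {
  const inputs = session.inputNames.length;
  const outputs = session.outputNames.length;
  if (inputs === 0 || outputs === 0) {
    throw new Error(
      `${label}: session has ${inputs} input(s) and ${outputs} output(s); ` +
        `the model file is probably incomplete`,
    );
  }
}
```

In use, the first guard would wrap the downloaded buffer before it reaches the runtime, and the second would wrap the session the runtime returns. If the expected size is taken from a `Content-Length` header instead of a catalog, keep in mind that the header describes the encoded transfer size, which can differ from the decoded byte count when the response is compressed.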

The tradeoff, stated plainly

Server-side is simpler to operate and gives you one predictable runtime. Client-side gives the user privacy and free unlimited use, and hands the builder a harder job: model delivery, device detection, and failure handling all move into the browser. We think that trade is worth it, but only if you do the unglamorous parts (integrity checks, execution provider telemetry, honest sizing) instead of just shipping the weights and hoping.

