How AI Speaker Diarisation Works

Split a transcript by speaker. Useful for interviews and podcasts: every line is labelled "Speaker 1", "Speaker 2", and so on.

Step-by-step process

1. Add your audio
Drop or select the audio you want to process. It stays on your device.
2. Run the model in-browser
AI Speaker Diarisation downloads its model (~100 MB) once, caches it, then runs locally in a worker. Nothing is uploaded.
3. Download the SRT
Preview the result and download the SRT file. Re-run with different settings at any time.
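
As an illustration of what the downloaded file contains, here is a minimal TypeScript sketch that turns speaker-labelled segments into SRT text. The `Segment` shape and the function names are assumptions made for this example, not the tool's actual internals. Only the SRT layout itself (cue index, `HH:MM:SS,mmm --> HH:MM:SS,mmm` timing line, text, blank line) is standard.

```typescript
// Hypothetical shape of one diarised segment (assumed for illustration).
interface Segment {
  speaker: number; // 1-based speaker index
  start: number;   // start time in seconds
  end: number;     // end time in seconds
  text: string;    // transcript text for this segment
}

// Format a time in seconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTime(seconds: number): string {
  const totalMs = Math.round(seconds * 1000);
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  const pad = (n: number, width: number): string => String(n).padStart(width, "0");
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Build SRT text: one numbered cue per segment, prefixed with "Speaker N:".
function toSrt(segments: Segment[]): string {
  return segments
    .map(
      (seg, i) =>
        `${i + 1}\n${toSrtTime(seg.start)} --> ${toSrtTime(seg.end)}\n` +
        `Speaker ${seg.speaker}: ${seg.text}\n`
    )
    .join("\n");
}
```

Calling `toSrt` with two segments produces two numbered cues separated by a blank line, which most subtitle editors and video players should accept.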

100% private by design. All processing runs in your browser — your files never leave your device. No account required.

Frequently asked questions

How many speakers?

The tool separates voices using voice activity detection and speaker embeddings. It works best with a small number of distinct speakers.
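
To give a rough idea of how speaker embeddings can be turned into speaker labels, here is a minimal TypeScript sketch. It assumes each detected speech segment already has an embedding vector, and it uses a simple greedy grouping by cosine similarity. The function names and the `0.7` threshold are assumptions for illustration; the tool's actual clustering method is not documented here and may differ.

```typescript
// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Greedy grouping: assign each segment embedding to the most similar known
// speaker, or start a new speaker when no similarity reaches the threshold.
// Returns 1-based speaker labels, one per input embedding.
function assignSpeakers(embeddings: number[][], threshold = 0.7): number[] {
  // Running sum of embeddings per speaker. Cosine similarity ignores scale,
  // so comparing against the sum is equivalent to comparing against the mean.
  const sums: number[][] = [];
  const labels: number[] = [];
  for (const e of embeddings) {
    let best = -1;
    let bestSim = threshold;
    for (let k = 0; k < sums.length; k++) {
      const sim = cosine(e, sums[k]);
      if (sim >= bestSim) {
        best = k;
        bestSim = sim;
      }
    }
    if (best === -1) {
      sums.push([...e]); // no close match: treat as a new speaker
      best = sums.length - 1;
    } else {
      sums[best] = sums[best].map((v, i) => v + e[i]);
    }
    labels.push(best + 1);
  }
  return labels;
}
```

This also suggests why a few distinct voices work best: similar-sounding speakers produce embeddings that are close together, so any threshold-based grouping is more likely to merge them.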

Is it free to use?

Yes — AI Speaker Diarisation is completely free. No account, no watermark, no credits, and no usage limits.

Do my files or prompts ever leave my device?

No. Everything runs locally in your browser via WebAssembly/WebGPU — there is no server that receives your files, prompts, or results.

Which browser and hardware do I need?

A modern browser. Chrome and Edge get WebGPU acceleration for the fastest results; Firefox and Safari run via WebAssembly. The model (~100 MB) downloads once, then is cached for offline use.
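
For readers curious how a page can choose between WebGPU and WebAssembly, here is a minimal TypeScript sketch of the usual feature check. `navigator.gpu.requestAdapter()` is part of the standard WebGPU API; the `pickBackend` function, the `Backend` type, and the `NavigatorLike` interface are assumptions for this example and are not taken from the tool's code.

```typescript
type Backend = "webgpu" | "wasm";

// Minimal structural type so the function can be exercised outside a browser.
// In a real page you would pass the global `navigator` object.
interface NavigatorLike {
  gpu?: { requestAdapter(): Promise<unknown> };
}

// Prefer WebGPU when the API exists and an adapter is actually available;
// otherwise fall back to WebAssembly.
async function pickBackend(nav: NavigatorLike): Promise<Backend> {
  if (nav.gpu) {
    try {
      const adapter = await nav.gpu.requestAdapter();
      if (adapter) return "webgpu";
    } catch {
      // Adapter request failed: fall through to the WebAssembly path.
    }
  }
  return "wasm";
}
```

Checking for an adapter, and not only for the presence of `navigator.gpu`, matters because a browser can expose the API while the device has no usable GPU.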

Can I use the results commercially?

Yes. You own everything you create — NSS makes no claim to the images, videos, or text you process or export.

Does it work on mobile?

Lightweight tools run on phones; heavier models prefer a desktop with a GPU. The tool picks the best path for your device and falls back gracefully where needed.

Where can I see a step-by-step guide?

A full walkthrough is available at /how-it-works/ai-speakers.
