AI Speaker Diarisation

Split a transcript by speaker. Useful for interviews and podcasts — labels every line "Speaker 1 / Speaker 2 / ...".

Model size: ~100 MB · Typical speed: ~8.0 s · Tier: standard · Beta

Mode	Speed	Quality	Best for
VAD + embeddings	Fast	Who-spoke-when labels	Interview/podcast diarisation

AI Speaker Diarisation needs the standard AI tier (~100 MB model).

Quick start

Drop audio files into the tool, or click to browse. Add as many as you like; they are processed one at a time.

Output formats: srt, json.

Tier opt-in is required before this capability runs.

How it works

1. Add your audio
   Drop or select the audio you want to process. It stays on your device.
2. Run the model in-browser
   AI Speaker Diarisation loads its model (~100 MB) once, caches it, then runs locally in a worker. No upload.
3. Download the srt
   Preview the result and download the srt. Re-run with different settings anytime.
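
To make the srt output more concrete, here is a minimal sketch of how speaker-labelled segments could be written out as SRT cues. The `Segment` shape and the function names `toSrtTime` and `toSrt` are assumptions made for illustration; the tool's actual JSON schema and export code are not documented on this page.

```typescript
// Hypothetical shape for one diarised segment (not the tool's documented schema).
interface Segment {
  start: number;   // start time in seconds
  end: number;     // end time in seconds
  speaker: number; // 1-based speaker index
  text: string;    // transcript text for this segment
}

// Left-pad a non-negative integer with zeros to a fixed width.
function pad(n: number, width: number): string {
  let s = String(n);
  while (s.length < width) {
    s = "0" + s;
  }
  return s;
}

// Format seconds as an SRT timestamp: HH:MM:SS,mmm
function toSrtTime(seconds: number): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return pad(h, 2) + ":" + pad(m, 2) + ":" + pad(s, 2) + "," + pad(ms, 3);
}

// Build an SRT document: numbered cues separated by blank lines,
// each cue prefixed with its "Speaker N" label.
function toSrt(segments: Segment[]): string {
  return segments
    .map((seg, i) =>
      (i + 1) + "\n" +
      toSrtTime(seg.start) + " --> " + toSrtTime(seg.end) + "\n" +
      "Speaker " + seg.speaker + ": " + seg.text + "\n")
    .join("\n");
}
```

For example, `toSrtTime(3661.5)` gives `01:01:01,500`, and a single segment from 0 s to 1.25 s spoken by speaker 1 becomes a cue numbered 1 with the line `Speaker 1: ...`.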

Common use cases

Labelling speakers in interviews
Podcast diarisation
Meeting transcripts by speaker
Two-person dialogue editing

Why it’s different

100% Private

Every model runs in your browser. Your files never leave your device — nothing is uploaded to a server.

Free Forever

No account, no watermark, no credits. Open the tool and use it.

Works Offline

Once the model has downloaded, it is cached, so the tool keeps working with no connection.
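
The download-once-then-cache behaviour can be sketched as a small "check the store first" loader. This is an illustration only: `ModelStore`, `Download`, `loadModel`, and `memoryStore` are hypothetical names, and the page does not say which browser storage the tool actually uses. In a browser, the store could plausibly be backed by the Cache API or IndexedDB.

```typescript
// Hypothetical storage interface. In a browser this could be backed by the
// Cache API or IndexedDB; it is abstract here so the sketch runs anywhere.
interface ModelStore {
  get(key: string): Promise<Uint8Array | undefined>;
  put(key: string, data: Uint8Array): Promise<void>;
}

// Hypothetical downloader type, for example a thin wrapper around fetch().
type Download = (url: string) => Promise<Uint8Array>;

// Return the model bytes from the store when present; otherwise download
// them once, save them, and return them.
async function loadModel(
  url: string,
  store: ModelStore,
  download: Download
): Promise<Uint8Array> {
  const cached = await store.get(url);
  if (cached !== undefined) {
    return cached; // cache hit: no network needed
  }
  const data = await download(url);
  await store.put(url, data);
  return data;
}

// Simple in-memory store, used only to demonstrate the flow.
function memoryStore(): ModelStore {
  const items: { [key: string]: Uint8Array } = {};
  return {
    get: (key) => Promise.resolve(items[key]),
    put: (key, data) => {
      items[key] = data;
      return Promise.resolve();
    },
  };
}
```

Calling `loadModel` twice with the same URL and store triggers the downloader only on the first call; the second call is served from the store, which is what lets a cached model keep working offline.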

FAQ

How many speakers?

It separates voices using voice activity detection (VAD) and speaker embeddings. It works best with a few distinct speakers.
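
As a rough illustration of the embeddings half of that pipeline, the sketch below groups per-segment embedding vectors into speakers with a greedy cosine-similarity rule. This is not the tool's documented algorithm: the greedy centroid approach, the `assignSpeakers` name, and the `0.7` threshold are all assumptions chosen to keep the example short. It also assumes VAD has already cut the audio into speech segments and a model has produced one vector per segment.

```typescript
// Cosine similarity between two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

// Greedy clustering: assign each embedding to the most similar existing
// speaker centroid if the similarity reaches `threshold`; otherwise start
// a new speaker. Returns 1-based labels ("Speaker 1", "Speaker 2", ...).
function assignSpeakers(embeddings: number[][], threshold: number = 0.7): number[] {
  const centroids: number[][] = [];
  const counts: number[] = [];
  const labels: number[] = [];
  for (const e of embeddings) {
    let best = -1;
    let bestSim = threshold;
    for (let c = 0; c < centroids.length; c++) {
      const sim = cosine(e, centroids[c]);
      if (sim >= bestSim) {
        best = c;
        bestSim = sim;
      }
    }
    if (best === -1) {
      // No centroid is close enough: treat this as a new speaker.
      centroids.push(e.slice());
      counts.push(1);
      best = centroids.length - 1;
    } else {
      // Update the running mean of the matched speaker's centroid.
      const n = counts[best];
      centroids[best] = centroids[best].map((v, i) => (v * n + e[i]) / (n + 1));
      counts[best] = n + 1;
    }
    labels.push(best + 1);
  }
  return labels;
}
```

With a handful of well-separated voices the vectors form tight groups and a rule like this labels them cleanly. When many speakers sound alike, their embeddings overlap, which is one intuition for why diarisation works best with a few distinct speakers.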

Is it free to use?

Yes — AI Speaker Diarisation is completely free. No account, no watermark, no credits, and no usage limits.

Do my files or prompts ever leave my device?

No. Everything runs locally in your browser via WebAssembly/WebGPU — there is no server that receives your files, prompts, or results.

Which browser and hardware do I need?

A modern browser. Chrome and Edge get WebGPU acceleration for the fastest results; Firefox and Safari run via WebAssembly. The model (~100 MB) downloads once, then is cached for offline use.
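
The choice between the two paths can be sketched as a simple feature check. The `pickBackend` function and the `Backend` type are hypothetical names for illustration, not the tool's actual code; the only browser fact relied on is that WebGPU-capable browsers expose a `gpu` property on `navigator`.

```typescript
type Backend = "webgpu" | "wasm";

// Accepts a navigator-like object so the sketch also compiles and runs
// outside a browser. In a page you would pass the global `navigator`.
function pickBackend(nav: { gpu?: unknown } | undefined): Backend {
  const hasGpu = nav !== undefined && nav.gpu !== undefined && nav.gpu !== null;
  return hasGpu ? "webgpu" : "wasm";
}
```

A fuller check would also await `navigator.gpu.requestAdapter()`, which can resolve to `null` when no suitable GPU is available, and fall back to WebAssembly in that case.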

Can I use the results commercially?

Yes. You own everything you create — NSS makes no claim to the images, videos, or text you process or export.

Does it work on mobile?

Lightweight tools run on phones; heavier models prefer a desktop with a GPU. The tool picks the best path for your device and falls back gracefully where needed.

Where can I see a step-by-step guide?

A full walkthrough is available at /how-it-works/ai-speakers.

Ready to try AI Speaker Diarisation?

Free, private, no signup — runs right in your browser.