AI Suite overview

90 client-side AI tools, organised into 5 clusters. How tiers work, why it's private, and which capability to reach for in which situation.

The AI Suite is a set of 90 capabilities that bring image and video editing power normally found in cloud apps directly into your browser. Captioning, restoration, generation, conversational edit, and video AI all run on your device. Nothing is transmitted to a server.

The five clusters

Understand (6 tools). Vision models describe what's in an image, answer questions about it (document QA), extract text via OCR, generate accessibility-grade alt text, audit against specs like the Amazon main-image requirements, and narrate the differences between two images. Powered by Xenova ViT-GPT2, Xenova TrOCR-base-printed, and Xenova Donut-base-cord-v2 running through Transformers.js with WebGPU acceleration.

Enhance (7 tools). Classical baselines that ship today with 0 MB download: face enhancement via CLAHE + bilateral + unsharp, bilateral denoise, direction-aware unsharp deblur, sepia + palette-bias colourise, relight to match a target scene or preset, HDR tone-mapping, and an auto-grade that interprets a vibe word ("cinematic", "moody noir", "Apple keynote") into a concrete adjustment recipe.
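To make the "unsharp" step concrete, here is a minimal sketch of a plain unsharp mask on a single-channel image. It is an illustration only, not the suite's actual code: the function name unsharpMask, the 3x3 box blur (standing in for a Gaussian), and the grayscale input are all assumptions, and the direction-aware variant described above is not shown.

```typescript
// Hypothetical helper: plain unsharp mask on a single-channel (grayscale) image.
function unsharpMask(
  src: Uint8ClampedArray,
  width: number,
  height: number,
  amount: number,
): Uint8ClampedArray {
  const out = new Uint8ClampedArray(src.length);
  // Clamp coordinates so border pixels reuse their nearest neighbour.
  const clamp = (v: number, max: number): number => Math.min(Math.max(v, 0), max);

  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      // 3x3 box blur as a cheap stand-in for a Gaussian blur.
      let sum = 0;
      for (let dy = -1; dy <= 1; dy++) {
        for (let dx = -1; dx <= 1; dx++) {
          const sx = clamp(x + dx, width - 1);
          const sy = clamp(y + dy, height - 1);
          sum += src[sy * width + sx];
        }
      }
      const blurred = sum / 9;
      const original = src[y * width + x];
      // Add back the detail the blur removed; Uint8ClampedArray clamps to 0..255.
      out[y * width + x] = original + amount * (original - blurred);
    }
  }
  return out;
}
```

A flat region passes through unchanged, while pixels on either side of an edge are pushed further apart, which is what reads as added sharpness.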

Generate (~15 tools). Procedural and classical generation surfaces: text-to-image via prompt-keyword palette composer, palette-transfer image-to-image, patch-match content-aware inpaint, mirror-reflect outpaint, edge / depth ControlNet, classical style transfer + cartoonify + photo-to-painting recipes, variations, logo generation, and the lifestyle-scene composer.
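As one illustration of the classical approach, mirror-reflect outpainting can be sketched as an index mapping: every pixel outside the original frame is read from its mirrored position inside it. The sketch below assumes a single-channel image and a uniform padding on all sides; the names reflectIndex, GrayImage, and mirrorOutpaint are hypothetical and not taken from the product.

```typescript
// Map any integer index onto [0, n) by reflecting at the borders
// (the edge pixel itself is repeated once, i.e. "symmetric" reflection).
function reflectIndex(i: number, n: number): number {
  const period = 2 * n;
  const m = ((i % period) + period) % period; // wrap into [0, 2n)
  return m < n ? m : period - 1 - m;
}

// Hypothetical single-channel image container.
interface GrayImage {
  data: Uint8ClampedArray;
  width: number;
  height: number;
}

// Extend the canvas by `pad` pixels on every side, filling with mirrored content.
function mirrorOutpaint(img: GrayImage, pad: number): GrayImage {
  const width = img.width + 2 * pad;
  const height = img.height + 2 * pad;
  const data = new Uint8ClampedArray(width * height);
  for (let y = 0; y < height; y++) {
    const sy = reflectIndex(y - pad, img.height);
    for (let x = 0; x < width; x++) {
      const sx = reflectIndex(x - pad, img.width);
      data[y * width + x] = img.data[sy * img.width + sx];
    }
  }
  return { data, width, height };
}
```

Because the fill is a pure copy of existing pixels, it needs no model download, but it cannot invent new content the way a generative outpaint would.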

Conversational Edit (4 tools). Type any editing instruction — "make it sunset", "remove the person standing behind", "prep this for Amazon" — and the assistant grounds the request against your current canvas using the Understand cluster's caption, then chains the right registry commands through the command executor.

Video AI (8 tools). Auto-subtitles via Whisper, silence-trimming via Silero VAD, scene detection, auto-highlights from any long clip, per-frame captioning, smart reframing (horizontal-to-vertical via subject tracking), per-frame style transfer with temporal smoothing, and audio denoising via RNNoise.

How the tiers work

Nothing downloads until you opt into a tier. Models are cached in your browser (Cache API) the first time you use them; the cache persists across sessions, so later runs skip the download. You can delete any cached model from the Model Manager (database icon, top-right).
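The cache-first pattern described here can be sketched as follows. This is an assumption about how such a loader might look, not the suite's actual implementation: loadModel, the structural CacheLike / ResponseLike types, and the cache name in the usage comment are all hypothetical. The types mirror the subset of the browser's Cache and Response interfaces that the sketch uses, so it does not depend on DOM typings.

```typescript
// Minimal structural types covering the parts of Response and Cache used below.
interface ResponseLike {
  ok: boolean;
  status: number;
  clone(): ResponseLike;
  arrayBuffer(): Promise<ArrayBuffer>;
}
interface CacheLike {
  match(url: string): Promise<ResponseLike | undefined>;
  put(url: string, response: ResponseLike): Promise<void>;
}
type Fetcher = (url: string) => Promise<ResponseLike>;

// Return the model bytes, downloading only when the cache has no entry.
async function loadModel(
  url: string,
  cache: CacheLike,
  fetcher: Fetcher,
): Promise<ArrayBuffer> {
  const cached = await cache.match(url);
  if (cached) return cached.arrayBuffer(); // cache hit: no network traffic

  const response = await fetcher(url);
  if (!response.ok) throw new Error(`Model download failed: ${response.status}`);

  // Store a clone, because a response body can only be read once.
  await cache.put(url, response.clone());
  return response.arrayBuffer();
}

// In a browser this might be wired up as (cache name is an assumption):
//   const cache = await caches.open("ai-suite-models");
//   const bytes = await loadModel(modelUrl, cache, (u) => fetch(u));
```

Passing the cache and fetcher in as parameters keeps the sketch testable outside a browser; the first call downloads and stores the model, and later calls are served from the cache.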

Tier | Download | Capabilities unlocked
Lite | None (0 MB) | Runs instantly on any device, even phones: classical denoise / deblur / colorize / face-cleanup / relight / HDR / auto-grade, the procedural text-to-image composer + inpaint + outpaint + cartoonify + photo-to-painting + sticker + logo, smart-crop, quality score, A/B variations, classical scene / highlight / reframe heuristics
Standard | ~400 MB (aggregate, downloaded per model on demand) | Real ML models that run offline after one download: background removal, CLIP tags / categorize / similarity, captioning (ViT-GPT2 / BLIP), OCR, document Q&A, depth estimation, Whisper subtitles, and pose / segmentation. Runs on most laptops; a GPU helps, with WASM fallback
Pro (WebGPU) | ~2 GB+ | GB-scale generative + restoration on a WebGPU GPU: SD-Turbo text-to-image and GFPGAN face restoration (managed ONNX, downloaded once), plus bring-your-own ONNX model URLs for custom SD / SDXL / restoration checkpoints via onnxruntime-web. Falls back to the classical path where a GPU isn't available
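The fallback rules in the table can be summarised in a few lines. The sketch below is an illustrative reading of the table, not the product's code: the Tier and Backend names and both functions are hypothetical. The only real browser API it touches is navigator.gpu.requestAdapter(), which resolves to null when no suitable adapter exists.

```typescript
type Tier = "lite" | "standard" | "pro";
type Backend = "classical" | "wasm" | "webgpu";

// Pick an execution backend following the fallbacks listed in the tier table.
function pickBackend(tier: Tier, hasWebGpu: boolean): Backend {
  if (tier === "lite") return "classical"; // no models, runs anywhere
  if (tier === "standard") return hasWebGpu ? "webgpu" : "wasm"; // GPU helps, WASM fallback
  return hasWebGpu ? "webgpu" : "classical"; // pro: classical path without a GPU
}

// Minimal structural type for the part of the WebGPU entry point used here.
interface GpuLike {
  requestAdapter(): Promise<unknown | null>;
}

// Feature-detect WebGPU; returns false outside a browser or when no adapter is found.
async function detectWebGpu(): Promise<boolean> {
  const nav = (globalThis as unknown as { navigator?: { gpu?: GpuLike } }).navigator;
  const gpu = nav?.gpu;
  if (!gpu) return false;
  try {
    return (await gpu.requestAdapter()) !== null;
  } catch {
    return false;
  }
}

// Possible usage: const backend = pickBackend("pro", await detectWebGpu());
```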

Use the Tier Manager (top-right of the AI Suite hub or the "Tiers" button in the AI Suite sidebar tab) to opt in. The manager requests persistent storage so the cache isn't evicted under browser-quota pressure.
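Requesting persistent storage goes through the browser's StorageManager, whose persisted() and persist() methods both resolve to a boolean. A minimal sketch, assuming a helper named ensurePersistentStorage (hypothetical, as is the structural StorageManagerLike type) that asks only when persistence has not already been granted:

```typescript
// Structural subset of the browser's StorageManager (navigator.storage).
interface StorageManagerLike {
  persisted(): Promise<boolean>;
  persist(): Promise<boolean>;
}

// Resolve to true when the origin's storage is (or becomes) persistent.
async function ensurePersistentStorage(
  storage: StorageManagerLike | undefined,
): Promise<boolean> {
  if (!storage) return false; // API unavailable in this environment
  if (await storage.persisted()) return true; // already granted, nothing to ask
  return storage.persist(); // the browser may grant or refuse
}

// In a browser this might be called as:
//   const persistent = await ensurePersistentStorage(navigator.storage);
```

A false result does not break caching; it only means the browser is still free to evict cached models when storage runs low.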

Where to access each capability

Every AI tool surfaces in three places:

  1. Standalone landing page at /ai/<slug> — a focused single-purpose surface for that tool, with preset chips for quick-start prompts.
  2. In-editor sidebar tab — the "AI Suite" accordion section in both the image and video editor properties panels. Cluster sub-sections expand to reveal capability cards with preset chips that invoke through the AI panel.
  3. Floating canvas toolbar — the AI Suite dropdown (✦), AI Describe (👁), AI Enhance (✨), AI Quick-Generate (🪄), and AI Smart-Crop / Smart-Reframe (✂ / ⤢) buttons surface the most-used capabilities one click away.

You can also invoke any AI tool by typing a natural-language prompt into the AI panel — the parser matches the prompt against the registry and routes to the right capability with the right parameters.
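One simple way such prompt routing could work is keyword overlap against a registry. The sketch below is a guess at the general shape, not the real parser: the Capability type, the sample registry entries and slugs, and the routePrompt function are all hypothetical, and a production parser would also need to extract parameters.

```typescript
// Hypothetical registry entry: a tool slug plus the words that should trigger it.
interface Capability {
  slug: string;
  keywords: string[];
}

// Illustrative entries only; the real registry and its slugs are not documented here.
const sampleRegistry: Capability[] = [
  { slug: "denoise", keywords: ["denoise", "noise", "grain"] },
  { slug: "ocr", keywords: ["ocr", "text", "read", "extract"] },
  { slug: "outpaint", keywords: ["outpaint", "extend", "expand"] },
];

// Route a prompt to the capability whose keywords overlap it the most.
function routePrompt(prompt: string, registry: Capability[]): Capability | undefined {
  const words = new Set(
    prompt.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0),
  );
  let best: Capability | undefined;
  let bestScore = 0;
  for (const capability of registry) {
    const score = capability.keywords.filter((k) => words.has(k)).length;
    if (score > bestScore) {
      best = capability;
      bestScore = score;
    }
  }
  return best; // undefined when no keyword matched
}
```

With the sample registry, "extract the text from this receipt" routes to the ocr entry, and a prompt with no matching keyword returns undefined so the caller can ask for clarification.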

Why client-side

Everything runs in your browser. There are no API keys, no environment variables, no paid integrations, and no server of ours ever receives your images or video. The same constraint that made the original BG remover private now covers every AI capability.

Trade-off: classical / procedural generation is fast (~200 ms per pass) but isn't photoreal — for "generate me a photo of X" you'd go to Midjourney or cloud Flux. We trade that ceiling for absolute privacy: your photos and footage never leave your device. The About page covers the architectural reasoning in more depth.
