AI Suite
AI Conversational cluster
Edit by natural-language prompt, click-to-replace objects, restore old photos, "looks like" editorial recipes. Vision-grounded LLM tool-calling.
The Conversational cluster is the "thin layer of intelligence over everything else" tier. It doesn't ship its own ML model — instead it combines vision (cluster 1) + the WebLLM tier + the full registry of existing commands into a natural-language editing experience.
When you type "make this look like sunset" into the AI panel, what happens is:
- The vision worker runs AI Describe on the current canvas to ground the LLM in what's actually visible.
- The visionContext helper builds a structured snapshot — dimensions, active filter, background type, layer count, detected subject category.
- The WebLLM model for your tier (Phi-3 mini, Llama 1B, or Qwen 0.5B) receives the user prompt, the grounded context, and the tool list.
- The LLM picks a sequence of registry commands and parameters.
- The command executor runs them in order, with each result flowing to the next step where applicable.
The whole chain runs in your browser. The pre-loaded LLM is what makes this feel responsive — there's no network round-trip per step.
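The last two steps above (the LLM picks commands, the executor runs them in order) can be sketched roughly as follows. This is a minimal illustration, not the editor's actual code: `EditorState`, `PlanStep`, `registry`, `runPlan`, and the two toy commands are hypothetical names, and the editor state is reduced to a plain record of numbers.

```typescript
// Hypothetical, simplified editor state: a bag of numeric adjustments.
type EditorState = Readonly<Record<string, number>>;

// A registry command takes the current state plus parameters and returns the next state.
type Command = (state: EditorState, params: Record<string, number>) => EditorState;

// One step of the plan the LLM returns: a command name and its parameters.
interface PlanStep {
  command: string;
  params: Record<string, number>;
}

// Toy registry with two illustrative commands (names are assumptions).
const registry: Record<string, Command> = {
  adjustTemperature: (s, p) => ({ ...s, temperature: (s.temperature ?? 0) + (p.delta ?? 0) }),
  adjustContrast: (s, p) => ({ ...s, contrast: (s.contrast ?? 0) + (p.delta ?? 0) }),
};

// Run the steps in order, feeding each result into the next step.
function runPlan(initial: EditorState, plan: PlanStep[]): EditorState {
  return plan.reduce((state, step) => {
    const command = registry[step.command];
    if (!command) {
      throw new Error(`Unknown command: ${step.command}`);
    }
    return command(state, step.params);
  }, initial);
}
```

Rejecting unknown command names before execution is a design choice in this sketch; it keeps a hallucinated tool name from silently doing nothing.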
The four tools
AI Edit by Prompt. The headline experience. Free-text editing instructions. Examples that work today:
- "Make it sunset" → temperature +30, contrast +15, filter Cinematic 60%.
- "Remove the person standing behind the subject" → vision identifies the person, R-SAM masks them, inpaint regenerates the area.
- "Prep this for Amazon" → cluster 1 audit against Amazon spec, then fix the failing checks one at a time (remove background, crop to fill frame, set white background, save as JPG).
- "Crop for Instagram square" → smart-crop to 1:1 keeping subject centred.
If your instruction is too vague ("make it better"), the LLM returns a clarifying question rather than guessing.
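One way to model the two outcomes described above (a sequence of tool calls, or a clarifying question) is a discriminated union. The sketch below is an assumption about shape only: `ToolCall`, `LlmResponse`, `summarize`, and the dotted command names are hypothetical. The numeric values in `sunsetPlan` are taken from the "Make it sunset" example.

```typescript
// Hypothetical shape of a single tool call chosen by the LLM.
interface ToolCall {
  command: string;
  params: Record<string, number | string>;
}

// The LLM either returns a plan to execute or asks the user for more detail.
type LlmResponse =
  | { kind: "plan"; steps: ToolCall[] }
  | { kind: "clarify"; question: string };

// "Make it sunset" expressed as a plan (command names are illustrative).
const sunsetPlan: LlmResponse = {
  kind: "plan",
  steps: [
    { command: "adjust.temperature", params: { delta: 30 } },
    { command: "adjust.contrast", params: { delta: 15 } },
    { command: "filter.apply", params: { name: "cinematic", strength: 60 } },
  ],
};

// Turn a response into the message the AI panel might show.
function summarize(response: LlmResponse): string {
  switch (response.kind) {
    case "plan":
      return `Running ${response.steps.length} step(s): ` +
        response.steps.map((s) => s.command).join(", ");
    case "clarify":
      return `Need more detail: ${response.question}`;
  }
}
```

With this shape, the caller cannot execute a plan without first checking `kind`, so a vague prompt such as "make it better" never reaches the command executor.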
AI Object Replace. Click an object, type what to replace it with. The three-step chain (vision identifies → R-SAM masks → inpaint regenerates) is composed into a single user-facing capability. Results land as new image layers so the original is preserved.
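The composition described above can be illustrated with a small sketch. Everything here is hypothetical: the `Layer` and `ReplaceStages` types, the `replaceObject` function, and the use of plain strings to stand in for pixel data and masks. The only behaviours taken from the text are the stage order and the fact that the result is added as a new layer.

```typescript
// Pixel data is reduced to a string tag purely for illustration.
interface Layer {
  name: string;
  pixels: string;
}

// The three stages are injected so the sketch stays independent of any real model.
interface ReplaceStages {
  identify: (click: { x: number; y: number }) => string; // vision: label of the clicked object
  mask: (label: string) => string;                       // segmentation: id of the object mask
  inpaint: (maskId: string, prompt: string) => string;   // regenerated pixels for the masked area
}

function replaceObject(
  layers: readonly Layer[],
  click: { x: number; y: number },
  prompt: string,
  stages: ReplaceStages,
): Layer[] {
  const label = stages.identify(click);
  const maskId = stages.mask(label);
  const pixels = stages.inpaint(maskId, prompt);
  // Append rather than overwrite, so the original layers are left untouched.
  return [...layers, { name: `replace:${label}`, pixels }];
}
```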
AI Restore Old Photo. A purpose-built recipe that chains denoise → face-restore → optional colorize → mild sharpen → light tone-map. The recipe tunes intermediate strengths based on what the vision model detects: heavy noise → stronger denoise; faded sepia → enable colorize; sharp scan → skip colorize. Five recipe levels: light, full, restore-and-colorize, family-portrait, document-scan.
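The tuning rules above can be sketched as a small function that turns detected signals into an ordered step list. This is an illustrative assumption, not the shipped recipe: `PhotoSignals`, `RestoreStep`, `buildRestoreRecipe`, and every strength value are placeholders chosen for the example.

```typescript
type RestoreOp = "denoise" | "face-restore" | "colorize" | "sharpen" | "tone-map";

interface RestoreStep {
  op: RestoreOp;
  strength: number; // 0..1, placeholder scale
}

// Hypothetical summary of what the vision model reported about the scan.
interface PhotoSignals {
  noise: "light" | "heavy";
  fadedSepia: boolean;
}

function buildRestoreRecipe(signals: PhotoSignals): RestoreStep[] {
  const steps: RestoreStep[] = [
    // Heavy noise gets a stronger denoise pass (values are placeholders).
    { op: "denoise", strength: signals.noise === "heavy" ? 0.8 : 0.4 },
    { op: "face-restore", strength: 0.6 },
  ];
  // Colorize is optional: enabled only when a faded sepia scan is detected.
  if (signals.fadedSepia) {
    steps.push({ op: "colorize", strength: 0.5 });
  }
  // Finish with a mild sharpen and a light tone-map.
  steps.push({ op: "sharpen", strength: 0.2 }, { op: "tone-map", strength: 0.2 });
  return steps;
}
```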
AI Looks Like. Curated editorial-style recipes that go beyond freeform Auto-Grade. Eight presets: apple-ad, vogue-cover, national-geographic, pinterest-aesthetic, wedding-magazine, travel-blog, food-magazine, tech-review. Each preset is a hand-tuned grade + filter + optional crop hint so the output is consistent rather than random.
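A closed set of preset names lends itself to a string-literal union, so that a name produced by the LLM can be validated before anything is applied. The sketch below uses the eight names listed above; the `LookPreset` field names, `isLookName`, and `resolveLook` are hypothetical, and no real preset values are shown.

```typescript
// The eight preset names listed above.
const LOOK_NAMES = [
  "apple-ad", "vogue-cover", "national-geographic", "pinterest-aesthetic",
  "wedding-magazine", "travel-blog", "food-magazine", "tech-review",
] as const;

type LookName = (typeof LOOK_NAMES)[number];

// Hypothetical shape of one preset: a grade, a filter, and an optional crop hint.
interface LookPreset {
  grade: { brightness: number; contrast: number; saturation: number; temperature: number };
  filter: { name: string; strength: number };
  cropHint?: { aspect: string };
}

// Type guard: narrows an arbitrary string to one of the curated names.
function isLookName(value: string): value is LookName {
  return (LOOK_NAMES as readonly string[]).indexOf(value) !== -1;
}

// Look up a preset by name, rejecting names outside the curated set.
function resolveLook(
  name: string,
  presets: Partial<Record<LookName, LookPreset>>,
): LookPreset {
  if (!isLookName(name)) {
    throw new Error(`Unknown look: ${name}`);
  }
  const preset = presets[name];
  if (!preset) {
    throw new Error(`Preset not defined: ${name}`);
  }
  return preset;
}
```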
Vision-context grounding
The conversational layer is only as good as its grounding. The
visionContext helper (src/lib/ai/visionContext.ts) builds a structured
snapshot every time the AI panel invokes a conversational command:
Editor: image
Canvas: 1920x1080
Has background mask: yes
Background: lifestyle
Filter: cinematic @ 60%
Adjust: b=110 c=120 s=95 t=20
Layers: 3 (active: image)
Vision: A young woman in a navy turtleneck stands in front of a window…
That block goes into the LLM's system prompt alongside the tool catalog. You can disable grounding (faster, less accurate) from the AI panel settings.
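A formatter that produces a block in the layout shown above might look like the following. This is a sketch only: the `VisionContext` field names and `formatVisionContext` are assumptions, not the contents of src/lib/ai/visionContext.ts, and omitting the `Vision:` line when grounding is disabled is a guess at one reasonable behaviour.

```typescript
// Hypothetical structure for the snapshot; field names are assumptions.
interface VisionContext {
  editor: string;
  canvas: { width: number; height: number };
  hasBackgroundMask: boolean;
  background: string;
  filter: { name: string; strength: number } | null;
  adjust: { b: number; c: number; s: number; t: number };
  layers: { count: number; active: string };
  vision?: string; // assumed to be absent when grounding is disabled
}

// Render the snapshot as the plain-text block that goes into the system prompt.
function formatVisionContext(ctx: VisionContext): string {
  const filter = ctx.filter ? `${ctx.filter.name} @ ${ctx.filter.strength}%` : "none";
  const lines = [
    `Editor: ${ctx.editor}`,
    `Canvas: ${ctx.canvas.width}x${ctx.canvas.height}`,
    `Has background mask: ${ctx.hasBackgroundMask ? "yes" : "no"}`,
    `Background: ${ctx.background}`,
    `Filter: ${filter}`,
    `Adjust: b=${ctx.adjust.b} c=${ctx.adjust.c} s=${ctx.adjust.s} t=${ctx.adjust.t}`,
    `Layers: ${ctx.layers.count} (active: ${ctx.layers.active})`,
  ];
  if (ctx.vision !== undefined) {
    lines.push(`Vision: ${ctx.vision}`);
  }
  return lines.join("\n");
}
```

Keeping the block as short labelled lines rather than JSON is cheap in tokens, which matters for the small models listed earlier.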
Cross-cutting platform features
The Conversational AS-phase also ships a set of background AI behaviours that improve the editor regardless of which tool you're using:
- AI command auto-complete — as you type in the AI panel, suggestions surface from the registry + tool guidance KB.
- AI undo prediction — when you undo, the AI suggests likely next operations based on the history.
- AI workflow recorder — record manual steps; offer to convert them into a saved recipe + share via collab session.
- AI batch suggester — when you do the same operation 3 times in a row, prompt "Apply to all 12 items in your queue?"
- AI accessibility audit — runs at export time and surfaces alt-text + contrast + content warnings before download.
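The batch suggester's trigger (the same operation three times in a row) is simple to express. The sketch below is an assumption about how such a check could work: `shouldSuggestBatch` and `batchPrompt` are hypothetical names, and operations are represented as plain strings.

```typescript
// True when the last `threshold` operations in the history are all the same.
function shouldSuggestBatch(history: readonly string[], threshold = 3): boolean {
  if (threshold < 1 || history.length < threshold) {
    return false;
  }
  const recent = history.slice(-threshold);
  return recent.every((op) => op === recent[0]);
}

// Build the prompt text shown to the user for a queue of the given size.
function batchPrompt(queueSize: number): string {
  return `Apply to all ${queueSize} items in your queue?`;
}
```

Only the tail of the history is inspected, so an unrelated operation in between resets the streak.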
Tier required
The Lite tier covers the entire cluster today. Object Replace uses the classical patch-match inpaint pipeline; Restore Photo chains the classical denoise, face-enhance, and colorize modules. For richer learned generation, the Pro tier is live: GFPGAN face restoration runs as a managed ONNX model on a WebGPU device, and you can bring your own ONNX model URL (e.g. an SD-based inpaint checkpoint) to run via onnxruntime-web. Without a GPU or the Pro opt-in, the classical path is used automatically.
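The fallback rule above can be written as a pure function. This is a sketch under stated assumptions: `TierOptions`, `Backend`, and `selectBackend` are hypothetical names, and how the editor actually detects WebGPU or stores the Pro opt-in is not described in this section.

```typescript
type Backend = "classical" | "managed-onnx" | "custom-onnx";

interface TierOptions {
  hasWebGPU: boolean;      // in a browser this could be derived from `"gpu" in navigator`
  proOptIn: boolean;       // user has opted in to the Pro tier
  customModelUrl?: string; // optional bring-your-own ONNX model URL
}

function selectBackend(options: TierOptions): Backend {
  // Without a GPU or the Pro opt-in, fall back to the classical path.
  if (!options.hasWebGPU || !options.proOptIn) {
    return "classical";
  }
  // A user-supplied model URL takes precedence over the managed model (an assumption).
  return options.customModelUrl ? "custom-onnx" : "managed-onnx";
}
```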