AI Suite

AI Conversational cluster

Edit by natural-language prompt, click-to-replace objects, restore old photos, "looks like" editorial recipes. Vision-grounded LLM tool-calling.

The Conversational cluster is the "thin layer of intelligence over everything else" tier. It doesn't ship its own ML model — instead it combines vision (cluster 1) + the WebLLM tier + the full registry of existing commands into a natural-language editing experience.

When you type "make this look like sunset" into the AI panel, the following happens:

  1. The vision worker runs AI Describe on the current canvas to ground the LLM in what's actually visible.
  2. The visionContext helper builds a structured snapshot — dimensions, active filter, background type, layer count, detected subject category.
  3. The WebLLM model (Phi-3 mini, Llama 1B, or Qwen 0.5B, depending on your tier) receives the user prompt, the grounded context, and the tool list.
  4. The LLM picks a sequence of registry commands and parameters.
  5. The command executor runs them in order, with each result flowing to the next step where applicable.

The whole chain runs in your browser. The pre-loaded LLM is what makes this feel responsive — there's no network round-trip per step.
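
The five steps above can be sketched as a small orchestration function. This is only an illustration under stated assumptions: the `Pipeline` interface, `runConversationalEdit`, and every field name are hypothetical and are not taken from the editor's source.

```typescript
// Hypothetical types: none of these names come from the editor's codebase.
interface ToolCall {
  command: string;                 // registry command id
  params: Record<string, unknown>; // parameters chosen by the LLM
}

interface Pipeline {
  describe(): Promise<string>;                         // step 1: AI Describe
  buildContext(description: string): Promise<string>;  // step 2: context snapshot
  plan(prompt: string, context: string, tools: string[]): Promise<ToolCall[]>; // steps 3-4
  execute(call: ToolCall, previous: unknown): Promise<unknown>;                // step 5
}

async function runConversationalEdit(
  pipeline: Pipeline,
  prompt: string,
  tools: string[],
): Promise<unknown[]> {
  const description = await pipeline.describe();
  const context = await pipeline.buildContext(description);
  const calls = await pipeline.plan(prompt, context, tools);

  const results: unknown[] = [];
  let previous: unknown = undefined;
  for (const call of calls) {
    // Run commands strictly in order; each one receives the previous result.
    previous = await pipeline.execute(call, previous);
    results.push(previous);
  }
  return results;
}
```

Passing the stages in as an interface keeps the sketch independent of how the vision worker and the LLM are actually hosted.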

The four tools

AI Edit by Prompt. The headline experience. Free-text editing instructions. Examples that work today:

  • "Make it sunset" → temperature +30, contrast +15, filter Cinematic 60%.
  • "Remove the person standing behind the subject" → vision identifies the person, R-SAM masks them, inpaint regenerates the area.
  • "Prep this for Amazon" → cluster 1 audits the image against the Amazon spec, then the failing checks are fixed one at a time (remove background, crop to fill frame, set white background, save as JPG).
  • "Crop for Instagram square" → smart-crop to 1:1 keeping subject centred.

If your instruction is too vague ("make it better"), the LLM returns a clarifying question rather than guessing.
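
One way to picture the "plan or clarifying question" behaviour is a discriminated union for the LLM reply. The sketch below is an assumption, not the editor's actual tool-calling format: `LlmReply`, `summarizeReply`, and the command ids are hypothetical, while the numeric values mirror the "Make it sunset" example above.

```typescript
// Hypothetical reply shape: the real tool-calling format is not documented here.
type LlmReply =
  | { kind: "plan"; steps: { command: string; params: Record<string, number | string> }[] }
  | { kind: "clarify"; question: string };

// The "Make it sunset" example expressed as a plan. Command ids are illustrative.
const sunsetPlan: LlmReply = {
  kind: "plan",
  steps: [
    { command: "adjust.temperature", params: { delta: 30 } },
    { command: "adjust.contrast", params: { delta: 15 } },
    { command: "filter.apply", params: { name: "Cinematic", strength: 60 } },
  ],
};

// Returns the text an AI panel could show for a reply.
function summarizeReply(reply: LlmReply): string {
  switch (reply.kind) {
    case "plan":
      return `Running ${reply.steps.length} step(s): ` +
        reply.steps.map((s) => s.command).join(" -> ");
    case "clarify":
      // A vague prompt yields a question instead of a guessed edit.
      return reply.question;
  }
}
```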

AI Object Replace. Click an object, type what to replace it with. The three-step chain (vision identifies → R-SAM masks → inpaint regenerates) is composed into a single user-facing capability. Results land as new image layers so the original is preserved.
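
The three-step chain and the "new layer" behaviour can be sketched as follows. Everything here is hypothetical: the `ReplaceStages` signatures, the `Layer` shape, and the layer id scheme are assumptions chosen for illustration.

```typescript
// Hypothetical stage signatures; the real worker APIs are not shown in this document.
interface Point { x: number; y: number }
interface Layer { id: string; kind: "image"; source: string }

interface ReplaceStages {
  identify(click: Point): Promise<string>;                  // vision: what was clicked
  mask(click: Point, label: string): Promise<string>;       // R-SAM: mask id for the object
  inpaint(maskId: string, prompt: string): Promise<string>; // regenerated pixels
}

async function replaceObject(
  layers: readonly Layer[],
  click: Point,
  prompt: string,
  stages: ReplaceStages,
): Promise<Layer[]> {
  const label = await stages.identify(click);
  const maskId = await stages.mask(click, label);
  const pixels = await stages.inpaint(maskId, prompt);
  // Append a new layer instead of overwriting, so the original stays intact.
  return [...layers, { id: `replace-${layers.length + 1}`, kind: "image", source: pixels }];
}
```

Returning a new array rather than mutating the input is one simple way to guarantee the original layer stack is preserved.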

AI Restore Old Photo. A purpose-built recipe that chains denoise → face-restore → optional colorize → mild sharpen → light tone-map. The recipe tunes intermediate strengths based on what the vision model detects: heavy noise → stronger denoise; faded sepia → enable colorize; sharp scan → skip colorize. Five recipe levels: light, full, restore-and-colorize, family-portrait, document-scan.
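
The tuning rules above can be sketched as a function from detected signals to an ordered step list. The signal fields, the strength numbers, and the choice that a sharp scan overrides faded sepia are all assumptions made for this illustration.

```typescript
// Hypothetical detection fields and strength values, for illustration only.
interface RestoreSignals {
  noise: "low" | "heavy";
  fadedSepia: boolean;
  sharpScan: boolean;
}

interface RestoreStep { op: string; strength: number }

function buildRestoreRecipe(signals: RestoreSignals): RestoreStep[] {
  const steps: RestoreStep[] = [
    // Heavy noise gets a stronger denoise pass.
    { op: "denoise", strength: signals.noise === "heavy" ? 0.8 : 0.3 },
    { op: "face-restore", strength: 0.5 },
  ];
  // Assumption: colorize only when the photo looks faded and is not a sharp scan.
  if (signals.fadedSepia && !signals.sharpScan) {
    steps.push({ op: "colorize", strength: 0.6 });
  }
  steps.push({ op: "sharpen", strength: 0.2 }, { op: "tone-map", strength: 0.2 });
  return steps;
}
```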

AI Looks Like. Curated editorial-style recipes that go beyond freeform Auto-Grade. Eight presets — apple-ad, vogue-cover, national-geographic, pinterest-aesthetic, wedding-magazine, travel-blog, food-magazine, tech-review. Each preset is a hand-tuned grade + filter + optional crop hint so the output is consistent rather than random.
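
A preset of this kind could be represented as plain data. In the sketch below the eight names come from the list above, but the `LooksLikePreset` shape, the filter names, and every numeric value are invented for illustration.

```typescript
// The eight preset names come from the list above; every value below is made up.
const PRESET_NAMES = [
  "apple-ad", "vogue-cover", "national-geographic", "pinterest-aesthetic",
  "wedding-magazine", "travel-blog", "food-magazine", "tech-review",
] as const;
type PresetName = (typeof PRESET_NAMES)[number];

interface LooksLikePreset {
  grade: { brightness: number; contrast: number; saturation: number };
  filter: { name: string; strength: number };
  cropHint?: string; // optional aspect ratio such as "4:5"
}

// Partial on purpose: only two illustrative entries are filled in.
const presets: Partial<Record<PresetName, LooksLikePreset>> = {
  "apple-ad": {
    grade: { brightness: 105, contrast: 110, saturation: 90 },
    filter: { name: "clean", strength: 40 },
  },
  "vogue-cover": {
    grade: { brightness: 100, contrast: 125, saturation: 95 },
    filter: { name: "editorial", strength: 55 },
    cropHint: "4:5",
  },
};

// Type guard for free-text input such as an LLM-chosen preset name.
function isPresetName(value: string): value is PresetName {
  return (PRESET_NAMES as readonly string[]).includes(value);
}
```

Because each preset is fixed data rather than a freeform generation, applying the same preset twice gives the same grade.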

Vision-context grounding

The conversational layer is only as good as its grounding. The visionContext helper (src/lib/ai/visionContext.ts) builds a structured snapshot every time the AI panel invokes a conversational command:

Editor: image
Canvas: 1920x1080
Has background mask: yes
Background: lifestyle
Filter: cinematic @ 60%
Adjust: b=110 c=120 s=95 t=20
Layers: 3 (active: image)
Vision: A young woman in a navy turtleneck stands in front of a window…

That block goes into the LLM's system prompt alongside the tool catalog. You can disable grounding (faster, less accurate) from the AI panel settings.
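
A formatter that produces a block in the shape shown above might look like this. It is a sketch only: the `EditorSnapshot` fields and the `formatVisionContext` name are hypothetical and do not describe the actual contents of visionContext.ts.

```typescript
// Hypothetical input shape; the actual types in visionContext.ts are not shown here.
interface EditorSnapshot {
  editor: string;
  width: number;
  height: number;
  hasBackgroundMask: boolean;
  background: string;
  filter: { name: string; strength: number };
  adjust: { b: number; c: number; s: number; t: number };
  layerCount: number;
  activeLayer: string;
  vision: string; // AI Describe output
}

// Renders one "Key: value" line per field, matching the sample block.
function formatVisionContext(s: EditorSnapshot): string {
  return [
    `Editor: ${s.editor}`,
    `Canvas: ${s.width}x${s.height}`,
    `Has background mask: ${s.hasBackgroundMask ? "yes" : "no"}`,
    `Background: ${s.background}`,
    `Filter: ${s.filter.name} @ ${s.filter.strength}%`,
    `Adjust: b=${s.adjust.b} c=${s.adjust.c} s=${s.adjust.s} t=${s.adjust.t}`,
    `Layers: ${s.layerCount} (active: ${s.activeLayer})`,
    `Vision: ${s.vision}`,
  ].join("\n");
}
```

A line-oriented text format like this is compact and easy for a small model to read inside a system prompt.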

Cross-cutting platform features

The Conversational AS-phase also ships a set of background AI behaviours that improve the editor regardless of which tool you're using:

  • AI command auto-complete: as you type in the AI panel, suggestions surface from the registry and the tool guidance KB.
  • AI undo prediction: when you undo, the AI suggests likely next operations based on the history.
  • AI workflow recorder: records your manual steps, then offers to convert them into a saved recipe and share it via a collab session.
  • AI batch suggester: when you do the same operation 3 times in a row, it prompts "Apply to all 12 items in your queue?"
  • AI accessibility audit: runs at export time and surfaces alt-text, contrast, and content warnings before download.
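
The batch suggester's trigger can be sketched as a check on the tail of the operation history. The function name, the string-based history, and the configurable threshold are assumptions for illustration; only the "3 times in a row" rule and the prompt wording come from the list above.

```typescript
// Returns a suggestion when the last `threshold` operations are identical.
// Hypothetical helper: the history is modelled as a list of operation ids.
function suggestBatch(
  history: readonly string[],
  queueSize: number,
  threshold = 3,
): string | null {
  if (history.length < threshold || queueSize === 0) return null;
  const recent = history.slice(-threshold);
  const repeated = recent.every((op) => op === recent[0]);
  return repeated ? `Apply to all ${queueSize} items in your queue?` : null;
}
```

Only consecutive repeats count here, so an interleaved history such as resize, crop, resize does not trigger the prompt.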

Tier required

The Lite tier covers the entire cluster today. Object Replace uses the classical patch-match inpaint pipeline; Restore Photo chains the classical denoise, face-enhance, and colorize modules. For richer learned generation, the Pro tier is live: GFPGAN face restoration runs as a managed ONNX model on a WebGPU device, and you can supply your own ONNX model URL (e.g. an SD-based inpaint checkpoint) to run via onnxruntime-web. Without a GPU or the Pro opt-in, the classical path is used automatically.
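
The fallback rule can be sketched as a pure function over capability flags. The `Capabilities` shape and the backend labels are hypothetical; in a browser, WebGPU availability is commonly detected through `navigator.gpu`, which is left out here so the sketch stays self-contained.

```typescript
// Hypothetical capability flags; how the editor actually detects them is not shown here.
interface Capabilities {
  hasWebGpu: boolean;
  proOptIn: boolean;
  customModelUrl?: string; // user-supplied ONNX model URL, if any
}

type Backend = "classical" | "managed-onnx" | "custom-onnx";

function selectBackend(caps: Capabilities): Backend {
  // Without a GPU or the Pro opt-in, fall back to the classical path.
  if (!caps.hasWebGpu || !caps.proOptIn) return "classical";
  // A user-supplied model URL takes precedence over the managed model (assumption).
  return caps.customModelUrl ? "custom-onnx" : "managed-onnx";
}
```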