AI Suite
AI Generate cluster
Text-to-image (classical composer on Lite, SD-Turbo diffusion on Pro), image-to-image, inpaint, outpaint, ControlNet (Sobel/depth), style transfer, variations — plus bring-your-own ONNX on Pro.
The Generate cluster covers the "create new pixels from a prompt" surface. Every tool except ControlNet Depth runs under the Lite tier with classical / procedural algorithms (no model download, no server round-trip); ControlNet Depth needs the Standard tier for its ~50 MB depth model. The headline generators additionally have a real learned path on the Pro tier.
Honest note: classical generation is not photoreal. The procedural text-to-image composer produces atmospheric scenes derived from the palette your prompt evokes — beautiful as backdrops and mood references, but it will not render "a golden retriever on a beach" the way Stable Diffusion does. For photoreal generation, opt into the Pro tier: on a WebGPU device, text-to-image runs SD-Turbo (a real diffusion model, ~1 GB, downloaded once and cached) entirely in your browser. You can also point the Pro tier at your own hosted ONNX model URL, which runs via onnxruntime-web. When no GPU / Pro opt-in is present, the tools fall back to the classical composer rather than failing.
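The fallback behaviour described above can be sketched as a small decision function. This is an illustration only: the type and function names are hypothetical, and the precedence given to a custom ONNX URL (and the assumption that it does not require WebGPU) is a guess rather than documented behaviour.

```typescript
// Hypothetical names throughout; none of these come from the product's code.
type GeneratePath = "sd-turbo" | "custom-onnx" | "classical";

interface TierState {
  hasWebGPU: boolean;      // e.g. the result of checking `"gpu" in navigator`
  proOptIn: boolean;       // the user has enabled the Pro tier
  customModelUrl?: string; // optional bring-your-own ONNX model URL
}

// Pick the text-to-image path. Anything short of a usable learned path
// falls back to the classical composer instead of failing.
function chooseTextToImagePath(state: TierState): GeneratePath {
  if (!state.proOptIn) return "classical";
  // ASSUMPTION: a user-supplied model takes precedence over SD-Turbo.
  if (state.customModelUrl) return "custom-onnx";
  if (state.hasWebGPU) return "sd-turbo";
  return "classical";
}
```

For example, a Pro user on a device without WebGPU and without a custom model URL would land on `"classical"` under these assumptions.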
The 15 tools
Headline generation (6).
- AI Scene Designer (text-to-image) — procedural palette + gradient composition seeded by your prompt. ~200 ms, 0 MB.
- AI Generate Background — scoped variant with a scene-quality wrapper around your prompt so the output composites cleanly behind cutouts.
- AI Image-to-Image — palette-transfer stylisation: bias an existing image's colours toward what your prompt evokes.
- AI Inpaint — patch-match content-aware fill in a masked region. Same classical algorithm Adobe shipped as Content-Aware Fill for 13 years.
- AI Outpaint — mirror-reflect canvas extension with edge feathering. Real and useful for modest extensions; seam-free at the join.
- AI Variations — N alternates via seed-permuted procedural composition. Optional filter axis (lighting, colour, composition).
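To make the mirror-reflect idea behind AI Outpaint concrete, here is a minimal sketch on a single-channel image. It shows only the reflection mapping; the edge feathering step is left out, and the function names and the flat `Float32Array` layout are assumptions for illustration, not the tool's actual implementation.

```typescript
// Map any integer coordinate onto [0, n) by reflecting at the borders,
// i.e. ... 1 0 | 0 1 ... n-1 | n-1 n-2 ... (symmetric padding).
function reflectIndex(i: number, n: number): number {
  const period = 2 * n;
  let m = i % period;
  if (m < 0) m += period; // JS remainder keeps the sign of the dividend
  return m < n ? m : period - 1 - m;
}

interface GrayImage {
  data: Float32Array; // row-major, one value per pixel
  width: number;
  height: number;
}

// Grow the canvas by `pad` pixels on every side, filling the new area
// with a mirror image of the content next to each edge.
function extendMirror(src: GrayImage, pad: number): GrayImage {
  const width = src.width + 2 * pad;
  const height = src.height + 2 * pad;
  const data = new Float32Array(width * height);
  for (let y = 0; y < height; y++) {
    const sy = reflectIndex(y - pad, src.height);
    for (let x = 0; x < width; x++) {
      const sx = reflectIndex(x - pad, src.width);
      data[y * width + x] = src.data[sy * src.width + sx];
    }
  }
  return { data, width, height };
}
```

Because the pixel just outside the border copies the pixel just inside it, the join has no intensity jump, which is consistent with the seam-free claim for modest extensions.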
Conditioned generation (4).
- AI Sketch-to-Image — Sobel edge detection on your sketch + palette-transfer stylisation at low strength to preserve composition.
- AI ControlNet Pose — strong-edge silhouette from your reference + palette-prompted backdrop. (True MediaPipe pose detection is planned alongside the rigs library.)
- AI ControlNet Depth — Xenova depth-anything (~50 MB) drives a depth-respecting palette colourisation.
- AI ControlNet Edge — Sobel edge map preserves composition while the palette completely changes the look.
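Several of the tools above rely on a Sobel edge map. As a rough sketch of what that step computes, the following function returns the gradient magnitude of a grayscale image using the standard 3x3 Sobel kernels. The function name, the flat `Float32Array` layout, and the clamp-at-border handling are assumptions for illustration.

```typescript
// Sobel gradient magnitude for a row-major grayscale image.
function sobelMagnitude(gray: Float32Array, width: number, height: number): Float32Array {
  const out = new Float32Array(width * height);
  // Clamp coordinates at the border so every pixel has a full 3x3 neighbourhood.
  const at = (x: number, y: number): number => {
    const cx = Math.min(width - 1, Math.max(0, x));
    const cy = Math.min(height - 1, Math.max(0, y));
    return gray[cy * width + cx];
  };
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      // Horizontal derivative: right column minus left column, weights 1-2-1.
      const gx =
        at(x + 1, y - 1) - at(x - 1, y - 1) +
        2 * (at(x + 1, y) - at(x - 1, y)) +
        at(x + 1, y + 1) - at(x - 1, y + 1);
      // Vertical derivative: bottom row minus top row, weights 1-2-1.
      const gy =
        at(x - 1, y + 1) - at(x - 1, y - 1) +
        2 * (at(x, y + 1) - at(x, y - 1)) +
        at(x + 1, y + 1) - at(x + 1, y - 1);
      out[y * width + x] = Math.hypot(gx, gy);
    }
  }
  return out;
}
```

Thresholding or normalising this magnitude map is one plausible way to obtain the edge guide that preserves composition while the palette changes.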
Style transfer + structured (5).
- AI Style Transfer — palette-transfer driven by your free-text style prompt. Fast (200–500 ms) and predictable.
- AI Cartoonify — bilateral smooth + posterise + edge overlay with four sub-variants (flat, comic, anime, pixar-like).
- AI Photo-to-Painting — five painting medium recipes (watercolor, oil, pencil, pastel, charcoal), each a different classical filter chain.
- AI Logo / Icon Generator — procedural composition + centred radial alpha mask + optional palette bias. Six style modes.
- AI Sticker Generator — turn any cutout into a Telegram / Discord / Slack-ready sticker with auto-padding, rounded corners, optional sheen.
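The posterise stage of the Cartoonify chain is the simplest of these filters to illustrate. The sketch below quantises each colour channel of an RGBA buffer to a fixed number of evenly spaced levels. The function name and the `levels` parameter are hypothetical, and the bilateral smoothing and edge overlay stages are not shown.

```typescript
// Quantise R, G and B to `levels` evenly spaced values; alpha is copied unchanged.
// The input uses the same RGBA byte layout as the canvas ImageData.data buffer.
function posterise(rgba: Uint8ClampedArray, levels: number): Uint8ClampedArray {
  if (!Number.isInteger(levels) || levels < 2) {
    throw new RangeError("levels must be an integer >= 2");
  }
  const out = new Uint8ClampedArray(rgba.length);
  const step = 255 / (levels - 1); // distance between adjacent output levels
  for (let i = 0; i < rgba.length; i += 4) {
    for (let c = 0; c < 3; c++) {
      // Snap the channel to the nearest level.
      out[i + c] = Math.round(Math.round(rgba[i + c] / step) * step);
    }
    out[i + 3] = rgba[i + 3];
  }
  return out;
}
```

With `levels = 2` every channel collapses to either 0 or 255; larger values give progressively softer banding, which is presumably how sub-variants such as flat and comic could differ.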
Tier required
The Lite tier covers every tool in the cluster except ControlNet Depth, with a 0 MB download. ControlNet Depth requires the Standard tier (~50 MB for depth-anything).
For learned generation, the Pro tier is live: on a WebGPU device, text-to-image runs SD-Turbo in-browser (~1 GB, cached after first download), and you can supply your own hosted ONNX model URL to run any compatible model via onnxruntime-web. The heaviest specialist tools (AnimateDiff, Wav2Lip-style lipsync) still surface a clear "needs a hosted model URL" message rather than pretending to work, so you always know whether you're getting the classical or the learned path. Manage Pro opt-in and downloaded models from the Model Manager (database icon, top-right).
What classical generation does well vs. doesn't
| Task | Classical handles it | Needs learned model |
|---|---|---|
| Atmospheric backdrop / mood scene | ✓ | — |
| Composite-friendly empty backdrop | ✓ | — |
| Palette-shift stylisation | ✓ | — |
| Cartoon / pencil / watercolor / oil / charcoal looks | ✓ | — |
| Inpainting small holes / object removal | ✓ | — |
| Mirror canvas extension | ✓ | — |
| Logo / icon / sticker post-process | ✓ | — |
| Photoreal "generate a dog on a beach" | — | ✓ (Pro tier) |
| Style transfer with reference image content preserved | partial | ✓ (Pro tier) |
| Coherent video frame-by-frame style | — | ✓ (Pro tier) |
What stays out of scope
The plan's generation matrix has honest exclusions even at the Pro tier:
- Stable Video Diffusion (SVD) image-to-video — model size + VRAM
- CogVideoX text-to-video — decoder isn't WASM-friendly
- True video-to-video coherent restyle at > 4 fps — per-frame inference is prohibitive
- 3D model generation from a single image — TripoSR / LRM not WebGPU-compatible
These appear in the FAQ as "needs cloud GPU compute we don't operate" and would require a server-side path that violates our free-forever / no-envs constraint.