AI Suite

AI Generate cluster

Text-to-image (classical composer on Lite, SD-Turbo diffusion on Pro), image-to-image, inpaint, outpaint, ControlNet (Sobel/depth), style transfer, variations — plus bring-your-own ONNX on Pro.

The Generate cluster covers the "create new pixels from a prompt" surface. Every tool except ControlNet Depth runs under the Lite tier with classical / procedural algorithms, with no model download and no server round-trip. ControlNet Depth needs the Standard tier for its ~50 MB depth model. The headline generators additionally have a real learned path on the Pro tier.

Honest note: classical generation is not photoreal. The procedural text-to-image composer produces atmospheric scenes derived from the palette your prompt evokes. They work well as backdrops and mood references, but the composer will not render "a golden retriever on a beach" the way Stable Diffusion does. For photoreal generation, opt into the Pro tier: on a WebGPU device, text-to-image runs SD-Turbo (a real diffusion model, ~1 GB, downloaded once and cached) entirely in your browser. You can also point the Pro tier at your own hosted ONNX model URL, which runs via onnxruntime-web. Without a WebGPU device or the Pro opt-in, the tools fall back to the classical composer rather than failing.
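
The fallback described above can be sketched as a small decision function. This is an illustration only, not the product's code: the Capabilities record, the function name, and the returned labels are hypothetical, and the priority given to a user-supplied ONNX URL over SD-Turbo is an assumption that the text above does not confirm.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Capabilities:
    """Hypothetical snapshot of what the current session can do."""
    has_webgpu: bool                       # device exposes a usable WebGPU adapter
    pro_opt_in: bool                       # user has enabled the Pro tier
    custom_onnx_url: Optional[str] = None  # user-supplied hosted ONNX model, if any


def choose_text_to_image_path(caps: Capabilities) -> str:
    """Pick a generator, preferring a learned path and never raising."""
    # Assumption: a bring-your-own model takes priority when one is configured.
    if caps.pro_opt_in and caps.custom_onnx_url:
        return "custom-onnx"
    # SD-Turbo needs both the Pro opt-in and a WebGPU device.
    if caps.pro_opt_in and caps.has_webgpu:
        return "sd-turbo"
    # Everything else falls back to the classical composer instead of failing.
    return "classical"
```

For example, a session with the Pro opt-in but no WebGPU device resolves to "classical", which matches the "fall back rather than fail" behaviour.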

The 15 tools

Headline generation (6).

  • AI Scene Designer (text-to-image) — procedural palette + gradient composition seeded by your prompt. ~200 ms, 0 MB.
  • AI Generate Background — scoped variant with a scene-quality wrapper around your prompt so the output composites cleanly behind cutouts.
  • AI Image-to-Image — palette-transfer stylisation: bias an existing image's colours toward what your prompt evokes.
  • AI Inpaint — patch-match content-aware fill in a masked region. Same classical algorithm Adobe shipped as Content-Aware Fill for 13 years.
  • AI Outpaint — mirror-reflect canvas extension with edge feathering. Real and useful for modest extensions; seam-free at the join.
  • AI Variations — N alternates via seed-permuted procedural composition. Optional filter axis (lighting, colour, composition).
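
To make the mirror-reflect idea behind AI Outpaint concrete, here is a minimal sketch of the reflection step on a greyscale grid stored as a list of rows. It is an assumption-laden illustration rather than the tool's implementation: the function names are hypothetical, the edge pixel is repeated at the join, and the edge feathering mentioned above is left out.

```python
def reflect_index(i: int, n: int) -> int:
    """Map any integer index onto 0..n-1 by mirroring at the borders."""
    period = 2 * n
    i %= period  # Python's % is non-negative for a positive modulus
    return i if i < n else period - 1 - i


def mirror_extend(image, pad):
    """Grow a 2D grid by `pad` pixels on every side using mirrored content."""
    h, w = len(image), len(image[0])
    return [
        [image[reflect_index(y, h)][reflect_index(x, w)]
         for x in range(-pad, w + pad)]
        for y in range(-pad, h + pad)
    ]


extended = mirror_extend([[1, 2, 3], [4, 5, 6]], pad=1)
```

Because the new pixels are copies of their mirror partners, the values on both sides of the join are identical, which is why the extension has no visible seam. It also explains why the approach suits modest extensions: larger pads simply repeat more of the original content in reverse.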

Conditioned generation (4).

  • AI Sketch-to-Image — Sobel edge detection on your sketch + palette-transfer stylisation at low strength to preserve composition.
  • AI ControlNet Pose — strong-edge silhouette from your reference + palette-prompted backdrop. (True MediaPipe pose detection is planned alongside the rigs library.)
  • AI ControlNet Depth — Xenova depth-anything (~50 MB) drives a depth-respecting palette colourisation.
  • AI ControlNet Edge — Sobel edge map preserves composition while the palette completely changes the look.
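
Several of the tools above rely on a Sobel edge map. The sketch below shows the standard Sobel gradient magnitude on a greyscale grid in plain Python. The function name and the choice to clamp coordinates at the image border are assumptions made for this example, not details confirmed for the tools themselves.

```python
import math

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_magnitude(gray):
    """Return the gradient magnitude for each pixel of a 2D greyscale grid."""
    h, w = len(gray), len(gray[0])

    def px(y, x):
        # Clamp coordinates so border pixels reuse their nearest neighbour.
        return gray[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

    out = []
    for y in range(h):
        row = []
        for x in range(w):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    v = px(y + dy - 1, x + dx - 1)
                    gx += SOBEL_X[dy][dx] * v
                    gy += SOBEL_Y[dy][dx] * v
            row.append(math.hypot(gx, gy))
        out.append(row)
    return out


# A vertical step edge: dark on the left, bright on the right.
edges = sobel_magnitude([[0, 0, 1, 1]] * 3)
```

The response is strongest on the two columns that straddle the step and zero in the flat regions, so the map keeps the composition's outlines while discarding colour. That is what lets a palette change the look without moving the shapes.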

Style transfer + structured (5).

  • AI Style Transfer — palette-transfer driven by your free-text style prompt. Fast (200–500 ms) and predictable.
  • AI Cartoonify — bilateral smooth + posterise + edge overlay with four sub-variants (flat, comic, anime, pixar-like).
  • AI Photo-to-Painting — five painting medium recipes (watercolor, oil, pencil, pastel, charcoal), each a different classical filter chain.
  • AI Logo / Icon Generator — procedural composition + centred radial alpha mask + optional palette bias. Six style modes.
  • AI Sticker Generator — turn any cutout into a Telegram / Discord / Slack-ready sticker with auto-padding, rounded corners, optional sheen.
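
As an illustration of one stage in the Cartoonify chain, the sketch below posterises 8-bit colour values by snapping each channel to a small number of evenly spaced tones. Only the posterise step is shown. The bilateral smoothing and edge overlay are omitted, and the function names and the `levels` parameter are hypothetical rather than taken from the tool.

```python
def posterise(value: int, levels: int) -> int:
    """Snap an 8-bit channel value (0-255) to one of `levels` evenly spaced tones."""
    if levels < 2:
        raise ValueError("levels must be at least 2")
    step = 255 / (levels - 1)  # distance between neighbouring tones
    return round(round(value / step) * step)


def posterise_pixel(rgb, levels):
    """Apply the same quantisation to each channel of an (r, g, b) tuple."""
    return tuple(posterise(c, levels) for c in rgb)


flat_tone = posterise_pixel((100, 130, 255), levels=4)
```

With four levels the available tones are 0, 85, 170 and 255, so smooth gradients collapse into flat bands. That banding is what gives cartoon-style output its simplified colour regions.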

Tier required

The Lite tier covers every tool in the cluster except ControlNet Depth, at 0 MB download. ControlNet Depth needs the Standard tier (~50 MB for depth-anything).

For learned generation, the Pro tier is live: on a WebGPU device, text-to-image runs SD-Turbo in-browser (~1 GB, cached after first download), and you can supply your own hosted ONNX model URL to run any compatible model via onnxruntime-web. The heaviest specialist tools (AnimateDiff, Wav2Lip-style lipsync) still surface a clear "needs a hosted model URL" message rather than pretending to work, so you always know whether you're getting the classical or the learned path. Manage Pro opt-in and downloaded models from the Model Manager (database icon, top-right).

What classical generation does well vs. doesn't

Task                                                   | Classical handles it | Needs learned model
Atmospheric backdrop / mood scene                      | ✓                    |
Composite-friendly empty backdrop                      | ✓                    |
Palette-shift stylisation                              | ✓                    |
Cartoon / pencil / watercolor / oil / charcoal looks   | ✓                    |
Inpainting small holes / object removal                | ✓                    |
Mirror canvas extension                                | ✓                    |
Logo / icon / sticker post-process                     | ✓                    |
Photoreal "generate a dog on a beach"                  |                      | ✓ (Pro tier)
Style transfer with reference image content preserved  | partial              | ✓ (Pro tier)
Coherent video frame-by-frame style                    |                      | ✓ (Pro tier)

What stays out of scope

The plan's generation matrix has honest exclusions even at the Pro tier:

  • Stable Video Diffusion (SVD) image-to-video — model size + VRAM
  • CogVideoX text-to-video — decoder isn't WASM-friendly
  • True video-to-video coherent restyle at > 4 fps — per-frame inference is prohibitive
  • 3D model generation from a single image — TripoSR / LRM not WebGPU-compatible

These appear in the FAQ as "needs cloud GPU compute we don't operate" and would require a server-side path that violates our free-forever / no-envs constraint.