The shift from prompting to agentic workflows
→ Four tools that now plan the work for you + a first look at a brand new image model
Welcome back to Visually AI!
Today’s reading time is 6 minutes.
In today’s edition:
Agent workflows on Runway, Gemini Omni, MiniMax Hub & Lovart
A first look at Krea 2
Image & Video Prompts
Agent workflows quietly replacing the prompt box
If you have been keeping up with recent releases, you will have noticed multiple platforms shifting toward agent workflows.
They stop asking you to write prompts. Instead, you describe the outcome you want, and an agent plans the steps, picks the right models, and assembles the asset.
Of course, you can still write your own prompts and add references where needed (which I encourage), but it is an interesting recent development in the world of AI image and video.
Here are some of the platforms and how they have implemented it:
Runway Agent
Runway’s headline launch of the year. It sits on top of the full Runway stack (Gen-4.5, Aleph, Act-Two, the audio apps) and turns the whole thing into a conversation. You describe a multi-scene video in one message. The agent proposes the concept, locks visual direction, generates the shots, picks the right model for each scene, lays in all audio, and hands you the final cut.
The workflow:
- Upload your assets
- Describe the type of video you want
- Choose aspect ratio, duration, audio preference, resolution
If you don’t provide assets, the agent generates additional ones that fit your theme. You can make edits throughout the chat and after generation, including timeline edits, so you’re not locked to the first cut.
Pair it with Ad Concepter (the March launch) when you need 3–5 ad directions before committing to a full build.
Example result:
Gemini Omni
On May 19 at Google I/O, DeepMind released Gemini Omni — their first step toward a model that can create anything from any input, starting with video.
What you can do:
Define a character once. Place them in any scene and they stay consistent across locations, actions and lighting.
Apply styles, motion, or effects by feeding in reference images, or just by describing them in natural language.
Reimagine your own footage. Take a video you shot and ask Omni to change the environment, add new objects, or transform the action entirely.
The first model in the family, Gemini Omni Flash, is live now in the Gemini app, Google Flow, and YouTube Shorts. API access rolls out in the coming weeks.
MiniMax Hub
The Hub is a native desktop app (Mac and Windows) that brings everything you’d usually do across five browser tabs into one window. Image gen, video gen, scripts, voiceover and edits all live inside a single creative agent that learns your process. Local files stay on your machine.
The interface gives you two views of the same project, and you can toggle between them:
- Free Canvas — drag, arrange, iterate visually. The mental space for exploration.
- Workflow — the agent auto-connects nodes by reference, so every asset traces back to its source. The mental space for production.
Another bonus is parallel sessions across tabs. Run a video gen in one tab. Draft a script in another. Reference files across both. The agent doesn’t lose context on either, and you can see how many tasks are running in the top right of the window.
See the agent in action:
Lovart
Lovart opened to everyone in April. A proprietary engine routes across 20+ image and video models: Nano Banana Pro for character consistency and precise edits, Seedream 5 Lite for fast base generation, Flux 2 for artistic and physical-realism styles, GPT-Image-2 for text-heavy work, Veo 3.1 and Seedance 2.0 for video. You never pick (unless you want to). The agent picks.
The workflow runs on an infinite ChatCanvas in three modes: Talk (describe the intent), Tab (browse the variant grid), and Tune (spot edits: text as layers, Touch Edit for hands and object removal).
The new PDF ingest means you can drop in a brand guidelines doc and it’ll pull palette, fonts and tone automatically. Particularly useful for sponsor work where the brief lands as a deck. Lovart also has built-in skills for you to choose from.
Here is a result from a Lovart agent workflow for a product ad:
Krea 2 has launched
On May 12, Krea shipped Krea 2, their first foundation image model built in-house. They were clear: no baked-in aesthetic.
Midjourney, Flux, Seedream: every major model has an opinion on what “good” looks like. Krea 2 refuses one. The headline feature is Moodboards: upload your reference images and Krea 2 blends both their style and their underlying concepts into your generation.
The workflow inverts: prompts get shorter, references do the work. Build a moodboard for palette, one for lighting, one for composition, one for subject, then write a short prompt and let the boards bias the model. Moodboards are shareable via public link, which makes them properly collaborative.
Style transfer is also built into the model. Have a style you like but can’t quite name? Krea 2 has you covered: just drag and drop.
In Contra Labs’ style-fidelity benchmark, Krea 2 Large landed within 0.14 points of GPT Image 2.
🖼️ Image Prompts
Prompt: Human figure dissolving into vertical pixel sort columns, neon magenta and cyan, CRT scanlines, chromatic aberration, dark void background
Prompt: Stocky alien mechanic character, full body, vibrant cel-shaded illustration, clean line art, comic book style, flat colours, stark background
🎥 Video Prompts
Prompt: Photorealistic underwater sci-fi archaeology film set entirely inside a single colossal submerged temple ruin beneath Veridion. The environment is an enormous flooded sanctuary with towering gold-and-stone pillars, coral-covered statues, bioluminescent algae, drifting sediment, schools of fish, and soft cyan caustic light rays descending from the distant ocean surface above. The entire sequence remains in this one connected location for stronger spatial continuity and immersion.
Main character: Nala, a laced human marine biologist and underwater archaeologist. She is visibly non-human. Distinct adaptive gill structures along both sides of her neck rhythmically open and close while breathing underwater. Thin glowing biomechanical lines run beneath her skin along her jawline, neck, and spine. Subtle illuminated circuitry pulses softly under the skin when she scans the ruins or approaches the relic. No scuba tank. She breathes naturally underwater through her engineered gills. She wears a sleek dark blue aquatic exploration suit with lightweight utility harness, compact tools, integrated tech nodes, and small bioluminescent accents.
00:00–00:03 — Wide cinematic reveal of the submerged temple sanctuary. Nala slowly swims between gigantic pillars and ancient statues while schools of fish pass through the structure. Camera drifts behind her as sunlight beams ripple through the water. Her gills visibly expand and contract while breathing underwater.
00:03–00:06 — Medium tracking shot as Nala studies glowing inscriptions on a massive pillar using a holographic scanner. Close-up moments reveal translucent biomechanical lines beneath the skin of her neck and face softly pulsing cyan. Tiny bubbles escape naturally from her gill structures while suspended particles drift through the water.
00:06–00:09 — Nala enters the temple’s central chamber. The architecture grows more elaborate with enormous statues, coral growth, and glowing algae woven into the ruins. Ancient mechanisms faintly activate as she approaches the center. Cyan energy spreads across the floor and columns.
00:09–00:12 — In the heart of the sanctuary, Nala discovers a glowing relic resting inside a circular pedestal surrounded by statues. As she reaches toward it, her laced circuitry brightens beneath her skin, synchronizing with the relic’s energy pulses. Sediment rises from the floor and fish scatter through shafts of light.
00:12–00:15 — Close-up of Nala holding the relic underwater. The glow illuminates her face, revealing the full detail of her adaptive gills and integrated laced features. Ancient bioluminescent energy flows through the surrounding architecture as the temple slowly awakens. Final slow pullback reveals Nala suspended in the vast flooded sanctuary surrounded by reactivated ruins.
Natural underwater motion with realistic buoyancy, drag, drifting hair movement, suspended particles, refracted light, and cinematic volumetric caustics throughout. Tone is mysterious, grounded, ancient, and hopeful. Avoid fantasy armor, dry interiors, or human scuba-diver behavior. Nala should feel evolved specifically for underwater life.
Image Reference:
Result:
🎨 Explore my Portfolio
I’ve had some amazing opportunities to work on a variety of projects recently, and I don’t share everything in this newsletter or online. And you know I love testing tools to share.
Take a look at my portfolio to get a quick glimpse of my work:
🚀 My Recent Top AI Tools Picks
Krea 2 - for moodboard-led image generation with built-in style transfer
Runway Agent - for a one-prompt multi-scene video
MiniMax Hub - for desktop-friendly agentic workflows using multiple models
Thank you for reading.
I hope you have a creative week!
Heather Cooper







