AI is Evolving Fast – The Latest LLMs, Video Models & Breakthrough Tools
Breakthroughs in multimodal search, next-gen coding assistants, and stunning text-to-video tech. Here’s what’s new:
Welcome to Visually AI!
The lightspeed pace of generative AI development has been overwhelming for the last couple of years, but the past two weeks made me feel like I was drowning and struggling to keep up!
I have been traveling and working full-time on an incredible project (I hope to tell you more soon!), so I will try to share the news I’ve been able to keep up with :)
LLM News
Grok 3 Officially Launched
Grok 3 officially launched on February 17, 2025, after delays from its anticipated 2024 debut.
You don’t have to use it on 𝕏 - there is an iOS app here, with an Android version coming soon.
New Features
DeepSearch: Real-time web & 𝕏 (Twitter) search for clearer, up-to-date answers.
Think Mode: Step-by-step problem-solving for complex queries.
Multimodal Abilities: Can analyze images; image generation (Aurora) is coming soon.
I have used Grok 3’s DeepSearch feature for various questions, including 3D mesh tool functionality and generative AI object removal in videos. I found it thorough, and I appreciate the ability to see the model’s thought process.
Here’s an example with my question:
“What is the average length of time for films to be released on streaming platforms after they premiere in theaters?”
Claude 3.7 Sonnet & Claude Code
Claude 3.7 Sonnet, Anthropic’s latest AI model, just dropped this week, and it’s a big deal: its new “extended thinking” mode tackles tough tasks like coding and math step by step. People have already been pairing it, along with the new experimental Claude Code, with AI coding tools like Cursor and Bolt for coding, debugging, and full-stack app development.
Claude 3.7 Sonnet is available now on Claude.ai.
Sign up for the Claude Code waitlist here.
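If you want to poke at extended thinking outside the Claude.ai UI, here’s a minimal sketch using Anthropic’s Python SDK. The model ID and token budgets below are examples, so double-check Anthropic’s docs for current values before running:

```python
# Minimal sketch: calling Claude 3.7 Sonnet with "extended thinking" enabled.
# Model ID and token budgets are examples - check Anthropic's docs for current values.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=8192,
    # budget_tokens caps how much the model may "think" before answering
    # (it must be lower than max_tokens).
    thinking={"type": "enabled", "budget_tokens": 4096},
    messages=[
        {"role": "user", "content": "Find the bug in this function and fix it: ..."}
    ],
)

# The response interleaves thinking blocks with the final text answer.
for block in response.content:
    if block.type == "thinking":
        print("[reasoning]", block.thinking)
    elif block.type == "text":
        print("[answer]", block.text)
```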
OpenAI GPT-4.5
OpenAI’s GPT-4.5 hit the scene a few days ago as a research preview, bringing a beefier knowledge base and smoother chats for tasks like writing and coding.
It is available first to ChatGPT Pro users, with a rollout to other tiers soon after. It features real-time search and file uploads, but don’t expect voice or video support just yet.
Alibaba Wan-2.1 video goes open source
The latest Chinese video model is incredible: complex motion, strong prompt adherence, and amazing video quality, with both text-to-video and image-to-video capabilities.
It’s now open source, with native support in ComfyUI and availability on developer platforms such as Hugging Face, Replicate, and Fal.
You can also try the model quickly on the Alibaba Cloud Tongyi platform (with a Chinese phone number), as well as on Freepik, Nim, Krea, and Imagine.
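Since the weights are open and the model is listed on developer platforms, you can script generations too. Here’s a rough sketch with Replicate’s Python client; the model slug and input fields are assumptions on my part, so check the actual Wan 2.1 listing on Replicate for the real schema:

```python
# Rough sketch: Wan 2.1 text-to-video via Replicate's Python client.
# The model slug and input fields are assumptions - check the real listing
# on replicate.com for the exact names before running.
import replicate  # needs REPLICATE_API_TOKEN in the environment

output = replicate.run(
    "wan-video/wan-2.1-t2v",  # hypothetical slug; substitute the actual one
    input={
        "prompt": "A paper boat drifting across a rain-soaked city street at dusk",
    },
)

# Replicate returns a URL (or file handle) for the generated clip.
print(output)
```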
I’ll write more about Wan 2.1 soon, but here’s a great example of its text-to-video results - upscaled from 480p, with sound added on Nim:
📽️ Video Model Comparison
I do these comparisons frequently to measure the improvements across different models on the same text-to-video or image-to-video prompts. I hope it is helpful for you as well!
I included six models in this image-to-video comparison:
Pika 2.1 (I will do one with Pika’s new 2.2 model soon)
Adobe Firefly Video
Runway Gen-3
Kling 1.6
Luma Ray2
Hailuo I2V-01
This time I used an image generated with Magnific’s new Fluid model (Google DeepMind’s Imagen + Mystic 2.5), ran the same prompt twice in each model, and chose my favorites below:
"Low-angle dolly shot along the temple's edge, capturing the contrast between classical architecture and wild nature. Orange-tinted waves crash against rocks below as crimson clouds streak across the sky. Shot on anamorphic lenses, ultra-detailed textures, dramatic atmospheric lighting."
📸 AI Snapshots
Google announced that Gemini Code Assist is available for free to developers worldwide with a personal Gmail account. It includes the highest available usage limits and code review assistance - install it in VS Code, GitHub, or JetBrains IDEs.
Pika introduced Pikaswaps, which lets you easily change areas of a video by painting or describing the area you want to change and giving a text prompt for the replacement. You can also upload an image as a specific object reference. I’ve had great results describing both the area and the replacement with text. Here’s an example where I described the area to change as “yellow background” and the replacement as “a beach at sunset visible in the background”:
Now you can use an image reference in Runway Frames. This feature is useful because it generates a detailed prompt from your image, and you can capture different styles by using Frames style presets and adjusting the Aesthetic Range for more variety in your results. Here is a quick example using my Midjourney image in Frames:
I was honored to be included in a recent showcase from Kling AI with several other Creator Partners:
I2V-01 Director Mode is now available to everyone on Hailuo AI. It gives you precise camera control over image-to-video generations, using a range of motion presets or your own prompt descriptions. Here is my example with a Midjourney image and the prompt: "[Push in] on the fierce samurai [Shake]"
🌎 AI Developments All Over The Globe
Google Veo 2 Integration in YouTube Shorts: Google rolled out DeepMind’s Veo 2, a state-of-the-art video generation model, to YouTube Shorts creators in select countries (U.S., Canada, Australia, New Zealand). It generates 1080p cinematic clips from text prompts, with enhanced realism and SynthID watermarking, outpacing competitors like Sora in some human evaluations.
ByteDance’s OmniHuman-1 and Phantom Unveiled: ByteDance introduced OmniHuman-1, an AI model that creates lifelike videos from a single photo, trained on 18,700+ hours of video data, excelling in motion and audio sync. Alongside it, their "Phantom" system ensures subject consistency across video clips, sparking both excitement and deepfake concerns.
UK AI Regulation on Child Abuse Imagery: The UK Home Office introduced laws banning possession or distribution of AI tools generating child sexual abuse imagery, a global first. Triggered by a quadrupling of cases in 2024, it underscores the urgency of tackling generative AI’s dark side.
🚀 My Recent Top AI Tools Picks
a0.dev: Create mobile apps with React in minutes by chatting with the AI builder.
Demo: Zero-Shot Depth Estimation with DPT + 3D Point Cloud: This Hugging Face demo is a variation of the original DPT demo. It uses the DPT model to predict an image’s depth and then converts the depth map into a 3D point cloud (see the sketch after this list).
Imagine: An AI platform with Kling, Hailuo, Runway, Alibaba’s Tongyi, and Hunyuan Video - plus model training and more.
Nim: A full AI image and video platform with tools I’ve been using more and more, like MMAudio sound, video upscaling, lip sync, video inpainting, and others.
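For a sense of how that depth demo works under the hood, here’s a rough sketch of both steps with the transformers library. The model checkpoint and the pinhole-camera intrinsics are my own illustrative assumptions, not necessarily what the demo uses:

```python
# Rough sketch: monocular depth with DPT, then a naive point cloud.
# The checkpoint and camera intrinsics are illustrative assumptions.
import numpy as np
import torch
from PIL import Image
from transformers import DPTImageProcessor, DPTForDepthEstimation

processor = DPTImageProcessor.from_pretrained("Intel/dpt-large")
model = DPTForDepthEstimation.from_pretrained("Intel/dpt-large")

image = Image.open("photo.jpg")
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    depth = model(**inputs).predicted_depth[0].numpy()  # (H, W) relative depth

# Back-project every pixel to 3D with a crude pinhole-camera assumption.
h, w = depth.shape
fx = fy = max(h, w)  # rough focal-length guess; the demo may estimate this differently
u, v = np.meshgrid(np.arange(w), np.arange(h))
x = (u - w / 2) * depth / fx
y = (v - h / 2) * depth / fy
points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)  # N x 3 point cloud

np.savetxt("point_cloud.xyz", points)  # open in MeshLab, Blender, etc.
```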
Midjourney Style Reference Codes w/ Examples
Included:
• 33 --sref Codes
• 396 Downloadable Images
• 132 Prompts
🖼️ Image Prompts
Prompt: White ceramic coffee cup on rustic wooden table, steam rising, scattered coffee beans, morning light casting long shadows. Professional product photography, epic cinematic lighting
Prompt: Dawn breaks over a misty Pacific Northwest forest. Wide establishing shot from elevated position, ancient Douglas firs pierce through rolling fog banks. Golden sunbeams filter through branches. Cinematic depth, moody atmosphere, shot on ARRI digital.
Prompt: isometric view of van Gogh starry night painting
🎥 Video Prompts
Image to Video Prompt: Intimate portrait sequence in a gothic cathedral at magic hour. Camera gracefully orbits a young woman with intricate braids and leather jacket, moving through pools of golden light cast by ancient chandeliers. Dramatic depth of field highlights architectural details in the soaring vaulted background. Cinematic lighting, precise focus pull, ultra-smooth motion.
Kling AI 1.6
Image to Video Prompt: Slow dolly shot. Camera gracefully moves from behind a hooded figure meditating on the edge of a cyberpunk cityscape. Brilliant fuchsia and cerulean sunset bathes everything in otherworldly light. Holographic symbols spiral outward as the figure gestures, while massive crystalline towers gleam in the distance. Hyper-detailed photorealism, dreamlike atmosphere.
Luma Ray2 Img2Video
Text to Video Prompt: [Tracking shot, Pedestal down] Following a falcon as it dives through a narrow canyon, wings nearly touching the red rock walls [Pull out]. Dynamic motion, epic scale.
Hailuo AI I2V-01 Director Mode
Thank you for reading.
Have a creative week!