This week in AI: 3D from images, video tools, and more

From 3D worlds to consistent characters, explore this week’s AI trends

Heather Cooper

Dec 09, 2024

Total reading time: about 5 minutes.

Welcome to Visually AI!

🔮AI News this Week

Another busy AI news week, so I organized it into categories:

Image to 3D
AI Video
AI Image Models & Tools
AI Assistants / LLMs
AI Creative Workflow: Luma AI Boards

Don’t worry, not too much detail!

Image to 3D

Google DeepMind's Genie 2: Create interactive 3D worlds from a single 2D image, offering potential applications in game development and virtual environments.

World Labs: A brand-new image-to-3D world model that lets you navigate inside a 2D image.

These new image-to-3D products might sound the same, but they have some key differences:

I’ve had early access to test World Labs, and it’s incredible. I uploaded images generated by Luma’s Photon model to create an immersive scene that I could explore virtually. You can record your movements through the environment with keyframes and download the result as a first-person video.

I ran some of the results through Runway’s Gen-3 video-to-video tool to add style to my 3D experience:

AI Video

Runway Act One Character Reference: New update lets you transpose your character’s performance directly onto existing videos without the need to reshoot. It’s another step towards character consistency, scene continuity, and efficiency.

Example from Runway:

Tencent's Hunyuan Video: Tencent launched Hunyuan Video, an open-source AI model that supports text-to-video and image-to-video generation, along with other features like facial performance capture. You can try the text-to-video model on Fal and Replicate, or download it and run it locally.

Here are a few of my results (text prompts only):

Hailuo MiniMax: The I2V-01-Live model creates smooth, vivid motion in 2D illustrations. I tried it on one of my Midjourney images, generated a voice with Hailuo’s new text-to-voice tool, and lip-synced it with Runway’s Lip Sync (Hailuo doesn’t have a lip sync feature yet):

AI Image Models & Tools

Luma’s Photon: A text-to-image model in Luma’s new Dream Machine platform, offering 800% faster image generation at lower cost. Photon supports precise text rendering, multi-image prompting, and character consistency from a single reference image. Available in Dream Machine and via the API.

Some of my Photon results:

Magnific AI Mystic Editorial Portraits: A new model that generates extremely detailed, realistic portraits, available on Magnific.

Magnific Editorial Portraits:

Leonardo AI Flow State: A new image model that generates an endless stream of images from a single prompt, with options to customize the style to your preference.

Here’s an example of results I got with a single prompt using Flow State:

AI Assistants / LLMs

ChatGPT Pro: Priced at $200 per month, it offers unlimited access to OpenAI's most advanced models, including o1, o1-mini, GPT-4o, and Advanced Voice. It also includes o1 pro mode, which uses additional computing power to improve responses to complex queries.

Microsoft Copilot Vision: Now in limited preview for Copilot Pro subscribers, it lets the AI assistant see and interact with the content of the web pages you're viewing in Edge, providing contextual assistance and insights.

Anthropic’s MCP: Anthropic's Model Context Protocol (MCP) is an open standard that lets AI assistants connect securely and efficiently to various data sources, with integrations for GitHub, web search, local file management, and database analysis. All Claude users can install pre-built MCP servers through the Claude Desktop app (I did this the other day, using the quickstart guide).
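Under the hood, MCP messages are JSON-RPC 2.0 requests and responses exchanged between a client (such as Claude Desktop) and a server. As a rough illustration, here is a minimal Python sketch, using only the standard library, that builds the kind of requests a client might send. It assumes the `tools/list` and `tools/call` method names and the `content` result shape described in the MCP specification; the `search_files` tool and its arguments are hypothetical, and this is not the official MCP SDK.

```python
import itertools
import json
from typing import Optional

# Increasing request IDs: JSON-RPC 2.0 uses "id" to pair each
# response with the request that produced it.
_ids = itertools.count(1)


def make_request(method: str, params: Optional[dict] = None) -> dict:
    """Build a JSON-RPC 2.0 request object of the kind an MCP client sends."""
    msg = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        msg["params"] = params
    return msg


def call_tool(name: str, arguments: dict) -> dict:
    """Build a `tools/call` request asking the server to run one tool."""
    return make_request("tools/call", {"name": name, "arguments": arguments})


def text_from_result(response: dict) -> list:
    """Collect the text parts of a `tools/call` response, raising on errors."""
    if "error" in response:  # protocol-level JSON-RPC error
        raise RuntimeError(response["error"].get("message", "unknown error"))
    result = response["result"]
    if result.get("isError"):  # the tool itself reported a failure
        raise RuntimeError("tool reported an error")
    content = result.get("content", [])
    return [part["text"] for part in content if part.get("type") == "text"]


if __name__ == "__main__":
    list_req = make_request("tools/list")
    # Hypothetical tool exposed by a local file-management server.
    call_req = call_tool("search_files", {"query": "newsletter drafts"})
    print(json.dumps(list_req))
    print(json.dumps(call_req))

    # A made-up response, shaped the way the spec describes tool results.
    fake_response = {
        "jsonrpc": "2.0",
        "id": call_req["id"],
        "result": {"content": [{"type": "text", "text": "drafts/dec-09.md"}]},
    }
    print(text_from_result(fake_response))
```

In practice, the pre-built servers and official SDKs handle this messaging for you; the sketch only shows roughly what travels between Claude Desktop and a server.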

💡AI Creative Workflow: Luma AI Dream Machine Boards

I generated a 30-second continuous video on the new Dream Machine.

Everything lives in the same Board: Photon-generated images, ideas from Luma, keyframes, and extended videos.

My process:

I started on an existing board with a similar theme and asked for a futuristic spaceship:

I brainstormed and asked for more details on the ship:

Then I asked for alien planet landscapes:

Finally, I used a previously generated image in the same board as a reference and combined it with the landscape images for a consistent look. Along the way, I animated some of the images and used a few as keyframes to extend the videos:

I love being able to build on the same theme uninterrupted, to brainstorm and modify styles. I can quickly generate a new image or video using previous ones as references.

To use images from your Board as keyframes, drag the image onto the keyframe box and select "Keyframe" to extend the video:

Here’s the full video with music from Epidemic Sound:

🛠️ This Week’s AI Tools

Runner H: An AI agent that navigates web interfaces and carries out your instructions.

PageOn: An AI-powered presentation builder.

Consistent Character AI: Generate character sheets with full pose control.

Codeium: A free AI coding assistant for individual developers. It features an experimental chat mode powered by GPT-4 that gives immediate feedback and suggestions while you code.

Image to 3D Asset with TRELLIS: Microsoft's 3D model that generates high-quality 3D assets in formats like Radiance Fields, 3D Gaussians, and meshes. A demo is available on Hugging Face.

Lummi: Free stock photos and royalty-free images, including curated picks from AI artists.

📱 Recently On 𝕏

Pinned by Comfy UI:

Midjourney Style Reference Codes w/ Examples

Included:

• 33 --sref Codes

• 396 Downloadable Images

• 132 Prompts

DOWNLOAD YOUR GUIDE!

🖼️ Image Prompts

Prompt: A red sports car speeding along a brightly lit urban roadway at night. The foreground focuses on a smartphone capturing the car as it races past, showing a sharp and vibrant image of the scene. Surrounding neon streaks and modern architecture enhance the sense of motion and energy.

Generated with FLUX1.1 [pro] Ultra

Prompt: A man wearing a sleek black turtleneck and tailored chinos, leaning on a barstool in a retro café, with soft, diffused lighting emphasizing the texture of his clothes and warm tones in the atmosphere.

Generated with FLUX1.1 [pro] Ultra

🎬Video Prompt

Prompt: POV racing down a snowy mountain slope, powder spraying into lens, weaving between snow-laden pine trees, dramatic mountain peaks ahead. Dynamic motion, natural lighting

Here’s a great comparison of four models, using a text prompt only:

Thank you for reading.

Have a creative week!
