Claude computer use & Genmo's Mochi 1 video debut
Claude’s beta unlocks real-time computer tasks
Total reading time around 6 minutes.
Welcome to Visually AI!
There were so many updates and launches this week that I lost track and wasn’t able to cover them all.
Some of the things I’ll cover in this issue:
Claude computer use
Genmo’s new Mochi 1 video model
Stable Diffusion 3.5, Ideogram’s Canvas
Midjourney’s new image editor
ChatGPT web search
Krea Video Extend
I’ll try to catch up next week with Runway’s Act-One, Canva’s Droptober surprises, and several things I missed!
🔮 AI News this Week
Claude’s computer use
Anthropic's Claude now features a "computer use" capability, enabling it to perform tasks on a user's computer in real-time. This experimental feature is in public beta and designed to handle routine activities like scheduling, form completion, and web browsing.
Key Features
Real-time interaction: Executes tasks directly on a user's computer, navigating interfaces, typing, and browsing.
Automated workflows: Aims to enhance productivity by managing routine tasks autonomously.
Beta stage: Requires user oversight and suggested security measures like virtual machines and domain restrictions.
Potential errors: May misclick or misread on-screen data, reinforcing the need for human supervision.
Positioning: Part of a broader trend towards AI agents capable of managing complex workflows with minimal intervention.
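For developers, computer use ships as a new tool type in Anthropic’s Messages API beta. As a rough sketch of what a request looks like, the snippet below just builds the JSON body locally and makes no API call; the tool type string (`computer_20241022`) and beta flag (`computer-use-2024-10-22`) are my reading of the public beta docs at launch, so treat them as assumptions and check the current documentation before relying on them.

```python
# Sketch (assumption-based) of the request body for Anthropic's
# computer-use beta. No network call is made here.

def computer_use_request(prompt: str, width: int = 1024, height: int = 768) -> dict:
    """Build the JSON body for a computer-use message request."""
    return {
        "model": "claude-3-5-sonnet-20241022",
        "max_tokens": 1024,
        "tools": [
            {
                "type": "computer_20241022",  # assumed beta tool type string
                "name": "computer",
                "display_width_px": width,    # resolution of the screen Claude sees
                "display_height_px": height,
                "display_number": 1,          # X11 display number, relevant on Linux VMs
            }
        ],
        "messages": [{"role": "user", "content": prompt}],
    }

body = computer_use_request("Open the calendar app and list my meetings for today.")
```

You would then send this with the official SDK (something like `client.beta.messages.create(..., betas=["computer-use-2024-10-22"])`) and execute the click/type/screenshot actions Claude returns — ideally inside the sandboxed virtual machine Anthropic recommends, per the security notes above.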
Updated Sonnet & new Haiku
Anthropic announced an upgraded Claude 3.5 Sonnet and the new Claude 3.5 Haiku. Sonnet’s most noticeable improvement is in coding, while Haiku matches the performance of Claude 3 Opus. (link)
Genmo Mochi 1 AI Video
Genmo’s Mochi 1 is a state-of-the-art, open-source video generation model designed for high-quality outputs with strong prompt adherence.
It’s freely available under the Apache 2.0 license, enabling both personal and commercial use. With backing from major investors like NEA, WndrCo, and The House Fund, Mochi 1 aims to close the gap between open and proprietary video models.
It offers smooth motion dynamics, with a 480p base version accessible via Genmo’s Playground. Enhanced versions and new features, including higher resolution, are expected soon.
Try Mochi 1 now in the Playground, grab the code on GitHub, or preview it on Hugging Face.
Here are a few of my first results:
💻 AI Mastery at Your Own Pace
I get it - learning AI can be a lot. New tools every day, new terms to understand, and it can get overwhelming fast.
That’s why my one-on-one AI training sessions are designed to meet you where you are, no matter what your starting point is. We’ll go at your pace, whether you want to create stunning images, train a model, or just learn how to make your workflow smoother.
Want a training that's about you and for you? Book a consultation.
📸 AI Snapshots
You can now search the web inside ChatGPT with the GPT-4o model, with real-time results, cited sources, and web previews:
Stability AI released its latest image models, Stable Diffusion 3.5 Large and Stable Diffusion 3.5 Large Turbo, with the Medium version scheduled for release on October 29th. These models are free for commercial and non-commercial use under the Stability AI Community License.
You can use Stable Diffusion 3.5 Large on Replicate, Fal, Hugging Face, Civitai, or download the code on GitHub.
ComfyUI announced a new, one-click desktop app called ComfyUI V1! It supports Windows (with NVIDIA GPUs), macOS (Apple silicon), and Linux. It will include the ComfyUI Manager and automatic updates, and the installer is only 200MB. An open beta will be rolling out soon - sign up on the waitlist here.
Sebastian Kamph has a great tutorial and walkthrough of the new ComfyUI on YouTube.
Google announced new NotebookLM customization features for Audio Overviews, letting you guide the AI hosts’ level of expertise and the focus of the conversation. Businesses can apply for the NotebookLM Business pilot program here.
NotebookLM is such a great tool and I continue to find ways to use it. In this example, I used the transcript from a 1-on-1 AI training session to instruct the AI hosts to talk about video prompting and editing, and my client’s ongoing projects and goals. I shared the recording with them as a resource:
Averi’s AI Marketing Manager launched after a year in stealth. It integrates advanced AI (AGM-1) with a vetted expert network to streamline and enhance all aspects of marketing strategy and execution within a single, efficient platform. The technology was built by a veteran team from Google, Twitter, OpenAI, and BCG, and raised a $2M pre-seed round from Right Side and Singularity Capital.
And it’s free, with a Pro plan - including an API - coming later this year. (link)
Ideogram launched Canvas in beta - create, edit, and expand images seamlessly with Magic Fill and Extend:
Midjourney's new image editor allows users to upload and modify images by editing, expanding, or retexturing them with text prompts. It’s currently available to members who have generated at least 10,000 images, annual subscribers, and those who have been monthly subscribers for the past 12 months.
Here’s an example using the Retexture feature to change the surfacing, lighting, and materials in my image’s composition:
Krea AI launched the Video Extend feature, which lets you upload videos and images to extend with AI. You can set starting and ending frames, and even combine videos:
🛠️ This Week’s AI Tools
ElevenLabs Voice Design: Finally, you can generate custom voices based on your text prompt with specific ages, tones, styles, etc. If you are like me, you might need to delete voices from your current library to generate new, custom voices. (link)
STORM: Stanford University’s free, public app automates comprehensive research and report generation using AI to create Wikipedia-style articles with citations from web sources. (link)
STORM is incredible. I asked it to write a research paper about the use of AI in DNA analysis and predicting genetic traits - got a comprehensive article with citations:
Napkin AI: Transforms your text into visual charts, graphs, and diagrams. (link)
I just discovered and used Napkin AI for the first time, and my mind was completely blown:
🖼️ Image Prompts
Prompt: Cosmetic serum bottle with a metallic cap, reflecting sunset hues of orange, pink, and purple. Set on a glossy surface with a colorful gradient in the background.
Prompt: Colorful, iridescent fish swimming near the ocean surface, with sunlight rays penetrating the water. Scales glisten in blue, purple, orange, and pink hues. Clear water with light reflections, creating a vibrant, shimmering effect.
🎬 Video Prompt
Prompt: Low-angle shot of a towering skyscraper at night, with dramatic lighting emphasizing its imposing presence
Sometimes, you can get great results with no prompt. If you have an image you’d like to animate, but don’t know how to describe what you want to see - try it with no prompt and let the AI do its work!
These are results with my Midjourney image and no prompt:
Thank you for reading, and have a creative week!
Can SD3.5 compete with FLUX.1?