OpenAI's Sora for AI video generation: Here's what you need to know

Feb 19, 2024

Total reading time around 6 minutes.

Welcome to Visually AI!

🔮AI News This Week

This week was a rollercoaster of generative AI news, but OpenAI’s reveal of their ground-breaking text-to-video tool eclipsed just about everything else!

🎬Sora: OpenAI’s Text-to-Video Model

Meet Sora, the latest AI video model from OpenAI:

What makes Sora unique?

• Videos up to 1 minute

• Complex scenes with multiple characters

• Can maintain character & scene consistency

OpenAI’s technical report goes much deeper into the mechanics of Sora’s video generation capabilities. You can read more details here.

How does Sora work?

Sora creates videos by first compressing raw video into a lower-dimensional latent space and then decomposing it into spacetime patches, similar to how large language models use text tokens.

“Turning visual data into patches” - OpenAI Sora Research Paper
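If it helps to see the patch idea as code, here is a minimal sketch in Python with NumPy. It is not OpenAI's implementation: the function name, the patch sizes, and the latent shapes are all hypothetical, and the zero-filled arrays simply stand in for the compressed latent video. It only shows how one latent video can be cut into a flat list of spacetime patches, so that clips of different lengths and shapes become token lists of different lengths.

```python
import numpy as np

def to_spacetime_patches(latent, pt=2, ph=4, pw=4):
    """Split a latent video of shape (T, H, W, C) into flattened spacetime patches.

    Returns an array of shape (num_patches, pt * ph * pw * C), one row per "token".
    The patch sizes are hypothetical; T, H and W must be divisible by them.
    """
    T, H, W, C = latent.shape
    if T % pt or H % ph or W % pw:
        raise ValueError("latent dimensions must be divisible by the patch size")
    # Carve each axis into (number of blocks, block size).
    x = latent.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
    # Bring the three block indices to the front, then the contents of each block.
    x = x.transpose(0, 2, 4, 1, 3, 5, 6)
    return x.reshape(-1, pt * ph * pw * C)

# Two stand-in latents with different durations and aspect ratios (made-up sizes).
widescreen = np.zeros((16, 32, 56, 4), dtype=np.float32)
vertical = np.zeros((8, 56, 32, 4), dtype=np.float32)

print(to_spacetime_patches(widescreen).shape)  # (896, 128)
print(to_spacetime_patches(vertical).shape)    # (448, 128)
```

Each row plays the role of one token, which is why the same model can, in principle, handle a short vertical clip and a longer widescreen one.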

It uses a diffusion transformer approach to predict "clean" patches from noisy input, allowing it to generate videos of various durations, aspect ratios, and resolutions.
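Here is a toy sketch of what "predicting clean patches from noisy input" means. It assumes a standard DDPM-style noise blend, which is my assumption and not something OpenAI has published for Sora, and the function names and the `alpha_bar` value are made up for illustration. A real model would use a trained transformer to estimate the noise; this sketch cheats and hands over the true noise, just to show the arithmetic.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_noise(clean, alpha_bar, rng):
    """Forward process: blend clean patches with Gaussian noise."""
    noise = rng.standard_normal(clean.shape)
    noisy = np.sqrt(alpha_bar) * clean + np.sqrt(1.0 - alpha_bar) * noise
    return noisy, noise

def predict_clean(noisy, predicted_noise, alpha_bar):
    """Invert the blend above, given an estimate of the noise."""
    return (noisy - np.sqrt(1.0 - alpha_bar) * predicted_noise) / np.sqrt(alpha_bar)

# Random numbers standing in for 448 spacetime patches of dimension 128.
clean_patches = rng.standard_normal((448, 128))
noisy_patches, true_noise = add_noise(clean_patches, alpha_bar=0.3, rng=rng)

# A trained transformer would estimate the noise from noisy_patches.
# Passing the true noise acts as a stand-in for a "perfect" model.
recovered = predict_clean(noisy_patches, true_noise, alpha_bar=0.3)
print(np.allclose(recovered, clean_patches))  # True
```

The better the model's noise estimate, the closer the recovered patches are to the clean ones.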

You can see the different aspect ratios on the clips in the video below:

Sora can be prompted with text, images, or videos to create content that accurately follows user prompts, making it versatile for different video editing tasks.

Improved language understanding for text input

To improve language understanding, OpenAI applied the “re-captioning technique” introduced with DALL·E 3 to Sora, training it on “highly descriptive video captions.”

Here is an example of the results from this method. You can see the underlined words and phrases can be mixed and matched for different output:

Transforming videos with text prompts

OpenAI applied SDEdit, an existing diffusion-based editing technique, to Sora. This lets it change the style and setting of an input video zero-shot, meaning without extra training for each edit.
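For the curious, the general SDEdit recipe is: add some noise to the original, then let a prompt-guided diffusion model denoise it again. The sketch below shows only that loop. Everything in it is hypothetical, including the function names, the simple noise blend, and the placeholder `toy_step`, which stands in for one denoising step of a real model. How Sora wires this up internally has not been published.

```python
import numpy as np

def sdedit(source, denoise_step, strength=0.6, num_steps=50, seed=0):
    """SDEdit-style edit: partially noise the source, then denoise it again.

    source:       latent array for the input video
    denoise_step: hypothetical callable (x, t) -> x, one reverse step of a
                  prompt-conditioned diffusion model at noise level t
    strength:     fraction of the noise schedule to apply (0 keeps the source)
    """
    rng = np.random.default_rng(seed)
    start = int(round(strength * num_steps))
    if start == 0:
        return source.copy()
    t0 = start / num_steps
    # Partially noise the source (a simple variance-preserving blend, assumed).
    x = np.sqrt(1.0 - t0) * source + np.sqrt(t0) * rng.standard_normal(source.shape)
    # Walk back from noise level t0 to 0, one step at a time.
    for i in range(start, 0, -1):
        x = denoise_step(x, i / num_steps)
    return x

calls = []

def toy_step(x, t):
    """Placeholder for a real model step; it just records t and shrinks x."""
    calls.append(t)
    return 0.9 * x

source = np.ones((448, 128))
edited = sdedit(source, toy_step, strength=0.6, num_steps=50)
print(len(calls))  # 30
```

A higher `strength` adds more noise and gives the model more freedom to change the video; a lower one stays closer to the original.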

The technical report allows you to click the text prompt above the video output on the right side:

Seamless transitions between two videos

Sora can transition between two separate input videos to create a new output that combines the styles and composition of each.

This is an example - the middle video is a combination of the first and last video:

Availability and more examples

Sora is not yet available to the public. For now, OpenAI is giving access to “red teamers to assess critical areas for harms or risks.” In the meantime, the team has been sharing additional mind-blowing videos on 𝕏, including some made from public requests.

This video was posted by Aditya Ramesh on 𝕏:

Prompt: "pov footage of an ant navigating the inside of an ant nest"

(Note: people commented that the ant doesn’t have six legs, and that is true. I’m sharing this as an example of what Sora can generate from that extremely short prompt.)

Sora can generate multiple scenes simultaneously - side-by-side:

Credit: Bill Peebles on 𝕏, who stated, “This is a single video sample from Sora. We didn't stitch this together; Sora decided it wanted to have five different viewpoints all at once!”

Sora can generate videos based on a sequence of events. It’s not close to perfect, but it’s pretty good.

Here’s another example from Bill Peebles on 𝕏, where he explained: “For this video, I asked that a golden retriever and samoyed should walk through NYC, then a taxi should stop to let the dogs pass a crosswalk, then they should walk past a pretzel and hot dog stand, and finally they should end up looking at Broadway signs.”

Sora’s limitations

OpenAI noted several weaknesses with Sora, including difficulty accurately simulating the physics of complex scenes and a tendency to misunderstand specific instances of cause and effect.

What does this mean for the future of video and film?

I have no experience in media production, but I do have extensive experience in AI animation and video technology.

I think Sora is incredible and this is just the beginning of a new era of generative AI.

I do not think it can replace human involvement in filmmaking or video production, but I think it opens a world of possibilities for a variety of things most of us can’t imagine.

For regular, everyday people who want to generate a quick video or animate a still image - this technology will allow more people to tell their stories.

I have always been an advocate for increasing access to these tools and making people aware of their existence and how they can use them.

But I know this technology is improving rapidly and there is a huge potential for misinformation, deepfakes, and untold negative outcomes. I will try to stay informed and share what I know with you, from my perspective.

For a quick recap, this was my first reaction to Sora, on Instagram:

A post shared by @hb.coop_

🎙️Generative AI Friday Recap Space on 𝕏

This week we talked about Sora and Google’s Gemini Pro at my Generative AI Recap Space on 𝕏.

You can check it out on my new YouTube Channel, where I’m posting the full recordings and individual clips from our fantastic weekly discussions.

You can listen to it on 𝕏 here.

I host the Generative AI Friday Recap every week at 5 p.m. EST, and you can check here for details: Spaces Dashboard

Introducing Facemix 🎭

Ever imagined seeing yourself or friends in your creations?

Now’s your chance. Generate an image, tap Facemix, and remix it with any face you desire for stories that truly reflect you. It's about personalizing your images like never before.

Try Facemix! Make it You - Add your face to any image, now on Remix!

Create, share, and remix AI. Download the app for free: REMIX

You could have your AI service, tool, or event seen by Visually AI’s community of over 8,800 subscribers:

Advertise with me

🚀 This Week’s AI Tools

ElevenLabs: Generate sound effects with a text prompt with the new Sound Effects feature. (waitlist signup)

Frame: Open-source AI glasses powered by OpenAI, Perplexity, and Whisper with an integrated multimodal AI assistant. Available for pre-order ($399) in three colors. (pre-order link)

Deforum Studio: Generate amazing 2D/3D Deforum videos on a sleek, easy-to-use web app or in Discord. Now, you can upload images to use with text prompts to transfer the style and composition to your video. (link)

Easy-Peasy AI: Create custom bots for specific purposes using GPT-3.5 Turbo, Llama 2, Claude Instant, and Mixtral-8x7B. You can share the bot publicly via URL or embed it in a website as a widget. (link)

📺Visually AI on YouTube

I’m thrilled to share that my YouTube channel, Visually AI, is continuing to grow thanks to your support, and I truly appreciate it!

It’s up to 417 subscribers, with 34 videos ready for you to enjoy and learn from. Thank you to everyone who has subscribed, and I welcome those who haven’t yet!

You can subscribe here.

I’ve been adding tutorials and prompt demonstrations, like this one on creative upscaling of Niji 6 images with Magnific AI:

🎁 Get it free: The AI Visual Creator’s Toolkit

Boost your content with my all-in-one, free visual AI toolkit!

Access AI-powered tools for AI-generated images, image editing, and more: Get your toolkit

📸 Free Gift: Realistic Photography Cheat Sheet

Prompt like a pro with easy-to-understand photography terms and example images you can use as a reference for amazing photorealistic images.

Download your free guide: Photography Cheat Sheet

Thanks for reading, and have a creative week!