2024-02-16

In case the world of artificial intelligence-generated content hasn’t felt wild enough, ChatGPT developer OpenAI just unveiled a remarkable new text-to-video tool, called Sora, that can generate photorealistic video clips based on user prompts.

According to OpenAI, Sora is able to “generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world.”

Prompt: “A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. she wears a black leather jacket, a long red dress, and black boots, and carries a black purse. she wears sunglasses and red lipstick. she walks confidently and casually.… pic.twitter.com/cjIdgYFaWq — OpenAI (@OpenAI) February 15, 2024

While Sora is not yet available to the public, OpenAI said it is putting the tool to the test with so-called “red team” users, who will assess it for potential harms and risks. The company has also granted access to a select group of outside users, including visual artists, designers and filmmakers, to “gain feedback on how to advance the model to be most helpful for creative professionals.”

Based on the clips published on OpenAI’s website, as well as a handful of postings on X, formerly Twitter, by OpenAI co-founder and CEO Sam Altman, who created videos based on prompts from his followers, Sora’s capabilities are nothing short of stunning.

Short videos depicting a litter of Labrador puppies frolicking in the snow, a couple strolling down a Tokyo sidewalk along a row of snow-kissed cherry trees and a brown crab staging an underwater sneak attack on an unsuspecting octopus are all rendered with startling clarity.

Introducing Sora, our text-to-video model. Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions. https://t.co/7j2JN27M3W Prompt: “Beautiful, snowy… pic.twitter.com/ruTEWn87vf — OpenAI (@OpenAI) February 15, 2024

OpenAI says the Sora model has a deep understanding of language that enables it to “accurately interpret prompts and generate compelling characters that express vibrant emotions.” The platform can also create multiple shots within a single generated video that accurately persist characters and visual style, according to OpenAI.

While the videos shared by OpenAI so far make for a compelling sizzle reel of Sora’s text-to-video capabilities, the company noted in a Thursday blog post that the platform still faces a number of challenges and remains prone to mistakes.

OpenAI says that Sora may “struggle with accurately simulating the physics of a complex scene” and may not be able to understand, and represent, specific instances of cause and effect. An example cited by the company: a person might take a bite out of a cookie, but afterward, the cookie may not have a bite mark.

Other issues that may occur in rendering video from user prompts, according to OpenAI, include confusion over spatial details of a prompt, such as mixing up directions, and difficulty accurately following descriptions of events that take place over time.

OpenAI said it’s building safety features into the platform, including digital watermarking that helps identify videos as having been generated by Sora, and applying some of the same user restrictions it has established for its other AI-driven tools, including the DALL-E text-to-image generator.

Those measures include a screening process that checks and rejects user prompts that violate OpenAI’s usage policies, including requests to generate images of extreme violence, sexual content, hateful imagery, celebrity likenesses or protected intellectual property. The company said it also employs a system that performs a frame-by-frame evaluation of Sora-created videos before they’re shown to the user as a double-check for usage policy compliance.

OpenAI reports that Sora’s capabilities extend beyond its text-to-video function and that the model is able to generate video footage from an existing still image by “animating the image’s contents with accuracy and attention to small detail.” The model can also take existing video footage and extend it or fill in missing frames.

While Congress has struggled to keep legislative pace with a wide range of emerging technologies, including artificial intelligence-driven software, some administrative and regulatory actions over the past few months aim to create at least some framework to limit how the new tools can be abused.

Last October, President Joe Biden issued a wide-ranging executive order aiming to create new regulatory oversight on emerging artificial intelligence technology and build bulwarks against consumer privacy invasions, discrimination and the dissemination of false or misleading information generated by AI-powered tools.

Earlier this month, following a spate of fake, AI-generated robocalls that spoofed Biden’s voice and went out to some New Hampshire voters just 48 hours before that state’s presidential primary, the Federal Communications Commission issued a statement noting its efforts to give state prosecutors new tools to battle fraud carried out with artificial intelligence tools.

“AI-generated voice cloning and images are already sowing confusion by tricking consumers into thinking scams and frauds are legitimate. No matter what celebrity or politician you favor, or what your relationship is with your kin when they call for help, it is possible we could all be a target of these faked calls,” said FCC Chairwoman Jessica Rosenworcel. “That’s why the FCC is taking steps to recognize this emerging technology as illegal under existing law, giving our partners at State Attorneys General offices across the country new tools they can use to crack down on these scams and protect consumers.”