Google has unveiled Lumiere, a new AI model for creating videos, built on a Space-Time U-Net (STUNet) diffusion architecture. Rather than stitching together individual still frames, Lumiere produces a full five-second clip in a single pass. This design lets the model reason about both where objects sit within a frame and how they move and change over time.
The Google researchers wrote in the paper, “We introduce Lumiere — a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse, and coherent motion — a pivotal challenge in video synthesis.”
“Within a single model pass, we introduce a Space-Time U-Net architecture that generates the entire temporal duration of the video at once,” the authors said.
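To make the single-pass idea concrete, here is a minimal, illustrative sketch of a U-Net-style network that downsamples and upsamples along both the spatial axes and the frame axis, so an entire clip flows through the model in one call. This is not Google’s Lumiere implementation; the layer counts, channel widths, and kernel sizes are arbitrary assumptions chosen only to show the space-time down/up-sampling pattern the quote describes.

```python
import torch
import torch.nn as nn

class ToySpaceTimeUNet(nn.Module):
    """Toy sketch of a space-time U-Net: 3D convolutions shrink and then
    restore the clip in BOTH space and time, so the whole temporal
    duration is processed in a single forward pass."""

    def __init__(self, channels=8):
        super().__init__()
        # Encoder: each 3D conv halves height, width, AND frame count.
        self.down = nn.Sequential(
            nn.Conv3d(3, channels, kernel_size=3, stride=(2, 2, 2), padding=1),
            nn.SiLU(),
            nn.Conv3d(channels, channels * 2, kernel_size=3, stride=(2, 2, 2), padding=1),
            nn.SiLU(),
        )
        # Decoder: transposed 3D convs restore the clip's duration and size.
        self.up = nn.Sequential(
            nn.ConvTranspose3d(channels * 2, channels, kernel_size=4, stride=(2, 2, 2), padding=1),
            nn.SiLU(),
            nn.ConvTranspose3d(channels, 3, kernel_size=4, stride=(2, 2, 2), padding=1),
        )

    def forward(self, video):
        # video: (batch, 3, frames, height, width) -- the full clip at once,
        # not one still frame at a time.
        return self.up(self.down(video))

# One pass over an 80-frame toy clip (tiny 64x64 frames for speed).
clip = torch.randn(1, 3, 80, 64, 64)
out = ToySpaceTimeUNet()(clip)
print(out.shape)  # torch.Size([1, 3, 80, 64, 64])
```

The key design point the paper highlights is the temporal stride: because the frame axis is downsampled alongside the spatial axes, the network sees a compressed representation of the whole clip at once instead of generating keyframes and filling the gaps afterward.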
The design supports many content creation tasks and video editing applications, such as image-to-video, video inpainting, and stylized generation. With Lumiere, you can create videos in a particular style using a reference image, convert still photos to videos, apply consistent text-prompted edits across a video, and create cinemagraphs by animating specific regions of an image.
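These use cases differ mainly in what the model is conditioned on. The sketch below illustrates that dispatch; the `lumiere_generate` function and its parameters are invented for illustration and are not a real Google API.

```python
def lumiere_generate(prompt, image=None, mask=None, style_ref=None):
    """Hypothetical wrapper showing how one entry point could cover the
    conditioning modes the article lists. Illustrative only."""
    if image is not None and mask is not None:
        mode = "video inpainting"       # regenerate only the masked region
    elif image is not None:
        mode = "image-to-video"         # animate a still photo
    elif style_ref is not None:
        mode = "stylized generation"    # match a reference image's style
    else:
        mode = "text-to-video"
    print(f"{mode}: {prompt!r}")

lumiere_generate("a bear playing the guitar")          # text-to-video
lumiere_generate("waves at dusk", image="photo.png")   # image-to-video
```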
According to the Google researchers, the five-second, 1024 x 1024 pixel videos the model produces are considered “low-resolution.”
Moreover, Lumiere produces 80 frames per clip, compared with Stable Video Diffusion’s 25.
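As a quick sanity check on those numbers, 80 frames at the 16 fps sampling rate reported in the Lumiere paper works out to exactly the five-second clips mentioned above:

```python
frames, fps = 80, 16      # frame count from the article; 16 fps per the Lumiere paper
print(frames / fps)       # 5.0 seconds of video per clip
```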
“There is a risk of misuse for creating fake or harmful content with our technology,” the paper’s authors stated, adding, “We believe that it is crucial to develop and apply tools for detecting biases and malicious use cases to ensure a safe and fair use.”