① According to media reports, Google has defined a new paradigm of generative AI: Genie, a foundation world model with 11 billion parameters that can generate playable interactive environments from a single image. ② Zheshang Securities said in a research report that, as overseas companies keep accelerating their moves into multi-modal AI, a new wave of multi-modal AI is expected to arrive sooner.
According to media reports, Google has defined a new paradigm of generative AI: Generative Interactive Environments (Genie). Genie is a foundation world model with 11 billion parameters that can be prompted with a single image to generate a playable interactive environment. Trained on internet videos, it can generate an endless variety of playable (action-controllable) worlds from synthetic images, photographs, and even sketches. It has a wide range of uses: it can generate entire interactive worlds from images or text, and it is a useful tool for training future general-purpose AI agents.
Zheshang Securities said in a research report that Nvidia recently announced the establishment of its GEAR lab to build a position in multi-modal AI and embodied intelligence. In addition, Stability AI released the Stable Diffusion 3 model, which has stronger text-to-image capabilities, and announced the open public beta of its text-to-video application Stable Video. As overseas companies keep accelerating their moves into multi-modal AI, a new wave of multi-modal AI is expected to arrive sooner.
According to the Finance Federation's theme library, among the relevant listed companies:
Insight Group's InsightGPT offers functions such as text-to-text generation, intelligent video editing, and image-to-video generation, and text-to-video functions are currently under development. Using existing technical frameworks such as image-to-video, InsightGPT can currently generate videos of 20 seconds or longer.
Wanxing Technology's "Tianmu" (Canopy) large model is characterized by its multimedia focus, vertical solutions, computing power and data, and localized applications. It is intended to help content creators worldwide express their creativity through smarter, more immersive features and product experiences.