
Google releases “foundation world model” Genie; multimodal AI wave expected to accelerate ·  Feb 28 07:46

① According to media reports, Google has defined a new paradigm of generative AI: Genie, a foundation world model with 11 billion parameters that can generate playable interactive environments from a single image. ② Zheshang Securities said in a research report that, as overseas vendors continue to step up their investment in multimodal AI, a new wave of multimodal AI is expected to arrive sooner.

According to media reports, Google has defined a new paradigm of generative AI: Generative Interactive Environments (Genie). Genie is a foundation world model with 11 billion parameters that can generate a playable interactive environment from a single image prompt. Trained on Internet videos, it can generate an endless variety of playable (action-controllable) worlds from synthetic images, photographs, and even sketches. It has a broad range of potential uses: it can generate entire interactive worlds from images or text, and it is a promising tool for training future general-purpose AI agents.

Zheshang Securities said in a research report that Nvidia recently announced the establishment of its GEAR lab to build a position in multimodal AI and embodied intelligence. In addition, Stability AI released Stable Diffusion 3, a model with stronger text-to-image capabilities, and announced the open public beta of its text-to-video application, Stable Video. As overseas vendors continue to step up their investment in multimodal AI, a new wave of multimodal AI is expected to arrive sooner.

According to the Cailian Press theme database, the relevant listed companies include:

Insight Group's InsightGPT offers text-to-text generation, intelligent video editing, and image-to-video generation, and a text-to-video function is under development. Using existing technical approaches such as image-to-video, InsightGPT can currently generate videos of 20 seconds or more.

Wanxing Technology's Tianmu (“canopy”) large model is characterized by its focus on multimedia, vertical solutions, computing power and data, and localized applications. It aims to help content creators worldwide express their creativity through smarter, more immersive features and product experiences.

The translation is provided by third-party software.
