① Recently, NVIDIA launched the physical AI model Cosmos, which can predict real-world environments such as warehouses and road conditions to help train robots; ② According to NVIDIA's disclosed list, the first batch of Cosmos users includes manufacturers such as 1X, Agility, Figure AI, and Xpeng Motors; ③ Brokerages believe that, among the methods of collecting training data for humanoid robots, synthetic data will greatly promote the development of robots.
The Star Daily reported on January 8 that $Alphabet-A (GOOGL.US)$ / $Alphabet-C (GOOG.US)$, OpenAI, $Microsoft (MSFT.US)$, and other top global technology companies are bullish on embodied AI, which is rapidly approaching its ChatGPT moment.
Recently, at his CES keynote, $NVIDIA (NVDA.US)$ CEO Jensen Huang officially launched the physical AI large model Cosmos. According to reports, the model lets developers generate physics-based videos from combinations of inputs such as text, images, videos, and robot sensor or motion data, enabling predictions about real-world environments (such as warehouses, factories, and traffic conditions) and thereby facilitating the training of robots and autonomous vehicles.
A so-called physical AI large model is a world foundation model that can understand the language of the world, physical characteristics, spatial positions, and other elements, and synthesize corresponding physical data. It is key to accelerating the adoption of smart vehicles, embodied intelligence, and other AI terminals. Compared with the leapfrog progress of large language models like ChatGPT, world models are still at a relatively early stage and generally face issues such as high development costs and an inability to consistently adhere to physical rules.
It is worth mentioning that NVIDIA will open-source Cosmos. According to the disclosed list, the first batch of users includes more than a dozen domestic and international robot and automobile manufacturers, such as 1X, Agile Robots, Agility, Figure AI, Foretellix, Fourier, Galbot, Hillbot, IntBot, Neura Robotics, Skild AI, Virtual Incision, Waabi, and Xpeng Motors.
In fact, NVIDIA's attempts to train robots in realistic physical environments date back to June 2024, when it used the RoboCasa simulation framework to provide thousands of 3D models spanning more than 150 object categories, along with dozens of interactive furniture items and appliances. The related experiments demonstrated the effectiveness of synthetic physical data in robot training.
Jensen Huang stated: "The world foundation model is fundamental to advancing the development of robots and smart vehicles, but not all developers have the expertise and resources required to train models on their own. We created Cosmos to popularize physical AI and give every developer access to general-purpose robotics technology."
As of now, several companies have launched world foundation models. On December 5, 2024, Google released the world foundation model Genie 2, capable of generating relatively realistic 3D worlds; in September of the same year, 1X Technologies released a humanoid-robot world model that can simulate the future outcomes of a robot performing different actions.
Additionally, video generation models are seen as one pathway toward world foundation models. In the video generation field, both Sora and Runway have expressed ambitions to move into world models. HTSC (Huatai Securities) points out that video generation and world models share many similarities: both encode and compress data from the complex external world into lower-dimensional vectors, then learn this knowledge across the spatiotemporal dimensions using Transformers or other models to make predictions.
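The encode-then-predict pattern HTSC describes can be illustrated with a deliberately tiny numpy sketch. This is purely a toy, not NVIDIA's, Sora's, or any vendor's actual pipeline: a fixed random projection stands in for a learned encoder, and a fitted linear map stands in for the Transformer that learns spatiotemporal dynamics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "video": T frames of H*W pixels, flattened per frame.
T, H, W = 8, 16, 16
frames = rng.normal(size=(T, H * W))

# 1) Encode: compress each high-dimensional frame into a low-dimensional
#    latent vector (a random projection stands in for a learned encoder).
latent_dim = 4
proj = rng.normal(size=(H * W, latent_dim)) / np.sqrt(H * W)
latents = frames @ proj  # shape (T, latent_dim)

# 2) Learn dynamics in latent space: fit latent[t+1] ≈ latent[t] @ A
#    (a linear stand-in for a Transformer learning temporal structure).
X, Y = latents[:-1], latents[1:]
A, *_ = np.linalg.lstsq(X, Y, rcond=None)

# 3) Predict: roll the dynamics forward to "imagine" the next latent state.
next_latent = latents[-1] @ A
print(latents.shape, next_latent.shape)
```

The point of the sketch is only the shape of the computation: prediction happens in the compressed latent space, which is what makes world-model-style rollouts tractable compared with predicting raw pixels.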
Institutions point out that, inspired by large language models, humanoid robot developers are also beginning to build embodied large models, with the primary focus on solving the data problem. Autonomous driving can be simplified to 2D motion in 3D space, whereas robots involve 3D motion in 3D space and must also incorporate information such as force and touch; in theory, robots therefore require more data than autonomous driving. Currently, training data for humanoid robots is collected mainly via three methods:
① Collecting real-machine data, for example by having a person wear a motion-capture suit; this yields high-quality data, but collection is costly and slow.
② Using simulated environments to generate synthetic data for training robots.
③ Extracting motion data from existing internet videos; this requires no simulation physics engine, but it involves complex coordinate transformations and lacks force and touch dimensions.
Institutions believe that, of the three methods above, synthetic data will most strongly drive robot development. Academia has already proven the feasibility of these methods, and the robot "brain" has entered its ChatGPT moment.
Editor: jayden