
Morgan Stanley's Heavyweight Robotics Yearbook (II): Robots "Escape from the Factory", Training Focus Shifts from "Brain" to "Body", Edge Computing Power Poised for Explosion

wallstreetcn ·  Dec 15, 2025 21:25

Morgan Stanley recently highlighted that AI-driven robots are undergoing a historic shift from factory floors to broader application scenarios, with training priorities transitioning from traditional cognitive abilities to physical manipulation skills. This change is expected to trigger an explosive growth in demand for edge computing.

On December 15, according to HardAI News, in its latest report, "The Robot Almanac (Volume II)," Morgan Stanley pointed out that the global robotics industry is experiencing two key transformations: the expansion of robot applications from factories to unstructured environments such as homes, cities, and even space; and the shift in training focus from the traditional AI 'brain' (general models) to the 'body' (physical motion control).

Morgan Stanley noted that this transformation will drive a surge in demand for edge computing capabilities, with real-time inference chips, simulation technologies, and robotic sensors potentially becoming core investment themes. The report emphasized that the complexity of the physical world—such as controlling grip strength when grasping objects or navigating dynamic environments—is pushing technological approaches from 'pure software optimization' toward 'hardware-software synergy,' while distributed edge computing may reshape the global computing infrastructure landscape.

Morgan Stanley predicts that by 2050, 1.4 billion robots will be sold globally, driving edge AI computing demand to the equivalent of millions of B200 chips, thereby reshaping the distribution of global computing infrastructure.

Robots 'Escaping Factories': From Structured Confinement to Complex Real-World Environments

Traditional industrial robots (Pre-AI Robotics) were confined to the 'structured cages' of factories: performing single tasks (e.g., repetitive assembly), operating in controlled environments (fixed production lines), and requiring no perception or learning capabilities.

Morgan Stanley noted that AI-empowered next-generation robots are breaking these limitations and beginning to enter homes, farms, city streets, deep seas, and even outer space—examples include autonomous vehicles navigating crowded roads, service robots picking up objects in households, and drones inspecting complex terrains.

The report uses the example of 'grabbing a bottle from the refrigerator' to illustrate the challenges of the physical world:

What appears to be a simple action for humans actually involves multiple variables, including precise finger positioning, body balance adjustments, grip strength control (too tight crushes the object, too loose causes it to drop), and the impact of environmental humidity on friction.

Morgan Stanley pointed out that this means robots must possess real-time perception, dynamic decision-making, and fine motor control capabilities, rather than relying solely on pre-programmed instructions.
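The interplay of variables in the report's refrigerator example (grip force, humidity-dependent friction, the crush/drop trade-off) can be caricatured as a tiny feedback loop. This is a toy sketch for illustration only; the thresholds and the slip-sensor model are invented, not taken from the report:

```python
# Toy sketch of closed-loop grip-force control: raise the commanded force
# until a (simulated) slip signal disappears, capped below a crush limit.
# All numbers and the sensor model are invented for illustration.

CRUSH_LIMIT = 20.0      # N: force above this would crush the bottle
SLIP_THRESHOLD = 8.0    # N: below this the (simulated) bottle slips when dry

def slipping(force_n: float, humidity: float) -> bool:
    """Simulated slip sensor: humidity lowers friction, raising the force needed."""
    required = SLIP_THRESHOLD * (1.0 + 0.5 * humidity)   # humidity in [0, 1]
    return force_n < required

def grip(humidity: float, step: float = 0.5) -> float:
    """Increase grip force until slip stops; fail if that would crush the object."""
    force = 0.0
    while slipping(force, humidity):
        force += step
        if force > CRUSH_LIMIT:
            raise RuntimeError("cannot hold object without crushing it")
    return force

print(grip(humidity=0.0))   # dry bottle: modest force suffices
print(grip(humidity=0.8))   # humid bottle: more force is needed before slip stops
```

The point of the sketch is the one the report makes in prose: the "right" force is not a constant that can be pre-programmed, it depends on sensed conditions at execution time.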

Shift in Training Paradigm: From 'Brain' Optimization to 'Body' Control

The report notes that early robot training focused on the 'brain' (AI models), such as the optimization of general vision-language models (VLMs). However, Morgan Stanley emphasizes that the current bottleneck has shifted to the 'body' (physical action execution). The core challenge is that basic human instinctual skills (e.g., walking, grasping) are extremely complex for AI (Moravec's Paradox), and these skills cannot be easily learned from internet text or image data.

According to Morgan Stanley research, unlike large language models primarily trained on text and image data, robotic models require extensive real-world physical operation data, making data collection and model training significantly more complex and costly.

The bank points out that tech giants such as Tesla, NVIDIA, and Google are collecting training data through three main approaches: teleoperation, simulation training, and video learning.

Teleoperation: Human operators control robots via motion capture, and the robots mimic their movements. However, this method is time-consuming and scales poorly, and may gradually be replaced.

Simulation: Complex scenarios (e.g., extreme weather, obstacles) are replicated endlessly in virtual environments through digital twins, combined with reinforcement learning to optimize actions. Game engine companies (e.g., Unreal Engine, Unity) have been deeply involved, and NVIDIA's Omniverse platform is built on its accumulated gaming GPU technology.

Video Learning: Action patterns are extracted from videos of human behavior (e.g., YouTube footage), enabling model training without physical interaction. World models such as DeepMind's Genie 3 and Meta's V-JEPA 2 take a similar approach, predicting object motion trajectories and the outcomes of physical interactions.
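The simulation approach above pairs a virtual environment with reinforcement learning so that the robot's policy improves over millions of simulated trials instead of physical ones. A minimal, generic sketch of that loop, using a toy 1-D "move the gripper to the target" environment and tabular Q-learning; this illustrates the technique only and is not any vendor's actual pipeline (Omniverse, Genie, etc.):

```python
# Tabular Q-learning in a toy simulator: positions 0..9, target at 9.
# Each episode is a simulated rollout, so no physical robot time is spent.
import random

N = 10                      # positions 0..9; target at position 9
ACTIONS = (-1, +1)          # move left / move right
q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}

alpha, gamma, eps = 0.5, 0.9, 0.1   # learning rate, discount, exploration
random.seed(0)

for episode in range(500):
    s = 0
    for _ in range(50):                       # one simulated rollout
        # epsilon-greedy action selection
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: q[(s, x)])
        s2 = min(max(s + a, 0), N - 1)        # simulator step (clamped walk)
        r = 1.0 if s2 == N - 1 else -0.01     # reward only at the target
        # Q-learning update
        q[(s, a)] += alpha * (r + gamma * max(q[(s2, b)] for b in ACTIONS) - q[(s, a)])
        s = s2
        if s == N - 1:
            break

# After training, the greedy policy moves right (toward the target) everywhere.
policy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N - 1)}
print(policy)
```

Real robotic simulation replaces the clamped 1-D walk with a physics engine and the table with a neural policy, but the structure of the loop (rollout, reward, update) is the same.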

Surge in Edge Computing Demand: Real-Time Inference and Distributed Computing

As robots 'leave the factory,' the latency issues of centralized cloud computing have become prominent (e.g., autonomous driving requires millisecond-level decision-making), making edge computing an essential requirement. Morgan Stanley points out that edge computing will exhibit two major trends:

1. Proliferation of specialized edge chips

NVIDIA’s Jetson Thor is a typical example: an edge real-time inference device priced at approximately $3,500 per unit and adopted by robotics companies such as Boston Dynamics and Amazon. Its core advantage is delivering high computational power at low energy consumption, meeting robots' real-time requirements (e.g., dynamic obstacle avoidance).

2. Distributed Inference Network

Tesla has proposed the concept of "robots as computing nodes": if 100 million robots, each with 2,500 TFLOPS of compute, were deployed globally, they could provide 125,000 ExaFLOPS of capacity at 50% utilization, equivalent to roughly 7 million NVIDIA B200 GPUs (18 PetaFLOPS each). This distributed model not only reduces reliance on data centers but also enhances overall efficiency through inter-robot collaboration.
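The fleet-level figures above can be checked with back-of-the-envelope arithmetic; the per-robot TFLOPS, utilization, and B200 throughput numbers are taken from the report as quoted:

```python
# Back-of-the-envelope check of the "robots as computing nodes" figures.
robots = 100_000_000            # deployed robots
tflops_per_robot = 2_500        # per-robot compute, TFLOPS (10^12 FLOPS)
utilization = 0.5               # fraction of fleet compute actually available
b200_petaflops = 18             # quoted B200 throughput, PetaFLOPS (10^15 FLOPS)

total_flops = robots * tflops_per_robot * 1e12 * utilization
exaflops = total_flops / 1e18                       # 1 ExaFLOPS = 10^18 FLOPS
b200_equivalents = total_flops / (b200_petaflops * 1e15)

print(f"{exaflops:,.0f} ExaFLOPS")                  # 125,000 ExaFLOPS
print(f"{b200_equivalents:,.0f} B200 equivalents")  # about 7 million
```

Both of the report's headline numbers (125,000 ExaFLOPS, ~7 million B200s) fall out of the same three inputs, so they are internally consistent.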

According to Morgan Stanley's forecast, global demand for edge computing in robotics will significantly increase by 2030. Various forms of robotics—such as humanoid robots, autonomous vehicles, and drones—will contribute substantially to computational power demands. By 2050, global sales of robots are expected to reach 1.4 billion units, driving the demand for edge AI computational power to the equivalent of millions of B200 chips.

Editor/KOKO


