As artificial intelligence shifts from technological exploration to technological application, spatial intelligence, a new direction that integrates multimodal large models, virtual reality, and other cutting-edge technologies, is showing great potential.
Cailian Press, August 29 (Reporter Cui Ming): Artificial intelligence is moving from its initial stage of technological exploration to the stage of technological application. Against this backdrop, spatial intelligence, a new direction that integrates multimodal large models, virtual reality, and other cutting-edge technologies, is showing great potential and commercial value.
During the 2024 Shenzhen (International) General Artificial Intelligence Conference, Wu Bangyi, Chief Data Officer of Tianyu Digital Technology, told Cailian Press in an interview that the next stage of artificial intelligence is to achieve artificial general intelligence (AGI), and that the greatest productivity gains AGI can unleash lie in the manufacturing sector. Developing spatial intelligence, he said, is the key to bringing AGI from the desktop to industrial applications.
The following is a transcript of the interview, edited in places:
Cailian Press: With AI technology now widely applied, what do you think the next stage of development for artificial intelligence will be?
Wu Bangyi: Artificial intelligence is currently in a stage of rapid development and extensive application, while the field is also actively exploring how to achieve higher levels of general intelligence. There is broad consensus that the next stage of artificial intelligence is to achieve AGI. At that point, AI's cognitive, understanding, driving, and decision-making abilities could match or even surpass those of humans. At present, however, AGI is mainly concentrated in desktop applications such as content creation, customer service, and programming, and its application in the industrial field is rare.
We believe that the development opportunities in the next stage of artificial intelligence lie in the new industrial revolution, and that the greatest productivity gains AGI can unleash lie in the manufacturing sector.
Cailian Press: Compared with AGI desktop applications, why are industrial applications relatively scarce? How can the spatial computing gap in industrial scenarios be bridged?
Wu Bangyi: The fundamental reason artificial intelligence is scarce in industrial scenarios is that those scenarios are three-dimensional spaces, while most current large models work with two-dimensional modalities such as language, images, and video. Applying them in industrial scenarios therefore runs into a spatial computing gap. Compared with 2D intelligence, 3D spatial intelligence offers more comprehensive perception, understanding, interaction, and decision-making capabilities with respect to the real world. It redefines the relationship between humans, machines, the real world, and the virtual world, and it has stronger generalization and emergent characteristics.
On the one hand, by capturing depth information through 3D data, it gives AI a more accurate understanding of the shape, structure, and position of people and objects in the physical world. This lets it generate more realistic scenes with more intuitive visual effects, making the virtual world more lifelike. On the other hand, spatial intelligence can reason about visual information in three-dimensional scenes much as humans do, overcoming the limitations of two-dimensional vision. This makes the real world smarter and brings disruptive change to multiple industries, especially embodied intelligence, intelligent manufacturing, and the low-altitude economy.
Therefore, developing spatial intelligence is the key to bringing AGI from the desktop to industrial applications.
Cailian Press: Can you elaborate on how 3D spatial intelligence will integrate with industries such as embodied intelligence, intelligent manufacturing, and the low-altitude economy?
Wu Bangyi: If spatial intelligence is the key to bringing AGI from the desktop to industrial applications, then 3D large models are the key to developing spatial intelligence.
3D large models provide support for multi-modal data fusion, spatial computing, complex scene processing, enhanced interaction, and 3D generative AI, among other aspects, and are key to promoting the development of spatial intelligence technology.
In the field of embodied intelligence and humanoid robots, combining 3D multimodal large models with robotics gives robots not only cognitive abilities such as understanding, memory, and reasoning, but also the ability to recognize and understand the real 3D physical world and to decide, act, and operate autonomously in work scenarios.
In the field of intelligent manufacturing, combining 3D large models with multi-source, heterogeneous 3D data on people, machines, objects, and the environment allows the entire production process to be reconstructed in 3D. This enables accurate analysis, cross-comparison, and bottleneck identification, and supports management decision-making. The result is more efficient production, manufacturing, and warehousing logistics, lower costs, and support for industrial upgrading and new business models.
In the field of the low-altitude economy, integrating 3D large models with aircraft technology enables aircraft to perceive and recognize their surroundings intelligently, and to navigate and avoid obstacles in flight. By reconstructing in 3D the natural environment, flight activities, and infrastructure of low-altitude airspace, a spatial intelligence system can be built to address the weak perception, low intelligence, and high application costs of low-altitude management.
Cailian Press: What efforts and plans has Tianyu Digital Technology made in building spatial intelligence? How is it progressing at present?
Wu Bangyi: Tianyu Digital Technology has been building its position in spatial intelligence for more than three years, from the initial launch of AI digital humans to today's 3D large models and spatial intelligence MaaS platform. We are steadily advancing innovation in spatial intelligence technology and application scenarios.
The company integrates the wise Q&A model with 3D datasets and visual algorithms to build a spatial intelligence MaaS platform that supports intelligent analysis across data types. Using a "1+1+N" model, we have built the largest high-quality 3D dataset in the country and developed a domestically built, high-performance 3D spatial intelligence model. Through DaaS and MaaS offerings, we have deployed applications in multiple scenarios, such as embodied intelligence, humanoid robots, intelligent manufacturing, and the low-altitude economy.
The MaaS platform brings together two core functions. The first is AI+3D visualization, which provides digital products and services such as XR virtual scenes, game development, AI customer service, and intelligent digital humans for a range of industries. It is now widely used in cultural tourism, exhibitions, finance, education, film and television, and games. The second is AI+3D datasets, which provide enterprises with AI data services for vertical model training, data analysis, and embodied intelligence.
Currently, the spatial intelligence MaaS platform has provided 3D virtual scenes and AI digital human interactive services to multiple enterprise clients such as China Daily, Inner Mongolia Alxa TV station, Yunnan Agricultural Vocational College, Wufangzha, TeeMall, Yang Guofu, and China Resources Snow Breweries.
Cailian Press: What challenges does the widespread application of 3D spatial intelligence face in the industrial field? What is Tianyu Digital Technology's strategy for addressing them?
Wu Bangyi: First of all, it is important to emphasize that 3D multimodal large models rely heavily on training with large-scale, high-quality 3D data. 3D datasets are crucial for providing realistic characters and scenes, richer interactive experiences, and decision support.
At present, however, 3D datasets fall short in both quality and quantity, and the scarcity of high-quality 3D datasets is a pain point for the industry. Globally, 3D data is extremely important and yet extremely scarce.
Secondly, acquiring and processing 3D data is relatively expensive. It involves high-end devices such as depth cameras for data collection, as well as tedious data processing work, all of which requires significant time, manpower, and financial investment. In addition, the low level of standardization of 3D data leads to poor compatibility, which makes sharing and reuse difficult and hinders the development of spatial intelligence technology.
3D datasets are becoming a core focus of competition. They will determine the development of 3D multimodal large models and spatial intelligence, as well as the direction of global technological competition for the foreseeable future.
We have already formed an integrated solution covering data collection, storage, management, research, and utilization. The spatial intelligence MaaS platform collects diverse types of data and captures detailed 3D data using advanced equipment such as LightStage ultra-high-precision light field scanning and handheld depth cameras.
At the same time, the platform uses cutting-edge technologies such as NeRF and 3D Gaussians to generate high-quality 3D models from scan data, videos, images, and even text, making virtual scenes and objects more realistic. This data is integrated with information from other modalities, such as images and text, to form a comprehensive, multi-dimensional dataset.
Currently, the platform holds more than 800,000 sets of 3D data and 350,000 sets of multimodal data, giving it a substantial resource advantage.