

SIGGRAPH 2024: Nvidia's "AI box" upgraded, Jensen Huang wants to create digital replicas of the physical world.

Tencent Technology · 10:49

Author: Li Haidan

On July 30th Beijing Time, NVIDIA (NVDA.US) showcased many of its latest developments in rendering, simulation, and generative AI at SIGGRAPH 2024 in Denver, USA.

Last year at SIGGRAPH, Nvidia launched the GH200, L40S, ChatUSD, and other products. This year the protagonist is the upgraded Nvidia NIM, the company's new trump card for the generative AI era, which applies generative AI to USD (Universal Scene Description) and expands the possibilities of AI in the 3D world.

01 Nvidia NIM upgrade: both a blessing and a challenge

Nvidia announced that Nvidia NIM has been further optimized, standardizing the deployment of AI models. NIM is a key part of Nvidia's AI strategy, and Jensen Huang has repeatedly praised its innovations, calling it "AI-in-a-Box, essentially artificial intelligence in a box."

This upgrade further consolidates Nvidia's leading position in the AI field and adds another layer to its technical moat.

CUDA has long been considered a key factor in Nvidia's leading position in the GPU field. With CUDA, GPUs evolved from dedicated graphics processors into general-purpose parallel computing devices, making modern AI development possible. However, although Nvidia's software ecosystem is very rich, for traditional industries that lack basic AI development capabilities, these scattered systems are still too complex and difficult to master.

To solve this problem, Nvidia launched NIM (Nvidia Inference Microservices), a set of cloud-native microservices, at the GTC conference in March of this year, integrating the software it has developed over the past few years to simplify and accelerate the deployment of AI applications. NIM packages models as optimized containers that can run in the cloud, in data centers, or on workstations, letting developers get started in minutes, for example to easily build generative AI applications such as copilots and chatbots, as sketched below.
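To make that concrete, the following is a minimal sketch of what calling such a deployed NIM container can look like from application code. It assumes a NIM microservice is already running locally and exposing an OpenAI-compatible API on port 8000, and that it serves a Llama 3.1 8B Instruct model; the endpoint address and model name are illustrative assumptions, not details from Nvidia's announcement.

# Minimal sketch: query a locally running NIM container through its OpenAI-compatible API.
# The endpoint (localhost:8000) and model name are assumptions for illustration.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")

response = client.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Summarize what Nvidia NIM is in one sentence."}],
    max_tokens=128,
)
print(response.choices[0].message.content)

Because the API surface is OpenAI-compatible, the same application code can usually be pointed at a container in the cloud, a data center, or a workstation simply by changing base_url.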

So far, the NIM ecosystem Nvidia has built out provides a range of pre-trained AI models. Nvidia announced that it helps developers accelerate application development and deployment across multiple fields, offering field-specific AI models for areas such as language understanding, digital humans, 3D development, robotics, and digital biology.

Photo caption: services and specific models provided by Nvidia NIM (Nvidia Inference Microservices)

In the understanding field, NIM can use Llama 3.1 and NeMo Retriever to enhance text-processing capabilities. For digital humans, it provides models such as Parakeet ASR and FastPitch HiFiGAN, supporting automatic speech recognition and high-fidelity speech synthesis and giving developers powerful tools for building virtual assistants and digital humans.
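As a rough illustration of how a retriever and Llama 3.1 can be combined for text understanding, here is a minimal retrieval-augmented sketch. It assumes two locally running, OpenAI-compatible NIM-style endpoints, one serving an embedding model and one serving the chat model; the URLs and model names (nvidia/nv-embedqa-e5-v5, meta/llama-3.1-8b-instruct) are assumptions for illustration, not details from the article.

# Minimal retrieval-augmented sketch: embed documents, pick the closest one,
# and let the chat model answer using that context. Endpoints and model names are assumptions.
import numpy as np
from openai import OpenAI

embed = OpenAI(base_url="http://localhost:8001/v1", api_key="not-used")  # embedding endpoint (assumed)
chat = OpenAI(base_url="http://localhost:8000/v1", api_key="not-used")   # Llama 3.1 endpoint (assumed)

docs = ["NIM packages models as optimized containers.", "Omniverse is a platform for 3D workflows."]
doc_vecs = [e.embedding for e in embed.embeddings.create(model="nvidia/nv-embedqa-e5-v5", input=docs).data]

question = "How does NIM ship models?"
q_vec = embed.embeddings.create(model="nvidia/nv-embedqa-e5-v5", input=[question]).data[0].embedding

# Cosine similarity to find the most relevant document.
scores = [np.dot(q_vec, d) / (np.linalg.norm(q_vec) * np.linalg.norm(d)) for d in doc_vecs]
context = docs[int(np.argmax(scores))]

answer = chat.chat.completions.create(
    model="meta/llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": f"Context: {context}\n\nQuestion: {question}"}],
)
print(answer.choices[0].message.content)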

In 3D development, models such as USD Code and USD Search simplify the creation and manipulation of 3D scenes, helping developers efficiently build digital twins and virtual worlds.
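For a sense of what "creating and manipulating 3D scenes" means at the code level, the snippet below is a minimal sketch using the open-source pxr Python bindings for OpenUSD, roughly the kind of code a USD Code-style assistant is meant to generate or explain; the scene contents are invented for illustration.

# Minimal OpenUSD sketch: create a stage with one transform and one sphere.
# Scene contents are illustrative only.
from pxr import Usd, UsdGeom

stage = Usd.Stage.CreateNew("hello_world.usda")       # new USD layer on disk
world = UsdGeom.Xform.Define(stage, "/World")         # root transform prim
sphere = UsdGeom.Sphere.Define(stage, "/World/Sphere")
sphere.GetRadiusAttr().Set(2.0)                       # set the sphere's radius attribute

stage.SetDefaultPrim(world.GetPrim())
stage.GetRootLayer().Save()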

In embodied robotics, Nvidia has launched the MimicGen and Robocasa models, which accelerate robotics research and applications by generating synthetic motion data and simulation environments. MimicGen NIM can generate synthetic motion data from teleoperation data recorded by spatial computing devices such as the Apple Vision Pro, while Robocasa NIM can generate robot tasks and simulation-ready environments in OpenUSD, the universal framework for developing and collaborating in 3D worlds.

In digital biology, models such as DiffDock and ESMFold provide advanced solutions for drug discovery and protein structure prediction, advancing biomedical research.

In addition, Nvidia announced that Hugging Face's inference-as-a-service offering is also powered by Nvidia NIM and runs in the cloud.
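From the developer's side, consuming such a hosted service can look like the minimal sketch below, using the huggingface_hub client. The model name and token handling are assumptions, and whether a particular request is actually served by the NIM-backed inference-as-a-service depends on account and model availability.

# Minimal sketch: call a hosted model through the Hugging Face inference client.
# Model name is an assumption; routing to NIM-backed serving depends on account/model availability.
import os
from huggingface_hub import InferenceClient

client = InferenceClient(model="meta-llama/Meta-Llama-3.1-8B-Instruct", token=os.environ["HF_TOKEN"])

reply = client.chat_completion(
    messages=[{"role": "user", "content": "In one sentence, what does 'inference-as-a-service' mean?"}],
    max_tokens=64,
)
print(reply.choices[0].message.content)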

By integrating these versatile models, Nvidia's ecosystem not only improves the efficiency of AI development but also provides innovative tools and solutions. However, while the many upgrades to Nvidia NIM are indeed a "blessing" for the industry, they also bring challenges for programmers.

By providing pre-trained AI models and standardized APIs, Nvidia NIM greatly simplifies the development and deployment of AI models, which is indeed a blessing for developers. But does it also mean that employment opportunities for ordinary programmers will shrink further? After all, companies can complete the same work with fewer technical staff, because much of it has already been done inside NIM, and ordinary programmers may no longer need to perform complex model training and tuning themselves.

02 Teaching AI to think in 3D and build a virtual physical world

NVIDIA also demonstrated the application of generative AI on OpenUSD and Omniverse platforms at the SIGGRAPH conference.

NVIDIA announced that it has created the world's first generative AI models that understand OpenUSD (Universal Scene Description)-based geometry, materials, physics, and space, and has packaged these models as Nvidia NIM microservices.

As Nvidia NIM microservices make OpenUSD more capable and more accessible, industries of all kinds will be able to build physics-based virtual worlds and digital twins. With the new OpenUSD-based generative AI and the Nvidia acceleration development frameworks built on the Nvidia Omniverse platform, more industries can now develop applications for visualizing industrial design and engineering projects, as well as simulations for building the next wave of physical AI and robots. In addition, new USD connectors link robotics and industrial simulation data formats and developer tools, and allow users to stream large NVIDIA RTX ray-traced data sets to the Apple Vision Pro.

In short, bringing USD into NVIDIA NIM so that large models can better understand the physical world and build virtual worlds creates valuable digital assets. For example, in 2019 Notre-Dame de Paris suffered a serious fire and the cathedral was extensively damaged. Fortunately, Ubisoft game designers had visited the building countless times to study its structure and had completed a digital reconstruction of Notre-Dame for the AAA game "Assassin's Creed Unity", reproducing its details, which greatly helped the cathedral's restoration. At the time, designers and historians spent two years on that replica; with this technology, digital copies can be reproduced at scale far more quickly in the future, and AI can refine our understanding and replication of the physical world.

Designers can also build basic 3D scenes in Omniverse and use them to condition generative AI for controllable, collaborative content-creation workflows. For example, WPP and Coca-Cola were the first to adopt this workflow to scale their global advertising campaigns.

Nvidia also announced several new NIM microservices that will further enhance developers' capabilities and efficiency on the OpenUSD platform, including USD Layout, USD SmartMaterial, and fVDB Mesh Generation.

NVIDIA Research participated in the conference with more than 20 papers, sharing innovations that advance synthetic data generators and inverse rendering tools; two of the papers won Best Technical Paper awards. This year's research shows how AI improves simulation, by boosting image quality and unlocking new 3D representations, and how simulation in turn improves AI, through better synthetic data generators and richer content. These studies showcase Nvidia's latest advances and innovations in AI and simulation.

Photo caption: Getty Images generative AI case

NVIDIA stated that designers and artists now have new and improved ways to increase productivity by using generative AI trained on licensed data. For example, Shutterstock, a US stock-image provider, has released a commercial beta of its generative 3D service, which lets creators quickly prototype 3D assets from text or image prompts and generate 360-degree HDRi backgrounds to light scenes. Getty Images, a US image-licensing company, has accelerated its generative AI service, doubling image-generation speed and improving output quality. These services are built on Nvidia Edify, a multimodal generative AI architecture; the new models double generation speed, improve image quality and prompt adherence, and let users control camera settings such as depth of field and focal length. Users can generate four images in about six seconds and upscale them to 4K resolution.

Conclusion

Wherever Jensen Huang appears, he is always in his leather jacket, painting a picture of the exciting future AI will bring to the world.

We have also witnessed NVIDIA's growth from gaming GPU giant to dominant AI chip maker, and now to a company with a full-stack layout spanning AI hardware and software. NVIDIA is full of ambition, iterating rapidly at the forefront of the AI technology wave.

From programmable shading GPUs and CUDA accelerated computing, to the launch of Nvidia Omniverse and generative AI NIM microservices, to the push into 3D modeling, robot simulation, and digital twin technologies, each step signals that a new round of innovation in the AI industry is coming.

However, because large companies have more resources, including capital, technology, and manpower, they may be able to adopt and implement advanced technologies such as Nvidia NIM more quickly, while small and medium-sized enterprises may struggle to keep up due to limited resources. Coupled with gaps in talent and technical capability, will this lead to further technological inequality in the future?

The ideal AI for humanity is one that frees hands and labor and brings a more productive world. But when productivity and the means of production are controlled by a small number of people, will that cause a deeper crisis? These are questions we need to consider.

Editor/Somer


