Source: Brokerage China
Author: Shi Qian
NVIDIA has recognized it too!
According to the latest news on the official website of $NVIDIA (NVDA.US)$, the 671-billion-parameter DeepSeek-R1 model is now available as a preview NVIDIA NIM microservice on build.nvidia.com, where developers can safely experiment with its capabilities and build their own specialized agents.
NVIDIA also stated on its official website that DeepSeek-R1 is an open model with state-of-the-art reasoning capabilities.
Recognition from NVIDIA
NVIDIA explains that reasoning models like DeepSeek-R1 do not give direct responses; instead, they reason over a query multiple times, using chain-of-thought, consensus, and search methods to generate the best answer. Carrying out this sequence of inference passes, using reasoning to arrive at the best answer, is known as test-time scaling. DeepSeek-R1 is a perfect example of this scaling law, demonstrating that accelerated computing is critical for agentic AI inference.
Because the model can 'think' about a problem repeatedly, it generates more output tokens and longer generation cycles, so model quality continues to scale with compute. Substantial test-time compute is critical for delivering real-time inference and higher-quality responses from reasoning models like DeepSeek-R1, which in turn requires larger inference deployments. R1 delivers leading accuracy on tasks demanding logical inference, reasoning, mathematics, coding, and language understanding, while also offering high inference efficiency.
To help developers safely experiment with these capabilities and build their own specialized agents, the 671-billion-parameter DeepSeek-R1 model is now available as a preview NVIDIA NIM microservice at build.nvidia.com. The DeepSeek-R1 NIM microservice can deliver up to 3,872 tokens per second on a single NVIDIA HGX H200 system. Developers can test and experiment with the application programming interface (API), which is expected to be available soon as a downloadable NIM microservice, part of the NVIDIA AI Enterprise software platform.
The DeepSeek-R1 NIM microservice simplifies deployment by supporting industry-standard APIs. Enterprises can maximize security and data privacy by running the NIM microservice on their own preferred accelerated computing infrastructure. In addition, using NVIDIA AI Foundry and NVIDIA NeMo software, businesses can create customized DeepSeek-R1 NIM microservices for specialized AI agents.
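To illustrate what "industry-standard APIs" means in practice, here is a minimal sketch of querying the preview microservice, assuming it exposes the OpenAI-compatible chat-completions interface that build.nvidia.com previews typically use; the base URL, model ID, and NVIDIA_API_KEY environment variable are illustrative assumptions, not details confirmed in this article.

```python
# Minimal sketch: querying the DeepSeek-R1 NIM preview through an
# OpenAI-compatible chat-completions API (endpoint and model ID assumed).
import os

from openai import OpenAI  # pip install openai

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # assumed preview endpoint
    api_key=os.environ["NVIDIA_API_KEY"],            # assumed API-key env var
)

response = client.chat.completions.create(
    model="deepseek-ai/deepseek-r1",                 # assumed model identifier
    messages=[{"role": "user", "content": "Which is larger, 9.11 or 9.8?"}],
    temperature=0.6,
    max_tokens=1024,
)
print(response.choices[0].message.content)
```

Because the interface is OpenAI-compatible, the same client code should work against a self-hosted NIM endpoint simply by changing base_url, which is what the data-privacy point above amounts to in practice.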
DeepSeek-R1 is a large mixture-of-experts (MoE) model. It contains an impressive 671 billion parameters, 10 times more than many other popular open-source LLMs, and supports an input context length of 128,000 tokens. The model also uses an exceptionally large number of experts per layer: each layer of R1 has 256 experts, and each token is routed in parallel to eight different experts for evaluation.
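As a concrete illustration of the routing scheme just described (256 experts per layer, each token dispatched to 8 of them), below is a toy sketch in Python/PyTorch; it is a simplification for intuition, not DeepSeek's actual implementation, and the hidden size is an arbitrary assumption.

```python
# Toy sketch of top-8 mixture-of-experts routing: a learned gate scores each
# token against 256 experts, and only the 8 best-scoring experts process it.
import torch

hidden_dim, num_experts, top_k = 1024, 256, 8   # hidden_dim is an assumption
gate = torch.nn.Linear(hidden_dim, num_experts, bias=False)  # the router

def route(tokens: torch.Tensor):
    """tokens: (num_tokens, hidden_dim) -> (expert ids, mixing weights)."""
    scores = gate(tokens)                             # (num_tokens, 256)
    weights, expert_ids = scores.topk(top_k, dim=-1)  # keep 8 of 256 per token
    weights = torch.softmax(weights, dim=-1)          # normalize over the 8
    return expert_ids, weights  # each token's output is the weighted sum of
                                # its 8 experts' outputs

ids, w = route(torch.randn(4, hidden_dim))
print(ids.shape, w.shape)  # torch.Size([4, 8]) torch.Size([4, 8])
```

Only 8 of 256 expert networks run for any given token, which is why a 671-billion-parameter MoE model can be served with far less compute per token than a dense model of the same size.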
Serving real-time answers with R1 requires many GPUs with high compute performance, connected over high-bandwidth, low-latency links, so that prompt tokens can be routed to all the experts for inference. Combined with the software optimizations in the NVIDIA NIM microservice, a single server with eight H200 GPUs connected via NVLink and NVLink Switch can run the full 671-billion-parameter DeepSeek-R1 model at up to 3,872 tokens per second. This throughput is achieved by using the NVIDIA Hopper architecture's FP8 Transformer Engine at every layer and the 900 GB/s of NVLink bandwidth for MoE expert communication.
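For a rough sense of scale, a back-of-envelope reading of the quoted figure:

```python
# Back-of-envelope on the quoted peak throughput: 3,872 tokens/s aggregate
# on one 8-GPU HGX H200 system. Note this is system-wide throughput across
# concurrent requests, not the latency seen by any single request.
total_tps = 3_872
num_gpus = 8

per_gpu_tps = total_tps / num_gpus       # 484 tokens/s per GPU on average
mean_interval_ms = 1_000 / total_tps     # ~0.26 ms between tokens system-wide

print(f"{per_gpu_tps:.0f} tokens/s per GPU; "
      f"{mean_interval_ms:.2f} ms mean inter-token interval")
```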
Fully utilizing the floating-point operations per second (FLOPS) of GPUs is critical for real-time inference. The next-generation NVIDIA Blackwell architecture will give test-time scaling for reasoning models like DeepSeek-R1 a major boost through fifth-generation Tensor Cores, which deliver up to 20 petaflops of peak FP4 compute, and a 72-GPU NVLink domain optimized specifically for inference.
From open source to reproduction
Recently, researchers at the University of California, Berkeley built a small-scale language-model replica of DeepSeek R1-Zero, the AI model developed in China, for about $30. The model, called TinyZero, is a project led by Berkeley graduate student Jiayi Pan and three other researchers, advised by Berkeley professor Alane Suhr and University of Illinois Urbana-Champaign assistant professor Hao Peng.
The weights and codebase of DeepSeek's R1 model are released under the permissive MIT license, which allowed Pan and his team to use the underlying code to train a far smaller model. Pan said TinyZero is likewise open source, meaning the code is publicly available: people can download it and try training and modifying the model themselves. "Small-scale replication is very easy to achieve, and it's very low-cost, even if people do it as a side project for experimentation," Pan said. "From the start of the project, our goal was essentially to uncover the mystery of how to train these models and to better understand the science and design decisions behind them."
Yesterday, the official website of $Microsoft (MSFT.US)$ showed that DeepSeek R1 is now available in the model catalog on Azure AI Foundry and GitHub, joining a portfolio of more than 1,800 models spanning frontier, open-source, industry-specific, and task-based AI models. As part of Azure AI Foundry, DeepSeek R1 is accessible on a trusted, scalable, enterprise-ready platform, enabling businesses to seamlessly integrate advanced AI while meeting SLA, security, and responsible-AI commitments, all backed by Microsoft's reliability and innovation.
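For comparison with the NIM route above, a sketch of consuming the same model from an Azure AI Foundry deployment might look like the following, using Microsoft's azure-ai-inference Python package; the endpoint, the key environment variables, and the "DeepSeek-R1" model name are illustrative assumptions tied to whatever deployment you create in Foundry.

```python
# Sketch: calling a DeepSeek R1 deployment on Azure AI Foundry with the
# azure-ai-inference package (pip install azure-ai-inference). The endpoint
# URL and key are assumed to come from an existing Foundry deployment.
import os

from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint=os.environ["AZURE_INFERENCE_ENDPOINT"],                   # assumed
    credential=AzureKeyCredential(os.environ["AZURE_INFERENCE_KEY"]),  # assumed
)

response = client.complete(
    messages=[UserMessage(content="Explain test-time scaling in one paragraph.")],
    model="DeepSeek-R1",   # assumed deployment name in the Foundry catalog
    max_tokens=512,
)
print(response.choices[0].message.content)
```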
Blackstone's latest stance
This week, leaders in Silicon Valley, Washington, on Wall Street, and elsewhere were caught off guard by the sudden rise of the Chinese AI company DeepSeek. Many analysts believe DeepSeek's success shakes the core belief that has driven the development of the US AI industry.
However, AI scientists counter that many of these concerns are exaggerated: while DeepSeek does represent a genuine advance in AI efficiency, the US AI industry still holds key advantages. "This is not a leap in frontier AI capabilities. I think the market just got it wrong," said RAND Corporation AI researcher Lennart Heim.
For now, however, Blackstone Group, the private equity giant and a major global investor in AI datacenters, remains optimistic. "We still believe that the demand for physical infrastructure, datacenters, and power is very urgent," Blackstone President Jonathan Gray said on Thursday during the firm's fourth-quarter earnings call with investors. "The way this demand is utilized may change."
Gray said that, like most of the investment and business community, Blackstone's executives have spent much of the past week weighing the impact of DeepSeek. In recent years, Blackstone has aggressively bought and built datacenters, the physical infrastructure technology companies use to run AI systems. In 2021, Blackstone acquired the American datacenter company QTS for $10 billion, and last year it led the roughly $16 billion acquisition of AirTrunk, which operates datacenters in Asia.
Gray also expects that as the cost of AI computing power falls sharply, AI will find wider application. In other words, while the computing power required for an AI model to answer a given question may decrease, people will ask more questions. Gray noted that Blackstone only builds datacenters for technology companies that sign long-term leases. "We will not build them speculatively," he said, while acknowledging that the way clients use these datacenters is likely to change.
Editor: Jeffy