

Huang Renxun's latest 10,000-word interview: AGI is coming soon, AI will completely change productivity.

wallstreetcn ·  19:52

The flywheel of machine learning is the most important. Just having a powerful GPU does not guarantee a company's success in the field of AI.

On October 4th, NVIDIA CEO Huang Renxun appeared as a guest on the interview program Bg2 Pod for an extensive conversation with hosts Brad Gerstner and Clark Tang.

They mainly discussed how to scale intelligence up to AGI, NVIDIA's competitive advantages, the relative importance of inference and training, future market dynamics in the AI field, AI's impact on various industries, as well as topics such as Elon Musk's Memphis supercluster, xAI, and OpenAI.

Huang Renxun emphasized the rapid evolution of AI technology, especially breakthroughs on the path to artificial general intelligence (AGI). He said that AGI assistants will soon appear in some form and will become more sophisticated over time.

Huang Renxun also discussed NVIDIA's leadership position in the computing revolution, pointing out that by driving down the cost of computing and innovating on hardware architecture, NVIDIA holds a significant advantage in powering machine learning and AI applications. He pointed in particular to NVIDIA's "moat": a software and hardware ecosystem accumulated over a decade, which makes it difficult for competitors to catch up with a single better chip.

Furthermore, Huang Renxun praised xAI and the Musk team for completing the construction of the Memphis supercluster with one hundred thousand GPUs in just 19 days, calling it an "unprecedented" achievement. This cluster is undoubtedly one of the fastest supercomputers globally, and will play a crucial role in AI reasoning and training tasks.

Regarding the impact of AI on productivity, Huang Renxun optimistically mentioned that AI will greatly enhance enterprise efficiency, bring more growth opportunities, and not lead to mass unemployment. At the same time, he called for the industry to strengthen its focus on AI security to ensure that the development and use of technology are beneficial to society.

The key points of the full interview are summarized as follows:

  • AI assistants will soon appear in some form... At first they will be very useful but not perfect. Then, over time, they will become more and more perfect.
  • We have reduced the marginal cost of computing by a factor of 100,000 in 10 years. Our entire stack is growing; the entire stack is innovating.
  • People think that the reason for designing a better chip is that it has more flops, more bits and bytes... but machine learning is not just about software; it is about the entire data pipeline.
  • The flywheel of machine learning is the most important. You have to think about how to make this flywheel faster.
  • Merely having a powerful GPU does not guarantee a company's success in the AI field.
  • Musk's understanding of large-scale system engineering, construction, and resource allocation is unique... 100,000 GPUs as a cluster... completed in 19 days.
  • AI will not change every job, but will have a huge impact on how people work. When companies use AI to increase productivity, it often manifests as better profits or growth.

Evolution of AGI and AI assistants

Brad Gerstner:

This year's theme is scaling intelligence up to AGI. When we last did this two years ago, it was two months before ChatGPT; considering all the changes since then, it's truly incredible. So I think we can start with a thought experiment and a prediction.

If I imagine AGI as a personal assistant in my pocket, one that knows everything about me, has perfect memory of me, can communicate with me, and can book hotels or make doctor appointments for me: looking at the speed of change in the world today, when do you think we will have that personal assistant?

Huang Renxun:

It will soon appear in some form. And over time, this assistant will get better and better. That is the wonderful thing about the technology we work with. So I think at first it will be very useful but not perfect, and then over time it will become more and more perfect, just like all technologies.

Brad Gerstner:

When we look at the speed of change, I think Musk once said the only thing that really matters is the pace of change. We do feel that the pace of change has accelerated dramatically; this is the fastest rate of change we have seen, and we have been in the AI field for ten years or even longer. Is this the fastest rate of change you have seen in your career?

Huang Renxun:

This is because we have reinvented computing. Many things have happened because we reduced the marginal cost of computing by a factor of 100,000 in 10 years; Moore's Law over the same period would have delivered roughly 100 times. We achieved this in multiple ways. First, we introduced accelerated computing, moving the work that is inefficient on CPUs onto GPUs. We achieved it by inventing new numerical precisions, by new architectures, by inventing Tensor Cores, by building the NVLink system with very fast memory, by scaling out with NVLink, and by working across the entire stack. Essentially, everything I have described about how NVIDIA operates has produced a pace of innovation beyond Moore's Law.

What is truly amazing is that from then on, we transitioned from manual programming to machine learning. The magic of machine learning is that it can learn very quickly, and that has been proven. So, when we redefined how computation is distributed, we did all kinds of parallelism: tensor parallelism, various kinds of pipeline parallelism. We became good at inventing new algorithms and training methods on top of that, and all of these technologies and inventions compound on one another.
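To make the tensor parallelism Huang names concrete, here is a minimal sketch (an editorial illustration, not NVIDIA code): a weight matrix is split column-wise across devices, each device computes its shard, and the results are concatenated, matching the single-device result exactly.

```python
import numpy as np

# Tensor parallelism in miniature: split the weight matrix column-wise across
# hypothetical "devices", compute each shard independently, then gather.
x = np.random.rand(4, 8).astype(np.float32)    # activations
W = np.random.rand(8, 16).astype(np.float32)   # weight matrix

shards = np.split(W, 4, axis=1)                # pretend each shard lives on one GPU
partials = [x @ w for w in shards]             # each "device" computes its piece
y = np.concatenate(partials, axis=1)           # all-gather the partial outputs

print(np.allclose(y, x @ W))                   # True: same result as one device
```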

Looking back at how Moore's Law worked, software was static. It was precompiled and shrink-wrapped, put in a store. It was static, while the hardware underneath it grew at the pace of Moore's Law. Now, our entire stack is growing; the whole stack is innovating. So I think we are now suddenly seeing scaling.

This is certainly extraordinary. But what we used to talk about was pre-trained models and scaling at that level: how we doubled the model size and, accordingly, doubled the data size, so that the required computing power increased fourfold each year. That was a big deal. But now we are seeing post-training scaling, and we are seeing inference-time scaling. People used to think pre-training was hard and inference was easy; now everything is hard. That makes sense, because the idea that all human thinking is one-shot is a bit absurd. So there must be concepts of fast thinking and slow thinking, of reasoning, reflection, iteration, and simulation. That is now emerging.
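The fourfold figure follows directly if training compute scales with parameters times tokens. A quick sketch of that arithmetic, assuming the common ~6·N·D rule of thumb (an assumption, not something stated in the interview):

```python
# Training compute scales roughly with parameters (N) times tokens (D).
def training_flops(n_params: float, n_tokens: float) -> float:
    return 6.0 * n_params * n_tokens   # widely used ~6*N*D approximation

base = training_flops(10e9, 1e12)      # hypothetical 10B-parameter model, 1T tokens
next_gen = training_flops(20e9, 2e12)  # double the model and double the data
print(next_gen / base)                 # -> 4.0: compute quadruples per generation
```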

NVIDIA's competitive moat

Clark Tang:

I think one of the most easily misunderstood things about NVIDIA is how deep the real NVIDIA moat is. There is a notion that if someone invents a better chip, they win. But the fact is, you have spent a decade building a complete stack, from GPU to CPU to networking, and especially the software and libraries that let applications run on NVIDIA. So when you think about NVIDIA's moat today, do you think the moat is larger or smaller than it was three or four years ago?

Huang Renxun:

Okay, I really appreciate your recognition of how computing has changed. In fact, people think (and many still do) that the reason you design a better chip is that it has more flops, more bits and bytes. Do you understand what I mean? You see it in their keynote slides: all those flops and bar charts and things like that. And all of that is good; horsepower does matter. Yes. So these things fundamentally matter.

However, unfortunately, these are old ideas, ideas from a world where the software is some application running on Windows and the software is static, right? Static software means the best way to improve the system is to build faster and faster chips. But we realized that machine learning is not human coding. Machine learning is not just about software; it is about the entire data pipeline. In fact, the flywheel of machine learning is the most important thing. So how do you think about enabling this flywheel, on the one hand enabling data scientists and researchers to be productive inside this flywheel right from the start? Many people don't even realize that you need AI to curate data in order to teach AI, and that AI itself is quite complex.

Brad Gerstner:

Is AI itself improving? Is it also accelerating? Again, when we consider competitive advantage, yes, that's right. It's a combination of all these.

Huang Renxun:

It is precisely because AI has become smarter at curating data that this situation arises. We now even have synthetic data generation and all kinds of different ways of curating and presenting data. So before you ever get to training, there is already an enormous amount of data processing involved. People think: oh, PyTorch, that is the beginning of the world and the end of the world. It is very important.

But don't forget everything that happens before and after PyTorch. The point of the flywheel is how you have to think about it: how do I think about the whole flywheel, and how do I design a computing system, a computing architecture, that helps you drive this flywheel and make it as efficient as possible? Training the model is not the whole thing; does that make sense? It is just one step. Every step of the flywheel is hard. So the first thing you should do is not ask how to make Excel faster or how to make Doom faster; that is the past, right? Now you have to ask how to make this flywheel faster. The flywheel has many different steps, and machine learning is not easy, as you all know.

What the OpenAI or xAI or Gemini teams are doing is not easy; what they are doing is genuinely hard. So we decided: look, this is what you should be thinking about, the whole process, and you want to accelerate every part of it. You must respect Amdahl's Law: if a step is 30% of the total time and I speed it up by a factor of three, I have barely accelerated the entire process. Does that make sense? You really want to build a system that speeds up every step, because only by doing the whole thing can you materially improve the cycle time, and that flywheel, that rate of learning, is ultimately what produces exponential growth.
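Huang's 30%-of-the-time example is exactly Amdahl's Law; a small sketch of the numbers he cites (illustration only):

```python
# Amdahl's Law: overall speedup when a fraction f of the pipeline is sped up s times.
def overall_speedup(f: float, s: float) -> float:
    return 1.0 / ((1.0 - f) + f / s)

print(overall_speedup(0.30, 3.0))  # ~1.25x: a 3x win on 30% of the flywheel barely helps
print(overall_speedup(1.00, 3.0))  # 3.0x: only accelerating every step moves cycle time
```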

So what I am saying is that this view of what the company really does is reflected in the products. Notice I have been talking about this flywheel, the entire cycle. Yes, that's right. We accelerate everything.

Right now, one major focus is video. Many people are working on physical AI and video processing. Imagine the front end: terabytes of data entering the system every second. A pipeline has to ingest all of that data and prepare it for training in the first place. Yes, and that entire pipeline is what gets accelerated.

Clark Tang:

Today, people think mostly about text models. But in the future there will be video models, and we will use text models like o1 to process enormous amounts of data before we ever get there.

Huang Renxun:

Yes. So language models will be involved in everything. This industry has spent enormous technology and effort to train these large language models, and now we use large language models at every step. It's amazing.

Brad Gerstner:

What I hear you saying is that in composite systems the advantage compounds over time, so our advantage today is greater than three or four years ago because we are improving every component. Think of it as a business case study: Intel, relative to where you are now, once had a dominant moat, a dominant position in the stack. Perhaps, to put it simply, compare your competitive advantage with theirs at the peak of their cycle.

Huang Renxun:

What set Intel apart is that they were perhaps the first company extraordinarily good at manufacturing: process engineering and fabrication. Making chips, designing chips in the x86 architecture, building ever-faster x86 chips: that is where their talent lay, and they fused it with manufacturing.

Our company is a bit different. We recognized that parallel processing does not require every transistor to be excellent, whereas serial processing does. Parallel processing wants a huge number of transistors to be cost-effective. I would rather have ten times more transistors that are 20% slower than ten times fewer transistors that are 20% faster. Does that make sense? They want the opposite. So single-threaded performance and parallel processing are fundamentally different, and our world, parallel computing, just keeps advancing relentlessly.

Parallel computing, parallel processing, is hard because each algorithm requires a different way of restructuring and re-architecting. What people don't realize is that you can have three different CPUs, each with its own C compiler, and compile the same software to each of them.

That is not possible in accelerated computing. A company proposing an architecture has to come up with its own "OpenGL." That is how we revolutionized deep learning: with our domain-specific library called cuDNN (the deep neural network library), another domain-specific library for optics, and a domain-specific library called cuQuantum.
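A hedged illustration of the contrast Huang draws: the CPU path below is portable because any standard toolchain or NumPy build can run it, while the GPU path works only because a domain library supplies the kernels. CuPy (a community CUDA array library backed by cuBLAS) stands in here for NVIDIA's domain-specific libraries; it is an editorial example, not one Huang names.

```python
import numpy as np

a = np.random.rand(1024, 1024).astype(np.float32)
b = np.random.rand(1024, 1024).astype(np.float32)
c_cpu = a @ b                                  # portable: any CPU, any compiler

try:
    import cupy as cp                          # the domain library ships the GPU kernels
    c_gpu = cp.asarray(a) @ cp.asarray(b)      # dispatches to cuBLAS under the hood
    print(np.allclose(c_cpu, cp.asnumpy(c_gpu), atol=1e-3))
except ImportError:
    print("CuPy not installed; the accelerated path needs its domain library")
```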

Brad Gerstner:

Those domain-specific algorithm libraries sit below the PyTorch layer that everyone focuses on. That's what I often hear.

Huang Renxun:

If we didn't invent these, none of the applications above could work. Do you understand what I mean? Algorithms are what NVIDIA truly excels at: the science that sits on top of the underlying architecture is what we excel at.

NVIDIA is building a complete AI computing platform, including hardware, software, and ecosystem.

Clark Tang:

Now all the attention is on inference. But I remember, two years ago when Brad and I had dinner with you, I asked you a question: do you think your moat in inference will be as strong as your moat in training?

Huang Renxun:

I'm not sure if I said it would be stronger.

Clark Tang:

You just mentioned many of these elements: the composability between the two, even if we don't know the overall combination yet. For customers, maintaining flexibility across training and inference is very important. But now that we are entering the era of inference, can you talk about that?

Huang Renxun:

Think of training as inference at enormous scale. What I mean is, you are correct: if you train well, there is a good chance you will inference well, and if you built it on this architecture, it will run on this architecture without further work. You can still optimize it for other architectures, but at the very least, because it was built on the NVIDIA architecture, it will run on NVIDIA.

On the other hand, there is also a capital-investment dynamic: when you train new models, you want to train on your best new equipment. That leaves behind the equipment you used yesterday, which is perfectly suited for inference. So trailing behind the new infrastructure is a wake of free, fully compatible equipment. We are rigorous about always remaining compatible, so everything we leave behind continues to excel.

We also put a lot of effort into continually reinventing new algorithms, so that when the time comes, the Hopper architecture is two, three, four times better than when it was purchased and the infrastructure remains genuinely effective. So all the work we do on improving new algorithms and new frameworks benefits every part of our installed base: Hopper gets better from it, Ampere gets better from it, even Volta gets better from it.

I just heard from Sam Altman that OpenAI recently decommissioned its Volta infrastructure. So we leave behind this trail of installed base, just as all computing installed bases matter. NVIDIA is in every cloud, on premises, and out at the edge.

The VILA visual language model was created in the cloud and, with no modification, runs perfectly on a robotics edge device. They are fully compatible. So I think architectural compatibility is very important for large systems, just as it is for the iPhone and other devices. I believe the installed base is crucial for inference.

Huang Renxun:

But what really benefits us is that we are training these large language models on new architectures, which lets us think about how to create architectures that will excel at inference when the time comes. So we have been thinking about iterative models for reasoning, and about how to create a very interactive inference experience for your personal agent: you don't want it to go off and think for a while after you finish speaking; you want it to interact with you quickly. So how do we create something like that?

This way we can use these systems, which are excellent for training, and when you are done, the inference performance will also be excellent. So you want to optimize for time to first token, and achieving a low time to first token is actually very difficult because it requires a lot of bandwidth. And if your context is rich as well, then you need a lot of FLOPS. So you need an unlimited amount of bandwidth and an unlimited amount of FLOPS at the same time to achieve a response within a few milliseconds. That kind of architecture is really hard to build, and we invented Grace Blackwell and NVLink for it.
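A back-of-envelope sketch of the bandwidth-versus-FLOPS tension behind time to first token. All numbers are hypothetical placeholders, not NVIDIA or Blackwell specifications:

```python
# Time to first token (TTFT), crudely: the prefill compute and the weight
# streaming both have to finish before the first token can come out.
n_params = 70e9            # assumed 70B-parameter model
bytes_per_param = 2        # fp16 weights
context_tokens = 32_000    # a rich context, as Huang describes

prefill_flops = 2 * n_params * context_tokens   # ~2 FLOPs per parameter per token
gpu_flops = 1e15           # assumed ~1 PFLOP/s usable tensor throughput
gpu_bandwidth = 8e12       # assumed ~8 TB/s aggregate memory bandwidth

compute_s = prefill_flops / gpu_flops                    # FLOPS-bound term
stream_s = n_params * bytes_per_param / gpu_bandwidth    # bandwidth-bound term
print(f"compute {compute_s*1e3:.0f} ms, streaming {stream_s*1e3:.0f} ms, "
      f"TTFT >= {max(compute_s, stream_s)*1e3:.0f} ms")
```

Either term can dominate, which is his point: millisecond-class responses over rich contexts demand enormous amounts of both bandwidth and FLOPS at once.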

Brad Gerstner:

Earlier this week, I had dinner with Andy Jassy (Amazon's President and CEO), and Andy said: we have Trainium, and Inferentia is coming. Most people once again view these as a problem for Nvidia, but then he said Nvidia is our important partner and will continue to be our important partner; as far as I can see, the world will run on Nvidia.

So when you think about the custom ASICs being built, they will be used for target applications. Maybe Meta's inference accelerators, maybe Amazon's training, or Google's TPU. And then you think about the supply shortages you are facing today, will these factors change this dynamic? Or will they complement the systems they buy from you?

Huang Renxun:

We are simply doing different things. Yes, we are trying to accomplish different things. Nvidia is trying to build a computing platform for this new world: this machine-learning world, this generative-AI world, this agentic-AI world. What we are creating in computing is profound: after 60 years of development, we have reinvented the entire computing stack, from programming to machine learning, from CPUs to GPUs, from software to AI, from software tools to AI tools. Every layer of the computing and technology stack has changed.

What we want to build is a computing platform that is available everywhere. That is the real complexity of our work: if you look carefully at what we do, you will see that we are building the entire AI infrastructure, and we think of it as one computer. I have said before that the data center is now the unit of computing. When I think of a computer, I am not thinking of a chip; I am thinking of this whole thing, all the software, all the orchestration, all the machinery inside. That is my mental model. That is my computer.

Every year we try to build a new one. Yes, it's crazy; no one has ever done this before. Every year we try to build a completely new one, and every year we deliver two to three times the performance. So every year we cut costs by a factor of two to three and improve energy efficiency by a factor of two to three. That is why we tell customers not to buy everything at once but to buy a little every year, right? That way their costs average out into the future. And everything is architecturally compatible, so building these things piecemeal at our current pace would be impossibly difficult.

What makes it doubly hard is that we then take all of this and, instead of selling it as infrastructure or as a service, we disaggregate it. We integrate it into GCP, AWS, Azure, and X, and everyone's integration is different. We have to integrate all of our architecture libraries, all our algorithms, and all our frameworks into their frameworks; our security systems into their systems; our networking into their systems, right? And then we basically do ten integrations, and we do this every single year. That's a miracle.

Brad Gerstner:

I mean, you try to do this every year; it's crazy. So what drives you to keep doing it every year?

Huang Renxun:

Yes. When you systematically break it down, the more you break it down, the more astonished everyone is: how the entire electronics ecosystem today can be committed to working with us, ultimately building this computer that integrates into all these different ecosystems, with such seamless coordination. What we propagate backward are APIs, methodologies, business processes, and design rules, and what we propagate forward are methodologies, architectures, and APIs.

Brad Gerstner:

That's how they were.

Huang Renxun:

For decades they have worked on it, yes, and they keep developing along with our progress. But those APIs all have to be integrated together.

Clark Tang:

Some people just need to call the OpenAI API, and it works. That's it.

Huang Renxun:

Yes. Yes, it's a bit crazy. This is the whole thing: we invented this immense computing infrastructure, and the entire planet is collaborating with us. It integrates anywhere: you can sell it through Dell, you can sell it through HP; it's hosted in the cloud; it's everywhere. People now use it in robotic systems, in humanoid robots, in self-driving cars. They are all architecturally compatible. Pretty crazy.

Brad Gerstner:

This is too crazy.

Huang Renxun:

I don't want to leave you with the impression that I didn't answer the question. In fact, I did. When we talk about the layers of the foundation, what I mean is the way we think about it: we simply do things differently. As a company we want to be aware of our surroundings, and I am highly aware of everything around the company and the ecosystem, right?

I know everyone is doing other things, what they are doing. Sometimes this is not favorable to us, sometimes it is. I am very clear about this, but it doesn't change the company's goal. Yes, the company's sole goal is to build a platform architecture that can be ubiquitous. That's our goal.

We do not try to take share from anyone. NVIDIA is a market maker, not a share taker. If you looked through our company's slides, you would find that not one of them talks about market share, not even internally. What we talk about is: how do we create the next thing?

What is the next problem we can solve in this flywheel? How can we serve people better? How can we take a flywheel that used to take about a year and shorten it to about a month? Yes. What is the speed of light here, right?

So we think about all these different things, and while we don't have all the details on everything, we are certain that our mission is very unique. The only question is whether the mission is necessary. Does that make sense? All companies, all great companies, should have that at their core: what are you actually doing, right?

Of course. The only questions are: is it necessary? Is it valuable? Is it impactful? Does it help people? Imagine you are a developer, a generative-AI startup, deciding how to build your company.

One choice you never have to make is which ASIC to support. If you build on CUDA, you can go anywhere; you can always change your mind later. We are the on-ramp to the world of AI, aren't we?

Once you decide to join our platform, you can defer every other decision. You can always build your own ASIC later; we don't oppose that, and we are not offended by it. When I work with all the GCPs and Azures of the world, we show them our roadmap years in advance.

They don't show us their ASIC roadmaps, and that has never offended us. Does that make sense? If you have a singular purpose, if your mission is meaningful and valuable to you and others, then you can be transparent. Note that my roadmap is public at GTC; my roadmap goes deeper for our friends at Azure, AWS, and elsewhere. We have no problem doing any of this, even as they build their own chips.

Brad Gerstner:

I think when people look at the business: you recently said demand for Blackwell is insane, and that one of the hardest parts of your job is emotionally telling people "no" in a world that lacks the computing you could produce and deliver. But critics say: hold on, this is like Cisco in 2000, overbuilding fiber; it will be a boom-and-bust cycle. I think back to the dinner we had at the beginning of 2023. At that dinner in January 2023, the forecast for Nvidia was $26 billion of revenue in 2023. You did $60 billion.

Huang Renxun:

Let's face the facts: that was the greatest forecasting failure in world history. Yes. At least we can admit that.

GPUs are playing an increasingly important role in AI computing.

Brad Gerstner:

That's right. We were very excited in November 2022 because we had people like Mustafa from Inflection and Noam from Character coming to our office to talk about investing in their companies. They said: if you can't invest in our company, buy Nvidia, because everyone in the world is trying to get Nvidia chips to build these world-changing applications. And of course the Cambrian moment arrived with ChatGPT. Even so, the twenty-five analysts covering the stock were so focused on the crypto winter that they couldn't imagine what was happening in the world. So in the end the scale turned out far larger. In very plain English: demand for Blackwell is insane, and it will stay that way as far as you can see. Of course the future is unknown and unpredictable. But why are the critics wrong who say this is Cisco-in-2000-style overbuilding?

Huang Renxun:

The best way to think about the future is from first principles, right? Okay, so what are the first principles of what we are doing? First, what are we doing? We are reinventing computing, aren't we? As we just discussed, the future of computing will be heavily machine-learned. Almost everything we do, almost every application, Word, Excel, PowerPoint, Photoshop, Premiere, AutoCAD, your favorite applications, was hand-engineered. I assure you that in the future these will be heavily machine-learned, right? All these tools will be like that, and on top of it you will have machines, agents, helping you use them. So now we know this is true; we have reinvented computing and we are not turning back. The entire computing technology stack is being reinvented. Okay. Given that, the software will be different, what software can be written will be different, and the way we use software will be different. So let's take those as the established facts.

The question now is: what happens next? Look back at the installed base: a trillion dollars' worth of computing built in the past. Just open the door and look at the data centers. Are those computers the future you want? The answer is no. All those CPUs: we know what they can and cannot do. What we do know is that we have a trillion dollars of data centers that need modernizing. So right now, as we speak, the idea that we modernize all of that old gear over the next four to five years is not unreasonable.

So that's trend one: the people who need to modernize are modernizing, and they are modernizing on GPUs. That's it.

Let's run the test. You have $50 billion of capital expenditure. Option A or option B: do you build capex for the future, right?

Or do you build capex like the past? You already have the past's capex, right? Yes, correct. It's already sitting there, and it isn't getting much better anyway. Moore's Law has basically ended. So why rebuild it?

Just take the $50 billion and put it into generative AI, right? Now your company becomes better. How much of the $50 billion would you put in? Well, I would put in 100% of the $50 billion, because I already have four years of the old infrastructure behind me.

So that is how someone reasoning from first principles acts, and that is what they are doing. Smart people are doing the smart thing. Now, the second part is this: we have a trillion dollars of capacity to go and build.

A trillion dollars of infrastructure, at maybe $150 billion a year. Okay. So we have a trillion dollars of infrastructure to build over the next four to five years. And the second thing we observe is that not only is software written differently, it is used differently.

In the future, we will have agents. Our company will have digital employees. In your inbox today you see little icons with people's faces; in the future you will see little icons of AIs, right? And I will be sending work to them.

I no longer program computers in C++; I program AI with prompts. Right? And this is no different from how I worked with my team this morning.

Before coming here, I wrote a lot of emails. Of course, I was guiding my team. I would describe the background, describe the basic constraints I knew, describe their tasks. I would leave enough space, give enough direction for them to understand what I need. I would try to explain as clearly as possible what the results should be, but I left enough room for ambiguity, a little creative space, so they could surprise me.

Right? This is no different from the way I prompt AI today. Yes, that is exactly how I prompt AI. So on top of the modernized infrastructure there will be a new layer of infrastructure: AI factories that operate these digital beings around the clock.

We will run them for companies all over the world; we will have them in factories and in autonomous systems, right? So there is a whole layer of computing fabric, a whole layer I call the AI factory: a world that must be manufactured but that does not exist at all today.

So the question is: how big is this? Unknown for now, probably trillions of dollars. But the beautiful thing, as we sit here building it, is that the architecture for modernizing these data centers is the same as the architecture of the AI factory. That's a good thing.

Brad Gerstner:

To be clear: you have a trillion dollars of old gear that must be modernized, and at least a trillion dollars of new AI workloads coming. Your revenue this year will reach about $125 billion. People once told you this company's market cap would never exceed a trillion dollars. Sitting here today, if you have only $125 billion of a multi-trillion-dollar TAM, is there any reason your future revenue won't be two or three times what it is now? Is there any reason it won't grow? No.

Huang Renxun:

As you know, none of that is guaranteed: a company is limited only by the size of its pond; a goldfish can only be as big as its bowl. So the question is: what is our pond? That takes a lot of imagination, which is why market makers think about the future and create the new pond. It is hard to see any of this by looking backward and trying to grab share. Share takers can only get so big, of course. Market makers can be very big indeed.

So I think the good fortune of our company is that from the very beginning we had to create the market we would swim in. People didn't realize it at the time, but now they do: we were at ground zero of creating the 3D gaming PC market. We essentially invented this market, along with the whole ecosystem, the graphics card ecosystem; we invented it all. So inventing a new market in order to then serve it is something very familiar and comfortable for us.

Huang Renxun: I am happy about OpenAI's success

Brad Gerstner:

As is well known, OpenAI raised $6.5 billion with a valuation of $150 billion this week. We all participated.

Huang Renxun:

Yes, I am truly happy for them, really happy that they have come together. Yes, they have done something great, and the team has also performed well.

Brad Gerstner:

Reportedly, their revenue, or revenue run rate, will reach around $5 billion this year, possibly $10 billion next year. If you look at the business today, its revenue is approximately double Google's at the time of its IPO. They have 250 million, yes, an average of 250 million weekly users, which we estimate is double what Google had at its IPO. If you believe they will reach $10 billion next year, the company is valued at roughly 15 times expected revenue, similar to the multiples of Google and Meta at their IPOs. And this is a company that had zero revenue and zero weekly average users 22 months ago.

Tell us about the importance of OpenAI as a partner for you, and the power of OpenAI in driving public awareness and use of AI.

Huang Renxun:

Well, this is one of the most consequential companies of our time, a pure AI company pursuing the vision of AGI. How AGI is defined hardly matters to me; I genuinely don't think the definition or the timing is the important thing. What I do know is that AI will keep gaining capabilities over time, and that capability roadmap will be spectacular and fascinating. Along the way, well before it reaches anyone's definition of AGI, we will put it to full use.

All you have to do, right now as we speak, is talk to digital biologists, climate-tech researchers, materials researchers, physicists, astrophysicists, quantum chemists. Talk to video game designers, manufacturing engineers, robotics experts. Pick your favorite. Whatever industry you choose, dig into it, talk to the people who matter, and ask them: has AI fundamentally changed the way you work? Collect those data points and then ask yourself how skeptical you still want to be. Because they are not talking about AI's conceptual benefits someday; they are talking about using AI right now. Ag tech, materials tech, climate tech: pick your tech, pick your field of science. They are advancing, and AI is helping them advance their work.

Now, as we said: every industry, every company, every university. Incredible, right? Absolutely. AI is going to change business in some way. We know this. I mean, we know how real it is.

It is happening today. So I think the awakening triggered by ChatGPT is absolutely incredible. And I admire their speed, and the singular goal with which they are driving this field forward; it really is important.

Brad Gerstner:

They built an economic engine that can fund the next frontier of models. I think a consensus is forming in Silicon Valley that the model layer itself is commoditizing: Llama lets many people build models at very low cost. So in the early days we had many model companies: Character, Inflection, Cohere, and all the rest.

Many question whether these companies can reach escape velocity on the economic engine needed to keep funding the next generation. My own feeling is that this is why you're seeing consolidation. OpenAI clearly achieved escape velocity; they can finance their own future. I am not sure many other companies can. Is that a fair assessment of the state of the model layer: that, as in many other markets, it will consolidate toward market leaders who can afford the economic engines and the applications that keep the investment going?

Merely having a powerful GPU does not guarantee a company's success in the AI field

Huang Renxun:

First, there is a fundamental distinction between a model and AI. Yes. A model is an essential ingredient of AI: necessary, but not sufficient. AI is a capability, but applied to what? What is the application? The AI for a self-driving car is related to the AI for a humanoid robot, but they are not the same; and both are related to the AI of a chatbot, but again not the same.

So you have to understand taxonomy. Yes, the taxonomy of the stack. At every layer of the stack, there will be opportunities, but not every layer of the stack provides limitless opportunities for everyone.

Now take what I just said and replace the word "model" with "GPU." In fact, this was our company's great observation 32 years ago: there is a fundamental difference between a GPU, a graphics chip, and accelerated computing. And accelerated computing is different again from the work we do in AI infrastructure. They are related, but not entirely the same; they are layered on top of one another, and each of these layers requires completely different skills.

People who are truly good at building GPUs do not automatically know how to become an accelerated computing company. Many people build GPUs: we invented the GPU, but, you know, we are not the only company making GPUs today, right? GPUs are everywhere, but that does not make them accelerated computing companies. And many build accelerators that speed up one application, which is different again from being an accelerated computing company. A very specialized AI accelerator, for example, can be a very successful thing, right?

Brad Gerstner:

That's MTIA (Meta's next-generation AI accelerator chip).

Huang Renxun:

Yes. But it may never be the kind of company with broad impact and capability. So you have to decide what you want to be. There may be opportunities in all these different areas, but, as with building any company, you have to watch how the ecosystem shifts and what gets commoditized over time, to understand what is a feature, what is a product, and what is a company. Okay. What I just said: you can think about it in many different ways.

xAI and the Memphis supercluster have entered the era of 200,000 to 300,000 GPU clusters

Brad Gerstner:

Of course, there is a new entrant that is rich, smart, and ambitious: xAI. Yes, indeed. And there are reports that you, Larry Ellison (Oracle's founder), and Musk had dinner together, and that over dinner they talked you out of 100,000 H100 chips. They then went to Memphis and, within a few months, built a large coherent supercluster.

Huang Renxun:

Three separate things; don't conflate them, okay? Yes, I had dinner with them.

Brad Gerstner:

Do you think they have the ability to build this supercluster? There are rumors they want another hundred thousand H200s, right, to expand its scale. First, talk about X, their ambitions and what they've achieved. But also: have we already reached the era of clusters of 200,000 to 300,000 GPUs?

Huang Renxun:

The answer is yes. But first, give credit where it's due: from the moment of concept to a data center ready for Nvidia to install our equipment, to us powering it up, connecting it all, and running the first training job, what they did deserves recognition.

Huang Renxun:

Okay. So the first part is to build a huge factory in such a short time, with water cooling, electricity supply, permits obtained - what I mean is, it's like Superman. Yes, as far as I know, there's only one person in the world who can do this. I mean, Musk's understanding of large system engineering, construction, and resource allocation is unparalleled. Yes, it's truly incredible. Of course, his engineering team is also excellent. I mean, the software team is great, the network team is great, the infrastructure team is great. Musk has a deep understanding of this.

From the moment we decided to go, our engineering teams, networking teams, infrastructure computing teams, and software teams planned everything with them in advance. Then all the infrastructure, all the logistics, the sheer amount of technology and equipment brought in on day one, the networking infrastructure, computing infrastructure, and all the technology needed for training: all of it was stood up in 19 days. Do you know what that means? Done.

Step back: do you know how long 19 days is? It's just a few weeks, right? And if you saw it in person, the amount of technology is unbelievable. All the cabling and networking: NVIDIA equipment is networked very differently from a hyperscale data center. Think how many cables one node needs; the back of the computer is nothing but cables. Integrating that mountain of technology and all the software in that time is simply incredible.

So, I think what Musk and the X team have done, I am very grateful for him acknowledging the engineering work, planning, and so on that we did with him. But what they have achieved is unique, never seen before. Just from this perspective. One hundred thousand GPUs, as a cluster, could easily become the fastest supercomputer on Earth. Supercomputers you build typically require three years of planning. Then they deliver the equipment, and it takes a year to get them all up and running. Yes, we're talking about 19 days.

Clark Tang:

What credit does NVIDIA deserve?

Huang Renxun:

Everything just worked. Yes, of course there is a mountain of X's own algorithms, frameworks, and stack on top, and we had a lot of integration to do in return, but the planning was superb. The pre-planning alone.

Large-scale distributed computing is an important direction for the future development of AI.

Brad Gerstner:

An n of one is right; Musk is an n of one. But when you answered, you started by saying that clusters of 200,000 to 300,000 GPUs are here. Yes, correct. Can that scale to 500,000? Can it scale to a million? And does demand for your product depend on it scaling to 2 million?

Huang Renxun:

The answer to the last part is no. My feeling is that distributed training has to work, that distributed computing will be invented, and that some form of federated learning and asynchronous distributed computing will be discovered.

I am very enthusiastic and optimistic about this. Of course, be aware that the scaling law used to be about pre-training. Now we have moved to multimodality and to synthetic data generation; post-training has now scaled incredibly, with synthetic data generation, reward systems, and reinforcement learning. And now inference-time scaling has taken off: a model might go through an incredible 10,000 internal reasoning steps before giving you an answer.

That may not be unreasonable. It may have done tree search; it may have done reinforcement learning on top of that; it may have run some simulations, certainly a lot of reflection; it may have looked up some data and checked some information, right? So its context may be quite large. I mean, that kind of intelligence: well, that is what we do. That is what we do, right? So for capability, that kind of scaling, just do the math and compound it with model size and compute quadrupling every year.
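One simple instance of the inference-time scaling described here is best-of-N sampling against a verifier: spend more runtime compute by drawing more candidate reasoning chains and keeping the best-scored one. A minimal sketch; generate() and score() are hypothetical stand-ins for a model and a learned verifier:

```python
def generate(prompt: str, sample_id: int) -> str:
    # stand-in for sampling one reasoning chain from a model
    return f"candidate {sample_id}: reasoning about '{prompt}'"

def score(answer: str) -> float:
    # stand-in for a learned verifier / reward model
    return (hash(answer) % 1000) / 1000.0

def best_of_n(prompt: str, n: int = 16) -> str:
    # more samples = more inference-time compute = a better pick, on average
    candidates = [generate(prompt, i) for i in range(n)]
    return max(candidates, key=score)

print(best_of_n("plan the data center power layout", n=64))
```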

On the other side, demand keeps growing with usage. Do we think we will need millions of GPUs? No doubt about it. Yes, that is now certain. So the question is how we build it from a data-center perspective: is it a few gigawatts at a time, or 250 megawatts at a time? My feeling is that you will get both.

Clark Tang:

I always think analysts focus on the current architectural bet, but one of my biggest takeaways from this conversation is that you are thinking about the entire ecosystem many years out. So NVIDIA's scaling up is about meeting future demand; it does not mean you depend on a world of half-million or even million-GPU clusters. When distributed training arrives, you will have written the software to enable it.

Huang Renxun:

We developed Megatron seven years ago. Yes, this scaling of giant training jobs was going to happen, so we invented Megatron. All the model parallelism going on, all the breakthroughs in distributed training, all the batching and everything else exist because we did that early work, and now we are doing the early work for the next generation.

AI is changing the way we work.

Brad Gerstner:

So let's talk about Strawberry and o1. I think it's cool that it is named o1, after the O-1 visa, which is about recruiting the best and brightest in the world and bringing them to the USA. I know we are both passionate about this. So I love the idea of a model of thought that takes us to the next level of scaling intelligence: a tribute to the people who came to the USA through immigration and, bringing their collective genius to America, made us what we are today.

Huang Renxun:

Of course. There's also extraterrestrial intelligence.

Brad Gerstner:

Of course. This is led by our friend Noam Brown. How important is inference-time reasoning as a new vector for scaling intelligence, separate from simply building larger models?

Huang Renxun:

This is a big deal. This is a big deal. I think a lot of intelligence cannot be done a priori, right? And a lot of computation cannot be reordered up front; much of it can only be done at runtime.

So, whether you are thinking from a computer science perspective or an intelligent perspective, too many things require context. Environment, right. And the quality, the type of answer you are looking for. Sometimes, a quick answer is enough. It depends on the consequences and impact of the answer. It depends on the nature of the answer's utility. So, some answers, take a night, some answers need a week.

Yes, right? So I can totally imagine sending a prompt to my AI and telling it: think about this overnight. Don't answer me immediately; think all night, and tomorrow tell me your best answer and your reasoning. So I think that from a product perspective, intelligence will be segmented by quality: there will be one-shot versions, and of course some that take five minutes.

Right? And humans will direct all of this. If you will, we become a huge workforce: some of it digital humans in AI, some of it biological humans, and I hope some of it even robots.

Brad Gerstner:

From a business perspective, this is seriously misunderstood. You just described a company whose output equals that of a 150,000-person workforce, achieved with only 50,000 people. That's right. And you did not say "I want to fire all my employees." No: you are still growing headcount, but the organization's output will increase dramatically.

Huang Renxun:

This is often misunderstood. AI will not change every job, but it will have a huge impact on how people work. Let's admit that. AI has the potential to do incredible good, and it has the potential to cause harm, so we must build safe AI. Yes, let's lay that as the foundation. Good.

Huang Renxun:

The part that people overlook is that when companies use AI to improve productivity, it is likely to result in better earnings or better growth, or both. When this happens, the next email from the CEO is most likely not about layoffs.

Brad Gerstner:

Of course, it's a hiring announcement, because you're growing.

Huang Renxun:

The reason is that we have more ideas we can explore, and we need people to help us think them through before we automate them. The automation part, AI can do for us. Obviously it will help us think as well, but we still have to figure out which problems to solve. There are tens of trillions of problems we could work on, so the company must pick which problems to solve, choose those ideas, and find ways to automate and scale them. As we become more productive, we will therefore hire more people. People forget this: going back in time, we obviously have more ideas today than 200 years ago. That is why GDP is larger and there are more jobs, even while we automated furiously underneath.

Brad Gerstner:

This is a very important point about this period: we are entering an era in which almost all human productivity, almost all human prosperity, is a byproduct of the automation of the past 200 years of technology. You can look at Adam Smith and Joseph Schumpeter's creative destruction, look at the chart of per-capita GDP growth over those 200 years, and see that it is now accelerating.

Yes, and that raises this question. If you look at the 1990s, US productivity growth was about 2.5% to 3% a year, right? Then in the 2010s it slowed to about 1.8%, and the most recent decade has been the slowest on record for productivity growth, meaning our output for a fixed amount of labor and capital.

Many people debate the reasons. But if the world is really as you describe, if we are harnessing and manufacturing intelligence, are we on the verge of a dramatic expansion in human productivity?

Huang Renxun:

This is our hope. This is our hope. Of course, we live in this world, so we have direct evidence.

We have direct evidence, whether isolated cases or individual researchers, of people using AI to explore science at previously unimaginable scale. That is productivity, one hundred percent. Or: we are designing chips of incredible complexity at incredible speed. The complexity of the chips and computers we build is growing exponentially while the company's headcount is not; that is another measure of productivity, right?

The software we develop keeps getting better because we use AI and supercomputers to help us build it, while headcount grows only roughly linearly. Another manifestation of productivity.

So I can dig into it: I can sample and spot-check many different industries, and I can inspect our own business firsthand. Yes, you're right. The business. Exactly.

Of course, we may be overfitting. But the art is to generalize from what we observe and to ask whether it will show up in other industries.

Without a doubt, intelligence is the most valuable commodity the world has ever known, and now we are going to manufacture it at scale. All of us have to get good at what happens when you are surrounded by AIs that do things extremely well, far better than you can. When I look back, that is the story of my life. I have 60 direct reports.

They are world-class in their fields, and they do what they do better than I do, much better. I have no difficulty interacting with them, no difficulty "prompting" them, no difficulty "programming" them. So I think what people will learn is that they are all going to become CEOs.

They are all going to become CEOs of AI agents. The ability to be creative, some domain knowledge, and knowing how to reason and break problems down: with that, you can program these AIs to help you achieve goals the way I do. That is what running a company is.

AI safety requires effort from many parties

Brad Gerstner:

Now, you mentioned something earlier: coordination and safe AI. You mentioned the tragedy unfolding in the Middle East. There is a lot of autonomy, and a lot of AI, being used around the world. So let's talk about bad actors, safe AI, and coordination with Washington. How do you feel today? Are we on the right path? Do we have a sufficient level of coordination? I think Mark Zuckerberg once said that the way to defeat bad AI is to make good AI better. How would you describe your view of how we ensure a net positive for humanity rather than ending up in a dystopian world?

Huang Renxun:

The conversation about safety is really important, and good. The abstract view, the conceptual view of AI as one giant neural network, is less good, right? And the reason, as we both know, is that AI and large language models are related but not the same thing. I think many things are being done very well. First, open-sourcing models so that the entire research community, every industry, and every company can engage with AI and learn how to apply this capability. Very good.

Second, people underestimate how much technology is being invented specifically to make AI safe. AIs are built to curate data, to carry information, to train other AIs. AIs are created to align AI, to generate synthetic data that expands AI's knowledge and reduces its hallucinations. AIs are being built to vectorize, graph, and otherwise index information, to guard AI, to monitor other AI systems. This whole fleet of AIs being created to build safe AI goes uncelebrated, right?

Brad Gerstner:

So much has already been built.

Huang Renxun:

Right, we are building all of it. Across the entire industry: methodologies, red teams, process, model cards, evaluation systems, benchmarking systems. All of it, all being built at an incredible pace. And it goes uncelebrated, you understand? Yes, you know.

Brad Gerstner:

And no government regulation said you had to do this. Today the players building AI in this field are taking these critical issues seriously and coordinating around best practices. Exactly.

Huang Renxun:

So this is not fully appreciated, nor fully understood. Someone needs to, everyone needs to, start talking about AI as an AI system: an engineered system, carefully designed, built from first principles, rigorously tested, and so on. Remember, AI is a capability that can be applied. I don't think we should skip regulating important technologies, but we also should not over-regulate: most regulation should happen at the level of applications. All the different bodies that already regulate applications of technology must now regulate applications of technology that incorporate AI.

So I think: don't misunderstand, and don't overlook the enormous amount of AI regulation that will have to be enacted around the world, application by application. Don't count on a single universal, galactic AI council to do it. All these different agencies were established for a reason; all these different regulatory bodies exist for a reason. I would go back to first principles.

The opposition between open source and closed source is wrong

Brad Gerstner:

You have launched a very important, very large, and very powerful open-source model.

Huang Renxun:

Nemotron.

Brad Gerstner:

Yes, and Meta has clearly made significant contributions to open source. When I read Twitter, I see endless debate about open versus closed. How do you think about open source, about your own open-source models, and about whether they can keep up with the frontier? That's the first question. The second is: do you see a future where open-source and closed-source models both power businesses, and do the two create a healthy tension, including for safety?

Huang Renxun:

Open source versus closed source is related to safety, but it is not only about safety. For example, there is absolutely nothing wrong with closed-source models: they are the engine of the economic model needed to sustain innovation, and I fully embrace that. What I think is wrong is setting closed against open.

Because openness is a necessary condition for many industries to be activated at all. Right now, if we did not have open source, how could all these different scientific fields activate themselves with AI? They must develop AI specific to their own domains, and they use open-source models to create that domain-specific AI. The two are related but, I repeat, not the same. Having an open-source model does not mean you have an AI; you use that open-source model to create AI. So financial services, healthcare, transportation, so many industries and scientific fields, have been enabled because of open source.

Brad Gerstner:

Unbelievable. Have you seen a huge demand for your open-source models?

Huang Renxun:

Our open-source models? First, look at the Llama downloads. Obviously, the work Mark and his team have done is amazing, beyond imagination. It has fully activated and engaged every industry and every scientific field.

Sure. The reason we built Nemotron is synthetic data generation. Intuitively, the idea of one AI sitting in a loop generating data to teach itself sounds fragile; how many times you can go around that infinite loop is questionable. The picture in my mind is like locking a super-intelligent person in a padded room and closing the door for a month: what comes out is probably not a smarter person. But you can have two or three people sit together. We have different AIs with different distributions of knowledge that can QA each other back and forth, and all three come out smarter.

So the idea that you can have AI models exchanging, interacting, passing things back and forth, debating, doing reinforcement learning and synthetic data generation and so on, makes intuitive sense. And our Nemotron 340B model is the best reward model in the world. It is the best critic.

Interesting. It is a wonderful model for enhancing other models. No matter how good someone else's model is, I would recommend using Nemotron 340B to enhance and improve it. And we have seen it make Llama better, and make all the other models better.
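In outline, the loop Huang describes, where one model drafts, a reward model critiques, and only high-scoring samples are kept as synthetic training data, might look like the following sketch. All functions are hypothetical stand-ins; Nemotron 340B is the real-world example he names for the critic role:

```python
def draft(prompt: str, k: int) -> list[str]:
    # stand-in for a generator model proposing k candidate samples
    return [f"draft {i} for: {prompt}" for i in range(k)]

def reward(sample: str) -> float:
    # stand-in for a reward model scoring each candidate
    # (e.g. a critic like Nemotron 340B in Huang's example)
    return (hash(sample) % 1000) / 1000.0

def synthesize(prompts: list[str], k: int = 8, threshold: float = 0.7) -> list[str]:
    # keep only candidates the critic scores highly; these become training data
    return [s for p in prompts for s in draft(p, k) if reward(s) >= threshold]

corpus = synthesize(["explain NVLink", "summarize Amdahl's Law"])
print(len(corpus), "high-scoring synthetic samples kept")
```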

Brad Gerstner:

As someone who delivered the first DGX-1 in 2016, yours has truly been an incredible journey, improbable and incredible at the same time. It's remarkable that you survived the early days at all. You delivered the first DGX-1 in 2016, and by 2022 we had reached the Cambrian moment.

So let me ask you a question I often wonder about – how long can you continue your current job with 60 direct reports? You are everywhere. You are driving this revolution. Are you having fun? Is there something else you would rather be doing?

Huang Renxun:

If that question is about the past hour and a half, the answer is: I really enjoyed it. A great time. I can't imagine anything I would rather be doing. But let's see; I don't think we should leave people with the impression that our work is always fun. My work is not always fun, and I don't expect it to be. Did I ever expect it to be fun all the time? I think it has always been important.

Yes, I don't take myself too seriously. I take my job very seriously. I take our responsibilities very seriously. I take our contributions and our moments very seriously.

Is it always fun? No. But have I always enjoyed it? Yes. Like all things, whether it's family, friends, or kids. Is it always fun? No. Do we always love it? Absolutely.

So how long can I keep doing this? The real question is: how long can I stay relevant? That is what matters most, and the answer to that question is simply: how will I keep learning? Today I am more optimistic, and I'm not saying this just because of our topic today. I am more optimistic about my relevance and my ability to keep learning because of AI. I use it every day; I don't know, but I believe you all do too. I use it almost every day.

There isn't a single piece of research I do that doesn't involve AI. Not a single question; even when I know the answer, I double-check it with AI. And surprisingly, the next two or three questions I ask reveal things I didn't know. Pick your topic, whatever topic you choose. I think AI is a tutor.

AI is an assistant; AI is a partner that can brainstorm with me and check my work. Guys, this is completely revolutionary. I am an information worker; my output is information. So AI's contribution to society is remarkable. And if that is so, if I can stay relevant this way and keep contributing, then I know this work is important enough that I want to keep pursuing it. My quality of life is incredible. So I will.

Brad Gerstner:

You and I have worked in this field for decades, and I can't imagine missing this moment. It is the most important moment of our careers. We are deeply grateful for this partnership.

Huang Renxun:

Looking forward to the next ten years.

Brad Gerstner:

A thought partnership. Yes, you make us smarter. Thank you. I think your leadership matters enormously in guiding all of this forward optimistically and safely. So thank you.

Huang Renxun:

It has been a real pleasure being with you all. Really. Thank you.
