
Zuckerberg's latest interview: Why Llama 3, Meta's strongest open-source model, is worth $10 billion

Tencent Technology ·  Apr 19 21:19

Source: Tencent Technology

On April 19, according to foreign media reports, on Thursday local time in the US, Facebook parent company Meta Platforms (META.US) launched Llama 3, its most powerful open-source artificial intelligence (AI) model to date, aiming to catch up with industry leader OpenAI amid fierce competition. The Llama 3 release includes two versions, with 8 billion and 70 billion parameters; a top-tier version with more than 400 billion parameters will follow, underscoring Meta's ambitions in AI.

It is reported that Llama 3 performs strongly on a number of industry benchmarks and adds many features, such as improved reasoning ability. Meta plans to integrate Llama 3 deeply into its virtual assistant, Meta AI. The assistant is already widely used in popular apps such as Facebook, Instagram, WhatsApp, and Messenger, and is about to receive a new round of updates intended to give users a smarter and more convenient experience.

In addition, Meta announced that Llama 3 will soon be available on platforms such as Amazon AWS, Google Cloud, IBM's cloud platform WatsonX, Microsoft Azure, and Nvidia's NIM, with hardware support from AMD, Dell, Intel, and Nvidia. This series of partnerships and integrations should further accelerate Llama 3's spread and adoption worldwide.

On the day Meta launched Llama 3, CEO Mark Zuckerberg was interviewed by well-known tech podcast host Dwarkesh Patel. They discussed Llama 3, artificial general intelligence (AGI), energy bottlenecks, the strategic significance of AI technology, the potential risks of open source, and the metaverse. Zuckerberg also shared the decision-making process behind open-sourcing a model worth $10 billion, along with custom chip source code.

The following is a transcript of this interview:

1. The top-tier Llama 3 model is still in training

Patel: Mark, it's a great honor to invite you to our podcast.

Zuckerberg: Thanks for the invitation, Patel. I'm glad to be here; I've always loved your podcast.

Patel: Great, thank you! Now let's talk about Llama 3 first! Please share with me some highlights and exciting new developments about this latest big model and Meta AI.

Zuckerberg: I think most people are probably more interested in the new version of Meta AI, but in reality, upgrading the model is our top priority. We're launching Llama 3: we're providing it to the developer community as an open-source project, and we're also using it to power Meta AI. There's plenty of interesting ground to cover on Llama 3 itself, but I think the most important thing is that we now believe Meta AI is the smartest freely available AI assistant, and people can use it anytime, anywhere.

Additionally, we have integrated real-time knowledge from Google and Bing to enable AI assistants to provide more accurate and comprehensive information. We plan to make it more prominent in our apps, like at the top of Facebook and Messenger, where you'll be able to ask any questions directly using the search box. In addition to these, we've also added some new creative features, which I think are really cool, and I'm sure everyone will love them.

The animation feature in particular: you can easily animate any image, which is really fun. One amazing capability is that it can generate and update high-quality images in real time as you type. You just enter your query, such as “Show me a scene of eating macadamias and drinking beer in the field, with cows and mountains in the background,” and it updates the image in real time as you type. The experience is simply amazing, and I'm sure everyone will love this feature.

These are some of the obvious changes most people will see. We're rolling out these new features gradually, and while they aren't currently available globally, we'll start with some countries and gradually expand over the next few weeks and months.

I think this is going to be a huge breakthrough, and I'm excited for everyone to experience it. But from a technical standpoint, Llama 3 is definitely the most interesting part. We are training three sizes of Llama 3: 8 billion, 70 billion, and 405 billion parameter versions.

Currently, the first two versions are ready, and the largest model is still being trained. Although we can't immediately release the 405 billion parameter version today, I'm confident about the performance of the 8 billion and 70 billion parameter models. They are all industry-leading in terms of size, and we'll be publishing our benchmark results in detail in blog posts, so everyone can get an in-depth understanding of their performance.

Of course, Llama 3 is open source, which means developers will be able to try it out for themselves and explore its potential. We also have a carefully planned release roadmap that will bring features such as multimodality, more language support, and a longer context window (the range of text the language model takes into account when generating output). Later this year, we expect to launch that exciting 405 billion parameter version. Based on current training progress, its MMLU (Massive Multitask Language Understanding, a standard knowledge benchmark) score is already close to 85, and we expect it to perform excellently across many benchmarks.

As for the 70 billion parameter model, it also performs very well. We officially released it today; it has an MMLU score of around 82 and achieves strong results in math and reasoning. I believe users will find it very interesting and rewarding to try.

I want to emphasize that even the 8 billion parameter model performs almost on par with the largest Llama 2 model we released before. In other words, even the “smallest” Llama 3 is nearly as capable as the “biggest” Llama 2.

Patel: Before we dive deeper into these models, I'd like to look back a bit. I remember that in 2022, Meta's stock price dropped sharply, and people were puzzled by your large purchases of Nvidia H100 chips; the metaverse concept wasn't widely accepted by the market. What considerations drove your decision to invest in H100s at the time? How did you anticipate the demand for these GPUs?

Zuckerberg: I think we were in the development phase of the Reels project at the time. We have always believed in reserving enough capacity to handle unforeseen innovation, and Reels was one such case. We found that in order to train the models, we needed more GPUs. This was a huge shift, because our service was no longer simply ranking content from the people or pages you follow; it was now heavily recommending so-called “unconnected content”: content from people or pages you don't follow.

As a result, the pool of candidate content we might show surged from thousands of items to hundreds of millions. Naturally, that required brand-new infrastructure to support it. We were already building that infrastructure, but in trying to keep up with TikTok we ran into bottlenecks and couldn't move as fast as we wanted. Seeing that, I realized, "We have to make sure we're never caught in this passive position again." So we ordered not only enough GPUs for Reels and content ranking, but double that. We've always operated on the principle that something new we can't foresee will always come along, and we have to be prepared for it.

Patel: Did you know it would be artificial intelligence?

Zuckerberg: We originally thought it would have something to do with training big models, but then I realized it was more closely tied to content recommendation. Running a company is like playing a game; new challenges always arise. At the time, I threw myself into building Reels and other content-recommendation features, hoping they would play a huge role. Today, Instagram and Facebook can show users content they're interested in even when it comes from people they don't follow, which is certainly a huge leap forward. Looking back, the decision was certainly wise, but it stemmed from lessons learned from falling behind. That's not to say we were "far ahead"; in fact, many decisions look right now only because we made mistakes earlier and learned from them.

Patel: In 2006, you rejected a $1 billion acquisition offer, but I imagine there was some price at which you would have considered selling Facebook, right? Did you have a number in mind where you thought, "This is Facebook's real value, and they're not offering it"? If they'd offered $5 trillion, you'd obviously have accepted. So how did you weigh that decision, and what considerations was it based on?

Zuckerberg: I think it mostly comes down to personal choice. Looking back, I'm not sure I was mature enough to make that kind of decision. Many people around me at the time were debating the $1 billion price, analyzing it from angles like expected revenue and scale. But all of that was far beyond the stage we were at. To be honest, I didn't have the financial knowledge to take part in that kind of discussion, but deep down I had a firm belief in what we were doing.

I also did some simple analysis, like: if I didn't do this, what would I do? I love creating new things, helping people communicate, and understanding people's dynamics and how humans interact. So I figured that if I sold the company, I'd probably just start another similar one, and I was quite happy with the one I had. So why sell it? I think many of the big decisions people make are really based on beliefs and values; it's very hard to accurately predict the future through analysis alone.

2. The road to AGI

Patel: Facebook AI Research (FAIR) has been around for a long time, and now it seems deeply embedded in the core of your company. When did building artificial general intelligence (AGI), or whatever you call the ambitious goal you're pursuing, become Meta's top priority?

Zuckerberg: Actually, this transformation has been happening quietly for a while. About 10 years ago, we founded FAIR. The original intention at the time was that many innovations would emerge on the way to general artificial intelligence or other similar goals, and these innovations would continue to drive the progress of our various businesses. Therefore, we did not conceive FAIR as an independent product, but formed it as a research team. Over the past 10 years, FAIR has created many unique results, bringing significant improvements to all of our products. It has driven development in many fields and inspired other innovators in these fields, and as a result, it has created more technology to improve our products. It really excites me.

In recent years, with the rise of ChatGPT and the emergence of diffusion models for image creation, we have clearly felt a huge wave of change. These new technologies are amazing, and they will profoundly change how people interact with apps. So we decided to form a second team, the GenAI team, to bring these cutting-edge technologies into our products and build leading foundation models that can support all of our different products.

When we began this exploration, our initial thinking was that much of what we do is strongly social: helping people interact with creators, helping people communicate with businesses, and helping businesses sell products or provide customer service. It can also be embedded in our apps, smart glasses, and virtual reality as an intelligent assistant. So at first we weren't entirely sure whether full general intelligence would be needed to support these use cases. But as we dug into the subtleties, I gradually realized it is actually essential. For example, when developing Llama 2 we didn't prioritize coding capability, because people weren't asking Meta AI many coding questions on WhatsApp.

Patel: Will they now?

Zuckerberg: I don't know, and I'm not sure WhatsApp, Facebook, or Instagram will be the interface where users ask lots of coding questions; coding questions may be more common on our upcoming Meta.ai website. But over the past 18 months we were surprised to learn how critical coding is across many fields, not just the programming industry. Even when users don't ask coding-related questions directly, training the model on code helps it answer questions more accurately and reason well across different domains. With Llama 3, we focused on optimizing it through extensive code training, because that makes it better across the board, even when the user's primary focus isn't coding.

The ability to reason is another excellent example. Imagine when you're talking to a creator, or trying to interact with a customer as a business, that interaction is far from a simple “you send a message, I reply” model. It involves a multi-step, deep thought process that requires us to think “How can we better achieve this person's goals?” Too often, customers aren't sure what they really need or how to ask questions exactly. So simply answering questions isn't the whole job of artificial intelligence. We need to think more comprehensively and in depth; this has actually turned into a question of reasoning. If a team makes a major breakthrough in reasoning and we're still in the basic chatbot stage, then our product will be overshadowed by what other teams have built. Ultimately, we realized that in order to stay ahead, we had to do our best to solve the problem of general intelligence, so we increased our bets and investments to ensure this breakthrough.

Patel: So, is the Llama version that solves all of these user use cases powerful enough to replace all the programmers in this building?

Zuckerberg: I think these technologies will gradually mature and show great potential over time. However, it is a complicated question as to whether Llama-10 or future versions will completely replace programmers. I don't think we're trying to replace humans, but rather hope to use these tools to empower people to do more jobs that were previously unimaginable.

Patel: But would you guess that our programmers will be 10 times more productive once they're using Llama-10?

Zuckerberg: I have high expectations for this. I am convinced that human intelligence isn't measured by a single standard, because everyone has unique skills and talents. At some point, artificial intelligence may surpass the capabilities of most humans in some ways, but it all depends on how powerful the model is. However, I think this is a gradual evolution process, and general artificial intelligence is not something that can be achieved overnight. We're actually gradually adding different capabilities to the model.

Currently, multimodality is our area of focus, from initial photos, images, and text to video in the future. Given our strong interest in the metaverse, 3D technology is also particularly important. Additionally, one modality I'm particularly interested in is emotional understanding, an area I rarely see other teams in the industry deeply research. After all, most of the functions of the human brain are dedicated to understanding others and interpreting expressions and emotions. I am convinced that if we can make a breakthrough in this area and enable artificial intelligence to truly understand and express emotions, then human-machine interaction will become more natural and deep than ever before.

You might think of that as just video or images, but in reality they're highly specialized versions of those modalities, tuned to human emotion. So beyond improving the model's reasoning and memory, we need to focus on many other capabilities. I believe that in the future we won't be satisfied with just typing questions into a query box to find answers. We'll have different ways of storing memory, and customized models that serve people in more personalized ways. These are all capabilities AI needs to develop.

Of course, we also have to address model size. We care about large models, but also about how to run small models within tight constraints. If you're running a large service like Meta AI, it mainly relies on powerful server-side compute. But we also expect these advanced capabilities to make their way into compact devices such as smart glasses. Since smart glasses have very limited space, we need efficient, lightweight solutions suited to that environment.

Patel: If we were to invest $10 billion, or eventually up to $100 billion, to implement intelligent reasoning on an industrial scale, what specific use cases would that money be used for? Is it analog technology? Or an artificial intelligence application in the metaverse? How can we effectively use data centers to support these use cases?

Zuckerberg: According to our predictions, intelligent reasoning will profoundly change almost all product forms. I think we'll see the advent of a Meta AI general assistant product in the future. This product will gradually evolve from traditional chatbots, from simply answering questions to being able to receive and perform more complex tasks. This will require a great deal of reasoning ability, and will also trigger a huge demand for computational power.

Furthermore, interaction with other agents (AI systems that can perceive, reason, decide, and act on a user's behalf) will also become an important part of our work, whether we're serving businesses or creators. I don't think people will interact with just one general-purpose AI; every business will want an AI that represents its interests. These AIs won't primarily be used to sell competitors' products; instead, they'll interact with businesses, creators, and consumers in their own distinctive ways.

It is particularly worth mentioning that creators will be an important group to benefit from this technology. With around 200 million creators on our platform, they generally don't feel like they have enough time every day, and their community is eager to interact with them. If we could develop a technology that allows creators to train their artificial intelligence and use it to stay engaged with the community, that would be a very powerful feature.

These are just some of the consumer use cases. Take the Chan Zuckerberg Initiative, which my wife and I run, for example: we're doing a lot of work in science, and AI will undoubtedly play a key role in advancing science, healthcare, and more. Ultimately, intelligent reasoning will touch almost every product and sector of the economy.

Patel: You mentioned AI that can perform multi-step tasks, which makes me curious: does that require a larger model? Or, for Llama-4, could a 70 billion parameter version show amazing abilities just by training on the right data? Right now, where is progress mainly coming from: increasing model size, or, as you said before, keeping the size the same while adding more diverse capabilities and application scenarios?

Zuckerberg: We probably don't have a clear answer yet. But one clear trend I've observed is that we have a base Llama model and then build application-specific code around it. Some of that is fine-tuning for specific use cases, but some is logic for how Meta AI works with tools like Google and Bing to pull in real-time knowledge, which isn't part of the base Llama model. In Llama 2 we tried to build some of these capabilities in, but much of it was done by hand. For Llama 3 we set a goal of embedding more of this functionality into the model itself. As we start exploring more agent-like behaviors, I think some of these capabilities will still need hand-tuning. For Llama 4, our goal is to fold even more of them naturally into the model.

With each step forward, you get a sense of where things are likely to go next. We start experimenting and building scaffolding around the model, which helps us understand what capabilities should go into the next version. That way the model becomes more general, because any capability implemented through hand-coding, while it unlocks some use cases, is inherently brittle and not general enough. Our goal is for the model to learn and evolve on its own so it can adapt to complex, changing scenarios.

Patel: You mentioned "incorporating more of this into the model itself." Can you explain in detail how you build these desired capabilities into the model through training? What exactly do you mean by incorporating them into the model itself?

Zuckerberg: Take Llama 2 as an example: its tool use was fairly specific and limited. With Llama 3, we were happy to see that its ability to use tools improved significantly. We no longer have to hand-code everything for it to search with Google; it can handle these tasks on its own. Likewise, Llama 3 shows strong ability in programming, running code, and a range of other tasks. Once we have a capability, we can anticipate what to explore next. We don't have to wait for Llama 4 to start building these abilities; we can run all kinds of experiments around it ahead of time. The hand-coded processes may make the product better in the short term, but they also point us toward what we should build into the next version of the model.

Patel: What use cases are you most looking forward to seeing from the open-source community's fine-tuning of Llama 3? Maybe not the one most useful to you, but the one you're most curious to try. For example, I've heard that some people have fine-tuned models on ancient history so that we can have direct conversations with historical figures such as the ancient Roman poet Virgil.

Zuckerberg: I think the beauty of this kind of thing is that it always surprises us. Any specific use case we think is valuable is likely to be tried and built. I'm sure we'll see more stripped-down versions of the models. I'm also looking forward to lighter-weight models with fewer parameters, say 1 to 2 billion parameters, or even 500 million, and the interesting, efficient applications they enable. If an 8 billion parameter model is almost as powerful as the biggest Llama 2 model, then a 1 billion parameter model should be able to show its own value in some areas. They could be used for classification tasks, or for pre-processing a user's query to understand intent before passing it to a more powerful model for a full answer. I think this is an area where the community can play a huge role in filling the gaps. Of course, we're also considering distilling and optimizing these models ourselves, but right now all of our GPU resources are mainly devoted to training the 405 billion parameter model.
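The "small model as pre-processor" pattern described here can be sketched roughly as follows. Everything in this sketch is a hypothetical stand-in: the keyword-based classifier, the function names, and the canned responses are illustrations of the routing idea, not real Llama APIs.

```python
# Sketch: a cheap classifier decides whether a query needs the large model.
# In practice the classifier would be a small (~1B parameter) model; here it
# is a trivial keyword rule so the routing logic stays visible.

def classify_intent(query: str) -> str:
    """Stand-in for a small intent classifier."""
    hard_keywords = ("prove", "debug", "derive", "multi-step")
    return "complex" if any(k in query.lower() for k in hard_keywords) else "simple"

def answer_simple(query: str) -> str:
    return f"[small model] quick answer to: {query}"

def answer_complex(query: str) -> str:
    return f"[large model] detailed answer to: {query}"

def route(query: str) -> str:
    # The cheap pre-processing step runs first; only "complex" queries pay
    # the cost of the big model.
    if classify_intent(query) == "complex":
        return answer_complex(query)
    return answer_simple(query)

print(route("what time is it in Tokyo"))   # handled by the small model
print(route("debug this multi-step proof"))  # escalated to the large model
```

The design point is simply that most traffic never touches the expensive model, which matches the inference-cost concerns raised later in the interview.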

Patel: About the number of GPUs you mentioned before, I remember you said it would reach 350,000 by the end of the year.

Zuckerberg: Yes, that's our overall goal. At present we have two large GPU clusters, each with roughly 22,000 to 24,000 GPUs, which are mainly used to train large models. Of course, these clusters also handle many of our other important training workloads, such as the Reels model, Facebook's news feed, and Instagram's feed ranking. Inference is a huge challenge for us because we serve such a large user base. Compared with other companies doing similar work, our ratio of inference compute to training compute is probably much higher, simply because of the sheer size of the community we serve.

Patel: I noticed a remarkable point in the material you shared earlier: the amount of data you trained on actually exceeds the compute-optimal amount for training alone. Given how important inference is to you and to the community at large, it really does make sense to have a model trained on trillions of tokens.

Zuckerberg: With the 70 billion parameter model, we observed an interesting phenomenon. I originally thought that as the amount of data increased, the model's performance gains would gradually saturate. But after training on around 15 trillion tokens, we found the model was still learning; even in the final stages of training, it showed a strong ability to keep improving. We could probably keep feeding it more tokens and improve it further.
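As a rough illustration of how far 15 trillion tokens goes beyond the usual compute-optimal point for a 70 billion parameter model, here is a back-of-the-envelope sketch. The ~20 tokens-per-parameter rule of thumb is an approximation drawn from the Chinchilla scaling-law literature, not a figure from the interview.

```python
# Back-of-the-envelope: actual training tokens vs. the compute-optimal
# heuristic of roughly 20 tokens per parameter (Chinchilla rule of thumb).

params = 70e9                    # 70B-parameter model
tokens_trained = 15e12           # ~15 trillion tokens, per the interview
chinchilla_tokens = 20 * params  # heuristic compute-optimal token count

ratio = tokens_trained / chinchilla_tokens
print(f"Compute-optimal tokens: {chinchilla_tokens / 1e12:.1f}T")  # 1.4T
print(f"Actual / optimal ratio: {ratio:.1f}x")                     # 10.7x
```

Training an order of magnitude past the compute-optimal point is wasteful if you only care about training cost, but it buys a smaller, cheaper-to-serve model, which is exactly the inference-heavy trade-off described above.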

But as operators of the company, we have to make a decision at some point: should we keep spending GPU resources to train this 70 billion parameter model further, or move on and start testing new hypotheses for Llama 4? We need to strike a balance between the two, and I think we've found a good one with this version of the 70 billion parameter model. Of course, we'll launch other versions in the future, such as a multimodal 70 billion parameter model, which will arrive before long. But one fascinating thing is that the current model architecture can absorb such a huge amount of data.

3. Energy bottlenecks

Patel: That's really thought-provoking. So what does this mean for future models? You mentioned earlier that the 8 billion parameter version of Llama 3 surpassed the 70 billion parameter Llama 2 in some ways.

Zuckerberg: No, no, I don't want to overstate it. Their performance is actually quite close; they're in the same ballpark.

Patel: So, can we expect the 70 billion parameter version of Llama-4 to be comparable to the 405 billion parameter version of Llama-3? What will the future development trend be like?

Zuckerberg: That's a really big question. Honestly, no one can predict that accurately. One of the hardest things in the world to predict is how long an exponential trend will last. I'm convinced we'll keep moving forward, and I think it's worth investing $10 billion, or even $100 billion or more, in infrastructure. If the growth trend continues, we'll have some truly impressive capabilities for building amazing products. But no one in the industry can tell you for certain that it will keep scaling at this rate. Historically, we've always hit development bottlenecks at some point. Right now, expectations in this field are high, and perhaps these bottlenecks will be overcome quickly. It's a question worth pondering.

Patel: What would the world look like without these bottlenecks? As unlikely as this may seem, what if technological progress can really continue to advance at this rate?

Zuckerberg: Either way, there will always be new challenges and bottlenecks. GPU production has been an obvious issue for the past few years. Even for companies that have the money to buy GPUs, it's often difficult to get the quantity they need because supply is limited. However, the situation appears to be gradually improving. Today, we're seeing more and more companies considering investing heavily to build infrastructure to produce GPUs. I think this will continue for a while.

Furthermore, capital investment is also a consideration: at what point does investing more capital stop being cost-effective? In fact, I think energy issues will bite before capital does. As far as I know, no one has yet built a single-gigawatt training cluster. Things like energy permits will become increasingly difficult to obtain around the world. This isn't just a software problem; it involves strict government regulation, stricter, I think, than many of us in the tech world appreciate. If you're starting out at a small company, you probably don't feel it as much. But when we deal with different government departments and regulators, we have to follow many rules and ensure compliance globally. There's no doubt that energy will be one of the major constraints we face.

If you're talking about building a large new power plant or facility, and you need to cross other private or public land to build transmission lines, that's a heavily regulated project with multi-year lead times. If we wanted to build a huge facility, powering it would be a long and complex project. I'm sure people will work hard to make it happen, but I don't think it will be as simple as reaching some level of AI capability, raising a pile of capital, and watching the models suddenly advance by leaps and bounds.

Patel: On the road to advancing AI, will we hit bottlenecks that even companies like Meta can't overcome alone? Are there projects that even Meta doesn't have the resources to complete, things you couldn't implement even if your R&D or capital expenditure budget grew tenfold? Is that how you see it: that Meta, as it stands, couldn't raise enough capital even by issuing stock or bonds?

Zuckerberg: The energy issue is certainly one of the major challenges. I am convinced that if we can solve the energy supply problem, it is entirely possible that we can build a larger computing power cluster than we have today.

Patel: So, is this essentially a funding bottleneck?

Zuckerberg: Funding is certainly one aspect, but I think time is also a factor that can't be ignored. Currently, many data centers are around 50 to 100 megawatts, with large ones perhaps reaching 150 megawatts. Suppose you take an entire data center, fill it with training equipment, and build the largest cluster current technology allows; I think many companies are close to or at that level. But when we talk about building 300 megawatt, 500 megawatt, or even 1 gigawatt data centers, the situation is completely different. No one has yet tried to build a 1 gigawatt data center. I'm sure it will happen eventually; it just takes time, and it won't happen next year, because much of what's involved takes years to complete. To put it in perspective, a 1 gigawatt data center would need an energy supply on the order of a nuclear power plant dedicated to model training.

Patel: Has Amazon tried this yet? They appear to have a 950 megawatt facility.

Zuckerberg: I'm not familiar with the specifics of what Amazon is doing; you'd have to ask them directly.

Patel: Training doesn't have to be limited to a single location, right? If distributed training is effective, then we can actually consider spreading it across multiple locations.

Zuckerberg: I think this is a very important question about how to train large models in the future. Judging from the current development trend, generating synthetic data through inference and then using this data for model training seems to be a promising direction. Although I still don't know what the ratio between this synthetic data and direct training will be, I believe the generation of synthetic data is getting closer to the inference process to some extent. Obviously, if this approach is used to train models, then it will be an integral part of the overall training process.

Patel: So how to find that balance, and where this goes, are still open questions. Could this trend play out with Llama-3, or even Llama-4 and later? In other words, if you publish these models, entities with massive computing power, such as Kuwait or the UAE, could use them to make certain applications far more capable.

Zuckerberg: I totally agree with that possibility. Indeed, I think there will be such a dynamic development in the future. But at the same time, I also think the model architecture itself has some fundamental limitations. Take Llama-3 as an example. Although we have made significant progress, I believe there is room for further optimization of its architecture. As I said before, we feel that the model's performance can continue to improve by providing more data or iterating on some of the key steps.

In fact, we've seen many companies build new models based on Llama-2's 70 billion parameter model architecture. However, for models such as Llama-3's 70 billion or 405 billion parameters, it is not easy to make intergenerational improvements, and no similar open source model has yet appeared. I think this is a huge challenge, but it's also a huge opportunity. However, I still believe that what people can build based on the existing model architecture isn't infinitely scalable. Until we reach the next technological leap, we may only be able to make a few optimizations and improvements on what we already have.

4. Will AI get out of control overnight?

Patel: Now let's take a broader perspective. How do you think artificial intelligence technology will develop in the next few decades? Does it make you feel like another technology, like the metaverse or social technology, or do you think it's fundamentally different in human history?

Zuckerberg: I think artificial intelligence will be a very basic technology. It's more like the invention of computers, and will spawn a whole new set of applications. Just like the advent of the internet or mobile phones, making many things that were impossible before possible, people are beginning to rethink these experiences. So I think AI will bring about similar changes, but it's a deeper innovation. My feeling is that it's like a shift from not having a computer to having a computer. However, it is really difficult to predict exactly how it will develop. Judging from a longer cosmic time span, this transformation will occur very soon, possibly within a few decades. Some people do worry that it will quickly get out of control and go from a certain level of intelligence to extremely smart overnight. But I think this is unlikely to happen due to many physical limitations. I don't think we're going to face a situation where artificial intelligence gets out of control overnight. I'm sure we'll have plenty of time to get used to it. But artificial intelligence will really change the way we work, providing people with innovative tools to do different things. It will give people more freedom to pursue what they really want to do.

Patel: Maybe not overnight, but on a cosmic timescale, can we think of the milestones this way? Humans evolved, then artificial intelligence appeared, and then perhaps intelligence spreads through the galaxy. That could take decades or a century, but is that the grand arc unfolding in your eyes? Compared with other technologies like computers or even fire, is the development of artificial intelligence itself as momentous as the emergence of humans?

Zuckerberg: I think that's hard to judge. Human history is, in a sense, the process of coming to terms with the fact that we are not unique in certain ways, while still being very special. We thought the Earth was the center of the universe; it wasn't, yet humans still have extraordinary characteristics, right? I think people often hold another prejudice, that intelligence and life are fundamentally linked, but that isn't necessarily true. We don't have a clear enough definition of consciousness or life to fully settle the question. Plenty of science fiction describes the creation of intelligence that begins to exhibit all sorts of human-like behaviors. But current trends suggest that intelligence can be separated quite cleanly from consciousness, agency, and other such traits, which makes it a very valuable tool.

5. The dangers of open source

Zuckerberg: Predicting where these things will evolve over time is extremely challenging, so I think anyone should avoid dogmatic planning for their development or use. Every time we release a new product, we need to re-evaluate it. We're very supportive of open source, but that doesn't mean we make all of our work public. I tend to think that open source is good for the community and ourselves because it will promote innovation. However, if at some point the capabilities of these technologies change qualitatively, and we feel that open source is irresponsible, then we will choose not to disclose it. All of this is fraught with uncertainty.

Patel: When you develop Llama-4 or Llama-5, are there any specific qualitative changes that will make you consider whether you should open source?

Zuckerberg: It's hard to answer that in the abstract, because any product can carry potential risks; the key is how we effectively manage and mitigate them. With Llama-2, we faced some of these challenges and invested significant resources to ensure it isn't used for harmful purposes such as violence. That doesn't mean it has become an intelligent agent; it simply has a wealth of knowledge about the world and can answer a range of potentially risky questions. So I think the problem is how to identify and mitigate potentially harmful behavior, not the knowledge itself.

In my opinion, evaluating whether something is good or bad involves multiple dimensions, and it's hard to list all the possibilities in advance. Using social media as an example, we've addressed many types of harmful behaviors and grouped them into 18 or 19 categories. We've built artificial intelligence systems to recognize these behaviors to reduce their occurrence on our platform. As time goes on, I'm sure we'll further refine these classifications. This is an issue we've been working hard to research because we want to make sure we have a deep understanding of it.

Patel: I think it's very important to deploy artificial intelligence systems widely so that everyone has the opportunity to use them, and I would be disappointed if future systems were not widely available. At the same time, I'd like to understand the mitigations better. If mitigation mostly comes down to fine-tuning, the complication with open model weights is that people can then fine-tune on top of those capabilities themselves. Today's models are far from that level; they're more like advanced search engines. But if I could show one my Petri dish and have it explain why my smallpox sample isn't growing and how to fix that, how do we ensure safe and effective use in that case? After all, someone could fine-tune these models to suit their own ends.

Zuckerberg: It's a complicated question, really. I think most people will just use off-the-shelf models, but there will also be bad actors who try to use these models for harm. So the question is worth pondering. Philosophically, the reason I'm so supportive of open source is that I think a future where artificial intelligence is overly concentrated may be as risky as one where it is widely diffused. Many people ask, "If we can do this, would spreading these technologies widely through society be a bad thing?" But another question worth asking is: if one organization has far more powerful artificial intelligence than everyone else, is that also a bad thing?

I can explain it by using an analogy in the field of security. Imagine if you were able to understand and exploit certain security flaws in advance, then you could almost easily hack into any system. This isn't limited to the field of artificial intelligence. So we can't just rely on a highly intelligent artificial intelligence system to identify and fix all bugs, even though this seems theoretically possible. So how is our society dealing with this problem? Open source software plays an important role in this. It allows software improvements not to be limited to a single company, but can be widely used in various systems, including banks, hospitals, and government agencies. As software continues to improve, standards are gradually being established on how these software works, thanks to more people being able to participate in reviewing and testing. When upgrades are needed, the world can act quickly together. I think in a world where AI is widely deployed, these AI systems will gradually be hardened over time, and all the different systems will be controlled in some way.

In my opinion, this distributed and widely deployed approach is healthier than a centralized approach. Of course, there are risks in every way, but I don't think people have fully discussed those risks. There is a real risk of artificial intelligence systems being used for bad behavior. What I'm more concerned about, however, is that an untrustworthy entity has a super powerful artificial intelligence system, and I think this could be a bigger risk.

Patel: Will they try to overthrow our government because they have weapons others don't have? Or just creating a ton of chaos?

Zuckerberg: My instincts tell me that these technologies will eventually become very important and valuable for economic, security, and many other reasons. If our enemies or people we don't trust get more powerful technology, then this can indeed become a serious problem. Therefore, I think the best mitigation method may be to promote the development of good open source artificial intelligence, make it an industry standard, and play a leading role in many ways.

Patel: Open source AI systems can really help create a fairer, more level playing field, which makes perfect sense to me. If that mechanism works, it's certainly the future I'm hoping for. What I want to explore further, though, is mechanistically, how does open source artificial intelligence prevent people from using their own AI systems to create chaos? For example, if someone tries to make a biological weapon, is the answer that the rest of the world, through extensive research and development, can develop a corresponding vaccine very quickly? What is the specific mechanism?

Zuckerberg: From the security perspective I mentioned earlier, I think people with weak AI systems will have a relatively low success rate when trying to hack into systems protected by stronger AI.

Patel: But how do we make sure everything in the world is handled properly like this? The case of biological weapons, for example, may not be so simple.

Zuckerberg: Indeed, I can't claim that everything in the world will work out so smoothly. Biological weapons are one of the focal concerns of people who worry deeply about these issues, and I think that concern is justified. There are mitigations, such as trying not to train certain knowledge into the model, but we have to recognize that if an extremely bad actor is encountered and there is no other artificial intelligence to counterbalance them and grasp the severity of the threat, this can indeed be a risk. It is one of the issues we must take very seriously.

Patel: Have you encountered any unexpected behavior when deploying these systems? For example, during training, Llama-4 might lie to you for some reason. That might not be likely for a system like Llama-4, but have you considered scenarios like that? For instance, would you be seriously worried about deceptive behavior in the system, and about the problems billions of copies of such a system could cause once spread freely in the wild?

Zuckerberg: Currently, we observe a lot of hallucination. I think how to distinguish hallucination from deception is a question worth exploring in depth. There are indeed many risks and factors to consider. In running our company, I try to balance these longer-term theoretical risks against the harms I think actually exist right now. So when it comes to deception, my biggest concern is that someone might use this technology to create misinformation and spread it through our network or others. To combat that kind of harmful content, we're building artificial intelligence systems that are smarter than the adversarial systems.

This forms part of my understanding of the problem. Looking at the different types of harm people cause or attempt to cause on social networks, I've found that some of it isn't very adversarial. For example, hate speech isn't highly adversarial in one sense: people aren't getting better at being racist because of online speech. On that front, I think artificial intelligence is generally already more sophisticated and faster than humans at handling these problems. But we have problems in both directions. People misbehave for all kinds of purposes, whether inciting violence or other inappropriate acts, but we also face a large number of false positives, where we mistakenly take down content that should not have been censored. That understandably bothers a lot of people. I believe the situation will gradually improve as artificial intelligence becomes more accurate at this.

Whether it's Llama-4 or a future Llama-6, we need to think deeply about the behaviors we observe, and it's not just us; part of the reason we chose to open source is precisely so that many other researchers can study these models too. So we want to share our observations with other researchers, jointly explore possible mitigation strategies, and weigh whether everything can be open sourced while keeping it safe. For the foreseeable future, I'm optimistic we can do that. In the short term, meanwhile, we can't ignore the fact that people are trying to misuse models today. Even though those acts are not catastrophic, they are serious day-to-day harms we're familiar with from operating our services.

Patel: I found the synthetic data point really interesting. With current models, repeatedly training on their own synthetic data probably hits a performance asymptote, which makes theoretical sense. But suppose these models get smarter and can use techniques like the ones in your paper or upcoming blog post to find the most correct chain of thought. Why wouldn't that lead to a loop where the model gets smarter, produces better output, and in turn gets smarter still? Of course this wouldn't happen overnight, but over months or years of continuous training, the model could plausibly keep getting more intelligent.

Zuckerberg: I think this kind of cyclic improvement is possible within the parameters of the model architecture. However, as far as the current 8 billion parameter models are concerned, I don't think they can reach the same level as advanced models with tens of billions of parameters and incorporating the latest research results.

Patel: Regarding these models, they're also going to be open source, right?

Zuckerberg: Yes, that's true. However, all of this presupposes that we must successfully address the challenges and issues discussed previously. We certainly hope so, but I also know that at every stage of building software, although the software itself has huge potential and possibilities, its operation is still to some extent physically limited by the chip's performance. As a result, we are always faced with various physical constraints. How big a model can become really depends on how much energy we can capture and use to reason. I am very optimistic about the future of artificial intelligence technology and believe they will continue to rapidly evolve and improve. At the same time, I'm more cautious than some people. I don't think it's particularly easy to get out of control, but we still need to be alert and carefully consider all possible risks. So I think it makes a lot of sense to keep options open.

6. Caesar and the metaverse

Patel: OK, let's move on to another topic — the metaverse. In the long course of human history, which period would you most like to explore in depth? From 100,000 BC to the present, do you just want a glimpse of what it was like back then? Does this exploration have to be limited to the past?

Zuckerberg: Yes, I prefer to explore the past. I am fascinated by American history, classical history, and scientific history. I think it would be very interesting to be able to observe and understand how those major historical advances have taken place. What we can rely on, however, is only limited historical records. For the metaverse, I'm afraid it would be very difficult to completely recreate those periods of history that we have no record of. Actually, I don't think going back to the past would be one of the main applications of the metaverse. Although this kind of function may be useful in history teaching, etc., the most important thing for me is that no matter where we are in the world, we can interact and coexist with others in real time. I am convinced that this is a killer app.

In our previous conversation about artificial intelligence, we delved into many of the physical limitations behind it. One of the valuable lessons technology has taught us is that we should work to free more things from physical constraints and move to the software field, because software is not only easier to build and evolve, but also easier to popularize. After all, not everyone can own a data center, but many people can write code, get open source code, and modify and optimize it. The metaverse is the perfect platform to achieve this goal.

This will be a major disruptive change, and it will dramatically change the way people perceive gathering and interaction. As a result, people will no longer feel like they have to come together in person in order to accomplish many things. Of course, I am also convinced that in some situations, meeting in person is still irreplaceable. This isn't an either/or choice; the advent of the metaverse doesn't mean we have to completely abandon face-to-face communication. However, it does provide us with a new dimension, allowing us to socialize, connect, get work done more easily and efficiently, and play a huge role in many fields such as industry and medicine.

Patel: We touched on this earlier: you didn't sell your company for a billion dollars, and you obviously also hold strong convictions about the metaverse even though the market is skeptical. I'm curious about the source of that confidence. You've said "oh, my values, my instincts," but that seems a bit general. If you could name some specific traits of yours, maybe we could better understand why you're so confident in the metaverse.

Zuckerberg: I think this involves a few different issues. First, what drives me to keep moving forward? We've already discussed a lot of topics. I love to create, especially around how people communicate, express themselves and work. When I was in college, I majored in computer science and psychology, and the intersection of these two fields has always been critical for me. This is also my strong driving force. I don't know how to explain it, but deep down I always feel like if I don't create something new, then I'm doing something wrong. Even when we developed a business plan to invest $100 billion in artificial intelligence or the metaverse, our plans already made it pretty clear that these projects, if successful, would bring huge returns.

But of course, you can't be sure of everything right from the start. People always have all kinds of arguments and questions, like, "How can you be confident enough to do this?" For me, if one day I stopped trying to create new things, I would lose myself; I'd go somewhere else and keep creating. Basically, I can't imagine running something without trying to create something new that I find interesting. Whether we try to build the next thing isn't really a question for me; I just can't stop creating. And not only in tech, but in other areas of my life too. For example, our family built a ranch in Kauai, and I personally helped design all the buildings. When we started raising cattle, I thought, "OK, I want to raise the best cattle in the world," and then we began planning everything we'd need to achieve that. This is who I am!

Patel: I've always been curious about one thing: in high school and college, when you were around 19, you read a lot of ancient and classical texts. What important lessons did you take from them? Not just which content you found interesting, but what actually stuck with you, given how limited the knowledge you could have been exposed to was at that point.

Zuckerberg: One thing that really fascinated me was how Caesar Augustus became emperor and worked to establish peace. At the time, people had no real concept of peace as we understand it; peace was just the short interval before the enemy attacked again. He had a vision of shifting the economy away from mercenaries and militarism toward a positive-sum peace, an idea that was very novel then. To me that reflects a basic truth: people can only imagine reasonable ways of organizing the world within the bounds of what they have already seen.

This applies to the metaverse as much as to fields like artificial intelligence. Many investors and others have trouble understanding why we open source these technologies. They might say, "I don't understand. It's open source now; doesn't that take away time you could spend on proprietary technology?" But I think open source is a profound idea in technology: it actually creates more winners. I don't want to stretch the analogy too far, but I do think that, very often, people struggle to imagine the models for building things, why a given approach would be valuable for people, or why it would be a reasonable state of the world. There are far more reasonable arrangements than people tend to assume.

Patel: That's really interesting. Can I share my thoughts? It may be a bit off topic, but I think this is probably because some important figures in history have already made their mark when they were young. Caesar Augustus, for example, became an important figure in Roman politics when he was 19 years old. He led battles and established alliances. I wonder if you're 19 years old and have had a similar idea: “Since Caesar Augustus did it, then I can do it too.”

Zuckerberg: That's really an interesting observation, and it resonates not only with that rich history but with the history of the United States as well. I love Picasso's line: "Every child is an artist; the problem is how to remain an artist once we grow up." When we're young, we're more likely to have crazy ideas. In your life, your company, or whatever you've built, there's something analogous to the innovator's dilemma. Early in your career, it's easier to reorient yourself and embrace new ideas without being weighed down by commitments to other things. I think that's an interesting part of running a company: how do you stay nimble and keep innovating?

7. Open source model worth $10 billion

Patel: Let's get back to investors and open source. Imagine we have a model worth up to $10 billion, and this model has gone through a rigorous safety assessment. At the same time, evaluators can also fine-tune the model. So would you open source a $10 billion model?

Zuckerberg: As long as it works for us, then open source is an option worth considering.

Patel: But would you actually do that? After all, this is a model that cost $10 billion in R&D, and now it's time to open source it.

Zuckerberg: This is a question we need to weigh carefully over time. We have a long tradition of open source software. Generally speaking, we don't open source our products directly; for instance, we don't open source Instagram's code. But we do open source a lot of the underlying infrastructure. One of the biggest open source projects in our history is the Open Compute Project, where we opened up all of our server, network switch, and data center designs, and it turned out to be enormously valuable for us. Although many people can design servers, the whole industry now basically uses our design as the standard. That means the entire supply chain is built around our design, which raises production efficiency, lowers costs, and has saved us billions of dollars. It's really great.

Open source can help us in many ways. One way is that if people can find ways to run the model more cost-effectively, that would be a huge benefit for us. After all, our investment in this will amount to billions, or even tens of billions of dollars. So if we can improve efficiency by 10%, then we'll be able to save billions or tens of billions of dollars. Also, if there are other competitive models in the market, our open source behavior doesn't give a model a crazy advantage. Instead, it will promote the progress and development of the entire industry.

Patel: How do you think model training will be commoditized?

Zuckerberg: I think training could develop in many directions, and commoditization is certainly one of them. Commoditization means that as options multiply in the market, the cost of training drops dramatically and becomes much more accessible. Another possibility is differentiation on quality. You mentioned fine-tuning; right now, the fine-tuning options for many of the larger models are still quite limited. Some options exist, but they generally don't work for the biggest models. If we can overcome that and enable much broader fine-tuning, we'll be able to offer more diverse functionality across different applications and specific use cases, or integrate these models into specific toolchains. That not only speeds up development, it may also lead to differentiation on quality.

Here, I'd like to use an analogy to illustrate. A common problem in the mobile ecosystem is the existence of two gatekeeper companies — Apple and Google — that place restrictions on what developers build. On an economic level, it's like when we build something, they charge a hefty fee. But what concerns me even more is the quality aspect. Very often, we want to release certain features but Apple refuses, which is really frustrating. So what we need to think about is, are we setting up a world for artificial intelligence dominated by a few companies that run closed models that control APIs to determine what developers can build? For our part, I can definitely say that we built our models to make sure we don't get into this situation. We don't want other companies to limit our ability to innovate. From an open source perspective, I don't think many developers want to be limited by these companies either.

So the key question is what the ecosystem built around these models will look like. What exciting new things will emerge? How much can they improve our products? I believe that if these models end up evolving the way our databases, caching systems, or architectures have, the community will contribute real value and make our products better. Of course, we'll still try to stay differentiated and not be hurt by it; we can keep focusing on our core work and benefit along the way. Meanwhile, as the open source community develops, all of these systems, ours and the community's, will keep improving.

However, there is also the possibility that the model itself eventually becomes the product. In that case, the economics of open sourcing get more complicated, because open sourcing is then largely tantamount to commoditizing your own product. But from what I've seen so far, it doesn't look like we've reached that stage yet.

Patel: Are you looking forward to earning significant revenue by licensing your model to cloud providers? In other words, you want them to pay to provide model services on their platform.

Zuckerberg: Yes, we really do look forward to such licensing agreements with cloud providers and expect meaningful revenue from them. That's essentially the license we set up for Llama. Along many dimensions it's a very permissive open source license, granting broad usage rights to the community and developers. But we've placed limits on the very largest companies that use it. Those restrictions aren't meant to stop them from using the model; they're meant to bring them to the table when they intend to take the model we built, resell it, and profit commercially from it. If a cloud provider like Microsoft Azure or Amazon AWS wants to resell our model as part of their service, then we expect a share of the revenue.

Patel: Your point about the balance of power is very reasonable; we really do need to think about eliminating potential harms through better technical alignment or other methods. I would like Meta to establish a clear framework, as other labs have done, specifying the circumstances under which open sourcing, or even deployment, would be off the table. Such a framework not only helps the company prepare for potential risks, it also lets people know what to expect.

Zuckerberg: You're right, the question of existential risk really deserves our close attention. Currently, however, we are more concerned about content risk, where models may be used to create violence, fraud, or other harmful behavior. Although discussing existential risk may be more appealing, it is actually this more common harm that we currently need to invest more effort into mitigating. For current models, and maybe even the next generation, we need to make sure they aren't being used for malicious acts such as fraud. As a big company, Meta has a responsibility to make sure we're doing a good enough job at this. Of course, we are also capable of dealing with both aspects at the same time.

Patel: As far as open source is concerned, what I'm curious about is, do you think the impact of open source projects such as PyTorch, React, and Open Compute on the world is likely to surpass Meta's influence on social media? I've talked to users of these services, and they think this possibility exists; after all, most of the operation of the internet depends on these open source projects.

Zuckerberg: Our consumer products do have a huge user base around the world, covering almost half of the world's population. However, I think open source is becoming a completely new, powerful way to build. It's a bit like Bell Labs: they initially developed transistors to enable long-distance calls, and that goal was achieved and brought them significant profits. But looking back 5 to 10 years later, when people name their proudest inventions, they may mention other, more far-reaching technologies. I am convinced that many of the projects we have built, such as Reality Labs, some AI projects, and some open source projects, will have a lasting and profound impact on human progress. Although specific products evolve, appear, and disappear over time, their contributions to human society endure. That is an exciting part of what we get to participate in as technology practitioners.

Patel: Regarding your Llama model, when will it be trained on your own custom chip?

Zuckerberg: Soon. We're working to push this process forward, but Llama 4 may not be the first model to train on a custom chip. Our strategy is to start by moving inference workloads for ranking and recommendations, such as Reels and news feed ads, onto our own silicon. These tasks previously consumed a large amount of GPU resources. Once we can transfer them to our own chips, we'll be able to use the more expensive Nvidia GPUs to train more complex models. We expect to be able to train relatively simple models on our own chips in the near future, and eventually scale up to training these huge models. Currently, this project is progressing smoothly. We have a clear, long-term plan and are proceeding in an orderly manner.

8. Let's say you become the CEO of Google+

Patel: One last question: if you were named CEO of Google+, would you be able to lead it to success?

Zuckerberg: Google+? Oh, I don't know.

Patel: Okay, so the real last question would be: did you guys feel the pressure when Google launched Gemini?

Zuckerberg: The problem is that Google+ didn't have a CEO; it was just a division within Google. In my opinion, for most companies, especially those that have reached a certain size, focus is critical. Startups may be underfunded; they are testing an idea; they may not have all the resources they need. But as the business grows, a company crosses a certain threshold and begins to build more elements and create more value between those elements. Unexpected and surprising things always happen in a business, and those are valuable too. But overall, I think a company's capabilities are largely limited by the scope of what the CEO and management team can oversee and manage. Therefore, it is extremely important for us to keep our priorities straight and focus as much as possible on the key things. As venture capitalist Ben Horowitz (Ben Horowitz) said: "Keep the main thing the main thing."

Editor/jayden


