
Zuckerberg's latest interview: Why Llama 3, Meta's strongest open-source model, is worth $10 billion

Tencent Technology ·  Apr 19 21:19

Source: Tencent Technology

On April 19, according to foreign media reports, on Thursday local time in the US, Facebook parent company Meta Platforms (META.US) launched Llama 3, its most powerful open-source artificial intelligence (AI) model to date, aiming to catch up with industry leader OpenAI amid fierce competition. The Llama 3 release includes two versions, with 8 billion and 70 billion parameters; a top-tier version with more than 400 billion parameters will follow, underscoring Meta's ambitions in AI.

It is reported that Llama 3 performs strongly on a number of industry benchmarks and adds many features, such as improved reasoning ability. Meta plans to integrate Llama 3 deeply into its virtual assistant, Meta AI. The assistant is already widely used in popular apps such as Facebook, Instagram, WhatsApp, and Messenger, and is about to receive a new round of updates intended to give users a smarter and more convenient experience.

In addition, Meta announced that Llama 3 will soon be available on platforms such as Amazon AWS, Google Cloud, IBM's cloud platform WatsonX, Microsoft Azure, and Nvidia's NIM, with hardware support from AMD, Dell, Intel, and Nvidia. This series of partnerships and integrations should further accelerate Llama 3's spread and adoption worldwide.

On the day Meta launched Llama 3, CEO Mark Zuckerberg was interviewed by well-known tech podcast host Dwarkesh Patel. They discussed Llama 3, artificial general intelligence (AGI), energy bottlenecks, the strategic significance of AI technology, the potential risks of open source, and the metaverse. Zuckerberg also shared the decision-making process behind open-sourcing a model worth $10 billion, along with custom chip source code.

The following is a transcript of this interview:

1. The top-tier Llama 3 model is still in training

Patel: Mark, it's a great honor to invite you to our podcast.

Zuckerberg: Thanks for the invitation, Patel. I'm glad to be here; I've always loved your podcast.

Patel: Great, thank you! Now let's talk about Llama 3 first! Please share with me some highlights and exciting new developments about this latest big model and Meta AI.

Zuckerberg: I think most people are probably more interested in the new version of Meta AI, but in reality, upgrading the model is our top priority. We're launching Llama 3: we're providing it to the developer community as an open-source project, and we're also using it to power Meta AI. There's plenty of interesting ground to cover on Llama 3 itself, but I think the most important thing is that we now believe Meta AI is the smartest freely available AI assistant, and people can use it anytime, anywhere.

Additionally, we have integrated real-time knowledge from Google and Bing to enable AI assistants to provide more accurate and comprehensive information. We plan to make it more prominent in our apps, like at the top of Facebook and Messenger, where you'll be able to ask any questions directly using the search box. In addition to these, we've also added some new creative features, which I think are really cool, and I'm sure everyone will love them.

The animation feature in particular: you can easily animate any image, which is really fun. One amazing capability is that it can generate and update high-quality images in real time as you type. You just enter your query, such as “Show me a scene of eating macadamias and drinking beer in the field, with cows and mountains in the background,” and it updates the image in real time as you type. The experience is simply amazing, and I'm sure everyone will love this feature.

These are some of the obvious changes most people will see. We're rolling out these new features gradually, and while they aren't currently available globally, we'll start with some countries and gradually expand over the next few weeks and months.

I think this is going to be a huge breakthrough, and I'm excited for everyone to experience it. But from a technical standpoint, Llama 3 is definitely the most interesting part. We are training three sizes of Llama 3: 8 billion, 70 billion, and 405 billion parameter versions.

Currently, the first two versions are ready, and the largest model is still being trained. Although we can't immediately release the 405 billion parameter version today, I'm confident about the performance of the 8 billion and 70 billion parameter models. They are all industry-leading in terms of size, and we'll be publishing our benchmark results in detail in blog posts, so everyone can get an in-depth understanding of their performance.

Of course, Llama 3 is open source, which means developers will be able to try it out for themselves and explore its potential. We also have a carefully planned release roadmap that will bring features such as multimodality, more language support, and a longer context window (the range of text the language model takes into account when generating output). Later this year, we expect to launch that exciting 405 billion parameter version. Based on current training progress, its MMLU (Massive Multitask Language Understanding, a standard knowledge benchmark) score is already close to 85, and we expect it to perform excellently across many benchmarks.

As for the 70 billion parameter model, it also performs very well. We officially released it today; it has an MMLU score of around 82 and achieves strong results in math and reasoning. I believe users will find it very interesting and rewarding to try.

I want to emphasize that even the 8 billion parameter model performs almost on par with the largest Llama 2 model we released before. In other words, even the “smallest” Llama 3 is nearly as capable as the “biggest” Llama 2.

Patel: Before we dive deeper into these models, I'd like to look back a bit. I remember that in 2022, Meta's stock price dropped sharply, and people were puzzled by your large purchases of Nvidia H100 chips; the metaverse concept wasn't widely accepted by the market. What considerations drove your decision to invest in H100s at the time? How did you anticipate the demand for these GPUs?

Zuckerberg: I think we were in the development phase of the Reels project at the time. We have always believed in reserving enough capacity to handle unforeseen innovation, and Reels was one such case. We found that in order to train the models, we needed more GPUs. This was a huge shift, because our service was no longer simply ranking content from the people or pages you follow; it was now heavily recommending so-called “unconnected content”: content from people or pages you don't follow.

As a result, the pool of candidate content we might show surged from thousands of items to hundreds of millions. Naturally, that required brand-new infrastructure to support it. We were already building that infrastructure, but in trying to keep up with TikTok we ran into bottlenecks and couldn't move as fast as we wanted. Seeing that, I realized, "We have to make sure we're never caught in this passive position again." So we ordered not only enough GPUs for Reels and content ranking, but double that. We've always operated on the principle that something new we can't foresee will always come along, and we have to be prepared for it.

Patel: Did you know it would be artificial intelligence?

Zuckerberg: We originally thought it would have something to do with training big models, but then I realized it was more closely tied to content recommendation. Running a company is like playing a game; new challenges always arise. At the time, I threw myself into building Reels and other content-recommendation features, hoping they would play a huge role. Today, Instagram and Facebook can show users content they're interested in even when it comes from people they don't follow, which is certainly a huge leap forward. Looking back, the decision was certainly wise, but it stemmed from lessons learned from falling behind. That's not to say we were "far ahead"; in fact, many decisions look right now only because we made mistakes earlier and learned from them.

Patel: In 2006, you rejected a $1 billion acquisition offer, but I imagine there was some price at which you would have considered selling Facebook, right? Did you have a number in mind where you thought, "This is Facebook's real value, and they're not offering it"? If they'd offered $5 trillion, you'd obviously have accepted. So how did you weigh that decision, and what considerations was it based on?

Zuckerberg: I think it mostly comes down to personal choice. Looking back, I'm not sure I was mature enough to make that kind of decision. Many people around me at the time were debating the $1 billion price, analyzing it from angles like expected revenue and scale. But all of that was far beyond the stage we were at. To be honest, I didn't have the financial knowledge to take part in that kind of discussion, but deep down I had a firm belief in what we were doing.

I also did some simple analysis, like: if I didn't do this, what would I do? I love creating new things, helping people communicate, and understanding people's dynamics and how humans interact. So I figured that if I sold the company, I'd probably just start another similar one, and I was quite happy with the one I had. So why sell it? I think many of the big decisions people make are really based on beliefs and values; it's very hard to accurately predict the future through analysis alone.

2. The road to AGI

Patel: Facebook AI Research (FAIR) has been around for a long time, and now it seems deeply embedded in the core of your company. When did building artificial general intelligence (AGI), or whatever you call the ambitious goal you're pursuing, become Meta's top priority?

Zuckerberg: Actually, this transformation has been happening quietly for a while. About 10 years ago, we founded FAIR. The original intention at the time was that many innovations would emerge on the way to general artificial intelligence or other similar goals, and these innovations would continue to drive the progress of our various businesses. Therefore, we did not conceive FAIR as an independent product, but formed it as a research team. Over the past 10 years, FAIR has created many unique results, bringing significant improvements to all of our products. It has driven development in many fields and inspired other innovators in these fields, and as a result, it has created more technology to improve our products. It really excites me.

In recent years, with the rise of ChatGPT and the emergence of diffusion models for image creation, we have clearly felt a huge wave of change. These new technologies are amazing, and they will profoundly change how people interact with apps. So we decided to form a second team, the GenAI team, to bring these cutting-edge technologies into our products and build leading foundation models that can support all of our different products.

When we began this exploration, our initial thinking was that much of what we do is strongly social: helping people interact with creators, helping people communicate with businesses, and helping businesses sell products or provide customer service. It can also be embedded in our apps, smart glasses, and virtual reality as an intelligent assistant. So at first we weren't entirely sure whether full general intelligence would be needed to support these use cases. But as we dug into the subtleties, I gradually realized it is actually essential. For example, when developing Llama 2 we didn't prioritize coding capability, because people weren't asking Meta AI many coding questions on WhatsApp.

Patel: Will they now?

Zuckerberg: I don't know, and I'm not sure WhatsApp, Facebook, or Instagram will be the interface where users ask lots of coding questions; coding questions may be more common on our upcoming Meta.ai website. But over the past 18 months we were surprised to learn how critical coding is across many fields, not just the programming industry. Even when users don't ask coding-related questions directly, training the model on code helps it answer questions more accurately and reason well across different domains. With Llama 3, we focused on optimizing it through extensive code training, because that makes it better across the board, even when the user's primary focus isn't coding.

The ability to reason is another excellent example. Imagine when you're talking to a creator, or trying to interact with a customer as a business, that interaction is far from a simple “you send a message, I reply” model. It involves a multi-step, deep thought process that requires us to think “How can we better achieve this person's goals?” Too often, customers aren't sure what they really need or how to ask questions exactly. So simply answering questions isn't the whole job of artificial intelligence. We need to think more comprehensively and in depth; this has actually turned into a question of reasoning. If a team makes a major breakthrough in reasoning and we're still in the basic chatbot stage, then our product will be overshadowed by what other teams have built. Ultimately, we realized that in order to stay ahead, we had to do our best to solve the problem of general intelligence, so we increased our bets and investments to ensure this breakthrough.

Patel: So, is the Llama version that solves all of these user use cases powerful enough to replace all the programmers in this building?

Zuckerberg: I think these technologies will gradually mature and show great potential over time. However, it is a complicated question as to whether Llama-10 or future versions will completely replace programmers. I don't think we're trying to replace humans, but rather hope to use these tools to empower people to do more jobs that were previously unimaginable.

Patel: But would you guess that our programmers will be 10 times more productive once they're using Llama-10?

Zuckerberg: I have high expectations for this. I am convinced that human intelligence isn't measured by a single standard, because everyone has unique skills and talents. At some point, artificial intelligence may surpass the capabilities of most humans in some ways, but it all depends on how powerful the model is. However, I think this is a gradual evolution process, and general artificial intelligence is not something that can be achieved overnight. We're actually gradually adding different capabilities to the model.

Currently, multimodality is our area of focus, from initial photos, images, and text to video in the future. Given our strong interest in the metaverse, 3D technology is also particularly important. Additionally, one modality I'm particularly interested in is emotional understanding, an area I rarely see other teams in the industry deeply research. After all, most of the functions of the human brain are dedicated to understanding others and interpreting expressions and emotions. I am convinced that if we can make a breakthrough in this area and enable artificial intelligence to truly understand and express emotions, then human-machine interaction will become more natural and deep than ever before.

You might think of that as just video or images, but in reality they're highly specialized versions of those modalities, tuned to human emotion. So beyond improving the model's reasoning and memory, we need to focus on many other capabilities. I believe that in the future we won't be satisfied with just typing questions into a query box to find answers. We'll have different ways of storing memory, and customized models that serve people in more personalized ways. These are all capabilities AI needs to develop.

Of course, we also have to address model size. We care about large models, but also about how to run small models within tight constraints. If you're running a large service like Meta AI, it mainly relies on powerful server-side compute. But we also expect these advanced capabilities to make their way into compact devices such as smart glasses. Since smart glasses have very limited space, we need efficient, lightweight solutions suited to that environment.

Patel: If we were to invest $10 billion, or eventually up to $100 billion, to implement intelligent reasoning on an industrial scale, what specific use cases would that money be used for? Is it analog technology? Or an artificial intelligence application in the metaverse? How can we effectively use data centers to support these use cases?

Zuckerberg: According to our predictions, intelligent reasoning will profoundly change almost all product forms. I think we'll see the advent of a Meta AI general assistant product in the future. This product will gradually evolve from traditional chatbots, from simply answering questions to being able to receive and perform more complex tasks. This will require a great deal of reasoning ability, and will also trigger a huge demand for computational power.

Furthermore, interaction with other agents (AI systems that can perceive, reason, decide, and act on a user's behalf) will also become an important part of our work, whether we're serving businesses or creators. I don't think people will interact with just one general-purpose AI; every business will want an AI that represents its interests. These AIs won't primarily be used to sell competitors' products; instead, they'll interact with businesses, creators, and consumers in their own distinctive ways.

It is particularly worth mentioning that creators will be an important group to benefit from this technology. With around 200 million creators on our platform, they generally don't feel like they have enough time every day, and their community is eager to interact with them. If we could develop a technology that allows creators to train their artificial intelligence and use it to stay engaged with the community, that would be a very powerful feature.

These are just some of the consumer use cases. Take the Chan Zuckerberg Initiative, which my wife and I run, for example: we're doing a lot of work in science, and AI will undoubtedly play a key role in advancing science, healthcare, and more. Ultimately, intelligent reasoning will touch almost every product and sector of the economy.

Patel: You mentioned AI that can perform multi-step tasks, which makes me curious: does that require a larger model? Or, for Llama-4, could a 70 billion parameter version show amazing abilities just by training on the right data? Right now, where is progress mainly coming from: increasing model size, or, as you said before, keeping the size the same while adding more diverse capabilities and application scenarios?

Zuckerberg: We probably don't have a clear answer yet. But one clear trend I've observed is that we have a base Llama model and then build application-specific code around it. Some of that is fine-tuning for specific use cases, but some is logic for how Meta AI works with tools like Google and Bing to pull in real-time knowledge, which isn't part of the base Llama model. In Llama 2 we tried to build some of these capabilities in, but much of it was done by hand. For Llama 3 we set a goal of embedding more of this functionality into the model itself. As we start exploring more agent-like behaviors, I think some of these capabilities will still need hand-tuning. For Llama 4, our goal is to fold even more of them naturally into the model.

With each step forward, you get a sense of where things are likely to go next. We start experimenting and building scaffolding around the model, which helps us understand what capabilities should go into the next version. That way the model becomes more general, because any capability implemented through hand-coding, while it unlocks some use cases, is inherently brittle and not general enough. Our goal is for the model to learn and evolve on its own so it can adapt to complex, changing scenarios.

Patel: You mentioned "incorporating more of this into the model itself." Can you explain in detail how you build these desired capabilities into the model through training? What exactly do you mean by incorporating them into the model itself?

Zuckerberg: Take Llama 2 as an example: its tool use was fairly specific and limited. With Llama 3, we were happy to see that its ability to use tools improved significantly. We no longer have to hand-code everything for it to search with Google; it can handle these tasks on its own. Likewise, Llama 3 shows strong ability in programming, running code, and a range of other tasks. Once we have a capability, we can anticipate what to explore next. We don't have to wait for Llama 4 to start building these abilities; we can run all kinds of experiments around it ahead of time. The hand-coded processes may make the product better in the short term, but they also point us toward what we should build into the next version of the model.

Patel: What use cases are you most looking forward to seeing from the open-source community's fine-tuning of Llama 3? Maybe not the one most useful to you, but the one you're most curious to try. For example, I've heard that some people have fine-tuned models on ancient history so that we can have direct conversations with historical figures such as the ancient Roman poet Virgil.

Zuckerberg: I think the beauty of this kind of thing is that it always surprises us. Any specific use case we think is valuable is likely to be tried and built. I'm sure we'll see more stripped-down versions of the models. I'm also looking forward to lighter-weight models with fewer parameters, say 1 to 2 billion parameters, or even 500 million, and the interesting, efficient applications they enable. If an 8 billion parameter model is almost as powerful as the biggest Llama 2 model, then a 1 billion parameter model should be able to show its own value in some areas. They could be used for classification tasks, or for pre-processing a user's query to understand intent before passing it to a more powerful model for a full answer. I think this is an area where the community can play a huge role in filling the gaps. Of course, we're also considering distilling and optimizing these models ourselves, but right now all of our GPU resources are mainly devoted to training the 405 billion parameter model.
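The "small model as pre-processor" pattern described here can be sketched roughly as follows. Everything in this sketch is a hypothetical stand-in: the keyword-based classifier, the function names, and the canned responses are illustrations of the routing idea, not real Llama APIs.

```python
# Sketch: a cheap classifier decides whether a query needs the large model.
# In practice the classifier would be a small (~1B parameter) model; here it
# is a trivial keyword rule so the routing logic stays visible.

def classify_intent(query: str) -> str:
    """Stand-in for a small intent classifier."""
    hard_keywords = ("prove", "debug", "derive", "multi-step")
    return "complex" if any(k in query.lower() for k in hard_keywords) else "simple"

def answer_simple(query: str) -> str:
    return f"[small model] quick answer to: {query}"

def answer_complex(query: str) -> str:
    return f"[large model] detailed answer to: {query}"

def route(query: str) -> str:
    # The cheap pre-processing step runs first; only "complex" queries pay
    # the cost of the big model.
    if classify_intent(query) == "complex":
        return answer_complex(query)
    return answer_simple(query)

print(route("what time is it in Tokyo"))   # handled by the small model
print(route("debug this multi-step proof"))  # escalated to the large model
```

The design point is simply that most traffic never touches the expensive model, which matches the inference-cost concerns raised later in the interview.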

Patel: About the number of GPUs you mentioned before, I remember you said it would reach 350,000 by the end of the year.

Zuckerberg: Yes, that's our overall goal. At present we have two large GPU clusters, each with roughly 22,000 to 24,000 GPUs, which are mainly used to train large models. Of course, these clusters also handle many of our other important training workloads, such as the Reels model, Facebook's news feed, and Instagram's feed ranking. Inference is a huge challenge for us because we serve such a large user base. Compared with other companies doing similar work, our ratio of inference compute to training compute is probably much higher, simply because of the sheer size of the community we serve.

Patel: I noticed a remarkable point in the material you shared earlier: the amount of data you trained on actually exceeds the compute-optimal amount for training alone. Given how important inference is to you and to the community at large, it really does make sense to have a model trained on trillions of tokens.

Zuckerberg: With the 70 billion parameter model, we observed an interesting phenomenon. I originally thought that as the amount of data increased, the model's performance gains would gradually saturate. But after training on around 15 trillion tokens, we found the model was still learning; even in the final stages of training, it showed a strong ability to keep improving. We could probably keep feeding it more tokens and improve it further.
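As a rough illustration of how far 15 trillion tokens goes beyond the usual compute-optimal point for a 70 billion parameter model, here is a back-of-the-envelope sketch. The ~20 tokens-per-parameter rule of thumb is an approximation drawn from the Chinchilla scaling-law literature, not a figure from the interview.

```python
# Back-of-the-envelope: actual training tokens vs. the compute-optimal
# heuristic of roughly 20 tokens per parameter (Chinchilla rule of thumb).

params = 70e9                    # 70B-parameter model
tokens_trained = 15e12           # ~15 trillion tokens, per the interview
chinchilla_tokens = 20 * params  # heuristic compute-optimal token count

ratio = tokens_trained / chinchilla_tokens
print(f"Compute-optimal tokens: {chinchilla_tokens / 1e12:.1f}T")  # 1.4T
print(f"Actual / optimal ratio: {ratio:.1f}x")                     # 10.7x
```

Training an order of magnitude past the compute-optimal point is wasteful if you only care about training cost, but it buys a smaller, cheaper-to-serve model, which is exactly the inference-heavy trade-off described above.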

But as operators of the company, we have to make a decision at some point: should we keep spending GPU resources to train this 70 billion parameter model further, or move on and start testing new hypotheses for Llama 4? We need to strike a balance between the two, and I think we've found a good one with this version of the 70 billion parameter model. Of course, we'll launch other versions in the future, such as a multimodal 70 billion parameter model, which will arrive before long. But one fascinating thing is that the current model architecture can absorb such a huge amount of data.

3. Energy bottlenecks

Patel: That's really thought-provoking. So what does this mean for future models? You mentioned earlier that the 8 billion parameter version of Llama 3 surpassed the 70 billion parameter Llama 2 in some ways.

Zuckerberg: No, no, I don't want to overstate it. Their performance is actually quite close; they're in the same ballpark.

Patel: So, can we expect the 70 billion parameter version of Llama-4 to be comparable to the 405 billion parameter version of Llama-3? What will the future development trend be like?

Zuckerberg: That's a really big question. Honestly, no one can predict that accurately. One of the hardest things in the world to predict is how long an exponential trend will last. I'm convinced we'll keep moving forward, and I think it's worth investing $10 billion, or even $100 billion or more, in infrastructure. If the growth trend continues, we'll have some truly impressive capabilities for building amazing products. But no one in the industry can tell you for certain that it will keep scaling at this rate. Historically, we've always hit development bottlenecks at some point. Right now, expectations in this field are high, and perhaps these bottlenecks will be overcome quickly. It's a question worth pondering.

Patel: What would the world look like without these bottlenecks? As unlikely as this may seem, what if technological progress can really continue to advance at this rate?

Zuckerberg: Either way, there will always be new challenges and bottlenecks. GPU production has been an obvious issue for the past few years. Even for companies that have the money to buy GPUs, it's often difficult to get the quantity they need because supply is limited. However, the situation appears to be gradually improving. Today, we're seeing more and more companies considering investing heavily to build infrastructure to produce GPUs. I think this will continue for a while.

Furthermore, capital investment is also a consideration: at what point does investing more capital stop being cost-effective? In fact, I think energy issues will bite before capital does. As far as I know, no one has yet built a single-gigawatt training cluster. Things like energy permits will become increasingly difficult to obtain around the world. This isn't just a software problem; it involves strict government regulation, stricter, I think, than many of us in the tech world appreciate. If you're starting out at a small company, you probably don't feel it as much. But when we deal with different government departments and regulators, we have to follow many rules and ensure compliance globally. There's no doubt that energy will be one of the major constraints we face.

If you're talking about building a large new power plant or facility, and you need to cross other private or public land to build transmission lines, that's a heavily regulated project with multi-year lead times. If we wanted to build a huge facility, powering it would be a long and complex project. I'm sure people will work hard to make it happen, but I don't think it will be as simple as reaching some level of AI capability, raising a pile of capital, and watching the models suddenly advance by leaps and bounds.

Patel: On the road to advancing AI, will we hit bottlenecks that even companies like Meta can't overcome alone? Are there projects that even Meta doesn't have the resources to complete, things you couldn't implement even if your R&D or capital expenditure budget grew tenfold? Is that how you see it: that Meta, as it stands, couldn't raise enough capital even by issuing stock or bonds?

Zuckerberg: The energy issue is certainly one of the major challenges. I am convinced that if we can solve the energy supply problem, it is entirely possible that we can build a larger computing power cluster than we have today.

Patel: So, is this essentially a funding bottleneck?

Zuckerberg: Funding is certainly one aspect, but I think time is also a factor that can't be ignored. Currently, many data centers are around 50 to 100 megawatts, with large ones perhaps reaching 150 megawatts. Suppose you take an entire data center, fill it with training equipment, and build the largest cluster current technology allows; I think many companies are close to or at that level. But when we talk about building 300 megawatt, 500 megawatt, or even 1 gigawatt data centers, the situation is completely different. No one has yet tried to build a 1 gigawatt data center. I'm sure it will happen eventually; it just takes time, and it won't happen next year, because much of what's involved takes years to complete. To put it in perspective, a 1 gigawatt data center would need an energy supply on the order of a nuclear power plant dedicated to model training.

Patel: Has Amazon tried this yet? They appear to have a 950 megawatt facility.

Zuckerberg: I'm not familiar with the specifics of what Amazon is doing; you'd have to ask them directly.

Patel: Training doesn't have to be limited to a single location, right? If distributed training is effective, then we can actually consider spreading it across multiple locations.

Zuckerberg: I think this is a very important question about how to train large models in the future. Judging from the current development trend, generating synthetic data through inference and then using this data for model training seems to be a promising direction. Although I still don't know what the ratio between this synthetic data and direct training will be, I believe the generation of synthetic data is getting closer to the inference process to some extent. Obviously, if this approach is used to train models, then it will be an integral part of the overall training process.

Patel: So how to find that balance, and where this goes, are still open questions. Could this trend play out with Llama-3, or even Llama-4 and later? In other words, if you publish these models, entities with massive computing power, such as Kuwait or the UAE, could use them to make certain applications far more capable.

Zuckerberg: I totally agree with that possibility. Indeed, I think there will be such a dynamic development in the future. But at the same time, I also think the model architecture itself has some fundamental limitations. Take Llama-3 as an example. Although we have made significant progress, I believe there is room for further optimization of its architecture. As I said before, we feel that the model's performance can continue to improve by providing more data or iterating on some of the key steps.

In fact, we've seen many companies build new models based on Llama-2's 70 billion parameter model architecture. However, for models such as Llama-3's 70 billion or 405 billion parameters, it is not easy to make intergenerational improvements, and no similar open source model has yet appeared. I think this is a huge challenge, but it's also a huge opportunity. However, I still believe that what people can build based on the existing model architecture isn't infinitely scalable. Until we reach the next technological leap, we may only be able to make a few optimizations and improvements on what we already have.

4. Will AI get out of control overnight?

Patel: Now let's take a broader perspective. How do you think artificial intelligence technology will develop in the next few decades? Does it make you feel like another technology, like the metaverse or social technology, or do you think it's fundamentally different in human history?

Zuckerberg: I think artificial intelligence will be a very basic technology. It's more like the invention of computers, and will spawn a whole new set of applications. Just like the advent of the internet or mobile phones, making many things that were impossible before possible, people are beginning to rethink these experiences. So I think AI will bring about similar changes, but it's a deeper innovation. My feeling is that it's like a shift from not having a computer to having a computer. However, it is really difficult to predict exactly how it will develop. Judging from a longer cosmic time span, this transformation will occur very soon, possibly within a few decades. Some people do worry that it will quickly get out of control and go from a certain level of intelligence to extremely smart overnight. But I think this is unlikely to happen due to many physical limitations. I don't think we're going to face a situation where artificial intelligence gets out of control overnight. I'm sure we'll have plenty of time to get used to it. But artificial intelligence will really change the way we work, providing people with innovative tools to do different things. It will give people more freedom to pursue what they really want to do.

Patel: Maybe not overnight, but on a cosmic timescale, can we think of the milestones this way? Humans evolved, then artificial intelligence appeared, and then perhaps intelligence spreads through the galaxy. That could take decades or a century, but is that the grand arc unfolding in your eyes? Compared with other technologies like computers or even fire, is the development of artificial intelligence itself as momentous as the emergence of humans?

Zuckerberg: I think that's hard to judge. Human history is, in a sense, the process of coming to terms with the fact that we are not unique in certain ways, while still being very special. We thought the Earth was the center of the universe; it wasn't, yet humans still have extraordinary characteristics, right? I think people often hold another prejudice, that intelligence and life are fundamentally linked, but that isn't necessarily true. We don't have a clear enough definition of consciousness or life to fully settle the question. Plenty of science fiction describes the creation of intelligence that begins to exhibit all sorts of human-like behaviors. But current trends suggest that intelligence can be separated quite cleanly from consciousness, agency, and other such traits, which makes it a very valuable tool.

5. The dangers of open source

Zuckerberg: Predicting where these things will evolve over time is extremely challenging, so I think anyone should avoid dogmatic planning for their development or use. Every time we release a new product, we need to re-evaluate it. We're very supportive of open source, but that doesn't mean we make all of our work public. I tend to think that open source is good for the community and ourselves because it will promote innovation. However, if at some point the capabilities of these technologies change qualitatively, and we feel that open source is irresponsible, then we will choose not to disclose it. All of this is fraught with uncertainty.

Patel: When you develop Llama-4 or Llama-5, are there any specific qualitative changes that will make you consider whether you should open source?

Zuckerberg: It's hard to answer that in the abstract, because any product can carry potential risks; the key is how we effectively manage and mitigate them. With Llama-2, we faced some of these challenges and invested significant resources to ensure it isn't used for harmful purposes such as violence. That doesn't mean it has become an intelligent agent; it simply has a wealth of knowledge about the world and can answer a range of potentially risky questions. So I think the problem is how to identify and mitigate potentially harmful behavior, not the knowledge itself.

In my opinion, evaluating whether something is good or bad involves multiple dimensions, and it's hard to list all the possibilities in advance. Using social media as an example, we've addressed many types of harmful behaviors and grouped them into 18 or 19 categories. We've built artificial intelligence systems to recognize these behaviors to reduce their occurrence on our platform. As time goes on, I'm sure we'll further refine these classifications. This is an issue we've been working hard to research because we want to make sure we have a deep understanding of it.

Patel: I think it's very important to deploy artificial intelligence systems widely so that everyone has the opportunity to use them, and I would be disappointed if future systems were not widely available. At the same time, I'd like to understand the mitigations better. If mitigation mostly comes down to fine-tuning, the complication with open model weights is that people can then fine-tune on top of those capabilities themselves. Today's models are far from that level; they're more like advanced search engines. But if I could show one my Petri dish and have it explain why my smallpox sample isn't growing and how to fix that, how do we ensure safe and effective use in that case? After all, someone could fine-tune these models to suit their own ends.

Zuckerberg: It's a complicated question, really. I think most people will just use off-the-shelf models, but there will also be bad actors who try to use these models for harm. So the question is worth pondering. Philosophically, the reason I'm so supportive of open source is that I think a future where artificial intelligence is overly concentrated may be as risky as one where it is widely diffused. Many people ask, "If we can do this, would spreading these technologies widely through society be a bad thing?" But another question worth asking is: if one organization has far more powerful artificial intelligence than everyone else, is that also a bad thing?

I can explain it by using an analogy in the field of security. Imagine if you were able to understand and exploit certain security flaws in advance, then you could almost easily hack into any system. This isn't limited to the field of artificial intelligence. So we can't just rely on a highly intelligent artificial intelligence system to identify and fix all bugs, even though this seems theoretically possible. So how is our society dealing with this problem? Open source software plays an important role in this. It allows software improvements not to be limited to a single company, but can be widely used in various systems, including banks, hospitals, and government agencies. As software continues to improve, standards are gradually being established on how these software works, thanks to more people being able to participate in reviewing and testing. When upgrades are needed, the world can act quickly together. I think in a world where AI is widely deployed, these AI systems will gradually be hardened over time, and all the different systems will be controlled in some way.

In my opinion, this distributed and widely deployed approach is healthier than a centralized approach. Of course, there are risks in every way, but I don't think people have fully discussed those risks. There is a real risk of artificial intelligence systems being used for bad behavior. What I'm more concerned about, however, is that an untrustworthy entity has a super powerful artificial intelligence system, and I think this could be a bigger risk.

Patel: Will they try to overthrow our government because they have weapons others don't have? Or just creating a ton of chaos?

Zuckerberg: My instincts tell me that these technologies will eventually become very important and valuable for economic, security, and many other reasons. If our enemies or people we don't trust get more powerful technology, then this can indeed become a serious problem. Therefore, I think the best mitigation method may be to promote the development of good open source artificial intelligence, make it an industry standard, and play a leading role in many ways.

Patel: Open source AI systems can really help create a fairer, more level playing field, which makes perfect sense to me. If that mechanism works, it's certainly the future I'm hoping for. What I want to explore further, though, is mechanistically, how does open source artificial intelligence prevent people from using their own AI systems to create chaos? For example, if someone tries to make a biological weapon, is the answer that the rest of the world, through extensive research and development, can develop a corresponding vaccine very quickly? What is the specific mechanism?

Zuckerberg: From the security perspective I mentioned earlier, I think people with weak AI systems will have a relatively low success rate when trying to hack into systems protected by stronger AI.

Patel: But how do we make sure everything in the world is handled properly like this? The case of biological weapons, for example, may not be so simple.

Zuckerberg: Indeed, I can't claim that everything in the world will work out so smoothly. Biological weapons are one of the focal concerns of people who worry deeply about these issues, and I think that concern is justified. There are mitigations, such as trying not to train certain knowledge into the model, but we have to recognize that if an extremely bad actor is encountered and there is no other artificial intelligence to counterbalance them and grasp the severity of the threat, this can indeed be a risk. It is one of the issues we must take very seriously.

Patel: Have you encountered any unexpected behavior when deploying these systems? For example, during training, Llama-4 might lie to you for some reason. That might not be likely for a system like Llama-4, but have you considered scenarios like that? For instance, would you be seriously worried about deceptive behavior in the system, and about the problems billions of copies of such a system could cause once spread freely in the wild?

Zuckerberg: Currently, we observe a lot of hallucination. I think how to distinguish hallucination from deception is a question worth exploring in depth. There are indeed many risks and factors to consider. In running our company, I try to balance these longer-term theoretical risks against the harms I think actually exist right now. So when it comes to deception, my biggest concern is that someone might use this technology to create misinformation and spread it through our network or others. To combat that kind of harmful content, we're building artificial intelligence systems that are smarter than the adversarial systems.

This forms part of my understanding of the problem. Looking at the different types of harm people cause or attempt to cause on social networks, I've found that some of it isn't very adversarial. For example, hate speech isn't highly adversarial in one sense: people aren't getting better at being racist because of online speech. On that front, I think artificial intelligence is generally already more sophisticated and faster than humans at handling these problems. But we have problems in both directions. People misbehave for all kinds of purposes, whether inciting violence or other inappropriate acts, but we also face a large number of false positives, where we mistakenly take down content that should not have been censored. That understandably bothers a lot of people. I believe the situation will gradually improve as artificial intelligence becomes more accurate at this.

Whether it's Llama-4 or a future Llama-6, we need to think deeply about the behaviors we observe, and it's not just us; part of the reason we chose to open source is precisely so that many other researchers can study these models too. So we want to share our observations with other researchers, jointly explore possible mitigation strategies, and weigh whether everything can be open sourced while keeping it safe. For the foreseeable future, I'm optimistic we can do that. In the short term, meanwhile, we can't ignore the fact that people are trying to misuse models today. Even though those acts are not catastrophic, they are serious day-to-day harms we're familiar with from operating our services.

Patel: I found the synthetic data point really interesting. With current models, repeatedly training on their own synthetic data probably hits a performance asymptote, which makes theoretical sense. But suppose these models get smarter and can use techniques like the ones in your paper or upcoming blog post to find the most correct chain of thought. Why wouldn't that lead to a loop where the model gets smarter, produces better output, and in turn gets smarter still? Of course this wouldn't happen overnight, but over months or years of continuous training, the model could plausibly keep getting more intelligent.

Zuckerberg: I think this kind of cyclic improvement is possible within the parameters of the model architecture. However, as far as the current 8 billion parameter models are concerned, I don't think they can reach the same level as advanced models with tens of billions of parameters and incorporating the latest research results.

Patel: Regarding these models, they're also going to be open source, right?

Zuckerberg: Yes, that's true. However, all of this presupposes that we must successfully address the challenges and issues discussed previously. We certainly hope so, but I also know that at every stage of building software, although the software itself has huge potential and possibilities, its operation is still to some extent physically limited by the chip's performance. As a result, we are always faced with various physical constraints. How big a model can become really depends on how much energy we can capture and use to reason. I am very optimistic about the future of artificial intelligence technology and believe they will continue to rapidly evolve and improve. At the same time, I'm more cautious than some people. I don't think it's particularly easy to get out of control, but we still need to be alert and carefully consider all possible risks. So I think it makes a lot of sense to keep options open.

6. Caesar and the metaverse

Patel: OK, let's move on to another topic — the metaverse. In the long course of human history, which period would you most like to explore in depth? From 100,000 BC to the present, do you just want a glimpse of what it was like back then? Does this exploration have to be limited to the past?

Zuckerberg: Yes, I prefer to explore the past. I am fascinated by American history, classical history, and scientific history. I think it would be very interesting to be able to observe and understand how those major historical advances have taken place. What we can rely on, however, is only limited historical records. For the metaverse, I'm afraid it would be very difficult to completely recreate those periods of history that we have no record of. Actually, I don't think going back to the past would be one of the main applications of the metaverse. Although this kind of function may be useful in history teaching, etc., the most important thing for me is that no matter where we are in the world, we can interact and coexist with others in real time. I am convinced that this is a killer app.

In our previous conversation about artificial intelligence, we delved into many of the physical limitations behind it. One of the valuable lessons technology has taught us is that we should work to free more things from physical constraints and move to the software field, because software is not only easier to build and evolve, but also easier to popularize. After all, not everyone can own a data center, but many people can write code, get open source code, and modify and optimize it. The metaverse is the perfect platform to achieve this goal.

This will be a major disruptive change, and it will dramatically change the way people perceive gathering and interaction. As a result, people will no longer feel like they have to come together in person in order to accomplish many things. Of course, I am also convinced that in some situations, meeting in person is still irreplaceable. This isn't an either/or choice; the advent of the metaverse doesn't mean we have to completely abandon face-to-face communication. However, it does provide us with a new dimension, allowing us to socialize, connect, get work done more easily and efficiently, and play a huge role in many fields such as industry and medicine.

Patel: We touched on this earlier: you didn't sell your company for a billion dollars, and you obviously also hold strong convictions about the metaverse even though the market is skeptical. I'm curious about the source of that confidence. You've said "oh, my values, my instincts," but that seems a bit general. If you could name some specific traits of yours, maybe we could better understand why you're so confident in the metaverse.

Zuckerberg: I think this involves a few different issues. First, what drives me to keep moving forward? We've already discussed a lot of topics. I love to create, especially around how people communicate, express themselves and work. When I was in college, I majored in computer science and psychology, and the intersection of these two fields has always been critical for me. This is also my strong driving force. I don't know how to explain it, but deep down I always feel like if I don't create something new, then I'm doing something wrong. Even when we developed a business plan to invest $100 billion in artificial intelligence or the metaverse, our plans already made it pretty clear that these projects, if successful, would bring huge returns.

But of course, you can't be sure of everything right from the start. People always have all kinds of arguments and questions, like, "How can you be confident enough to do this?" For me, if one day I stopped trying to create new things, I would lose myself; I'd go somewhere else and keep creating. Basically, I can't imagine running something without trying to create something new that I find interesting. Whether we try to build the next thing isn't really a question for me; I just can't stop creating. And not only in tech, but in other areas of my life too. For example, our family built a ranch in Kauai, and I personally helped design all the buildings. When we started raising cattle, I thought, "OK, I want to raise the best cattle in the world," and then we began planning everything we'd need to achieve that. This is who I am!

Patel: I've always been curious about one thing: in high school and college, when you were around 19, you read a lot of ancient and classical texts. What important lessons did you take from them? Not just which content you found interesting, but what actually stuck with you, given how limited the knowledge you could have been exposed to was at that point.

Zuckerberg: One thing that really fascinated me was how Caesar Augustus became emperor and worked to establish peace. At the time, people had no real concept of peace as we understand it; peace was just the short interval before the enemy attacked again. He had a vision of shifting the economy away from mercenaries and militarism toward a positive-sum peace, an idea that was very novel then. To me that reflects a basic truth: people can only imagine reasonable ways of organizing the world within the bounds of what they have already seen.

This applies to the metaverse as much as to fields like artificial intelligence. Many investors and others have trouble understanding why we open source these technologies. They might say, "I don't understand. It's open source now; doesn't that take away time you could spend on proprietary technology?" But I think open source is a profound idea in technology: it actually creates more winners. I don't want to stretch the analogy too far, but I do think that, very often, people struggle to imagine the models for building things, why a given approach would be valuable for people, or why it would be a reasonable state of the world. There are far more reasonable arrangements than people tend to assume.

Patel: That's really interesting. Can I share my thoughts? It may be a bit off topic, but I think this is probably because some important figures in history have already made their mark when they were young. Caesar Augustus, for example, became an important figure in Roman politics when he was 19 years old. He led battles and established alliances. I wonder if you're 19 years old and have had a similar idea: “Since Caesar Augustus did it, then I can do it too.”

Zuckerberg: That's really an interesting observation, and it resonates not only with that rich history but with the history of the United States as well. I love Picasso's line: "Every child is an artist; the problem is how to remain an artist once we grow up." When we're young, we're more likely to have crazy ideas. In your life, your company, or whatever you've built, there's something analogous to the innovator's dilemma. Early in your career, it's easier to reorient yourself and embrace new ideas without being weighed down by commitments to other things. I think that's an interesting part of running a company: how do you stay nimble and keep innovating?

7. Open source model worth $10 billion

Patel: Let's get back to investors and open source. Imagine we have a model worth up to $10 billion, and this model has gone through a rigorous safety assessment. At the same time, evaluators can also fine-tune the model. So would you open source a $10 billion model?

Zuckerberg: As long as it works for us, then open source is an option worth considering.

Patel: But would you actually do that? After all, this is a model that cost $10 billion in R&D, and now it's time to open source it.

Zuckerberg: This is a question we need to weigh carefully over time. We have a long tradition of open source software. Generally speaking, we don't open source our products directly; for instance, we don't open source Instagram's code. But we do open source a lot of the underlying infrastructure. One of the biggest open source projects in our history is the Open Compute Project, where we opened up all of our server, network switch, and data center designs, and it turned out to be enormously valuable for us. Although many people can design servers, the whole industry now basically uses our design as the standard. That means the entire supply chain is built around our design, which raises production efficiency, lowers costs, and has saved us billions of dollars. It's really great.

Open source can help us in many ways. One way is that if people can find ways to run the model more cost-effectively, that would be a huge benefit for us. After all, our investment in this will amount to billions, or even tens of billions of dollars. So if we can improve efficiency by 10%, then we'll be able to save billions or tens of billions of dollars. Also, if there are other competitive models in the market, our open source behavior doesn't give a model a crazy advantage. Instead, it will promote the progress and development of the entire industry.

Patel: How do you think model training will be commoditized?

Zuckerberg: I think training could develop in many directions, and commoditization is certainly one of them. Commoditization means that as options multiply in the market, the cost of training drops dramatically and becomes much more accessible. Another possibility is differentiation on quality. You mentioned fine-tuning; right now, the fine-tuning options for many of the larger models are still quite limited. Some options exist, but they generally don't work for the biggest models. If we can overcome that and enable much broader fine-tuning, we'll be able to offer more diverse functionality across different applications and specific use cases, or integrate these models into specific toolchains. That not only speeds up development, it may also lead to differentiation on quality.

Here, I'd like to use an analogy to illustrate. A common problem in the mobile ecosystem is the existence of two gatekeeper companies — Apple and Google — that place restrictions on what developers build. On an economic level, it's like when we build something, they charge a hefty fee. But what concerns me even more is the quality aspect. Very often, we want to release certain features but Apple refuses, which is really frustrating. So what we need to think about is, are we setting up a world for artificial intelligence dominated by a few companies that run closed models that control APIs to determine what developers can build? For our part, I can definitely say that we built our models to make sure we don't get into this situation. We don't want other companies to limit our ability to innovate. From an open source perspective, I don't think many developers want to be limited by these companies either.

So the key question is what the ecosystem built around these models will look like. What exciting new things will emerge? How much can they improve our products? I believe that if these models end up evolving the way our databases, caching systems, or architectures have, the community will contribute real value and make our products better. Of course, we'll still try to stay differentiated and not be hurt by it; we can keep focusing on our core work and benefit along the way. Meanwhile, as the open source community develops, all of these systems, ours and the community's, will keep improving.

However, there is also the possibility that the model itself eventually becomes the product. In that case, the economics of open sourcing get more complicated, because open sourcing is then largely tantamount to commoditizing your own product. But from what I've seen so far, it doesn't look like we've reached that stage yet.

Patel: Are you looking forward to earning significant revenue by licensing your model to cloud providers? In other words, you want them to pay to provide model services on their platform.

Zuckerberg: Yes, we really do look forward to such licensing agreements with cloud providers and expect meaningful revenue from them. That's essentially the license we set up for Llama. Along many dimensions it's a very permissive open source license, granting broad usage rights to the community and developers. But we've placed limits on the very largest companies that use it. Those restrictions aren't meant to stop them from using the model; they're meant to bring them to the table when they intend to take the model we built, resell it, and profit commercially from it. If a cloud provider like Microsoft Azure or Amazon AWS wants to resell our model as part of their service, then we expect a share of the revenue.

Patel: Your point about the balance of power is very reasonable; we really do need to think about eliminating potential harms through better technical alignment or other methods. I would like Meta to establish a clear framework, as other labs have done, specifying the circumstances under which open sourcing, or even deployment, would be off the table. Such a framework not only helps the company prepare for potential risks, it also lets people know what to expect.

Zuckerberg: You're right, the question of existential risk really deserves our close attention. Currently, however, we are more concerned about content risk, where models may be used to create violence, fraud, or other harmful behavior. Although discussing existential risk may be more appealing, it is actually this more common harm that we currently need to invest more effort into mitigating. For current models, and maybe even the next generation, we need to make sure they aren't being used for malicious acts such as fraud. As a big company, Meta has a responsibility to make sure we're doing a good enough job at this. Of course, we are also capable of dealing with both aspects at the same time.

Patel: As far as open source is concerned, what I'm curious about is, do you think the impact of open source projects such as PyTorch, React, and Open Compute on the world is likely to surpass Meta's influence on social media? I've talked to users of these services, and they think this possibility exists; after all, most of the operation of the internet depends on these open source projects.

Zuckerberg: Our consumer products do have a huge user base around the world, covering almost half of the world's population. However, I think open source is becoming a completely new, powerful way to build. It's a bit like Bell Labs: they initially developed transistors to enable long-distance calls, and that goal was achieved and brought them significant profits. But looking back 5 to 10 years later, when people name their proudest inventions, they may mention other, more far-reaching technologies. I am convinced that many of the projects we have built, such as Reality Labs, some AI projects, and some open source projects, will have a lasting and profound impact on human progress. Although specific products evolve, appear, and disappear over time, their contributions to human society endure. That is an exciting part of what we get to participate in as technology practitioners.

Patel: Regarding your Llama model, when will it be trained on your own custom chip?

Zuckerberg: Soon. We're working to push this process forward, but Llama 4 may not be the first model to train on a custom chip. Our strategy is to start by moving inference workloads for ranking and recommendations, such as Reels and news feed ads, onto our own silicon. These tasks previously consumed a large amount of GPU resources. Once we can transfer them to our own chips, we'll be able to use the more expensive Nvidia GPUs to train more complex models. We expect to be able to train relatively simple models on our own chips in the near future, and eventually scale up to training these huge models. Currently, this project is progressing smoothly. We have a clear, long-term plan and are proceeding in an orderly manner.

8. Let's say you become the CEO of Google+

Patel: One last question: if you were named CEO of Google+, would you be able to lead it to success?

Zuckerberg: Google+? Oh, I don't know.

Patel: Okay, so the real last question would be: did you guys feel the pressure when Google launched Gemini?

Zuckerberg: The problem is that Google+ didn't have a CEO; it was just a division within Google. In my opinion, for most companies, especially those that have reached a certain size, focus is critical. Startups may be underfunded; they are testing an idea; they may not have all the resources they need. But as the business grows, a company crosses a certain threshold and begins to build more elements and create more value between those elements. Unexpected and surprising things always happen in a business, and those are valuable too. But overall, I think a company's capabilities are largely limited by the scope of what the CEO and management team can oversee and manage. Therefore, it is extremely important for us to keep our priorities straight and focus as much as possible on the key things. As venture capitalist Ben Horowitz (Ben Horowitz) said: "Keep the main thing the main thing."

Editor/jayden


