
"AI legend" delivers a landmark speech: the era of piling up compute for "pre-training" will end, the next generation belongs to "reasoning"; if machines have consciousness, let them awaken!

wallstreetcn ·  10:51

Ilya Sutskever, co-founder of OpenAI and founder of SSI, said in his speech that the pre-training phase of AI model development is coming to an end. Future AI development will focus on agents, synthetic data, and inference-time compute. He elaborated on the great potential of these three directions: synthetic data, for example, can overcome the limits on the amount of real data, while inference-time compute can improve AI's efficiency and controllability.

On December 14, Beijing time, during the NeurIPS 2024 conference, Ilya Sutskever, co-founder of OpenAI and founder of SSI, stated in his speech that the pre-training phase of AI model development is coming to an end.

He compared data to the fuel of AI development, pointing out that because there is only one Internet, data growth has peaked and AI is about to enter a 'post-oil era'. This means that pre-trained models relying on vast amounts of data will become hard to sustain, and AI development urgently needs new breakthroughs.

Ilya believes that the future development of AI will focus on agents, synthetic data, and inference-time compute. He explained in detail the great potential of these three directions: for instance, synthetic data can break the limits on real data volume, while inference-time compute can enhance AI's efficiency and controllability.

Sutskever also believes that future AI systems will possess reasoning capabilities, no longer relying solely on pattern matching, and that self-awareness will emerge in AI systems.

Furthermore, Ilya delved into the future of superintelligence. He believes that superintelligence will possess agency, reasoning ability, and self-awareness, with its behavior becoming difficult to predict, urging the industry to prepare for the arrival of superintelligence.

The key points are as follows:

  • The pre-training era is coming to an end: data is limited, like fossil fuel for AI, and we have reached its peak; the directions after pre-training include agents, synthetic data, and inference-time compute.

  • Superintelligence will be the era of reasoning: superintelligence will possess true agency, strong reasoning abilities, and the capability to learn and understand from limited data.

  • Superintelligence will bring an unpredictable future: it is fundamentally different from the intuition-driven deep learning we are used to, and it brings new opportunities and challenges.

Sutskever: The era of pre-training is coming to an end, AI models are shifting towards 'agency'.

Sutskever pointed out that the pre-training phase, as the first stage of AI model development, is nearing its conclusion. This phase relied on learning patterns from large amounts of unlabelled data, which usually came from the Internet, books, and other sources.

Sutskever mentioned that existing data resources have reached their peak; future models must find new ways to develop with limited data:

"Our data has already peaked, there will be no more. We must work with the data we have. The Internet Plus-Related only has one version."

In November, he said in a media interview that the gains from pre-training large models are flattening out:

"The 2010s were the era of expansion, now we are back to the era of exploration and discovery. Everyone is looking for the next breakthrough. Getting the right expansion is more important than ever."

Sutskever also predicted that the next generation of AI models will possess true "agency," capable of autonomously executing tasks, making decisions, and interacting with software.

He also stated that SSI is researching an alternative approach to scaling pre-training, but did not disclose further details.

AI self-awareness may emerge.

Sutskever also predicted that future AI systems will possess reasoning abilities, no longer relying solely on pattern matching, and that self-awareness will emerge in AI systems.

According to Sutskever, the more the system reasons, "the more unpredictable it becomes." He compared it to the performance of advanced AI in chess:

"They will understand things from limited data. They will not feel confused."

Sutskever also compared the scale of AI systems to evolutionary biology. He cited research showing the relationship between different species' brains and body weight, pointing out that human ancestors displayed a different slope in this ratio compared to other mammals.

He suggested that AI might discover similar new paths of scaling, going beyond how pre-training works today.

Sutskever: The direction of AI development requires top-down regulation.

When asked how to create appropriate incentive mechanisms for humanity to ensure the direction of AI development, Sutskever stated that it requires a 'top-down government structure', without providing a clear answer.

'I think, in a sense, these are issues that people should think about more. But I have no confidence in answering such questions.'

He stated that if AI ultimately chooses to coexist with humanity and has rights, it might be feasible, although he remains cautious about the unpredictability of the future.

The following is the full text of the speech:

Ilya Sutskever:

I first want to thank the organizers for choosing to recognize our paper; it's truly wonderful. At the same time, I want to thank my outstanding collaborators Oriol Vinyals and Quoc Le, who were standing before you just a moment ago.

What you see now is a screenshot from 10 years ago, a similar speech I gave at the NeurIPS conference in Montreal in 2014. At that time, we were still quite naive. The photo shows us at that time ('before').

This is us now ("after"). Hopefully, we look more mature and experienced.

Today, I want to talk about the work itself and give a ten-year retrospective. Many things in that work were right, but some were less so. Looking back lets us see what happened and how it gradually evolved into what we have today.

First, let me review what we did back then, using the slides from that talk ten years ago. In summary, we did the following three things:

• Built a text-based autoregressive model

• Used a large neural network

• Utilized large datasets

It's that simple. Now, let’s delve into the details.

The assumptions of deep learning.

This is a slide from 10 years ago, not bad, right? It says, 'Assumptions of Deep Learning.' Back then, we believed that if there was a large neural network with many layers, it could accomplish anything a human could do in less than a second. Why did we emphasize what humans can accomplish in one second?

This is because if you believe the 'dogma' of deep learning, namely that artificial neurons and biological neurons are similar, or at least not very different, and you believe that real neurons are slow, then anything a human can do quickly, as long as even one person in the world can do it within one second, a 10-layer neural network can also do. The logic is this: you just need to extract their connections and embed them into your artificial neural network.

That is the motivation. Anything a human can accomplish in one second can also be done by a large 10-layer neural network. We focused on 10-layer neural networks at that time because we only knew how to train networks with 10 layers. If more layers were possible, maybe more could be achieved. But at that time, we could only manage 10 layers, so we emphasized what humans can accomplish in one second.

Core idea: autoregressive model.

This is another slide from that presentation, which says, 'Our Core Idea.' You might recognize at least one thing: autoregression is happening here. What does this slide actually say? It says that if you have an autoregressive model that predicts the next token well enough, it will actually capture the correct distribution over sequences.

This was a relatively new idea at the time. This was not the first autoregressive neural network.

But I believe this is the first autoregressive neural network that we truly believed could yield any desired results if trained well enough. Our goal at that time was (which now seems ordinary but was very bold back then) machine translation.
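To make this concrete, below is a minimal illustrative sketch, not from the talk and far simpler than the 2014 sequence-to-sequence model, of what autoregressive next-token modeling means: the model learns a conditional distribution over the next token given the previous tokens, and generation samples one token at a time. The character-level bigram "model" here is a stand-in for the large neural network.

```python
# Toy illustration of autoregressive next-token modeling (hypothetical example,
# not the model from the paper): estimate P(next token | previous token) from
# counts, then generate text one token at a time.
import random
from collections import defaultdict

def train_bigram(text):
    """Count next-character frequencies, an estimate of P(next | current)."""
    counts = defaultdict(lambda: defaultdict(int))
    for cur, nxt in zip(text, text[1:]):
        counts[cur][nxt] += 1
    return counts

def sample_next(counts, cur):
    """Sample the next character from the learned conditional distribution."""
    options = counts.get(cur)
    if not options:
        return None
    chars, weights = zip(*options.items())
    return random.choices(chars, weights=weights)[0]

def generate(counts, start, length=40):
    """The autoregressive loop: feed each sampled token back in as context."""
    out = [start]
    for _ in range(length):
        nxt = sample_next(counts, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return "".join(out)

corpus = "the cat sat on the mat and the dog sat on the log "
model = train_bigram(corpus)
print(generate(model, "t"))
```

A real system replaces the bigram counts with a large neural network conditioned on the entire prefix, but the training objective (predict the next token) and the sampling loop are the same.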

LSTM: technology before the Transformer.

Next, some ancient history that many of you may never have seen: the LSTM. For those unfamiliar, an LSTM is what unfortunate deep learning researchers used before Transformers appeared. It is essentially a ResNet rotated 90 degrees. You can see that it integrates residual connections (what we now call the residual stream), but it also includes some multiplications. It is slightly more complex than a ResNet. This is what we used at the time.
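For readers who have never seen one, here is a rough single-step LSTM sketch in NumPy (shapes and variable names are illustrative, not the original implementation). The additive cell-state update is the residual-like path, and the sigmoid gates supply the extra multiplications mentioned above.

```python
# Illustrative single LSTM time step in NumPy (a sketch, not the 2014 code).
import numpy as np

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM step; W, U, b pack the four gates along the first axis."""
    z = W @ x + U @ h_prev + b                 # pre-activations, shape (4*hidden,)
    i, f, o, g = np.split(z, 4)
    i, f, o = (1.0 / (1.0 + np.exp(-v)) for v in (i, f, o))  # sigmoid gates
    g = np.tanh(g)                              # candidate update
    c = f * c_prev + i * g                      # additive (residual-like) cell update
    h = o * np.tanh(c)                          # gated hidden state
    return h, c

hidden, inp = 8, 4
rng = np.random.default_rng(0)
W = rng.standard_normal((4 * hidden, inp))
U = rng.standard_normal((4 * hidden, hidden))
b = np.zeros(4 * hidden)
h = c = np.zeros(hidden)
for t in range(5):                              # unroll over a short sequence
    h, c = lstm_step(rng.standard_normal(inp), h, c, W, U, b)
print(h.shape, c.shape)
```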

Parallel Computing: Pipeline Parallelism.

Another feature I want to emphasize is parallel computing. We used pipeline parallelism, with each GPU processing one layer. Is using pipeline parallelism wise? It doesn't seem wise now. But we weren't so clever back then. By using 8 GPUs, we achieved a 3.5 times speedup.
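To see why pipelining one layer per GPU gives far less than a linear speedup, here is a purely conceptual sketch (not their training code): the pipeline must fill and drain, so with L stages and M microbatches the idealized speedup is L*M / (L + M - 1), and real synchronization and communication overheads push it lower, toward numbers like the 3.5x they reported.

```python
# Conceptual model of layer-per-GPU pipeline parallelism (illustrative only).
# Sequential: one device does L layer-steps for each of M microbatches.
# Pipelined: L devices overlap work, but the pipeline must fill and drain.
def pipeline_speedup(num_stages: int, num_microbatches: int) -> float:
    sequential_steps = num_stages * num_microbatches
    pipelined_steps = num_stages + num_microbatches - 1  # fill + steady state + drain
    return sequential_steps / pipelined_steps

for m in (4, 8, 16):
    print(f"8 stages, {m:2d} microbatches -> {pipeline_speedup(8, m):.2f}x ideal speedup")
```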

Conclusion: The Scaling Hypothesis.

Arguably the most important slide, as it outlines the beginning of the scaling hypothesis: If you have a very large dataset and train a very large neural network, then success is inevitable. If you want to look on the bright side, you could say this indeed was what happened afterward.

Core Idea: Connectionism.

I also want to bring up an idea that I believe has stood the test of time. This is connectionism. The core idea is:

If you believe that artificial neurons are somewhat like biological neurons, then you can confidently believe that very large neural networks (even if they are not quite as large as the human brain) can be configured to accomplish most of the tasks that we humans do. Of course, there are still differences, since the human brain can reconfigure itself, while our best current learning algorithms require a lot of data. Humans still have the upper hand in this regard.

Pre-training Era

I believe all of this led to the arrival of the pre-training era: the GPT-2 model, the GPT-3 model, and the scaling laws. I would especially like to thank my former collaborators Alec Radford, Jared Kaplan, and Dario Amodei, whose work was crucial. Pre-training is the driving force behind all the progress we see today: ultra-large neural networks trained on massive datasets.

The end of pre-training.

But pre-training will ultimately come to an end. Why? Because while computing power keeps growing, data does not grow infinitely; we only have one Internet. You could even say that data is the fossil fuel of AI. It was created, we use it, and we have reached peak data; there will be no more. We can only work with the data we have. Although we can still go quite far with what we have, there is only one Internet.

What is the next step?

Next, I will speculate a bit about what might happen in the future. Of course, many people are speculating. You may have heard the term 'agents.' People think agents are the future. More specifically, though a bit vaguely, there is synthetic data; how to generate useful synthetic data remains a huge challenge. There is also inference-time compute, which we have recently seen in the o1 model. These are all directions people are exploring after pre-training.

Biological Insights: Scaling of Brains Across Different Species

I would also like to mention a biological example that I find very interesting. Years ago, I saw a presentation at this conference where the speaker showed a chart demonstrating the relationship between body size and brain size among mammals. The speaker stated that in biology, everything is quite chaotic, but there is one exception: a close relationship exists between an animal's body size and brain size.

I was curious about that chart and went looking for it on Google. One of the image results looked like this: you can see various mammals, including non-human primates. But then there are the hominids, such as Neanderthals, which are closely related to human evolution. Interestingly, the brain-to-body scaling relationship of hominids has a different slope.

This means there is an example in biology that showcases a different scaling method. This is cool. Additionally, I want to emphasize that the x-axis is on a logarithmic scale. So, things could be different. What we are currently doing is the first thing we know how to scale. There is no doubt that everyone in this field will find the next direction.
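The "different slope" here is just the exponent of a power law fitted on log-log axes: brain mass is roughly a * (body mass)^b, so taking logarithms turns the exponent b into the slope of a straight line. Below is a small sketch with made-up numbers (purely illustrative, not the data from the talk) showing how such an exponent would be estimated.

```python
# Fit a power-law scaling exponent on log-log axes (hypothetical numbers,
# not the mammal/hominid data shown in the talk).
import numpy as np

def scaling_exponent(body_mass, brain_mass):
    """Fit log(brain) = b * log(body) + log(a) and return the slope b."""
    slope, _intercept = np.polyfit(np.log(body_mass), np.log(brain_mass), 1)
    return slope

body = np.array([1.0, 10.0, 100.0, 1000.0])
group_a = 0.01 * body ** 0.75   # one group: exponent ~0.75
group_b = 0.01 * body ** 1.10   # another group: a visibly steeper slope
print(scaling_exponent(body, group_a))   # ~0.75
print(scaling_exponent(body, group_b))   # ~1.10
```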

Speculation about the future.

Now I want to take a few minutes to speculate about the longer-term future. Where are we all headed? We are making progress, which is really amazing. If you entered this field ten years ago, you would remember how immature the technology was back then. Even if you take deep learning for granted, it is still incredible to see the progress it has made. I cannot convey that feeling to those who have only joined the field in the last two years. But I want to talk about superintelligence, as that is clearly the future of this field.

Superintelligence will be fundamentally different in nature from the intelligence we have today. I hope to provide you with some concrete intuitions over the next few minutes to help you feel this difference.

Currently, we have powerful language models, which are fantastic chatbots that can even do some things, but they are often unreliable, sometimes confused, and exhibit superhuman performance in certain tasks. It remains unclear how to resolve this contradiction.

But ultimately, the following will happen:

These systems will truly possess the nature of agents. And right now, they are not agents on any meaningful level, or they have only a very weak agent-like quality. They will perform genuine reasoning.

I also want to emphasize one point about reasoning:

The more a system reasons, the more unpredictable it becomes. The models we are using now are predictable because we have been trying to replicate human intuition; what our brains do in one second is essentially intuition. So we have trained the models to capture some of that intuition. But reasoning is unpredictable. One illustration: the best chess AIs are unpredictable to human grandmasters.

So, the AI systems we will deal with in the future will be highly unpredictable. They will understand things from limited data, and they will not get confused, which is a big limitation of today's systems. I am not saying how or when this will happen; I am simply saying that it will happen. When all these capabilities are combined with self-awareness (why not? self-awareness is useful), we will have systems completely different from today's. They will possess incredible capabilities. But the issues that come with such systems will be vastly different from the problems we have been used to in the past.

It is impossible to predict the future; everything is possible. But in the end, I still want to conclude my speech on an optimistic note.

The following is a transcript of the Q&A session:

  • Question 1: In 2024, are there other biological structures involved in human cognition that you think are worth exploring in a similar way, or that you are interested in?

  • Answer: If someone has unique insights into how the brain works and believes that our current approach is foolish, they should explore it. Personally, I do not have such thoughts. Perhaps at a higher level of abstraction, one can say that biologically inspired AI has been very successful, because all neural networks are biologically inspired, even though the inspiration is very limited: we basically only took the neuron. More detailed biological inspiration is hard to find. But if someone has special insight, perhaps useful directions can be found.

• Question 2: You mentioned that reasoning is a core aspect of future models. We see the presence of hallucinations in current models. We use statistics to determine whether the model generates hallucinations. In the future, will models with reasoning abilities be able to self-correct and reduce hallucinations?

• Answer: I think the situation you described is very likely to happen. In fact, some early reasoning models may have already begun to possess this capability. In the long run, why not? It is like the auto-correct feature in Microsoft Word. Of course, this feature is much more powerful than auto-correct. But overall, the answer is affirmative.

  • Question 3: If these newly created intelligences need rights, how should we establish the correct incentive mechanisms for humanity to ensure they can gain freedoms like humans do?

  • Answer: This is a question worth contemplating. However, I do not feel capable of answering it, because it involves establishing some kind of top-down structure, or something like a government. I am not an expert in this area. Perhaps it could be something like cryptocurrency. If AI just wants to coexist with us and also wants rights, maybe that would be fine. But I think the future is too unpredictable, and I hesitate to comment. However, I encourage everyone to think about this issue.

  • Question 4: Do you think large language models (LLMs) are capable of out-of-distribution generalization in multi-hop reasoning?

  • Answer: This question assumes the answer is yes or no. But it should not be answered with "yes" or "no," because what does "out-of-distribution generalization" mean? What does "in-distribution" mean? Before deep learning, people used techniques like string matching and n-grams for machine translation. At that time, "generalization" meant handling phrases that were not in the dataset at all. Now our standards have been raised considerably. We might say a model scored high in a math competition, but maybe it just memorized the same ideas discussed on Internet forums, so maybe it is in-distribution, or maybe it is just memorization. I think human generalization ability is much better, but current models can also generalize to some extent. That is a more reasonable answer.



