

Generative AI may usher in the next wave: the TTT model

wallstreetcn ·  07:52

The “brain” of the Transformers architecture that underpins models such as Sora is a lookup table, the so-called hidden state. TTT replaces that hidden state with a machine learning model, like AI's nested dolls, a model within a model, and unlike a Transformer's lookup table, this inner model doesn't grow as more data is processed.

The focus of the next generation of generative artificial intelligence (AI) may be test-time training models, known as TTT for short.

The Transformers architecture is the foundation of OpenAI's video model Sora and the core of text-generation models such as Anthropic's Claude, Google's Gemini, and OpenAI's flagship model GPT-4o. But the evolution of these models is beginning to run into technical hurdles, particularly around computation. Transformers aren't especially efficient at processing and analyzing large amounts of data, at least when they run on off-the-shelf hardware. Companies are building and expanding infrastructure to meet Transformers' needs, which has driven a sharp increase in electricity demand, demand that may not even be possible to meet continuously.

This month, researchers from Stanford University, UC San Diego, UC Berkeley, and Meta jointly announced the TTT architecture, which took them a year and a half to develop. Not only can the TTT model process much more data than Transformers, it also doesn't consume nearly as much computing power, the research team said.

Why do outsiders think the TTT model is more promising than Transformers? First, you need to understand that one of the basic components of a Transformer is the “hidden state,” which is essentially a long list of data. As the Transformer processes something, it adds entries to its hidden state so it “remembers” what it has just processed. For example, if the model is working through a book, the hidden state values would be representations of words (or parts of words).
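
To make the idea concrete, here is a toy Python sketch of that behavior. It is not the researchers' code; the names and the use of placeholder strings instead of real vector representations are purely illustrative.

    # Toy sketch: a Transformer-style "hidden state" is essentially a list
    # that gains one entry for every token the model processes.

    hidden_state = []  # the "lookup table" of token representations

    def process_token(token_representation):
        # Append the new token's representation; the table only ever grows.
        hidden_state.append(token_representation)

    # Processing a 1,000-word book leaves 1,000 entries behind, all of which
    # must be consulted later when generating the next word.
    for word_id in range(1000):
        process_token(f"representation_of_word_{word_id}")

    print(len(hidden_state))  # 1000 -> memory grows with the amount of input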

Yu Sun, a postdoctoral fellow at Stanford University who took part in the TTT study, recently explained to the media that if a Transformer is viewed as an intelligent entity, then the lookup table, its hidden state, is the Transformer's brain. This brain implements some of the features Transformers are well known for, such as in-context learning.

The hidden state helps make Transformers powerful, but it also holds them back. For example, if a Transformer has just read a book, then to “say” even one word about that book, the model has to scan its entire lookup table, a computational requirement comparable to rereading the whole book.
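
The cost problem can be shown with a small illustration (again my own sketch, not the researchers' code): the work needed to produce a single word grows with everything stored so far.

    # Producing each new word requires an attention-style pass over every
    # entry in the stored table, so per-word work grows with the context.

    def entries_scanned_per_word(hidden_state):
        scores = [1.0 for _ in hidden_state]  # placeholder relevance scores
        return len(scores)

    for book_length in (1_000, 10_000, 100_000):
        table = list(range(book_length))  # stand-in for stored representations
        scanned = entries_scanned_per_word(table)
        print(f"{book_length:>7} stored entries -> {scanned:>7} scanned per word")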

As a result, Sun and the other TTT researchers thought of replacing the hidden state with a machine learning model, like AI's nested dolls, a model within a model. Unlike a Transformer's lookup table, the TTT model's internal machine learning model doesn't grow as more data is processed. Instead, it encodes the processed data into representative variables called weights, which is why the TTT model performs so efficiently. No matter how much data the TTT model processes, the size of its internal model doesn't change.
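
A minimal sketch of that idea, under my own assumptions, looks like this: the hidden state is itself a small model whose weights are nudged by a training step on each incoming token, so its size stays fixed. The dimensions, the reconstruction loss, and the learning rate below are illustrative choices, not the paper's exact formulation.

    import numpy as np

    rng = np.random.default_rng(0)
    D = 16                # embedding dimension (an illustrative choice)
    W = np.zeros((D, D))  # the "inner model": a fixed-size weight matrix

    def ttt_update(W, x, lr=0.1):
        # One test-time training step: nudge the inner model so it can
        # reconstruct x from a corrupted copy (a stand-in self-supervised
        # objective, 1/2 * ||W @ corrupt(x) - x||^2, chosen for simplicity).
        x_corrupt = 0.5 * x
        pred = W @ x_corrupt
        grad = np.outer(pred - x, x_corrupt)  # gradient of the loss w.r.t. W
        return W - lr * grad

    # Stream 10,000 "tokens" through the layer: W is refined but never grows.
    for _ in range(10_000):
        token = rng.standard_normal(D)
        W = ttt_update(W, token)

    print(W.shape)  # (16, 16) -- constant-size state, however long the stream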

Sun believes future TTT models could efficiently process billions of pieces of data, from words to images and from audio recordings to videos, far beyond the capabilities of existing models. A TTT-based system can say X words about a book without the complicated computation of rereading the book X times. “Large video models based on Transformers, such as Sora, can only process 10 seconds of video because they only have one lookup-table 'brain'. Our ultimate goal is to develop a system that can process long videos resembling the visual experience of a human life.”
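
A back-of-the-envelope comparison makes the claim concrete. The numbers below are made up purely for illustration and are not measurements from the study.

    # Per-word cost after reading a long book: the Transformer-style model
    # rescans its full table for every word, while a fixed-size inner model
    # only touches its own weights.

    book_length = 100_000        # tokens already processed
    words_to_generate = 1_000
    inner_state_size = 1_024     # assumed fixed-size state for the TTT-style model

    transformer_ops = words_to_generate * book_length    # rescan the table each word
    ttt_ops = words_to_generate * inner_state_size       # touch only the fixed state

    print(f"Transformer-style: {transformer_ops:,} entry visits")
    print(f"TTT-style:         {ttt_ops:,} entry visits")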

Will the TTT model eventually replace Transformers? The media believes this is possible, but it is too early to draw conclusions. The TTT model is not yet a drop-in replacement for Transformers, and the researchers developed only two small models for the study, so it is hard for now to compare TTT with the results achieved by some large Transformer models.

Mike Cook, a senior lecturer in the Department of Informatics at King's College London who did not take part in the TTT study, commented that TTT is a very interesting innovation. If the data supports the claim that it improves efficiency, that is good news, but he couldn't say whether TTT is better than existing architectures. Cook said that when he was an undergraduate, an old professor liked to tell a joke: how do you solve any problem in computer science? Add another layer of abstraction. Putting a neural network inside a neural network reminded him of that punchline.


