
A new era of AI begins: OpenAI's reasoning model that can "think through how to solve a problem" arrives

cls.cn ·  Sep 13 07:30

① The OpenAI o1 model (the "Strawberry" large model) marks a new level of AI capability on complex reasoning tasks; ② By changing how the model behaves, the new model can improve answer quality while avoiding some structural defects; ③ OpenAI is launching two models first: o1-preview and o1-mini.

China Finance Network News, September 13 (Editor Shi Zhengcheng): At around 1 a.m. Beijing time on Friday, the AI era reached a new starting point, as a large model capable of general complex reasoning finally made its debut.

OpenAI announced on its official website that it has begun rolling out the o1-preview model, the highly anticipated "Strawberry" large model, to all subscribers. OpenAI stated that the new model represents a new level of AI capability on complex reasoning tasks, which it considers reason enough to reset the counter to 1 and give the model a brand-new name distinct from the "GPT-4" series.

What characterizes a large reasoning model is that it spends more time thinking before giving an answer, much as a person does when working through a problem. Earlier large models learned patterns from large datasets in order to predict the next words in a sequence and, strictly speaking, did not truly understand the question.

(Clearly perceivable "thinking" process, source: OpenAI)

Cognition will rise to the level of a "PhD student in science"

OpenAI previously explained that GPT-4, released in 2023, is comparable to the intelligence level of a high school student, while GPT-5 represents the growth of AI from a high school student to a PhD. The o1 model is a crucial step in that journey.

Compared to existing large models such as GPT-4o, OpenAI o1 is able to solve more difficult reasoning problems and improve the systemic defects in previous models.

For example, this new model can count how many 'r's there are in 'strawberry'.
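For reference, the expected answer to this question is easy to verify with ordinary code. The snippet below is only a check of the ground truth; it says nothing about how the model itself arrives at the answer.

```python
# Count occurrences of the letter "r" in the word from the article's example.
word = "strawberry"
r_count = word.count("r")

print(f"'{word}' contains {r_count} occurrence(s) of 'r'")
```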

The model is also more organized when answering programming questions. Before it writes any code, it thinks through the entire solution and only then starts writing.

For example, in a poem-writing task with preset constraints (such as requiring the last word of the second sentence to end with 'i'), GPT-4o gives an answer, but the answer often satisfies only some of the conditions and is not self-corrected. In other words, the model has to get the answer right on its first generation or the result will be wrong. The o1 model, by contrast, keeps trying and refining its answer, which significantly improves the accuracy and quality of the result.

Interestingly, the displayed thinking process includes phrases such as 'Is it okay for me to think like this?' and 'Oh, time is running out, I need to give an answer quickly'. OpenAI confirms that what is shown is not the raw chain of thought but 'summaries generated by the model', and the company openly acknowledges that maintaining a 'competitive advantage' is one of the reasons.

OpenAI research lead Jerry Tworek revealed that the training behind the o1 model differs fundamentally from that of previous products. Earlier GPT models were trained to mimic the patterns in their training data, while o1 is trained to solve problems on its own. During reinforcement learning, rewards and penalties are used to 'teach' the model to work through problems with a 'chain of thought', much as humans learn to break problems down and analyze them.
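The contrast between a single attempt and repeated attempts against a checker can be illustrated with a toy sketch. This is not how o1 works internally, which the article does not describe at this level. The function names and the two hard-coded drafts are hypothetical and stand in for model outputs.

```python
import re
from typing import Iterable, Optional


def second_sentence_ends_with(text: str, letter: str) -> bool:
    """Return True if the last word of the second sentence ends with `letter`."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    if len(sentences) < 2:
        return False
    words = re.findall(r"[A-Za-z']+", sentences[1])
    return bool(words) and words[-1].lower().endswith(letter.lower())


def first_valid(drafts: Iterable[str], letter: str) -> Optional[str]:
    """Try drafts in order and return the first one that passes the check."""
    for draft in drafts:
        if second_sentence_ends_with(draft, letter):
            return draft
    return None


# Hypothetical drafts standing in for successive model attempts.
drafts = [
    "The sea is calm. The night is long.",
    "The snow is deep. Tomorrow we will ski.",
]

# Single attempt: only the first draft is considered, so a miss is final.
single_shot = drafts[0] if second_sentence_ends_with(drafts[0], "i") else None

# Repeated attempts: keep going until a draft satisfies the constraint.
refined = first_valid(drafts, "i")

print("single attempt:", single_shot)
print("with retries:  ", refined)
```

The point of the sketch is only that an explicit check plus further attempts can recover from a first draft that misses a constraint.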

According to tests, the o1 model scores 83% on a qualifying exam for the International Mathematical Olympiad, while GPT-4o solves only 13% of the problems correctly. In Codeforces programming competitions, the o1 model ranks in the 89th percentile, while GPT-4o reaches only the 11th percentile.

(The chart shows that the preview version of the o1 model has slightly lower capabilities compared to the official version.)

OpenAI stated that, according to its tests, the next updated version of the model performs at a level similar to that of PhD students on challenging benchmarks in physics, chemistry, and biology.

Now for the drawbacks and limitations

It is easy to see that an AI model that can think through problems on its own is good news for programmers, creative workers, and almost all professionals in science-related fields. However, the new model also has limitations.

First, the OpenAI o1 model is not multimodal (at least for now), and it is not as good as other models at answering factual questions. For image interaction, common-sense questions, and internet search, GPT-4o therefore remains the better choice. OpenAI has said that it will add web browsing, file and image uploads, and other features to the model in the future.

Another issue is the high cost. The o1-preview model is priced at $15 per million input tokens and $60 per million output tokens, which is three times and four times the price of GPT-4o, respectively. One million tokens correspond to roughly 750,000 English words.

Alongside o1-preview, OpenAI has also released the o1-mini model. It is faster and cheaper, priced 80% lower, and suited to scenarios that require reasoning but not extensive world knowledge.
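The pricing figures in the two paragraphs above can be turned into a rough cost estimate. The sketch below uses only the numbers quoted in the article. The GPT-4o and o1-mini prices are derived from the stated multiples ("three times and four times", "80% lower") and are not taken from an official price list. The function and variable names are illustrative.

```python
# Prices in USD per one million tokens, as quoted in the article for o1-preview.
O1_PREVIEW = {"input": 15.0, "output": 60.0}

# Derived from the article's statements, not from an official price list.
GPT_4O_IMPLIED = {"input": O1_PREVIEW["input"] / 3, "output": O1_PREVIEW["output"] / 4}
O1_MINI_IMPLIED = {kind: price * 20 / 100 for kind, price in O1_PREVIEW.items()}

# Rule of thumb from the article: one million tokens is about 750,000 English words.
WORDS_PER_TOKEN = 0.75


def estimate_cost(prices: dict, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request."""
    total = input_tokens * prices["input"] + output_tokens * prices["output"]
    return total / 1_000_000


def tokens_to_words(tokens: int) -> float:
    """Approximate the number of English words for a token count."""
    return tokens * WORDS_PER_TOKEN


if __name__ == "__main__":
    # Example request: 2,000 input tokens and 1,000 output tokens.
    for name, prices in [
        ("o1-preview", O1_PREVIEW),
        ("GPT-4o (implied)", GPT_4O_IMPLIED),
        ("o1-mini (implied)", O1_MINI_IMPLIED),
    ]:
        print(f"{name}: ${estimate_cost(prices, 2_000, 1_000):.4f}")
    print(f"1,000,000 tokens is about {tokens_to_words(1_000_000):,.0f} words")
```

Actual bills depend on how a provider counts tokens, so the numbers are only an order-of-magnitude guide.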

Moreover, judging from OpenAI's caution, this reasoning model is likely to be very computationally demanding. The company announced that ChatGPT subscribers can access the two new models starting September 12, but usage is currently capped at 30 messages per week for o1-preview and 50 for o1-mini.

ChatGPT Enterprise and Edu users can access the two models starting next week. Developers at API usage tier 5 can start using them immediately, with a rate limit of 20 requests per minute. OpenAI plans to make o1-mini available to free users in the future, but there is currently no timetable.
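A developer working under the 20-per-minute limit mentioned above might throttle calls on the client side. The sketch below is one possible sliding-window limiter. The class name and its parameters are hypothetical, and it makes no assumption about how the server itself enforces the limit.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Client-side sliding-window limiter (illustrative, not an official client)."""

    def __init__(self, max_calls=20, window_seconds=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self._max_calls = max_calls
        self._window = window_seconds
        self._clock = clock      # injectable so the limiter can be tested
        self._sleep = sleep
        self._stamps = deque()   # start times of calls still inside the window

    def acquire(self) -> None:
        """Block until another call is allowed, then record it."""
        while True:
            now = self._clock()
            # Forget calls that have already left the window.
            while self._stamps and now - self._stamps[0] >= self._window:
                self._stamps.popleft()
            if len(self._stamps) < self._max_calls:
                self._stamps.append(now)
                return
            # Wait until the oldest recorded call leaves the window.
            self._sleep(self._window - (now - self._stamps[0]))


if __name__ == "__main__":
    limiter = MinuteRateLimiter(max_calls=20, window_seconds=60.0)
    for i in range(3):
        limiter.acquire()  # place the real API request after this line
        print("request", i, "allowed")
```

Calling `acquire()` before each request keeps the client at or below the configured number of calls in any 60-second window.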

Editor/Somer
