
Musk drops Grok-1.5! Context length up to 128k, HumanEval score tops GPT-4

QbitAI · Mar 29 20:08

Source: QbitAI (量子位)

Just now, Musk's xAI announced a major upgrade to its Grok large model.

No wonder Grok-1 was suddenly open-sourced earlier: a more powerful Grok-1.5, focused on reasoning ability, was already waiting in the wings.

xAI's official announcement tweet said almost nothing and just posted the link; the theme was very much "few words, big news".

What are the breakthroughs in the new version of Grok?

First, the context length soared from 8,192 tokens to 128k, on par with GPT-4.

Second, reasoning performance improved substantially. Math ability jumped by as much as 50%, and the score on the HumanEval dataset exceeded GPT-4's.

As soon as the news broke, the comment section lit up.

Let's get straight to the exact scores.

Grok-1.5 is here

First, the context window.

This time it is a direct 16x increase over the previous version, up to the 128k level.

This means that Grok can handle longer and more complex prompts while maintaining its ability to follow instructions.

In the "Needle in a Haystack" (NIAH) test, Grok-1.5 perfectly retrieved text embedded anywhere within a 128K-token context.

In the published results chart, every cell is aqua blue, i.e., 100% retrieval at every depth.
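
For readers unfamiliar with NIAH: the test buries a "needle" sentence at varying depths inside a long filler context and checks whether the model can retrieve it on demand. Below is a minimal sketch of such a harness, assuming a generic model(prompt) -> str callable; xAI has not published its evaluation code, so the names and the characters-per-token heuristic here are illustrative.

```python
# Minimal needle-in-a-haystack (NIAH) harness sketch.
# `model` is a hypothetical callable(prompt) -> str; xAI's actual
# evaluation code is not public, so everything here is illustrative.

NEEDLE = "The magic number hidden in this document is 42-17-93."
QUESTION = "What is the magic number hidden in the document?"
FILLER = "The sky is blue. The grass is green. Birds sing in the morning. "

def build_prompt(n_tokens: int, depth: float) -> str:
    """Bury NEEDLE at a relative depth (0.0 = start, 1.0 = end) of a
    filler context roughly n_tokens long (~4 chars/token heuristic)."""
    n_chars = n_tokens * 4
    haystack = (FILLER * (n_chars // len(FILLER) + 1))[:n_chars]
    cut = int(len(haystack) * depth)
    context = haystack[:cut] + " " + NEEDLE + " " + haystack[cut:]
    return context + "\n\nBased only on the text above: " + QUESTION

def run_grid(model, lengths=(8_192, 32_768, 128_000),
             depths=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Return a {(length, depth): passed} grid -- the data behind
    the solid-blue heatmap in the announcement."""
    return {(n, d): "42-17-93" in model(build_prompt(n, d))
            for n in lengths for d in depths}
```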

Second, reasoning ability.

Grok-1.5's ability to handle programming and math-related tasks has been greatly improved, surpassing Grok-1, Mistral Large, and Claude 2.

In math, Grok-1.5 scored 50.6% on the MATH benchmark, surpassing the mid-sized Claude 3 Sonnet, and hit 90% on GSM8K.

In programming, Grok-1.5 scored 74.1% on the HumanEval benchmark, surpassing the mid-sized Claude 3 Sonnet, Gemini 1.5 Pro, and GPT-4, second only to the top-of-the-line Claude 3 Opus.

By the looks of it, Grok's strength this time is not to be underestimated.

Another way the Grok series stands apart from other large models is that it does not use the usual Python + PyTorch stack.

According to the official post, Grok-1.5 runs on a custom distributed training framework built with Rust, JAX, and Kubernetes.
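
xAI has not released this training stack, but for context, here is a minimal sketch of the data-parallel pattern JAX provides out of the box. This is not xAI's code; the Rust and Kubernetes orchestration layer around it is proprietary and not shown.

```python
# Minimal sketch of JAX data-parallel training -- the style of loop the
# post alludes to. NOT xAI's code; their orchestration layer is private.
import functools
import jax
import jax.numpy as jnp

def loss_fn(params, x, y):
    pred = x @ params["w"] + params["b"]
    return jnp.mean((pred - y) ** 2)

@functools.partial(jax.pmap, axis_name="devices")
def train_step(params, x, y):
    grads = jax.grad(loss_fn)(params, x, y)
    # All-reduce: average gradients across every device in the job.
    grads = jax.lax.pmean(grads, axis_name="devices")
    return jax.tree_util.tree_map(lambda p, g: p - 1e-3 * g, params, grads)

# params and each batch need a leading device axis before calling
# train_step, e.g. via jax.device_put_replicated(params, jax.local_devices()).
```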

To improve training reliability and maintain uptime, the team built a custom training orchestrator that automatically detects problematic nodes and ejects them from the job.

In addition, they have optimized processes such as checkpointing, data loading, and training restart to minimize downtime.
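
Again as a hedge: xAI has not open-sourced this orchestrator, but the fault-tolerance loop the post describes (health-check the nodes, eject bad ones, resume from the last checkpoint) looks roughly like the sketch below. All function names here are illustrative placeholders, not xAI's API.

```python
# Conceptual sketch of the fault-tolerant training loop described above:
# detect problematic nodes, eject them, resume from the last checkpoint.
# health_check / eject / load_checkpoint / save_checkpoint / run_steps
# are hypothetical placeholders injected by the caller.

def coordinate(nodes, total_steps, health_check, eject,
               load_checkpoint, save_checkpoint, run_steps):
    state, step = load_checkpoint()            # resume from last good state
    while step < total_steps:
        bad = [n for n in nodes if not health_check(n)]
        if bad:
            for n in bad:
                eject(n)                       # remove misbehaving node
            nodes = [n for n in nodes if n not in bad]
            state, step = load_checkpoint()    # roll back to last checkpoint
            continue
        state, step = run_steps(state, step, nodes, n_steps=100)
        save_checkpoint(state, step)           # frequent saves bound lost work
```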

And that's how today's Grok-1.5 arrived so quickly~

Further information has yet to be disclosed officially.

What is certain is that the new version will roll out to early testers in the coming days, and, per the "old routine", it should soon go live on the X platform.

Netizens said: Grok is progressing really fast.

Setting everything else aside, the new Grok's metrics have completely surpassed Claude 2. Yet xAI was founded only a year ago and is just 9 months behind Anthropic. Therefore:

I'd bet that within 12 months, xAI has every chance of becoming a leader.

Others rated it even higher, seeing this as Musk shipping another "GPT-4-class model", and shouted:

Go even faster than OpenAI.

Are you looking forward to the new version of Grok?

Editor: lambor
