
The fastest large model in history blows up the scene! Groq goes viral overnight, its self-developed LPU crushing Nvidia GPUs on speed

wallstreetcn ·  Feb 20 20:39

Source: Wall Street News

As soon as I woke up, the AI world changed again.

Before the shock of Sora had even worn off, another Silicon Valley startup shot to the top of trending searches with the fastest large model in history and a self-developed LPU chip.

Just yesterday, AI chip startup Groq (not to be confused with Musk's Grok) opened its products to free trials. Compared with other AI chatbots, Groq's lightning-fast responses quickly set off discussion across the internet. In netizens' tests, Groq's generation speed approached 500 tokens per second, crushing GPT-4's roughly 40 tokens per second.

Some netizens were shocked and said:

It responds faster than I blink.

However, it should be emphasized that Groq has not developed a new model; it is a model runner. Its homepage serves the open-source models Mixtral 8x7B-32K and Llama 2 70B-4K.
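For readers who want to reproduce the netizens' speed tests, here is a minimal sketch that measures raw generation throughput. It assumes Groq's Python SDK (`pip install groq`) with its OpenAI-style streaming interface, and the model ID `mixtral-8x7b-32768` for the Mixtral 8x7B-32K model mentioned above; both are assumptions based on Groq's public documentation at the time and may have changed.

```python
# Minimal throughput measurement (tokens/second) against a hosted Groq model.
# Assumes the `groq` SDK and the model ID below; adjust to current docs.
import os
import time

from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

start = time.perf_counter()
stream = client.chat.completions.create(
    model="mixtral-8x7b-32768",  # assumed ID for Mixtral 8x7B-32K
    messages=[{"role": "user", "content": "Explain LPUs in 300 words."}],
    stream=True,
)

chunks = 0
for chunk in stream:
    # Each streamed chunk carries a small delta of the reply; counting chunks
    # approximates the token count closely enough for a throughput estimate.
    if chunk.choices[0].delta.content:
        chunks += 1
elapsed = time.perf_counter() - start

print(f"~{chunks / elapsed:.0f} tokens/s over {elapsed:.1f}s")
```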

The fastest response speed in the large-model world comes from the hardware driving the models: Groq does not use Nvidia (NVDA) GPUs; instead, it developed its own new AI chip, the LPU (Language Processing Unit).

500 tokens per second: writing a paper faster than the blink of an eye

The most prominent characteristic of the LPU is that it is fast.

According to test results from January 2024, Meta's Llama 2 model running on the Groq LPU delivered far-and-away leading inference performance, 18 times that of top cloud computing vendors.

Image source: GitHub

Wall Street News mentioned in an earlier article that the Groq LPU running Meta's Llama 2 70B can generate as many words as Shakespeare's "Hamlet" within 7 minutes, 75 times faster than an average person's typing speed.

As shown in the picture below, a Twitter user asked a professional marketing question, and Groq wrote a reply thousands of words long within four seconds.

Another netizen tested Gemini, GPT-4, and Groq side by side on the same code-debugging problem.

As a result, Groq's output speed was 10 times faster than Gemini's and 18 times faster than GPT-4's.

Groq's overwhelming speed advantage over other AI models led netizens to exclaim, "The Captain America of the AI inference world is here."

The LPU: a challenger to Nvidia's GPUs?

To reiterate: Groq has not developed a new model; it just runs existing models on a different chip.

According to Groq's official website, the LPU is a chip designed specifically for AI inference. The GPUs that drive mainstream models such as GPT are parallel processors designed for graphics rendering, with hundreds of cores built around a SIMD (single instruction, multiple data) architecture. The LPU takes a different approach: its design lets the chip use each clock cycle more effectively, guarantees consistent latency and throughput, and reduces the need for complex scheduling hardware:

Groq's LPU inference engine is not an ordinary processing unit; it is an end-to-end system designed to deliver the fastest inference for compute-intensive, sequential-processing applications such as LLMs. By eliminating external memory bottlenecks, the LPU inference engine achieves performance several orders of magnitude higher than traditional GPUs.
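The "external memory bottleneck" claim rests on a simple roofline argument: in single-batch decoding, every generated token requires streaming all model weights through the processor once, so memory bandwidth sets a hard ceiling on tokens per second. The sketch below works that arithmetic through with rough public figures; the H100 bandwidth, Groq's per-chip SRAM bandwidth, and the 572-chip deployment size are assumptions taken from public reports, not measurements.

```python
# Roofline arithmetic for single-batch LLM decoding: each new token requires
# reading every model weight once, so memory bandwidth caps throughput.
# All hardware numbers are rough public figures (assumptions).

PARAMS = 70e9                  # Llama 2 70B parameters
BYTES_PER_PARAM = 2            # FP16 weights
model_bytes = PARAMS * BYTES_PER_PARAM   # ~140 GB streamed per token

def decode_ceiling(bandwidth_bytes_per_s: float) -> float:
    """Upper bound on tokens/s if weight reads saturate memory bandwidth."""
    return bandwidth_bytes_per_s / model_bytes

hbm_bw = 3.35e12               # ~3.35 TB/s: one H100's HBM3 bandwidth
sram_bw_per_chip = 80e12       # ~80 TB/s: reported on-die SRAM bandwidth per Groq chip
cluster_bw = sram_bw_per_chip * 572   # weights sharded across a 572-chip deployment

print(f"one GPU on HBM:      ~{decode_ceiling(hbm_bw):>9.0f} tokens/s ceiling")
print(f"572-chip LPU (SRAM): ~{decode_ceiling(cluster_bw):>9.0f} tokens/s ceiling")
# These are loose upper bounds (they ignore compute, interconnect, and
# batching), but they show why moving weights from external HBM into on-chip
# SRAM shifts the binding constraint away from memory by orders of magnitude.
```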

Simply put, for users, the most intuitive experience is “fast.”

Readers who have used GPT know how painful it is to wait for a large model to spit out characters one by one; an LPU-driven model can respond essentially in real time.

As shown below, Wall Street News asked Groq about the difference between the LPU and the GPU. Groq generated the answer in under 3 seconds, with none of the noticeable delay of GPT or Gemini. Asking in English makes generation even faster.

Groq's official introduction also notes that its innovative chip architecture can connect multiple Tensor Streaming Processors (TSPs) without the traditional bottlenecks of GPU clusters, making it extremely scalable and simplifying the hardware requirements of large-scale AI models.

Energy efficiency is another highlight of the LPU. By reducing the overhead of managing multiple threads and avoiding core under-utilization, the LPU can deliver more computation per watt.

In interviews, Groq founder and CEO Jonathan Ross never misses a chance to take a dig at Nvidia.

He previously told the media that in large-model inference scenarios, the Groq LPU chip is 10 times faster than Nvidia's GPUs, at one-tenth the price and one-tenth the power consumption.
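Taken at face value, those two multiples compound: energy per token is power divided by throughput, so 10 times the speed at one-tenth the power implies roughly a hundredfold advantage in energy per token. A quick sanity check of that implication, using only the vendor's own claimed numbers (not independent measurements):

```python
# Taking Ross's claims at face value: 10x the tokens/s at 1/10 the power.
# Energy per token = power / throughput, so the implied gap compounds to ~100x.

gpu_speed, gpu_power = 1.0, 1.0    # normalized GPU baseline
lpu_speed, lpu_power = 10.0, 0.1   # Groq's claimed multiples

gpu_energy_per_token = gpu_power / gpu_speed   # 1.00 (normalized)
lpu_energy_per_token = lpu_power / lpu_speed   # 0.01

print(f"implied energy-per-token advantage: "
      f"{gpu_energy_per_token / lpu_energy_per_token:.0f}x")
```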

Real-time inference runs data through a trained AI model to deliver immediate results for AI applications, enabling a smooth end-user experience. As large AI models have developed, demand for real-time inference has surged.

Ross believes that inference costs are becoming an issue for companies using artificial intelligence in their products, because the cost of running models rises rapidly as the number of customers using those products grows. Compared with Nvidia GPUs, Groq LPU clusters will offer higher throughput, lower latency, and lower cost for large-model inference.

He also stressed that, thanks to a different technical path, Groq's chips are in more ample supply than Nvidia's and will not be held hostage by suppliers such as TSMC (TSM) or SK Hynix:

The GroqChip LPU is unique in that it relies on neither Samsung's nor SK Hynix's HBM, nor TSMC's CoWoS packaging technology for attaching external HBM to the chip.

However, some other AI experts said on social media that the actual cost of the Groq chip is not that low.

As AI expert Jia Yangqing analyzed, Groq's all-in cost works out to more than 30 times that of Nvidia's GPUs.

Since each Groq chip has only 230 MB of memory, actually running the model requires 572 chips, for a total cost as high as 11.44 million US dollars.

By comparison, an 8-GPU H100 system delivers comparable performance, but its hardware costs only $300,000, with an annual electricity bill of around $24,000. A three-year total-cost-of-ownership comparison shows the Groq system costing far more to operate than the H100 system.
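The comparison above can be reproduced with back-of-the-envelope arithmetic. In the sketch below, the per-card price is inferred from the reported totals ($11.44 million across 572 cards, about $20,000 per card); the Groq card's power draw and the electricity rate are illustrative assumptions, not reported figures.

```python
# Rough three-year cost-of-ownership comparison, reproducing the figures cited
# above. Per-card price is implied by the reported totals; power draw and
# electricity rate for the Groq cluster are illustrative assumptions.

GROQ_CARDS = 572                          # cards reported to serve the model
GROQ_CARD_PRICE = 11.44e6 / GROQ_CARDS    # ~$20,000 per card (implied)
GROQ_CARD_WATTS = 200                     # assumed draw per card under load
ELEC_PER_KWH = 0.10                       # assumed electricity rate ($/kWh)

H100_SYSTEM_PRICE = 300_000               # 8-GPU H100 server, as cited
H100_ELEC_PER_YEAR = 24_000               # annual electricity bill, as cited

YEARS = 3
HOURS = 24 * 365 * YEARS

groq_total = (GROQ_CARDS * GROQ_CARD_PRICE
              + GROQ_CARDS * GROQ_CARD_WATTS / 1000 * HOURS * ELEC_PER_KWH)
h100_total = H100_SYSTEM_PRICE + H100_ELEC_PER_YEAR * YEARS

print(f"Groq cluster, 3 years: ~${groq_total / 1e6:.2f}M")
print(f"H100 system, 3 years:  ~${h100_total / 1e6:.2f}M")
print(f"ratio: ~{groq_total / h100_total:.0f}x")  # lands near the '30x' cited
```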

More importantly, the LPU is currently used only for inference; to train large models, you still need to buy Nvidia GPUs.

The founder, one of Google's TPU designers, believes Groq can deploy 1 million LPUs within the next 2 years

Before becoming an instant hit on the internet today, Groq had been in low-key development for over 7 years.

According to public information, Groq was founded in 2016 and is headquartered in Mountain View, California, USA. Founder Jonathan Ross is a former senior Google engineer and one of the designers of Google's self-developed AI chip, the TPU. Product Director John Barrus previously served as a product executive at Google and Amazon.

Estelle Hong, a vice president and the only Chinese face among the executives, has been with the company for four years, having previously worked for the US military and Intel.

Just last August, Groq announced a partnership with Samsung: its next-generation chips will be fabricated on a 4 nm process at Samsung's chip plant in Texas, USA, with mass production expected in the second half of 2024.

Looking ahead to the next generation of LPUs, Ross believes GroqChip's energy efficiency will improve 15- to 20-fold, allowing more matrix computation and SRAM memory to be packed into the device within the same power envelope.

In an interview late last year, Ross said that, given the shortage and high cost of GPUs, he believes in Groq's future development potential:

In 12 months, we can deploy 100,000 LPUs, and in 24 months, we can deploy 1 million LPUs.

Editor/jayden
