
Surpassing Google Search, costs plummet by 80%! Alibaba open-sources an innovative large model search engine.

AIGC Open Community ·  May 9 08:45

Source: AIGC Open Community


On May 8, Alibaba open-sourced ZeroSearch, a reinforcement learning framework that incentivizes the search capabilities of large models without interacting with a real search engine. It turns the knowledge a large model acquires during pre-training into a retrieval module and dynamically controls the quality of the generated content. Its search performance surpasses Google Search across multiple question-answering datasets, while significantly reducing costs.

Yesterday, Alibaba $BABA-W (09988.HK)$ open-sourced ZeroSearch, an innovative large model search engine.

ZeroSearch is a reinforcement learning framework that incentivizes the search capabilities of large models without requiring interaction with real search engines. It leverages the rich knowledge a large model accumulates during large-scale pre-training, transforming the model into a retrieval module that generates relevant content in response to search queries. In addition, it can dynamically control the quality of the generated content, a capability traditional search engines do not offer.

Researchers conducted comprehensive evaluations on seven major question-answering datasets, including NQ, TriviaQA, PopQA, and HotpotQA. The results showed that a supervised fine-tuned 7-billion-parameter model reached a score of 33.06 with ZeroSearch, and the 14-billion-parameter model reached 33.97, surpassing Google Search's 32.47.

In terms of cost, issuing roughly 64,000 training-time search queries through Google Search via SerpAPI cost about $586.70, whereas simulating the searches with a 14-billion-parameter model on four A100 GPUs cost only $70.80, a reduction of more than 80%.
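As a quick sanity check, the reported figures imply a reduction of roughly 88%, consistent with the "over 80%" claim:

```python
# Back-of-the-envelope check of the reported cost reduction (figures from the article).
google_api_cost = 586.70   # ~64,000 queries via Google Search / SerpAPI (USD)
simulated_cost = 70.80     # 14B-parameter simulation model on four A100 GPUs (USD)

reduction = 1 - simulated_cost / google_api_cost
print(f"Cost reduction: {reduction:.1%}")  # -> Cost reduction: 87.9%
```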

Currently, to address large model hallucinations and extend access to external knowledge, retrieval-augmented generation (RAG) has become a standard configuration. However, early RAG mainly adopted prompt-based strategies that integrate external knowledge by guiding large models through query generation, query decomposition, and multi-round information retrieval. These methods place heavy demands on prompt design and depend strongly on the model's reasoning ability.

Researchers have also explored methods such as supervised fine-tuning and Monte Carlo tree search to enhance search capabilities. Although some progress has been made, the computational cost is high and practical deployment faces many challenges.

With the emergence of models like DeepSeek-R1 and o1, reinforcement learning has become a key technology for improving the logical reasoning capabilities of models. These models rely entirely on reward-driven learning, without explicit step-by-step supervision.

Therefore, many studies have applied reinforcement learning in large model search. For example, Search-R1 autonomously generates multiple search queries through reinforcement learning, while ReSearch teaches the model to reason through search without supervision of intermediate reasoning steps. However, these methods need to be used in conjunction with commercial search engines like Google to achieve optimal results, which is very costly.

ZeroSearch incentivizes the search capabilities of large models through reinforcement learning while avoiding the high costs and uncontrollability associated with interacting with real search engines.

ZeroSearch transforms large models into a retrieval module through lightweight supervised fine-tuning. This process harnesses the rich knowledge accumulated by large models during large-scale pre-training, enabling them to generate relevant or noisy documents based on given queries. By adjusting the keywords in the prompts, the model can flexibly control the quality of generated documents, thus providing diverse retrieval scenarios for subsequent training.
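A minimal sketch of what such a prompt-controlled simulated search engine could look like, assuming an illustrative `generate` interface and prompt wording (ZeroSearch's actual prompts are not quoted in the article):

```python
# Sketch of using a fine-tuned LLM as a simulated search engine with
# controllable document quality. Prompt text and the `llm.generate` helper
# are illustrative assumptions, not ZeroSearch's exact implementation.
USEFUL_TMPL = (
    "You are a search engine. For the query below, write a document that "
    "contains useful information for answering it.\nQuery: {query}\nDocument:"
)
NOISY_TMPL = (
    "You are a search engine. For the query below, write a document that "
    "looks relevant but contains noisy, misleading information.\nQuery: {query}\nDocument:"
)

def simulate_search(llm, query: str, noisy: bool = False) -> str:
    """Generate a pseudo-retrieved document of controllable quality."""
    prompt = (NOISY_TMPL if noisy else USEFUL_TMPL).format(query=query)
    return llm.generate(prompt, max_new_tokens=256)
```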

This capability is achieved by collecting interaction trajectory data from real search engines and annotating and fine-tuning this data. The main approach is to engage the large model in multiple rounds of interaction with real search engines until reaching the final answer.

During this process, all interaction trajectories are meticulously recorded, covering the entire process from the model issuing a query, to the search engine returning documents, to the model generating the final answer. These trajectories are then carefully annotated: trajectories that produce correct answers are marked as positive samples, indicating that the retrieved documents played a positive role, while trajectories that lead to incorrect answers are classified as negative samples, indicating that the corresponding retrieved documents were interference.
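A rough illustration of this labeling step, with hypothetical data structures (the real pipeline records full multi-turn trajectories):

```python
# Illustrative labeling of trajectories as positive or negative samples
# based on whether the final answer matches the gold answer.
from dataclasses import dataclass

@dataclass
class Trajectory:
    query_doc_pairs: list   # [(search_query, retrieved_document), ...]
    final_answer: str
    gold_answer: str

def label_trajectories(trajectories):
    """Split trajectories into positive and negative samples by answer correctness."""
    positives, negatives = [], []
    for t in trajectories:
        if t.final_answer.strip().lower() == t.gold_answer.strip().lower():
            positives.append(t)   # retrieved documents treated as useful
        else:
            negatives.append(t)   # retrieved documents treated as noisy / distracting
    return positives, negatives
```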

Precise extraction of query-document pairs from positive and negative sample interaction trajectories serves as the foundation for implementing lightweight supervised fine-tuning on the large model. During fine-tuning, researchers cleverly adjust a small number of words in the prompts, for instance, adding "useful information" or "noise information" to guide the large model in learning to generate documents of different quality. At the same time, the input question and its corresponding answer are integrated into the prompt content, broadening the knowledge boundary of the large model.
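One plausible way such fine-tuning examples could be assembled, with illustrative prompt phrasing rather than ZeroSearch's actual template:

```python
# Hypothetical construction of supervised fine-tuning examples from labeled
# query-document pairs. The quality keyword ("useful" vs. "noisy") and the
# embedded question/answer mirror the article's description; exact wording differs.
def build_sft_example(query: str, document: str, question: str,
                      answer: str, useful: bool) -> dict:
    quality = "useful" if useful else "noisy"
    prompt = (
        f"The original question is: {question} (answer: {answer})\n"
        f"Generate a {quality} document for the search query below.\n"
        f"Search query: {query}\nDocument:"
    )
    # The training target is the document actually returned by the real
    # search engine during trajectory collection.
    return {"prompt": prompt, "completion": document}
```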

ZeroSearch also introduces a "curriculum learning mechanism" to gradually adjust the quality of generated documents during the training process. The core idea is to gradually increase the difficulty of tasks as training progresses, allowing the model to start from simple retrieval scenarios and gradually adapt to more challenging environments.

A probability function dynamically adjusts the likelihood of generating noisy documents. In the early stages of training, the model is primarily exposed to high-quality documents to quickly learn the basic output format and task requirements. As training deepens, the model is gradually exposed to more noisy documents, which forces the model to continuously enhance its reasoning ability and robustness to tackle more challenging retrieval tasks.
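One plausible form of such a schedule is an exponential ramp of the noise probability; the specific function and hyperparameters below are illustrative assumptions, not ZeroSearch's published formula:

```python
def noise_probability(step: int, total_steps: int,
                      p_start: float = 0.0, p_end: float = 0.5,
                      base: float = 4.0) -> float:
    """Probability of serving a noisy (rather than useful) simulated document."""
    frac = min(step / max(total_steps, 1), 1.0)
    # Exponential interpolation from p_start to p_end: gentle early, steeper late.
    return p_start + (base ** frac - 1) / (base - 1) * (p_end - p_start)

# Early in training almost all documents are clean; later, about half are noisy.
print(noise_probability(0, 1000))     # 0.0
print(noise_probability(1000, 1000))  # 0.5
```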

Within the reinforcement learning framework, ZeroSearch employs multiple algorithms to optimize the model's search strategy, including Proximal Policy Optimization (PPO) and Group Relative Policy Optimization (GRPO). These algorithms train the policy model by maximizing its expected reward while taking a reference model and a reward function into account.
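For intuition, the group-relative part of GRPO can be sketched as normalizing the rewards of several rollouts sampled for the same question; this is a generic illustration, not ZeroSearch's implementation:

```python
import torch

def group_relative_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """rewards: shape (group_size,), one scalar reward per rollout for the same question."""
    # Normalize each rollout's reward against the group mean and standard deviation.
    return (rewards - rewards.mean()) / (rewards.std() + eps)
```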

The design of the reward function focuses on answer accuracy, using an F1 score-based reward to balance precision and recall. In addition, to improve training stability, ZeroSearch introduces a loss masking mechanism that computes gradients only on the tokens generated by the model itself, avoiding noise introduced by the tokens of externally generated documents.
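A common token-level F1 reward looks roughly like the sketch below (whitespace tokenization is a simplification); the loss mask would, in the same spirit, zero out positions corresponding to simulated-document tokens so that gradients flow only through tokens the policy itself produced:

```python
from collections import Counter

def f1_reward(prediction: str, ground_truth: str) -> float:
    """Token-level F1 between predicted and gold answers, balancing precision and recall."""
    pred_tokens = prediction.lower().split()
    gold_tokens = ground_truth.lower().split()
    if not pred_tokens or not gold_tokens:
        return 0.0
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```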

The training template of ZeroSearch is a multi-turn interaction template that clearly separates the model's reasoning, searching, and answering phases. In the reasoning phase, the model reflects and explains its thinking inside <think>...</think> tags. If the model decides it needs additional information, it issues a search query inside <search>...</search> tags. The retrieved documents, generated by the simulated search engine, are returned to the model inside <information>...</information> tags.

Finally, the model gives its final answer inside <answer>...</answer> tags. This structured template not only improves the model's transparency but also enhances its reliability in practical applications.
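A small illustrative parser for such tagged output, assuming the tag names above:

```python
import re

# Extract each phase of a model turn from the structured multi-turn template.
TAG_RE = {
    "think": re.compile(r"<think>(.*?)</think>", re.DOTALL),
    "search": re.compile(r"<search>(.*?)</search>", re.DOTALL),
    "information": re.compile(r"<information>(.*?)</information>", re.DOTALL),
    "answer": re.compile(r"<answer>(.*?)</answer>", re.DOTALL),
}

def parse_turn(text: str) -> dict:
    """Return the reasoning, search queries, documents, and answer found in one turn."""
    return {name: [m.strip() for m in rx.findall(text)] for name, rx in TAG_RE.items()}
```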


Editor/danial
