① The product is reportedly able to execute a range of complex tasks automatically, including writing code, booking travel, and online shopping; ② Altman believes the next major breakthrough in AI will be AI assistants; ③ AI assistants may become a new entry point to the mobile internet.
According to media reports, OpenAI is preparing to launch a new AI assistant product codenamed "Operator," which can automatically execute a range of complex tasks, including writing code, booking travel, and online shopping. According to people familiar with the matter, OpenAI's leadership plans to release the product in January 2025, initially as a research preview and developer tool, with API access opening to developers at the same time.
OpenAI has reportedly been pursuing several research projects related to intelligent agents. One person said the project closest to completion is a general-purpose tool that executes tasks in a web browser.
AI assistants (AI Agents) are intelligent entities that can perceive their environment, make decisions, and take actions. They can accomplish assigned goals through independent reasoning and tool use, offering personalized applications to consumer (C-end) users and cost-cutting, efficiency-boosting solutions to business (B-end) users. For ordinary users, the core capability of an AI assistant is operating a mobile phone autonomously to help carry out complex, multi-step reasoning tasks.
OpenAI CEO Sam Altman has long signaled what comes next. A few weeks ago, in a Reddit "Ask Me Anything" session, he said: "We will have better and better models, but I think the thing that will feel like the next giant breakthrough will be agents." At a press briefing last month ahead of OpenAI's annual DevDay, the company's Chief Product Officer Kevin Weil said, "I think 2025 is going to be the year that agentic systems finally hit the mainstream."
For OpenAI, pressure to commercialize is mounting, since incremental improvements to ChatGPT may not be enough to persuade users to pay higher prices. Executives are eager for a breakthrough product that proves the enormous investment in AI development is worthwhile.
OpenAI has already open-sourced Swarm, a multi-agent collaboration framework that can create multiple agents working together to complete tasks more efficiently. Its o1 model has stronger reasoning capabilities and shows significant progress in solving complex problems and in the naturalness of user interactions, making it better suited to AI Agent scenarios.
AI assistants are seen as a core foundation for achieving AGI. At a time when hardware manufacturers constantly tout AI, AI assistants may become the breakthrough for on-device intelligence. CITIC Securities said that AI Agents may become a new entry point to the mobile internet and reshape how traffic is distributed. Because they are highly interactive and convenient, AI Agents could break down the natural barriers between different apps on the same device.
According to an incomplete tally by the Star Daily, leading companies in China and abroad are racing to launch AI assistant products:
Microsoft recently quietly open-sourced the AI tool OmniParser, which lets users build personalized agents that operate personal computers. On October 22, Microsoft announced that it is integrating ten autonomous AI Agents into Dynamics 365; they support OpenAI's latest o1 model, can learn autonomously, and can automatically execute complex cross-platform business processes. In September, Microsoft also launched Windows Agent Arena, a benchmark framework that likewise falls under AI assistant development.
According to The Information, Google plans to preview its large action model "Project Jarvis" in December, which will help users with tasks such as "gathering research, purchasing a product, or booking a flight."
On October 22, Anthropic introduced a new capability for its Claude models, Computer Use, which lets AI operate a computer the way a human does. Claude 3.5 Sonnet is the first model to support computer control and can mimic human actions on a computer, including moving the cursor, clicking buttons, and typing text.
Apple has chosen to integrate Siri with ChatGPT for smarter human-computer interaction. In addition, some users discovered that Apple has quietly released two versions of Ferret-UI (built on Gemma 2B and Llama 8B, respectively); Ferret-UI is a technology Apple unveiled in May this year that enables AI to understand smartphone screens.
Huawei announced a new research result that lets AI operate mobile phones the way humans do; the team behind it proposed a mobile app control architecture called Lightweight Multi-modal App Control (LiMAC).
Chinese unicorn Zhipu AI has launched AutoGLM, an AI assistant tool that requires no manual operation. Users simply speak a command to their phone, and AutoGLM automatically opens the relevant apps to shop online, order takeout, and book high-speed rail tickets. It can even send WeChat messages, grab red envelopes, comment on Moments posts, organize notes, generate guides, and summarize papers.
CITIC Securities said that on-device AI assistant technologies such as AutoGLM will shorten interaction paths: the ability to take voice commands and automatically carry out complex operations brings great convenience to consumers. The capability is expected to become a highlight feature of AI devices and to drive consumer upgrades.
Editor/Rocky