OpenAI's "next major breakthrough"! Its first AI assistant product may be released next January. Is a transformation in human-computer interaction imminent?

cls.cn · Nov 14 10:12

① The product is said to carry out a range of complex tasks automatically, including writing code, booking travel, and shopping online; ② Altman believes the next major breakthrough in AI will be AI assistants; ③ AI assistants may control the new gateway to the mobile internet.

According to media reports, OpenAI is preparing to launch a new AI assistant product codenamed "Operator" that can carry out a range of complex tasks automatically, including writing code, booking travel, and shopping online. According to company employees, OpenAI's leadership expects to release the product in January 2025, initially as a research preview and developer tool, with API access opening to developers at the same time.

The reports say OpenAI has been running several agent-related research projects. According to one person, the one closest to completion is a general-purpose tool that carries out tasks in a web browser.

An AI assistant (AI Agent) is an intelligent entity that can perceive its environment, make decisions, and take actions. It works toward a given goal step by step by reasoning independently and calling tools, which lets it offer personalized applications to consumers and cost-cutting, efficiency-boosting solutions to businesses. For ordinary users, the core function of an AI assistant is operating a mobile phone autonomously and helping with complex reasoning tasks.
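As a rough illustration of the perceive, decide, act cycle described above, here is a minimal Python sketch. Everything in it is hypothetical: the tool functions, the `Agent` class, and the rule-based `decide` method (which stands in for a language model) are assumptions made for illustration and do not reflect how Operator or any other product mentioned here is built.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple


# Hypothetical tools: plain functions standing in for real external services.
def search_flights(destination: str) -> str:
    return f"flight to {destination} found"


def book_flight(destination: str) -> str:
    return f"flight to {destination} booked"


TOOLS: Dict[str, Callable[[str], str]] = {
    "search_flights": search_flights,
    "book_flight": book_flight,
}


@dataclass
class Agent:
    goal: str  # destination the user wants a flight to
    history: List[str] = field(default_factory=list)  # observations so far

    def decide(self) -> Optional[Tuple[str, str]]:
        """Pick the next tool call from what has been observed (rule-based stand-in for a model)."""
        if not self.history:
            return ("search_flights", self.goal)
        if self.history[-1].endswith("found"):
            return ("book_flight", self.goal)
        return None  # nothing left to do: the goal is reached

    def run(self, max_steps: int = 5) -> List[str]:
        """Loop: decide on an action, call the tool, record the observation."""
        for _ in range(max_steps):
            action = self.decide()
            if action is None:
                break
            tool_name, argument = action
            observation = TOOLS[tool_name](argument)
            self.history.append(observation)
        return self.history


agent = Agent(goal="Tokyo")
print(agent.run())
```

The step cap (`max_steps`) is a design choice in this sketch: it keeps the loop from running indefinitely if the decision rule never reports that the goal is reached.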

OpenAI CEO Sam Altman signaled long ago that the company intended to enter this field. A few weeks ago, he said in a Reddit "Ask Me Anything" session, "We will have increasingly better models, but I believe the next major breakthrough will be AI assistants." At an OpenAI press event last month ahead of the company's annual developer day, Chief Product Officer Kevin Weil said, "I believe that 2025 will be the year when Agent systems finally enter the mainstream."

OpenAI itself faces growing pressure to commercialize, as incremental improvements to ChatGPT may not persuade users to pay higher prices. Executives urgently need a breakthrough product to show that the huge investments in AI development are worthwhile.

OpenAI has already open-sourced Swarm, a multi-agent collaboration framework that lets multiple agents work together to complete tasks more efficiently. Its o1 model has stronger reasoning capabilities and shows significant progress both in solving complex problems and in the naturalness of user interaction, which also makes it better suited to AI Agent scenarios.

AI assistants are seen as a core foundation on the path to AGI. At a time when every hardware maker is talking about AI, AI assistants may become the breakthrough point for smarter devices. Yongxing Securities said AI Agents may control the new gateway to the mobile internet and that the way traffic is distributed could be reshaped: because they are highly interactive and convenient, AI Agents could break down the natural barriers between different apps on the same device.

According to a non-exhaustive roundup by the STAR Market Daily, leading companies in China and abroad are racing to launch AI assistant products:

Microsoft (MSFT.US) recently and quietly open-sourced the AI tool OmniParser, which helps users build personalized agents that operate a personal computer. On October 22, Microsoft announced that it was integrating ten autonomous AI Agents into Dynamics 365; the agents support OpenAI's latest model, o1, can learn on their own, and can automatically carry out complex business processes across platforms. In September, Microsoft released a benchmark framework called Windows Agent Arena, which also falls under AI assistant development.

According to The Information, Google plans to preview its large action model "Project Jarvis" in December; the project will help users with tasks such as "collecting research, purchasing products, or booking flights."

On October 22, Anthropic added a new capability to its large model Claude, called Computer Use, which lets the AI operate a computer the way a person does. Claude 3.5 Sonnet is the first model to support computer control and can imitate how a person uses a computer, including moving the cursor, clicking buttons, and entering text.
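The three kinds of action named above (moving the cursor, clicking, and entering text) can be pictured as a small stream of commands applied to a screen. The Python sketch below simulates this against a made-up `FakeScreen` class; the action format and every name in it are assumptions for illustration only, and it is not Anthropic's API or any vendor's implementation.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class FakeScreen:
    """Hypothetical stand-in for a desktop: UI elements keyed by screen position."""
    elements: Dict[Tuple[int, int], str]
    cursor: Tuple[int, int] = (0, 0)
    focused: str = ""  # name of the element that last received a click
    typed: Dict[str, str] = field(default_factory=dict)  # text entered per element

    def apply(self, action: dict) -> None:
        """Apply one action emitted by the (imagined) model."""
        kind = action["kind"]
        if kind == "move":
            self.cursor = (action["x"], action["y"])
        elif kind == "click":
            # Clicking focuses whatever element sits under the cursor, if any.
            self.focused = self.elements.get(self.cursor, "")
        elif kind == "type":
            # Text goes to the focused element; it is dropped if nothing has focus.
            if self.focused:
                self.typed[self.focused] = self.typed.get(self.focused, "") + action["text"]
        else:
            raise ValueError(f"unknown action: {kind}")


screen = FakeScreen(elements={(100, 40): "search_box"})
actions = [
    {"kind": "move", "x": 100, "y": 40},
    {"kind": "click"},
    {"kind": "type", "text": "flights to Tokyo"},
]
for action in actions:
    screen.apply(action)

print(screen.typed)
```

In a real system the action list would come from a model looking at screenshots rather than from a hard-coded list; the sketch only shows how such actions could be interpreted.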

Apple has chosen to integrate Siri with ChatGPT for smarter human-computer interaction. Separately, users online noticed that Apple had quietly released two implementations of Ferret-UI (based on Gemma 2B and Llama 8B respectively), a technology Apple published in May this year that lets AI understand smartphone screens.

Huawei has published new research that lets AI operate a mobile phone the way a person does; the team behind it proposed a phone-control architecture called Lightweight Multi-modal App Control (LiMAC).

The Chinese unicorn Zhipu AI has launched the AI assistant tool AutoGLM. Without touching the screen, users can speak a command to their phone and have it open apps automatically to shop online, order takeout, or book high-speed rail tickets, and even send WeChat messages, grab red envelopes, comment on Moments, organize notes into guides, and summarize papers.

CITIC Securities said that on-device AI assistant technologies such as AutoGLM will shorten the path of interaction, and that the ability to accept voice commands and carry out complex operations automatically will bring great convenience to consumers. The firm expects this to become a standout feature of AI devices and to encourage consumers to upgrade.

Editor/Rocky

The translation is provided by third-party software.

The above content is for informational or educational purposes only and does not constitute any investment advice related to Futu. Although we strive to ensure the truthfulness, accuracy, and originality of all such content, we cannot guarantee it.
