①Users communicate with AutoGLM by voice or text, with real-time subtitles displayed; ②Zhipu AI simultaneously released GLM-4-Voice, a more human-like, end-to-end emotional voice model; ③Current AI assistants still have plenty of room for improvement.
As Apple's AI feature set, Apple Intelligence, neared its October 28 launch, the Chinese unicorn Zhipu AI (Zhipu) moved first and launched an AI assistant tool of its own.
On October 25, Zhipu showed off AutoGLM in a three-minute video. With no manual operation, users simply speak commands into their phones, and AutoGLM opens the relevant apps automatically to shop online, order takeaway, book high-speed rail tickets, and even send WeChat messages, grab red envelopes, comment on Moments, turn notes into guides, and summarize papers. Judging from the functions shown in the promotional video, Zhipu has obtained authorization from Taobao, Meituan, WeChat and other apps; the companies behind these three apps are also investors in Zhipu.
The communication between users and AutoGLM is in the form of voice or text, with subtitles displayed in real time.
The phone itself becomes an AI assistant: given only a text or voice command, it simulates human operations on the screen and takes care of everyday chores for you. Doesn't this look like J.A.R.V.I.S. from the movies becoming reality?
On the same day (October 25), Zhipu also released GLM-4-Voice, an end-to-end emotional speech model.
The model's biggest highlight is its human touch and stronger interactive experience. According to Zhipu, GLM-4-Voice can adjust the emotion, intonation, speaking speed and dialect of its voice on user instruction, offers lower latency, supports real-time interruption, and handles multiple languages and dialects. As an end-to-end speech model, GLM-4-Voice avoids the information loss and error accumulation of the traditional cascaded "speech-to-text, then text-to-speech" pipeline, and in theory has a higher modeling ceiling. A video call feature is coming soon, with the aim of building an AI assistant that can both see and hear.
Zhipu stated, "The release of GLM-4-Voice is Zhipu's latest step toward AGI."
Users can currently try AutoGLM by installing the "Zhipu Qingyan" plug-in. AutoGLM has also opened applications for an internal beta on Android, and Zhipu is working closely with smartphone makers such as Honor.
Zhipu and Honor jointly set up an AI large-model technology joint laboratory in September 2024, deepening their cooperation. On October 23, 2024, Honor's MagicOS 9.0 launch event showcased YOYO, an AI agent with "autonomous driving" capabilities, that is, the ability to operate the phone on its own.
Kaiyuan Securities said that Zhipu's AutoGLM markedly improves the practicality of AI agents, which could raise the penetration rate of AI agents among users and open up commercial opportunities.
GTJA said that AutoGLM accurately understands user commands and completes app operations automatically, freeing users' hands, and is expected to accelerate the rollout of AI assistants by phone makers. An AI assistant with "autonomous driving" capabilities spares users tedious app operations: a single voice command is enough to get things done, making the assistant truly personal. This is expected to spark a wave of AI phone upgrades and lift demand across the upstream industry chain.
However, AutoGLM still has plenty of room for improvement. GTJA noted that "autonomous driving" on phones still requires explicit instructions. Test videos released by Digital Life Kha'zix show that AutoGLM needs users to supply specific information: booking a hotel, for example, requires the user to give the time, location, budget and room type.
In addition, more personalized instructions such as "book me a flight home" can only be carried out with the cooperation of phone makers, which would need to grant permission to use users' personal information. AutoGLM can currently perform common operations automatically in apps such as WeChat, Taobao, Meituan and Xiaohongshu, but more personalized operations are not yet possible, and it does not yet support apps such as Didi, JD.com and WeChat Reading. Further progress will require support from more third-party vendors and a more comprehensive UI training dataset.
Devices are expected to enter the era of AI assistants, which may bring industry opportunities at multiple levels.
An AI assistant (AI Agent) is an intelligent entity that can perceive its environment, make decisions and take actions; it can think independently, call tools, and work step by step toward a given goal. It can be broken down into four components: large model + planning + memory + tool use.
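For readers curious how these four components fit together, here is a toy Python sketch. It assumes a hard-coded keyword rule in place of a real large model, and the tools `open_app` and `search` are hypothetical stand-ins for app operations; it is not how AutoGLM or any other product is implemented.

```python
from dataclasses import dataclass, field
from typing import Callable


# Hypothetical tools standing in for app operations; no real app APIs are called.
def open_app(name: str) -> str:
    return f"opened {name}"


def search(query: str) -> str:
    return f"results for {query}"


TOOLS: dict[str, Callable[[str], str]] = {"open_app": open_app, "search": search}


def stub_model(command: str) -> list[tuple[str, str]]:
    """Stand-in for the large model + planning step: turns a command into a
    plan of (tool name, argument) steps. A real agent would query an LLM here;
    this keyword rule is purely illustrative."""
    if "takeaway" in command:
        return [("open_app", "Meituan"), ("search", "noodles")]
    return [("search", command)]


@dataclass
class Agent:
    # Memory: every observation the agent has collected, kept across runs.
    memory: list[str] = field(default_factory=list)

    def run(self, command: str) -> list[str]:
        plan = stub_model(command)              # large model + planning
        for tool_name, arg in plan:             # tool use, one step at a time
            observation = TOOLS[tool_name](arg)
            self.memory.append(observation)     # memory
        return self.memory


agent = Agent()
print(agent.run("order takeaway noodles"))
```

Running it prints `['opened Meituan', 'results for noodles']`. Keeping memory on the agent rather than inside `run` means a later command can, in principle, draw on earlier results, which is what separates an agent from a one-shot command.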
In terms of application scenarios, AI assistants are suitable for sales, supply chain, customer service, finance, human resources, etc.
For the industry, AI assistants can offer personalized applications to consumers and cost-cutting, efficiency-boosting solutions to businesses. For users, their core function is operating the phone autonomously and helping complete complex reasoning tasks.
The industry is actively deploying and exploring AI assistants, for example Alibaba's MobileAgent, Tencent's App Agent, Honor's MagicOS 9.0 operating system and Apple's Apple Intelligence; Microsoft and Google have also recently launched AI assistant applications.
In the early hours of October 22, Microsoft announced ten autonomous AI agents integrated into Dynamics 365 that can help enterprises automate customer service, sales, finance, warehousing and other operations. These agents support OpenAI's latest o1 model and can learn autonomously, allowing them to carry out highly complex cross-platform tasks automatically.
On October 24, iFlytek launched AI assistants for education, healthcare, the judiciary and government services.
At the 2024 China Mobile Global Partner Conference, held from October 11 to 13, Richinfo Technology officially launched RichAIBox, an all-in-one AI application development platform that offers unified access to multiple foundation models, seamless connection to enterprises' private knowledge bases, and visual editing of agents for multiple scenarios, helping enterprises build AI applications quickly. On the application side, the company also introduced AIGC products such as 3D digital human generation, image-to-music generation, AI dance and an AI call assistant, covering various "5G New Calling" scenarios...
Zheshang Securities said that usage of domestic multimodal AI applications has recently surged. Since MiniMax launched its text-to-video model abab-video-1 at the end of August and integrated it into Hailuo AI, usage has accelerated: according to the "AI Product Rankings", web visits to Hailuo AI rose 860% month on month in September, while Similarweb data show that monthly web visits to Kuaishou's Kling video model have exceeded 16 million since its launch in July.
The firm believes that the surge in AI assistant applications reflects a significant improvement in the capabilities of domestic multimodal large models, and that the explosive growth in usage of domestic multimodal AI applications lays a solid foundation for AI assistants.
HTSC said that AI assistants are the core foundation for achieving AGI and that their deployment brings industrial opportunities at various levels. "Agent + device" is expected to drive a revolution in human-computer interaction; beyond changes in device sales volumes and prices, it may have an even more profound impact on the business models of device applications.
Tebon Securities believes that as AI assistants built on large foundation models are rolled out at scale across application scenarios and handle high-frequency requests, they will create huge demand for inference computing power, and inference may become a long-term blue ocean for future computing-power demand.