
Report: Google to develop AI that can control computers

wallstreetcn ·  Oct 27 14:47

According to The Information, the project, codenamed 'Project Jarvis', aims to take over users' browsers to help consumers complete everyday tasks such as research, purchasing products, or booking flights. Insiders said that in December Google will also release its next-generation flagship Gemini large language model, which will power Jarvis.

The Information reported on October 26 that Google will develop AI that can control computers and plans to preview the new product as early as December.

The report states that the product, also described as a 'Computer Usage Agent', is designed to take over users' browsers to help consumers complete everyday tasks such as gathering research, purchasing products, or booking flights. According to three people with knowledge of the matter cited by The Information, the project is codenamed 'Project Jarvis' and is similar to a product announced by Anthropic this week.

The sources also said that in December Google will release its next-generation flagship Gemini large language model, which will power Jarvis.

Working to catch up with OpenAI, with customization for Chrome.

However, the release schedule for Jarvis suggests that, despite its deep base of fundamental AI research, Google is clearly still catching up with its competitors. Google is still developing AI with so-called 'reasoning ability', while OpenAI launched such a capability in September.

Analysts believe that Google's Gemini chatbot lags significantly behind OpenAI's ChatGPT, which has led companies to turn to OpenAI's large language models (LLMs) and made it harder for Google's Gemini model to catch up. To improve the efficiency of its AI development, Google last week merged the team responsible for the Gemini chatbot into DeepMind, its main AI team.

It is worth noting that AI developers now regard 'agents' (AI systems capable of completing complex tasks without human supervision) as the next stage for the industry. Companies such as Salesforce, Microsoft, and Workday have been purchasing LLMs from OpenAI and other companies and are actively using the technology to develop AI agents.

Anthropic and Google are trying to take the concept of AI agents further with software that interacts directly with personal computers or browsers. OpenAI has also been developing similar software for most of this year.

Insiders say that Google's AI agent product is similar to the one launched by Anthropic. Both respond to user commands by frequently capturing screenshots of the user's computer screen and interpreting them before taking actions such as clicking buttons or entering text.

However, there is a key difference between the two companies' agent products:

Anthropic says its product can operate across the different applications installed on a computer, while Jarvis can currently operate only in browsers and has been 'customized' for Google's Chrome browser.

Insiders also say that, at least for now, Jarvis is aimed at users who want to automate everyday web tasks. At Google's developer conference this spring, CEO Sundar Pichai hinted that future versions of Gemini could autonomously carry out multi-step tasks, such as helping a user return a pair of shoes.

The product responds slowly, and its security is in question.

Insiders also say that the plan for Jarvis is provisional and may change. According to the report, Google may initially release the product to a small number of early testers to help identify and fix its shortcomings. The agent currently runs relatively slowly, because the model needs to think for several seconds before taking each action.

Furthermore, Google would need access to customers' private information, such as login passwords and credit card details, in order to log in to different websites and complete tasks or make purchases at customers' request.

Analysts say Google needs to convince people that its AI agents can safely handle the personal data they need in order to perform tasks.

In addition, LLMs share some common weaknesses, such as the possibility of generating incorrect answers. Google previously introduced LLM-driven conversational answers in its search engine, which produced many obvious errors.

Editor: ping
