OpenAI has introduced native computer control functionality for the first time in a general-purpose model. GPT-5.4 is capable of directly operating software, browsing the web, and controlling the mouse and keyboard to complete tasks. It can be deeply integrated with enterprise applications such as spreadsheets and financial analysis tools. Its desktop navigation capabilities have surpassed the human benchmark; it also set a record score in web search tests and matched or exceeded professional standards in occupational knowledge assessments. The introduction of a tool search mechanism significantly reduces token consumption. GPT-5.4 comes in two versions: Thinking, which excels at complex reasoning, and Pro, designed for maximum performance. The model supports a context window of up to one million tokens and is priced higher than GPT-5.2. A financial services suite was launched at the same time.
Just one day after releasing GPT-5.3 Instant, a faster and more discerning model in the GPT-5 series, OpenAI launched its new flagship foundation model, GPT-5.4, on Thursday the 5th (Eastern Time), rolling it out simultaneously in ChatGPT, the API, and the development tool Codex.
OpenAI described GPT-5.4 as 'the most powerful and efficient cutting-edge model for professional work to date,' with a focus on enterprise office environments and complex knowledge work scenarios. Compared to previous versions, the biggest change in GPT-5.4 is the enhancement of its AI agent capabilities.

GPT-5.4 not only generates text and code but also brings native computer control to a general-purpose model for the first time. It can directly operate computer software, browse the web, and control the mouse and keyboard to complete tasks, and it integrates deeply with enterprise applications such as spreadsheets and financial analysis tools, including being embedded in Microsoft Excel and Google Sheets.
The industry believes that this series of upgrades marks AI models transitioning from 'dialogue tools' to digital agent systems capable of automated task execution, further penetrating enterprise productivity software and specialized knowledge work.
On Thursday this week, OpenAI simultaneously launched two versions: GPT-5.4 Thinking, which excels at complex reasoning, and the high-performance GPT-5.4 Pro, targeting paid users and premium enterprise users respectively.
In the computer control benchmark test OSWorld-Verified, GPT-5.4 achieved a success rate of 75.0%, surpassing the human average of 72.4% and marking a significant leap from the 47.3% success rate of the previous generation GPT-5.2. In the concurrently released financial services suite, GPT-5.4's score in OpenAI’s internal investment banking benchmark surged from 43.7% with GPT-5 to 88.0%.

Early testing institutions provided positive feedback. Daniel Swiecki, Head of AI Solutions at investment firm Walleye Capital, stated that GPT-5.4 improved accuracy by 30 percentage points in internal financial and Excel evaluations. Brendan Foody, CEO of AI talent platform Mercor, described it as the 'best model the company has tried to date' and noted that GPT-5.4 ranked first in Mercor’s APEX-Agents benchmark for professional service work.
Native computer control functionality built into general-purpose models for the first time, breaking through the boundaries of single-round question-and-answer interactions.
The most groundbreaking capability of GPT-5.4 is its native computer control functionality, marking the first time OpenAI has integrated this feature into a general-purpose model. Through the API and Codex, the model can operate a computer much as a person would, completing multi-step workflows across applications.
Specifically, GPT-5.4 can control a computer either by writing code that uses libraries such as Playwright or by responding directly to screenshots with mouse and keyboard commands. Developers can also configure custom confirmation strategies to suit scenarios with different risk tolerances.
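To make the idea of a custom confirmation strategy concrete, here is a minimal sketch in plain Python. It is not OpenAI's API: the `Action` type, the `RISKY_KINDS` set, the three risk-tolerance levels, and the `run_actions` harness are all hypothetical names chosen for illustration. The sketch only shows how a harness might decide when to pause and ask a human before carrying out a model-proposed action.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Action:
    kind: str    # e.g. "click", "type", "submit" (hypothetical action kinds)
    target: str  # human-readable description of the UI element


# Hypothetical set of action kinds treated as risky in this sketch.
RISKY_KINDS = {"submit", "delete", "purchase"}


def needs_confirmation(action: Action, risk_tolerance: str) -> bool:
    """Decide whether a human must approve the action first."""
    if risk_tolerance == "high":   # trusted sandbox: never ask
        return False
    if risk_tolerance == "low":    # sensitive environment: always ask
        return True
    return action.kind in RISKY_KINDS  # "medium": ask only for risky kinds


def run_actions(
    actions: List[Action],
    risk_tolerance: str,
    confirm: Callable[[Action], bool],
) -> List[Action]:
    """Return the actions that would be executed under the given policy."""
    executed = []
    for action in actions:
        if needs_confirmation(action, risk_tolerance) and not confirm(action):
            continue  # the human declined, so skip this action
        # A real harness would dispatch mouse/keyboard input here.
        executed.append(action)
    return executed
```

For example, with a "medium" tolerance and a `confirm` callback that always declines, a plain click would still run while a form submission would be skipped.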
Benchmark data supports the substantial progress of this capability: In OSWorld-Verified, which tests desktop navigation ability, GPT-5.4 achieved a success rate of 75.0%, surpassing both GPT-5.2's 47.3% and the human benchmark of 72.4%. In WebArena-Verified, which evaluates browser control, the success rate was 67.3%, higher than GPT-5.2's 65.4%. In Online-Mind2Web, it achieved a 92.8% success rate based solely on screenshots.
In terms of web search capabilities, BrowseComp testing shows that GPT-5.4 improved by 17 percentage points over GPT-5.2, with GPT-5.4 Pro achieving a score of 89.3%, setting a new record for the highest score in this benchmark test.

Dod Fraser, CEO of Mainstay, a property technology company, stated that in tests covering approximately 30,000 property tax portals, GPT-5.4 achieved a first-attempt success rate of 95% and a 100% success rate within three attempts. This represents a significant improvement compared to previous computer control models (with success rates of about 73% to 79%), while also speeding up completion by about three times and reducing token consumption by approximately 70%.
Reconstruction of the tool search mechanism significantly reduces token consumption.
As the scale of the tool ecosystem expands, how to efficiently manage tool invocation has become a bottleneck for the deployment of agent systems. GPT-5.4 introduces a 'tool search' mechanism in its API, fundamentally changing the way tool definitions are transmitted.
Previously, the model needed to preload all tool definitions in the prompt for each request. In systems with a large number of tools, this would result in the additional consumption of thousands or even tens of thousands of tokens per request, increasing costs, adding latency, and diluting context. Under the new mechanism, the model only receives a lightweight list of tools and retrieves the full definition of a tool on-demand when it is actually needed.
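The difference between the two approaches can be sketched in a few lines of Python. This is an illustration of the general pattern only, not the actual request format of OpenAI's tool search feature: the registry contents and the function names `lightweight_listing`, `load_definition`, and `build_context` are hypothetical.

```python
from typing import Dict, List

# Hypothetical tool registry: full definitions are kept outside the prompt.
TOOL_REGISTRY: Dict[str, dict] = {
    "get_invoice": {
        "description": "Fetch an invoice by ID",
        "parameters": {"invoice_id": "string", "include_lines": "boolean"},
    },
    "send_email": {
        "description": "Send an email to a recipient",
        "parameters": {"to": "string", "subject": "string", "body": "string"},
    },
}


def lightweight_listing(registry: Dict[str, dict]) -> List[dict]:
    """Names and one-line descriptions only; no parameter schemas."""
    return [
        {"name": name, "description": spec["description"]}
        for name, spec in registry.items()
    ]


def load_definition(registry: Dict[str, dict], name: str) -> dict:
    """Fetch the full definition of one tool when it is actually needed."""
    return {"name": name, **registry[name]}


def build_context(registry: Dict[str, dict], needed: List[str]) -> dict:
    """Combine the short listing with full definitions for the needed tools."""
    return {
        "tools": lightweight_listing(registry),
        "loaded": [load_definition(registry, name) for name in needed],
    }
```

In this sketch, only the tools named in `needed` contribute their full parameter schemas to the context; every other tool costs just a name and a short description.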
OpenAI provided specific data to support the effectiveness: In Scale’s MCP Atlas benchmark test involving 250 tasks, under a configuration enabling all 36 MCP servers, the tool search mode reduced total token usage by 47% compared to exposing all MCP functionalities directly in the context while maintaining the same accuracy.

Wade, CEO of Zapier, stated that GPT-5.4 performed exceptionally well in the company’s tool-use benchmark tests across hundreds of advanced real-world workflows, calling it 'the most sustainable model to date.'
Financial and enterprise scenarios: Deep integration with Excel doubles performance in investment banking tasks.
Released alongside GPT-5.4 is the 'OpenAI Financial Services' suite for enterprises and financial institutions, with its core products being ChatGPT for Excel and Google Sheets (Beta) — ChatGPT will be directly embedded into spreadsheet cells, supporting the construction, analysis, and updating of complex financial models.
The suite also integrates data partners such as FactSet, MSCI, Third Bridge, and Moody's, and introduces reusable Skills functionality, covering high-frequency financial work scenarios like earnings previews, comparable company analysis, DCF valuation analysis, and investment memorandum drafting.
In OpenAI's internal investment banking benchmark, GPT-5.4 Thinking scored 88.0%, up from GPT-5's 43.7%; in tests simulating the spreadsheet modeling tasks of a junior investment banking analyst, GPT-5.4 averaged 87.3%, significantly higher than GPT-5.2's 68.4%.

Niko Grupen, Director of Applied Research at legal AI platform Harvey, stated that GPT-5.4 scored 91% in the company’s BigLaw Bench evaluation, 'currently outperforming other models in structured complex transaction analysis, maintaining accuracy across lengthy contracts, and providing the high level of detail required by legal practitioners.'
Knowledge Work and Hallucination Suppression: Fully Aligned with Professional Standards
OpenAI demonstrated the capability boundaries of GPT-5.4 across multiple benchmarks measuring real workplace output. The GDPval test covers knowledge work tasks across 44 professions, with real-world deliverables such as sales presentations, accounting tables, and manufacturing charts. In it, GPT-5.4 matched or exceeded industry professional levels in 83.0% of comparisons, up from GPT-5.2's 71.0%.

In presentation quality assessments, human evaluators preferred GPT-5.4’s outputs in 68.0% of cases, citing stronger visual aesthetics, richer visual diversity, and more effective use of image generation.
In terms of controlling hallucinations and factual errors, OpenAI stated that GPT-5.4 is 'the most factually accurate model we have developed to date': on a de-identified prompt dataset previously flagged by users for factual inaccuracies, GPT-5.4 reduced single-statement error rates by 33% compared to GPT-5.2, and lowered the probability of any errors appearing in complete responses by 18%.
In coding ability, GPT-5.4 performed on par with or better than GPT-5.3-Codex on SWE-Bench Pro, with lower latency across all reasoning intensity settings. Codex's /fast mode can boost token generation speed for GPT-5.4 by up to 1.5 times, using the same model and intelligence but optimized purely for speed. Mario Rodriguez, Chief Product Officer at GitHub, stated that GPT-5.4 excels in logical reasoning and executing complex multi-step tool-dependent workflows, 'a model that enterprises should adopt from day one.'
Two versions cater to different user needs, with context windows reaching up to 1 million tokens.
GPT-5.4 Thinking is designed for general professional scenarios requiring deep reasoning, while GPT-5.4 Pro is tailored for the most complex tasks with a focus on maximizing performance.
On the ChatGPT platform, GPT-5.4 Thinking will be accessible to Plus (monthly fee of 20 USD), Team, and Pro users starting this Thursday. It replaces GPT-5.2 Thinking, which will be officially retired three months later, on June 5, 2026.
GPT-5.4 Pro is exclusively available to Pro (monthly fee of 200 USD) and Enterprise plan users. Free users may also have limited access to GPT-5.4 when the system automatically routes them. Business and education plan users can enable access in advance through administrator settings.
On the API side, GPT-5.4 is provided under the identifier gpt-5.4, and GPT-5.4 Pro is provided as gpt-5.4-pro, both of which are available on the Codex development platform. The API's maximum output is 128,000 tokens, consistent with prior models. The API and Codex simultaneously support a context window of up to 1 million tokens, the largest context capacity OpenAI has offered to date, suitable for planning, executing, and validating long-chain tasks across multiple steps.
Pricing is higher than the previous generation, with efficiency improvements partially offsetting the increased costs.
In terms of API pricing, the cost of GPT-5.4 has increased compared to GPT-5.2. The details are as follows:
GPT-5.4: Input at 2.50 USD per million tokens, output at 15 USD per million tokens (GPT-5.2 pricing was input at 1.75 USD per million tokens, output at 14 USD per million tokens).
GPT-5.4 Pro: Input at 30 USD per million tokens, output at 180 USD per million tokens (GPT-5.2 Pro was input at 21 USD per million tokens, output at 168 USD per million tokens).
Batch and Flex pricing offers a 50% discount, while Priority processing is billed at twice the standard rate.

Notably, when a single input exceeds 272,000 tokens, the excess portion will be charged at double the standard rate. In Codex, the default compression limit is set at 272,000 tokens, but developers can manually increase the limit to handle larger prompts, triggering higher charges only for the excess portion.
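As a rough worked example of the pricing described above, the following Python sketch estimates the cost of a single standard-rate GPT-5.4 request. It assumes, as reported here, that only the input tokens beyond 272,000 are billed at double the input rate and that output tokens are billed at the standard rate throughout. The function name `estimate_cost` and its parameters are hypothetical, and Batch, Flex, and Priority multipliers are deliberately left out because the article does not say how they combine with the long-input surcharge.

```python
def estimate_cost(
    input_tokens: int,
    output_tokens: int,
    input_price: float = 2.50,    # USD per million input tokens (GPT-5.4)
    output_price: float = 15.00,  # USD per million output tokens (GPT-5.4)
    threshold: int = 272_000,     # tokens billed at the standard input rate
) -> float:
    """Estimate the USD cost of one request at standard rates."""
    base = min(input_tokens, threshold)
    excess = max(input_tokens - threshold, 0)
    # Assumption from the article: only the excess portion is doubled.
    input_cost = (base * input_price + excess * input_price * 2) / 1_000_000
    output_cost = output_tokens * output_price / 1_000_000
    return input_cost + output_cost
```

Under these assumptions, a request with 100,000 input tokens and 10,000 output tokens comes to about 0.40 USD, while an input of 372,000 tokens alone comes to about 1.18 USD (0.68 USD for the first 272,000 tokens plus 0.50 USD for the 100,000 excess tokens).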
OpenAI provided three explanations for the higher pricing: first, its superior capabilities in complex tasks such as programming, computer control, in-depth research, advanced document generation, and tool invocation; second, significant technological advancements derived from its research roadmap; and third, a more efficient reasoning mechanism that consumes fewer inference tokens for the same tasks, partially offsetting the impact of the increased unit price. OpenAI also stated that even after the price increase, the cost of GPT-5.4 remains lower than that of competing cutting-edge models with equivalent capabilities.
Editor/Stephen