OpenAI Introduces GPT-5.4 for Enterprise Knowledge Work and AI Agents

OpenAI launches GPT‑5.4, a new AI model designed to improve professional knowledge work, coding, and agent-based automation across tools and systems.

Manisha Sharma

OpenAI has introduced GPT-5.4, the latest version of its large language model, designed to support professional knowledge work, software development and AI agent workflows. The model is being made available through ChatGPT, the OpenAI API, and Codex, alongside a higher-performance variant called GPT-5.4 Pro aimed at complex workloads.

The release reflects a broader shift in generative AI development towards models capable of performing multi-step tasks across tools, software systems and enterprise workflows. GPT-5.4 integrates improvements in reasoning, coding and visual understanding while introducing native computer-use capabilities intended to help AI systems interact directly with applications and digital interfaces.

According to OpenAI, the new model combines capabilities from earlier systems such as GPT-5.2 and GPT-5.3 Codex, with a focus on improving accuracy, tool usage and performance in real-world professional tasks.

Improved performance for knowledge work

One of the primary areas of improvement in GPT-5.4 is its ability to handle structured professional tasks such as spreadsheets, documents and presentations.

On GDPval, a benchmark designed to measure how effectively AI systems perform knowledge work across 44 occupations, GPT-5.4 achieved results that matched or exceeded industry professionals in 83% of comparisons. Earlier models such as GPT-5.2 achieved around 70.9% on the same benchmark.

These tasks typically include outputs such as financial models, operational schedules, analytical reports or presentations. OpenAI says the goal of these improvements is to make AI systems more useful for professionals who rely on structured deliverables rather than conversational responses.

Internal testing also suggests improvements in spreadsheet modelling tasks similar to those performed by junior financial analysts. GPT-5.4 recorded an 87.3% score, compared with 68.4% for GPT-5.2.

Human reviewers also preferred presentations produced by GPT-5.4 in 68% of cases, citing stronger visual structure and clearer formatting.

The model also aims to reduce factual inaccuracies. OpenAI reports that in prompts where users previously flagged incorrect information, GPT-5.4’s responses were 18% less likely to contain errors, while individual claims were 33% less likely to be false compared with GPT-5.2.

Native computer-use capabilities

A notable addition in GPT-5.4 is support for computer-use workflows, enabling AI agents to interact with software environments through screenshots, mouse actions and keyboard inputs.

This capability allows models to navigate user interfaces, complete browser tasks or automate workflows across multiple applications. For developers building autonomous agents, the feature represents a step toward systems that can carry out tasks traditionally handled through manual software interaction.
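The interaction pattern described above can be sketched as an observe-act loop. Everything below is illustrative: the mock desktop, the `decide()` policy and the `run_agent()` driver are stand-ins for what would, in a real deployment, be pixel screenshots and model-chosen mouse/keyboard actions, not OpenAI's actual computer-use API.

```python
from dataclasses import dataclass

@dataclass
class MockDesktop:
    """Toy environment: one text field and one submit button."""
    field_text: str = ""
    submitted: bool = False
    focused: bool = False

    def screenshot(self) -> dict:
        # A real agent receives pixel screenshots; a dict stands in here.
        return {"field_text": self.field_text, "submitted": self.submitted,
                "focused": self.focused}

    def click(self, target: str) -> None:
        if target == "field":
            self.focused = True
        elif target == "submit":
            self.submitted = True

    def type_text(self, text: str) -> None:
        if self.focused:
            self.field_text += text

def decide(observation: dict, goal: str) -> tuple:
    """Scripted stand-in for the model choosing the next UI action."""
    if observation["submitted"]:
        return ("done",)
    if not observation["focused"]:
        return ("click", "field")
    if observation["field_text"] != goal:
        return ("type", goal)
    return ("click", "submit")

def run_agent(env: MockDesktop, goal: str, max_steps: int = 10) -> MockDesktop:
    # Observe -> decide -> act until the task completes or the budget runs out.
    for _ in range(max_steps):
        action = decide(env.screenshot(), goal)
        if action[0] == "done":
            break
        elif action[0] == "click":
            env.click(action[1])
        elif action[0] == "type":
            env.type_text(action[1])
    return env
```

Running `run_agent(MockDesktop(), "quarterly report")` clicks the field, types the text and submits the form in three steps, which is the shape of the loop regardless of how sophisticated the policy behind `decide()` is.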

Performance improvements were recorded in several benchmarks that test an AI model’s ability to control computer environments. On OSWorld-Verified, which measures how effectively a model can operate a desktop environment using screenshots and keyboard or mouse commands, GPT-5.4 achieved a 75% success rate, compared with 47.3% for GPT-5.2. This result slightly exceeds the benchmark’s recorded human performance of 72.4%.

Similarly, in browser interaction tests, GPT-5.4 achieved a 67.3% success rate on WebArena-Verified, which evaluates the ability of models to perform tasks on websites. These improvements are supported by stronger visual understanding. On MMMU-Pro, a benchmark for multimodal reasoning, GPT-5.4 reached 81.2% accuracy, compared with 79.5% for GPT-5.2.

The model also introduces support for higher-resolution image inputs, allowing it to process images up to 10.24 million pixels, which may improve document parsing and interface interpretation.
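A pixel budget like this implies a simple preprocessing step on the client side. The 10.24-million-pixel cap comes from the article; the aspect-ratio-preserving downscale below is a common approach and an assumption, not a documented OpenAI requirement.

```python
import math

MAX_PIXELS = 10_240_000  # 10.24 million pixels, per the stated input limit

def fit_to_pixel_budget(width: int, height: int,
                        budget: int = MAX_PIXELS) -> tuple:
    """Downscale dimensions, preserving aspect ratio, so width*height <= budget."""
    if width * height <= budget:
        return width, height
    # Scaling both sides by sqrt(budget/area) scales the area by exactly that ratio.
    scale = math.sqrt(budget / (width * height))
    return math.floor(width * scale), math.floor(height * scale)
```

For example, a 4000x3000 document scan (12 megapixels) exceeds the budget and would come out at roughly 3695x2771, just under the cap.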

Coding and developer workflows

GPT-5.4 also builds on the coding capabilities introduced with GPT-5.3 Codex. The model performs competitively on SWE-Bench Pro, a benchmark used to measure how well AI systems resolve real-world software engineering issues. OpenAI says the model has been designed for longer coding workflows in which agents write, execute and refine code with minimal manual input.

Developers using Codex can also enable a “fast mode”, which increases token generation speed by up to 1.5 times on the same model architecture. GPT-5.4 is aimed in particular at complex development workflows, such as debugging web applications or testing user interfaces. OpenAI has also introduced an experimental Codex feature called Playwright Interactive, which lets the system visually test applications as they are being developed.

In demonstrations shared by the company, the model was able to build browser-based simulations and then verify their functionality through automated browser interaction.

Tool integration and multi-step workflows

Another area of development in GPT-5.4 is its ability to work with external tools and APIs.

The new model introduces tool search, a mechanism that allows the AI system to dynamically retrieve tool definitions when needed rather than loading them all into the prompt at the start of a request. This approach reduces token usage and helps models operate within larger tool ecosystems.
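The mechanism can be illustrated with a toy registry. The tool catalogue, the keyword matcher and the whitespace-based token estimate below are all stand-ins: real systems would use a tokenizer and semantic retrieval, and nothing here reflects OpenAI's actual tool-search implementation.

```python
# Hypothetical tool catalogue; in practice these definitions can be long
# JSON schemas, which is why loading all of them into every prompt is costly.
TOOL_REGISTRY = {
    "send_email": "Send an email. Args: to (str), subject (str), body (str).",
    "query_db": "Run a read-only SQL query. Args: sql (str), limit (int).",
    "update_sheet": "Write values to a spreadsheet range. Args: range (str), values (list).",
    "create_ticket": "Open a support ticket. Args: title (str), priority (str).",
}

def estimate_tokens(text: str) -> int:
    return len(text.split())  # crude whitespace proxy for tokenizer counts

def search_tools(query: str, registry: dict = TOOL_REGISTRY) -> dict:
    """Return only definitions whose name or description matches the query."""
    words = set(query.lower().split())
    return {name: desc for name, desc in registry.items()
            if words & set(name.lower().replace("_", " ").split())
            or words & set(desc.lower().split())}

def prompt_cost(definitions: dict) -> int:
    """Approximate prompt tokens spent on the loaded tool definitions."""
    return sum(estimate_tokens(f"{n}: {d}") for n, d in definitions.items())
```

For the request "email the report", only `send_email` is retrieved, so the prompt carries one definition instead of four, which is the source of the token savings the benchmark measured.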

In testing using Scale’s MCP Atlas benchmark, OpenAI found that tool search reduced token consumption by 47% while maintaining similar task accuracy.

The model also improves agentic tool calling, allowing AI systems to decide when and how to use tools during reasoning. On the Toolathlon benchmark, which evaluates multi-step workflows involving APIs and external tools, GPT-5.4 completed tasks more accurately and with fewer interaction cycles compared with GPT-5.2.
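The control flow of agentic tool calling looks roughly like the loop below. The `mock_model` policy, the calculator tool and the message format are all hypothetical stand-ins for the model's actual decisions and a real API surface.

```python
def calculator(expr: str) -> str:
    # Toy tool; a real deployment would expose vetted enterprise APIs instead.
    return str(eval(expr, {"__builtins__": {}}))

TOOLS = {"calculator": calculator}

def mock_model(messages: list) -> dict:
    """Scripted stand-in: call a tool first, then answer from its result."""
    tool_results = [m for m in messages if m["role"] == "tool"]
    if not tool_results:
        return {"tool": "calculator", "args": "17 * 24"}
    return {"final": f"17 * 24 = {tool_results[-1]['content']}"}

def run(question: str, max_turns: int = 5) -> str:
    # The model decides each turn whether to call a tool or finish.
    messages = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        step = mock_model(messages)
        if "final" in step:
            return step["final"]
        output = TOOLS[step["tool"]](step["args"])
        messages.append({"role": "tool", "content": output})
    return "no answer"
```

Fewer interaction cycles, as measured on Toolathlon, corresponds to this loop terminating in fewer turns because each tool call is better chosen.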

These improvements are particularly relevant for enterprise AI systems that rely on integrations with email platforms, databases, spreadsheets or workflow automation tools.

Expanding web search and research capabilities

GPT-5.4 also improves persistent web research, an area where AI agents attempt to retrieve information across multiple sources. On BrowseComp, a benchmark designed to test how well AI systems find difficult-to-locate information online, GPT-5.4 achieved an 82.7% accuracy score, while the higher-performance GPT-5.4 Pro model reached 89.3%.

The benchmark measures how effectively a system can conduct iterative searches and synthesise information from multiple webpages. OpenAI says these improvements help the model maintain context during longer reasoning tasks that require multiple rounds of information retrieval.
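The iterative search-and-refine pattern can be sketched with a two-page mock corpus. The corpus contents, the title matcher and the refinement rule are invented for illustration; a real agent issues live web searches and lets the model decide how to refine the query.

```python
# Hypothetical two-page "web": the first page contains a lead that points
# at the second, where the target fact actually lives.
CORPUS = {
    "widgetco history": "Founded by A. Lee; the funding page lists investors.",
    "widgetco funding": "Series B closed. answer: 40 million",
}

def search(query: str) -> str:
    """Toy search: return the first page whose title contains every query word."""
    for title, text in CORPUS.items():
        if all(word in title for word in query.split()):
            return text
    return ""

def research(initial_query: str, max_rounds: int = 4) -> str:
    query = initial_query
    for _ in range(max_rounds):
        page = search(query)
        if "answer:" in page:                 # target fact located
            return page.split("answer:", 1)[1].strip()
        if "funding" in page:                 # refine the query from a lead on the page
            query = "widgetco funding"
    return "not found"
```

Starting from `research("widgetco history")`, the first page fails to answer but supplies a lead, the query is refined, and the second round retrieves the fact — the multi-round retrieval behaviour that BrowseComp is designed to stress.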

Implications for enterprise AI

The release of GPT-5.4 highlights an ongoing shift in the generative AI landscape from conversational tools towards systems designed for operational tasks. Earlier generative AI models focused largely on text generation or summarisation. Newer models are increasingly built to support multi-step workflows, interact with software environments and integrate with enterprise tools.

For developers and enterprises, this trend could enable AI systems to automate routine tasks such as document preparation, data processing, software testing or research workflows. However, practical deployment will depend on how effectively organisations integrate these capabilities into existing applications and business processes.

OpenAI says GPT-5.4 is now available through ChatGPT, its API platform and Codex, with the GPT-5.4 Pro variant aimed at users requiring higher-performance reasoning for complex workloads.