OpenAI has announced the release of a new series of artificial intelligence models tailored for coding, as part of its strategy to counter rising competition from companies like Google and Anthropic. These models are available to developers through OpenAI’s application programming interface (API).
The new lineup includes three models: GPT-4.1, GPT-4.1 Mini, and GPT-4.1 Nano. Kevin Weil, OpenAI’s chief product officer, said during a livestream that the models surpass the capabilities of GPT-4o, previously OpenAI’s most popular model, and even exceed the more robust GPT-4.5 in certain respects.
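Because the models are exposed through the standard API, trying one is a matter of a single request. The sketch below is illustrative, assuming the official `openai` Python library and an `OPENAI_API_KEY` environment variable; the prompt text is invented for the example, while the model names match the lineup above.

```python
import os

# Request payload in the shape of the Chat Completions endpoint.
# Swap "gpt-4.1" for "gpt-4.1-mini" or "gpt-4.1-nano" to use the
# smaller models in the lineup.
payload = {
    "model": "gpt-4.1",
    "messages": [
        {"role": "system", "content": "You are a careful coding assistant."},
        {"role": "user", "content": "Write a Python function that reverses a linked list."},
    ],
}

# Only make the network call when an API key is actually configured.
if os.environ.get("OPENAI_API_KEY"):
    from openai import OpenAI  # pip install openai

    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.chat.completions.create(**payload)
    print(response.choices[0].message.content)
else:
    print("Set OPENAI_API_KEY to send this request.")
```

The guard around the network call means the snippet degrades gracefully when no credentials are present, which keeps it safe to paste and run.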
GPT-4.1 scored 55 percent on SWE-Bench, a benchmark widely used to assess coding models, several points higher than OpenAI’s other models. Weil highlighted that the new models excel at coding and at following complex instructions, and are well suited to building AI agents.
The ability of AI models to generate and modify code has seen notable advancements recently, providing more streamlined methods for software prototyping and enhancing AI agents’ functionalities. Competitors such as Anthropic and Google have also launched models with proficient coding capabilities.
Rumors of GPT-4.1’s release had circulated for weeks. OpenAI reportedly tested the model on various leaderboards under the alias Quasar Alpha. Users who interacted with this “stealth” model shared impressive feedback about its coding capabilities, with one Reddit commenter noting that it resolved the incomplete code generation issues seen in other models.
The newly released models can process eight times as much code at once, sharpening their ability to improve and debug it. They also follow instructions more accurately, reducing the need for users to repeat commands to get the desired result. In demos, OpenAI showed GPT-4.1 building a range of applications, including a flashcard app for language learning.
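An eight-fold jump in how much code fits in a single request is easiest to appreciate in rough numbers. The back-of-the-envelope sketch below is illustrative only: it assumes GPT-4o’s documented 128,000-token context limit as the baseline, a common rule of thumb of about four characters per token, and an invented average line length.

```python
# Rule-of-thumb constants (assumptions, not measured values).
CHARS_PER_TOKEN = 4   # common heuristic for English text and code
AVG_LINE_LENGTH = 40  # assumed average characters per line of source

def lines_that_fit(window_tokens: int) -> int:
    """Estimate how many source lines fit in a given context window."""
    return (window_tokens * CHARS_PER_TOKEN) // AVG_LINE_LENGTH

old_window = 128_000          # GPT-4o's documented context limit
new_window = 8 * old_window   # eight times as much, per the claim above

print(f"old window: ~{lines_that_fit(old_window):,} lines of code")
print(f"new window: ~{lines_that_fit(new_window):,} lines of code")
```

Under these assumptions the old window holds on the order of ten thousand lines, and the new one holds eight times that, which is the difference between fitting a module and fitting a small repository.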
Michelle Pokrass, of OpenAI’s post-training team, said during Monday’s livestream that coding efficiency is a top priority for developers, and that OpenAI has been refining the models to produce code that actually works. Effort has been concentrated on supporting varied formats, exploring repositories, running unit tests, and ensuring code compiles correctly.
Furthermore, GPT-4.1 operates 40 percent faster than GPT-4o, OpenAI’s most frequently used model for developers, and the cost of processing input queries has been cut by 80 percent in the latest version.
During the livestream, Varun Mohan, CEO of Windsurf, a widely used AI coding tool, said that his company tested GPT-4.1 and found it to be “60 percent” more effective than GPT-4o on its benchmarks. Mohan added that GPT-4.1 exhibited significantly fewer degenerate behaviors, such as misreading files or editing unrelated ones.
Over recent years, OpenAI has capitalized on the widespread interest in ChatGPT, the chatbot it debuted in late 2022, building a business that sells access to advanced chatbots and AI models. In a TED interview last week, OpenAI CEO Sam Altman said the company has 500 million weekly active users and that engagement is growing rapidly.