OpenAI's AI Codes Itself: A Breakthrough in AI Agents and the Future of Software Development
AI coding tools have rapidly transformed software development, touching every stage of the process. Now these tools are also being used to improve themselves. OpenAI is using its own AI coding agent, Codex, to build and refine Codex itself. This recursive loop is a notable development for AI agents and could reshape how software is created. This article examines what Codex can do, how it has changed OpenAI’s internal workflow, and what it might mean for the future of coding.
The Evolution of Codex: From GPT-3 to Agentic Coding
OpenAI initially launched Codex as a research preview in May 2025, establishing it as a cloud-based software engineering agent capable of handling diverse tasks, including feature development, bug fixing, and generating pull requests. Codex operates within secure, sandboxed environments connected to a user’s code repository, enabling parallel task execution. Access to Codex is provided through ChatGPT’s web interface, a command-line interface (CLI), and IDE extensions for popular platforms like VS Code, Cursor, and Windsurf.
The name “Codex” has historical roots, originating from a 2021 OpenAI model based on GPT-3 that powered GitHub Copilot’s tab completion feature. According to Alexander Embiricos, product lead for Codex at OpenAI, the name is internally considered an abbreviation for “code execution.” The connection to the earlier model was intentional, recognizing its pivotal role in demonstrating the potential of AI in coding.
“For many people, that model powering GitHub Copilot was the first ‘wow’ moment for AI,” Embiricos stated. “It showed people the potential of what it can mean when AI is able to understand your context and what you’re trying to do and accelerate you in doing that.”
Codex vs. the Competition: A Dynamic Market
The current command-line version of Codex resembles Claude Code, the agentic coding tool Anthropic launched in February 2025. Embiricos acknowledged the competitive landscape but noted that OpenAI had been developing web-based Codex features internally before it released the CLI. “It’s a fun market to work in because there’s lots of great ideas being thrown around,” he said.
Despite the competition, Codex has seen rapid adoption. Usage among external developers grew 20-fold after OpenAI released the interactive CLI extension alongside GPT-5 in August 2025. The September 15 launch of GPT-5-Codex, a version of GPT-5 optimized for agentic coding, fueled further growth.
Internal Adoption and the Recursive Development Loop
Codex’s impact extends beyond external users. A large majority of OpenAI’s engineers now use the tool regularly in their daily work. Notably, they use the same open-source version that is available to the public rather than a separate internal build. “I really love this about our team,” Embiricos explained. “The version of Codex that we use is literally the open source repo. We don’t have a different repo that features go in.”
This creates a recursive development loop. Codex doesn’t just generate code; it is also involved in improving itself. Embiricos described cases where Codex processes user feedback to decide on next development steps and helps run its own training. “We have places where we’ll ask Codex to look at the feedback and then decide what to do,” he said. “Codex is writing a lot of the research harness for its own training runs, and we’re experimenting with having Codex monitoring its own training runs.” OpenAI employees can even assign tasks to Codex through project management tools like Linear, treating it as a full-fledged team member.
A Historical Parallel: From Manual Circuits to Software-Designed Chips
This recursive process echoes a long-standing trend in computing history. Early integrated circuits were laid out by hand, but once computers built from those chips could run electronic design automation (EDA) software, engineers could design circuits far more complex than anyone could manage manually. Today’s processors, containing billions of transistors, are a testament to the power of software-driven design. OpenAI’s use of Codex to build Codex follows a similar pattern: each iteration unlocks capabilities that contribute to the next generation.
Navigating the Language of AI: Avoiding Anthropomorphism
Describing what Codex *does* presents a unique challenge. While it’s natural to use human-like language when interacting with the system, it’s crucial to avoid anthropomorphizing AI models. Codex autonomously runs processes, addresses feedback, manages child processes, and produces code that ships in real-world products. OpenAI employees refer to it as a “teammate” and assign it tasks using the same tools they use for human colleagues. However, whether these actions constitute “decisions” or complex conditional logic within a neural network remains a subject of ongoing debate among computer scientists and philosophers.
What’s undeniable is the existence of a semi-autonomous feedback loop: Codex generates code under human guidance, that code becomes part of Codex, and the subsequent version produces different code as a result.
Accelerating Development with “AI Teammates”
A prime example of Codex’s internal impact is the Sora Android app, which Embiricos said the tool helped OpenAI build in a matter of weeks. “The Sora Android app was shipped by four engineers from scratch,” Embiricos told GearTech. “It took 18 days to build, and then we shipped it to the app store in 28 days total.” The team used Codex for architectural planning, generating sub-plans, and implementing components.
However, independent research on AI coding productivity has produced mixed results. A METR study published in July 2025 found that experienced open-source developers were actually 19 percent slower when using AI tools on complex, mature codebases, though the researchers acknowledged potential benefits for simpler projects.
Codex Integrated into the Workflow
Ed Bayes, a designer on the Codex team, highlighted how the tool has transformed his workflow. Codex now integrates with project management tools like Linear and communication platforms like Slack, allowing team members to directly assign coding tasks to the AI agent. “You can add Codex, and you can basically assign issues to Codex now,” Bayes said. “Codex is literally a teammate in your workspace.”
This integration allows for seamless feedback loops. When feedback is posted in a Slack channel, team members can tag Codex to address the issue. The agent then creates a pull request, which can be reviewed and iterated upon within the same thread. “It’s basically approximating this kind of coworker and showing up wherever you work,” Bayes said.
For Bayes, Codex has empowered him to contribute code directly, rather than relying solely on engineers to implement his designs. “It kind of gives you more leverage. It enables you to work across the stack and basically be able to do more things,” he said. Designers at OpenAI now prototype features by building them directly, using Codex to handle the implementation details.
The Future of Coding: A Junior Developer Today, a Senior Developer Tomorrow?
OpenAI views Codex as a “junior developer” with the potential to evolve into a “senior developer” over time. “If you were onboarding a junior developer, how would you onboard them? You give them a Slack account, you give them a Linear account,” Bayes said. “It’s not just this tool that you go to in the terminal, but it’s something that comes to you as well and sits within your team.”
Will this ultimately displace human developers? Embiricos distinguishes between “vibe coding,” where developers blindly accept AI-generated code, and “vibe engineering,” where humans remain actively involved. “We see a lot more vibe engineering in our code base,” he said. “You ask Codex to work on that, maybe you even ask for a plan first. Go back and forth, iterate on the plan, and then you’re in the loop with the model and carefully reviewing its code.”
He sees value in “vibe coding” for prototypes and throwaway tools, noting that it is up to the human to decide how closely to review the code.
Beyond LLMs: Agents as the Path Forward
Recent trends suggest a shift away from “monolithic” large language models (LLMs) like GPT-4.5 towards simulated reasoning models and agentic systems built from multiple AI models running in parallel. We asked Embiricos whether agents like Codex represent the most promising path for maximizing the utility of existing LLM technology.
He dismissed concerns about AI capabilities plateauing. “I think we’re very far from plateauing,” he said. “If you look at the velocity on the research team here, we’ve been shipping models almost every week or every other week.” He cited recent improvements where GPT-5-Codex reportedly completes tasks 30 percent faster than its predecessor at the same intelligence level. During testing, the model has demonstrated the ability to work independently for 24 hours on complex tasks.
A Killer App for LLMs?
OpenAI faces competition from Anthropic’s Claude Code, Google’s Gemini CLI, and startups like Cursor, which have built dedicated AI-powered IDEs. Given the challenges of using LLMs as factual resources, we wondered if coding has emerged as a “killer app” for these models, offering a clear business use case with less risk than applications like writing or emotional companionship.
“We have absolutely noticed that coding is both a place where agents are gonna get good really fast and there’s a lot of economic value,” Embiricos said. “We feel like it’s very mission-aligned to focus on Codex. We get to provide a lot of value to developers. Also, developers build things for other people, so we’re kind of intrinsically scaling through them.”
Will tools like Codex threaten software developer jobs? Bayes acknowledged concerns but stated that Codex hasn’t led to headcount reductions at OpenAI, and “there’s always a human in the loop because the human can actually read the code.” Both men envision a future where Codex serves as an amplifier of human potential, rather than a replacement for it.
The implications of agents like Codex extend far beyond OpenAI. Embiricos envisions a future where coding agents are accessible to individuals with no programming experience. “All humanity is not gonna open an IDE or even know what a terminal is,” he said. “We’re building a coding agent right now that’s just for software engineers, but we think of the shape of what we’re building as really something that will be useful to be a more general agent.”
This article was updated on December 12, 2025 at 6:50 PM to mention the METR study.