GPT-5: Another Step Towards AGI

Emily Mao

September 7

GPT-5, released in August 2025, is OpenAI’s most advanced model yet — faster, safer, and far more capable across coding, healthcare, and reasoning benchmarks. It introduces adaptive intelligence, automatically balancing quick responses with deeper problem-solving, and delivers major gains in accuracy and reduced hallucinations. However, despite its technical brilliance and customizable personalities, many users feel it lacks the warmth and human-like presence of GPT-4o, highlighting the tension between ever-greater capability and the emotional connection people expect from AI.

Introduction

OpenAI released GPT-5 in August 2025, calling it their “smartest, fastest, most useful model yet.” The launch was billed as a step closer to artificial general intelligence (AGI) — a system that could one day outperform humans at most economically valuable work. GPT-5 is built as a unified system — one model that can route between fast, lightweight reasoning and deeper, more thoughtful problem-solving.

It’s a powerful upgrade in coding, healthcare, and reasoning benchmarks, and for some, it feels like a reliable, professional assistant. For others, it feels less human than GPT-4o, a downgrade in personality even as its technical skills excel.

One Unified System

Unlike earlier generations where users had to choose between different models, GPT-5 routes queries automatically. If a prompt is simple, it answers quickly using a more efficient system. If the problem is complex, it switches into a deeper reasoning mode that spends more time “thinking.” This real-time router means people don’t have to guess which model to use — the system adapts on its own.

This adaptability is part of what OpenAI calls “adaptive intelligence.” GPT-5 responds quickly when speed matters but can pause, reflect, and reason more thoroughly when a task demands it. The purpose of this design was to bring AI one step closer to feeling like a flexible, general intelligence rather than a set of disconnected tools.
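
To make the routing idea concrete, here is a minimal sketch of how a router of this kind could work. It is purely illustrative: OpenAI has not published its routing criteria, so the cue words, the word-count threshold, and the names `route_prompt` and `Route` are all assumptions made up for this example.

```python
from dataclasses import dataclass

# Hypothetical cue words; the real router's signals are not public.
COMPLEX_CUES = ("prove", "debug", "step by step", "optimize", "derive")


@dataclass
class Route:
    mode: str    # "fast" or "thinking"
    reason: str  # short explanation of why this mode was chosen


def route_prompt(prompt: str, max_fast_words: int = 40) -> Route:
    """Pick a response mode from simple surface features of the prompt.

    This is a toy heuristic, not OpenAI's method: it sends a prompt to the
    deeper mode if it contains a complexity cue or is longer than
    max_fast_words, and to the fast mode otherwise.
    """
    text = prompt.lower()
    for cue in COMPLEX_CUES:
        if cue in text:
            return Route("thinking", f"contains cue '{cue}'")
    if len(text.split()) > max_fast_words:
        return Route("thinking", "long prompt")
    return Route("fast", "short prompt with no complexity cues")


if __name__ == "__main__":
    for p in ("What is the capital of France?",
              "Debug this failing test across the repo"):
        print(p, "->", route_prompt(p))
```

A production router would presumably rely on learned signals rather than keyword matching, but the shape is the same: inspect the request, then choose between a cheap path and an expensive one.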

Excelling Across Domains

GPT-5 shines in three main areas: coding, creative expression, and health.

In coding, it’s not just about revising code and fixing syntax errors anymore. GPT-5 can generate complex front-end code, debug across large repositories, and even produce design-aware outputs. Its benchmark scores back this up: it hits 74.9% on SWE-bench Verified and 88% on Aider Polyglot, which makes it one of the strongest coding assistants available today.

GPT-5 Coded Game

Prompt for the game above: Create a single-page app in a single HTML file with the following requirements:
- Name: Jumping Ball Runner
- Goal: Jump over obstacles to survive as long as possible.
- Features: Increasing speed, high score tracking, retry button, and funny sounds for actions and events.
- The UI should be colorful, with parallax scrolling backgrounds.
- The characters should look cartoonish and be fun to watch.
- The game should be enjoyable for everyone.

In creative writing, OpenAI says GPT-5 delivers “compelling, resonant writing with literary depth and rhythm.” One of the examples included in the model showcase was a poem about a widow in Kyoto finding her late husband’s socks scattered in unexpected places. Compared to GPT-4o’s version, GPT-5’s poem carried a haunting rhythm, filled with imagery that leaned toward literary subtlety. This suggests that despite complaints of coldness, the model can still produce striking creative work when guided well.

Healthcare is another area where GPT-5 has pushed forward. In benchmarks like HealthBench, it not only answered clinical questions correctly but also flagged potential concerns, asked follow-up questions, and adapted advice to user context. For example, when designing a six-week return-to-play plan for a high school pitcher with a mild UCL strain, GPT-5 gave more tailored, practical guidance than earlier models.

Spot the Model in Poetry

To get a sense of the differences yourself, here are two poems, both written from a prompt about a widow in Kyoto who keeps finding her late husband’s socks in strange places. One is from GPT-4o, the other from GPT-5. Try to identify which poem corresponds to which model.

Poem A
In the tea tin, a single sock,
smelling faintly of camphor and rain.

Another waits by the geta rack,
curled like a sleeping koi.

She lifts the futon — two more, folded
into the quiet where his feet once warmed her.

Kyoto’s bell rolls evening down the hill.
She hangs them on the bamboo pole,
black flags of a country that no longer exists,

and counts them as if counting blessings,
or breaths, or steps to a door
that opens only when you stop knocking.

Poem B
In the cupboard with the rice bowls — one,
folded neat, like he’d just stepped out.
Another behind the garden broom,
dusted with plum blossoms and doubt.

On the tatami, soft and alone,
a heel turned toward the temple bell.
One in the kotatsu, warm as breath —
she weeps and doesn’t tell.

The washer’s empty. Always is.
But socks keep surfacing, sly and kind.
She never asks how they return.
She only hopes he doesn’t mind.

Now ask yourself: which poem feels warmer and more emotional, and which feels more deliberate? This small exercise mirrors the larger public debate. GPT-5 can be technically brilliant, but whether it feels human depends on what readers value in writing. (Poem A is GPT-5.)

Benchmarks and Performance

GPT-5’s technical performance shows just how far it has come:

  • SWE-bench Verified (coding): 74.9%
  • Aider Polyglot (multi-language coding): 88%
  • Medical reasoning (MedXpertQA MM): +29.6% improvement over GPT-4o
  • Humanity’s Last Exam (complex reasoning): 42% (Pro variant)
  • Math and Science: Scores higher on competitions like AIME 2025 and HMMT, and on GPQA Diamond (PhD-level science questions)

Comparison of coding across GPT models

Across nearly every benchmark, GPT-5 reduces errors, hallucinates less, and delivers more accurate answers than its predecessors.

Improvements in Safety and Reliability

One of GPT-5’s biggest improvements is in how it handles safety. Earlier versions of ChatGPT relied heavily on outright refusals when asked about sensitive topics. GPT-5 introduces Safe Completions — a system that tries to give the most helpful partial answer possible while staying within safety boundaries. This makes it more flexible in “dual-use” areas like biology, where information can be both benign and dangerous.

The model is also far less likely to hallucinate. OpenAI reports GPT-5 is about 45% less likely to produce a factual error than GPT-4o, and nearly 80% less likely than the o3 model. Its “thinking” variant has about six times fewer hallucinations than o3. Sycophancy — the habit of always agreeing with the user — was cut down from 14.5% in GPT-4o to under 6% in GPT-5.
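
These figures mix absolute rates and relative reductions, so a quick calculation helps put them on one scale. The sketch below uses only the numbers quoted above. Treating "under 6%" as exactly 6.0 (which gives a lower bound on the reduction) and reading "six times fewer" as one sixth of the original rate are assumptions made for this illustration, and the helper names are hypothetical.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a rate dropped, e.g. 0.5 means it was cut in half."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (before - after) / before


def fold_to_reduction(fold: float) -> float:
    """Convert an 'N times fewer' claim into a relative reduction.

    Assumes 'N times fewer' means the new rate is 1/N of the old one.
    """
    if fold <= 0:
        raise ValueError("fold must be positive")
    return 1 - 1 / fold


if __name__ == "__main__":
    # Sycophancy: 14.5% in GPT-4o, "under 6%" in GPT-5 (6.0 used as upper bound).
    print(f"Sycophancy reduction: at least {relative_reduction(14.5, 6.0):.1%}")
    # "About six times fewer hallucinations than o3" for the thinking variant.
    print(f"Six-fold reduction:   about {fold_to_reduction(6):.1%}")
```

On this reading, the sycophancy drop is a relative reduction of at least roughly 59%, and "six times fewer" corresponds to about an 83% reduction, which puts it in the same range as the "nearly 80%" factual-error figure.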

Customization and Personality

OpenAI has given GPT-5 more room for personality. Users can now choose between presets like “cynic,” “robot,” “listener,” and “nerd.” These were designed first for text chat but also extend into voice interactions. Alongside this, GPT-5 integrates more closely with apps like Gmail and Google Calendar, turning it into a productivity hub. And for those who use ChatGPT Voice, the older standard voice modes are being retired in favor of a more natural conversation style.

Societal Impact

The impact of GPT-5 on society is two-sided. On one hand, it has clear benefits. Developers can code faster, healthcare providers can get decision support, and businesses can streamline workflows. Students can study more effectively with its math and science reasoning. For many, it’s a productivity multiplier.

On the other hand, these same strengths create real risks. Routine programming tasks and customer support roles could be increasingly automated, displacing entry-level jobs. Writers worry about losing work to an AI that can produce “literary depth and rhythm” on demand. Even with Safe Completions, critics point out that GPT-5’s flexibility could blur boundaries in areas like virology or cybersecurity.

And then there is the question of emotional reliance. As CEO Sam Altman admitted, some users are already using AI in “self-destructive ways.” GPT-5’s technical brilliance doesn’t solve that — if anything, it deepens the need for careful consideration of how society engages with increasingly human-like systems.

Public Reception

Despite its strengths, GPT-5’s launch has been controversial. Many users have said the model feels more robotic and less warm than GPT-4o. Creative professionals in particular criticized its poetic and narrative writing, describing it as technically polished but emotionally flat. On Reddit, one commenter summed it up: “It feels like GPT-4o on a bad day.”

The backlash was strongest when OpenAI tried to replace GPT-4o entirely. Some users had formed attachments to the older model, describing it as more personable and comforting. The decision to remove it by default led to such an outcry that OpenAI restored GPT-4o access for Plus users.

This episode revealed something important: AI is no longer just a tool. For many, it has become part of their lives. Losing GPT-4o felt, to some, like losing a companion. That dependency raises new ethical questions about what role AI should play in people’s personal well-being.

Conclusion

GPT-5 is a clear step forward in technical performance. It reasons more accurately, hallucinates less, adapts more fluidly, and can handle larger and more complex tasks than any of its predecessors. Its safety improvements and customizable personalities make it a more versatile assistant than ever.

But the reaction to its launch shows that benchmarks are not the whole story. For many, GPT-4o felt warmer, more engaging, and more personal. GPT-5 may be smarter, but it also feels more like a machine.

This tension — between capability and connection — is likely to define the next stage of AI development. GPT-5 proves that progress is not just about raw intelligence. It’s also about the human qualities people expect from the tools they spend their days talking to.
