ASAPAi Soon As Possible · AI & tech, delivered fastest
Article

OpenAI Strengthens ChatGPT's "Health Intelligence" — 28% Gain on HealthBench

AASAP
2026-06-19 · 2 min read

OpenAI announced in June 2026 that it has sharpened ChatGPT's ability to answer health questions. The headline is a 28% performance gain on HealthBench, an evaluation standard built with more than 260 physicians — a bigger leap than the jump from GPT-3.5 Turbo to GPT-4o. Because trust in medicine is inseparable from safety, the company emphasized safeguard design as much as raw performance.

What Changed

OpenAI said it has improved both the accuracy and the safety of ChatGPT's health-related answers. HealthBench Professional, unveiled in April 2026 by co-founder Greg Brockman, evaluates AI on real clinical tasks such as symptom assessment and treatment recommendations.

The size of the gain is striking. Over recent months, OpenAI's frontier models improved 28% on HealthBench — a larger leap than the gap between GPT-4o and GPT-3.5 Turbo in August 2024.

What HealthBench Is

HealthBench is an evaluation framework that scores AI health answers from the perspective of medical professionals. More than 260 physicians built it together with OpenAI.

The scoring criteria are the crux. It evaluates the safety and clarity of an answer, along with whether the model recommends follow-up care with a clinician when appropriate. In other words, it measures not just raw accuracy but whether the AI "guides safely."

The Connection to ChatGPT Health

Launched in January 2026, ChatGPT Health integrates a user's medical records and wellness apps. By connecting services like Apple Health and MyFitnessPal, it helps explain test results, prepare for appointments, build workout routines, and compare insurance plans.

Security was strengthened separately. Health conversations are covered by dedicated encryption and isolation, keeping them protected and partitioned from ordinary chats.

Limits and Caveats

Even with the performance gains, responsibility for medical advice still rests with people. OpenAI stressed that it designed the model to recommend follow-up care — a reflection of the premise that AI does not replace doctors.

What matters for users is that an "AI answer is not a confirmed diagnosis." Health information should be treated as reference; diagnosis and prescriptions require confirmation from a professional.

Why It Matters

The announcement shows that AI's next battleground is the evaluation standard itself. As with HealthBench, built by 260 physicians, the more trust matters in a domain, the more what you score — and how — matters as much as raw performance.

The lesson holds for using AI in high-stakes fields like medicine, law, and finance. Making sources and limits explicit, and keeping the final judgment in human hands, is what builds trust.


References: OpenAI — Introducing ChatGPT Health · OpenAI — HealthBench · Healthcare Dive

← All posts