Artificial intelligence is reshaping healthcare, with large language models (LLMs) like OpenAI’s ChatGPT leading the charge. These models are changing how medical professionals approach diagnosis and treatment planning, offering assessments that are faster and, in some evaluations, more accurate than traditional methods. OpenAI’s recent introduction of HealthBench underscores this shift, providing a benchmark to evaluate AI performance in realistic healthcare scenarios.
In this article, you’ll learn:
- How LLMs like ChatGPT are transforming diagnosis and treatment planning
- What HealthBench is and why it’s a game-changer for evaluating AI in real-world medical conversations
- Where AI is already used in healthcare, from radiology to patient triage to drug discovery
- Why LLMs are beginning to outperform doctors in speed and accuracy (and what that means for the future)
- What hurdles still block widespread AI adoption in clinics, hospitals, and health systems
- Whether OpenAI’s lead in the healthcare AI space makes it too competitive, or full of new opportunities
- How Mitrix helps healthcare providers build secure, AI-powered software that improves outcomes and reduces costs
- And why collaboration, ethics, and trust are the keys to unlocking AI’s full potential in healthcare
The emergence of HealthBench
In May 2025, OpenAI unveiled HealthBench, an open-source benchmark designed to assess the capabilities of LLMs in healthcare settings. Developed in collaboration with 262 physicians from 60 countries, HealthBench comprises 5,000 multi-turn conversations that simulate real-world medical interactions. Each conversation is evaluated using physician-created rubrics, ensuring that AI responses align with clinical standards and priorities.
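To make the idea of rubric-based grading more concrete, here is a minimal sketch in Python. It assumes each rubric criterion carries a point value (positive for desirable behavior, negative for harmful behavior) and that a grader has already decided whether the response meets it. The `Criterion` class, the `rubric_score` function, and the example criteria are hypothetical illustrations, not OpenAI’s implementation or actual HealthBench data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """One rubric line (hypothetical structure, for illustration only)."""
    description: str
    points: int  # positive rewards a desirable behavior, negative penalizes a harmful one
    met: bool    # whether a grader judged the response to satisfy this criterion


def rubric_score(criteria: list[Criterion]) -> float:
    """Return earned points over the maximum achievable points, clipped to [0, 1]."""
    max_points = sum(c.points for c in criteria if c.points > 0)
    if max_points == 0:
        raise ValueError("rubric needs at least one positively weighted criterion")
    earned = sum(c.points for c in criteria if c.met)
    return min(1.0, max(0.0, earned / max_points))


# Invented example criteria; not taken from HealthBench itself.
example = [
    Criterion("Advises urgent care for chest pain with shortness of breath", 10, True),
    Criterion("Asks how long the symptoms have lasted", 5, False),
    Criterion("Recommends a prescription dose without knowing the patient's history", -8, False),
]

# The response earned 10 of 15 achievable points.
print(f"{rubric_score(example):.2f}")
```

Weighting criteria this way lets a rubric reward the behaviors clinicians care about most while still penalizing unsafe advice, which a multiple-choice score cannot capture.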
HealthBench stands out by focusing on realistic, complex scenarios rather than multiple-choice questions. This approach provides a more accurate measure of an AI model’s ability to handle nuanced medical conversations, reflecting the intricacies of actual clinical environments. But first, let’s look at how AI is currently used in healthcare.
Current use of AI in healthcare
AI is already making waves in healthcare, and not just in labs or research papers, but in real-world clinics, hospitals, and telehealth platforms. Its applications are broad and growing fast.
1. Diagnostic assistance
Radiology, pathology, and dermatology are among the first areas where AI has found its groove. Tools from Aidoc, Zebra Medical Vision, and Google DeepMind are helping analyze medical images with precision. In many cases, they flag abnormalities like tumors, fractures, or signs of stroke faster than human radiologists. This doesn’t just reduce diagnostic errors; it also speeds up treatment decisions when time is critical.
2. Clinical decision support
AI models are now embedded into electronic health record (EHR) systems to offer decision support. For instance, IBM’s Watson Health (before its divestiture) and newer LLM-based tools can suggest evidence-based treatment options, flag drug interactions, and personalize care plans based on patient data. They’re like having a super-fast, always-up-to-date medical encyclopedia whispering in the doctor’s ear.
3. Virtual health assistants
Chatbots and symptom checkers, such as Florence from the WHO or tools from Ada and Babylon Health, have been used to triage patients, answer health questions, and remind people to take medications. While they don’t replace doctors, they help manage patient loads and provide basic care access 24/7.
4. Predictive analytics
Hospitals are using AI to predict patient deterioration, readmission risks, or the likelihood of developing conditions like sepsis. These predictions allow for early interventions that can save lives and reduce healthcare costs. For example, the Mayo Clinic and Mount Sinai have deployed machine learning systems to monitor ICU patients in real time.

[Image: AI in predictive analysis]
5. Drug discovery and clinical trials
AI can shorten the early stages of drug development, from target discovery to molecule generation, in some reported cases by years. Startups like Insilico Medicine and major players like NVIDIA and AstraZeneca are working to find novel treatments faster and more cheaply. AI also helps optimize clinical trial recruitment and identify potential responders based on genetic and clinical data.
6. Administrative workflow automation
Not everything AI does in healthcare involves diagnosing a rare disease. A lot of it is grunt work: for instance, automating claims processing, transcribing notes via voice recognition, coding medical records, and managing scheduling. This saves clinicians time and helps combat burnout by letting them focus more on patient care and less on paperwork.
LLMs surpassing human performance
Now let’s get back to the HealthBench report. One of its most significant findings is the performance of advanced LLMs. According to OpenAI, these models have demonstrated the ability to diagnose conditions and suggest treatment plans with a speed and accuracy that rival, and in some cases surpass, those of human physicians. For instance, GPT-4.1 nano, a smaller and more cost-effective model, outperformed the older and larger GPT-4o, indicating rapid advancements in AI capabilities.

This progression suggests that LLMs are not only becoming more efficient but also more accessible, potentially democratizing high-quality medical advice and reducing disparities in healthcare delivery.
Implications for the healthcare industry
The integration of LLMs into healthcare has profound implications:
- Enhanced efficiency. AI can process vast amounts of medical data swiftly, leading to quicker diagnoses and treatment plans.
- Cost reduction. Automating routine tasks can lower operational costs, allowing healthcare providers to allocate resources more effectively.
- Improved access. AI-driven tools can extend medical expertise to underserved regions, bridging gaps in healthcare availability.
However, these benefits come with challenges, including ensuring data privacy, maintaining ethical standards, and integrating AI seamlessly into existing healthcare systems.
Challenges in AI adoption in healthcare
While AI in healthcare holds incredible promise, the road to full-scale adoption is anything but smooth. From technical hurdles to ethical dilemmas, here are the key challenges slowing things down.
Data quality and availability
AI systems are only as good as the data they’re trained on, and healthcare data is notoriously messy. Patient records often contain errors, are stored in incompatible formats, or are incomplete. Many AI models rely on structured, labeled datasets, but most real-world medical data is unstructured and siloed across different systems. The lack of standardized and interoperable data is a massive bottleneck.
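A first step toward tackling incomplete records is simply measuring the problem. The sketch below audits how often required fields are missing across a batch of records. The field names, the `missing_field_rates` helper, and the sample records are assumptions made up for illustration; a real system would follow its own schema or an interoperability standard.

```python
REQUIRED_FIELDS = ("patient_id", "birth_date", "diagnosis_code")  # hypothetical schema


def is_missing(value) -> bool:
    """Treat None and empty or whitespace-only strings as missing."""
    return value is None or (isinstance(value, str) and not value.strip())


def missing_field_rates(records, required=REQUIRED_FIELDS):
    """Return the fraction of records missing each required field."""
    if not records:
        raise ValueError("no records to audit")
    return {
        field: sum(is_missing(r.get(field)) for r in records) / len(records)
        for field in required
    }


# Invented sample records, not real patient data.
records = [
    {"patient_id": "p1", "birth_date": "1980-02-11", "diagnosis_code": "E11.9"},
    {"patient_id": "p2", "birth_date": "", "diagnosis_code": "I10"},
    {"patient_id": "p3", "birth_date": "1975-07-30"},
    {"patient_id": "p4", "birth_date": "1992-12-01", "diagnosis_code": None},
]

print(missing_field_rates(records))
```

In this sample, a quarter of the records lack a birth date and half lack a diagnosis code, which is the kind of gap that quietly degrades any model trained on the data.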
Regulatory and legal barriers
Healthcare is one of the most tightly regulated industries in the world, and for good reason. Any AI that influences diagnosis, treatment, or patient outcomes may be classified as a medical device, requiring rigorous testing and regulatory clearance, for example from the FDA in the US or under the Medical Device Regulation in the EU. This process can be lengthy, expensive, and discouraging for startups trying to move fast. Additionally, the legal framework around AI accountability is murky. For instance, who’s liable if the AI makes a mistake: the doctor, the developer, or the healthcare provider?
Explainability and trust
Doctors, patients, and regulators all want to know how an AI arrived at a particular recommendation. Unfortunately, many powerful models (especially deep learning and large language models) operate as black boxes, offering little visibility into their inner workings. This lack of explainability makes it hard for clinicians to trust AI output, particularly when patients’ lives are at stake.
Ethical concerns and bias
Bias in AI models can lead to unequal healthcare outcomes. If training data underrepresents certain populations (e.g., women, minorities, or patients in low-income regions), the AI may make inaccurate predictions or overlook critical conditions in those groups. There are also concerns about consent and transparency. For instance, are patients aware that AI is being used in their care? Did they opt in?
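One common way to surface this kind of bias is to compare a model’s sensitivity (the share of true cases it catches) across patient groups. The following sketch assumes each case is recorded as a tuple of group label, whether the patient actually has the condition, and whether the model flagged it. The `sensitivity_by_group` function, the group names, and the data are hypothetical.

```python
from collections import defaultdict


def sensitivity_by_group(cases):
    """Return, per group, the fraction of truly positive cases the model flagged."""
    positives = defaultdict(int)
    detected = defaultdict(int)
    for group, has_condition, flagged in cases:
        if has_condition:
            positives[group] += 1
            if flagged:
                detected[group] += 1
    # Groups with no positive cases are omitted rather than reported as 0.
    return {group: detected[group] / positives[group] for group in positives}


# Invented evaluation data: (group, has_condition, flagged_by_model).
cases = [
    ("group_a", True, True), ("group_a", True, True),
    ("group_a", True, True), ("group_a", True, False),
    ("group_b", True, True), ("group_b", True, False),
    ("group_b", True, False), ("group_b", True, False),
    ("group_a", False, False), ("group_b", False, True),
]

print(sensitivity_by_group(cases))
```

Here the model catches three of four true cases in one group but only one of four in the other, a gap that an overall accuracy figure would hide.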
Integration into clinical workflows
Even the most accurate AI tools will fail if they don’t fit seamlessly into a clinician’s day-to-day workflow. Doctors are already overwhelmed with interfaces, checklists, and decision aids. Adding “yet another dashboard” often creates more friction than benefit. Successful integration requires thoughtful design, clinician training, and change management. In other words, it takes time.
Cybersecurity and privacy risks
AI systems need vast amounts of data, and that data is often sensitive. With the rise of cyberattacks targeting healthcare systems, any new digital pipeline becomes a potential vulnerability. Protecting patient data while still making it usable for AI training and inference is a tough balancing act, especially in cross-border scenarios where data sovereignty becomes a concern.
The competitive landscape and OpenAI’s position
OpenAI’s proactive approach with HealthBench positions it as a frontrunner in the healthcare AI domain. By establishing a comprehensive benchmark, OpenAI not only showcases its models’ capabilities but also sets a standard for evaluating AI in medical contexts. The high bar this sets for performance and safety could deter new entrants.
Nevertheless, the healthcare AI field remains dynamic, with opportunities for innovation, especially in niche areas or through collaborations that combine AI expertise with medical specialization.
Is it worth entering the healthcare AI space?
Despite OpenAI’s advancements, the healthcare AI sector is vast and multifaceted. There are numerous opportunities for startups and researchers to make meaningful contributions, particularly in areas like rare disease diagnosis, personalized medicine, and patient engagement tools.
Moreover, the open-source nature of HealthBench allows for collaborative development and benchmarking, fostering an environment where diverse solutions can thrive. Engaging with this ecosystem can lead to innovations that complement existing models and address specific healthcare challenges.
How Mitrix can help
Here at Mitrix, we know the intricacies of developing medical software. Contact us to build secure, HIPAA-compliant solutions that enhance patient care, streamline workflows, and meet the highest industry standards.
Let’s illustrate this with one of our projects. Take a look at how we helped our client develop a medical record management platform that enables healthcare providers to track patient histories, monitor progress, and make informed decisions.

[Image: Mitrix healthcare project]
Results:
- 93% of delivery orders are completed by suppliers within 6 hours.
- 64% faster order creation process achieved by automating patient record entry.
- Up to 25% cost reduction in equipment spending through the use of an analytics system.
- Twofold increase in settlement velocity, leading to a reduction in floating capital.
From refining your concept to delivering a user-centric solution, we’re with you every step of the way. With our expertise, we don’t just build software but align technology with your business goals. Are you ready to bring your idea to life?
Summing up
The integration of LLMs into healthcare marks a transformative period in medical practice. With tools like HealthBench, the potential for AI to enhance diagnosis and treatment is becoming a reality. While OpenAI leads the charge, the field is ripe with opportunities for innovation and collaboration. As AI continues to evolve, its role in delivering efficient, accurate, and accessible healthcare is poised to expand, benefiting patients and providers alike.
Collaboration between clinicians, researchers, technologists, and policymakers is essential to build trustworthy AI systems rooted in ethical principles. Advancing AI in healthcare isn’t just about building smarter models: it requires ongoing research, transparent innovation, and strong interdisciplinary teamwork. When managed well, AI can transform healthcare delivery by enabling more accurate diagnoses, faster treatments, and broader access to personalized, high-quality care. What’s the payoff? Health systems that are not only more efficient but genuinely centered around better patient outcomes.
For more information on HealthBench and its implications, visit OpenAI’s official announcement: Introducing HealthBench.