
AI and healthcare: LLMs, HealthBench, and what the future holds

Artificial intelligence is reshaping healthcare, with large language models (LLMs) like OpenAI’s ChatGPT leading the charge. These models are changing how medical professionals approach diagnosis and treatment planning, offering faster and, on some tasks, more accurate assessments than traditional methods. OpenAI’s recent introduction of HealthBench underscores this shift, providing a benchmark to evaluate AI performance in realistic healthcare scenarios.

In this article, you’ll learn:

  • How LLMs like ChatGPT are transforming diagnosis and treatment planning
  • What HealthBench is and why it’s a game-changer for evaluating AI in real-world medical conversations
  • Where AI is already used in healthcare, from radiology to patient triage to drug discovery
  • Why LLMs are beginning to outperform doctors in speed and accuracy (and what that means for the future)
  • What hurdles still block widespread AI adoption in clinics, hospitals, and health systems
  • Whether OpenAI’s lead in the healthcare AI space makes it too competitive, or full of new opportunities
  • How Mitrix helps healthcare providers build secure, AI-powered software that improves outcomes and reduces costs
  • And why collaboration, ethics, and trust are the keys to unlocking AI’s full potential in healthcare

The emergence of HealthBench

In May 2025, OpenAI unveiled HealthBench, an open-source benchmark designed to assess the capabilities of LLMs in healthcare settings. Developed in collaboration with 262 physicians from 60 countries, HealthBench comprises 5,000 multi-turn conversations that simulate real-world medical interactions. Each conversation is evaluated using physician-created rubrics, ensuring that AI responses align with clinical standards and priorities.
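To make rubric-based grading more concrete, here is a minimal sketch of how a single response could be scored. It assumes each rubric criterion carries a point value (positive for desirable behavior, negative for harmful behavior), that a grader has already decided whether each criterion was met, and that the score is the share of achievable positive points, clipped to the range 0 to 1. The `Criterion` class, the `rubric_score` function, and the example criteria are illustrative and are not taken from the HealthBench codebase.

```python
from dataclasses import dataclass


@dataclass
class Criterion:
    description: str  # what the physician wants to see (or avoid)
    points: int       # positive = desirable, negative = harmful
    met: bool         # grader's judgment for this response


def rubric_score(criteria: list[Criterion]) -> float:
    """Share of achievable positive points earned, clipped to [0, 1]."""
    max_points = sum(c.points for c in criteria if c.points > 0)
    if max_points == 0:
        raise ValueError("rubric needs at least one positive criterion")
    # Met negative criteria subtract from the total.
    earned = sum(c.points for c in criteria if c.met)
    return min(1.0, max(0.0, earned / max_points))


# Made-up rubric for a chest-pain conversation (assumption, for illustration)
example = [
    Criterion("Advises seeking emergency care for chest pain", 10, True),
    Criterion("Asks how long the symptoms have lasted", 5, False),
    Criterion("Recommends a prescription dose without context", -8, False),
]

print(rubric_score(example))  # 10 of 15 achievable points
```

Averaging such per-conversation scores across the whole set would give one number per model, which is what makes side-by-side comparison possible.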

HealthBench stands out by focusing on realistic, complex scenarios rather than multiple-choice questions. This approach provides a more accurate measure of an AI model’s ability to handle nuanced medical conversations, reflecting the intricacies of actual clinical environments. But first, let’s look at how AI is already used in healthcare.

Current use of AI in healthcare

AI is already making waves in healthcare, and I’m not talking about labs or research papers, but real-world clinics, hospitals, and telehealth platforms. Its applications are broad and growing fast.

1. Diagnostic assistance

Radiology, pathology, and dermatology are among the first areas where AI has found its groove. Tools like Aidoc, Zebra Medical Vision, and Google’s DeepMind are helping analyze medical images with precision. In many cases, they flag abnormalities like tumors, fractures, or signs of stroke faster than human radiologists. This doesn’t just reduce diagnostic errors: it speeds up treatment decisions when time is critical.

2. Clinical decision support

AI models are now embedded into electronic health record (EHR) systems to offer decision support. For instance, IBM’s Watson Health (before its divestiture) and newer LLM-based tools can suggest evidence-based treatment options, flag drug interactions, and personalize care plans based on patient data. They’re like having a super-fast, always-up-to-date medical encyclopedia whispering in the doctor’s ear.

3. Virtual health assistants

Chatbots such as Florence from the WHO and symptom checkers from Ada and Babylon Health have been used to triage patients, answer health questions, and remind people to take medications, and newer assistants are increasingly powered by LLMs. While they don’t replace doctors, they help manage patient loads and provide basic care access 24/7.

4. Predictive analytics

Hospitals are using AI to predict patient deterioration, readmission risks, or the likelihood of developing conditions like sepsis. These predictions allow for early interventions that can save lives and reduce healthcare costs. For example, the Mayo Clinic and Mount Sinai have deployed machine learning systems to monitor ICU patients in real time.

5. Drug discovery and clinical trials

AI has shortened the timeline for drug development (from target discovery to molecule generation) by years. Startups like Insilico Medicine and major players like NVIDIA and AstraZeneca are collaborating to find novel treatments faster and cheaper. AI also helps optimize clinical trial recruitment and identify potential responders based on genetic and clinical data.

6. Administrative workflow automation

Not everything AI does in healthcare involves diagnosing a rare disease. A lot of it is grunt work: for instance, automating claims processing, transcribing notes via voice recognition, coding medical records, and managing scheduling. This saves clinicians time and helps combat burnout by letting them focus more on patient care and less on paperwork.

LLMs surpassing human performance

Now let’s get back to the HealthBench report. One of its most significant findings concerns the performance of advanced LLMs like GPT-4o. On the benchmark’s conversations, these models have produced diagnostic and treatment suggestions with a speed and accuracy that rival, and in some cases surpass, those of human physicians. For instance, GPT-4.1 nano, a smaller and more cost-effective model, has outperformed the older GPT-4o, indicating rapid advancements in AI capabilities.

This progression suggests that LLMs are not only becoming more efficient but also more accessible, potentially democratizing high-quality medical advice and reducing disparities in healthcare delivery.

Implications for the healthcare industry

The integration of LLMs into healthcare has profound implications:

  • Enhanced efficiency. AI can process vast amounts of medical data swiftly, leading to quicker diagnoses and treatment plans.
  • Cost reduction. Automating routine tasks can lower operational costs, allowing healthcare providers to allocate resources more effectively.
  • Improved access. AI-driven tools can extend medical expertise to underserved regions, bridging gaps in healthcare availability.

However, these benefits come with challenges, including ensuring data privacy, maintaining ethical standards, and integrating AI seamlessly into existing healthcare systems.

Challenges in AI adoption in healthcare

While AI in healthcare holds incredible promise, the road to full-scale adoption is anything but smooth. From technical hurdles to ethical dilemmas, here are the key challenges slowing things down.

Data quality and availability

AI systems are only as good as the data they’re trained on. What about healthcare? Well, data in this sector is notoriously messy. Patient records often contain errors, are stored in incompatible formats, or are incomplete. Many AI models rely on structured, labeled datasets, but most real-world medical data is unstructured and siloed across different systems. The lack of standardized and interoperable data is a massive bottleneck.

Regulatory and legal barriers

Healthcare is one of the most tightly regulated industries in the world, and for good reason. Any AI that influences diagnosis, treatment, or patient outcomes may be classified as a medical device, requiring rigorous testing and FDA/EMA approval. This process can be lengthy, expensive, and discouraging for startups trying to move fast. Additionally, the legal framework around AI accountability is murky. For instance, who’s liable if the AI makes a mistake: the doctor, the developer, or the healthcare provider?

Explainability and trust

Doctors, patients, and regulators all want to know how an AI arrived at a particular recommendation. Unfortunately, many powerful models (especially deep learning and large language models) operate as black boxes, offering little visibility into their inner workings. This lack of explainability makes it hard for clinicians to trust AI output, particularly when patients’ lives are at stake.

Ethical concerns and bias

Bias in AI models can lead to unequal healthcare outcomes. If training data underrepresents certain populations (e.g., women, minorities, or patients in low-income regions), the AI may make inaccurate predictions or overlook critical conditions in those groups. There are also concerns about consent and transparency. For instance, are patients aware that AI is being used in their care? Did they opt in?
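One simple way to surface this kind of bias is to compare how often a model catches a condition in each patient group. The sketch below computes recall (sensitivity) per group from evaluation records. The group labels, the record layout, and the data are made up for illustration; a real audit would use properly governed clinical data and more than one metric.

```python
from collections import defaultdict


def recall_by_group(records):
    """records: iterable of (group, has_condition, model_flagged) tuples."""
    positives = defaultdict(int)  # patients who truly have the condition
    caught = defaultdict(int)     # of those, how many the model flagged
    for group, has_condition, model_flagged in records:
        if has_condition:
            positives[group] += 1
            if model_flagged:
                caught[group] += 1
    return {group: caught[group] / positives[group] for group in positives}


# Made-up evaluation records (assumption, for illustration only)
records = [
    ("group_a", True, True), ("group_a", True, True),
    ("group_a", True, False), ("group_a", False, False),
    ("group_b", True, True), ("group_b", True, False),
    ("group_b", True, False), ("group_b", False, True),
]

rates = recall_by_group(records)
gap = max(rates.values()) - min(rates.values())
for group, rate in sorted(rates.items()):
    print(f"{group}: recall {rate:.2f}")
print(f"largest gap between groups: {gap:.2f}")
```

A large gap means the model misses the condition more often in one group than another, which is a signal to revisit the training data before the tool reaches patients.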

Integration into clinical workflows

Even the most accurate AI tools will fail if they don’t fit seamlessly into a clinician’s day-to-day workflow. Doctors are already overwhelmed with interfaces, checklists, and decision aids. Adding “yet another dashboard” often creates more friction than benefit. Successful integration requires thoughtful design, clinician training, and change management. In other words, it takes time.

Cybersecurity and privacy risks

AI systems need vast amounts of data, and that data is often sensitive. With the rise of cyberattacks targeting healthcare systems, any new digital pipeline becomes a potential vulnerability. Protecting patient data while still making it usable for AI training and inference is a tough balancing act, especially in cross-border scenarios where data sovereignty becomes a concern.
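As one small piece of that balancing act, records can be pseudonymized before they enter an AI pipeline. The sketch below replaces the patient ID with a keyed hash (HMAC-SHA256 from Python’s standard library) and drops a few direct identifiers. The field names and the list of identifiers are assumptions for illustration, and pseudonymization alone does not amount to full de-identification or regulatory compliance.

```python
import hashlib
import hmac

# Hypothetical set of fields treated as direct identifiers in this sketch
DIRECT_IDENTIFIERS = {"name", "address", "phone"}


def pseudonymize(patient_id: str, secret_key: bytes) -> str:
    """Return a stable token for the ID that is not reversible without the key."""
    return hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()


def strip_identifiers(record: dict, secret_key: bytes) -> dict:
    """Drop direct identifiers and replace the patient ID with its token."""
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["patient_id"] = pseudonymize(record["patient_id"], secret_key)
    return cleaned


# Made-up record and key (in practice the key would live in a secrets manager)
key = b"example-key-not-for-production"
record = {
    "patient_id": "12345",
    "name": "Jane Doe",
    "phone": "555-0100",
    "diagnosis": "type 2 diabetes",
}

safe = strip_identifiers(record, key)
print(sorted(safe))  # remaining field names
```

Because the same ID and key always yield the same token, records for one patient can still be linked for training or analysis without exposing who the patient is.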

The competitive landscape and OpenAI’s position

OpenAI’s proactive approach with HealthBench positions it as a frontrunner in the healthcare AI domain. By establishing a comprehensive benchmark, OpenAI not only showcases its models’ capabilities but also sets a standard for evaluating AI in medical contexts. This move could potentially deter new entrants due to the high bar set for performance and safety.

Nevertheless, the healthcare AI field remains dynamic, with opportunities for innovation, especially in niche areas or through collaborations that combine AI expertise with medical specialization.

Is it worth entering the healthcare AI space?

Despite OpenAI’s advancements, the healthcare AI sector is vast and multifaceted. There are numerous opportunities for startups and researchers to make meaningful contributions, particularly in areas like rare disease diagnosis, personalized medicine, and patient engagement tools.

Moreover, the open-source nature of HealthBench allows for collaborative development and benchmarking, fostering an environment where diverse solutions can thrive. Engaging with this ecosystem can lead to innovations that complement existing models and address specific healthcare challenges.

How Mitrix can help

Here at Mitrix, we know the intricacies of developing medical software. Contact us to build secure, HIPAA-compliant solutions that enhance patient care, streamline workflows, and meet the highest industry standards.

Let’s illustrate this with one of our projects. Take a look at how we helped our client develop a medical record management platform that enables healthcare providers to track patient histories, monitor progress, and make informed decisions.

Results:

  • 93% of delivery orders are completed by suppliers within 6 hours.
  • 64% faster order creation process achieved by automating patient record entry.
  • Up to 25% cost reduction in equipment spending through the use of an analytics system.
  • The velocity of settlements has increased twofold, leading to a reduction in floating capital.

From refining your concept to delivering a user-centric solution, we’re with you every step of the way. With our expertise, we don’t just build software but align technology with your business goals. Are you ready to bring your idea to life?

Summing up

The integration of LLMs into healthcare marks a transformative period in medical practice. With tools like HealthBench, the potential for AI to enhance diagnosis and treatment is becoming a reality. While OpenAI leads the charge, the field is ripe with opportunities for innovation and collaboration. As AI continues to evolve, its role in delivering efficient, accurate, and accessible healthcare is poised to expand, benefiting patients and providers alike.

Collaboration between clinicians, researchers, technologists, and policymakers is essential to build trustworthy AI systems rooted in ethical principles. Advancing AI in healthcare isn’t just about mastering smarter models: it requires ongoing research, transparent innovation, and strong interdisciplinary teamwork. When managed right, AI can transform healthcare delivery by enabling more accurate diagnoses, faster treatments, and broader access to personalized, high-quality care. What’s the payoff? Health systems that are not only more efficient but genuinely centered around better patient outcomes.

For more information on HealthBench and its implications, visit OpenAI’s official announcement: Introducing HealthBench.


