
AI in Healthcare: What's Actually Working (and What Isn't Yet)

Separating the clinical breakthroughs from the vendor hype — a clear-eyed look at where AI is genuinely transforming medicine and where it still falls short.

AI · healthcare · machine learning

Healthcare is one of the most consequential domains where AI is being deployed, and also one of the most prone to inflated expectations. A press release about an AI model that "detects cancer better than radiologists" is often true in a narrow and unrepresentative setting, and misleading in the clinical reality where it would actually be deployed. At the same time, dismissing the progress is also wrong — some of the advances in medical AI over the last five years are genuinely remarkable and will save lives at scale.

This article is an attempt at honest accounting: where has AI made real, validated, clinically deployed progress, and where is the gap between demonstration and deployment still wide?

Where AI Has Genuinely Delivered

Medical imaging is the area with the strongest and most validated progress. Deep learning models trained on millions of labeled images have reached or exceeded specialist-level performance on specific, well-defined diagnostic tasks.

Google's DeepMind demonstrated that a model trained on retinal scans could detect over 50 eye diseases with diagnostic accuracy equivalent to world-leading specialists. That model, now deployed in clinics in the UK and elsewhere, is helping address a real shortage: there are not enough ophthalmologists in the world to screen everyone at risk for diabetic retinopathy. AI acts as a force multiplier, handling high-volume screening so specialists can focus on confirmed cases.

In radiology, FDA-approved AI tools are now integrated into imaging workflows at major health systems. Tools from companies like Aidoc flag potential pulmonary embolisms, intracranial hemorrhages, and aortic dissections in CT scans, alerting radiologists to critical findings that might otherwise wait hours in a review queue. Studies of emergency department deployments show measurable reductions in time-to-treatment for life-threatening conditions.

Pathology is following a similar trajectory. Paige, whose prostate cancer detection tool has FDA authorization, has built models that detect prostate cancer in biopsy slides with higher sensitivity than pathologists reviewing under time constraints. Memorial Sloan Kettering has deployed AI pathology tools in clinical workflows. The efficiency gains are real: a pathologist reviewing 200 slides a day can't give equal attention to all of them. AI can flag the slides most likely to contain abnormalities, directing human attention where it matters.

Drug discovery has been transformed more rapidly than almost any other area. AlphaFold2, which DeepMind unveiled in late 2020, effectively solved the protein structure prediction problem — predicting the 3D structure of a protein from its amino acid sequence with accuracy that had previously required years of experimental work per structure. The structural biology community let out a collective breath and then got to work; the AlphaFold database now contains predicted structures for virtually every known protein, a resource that's accelerating drug target identification across dozens of diseases.

Isomorphic Labs (a DeepMind spinout) and companies like Recursion Pharmaceuticals are using AI to identify potential drug candidates from massive molecular libraries at a scale and speed that traditional computational chemistry couldn't approach. It's too early to call these companies validated — the pipeline from target to clinical trial to FDA approval takes over a decade — but the early-stage evidence is credible.

Clinical documentation may seem unglamorous compared to diagnostics, but the burden of documentation is a primary driver of physician burnout. The average physician spends 2 hours on documentation for every 1 hour of patient care. AI-powered ambient documentation tools — which listen to the patient-physician conversation and generate structured clinical notes — are being rapidly adopted. Nuance (owned by Microsoft) and Suki are seeing broad health system deployments, and early physician satisfaction data is consistently positive. This is unsexy but high-impact.

Where the Hype Outpaces the Reality

Sepsis prediction provides an instructive cautionary tale. Epic's Sepsis Model, deployed across hundreds of hospitals, was designed to predict which patients would develop life-threatening sepsis — a condition in which every hour of delayed treatment increases mortality. The algorithm showed strong performance in internal validation studies.

Independent researchers analyzing real-world deployments found something different: false positive rates so high that clinical staff began ignoring the alerts entirely, a phenomenon known as alert fatigue. A 2021 study in JAMA Internal Medicine found the model missed a substantial portion of true sepsis cases while generating far more alerts than it had in validation. The model had been trained on data from one institution and generalized poorly to others.
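The base-rate arithmetic behind alert fatigue is worth making explicit. With illustrative numbers (invented for this sketch, not figures from the JAMA study), a model with respectable sensitivity and specificity still produces mostly false alarms when the condition it screens for is rare:

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule:
    P(disease | alert) = true-positive rate / (true-positive + false-positive rate)."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)

# A seemingly strong screening model: 80% sensitivity, 90% specificity.
# At a 2% sepsis prevalence, only ~14% of alerts are true positives.
print(round(ppv(0.80, 0.90, 0.02), 3))
```

At that prevalence, roughly six out of seven alerts are false positives, which is exactly the regime in which clinicians learn to tune out the alarms.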

This gap between controlled validation and messy real-world deployment is the central challenge in clinical AI. Models are validated on clean, curated datasets. Hospitals have inconsistent data entry, incomplete records, and demographic distributions that don't match training data. A model that achieves an AUC of 0.95 in a controlled trial can perform dramatically worse in clinical practice.
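One way to see how a headline metric can mask this gap: AUC is simply the probability that a randomly chosen positive case outscores a randomly chosen negative one, and it can collapse when score distributions shift at a new site. A minimal sketch with synthetic risk scores (all numbers invented for illustration):

```python
def auc(pos_scores, neg_scores):
    """Empirical AUC: probability that a random positive case scores
    above a random negative case (ties count as half a win)."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical scores at the development site: clean separation.
dev_auc = auc([0.9, 0.8, 0.7], [0.20, 0.30, 0.40])   # 1.0

# The same model at a new site, where the score distributions overlap.
ext_auc = auc([0.6, 0.5, 0.4], [0.30, 0.45, 0.55])   # ~0.67
```

The model is unchanged between the two calls; only the population it sees has changed, which is precisely what happens when a tool trained at one institution is deployed at another.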

Mental health AI is a particularly concerning area. Chatbots marketed as mental health support — some of which have had millions of users — have been deployed with minimal clinical validation, mixed evidence, and documented harms including inappropriate responses to crisis situations. The regulatory framework has lagged the market. The FDA has cleared tools for narrow, specific applications (like AI-assisted therapy for specific phobia), but the broader wellness-chatbot space operates largely outside clinical oversight.

Rare disease and underrepresented population performance remains poor. Most medical AI has been trained predominantly on data from large academic medical centers in the US and Europe, which skews heavily toward certain demographics. Studies have repeatedly found that diagnostic AI performs significantly worse for darker skin tones (particularly in dermatology), for female patients (who were historically underrepresented in cardiovascular imaging training sets), and for patients with atypical presentations. Deploying these tools at scale without addressing these biases risks automating and amplifying existing healthcare disparities.

The Infrastructure Problem No One Talks About

Even validated AI tools face significant deployment barriers in healthcare. Electronic health record (EHR) systems — Epic, Cerner, Meditech — were not designed for AI integration. Getting an AI model to reliably receive the right input data from an EHR in real time is an engineering challenge that absorbs enormous resources.

Regulatory approval adds another layer. Because the FDA classifies AI-based diagnostic tools as Software as a Medical Device (SaMD), clinical applications require 510(k) clearance or De Novo authorization — a process that takes 12-18 months even for straightforward submissions. Many promising AI tools are stuck in this pipeline for years.

The path forward likely involves three things: tighter collaboration between AI developers and the clinical teams who will actually use the tools, investment in data infrastructure that allows models to be updated as clinical environments change, and regulatory frameworks that can distinguish between high-risk diagnostic tools (which should face rigorous review) and lower-risk workflow assistance tools (which can be adopted more rapidly).

The Bottom Line

Medical AI is not magic, and it's not hype. It's a category of tools that, applied in the right contexts with appropriate validation and oversight, are demonstrably improving patient outcomes. The imaging applications in particular are mature enough to be reliable. The drug discovery applications are early but credible. The ambient documentation tools are already reducing one of the most persistent quality-of-life problems in clinical practice.

The failures are also real. Algorithmic bias, poor generalization across institutions, and the deployment of inadequately validated consumer tools are all serious problems that require serious responses. The answer is not to abandon the technology but to build the evaluation and deployment frameworks that the technology's consequences demand.

The stakes are high enough to warrant both ambition and rigor.
