LLMs are not ready to automate clinical coding, says Mount Sinai study

Laine Mongold May 1, 2024

A new study from Mount Sinai suggests that using generative artificial intelligence to help with coding automation has some significant limitations.

WHY IT MATTERS

For the research, Mount Sinai’s Icahn School of Medicine evaluated the potential application of large language models in healthcare to automate medical code assignment – based on clinical text – for reimbursement and research purposes.

The study compared LLMs from OpenAI, Google and Meta to assess whether they could effectively match the right medical codes to their corresponding official text descriptions.

To assess and benchmark the performance of GPT-3.5, GPT-4, Gemini Pro and Llama2-70b, researchers extracted more than 27,000 unique diagnosis and procedure codes from 12 months of routine care in the Mount Sinai Health System, excluding patient data.

“Previous studies indicate that newer large language models struggle with numerical tasks,” Dr. Eyal Klang, director of Icahn Mount Sinai’s Data-Driven and Digital Medicine Generative AI Research Program and senior co-author of the study, explained in an announcement last week.

“However, the extent of their accuracy in assigning medical codes from clinical text had not been thoroughly investigated across different models.”

Using qualitative and quantitative methods to assess whether the four models could effectively match medical codes, the researchers determined that all of the LLMs scored below 50% accuracy in generating unique diagnosis and procedure codes.

While GPT-4 performed the best in the study with the highest exact match rates for ICD-9-CM at 45.9%, ICD-10-CM at 33.9% and CPT codes at 49.8%, “unacceptably large” errors remained.

The researchers said GPT-4 produced the most incorrectly generated codes, while GPT-3.5 had the greatest tendency to be vague, identifying more general rather than precise codes.

The study results, which the New England Journal of Medicine AI published last week, led the researchers to caution that the performance of LLMs in real-world medical coding could be worse.

“LLMs are not appropriate for use on medical coding tasks without additional research,” the researchers said in the report.

“While AI holds great potential, it must be approached with caution and ongoing development to ensure its reliability and efficacy in healthcare,” Dr. Ali Soroush, assistant professor of D3M and medicine, cautioned in a statement.

Mount Sinai noted that the researchers will look to develop tailored LLM tools for accurate medical data extraction and billing code assignment.

THE LARGER TREND

Despite the findings of the Mount Sinai study, others see value in AI-enabled coding, and say AI systems can help physician groups avoid missing revenue opportunities and elevate their documentation compliance.

“As annual coding requirements are instituted, an AI-based system will integrate and implement those changes in real-time,” Dr. Bruce Cohen, a surgeon and former CEO at OrthoCarolina in Charlotte, North Carolina, told Healthcare IT News.

AI-based systems do not eliminate coders’ jobs, he added: “It expands the oversight and accuracy of every charge going out based on evaluation and management coding.”

ON THE RECORD

“Our findings underscore the critical need for rigorous evaluation and refinement before deploying AI technologies in sensitive operational areas like medical coding,” Soroush asserted in a statement about the Mount Sinai research.

“This study sheds light on the current capabilities and challenges of AI in healthcare, emphasizing the need for careful consideration and additional refinement prior to widespread adoption,” added Dr. Girish Nadkarni, director of The Charles Bronfman Institute of Personalized Medicine and system chief of D3M.

Andrea Fox is senior editor of Healthcare IT News.
Healthcare IT News is a HIMSS Media publication.
