The scarcity of subspecialist medical expertise poses a considerable challenge for healthcare delivery. This issue is particularly acute in cardiology, where timely, accurate management determines outcomes.
We explored the potential of Articulate Medical Intelligence Explorer (AMIE), a large language model (LLM)-based experimental medical artificial intelligence (AI) system, to augment clinical decision-making in this challenging context. We conducted a randomized controlled trial (RCT) comparing LLM-assisted care with the usual care of complex patients suspected of having a genetic cardiomyopathy, and we curated a real-world dataset of complex cases from a subspecialist cardiology practice. Nine participating general cardiologists were provided with access to both clinical text reports and raw diagnostic data (including electrocardiograms, echocardiograms, cardiac magnetic resonance imaging scans and cardiopulmonary exercise testing) and were randomized to manage these cases either with or without assistance from AMIE. We developed a ten-domain evaluation rubric used by three blinded subspecialists to evaluate the quality of triage, diagnosis and management. In our RCT with retrospective patient data, subspecialists favored LLM-assisted responses overall and for the management plan and diagnostic testing domains, with the remaining domains considered a tie. Overall, subspecialists preferred AMIE-assisted cardiology assessments 46.7% of the time, compared with 32.7% for cardiologists alone, with 20.6% rated as a tie. Subspecialists also quantified errors, extra and missing content, reasoning and potential bias. Cardiologists alone made more clinically significant errors than cardiologists assisted by AMIE. Lastly, cardiologists who used AMIE reported that AMIE helped their assessment more than half the time and saved time in 50.5% of cases.

The World Health Organization predicts a deficit of 18 million healthcare providers by 2030, with shortages being most acute in resource-limited and rural areas. This disparity is exacerbated for rarer and more complex conditions, particularly those for which timely treatment prevents morbidity and mortality. For instance, hypertrophic cardiomyopathy (HCM) is one of the leading causes of sudden cardiac death in young adults; cardiac conditions such as HCM exemplify an urgent unmet need in healthcare delivery, namely timely and widely available access to subspecialist expertise. While cardiac conditions serve as an indicative example, the consequences of delayed access to subspecialist care are profound across all specialties, often resulting in increased morbidity and mortality as patients miss critical diagnostic and treatment windows. Navigating the cascade of referrals required to access subspecialist expertise creates undue stress and anxiety while presenting a time-consuming and resource-intensive process for both patients and healthcare providers. LLMs have recently been applied in medicine as assistive tools for summarization and communication. Despite the potential of LLMs to enhance medical expertise, rigorous assessment of their performance remains scarce in medical specialties, with few openly available datasets for model evaluation and almost no randomized controlled trials performed. It remains unclear whether LLMs possess the nuanced understanding and intricate knowledge base required to effectively replicate the decision-making process of experts in highly specialized medical fields. This study probes the potential of LLMs to democratize subspecialist-level expertise by focusing on an indicative example: the domain of genetic cardiomyopathies such as HCM.
We conducted an RCT in which cardiologists clinically assessed patients using comprehensive real-world clinical data with AI assistance. This included both physician text reports and raw data for electrocardiograms (ECGs), transthoracic echocardiograms (TTEs), cardiac magnetic resonance (CMR) imaging scans and cardiopulmonary stress tests (CPXs). Compared with earlier studies relying on simulated or text-only data, our design represents a substantial advancement toward real-world clinical applicability. We introduce an open-source dataset encompassing cardiac testing and genetic information from real-world patients at the Stanford Center for Inherited Cardiovascular Disease (SCICD), enabling further research in this specialized field. We use AMIE, an LLM-based system built upon Gemini 2.0 Flash, to generate detailed assessments of patients with suspected complex cardiovascular disease. We propose detailed rubrics that subspecialists used to compare and evaluate the quality of diagnosis, triage and clinical management proposals for complex cardiology cases. Finally, we explore the potential of LLMs to upskill general cardiologists by evaluating whether interaction with AMIE improves their clinical decision-making. In our RCT, general cardiologists' clinical assessments aided by AMIE were preferred overall, saved time in 50.5% of cases and had fewer clinically significant errors and fewer omissions of important content.

The study overview is as follows: 107 consecutive patients were assessed by two general cardiologists, one with and one without AMIE assistance, using the interface shown in the Extended Data figures. Text reports from the cardiac testing data of the 107 patients with suspected genetic cardiovascular disease were provided to AMIE, and AMIE completed the assessment form listed in the Extended Data figures. For the RCT, a pool of general cardiologists was randomized to either AI assistance or no assistance across the 107 cases, with each case being completed by two general cardiologists. All general cardiologists had access to text reports from the cardiac testing data, as well as the raw multimodal artifacts. The general cardiologists in each arm completed the assessments and additional questions listed in the Extended Data figures. In the arm with AI assistance, these cardiologists could view AMIE's assessment and interact live with the system over a text-based chat interface. Blinded subspecialist cardiologists from SCICD provided individual ratings and direct preferences between the assessments produced from each arm using the forms listed in the Extended Data figures.

The median age of the patients was 59 years. The number of patients with available clinical text data for each test was as follows: CMR 64, CPX 65, resting TTE 90, exercise TTE 69, ECG 99, ambulatory Holter monitor 79 and genetic testing 77. Of the 107 patients, 39 had a variant adjudicated to be pathogenic or likely pathogenic as per the interpretation-of-variant criteria of the American College of Medical Genetics and Genomics.

The general cardiologists expressed a favorable view of LLM integration into their clinical workflows. In a majority of cases, cardiologists reported that the AI improved their clinical assessments, while only 12.1% indicated it was unlikely to be helpful. Similarly, in 52.3% of cases, the general cardiologists stated that the AI increased their confidence in decision-making, while only 14.9% reported any decrease in confidence.
With respect to efficiency, the general cardiologists indicated time savings from AI use in 50.5% of cases; notably, they reported saving more than 50% of their time in 23.4% of these cases. Only 18.7% of the cases noted any delays attributable to AI use, and the incidence of AI hallucinations was low. In 91.6% of cases, there were no reported hallucinations, while in 6.5% of cases, a likely clinically significant hallucination was observed. Similarly, in 93.5% of cases, the general cardiologists reported that the LLM did not miss anything.

For the diagnosis domain related to further testing, AMIE-assisted responses were preferred in 43.9% of the cases, compared with 30.8% for the unassisted responses. General cardiologist responses alone were not preferred across any of the domains, although their responses were considered equivalent to the AMIE-assisted responses across the remaining domains, including consult question, triage, diagnosis and further diagnostic questions for the patient. Similarly, there was significantly less missing content for AMIE-assisted responses: 17.8%, compared with 37.4% for unassisted responses. There was no statistically significant difference in extra content between the responses, and both responses contained equivalent clinical reasoning steps and a lack of demographic bias.

Errors observed in AMIE's outputs included unwarranted assumptions regarding patient demographics, such as gender when such information was not provided, and misinterpretation of quantitative measurements, including confusion between exercise and resting aortic parameters. The identified errors ranged from subtle misinterpretations to more significant hallucinations, wherein AMIE generated medical information unsupported by the original text-imaging reports. Notably, these hallucinations were often amenable to correction when challenged by the cardiologists. For instance, when AMIE initially fabricated the presence of left ventricular hypertrabeculation, direct questioning by the general cardiologist prompted the system to self-correct and acknowledge that no abnormal trabeculation was present in the imaging report data.

The general cardiologists provided qualitative feedback on seven examples where the LLM missed clinical information. These comments highlighted AMIE's tendency to sometimes overlook or inadequately process existing diagnostic information, including failing to recognize that CMR and stress echocardiography had already been completed, providing insufficient detail about important findings such as regional wall motion abnormalities and trabeculations on echo, and missing critical historical information such as prior myocardial infarction mentioned in stress tests. Certain omissions resulted from AMIE's lack of access to imaging test dates, leading to erroneous clinical interpretations. For instance, AMIE incorrectly concluded that discrepancies existed between left ventricular ejection fraction measurements obtained via CMR and TTE, asserting that one dataset contained errors. However, chart review by the general cardiologists revealed that both left ventricular ejection fraction measurements were accurate, with the apparent discrepancy attributable to a therapeutic intervention that occurred between the two studies, information that was not accessible to AMIE. Lastly, AMIE made redundant recommendations for tests already completed.
Overall, omissions were considered mild and minimal, in line with systematic quantitative feedback showing that AMIE did not miss anything in 93.4% of cases.

The subspecialist cardiologists provided 138 comments regarding their preference ratings between the assisted and unassisted responses, spanning 43 of the 107 cases. They also left specific comments on 55 of the 107 assessments where cardiologists had AMIE assistance and on 69 of the 107 assessments without assistance, giving a total of 184 informative comments across the individual assessments. We used Gemini 2.5 Pro to categorize and summarize this feedback, with the prompt and full generated report shown in the Supplementary Information. In summary, the AI-assisted cardiologists were preferred more often, seemingly driven by their more comprehensive responses, fewer omissions and incorporation of modern diagnostics and advanced treatments. However, some responses were also flagged for excessive detail, diagnostic overreach and occasional leaps in logic. The unassisted cardiologists were often praised for clear and concise reasoning but tended to have clinically significant omissions and errors related to key management decisions. Many of the cited types of errors, such as missing and/or incorrect diagnoses or screening and/or testing errors, were seen with similar frequencies in both study arms, suggesting that AI assistance did not have a consistent impact in those areas.

In this study, we probe the ability of LLMs to provide additive support to generalists in the assessment of rare, life-threatening cardiac diseases that typically require subspecialty cardiac care. Further, we address the unmet need for the randomized evaluation of LLMs for challenging medical applications. To this end, we curate an open-source, de-identified, real-world clinical dataset for patients suspected to have inherited cardiomyopathies and propose an evaluation rubric for the quality of diagnosis, triage and clinical management of such patients. Blinded subspecialists employed this evaluation rubric to assess clinical assessments performed by general cardiologists, both with and without LLM assistance. The blinded evaluation by the subspecialist cardiologists demonstrated an overall preference for LLM-assisted clinical assessments. Specifically, the subspecialty cardiologists found that AMIE-assisted clinical assessments demonstrated fewer clinically significant errors and missed less important content while maintaining equivalent clinical reasoning quality and not introducing erroneous extraneous information. Furthermore, general cardiologists who utilized AMIE reported that the system helped their assessments in more than half of cases, did not miss clinically significant findings in 93.5% of cases and reduced assessment time in over half of cases.

Our results demonstrate the feasibility of using LLMs to assess patients with rare and life-threatening cardiac conditions. Adapting AMIE to this subspecialist and rarefied domain was highly data-efficient, leveraging iterative feedback from subspecialist experts to enhance the quality of AMIE's responses using just nine cases. This iterative process, combined with self-critique and the incorporation of search functionality, enabled AMIE to assist general cardiologists in upskilling their clinical assessments to a preferred level.
This contrasts with earlier studies using generic, nonspecialized LLMs, which did not achieve comparable clinical performance. Our RCT results indicate that LLMs can assist general cardiologists in diagnosing and managing complex cardiac patients. Our evidence suggests that LLMs could help bridge unmet needs in genetic cardiovascular disease and possibly in cardiac care more broadly. While further research could extrapolate our approach to a broader group of specialties, cardiology is a useful indicative example because it features highly preventable morbidity and mortality, a reliance on an array of clinical investigations and a substantial deficit in the cardiology workforce.

Our findings are particularly noteworthy because access to subspecialist care is a global challenge. The American College of Cardiology has identified a "cardiology workforce crisis," with the lack of access to subspecialty cardiologists an acute concern. In the USA, despite five HCM centers of excellence each in California and New York, there are none in 27 states. This has led to more than 60% of patients with HCM in the USA being undiagnosed, with estimates higher globally. The propensity of inherited cardiomyopathy to cause sudden cardiac death exacerbates the problem. Lack of access to appropriate care and long wait times can lead to preventable, premature mortality. LLMs may help identify undiagnosed cases, assist with the triage and prioritization of urgent cases, and streamline management. In this way, LLMs could improve access to specific care by assisting generalists.

For researchers, our results have a number of implications. First, we have made our data openly available, facilitating the rigorous evaluation of our results and providing data for other models to use in their tests. We have also created and validated a ten-domain evaluation rubric that may be used for future studies. More broadly, this study demonstrates the feasibility of conducting RCTs to evaluate LLMs, establishing a gold-standard evidence framework that should guide future research in this domain. Currently, LLMs are being used in many US health systems via their implementation in electronic medical records software. This implementation has occurred without a similar scale of scientific evaluation; the benefits and the possible harms are only partially known.

The implications for clinicians are twofold. First, our results show a clear, albeit modest, improvement in overall clinical assessment quality. The significantly fewer errors and less extra content, as well as the significant improvement in management plan quality, give insights into how LLMs can assist clinically. The general cardiologists demonstrated high precision in diagnostic accuracy and triage decisions; however, the nuanced clinical management of complex patients was associated with increased omission errors compared with the AMIE-assisted assessments. These clinical improvements were accompanied by enhanced efficiency and increased clinician confidence. As such, we present RCT-level evidence for LLMs improving clinical care overall, specifically driven by improvements in management and a reduction in clinical errors and erroneous extra content, with simultaneous improvements in the time and confidence of providers. It seems premature to deploy LLMs autonomously, and our RCT did not address this design directly, as general cardiologists reported that 6.5% of AMIE's responses contained clinically significant hallucinations.
Reassuringly, we showed that when LLMs are deployed with cardiologist oversight, hallucinations are most often identified, and this combination results in overall fewer errors and more preferred assessments. We qualitatively explored the nature of these hallucinations and found that the general cardiologists often described them as 'mild', ranging from assuming the patient's sex to hallucinating the presence of a CMR feature. Notably, the general cardiologists found that when asked about the hallucination, AMIE would correct itself.

Our study addresses a meaningful wider gap in the earlier literature. Prior research has evaluated LLMs in a number of different settings in medicine, from assessing quality in question-answering to clinical image interpretation and complex diagnostic challenges. There is a paucity of prior RCTs in medicine and cardiology. Despite more than 500 observational LLM papers published in 2024, systematic reviews of LLMs in medicine have consistently shown a lack of RCTs, with one concluding that "randomized trials or prospective real-world evaluations are needed to establish the clinical utility and safety of model use" and that "real-world trials addressing this possibility remain sparse". One trial randomized physicians to use GPT-4 versus conventional resources alone for diagnostic reasoning on just six non-real-world clinical vignettes and found no significant improvement in diagnostic performance, with management decisions improving by 6.1% and diagnostic decisions by 12.1%, demonstrating that LLMs may be more effective for treatment planning than initial diagnostic reasoning, in line with our results. Another study evaluated ChatGPT in providing accurate cancer treatment recommendations concordant with authoritative guidelines using fixed question prompts. A further study investigated the diagnostic and triage accuracy of GPT-3 relative to physicians and laypeople using synthetic case vignettes of both common and severe conditions, and another compared GPT-4 performance with human experts in answering cardiology-specific questions from general users' web queries. Our study is not only one of the first RCTs of LLMs in subspecialty domains; it is also, to our knowledge, one of the first to use real-world data and to make this data for LLM evaluation available open-source. A recent study showed the potential and safety concerns of using LLMs to provide an on-demand consultation service that assists clinicians' bedside decision-making based on patient electronic health record data. A 2024 study assessed the ability of LLMs to diagnose abdominal pathologies and showed that LLMs were inferior to clinicians, though that study was not an RCT, and the authors noted that their results may be improved with fine-tuned LLMs. Although we did not fine-tune for this particular downstream task, our approach, which included using a general-purpose LLM equipped with web search and a multistep reasoning chain at inference time, may help explain our contrasting results.

Our study contains a number of important limitations, and the findings should be interpreted with appropriate caution and humility. First, our LLM system was constrained to reviewing text-based reports of investigations rather than the raw multimodal investigations themselves.
This presents the possibility of upstream errors; however, we attempted to mitigate this by allowing cardiologists in both the assisted and unassisted groups to review the raw imaging and clinical data themselves; general cardiologists noted a clinically significant omission in the text reports in fewer than 8% of cases. History and physical examination are indispensable components of real clinical practice, but they were not included in this study. This limits the applicability of our work, and future studies should consider settings in which there is prospective interaction with these patients. However, we did offer cardiologists the ability to interact with our LLM. While our study was conducted on real patient cases, we do not consider LLMs ready for safe deployment, and thus we did not deploy our LLM into live, prospective clinical care. If safety standards are met, future studies should assess the performance of LLMs in live, prospective clinical care.

An additional limitation is that cardiologists were not blinded to their intervention assignment, thereby introducing a potential performance bias that may have influenced cardiologists' subjective reports about LLM usefulness and time savings. However, our primary efficacy outcomes were protected from this bias through blinded subspecialist evaluation, and the subjective measures of user experience represent clinically meaningful assessments of technology acceptability that are relevant for real-world implementation. Further, our study relies on subspecialist preference as the primary outcome measure, which introduces inherent subjectivity despite our expert-developed evaluation rubrics. While this approach aligns with recent RCTs of AI-assisted clinical decision-making, preference-based evaluation cannot definitively establish real-world clinical benefit. Evaluating downstream patient outcomes would require prospective studies with long-term follow-up that are beyond the current scope.

Further limitations of our work include a biased sample of patients: patients were selected from one US center, using only English text. It is unclear how well our results will extrapolate to non-US settings. Additionally, the subspecialist evaluators were from the same institution where AMIE's prompt engineering was developed, potentially introducing institutional bias, though this is mitigated by the use of different specialists for development versus evaluation, along with minimal, held-out development examples. Further, our dataset contained patients who were indeed referred for a suspicion of an inherited cardiac disease. A less biased population may be found in a general cardiology clinic, where the prevalence of inherited disease is lower, and with it possibly a higher false-positive referral rate. However, this patient selection strategy was intentional and aligned with our research objective of evaluating whether general cardiologists could appropriately manage cases they would typically refer to subspecialty care when supported by an LLM. A similar limitation is that our patients had already completed a number of cardiac diagnostic tests. To help identify undiagnosed cases, LLMs would have to be studied in populations with less complete cardiac investigations. There was insufficient demographic or regional variation in the single-center population in our study to assess the potential for bias or health inequity, which is an important topic for AI systems in healthcare.
This limitation is important, as disparities are well documented in the care of patients with inherited cardiomyopathies and should be addressed in prospective studies. A further consideration in the implementation of AMIE is the risk of automation bias, where clinicians may overly rely on AI outputs without sufficient scrutiny, potentially leading to inappropriate or unnecessary tests and management decisions. This bias has implications for patient safety, as it may result in increased healthcare costs, procedural risks and heightened patient anxiety. While AMIE demonstrated potential in enhancing cardiologists' assessments, its use as a clinical aid requires careful oversight to prevent overreliance. For example, AMIE's sensitive and detailed suggestions could lead to additional tests that are not clinically indicated. To mitigate these risks, clinicians interacting with AMIE must receive appropriate training to critically evaluate its outputs, ensuring they supplement clinical judgment rather than replace it. Additionally, our research did not explore the potential benefits and risks from the perspective of patients. The early potential here demonstrates an opportunity for participatory research including the patient perspective on the many potentially different workflows that could be enabled for subspecialist consultation. While AMIE's performance was promising, our evaluation rubric highlighted notable areas for improvement, including diagnosis and triage. The complementary and assistive utility of the technology requires extensive further study before it could be considered safe for real-world use, and there are many other considerations beyond the scope of this work, including regulatory and equity research and validation in a wider range of clinical environments.

In conclusion, AMIE, a research LLM-based AI system, can improve general cardiologists' assessments of complex cardiac patients. Assistance from AMIE led general cardiologists to have significantly fewer errors, faster assessments, lower rates of erroneous extra content and equivalent clinical reasoning.

In this study, AMIE was built on top of Gemini 2.0 Flash without any additional domain-specific fine-tuning. Instead, AMIE used a multistep inference procedure involving web search and self-critique to adapt it to this subspecialist domain; a conceptual sketch of such a loop is shown below. Further details on the inference procedure are provided in the Supplementary Information. The study comprised several phases: recruitment and de-identification of clinical data from a subspecialized inherited cardiovascular center, completion of an assessment of real clinical cases by general cardiologists, and evaluation and analysis of the clinical assessments by subspecialists who were blinded to the source of each assessment. For the RCT, the general cardiologists were tasked with interpreting clinical text and raw clinical data from real-world patients, including ECGs, rest and stress TTEs, CMRs, ambulatory Holter monitors and CPXs. The general cardiologists were randomized to complete this assessment with or without assistance from AMIE. Assistance consisted of access to AMIE's comprehensive clinical assessment report and the ability for general cardiologists to conversationally interact with AMIE directly in a web interface. The user interfaces are shown in the Extended Data figures. The data for this study were obtained from patients referred to SCICD, encompassing patients with both suspected and confirmed inherited cardiovascular diseases and general cardiology patients.
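As a rough, conceptual illustration of the multistep "draft, search, critique, revise" inference procedure described above, the following Python sketch shows one way such a loop can be wired together. The callables generate (an LLM text-generation call) and web_search (a retrieval tool) are hypothetical placeholders, and the prompts are illustrative; this is not AMIE's actual implementation.

```python
# A minimal sketch of a search-grounded self-critique loop, assuming
# hypothetical `generate` (LLM call) and `web_search` (retrieval) callables.
def assess_case(case_text: str, generate, web_search, n_rounds: int = 2) -> str:
    """Draft a clinical assessment, then iteratively critique and revise it."""
    # Draft an initial assessment from the case text alone.
    draft = generate(f"Draft a cardiology assessment for this case:\n{case_text}")

    for _ in range(n_rounds):
        # Retrieve external evidence (e.g., guideline text) to ground the critique.
        evidence = web_search(f"clinical guidelines relevant to: {draft[:300]}")

        # Ask the model to critique its own draft against the retrieved evidence.
        critique = generate(
            "Critique the assessment below for errors, omissions and unsupported "
            f"claims, using the evidence provided.\n\nAssessment:\n{draft}\n\n"
            f"Evidence:\n{evidence}"
        )

        # Revise the draft to address the critique.
        draft = generate(
            "Revise the assessment below to address the critique.\n\n"
            f"Assessment:\n{draft}\n\nCritique:\n{critique}"
        )
    return draft
```

The key design point this sketch captures is that adaptation happens at inference time, through retrieval and self-critique, rather than through fine-tuning of the underlying model.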
To facilitate scientific progress and reproducibility of our results, we have made all data publicly available. Our RCT followed the CONSORT RCT guidelines and is registered at ClinicalTrials.gov. We utilized 107 real-world patient cases in the test set for the RCT after first exposing the model to nine different patient cases for model refinement. No cases used in the model refinement were used for model testing.

For our RCT, we exposed general cardiologists to text and raw data from 107 consecutive, real-world patients. General cardiologists were selected from a pool of nine Stanford general cardiologists, with two general cardiologists assessing each patient. The test population consisted of patients suspected or confirmed to have inherited cardiovascular disease, as well as a mixture of patients without genetic cardiovascular disease. The data consisted of physician text reports and the raw data for CMRs, rest and stress TTEs, CPXs, ECGs and ambulatory Holter monitors. General cardiologists were tasked with completing the same standardized assessment form shown in the Extended Data figures. One of the two general cardiologists was randomized to complete the assessment with the assistance of AMIE (a minimal sketch of this per-case randomization appears below). Assistance consisted of access to AMIE's completed assessment form and a conversational web interface where the general cardiologist could interact with AMIE.

The assessment form prompted general cardiologists to provide their assessment across a range of domains, including triage, diagnosis and management of these patients with potential inherited cardiovascular disease. The general cardiologists were asked to provide an overall impression of the patient's case and answer a consult question regarding the likelihood of a genetic cardiac disease. The Triage Assessment section prompted the general cardiologists to determine the necessity of referral to a specialist center. In the Diagnosis section, they were asked to list their most likely diagnosis along with any additional information they would need to ask the patient or gather from tests. In the Management section, they were asked to describe their management plan and any additional information they would need to guide their management. Lastly, the general cardiologists were asked whether the raw imaging contained any findings that were missing in the text reports, and if so, to comment on whether this missed information was clinically significant or insignificant. If AMIE assistance was provided, the general cardiologists were asked whether the AI use was helpful, made them more confident, saved them time and had hallucinations or omissions. They were encouraged to leave free-text comments regarding their use of AI and any hallucinations or omissions.

The general cardiologists' assessments were evaluated by one of three SCICD subspecialty cardiologists, who evaluated pairs of general cardiologist assessments on the same patient. The subspecialists were blinded to the source of each assessment, and the assessments were provided in a randomized order.
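The following Python snippet is a minimal sketch of the per-case allocation just described: two cardiologists drawn from a pool of nine for each of the 107 cases, with one of the pair randomized to AMIE assistance. It is purely illustrative; the study's actual allocation procedure may differ.

```python
# Illustrative per-case randomization: two assessors per case, one assisted.
import random

random.seed(0)  # fixed seed so the illustration is reproducible
cardiologists = [f"cardiologist_{i}" for i in range(1, 10)]  # pool of nine
cases = [f"case_{i}" for i in range(1, 108)]  # 107 patient cases

assignments = []
for case in cases:
    pair = random.sample(cardiologists, 2)  # two distinct assessors per case
    assisted = random.choice(pair)          # one of the pair gets AMIE assistance
    for doc in pair:
        assignments.append({
            "case": case,
            "cardiologist": doc,
            "arm": "AMIE-assisted" if doc == assisted else "unassisted",
        })

print(assignments[:2])  # the two allocations for the first case
```

This pairwise design is what later enables the paired, blinded A/B comparison by subspecialists, since every case yields exactly one assisted and one unassisted assessment.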
Subspecialists completed two types of evaluation per case: direct A/B preference comparisons between the two responses and individual assessments per response, as described below. We developed rubrics for subspecialists to evaluate the responses under two conditions: a direct A/B comparison of the general cardiologists' responses with and without AMIE assistance, both overall and across each domain, and an individual evaluation of each general cardiologist's response. The domains of the evaluation rubric mirrored the clinical assessment form completed by the general cardiologists. For each domain, the subspecialist evaluators indicated a direct preference between the two assessments, one completed independently by a general cardiologist and one completed by a general cardiologist with assistance from AMIE. Evaluations were completely blinded. We designed this preference comparison with a third option for a tie to facilitate greater potential discrimination in performance between the general cardiologists as compared with Likert scales, while also allowing equivalence to be expressed.

To investigate more nuanced qualities of each response, the subspecialists also evaluated the general cardiologists' responses individually. To do this, we developed an individual evaluation rubric, which in our case led to five major themes. Instructions for evaluators were written and piloted using the nine development cases. These cases were piloted and used as worked examples in interactive feedback sessions with expert evaluators. Feedback on the instructions and the evaluation rubric was sought from experts, and changes were implemented when concordance among the experts was present. This approach led to iterative improvement of our individual evaluation rubric, spanning five crucial domains of LLM evaluation: errors, addition, omission, reasoning and intelligence, and bias. Subspecialist experts were asked to answer "Yes" or "No" to direct questions spanning these five domains and then given free-text responses to quantify and explain their selections.

To test the hypothesis that the assisted condition outperformed the unassisted condition, we employed several statistical analyses. For subspecialist preference ratings, we used two-proportion z-tests to compare the selection frequencies of cardiologist + AMIE versus cardiologist alone for each criterion. To analyze individual criteria, McNemar's tests were performed on 2 × 2 contingency tables of paired "Yes"/"No" responses from both conditions. A worked sketch of both tests is shown below.

The clinical subspecialist evaluator component of this research involved the participation of physicians. This study adhered to the principles outlined in the Declaration of Helsinki. Informed consent was obtained from each physician before their participation. This study used only retrospective, de-identified data that fell outside the scope of institutional review board oversight. The data consist of clinical test text data. All data are open-source and openly available. AMIE was built on top of Gemini 2.0 Flash, which is publicly available for use, without any additional fine-tuning. The inference process used to generate assessments is detailed in the Supplementary Information.
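The following sketch shows how the two tests named above can be run in Python with statsmodels. The counts are illustrative placeholders, not the study's actual data.

```python
# Worked sketch of the study's two statistical tests (illustrative counts).
from statsmodels.stats.proportion import proportions_ztest
from statsmodels.stats.contingency_tables import mcnemar

# Two-proportion z-test: how often each arm was preferred by subspecialists,
# e.g., assisted preferred in 50 of 107 cases vs. unassisted in 35 of 107.
stat, pval = proportions_ztest(count=[50, 35], nobs=[107, 107])
print(f"two-proportion z-test: z = {stat:.2f}, p = {pval:.3f}")

# McNemar's test on paired Yes/No ratings (e.g., "contains a clinically
# significant error?") for the same 107 cases under both conditions.
# Rows: assisted Yes/No; columns: unassisted Yes/No.
table = [[4, 6],    # assisted Yes & unassisted Yes / assisted Yes & unassisted No
         [20, 77]]  # assisted No & unassisted Yes / assisted No & unassisted No
result = mcnemar(table, exact=True)
print(f"McNemar's test: statistic = {result.statistic}, p = {result.pvalue:.3f}")
```

McNemar's test is the natural choice here because the two arms rated the same cases, so the Yes/No responses are paired rather than independent.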
We would like to thank J. Lugo for his instrumental support. We would also like to thank J. Gortat and S. Edmiston for their support, particularly in facilitating the open-sourcing of our data. We thank F. Zhang and C. Hughes for their comprehensive review and detailed feedback on the paper.

Stanford University, Stanford, CA, USA: Jack W. O'Sullivan, Daniel K. Amponsah, Evaline Cheng, Emily Chu, Yaanik Desai, Aly Elezaby, Muhammad Fazal, Tasmeen Hussain, Sneha S. Jain, Daniel Seung Kim, Roy Lan, Jiwen Li, Wilson Tang, Natalie Tapaskar, Victoria Parikh, Ryan Sandoval, Gabriella Spencer-Bonilla, Bryan Wu & Euan Ashley.

J.W.O., A.P., A.K., V.N., E.A. and T.T. contributed to the conception and design of the work; J.W.O., A.P., A.K., E.A., T.T., K.K., S.S.M. and V.N. contributed to the data acquisition and curation; J.W.O., A.P., A.K., E.A., T.T., K.S., R.T., W.-H.W., M.S. and Y.C. contributed to the technical implementation; E.A., A.K., V.N., J.W.O., T.T. and A.P. contributed to the evaluation framework used in the study; J.W.O., A.K., E.A., D.K.A., E. Cheng, E.
Chu, Y.D., A.E., M.F., T.H., S.S.J., D.S.K., R.L., J.L., W.T., N.T., V.P., R.S., G.S.-B. and B.W. provided clinical inputs to the study; P.M., D.W., J.G. and J.B. contributed to the ideation and execution of the work. All authors contributed to the drafting and revising of the paper.

This study was funded by Alphabet Inc. and/or a subsidiary thereof. A.P., K.S., W.-H.W., Y.C., K.K., P.M., D.W., J.G., J.B., R.T., M.S., S.S.M., V.N., A.K. and T.T. are employees of Alphabet and may own stock as part of the standard compensation package. D.S.K. reports grant support from Amgen and the Bristol Myers Squibb Foundation, outside the submitted work. D.S.K. is supported by the Wu-Tsai Human Performance Alliance as a Clinician-Scientist Fellow, the Stanford Center for Digital Health as a Digital Health Scholar, a Pilot Grant from the Stanford Center for Digital Health, NIH 1L30HL170306, the Robert A. Winn Excellence in Clinical Trials Career Development Award, the American Heart Association Career Development Award and the American Diabetes Association Pathway to Stop Diabetes Initiator Award. E.A. reports advisory board fees from Apple and Foresite Labs. E.A. has ownership interest in SVEXA, Nuevocor, DeepCell and Personalis, outside the submitted work. E.A. is a board member of AstraZeneca. J.O.S. is supported by an AHA postdoctoral fellowship and an ACC postdoctoral fellowship and has had consultancy relationships with Google AI and Foresite Labs. V.P. has consulting and advisory relationships with BioMarin, Lexeo Therapeutics and Viz.ai and receives funding from BioMarin, the John Taylor Babbitt Foundation, the Sarnoff Cardiovascular Research Foundation and NHLBI R01HL168059 and K08HL143185. S.S.J. reports consulting fees from Bristol Myers Squibb, ARTIS Ventures and Broadview Ventures outside of the submitted work. The other authors declare no competing interests.

The journal thanks the anonymous reviewers for their contribution to the peer review of this work. Primary Handling Editor: Lorenzo Righetto.

Extended Data figure captions: UI for AMIE assistance, including the case text, AMIE's assessment and a text-chat interface with AMIE; an ID links the case to the raw imaging from Stanford. UI for subspecialist evaluators, including the case text and two assessments. Preference form for subspecialist evaluators, on which subspecialist cardiologists rate their preference between the two assessments on the preference criteria. Individual assessment form for subspecialist evaluators, on which subspecialist cardiologists rate each assessment on the set of individual criteria.

Prior to the randomized study, AMIE was provided with clinical text from the various cardiac tests for each patient and asked to complete the top portion of the assessment form. During the RCT, general cardiologists had access to the clinical text from these cardiac tests as well as the raw imaging, and they completed the full assessment form. The 'AI Use' questions were only completed by the study arm that had access to AMIE. First, subspecialists were provided with two different responses and asked to supply their preference for nine different aspects of the response as well as the entire response. Next, subspecialists independently answered each of five questions for each response.
They were blinded to the source of each response when performing these evaluations.
