LLM-based generation of USMLE-style questions with ASPET/AMSPC knowledge objectives: All RAGs and no riches

Br J Clin Pharmacol. 2025 Jun 8. doi: 10.1002/bcp.70119. Online ahead of print.

ABSTRACT

Developing high-quality pharmacology multiple-choice questions (MCQs) is challenging, in large part due to continually evolving therapeutic guidelines and the complex integration of basic science and clinical medicine in this subject area. Large language models (LLMs) such as ChatGPT-4 have repeatedly demonstrated proficiency in answering medical licensing exam questions, prompting interest in their use for generating high-stakes exam-style questions. This study evaluates the performance of ChatGPT-4o in generating USMLE-style pharmacology questions based on American Society for Pharmacology and Experimental Therapeutics/Association of Medical School Pharmacology Chairs (ASPET/AMSPC) knowledge objectives and assesses the impact of retrieval-augmented generation (RAG) on question accuracy and quality. Using standardized prompts, 50 questions (25 RAG and 25 non-RAG) were generated and subsequently evaluated by expert reviewers. Results showed higher accuracy for non-RAG questions (88.0% vs. 69.2%), though the difference was not statistically significant. No significant differences were observed in other quality dimensions. These findings suggest that sophisticated LLMs can generate high-quality pharmacology questions efficiently without RAG, though human oversight remains crucial.

PMID:40483567 | DOI:10.1002/bcp.70119

from artificial intelligence https://pubmed.ncbi.nlm.nih.gov/40483567/
