
Can Large Artificial Intelligence-Based Linguistic Models Help to Obtain Information About Burning Mouth Syndrome?

  • Paula Benito López
  • Daniela Adamo
  • Vito Carlo Alberto Caponio
  • Jose González-Serrano
  • Alan Roger Dos Santos Silva
  • Miguel de Pedro Herráez
  • Rui Albuquerque
  • María Pía López Jornet
  • Vlaho Brailo
  • A Farag
  • Márcio Diniz Freitas
  • Noburo Noma
  • Riordain RN
  • Gonzalo Hernández
  • Rosa María López-Pintor
  • Complutense University
  • Link Campus University
  • University of Campinas
  • European University of Madrid
  • Guy's Hospital
  • King's College London
  • University of Murcia
  • University Clinical Hospital Centre Zagreb
  • Tufts University
  • University of Santiago de Compostela
  • Nihon University School of Dentistry

Research output: Contribution to journal › Article › peer-review

Abstract

Objective
Burning Mouth Syndrome (BMS) is an idiopathic chronic orofacial pain disorder that poses diagnostic and therapeutic challenges. Inexperienced clinicians may turn to online sources for information. The objective of this study was to evaluate the usefulness, quality, and readability of responses generated by three artificial intelligence large language models (AI-LLMs), namely ChatGPT-4, Gemini, and Microsoft Copilot, to frequently asked questions about BMS.

Materials and methods
Nine clinically relevant open-ended questions were identified through search-trend analysis and expert review. Standardized prompts were submitted, and the responses were independently rated by 12 international experts using a 4-point usefulness scale. Quality was evaluated with the QAMAI tool, and readability was measured with the Flesch-Kincaid Grade Level and Flesch Reading Ease scores. Statistical analyses included the Kruskal-Wallis test with Bonferroni correction.

Results
All AI-LLMs produced moderately useful responses, with no significant difference in global performance. Gemini achieved the highest overall quality scores, particularly for relevance, completeness, and provision of sources. Copilot scored lower in usefulness and source provision. No significant differences were found among the AI-LLMs. Average readability corresponded to a 12th-grade level, with ChatGPT's responses requiring the highest reading proficiency.

Conclusions
AI-LLMs show potential for generating reliable information on BMS, though variability in quality, readability, and source citation remains a concern. Continuous optimization is essential before they can be integrated into clinical practice.
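The two readability measures named in the abstract are simple formulas over word, sentence, and syllable counts. The sketch below shows the standard published coefficients for Flesch Reading Ease and Flesch-Kincaid Grade Level. The abstract does not say which software the authors used, so the text-splitting rules and the vowel-group syllable counter here are illustrative assumptions; dedicated readability tools count syllables more carefully and may give somewhat different scores.

```python
import re


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease: higher scores mean easier text."""
    asl = words / sentences  # average sentence length (words per sentence)
    asw = syllables / words  # average syllables per word
    return 206.835 - 1.015 * asl - 84.6 * asw


def flesch_kincaid_grade(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level: approximate US school grade."""
    asl = words / sentences
    asw = syllables / words
    return 0.39 * asl + 11.8 * asw - 15.59


def count_syllables(word: str) -> int:
    """Rough heuristic (assumption, not the study's method): count vowel groups."""
    w = word.lower()
    groups = len(re.findall(r"[aeiouy]+", w))
    # Drop a silent final 'e' ("syndrome"), but keep consonant + 'le' ("table").
    if w.endswith("e") and not w.endswith("le") and groups > 1:
        groups -= 1
    return max(groups, 1)


def readability(text: str) -> tuple[float, float]:
    """Return (Reading Ease, Grade Level) using naive sentence/word splitting."""
    words = re.findall(r"[A-Za-z]+", text)
    if not words:
        raise ValueError("text contains no words")
    # Treat each run of '.', '!' or '?' as one sentence end (at least one sentence).
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    syllables = sum(count_syllables(w) for w in words)
    return (
        flesch_reading_ease(len(words), sentences, syllables),
        flesch_kincaid_grade(len(words), sentences, syllables),
    )
```

For example, a 100-word passage of 5 sentences with 150 syllables scores about 59.6 on Reading Ease and about 9.9 on the Grade Level scale; the 12th-grade average reported above corresponds to noticeably longer sentences or more polysyllabic wording than that.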
Original language: English
Journal: Oral Diseases
Publication status: Published - 31 Aug 2025
