A peer-reviewed study published in the journal BMJ Quality & Safety warns patients not to rely on artificial intelligence (AI)-powered search engines and chatbots to always provide safe and accurate drug information, after finding that a considerable number of the answers these systems provide are wrong or, worse, potentially harmful. The researchers also found that the complexity of the answers may make them hard for patients to fully understand without a college or university education.
Capable of generating disinformation and nonsensical or harmful content
Most search engines have undergone significant shifts with the introduction of AI-powered chatbots that offer the promise of enhanced results, answers, and interactive experiences. These systems can be trained on extensive datasets from the internet, which enables them to converse on almost any topic. However, they are also just as capable of generating disinformation and nonsensical or harmful content.
Investigating the safety and accuracy of AI chatbots
Past research investigating these types of AI-powered search engines and chatbots has focused primarily on the perspective of healthcare professionals rather than patients. This study set out to bridge that gap by investigating the completeness, accuracy, and readability of these systems' answers to queries about the 50 most frequently prescribed drugs in the United States in 2020, using Bing Copilot, an AI-powered search engine with chatbot features.
The chatbot was asked 10 questions for each of the 50 most commonly prescribed drugs, generating 500 answers in total, spanning what each drug is used for, how it works, instructions for use, common side effects, and contraindications.
Readability
The readability of the chatbot's answers was assessed by calculating the Flesch Reading Ease Score, which estimates the level of education needed to understand a particular text. Scores between 0 and 30 indicate text that is very difficult to read and generally requires a degree-level education, while scores between 91 and 100 indicate text easy enough for an 11-year-old to read.
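For reference, the Flesch Reading Ease Score is derived from average sentence length and average syllables per word. The sketch below is only an illustration of the formula; the simple vowel-group syllable counter is an assumption for demonstration purposes and is not the tool used in the study.

# Minimal sketch of the Flesch Reading Ease Score (FRES):
# FRES = 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
# The syllable counter is a rough heuristic, not the study's actual method.
import re

def count_syllables(word: str) -> int:
    # Approximate syllables as runs of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text: str) -> float:
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return 206.835 - 1.015 * (len(words) / sentences) - 84.6 * (syllables / len(words))

# Short, plain sentences score high (easy); long, polysyllabic ones score low.
print(round(flesch_reading_ease("Take one tablet twice daily with food."), 1))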
Accuracy and completeness
To assess accuracy and completeness, the chatbot's responses were compared with the drug information provided by drugs.com, a peer-reviewed and up-to-date drug information website for healthcare professionals and patients.
Extent of possible harm
To assess alignment with current scientific consensus, as well as the likelihood and extent of possible harm from following the chatbot's recommendations, seven experts in medication safety evaluated a subset of 20 answers that displayed low accuracy or completeness, or a potential risk to patient safety.
Additionally, the experts rated safety events using the Agency for Healthcare Research and Quality (AHRQ) harm scales and judged the likelihood of possible harm in accordance with a validated framework.
Results
Reading Score: The overall average Flesch Reading Ease Score was slightly over 37, meaning that the average person without a degree-level education would have difficulty fully understanding the answers. Even the chatbot answers with the highest readability still required at least a high-school level of education.
Completeness: The highest average completeness for an answer was 100%, and the overall average was 77%. Five of the 10 questions were answered with the highest average completeness, while question 3, “What do I have to consider when taking the drug?”, had the lowest average completeness at only 23%.
Accuracy: However, chatbot statements did not match the reference data in 126 of 484 answers, and were fully inconsistent with it in just over 3% of answers.
Scientific Consensus: Assessment of the subset of 20 answers revealed that only 54% aligned with scientific consensus, 39% contradicted it, and there was no established scientific consensus for the remaining 6%.
Possible Harm: Possible harm from following the chatbot's recommendations was rated as highly likely in 3% of the answers and moderately likely in 29%, while 34% were judged either unlikely or not at all likely to result in harm if followed.
Alarmingly: Irrespective of the likelihood of possible harm, 42% of the chatbot answers were considered likely to lead to mild or moderate harm and 22% to severe harm or death, while 36% were considered to lead to no harm.
The researchers noted that this study did not draw on real patient experiences, and that prompts in different languages or from different countries could also affect the quality of chatbot answers.
Conclusions
“In this cross-sectional study, we observed that search engines with an AI-powered chatbot produced overall complete and accurate answers to patient questions,” they write.
“However, chatbot answers were largely difficult to read and answers repeatedly lacked information or showed inaccuracies, possibly threatening patient and medication safety,” they add.
The researchers also noted that a major drawback was the chatbot’s inability to understand the underlying intent of a patient question.
“Despite their potential, it is still crucial for patients to consult their healthcare professionals, as chatbots may not always generate error-free information. Caution is advised in recommending AI-powered search engines until citation engines with higher accuracy rates are available,” the researchers concluded.
As with anything you read on the internet, this article should not be construed as medical advice; please talk to your doctor or primary care provider before changing your wellness routine. WHN does not agree or disagree with any of the materials posted. This article is not intended to provide a medical diagnosis, recommendation, treatment, or endorsement. Additionally, it is not intended to malign any religion, ethnic group, club, organization, company, individual, or anyone or anything. These statements have not been evaluated by the Food and Drug Administration.
Content may be edited for style and length.
References/Sources/Materials provided by:
https://bmjgroup.com/dont-rely-on-ai-chatbots-for-accurate-safe-drug-information-patients-warned
https://qualitysafety.bmj.com/lookup/doi/10.1136/bmjqs-2024-017476