Studies on ChatGPT and AI in medicine

More and more studies are looking at the use of ChatGPT in medical practices, in research or on medical issues in general. The results are usually astonishing. Many studies suggest that AI technologies, such as ChatGPT, can not only increase efficiency within a healthcare organization, but also raise the quality of the work results of doctors and medical staff to a new level.

In this article, I present the most relevant and up-to-date studies and give my own assessment of the results with regard to their use in the Swiss and German healthcare world.

ChatGPT writes reports ten times faster than doctors

Researchers from Basel and Sweden have tested the chatbot with six virtual patient cases.

The researchers invented six patient cases and wrote fictitious medical notes for them. From these, ChatGPT and real medical specialists created medical reports for the six patients. A 15-member panel of experts assessed the quality of the reports and the speed with which they were produced. The panel did not know whether a human or a machine had written each report.

By and large, the quality of the reports from AI and humans was comparable. Both made roughly the same number of errors, and corrections were necessary for several reports. However, the ChatGPT-4 model created the documents ten times faster than the doctors.

The ChatGPT-4 AI model can write medical reports up to ten times faster than doctors without compromising on quality. This is the conclusion drawn by researchers from the Department of Orthopaedics and Traumatology at the University Hospital Basel (USB) from a pilot study with six virtual patient cases. They conducted the study together with Swedish colleagues from the Karolinska Institute, Uppsala University Hospital and Danderyd Hospital.

Link to study: ChatGPT-4 generates orthopedic discharge documents faster than humans maintaining comparable quality: a pilot study of 6 cases

My assessment of the study

The results of the study are impressive and reflect what I have already seen myself in pilot tests with medical practices. Provided ChatGPT receives a suitable prompt, the quality of its medical reports is extremely good; the quality and precision of the prompt have a major influence on the result.
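To make the point about prompt quality concrete, here is a minimal sketch of how a structured report prompt can be composed. It is purely illustrative: the section headings, the wording and the helper name are my own assumptions, not the prompt actually used in the study.

```python
# Illustrative only: a structured prompt template for a discharge report.
# The role, sections and instructions are assumptions for demonstration.

def build_report_prompt(notes: str, language: str = "German") -> str:
    """Compose a structured prompt asking a language model for a discharge report."""
    sections = [
        "Diagnosis",
        "Course of treatment",
        "Medication on discharge",
        "Follow-up recommendations",
    ]
    return (
        f"You are an experienced orthopedic surgeon. Write a discharge report in {language} "
        "based strictly on the clinical notes below. Do not invent findings.\n"
        "Use exactly these sections: " + ", ".join(sections) + ".\n\n"
        f"Clinical notes:\n{notes}"
    )

prompt = build_report_prompt("Status post total hip arthroplasty, day 4, mobilizing well.")
```

The design choice here is to constrain both the role and the output structure explicitly; in my own pilot tests, vague one-line prompts produced noticeably weaker reports than templates of this kind.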

I would also like to point out that ChatGPT in no way complies with data protection standards. It is a good option for quality and efficiency tests, but if AI technologies are to be used in day-to-day operations, secure (Swiss) solutions must be used. One of these is SwissGPT from AlpineAI.

Ultimately, the fact remains that, as is already the case today with conventionally created documents, every report has to be checked by people.

ChatGPT: More empathetic than a doctor

In California and Wisconsin, USA, several hospitals tested the OpenAI software ChatGPT as part of a comprehensive pilot project. The chatbot was tasked with answering questions that patients had submitted via a social media forum. The result: according to the study, the bot performed better than the human doctors.

For the study, written answers from doctors to real health questions were compared with answers given by ChatGPT. A team of licensed healthcare professionals then evaluated the results and decided – without knowing which answer came from a human and which from the AI – which answer was better. They rated both the quality of the information provided (very poor, poor, acceptable, good or very good) and the empathy or bedside manner on a five-point scale from "not empathetic" to "very empathetic". The averages, on a scale of 1 to 5, were then compared between the chatbot and the doctors. The result was quite clear: in 79 percent of cases, the panel preferred ChatGPT's answers. The evaluators even came to the conclusion that ChatGPT's answers were both of higher quality and more empathetic.

The conclusion of the study states: “Further research into this technology in the clinical environment is required, for example by using chatbots to draft answers to patient questions. Doctors can then process these. Randomized trials could further investigate whether the use of AI assistants could improve responses to patient questions, reduce physician burnout and improve patient care.”

More information on the study “ChatGPT: More empathetic than a doctor?” can be found at this link.

My assessment of the study

ChatGPT’s ability to recognize users’ moods and respond to them individually has been used in customer service for some time now and has proven very useful in many studies. I am therefore not surprised that AI technologies such as ChatGPT are also perceived as empathetic in a medical context, and sometimes even as warmer or more appropriate in tone than a human. Of course, one always has to consider whom the AI is being compared with. In healthcare, care staff are often short of time, which makes it difficult for them to strike an individual, personal tone.

Ultimately, however, it must again be borne in mind that ChatGPT is not directly suitable for this use due to data protection. Comparable but secure AI technologies, however, deliver similarly good results.

ChatGPT overall better than doctors at drawing medical conclusions

In a comparative test in the USA, ChatGPT achieved better scores in diagnosing illnesses than well-trained medical staff, although the AI also made some serious mistakes. Overall, ChatGPT-4 outperformed senior and resident physicians in processing medical data and in clinical reasoning, despite individual errors. The comparative test was carried out at the Beth Israel Deaconess Medical Center (BIDMC) in Boston, based on a scoring system recognized by doctors, the so-called "r-IDEA score".

The researchers recruited 21 senior physicians and 18 residents, who each worked on one of 20 selected clinical cases consisting of four consecutive phases of diagnostic thinking. The authors instructed the doctors to write down and justify their differential diagnoses at each stage. The chatbot GPT-4 received identical instructions for all 20 cases. Responses were then scored on clinical judgment (r-IDEA score) and various other measures of reasoning.

In the r-IDEA score, ChatGPT came out on top with an average of ten out of ten points. The senior physicians scored an average of nine out of ten, the residents eight out of ten. In some areas, however, the chatbot made obvious mistakes that the human staff did not.

The entire study Clinical Reasoning of a Generative Artificial Intelligence Model Compared With Physicians can be read here.

My assessment of the study

The fact that ChatGPT performs better than humans even on content-related questions impressed me. But it illustrates once again that collaboration between humans and artificial intelligence is becoming increasingly important: ChatGPT also made gross errors in the study that a human would have caught quickly. As Thilo Stadelmann once said on SRF, AI is probably most useful when it supports the human thought process but does not replace it. Alternatively, AI technologies can serve as a control instance that helps people ensure nothing is overlooked.

ChatGPT as good as doctors at diagnosis in the emergency room

According to a study published in the journal “Annals of Emergency Medicine”, ChatGPT makes diagnoses for patients in the emergency room that are at least as accurate as those made by doctors. According to the Dutch study authors, the chatbot, which uses artificial intelligence (AI), even outperformed the doctors’ work in some cases – but was still prone to errors.

For their study, the researchers examined 30 cases of patients who had been treated in a Dutch emergency room in the past year. They fed ChatGPT with the anonymized patient data, lab tests and the doctors’ observations and asked the chatbot to make five possible diagnoses. They then compared these with the doctors’ list of diagnoses and finally matched them with the correct diagnosis.

The doctors' five suggestions included the correct diagnosis in 87 percent of cases; ChatGPT version 3.5 even reached 97 percent. Simply put, this means that ChatGPT was able to suggest medical diagnoses much like a human doctor would. However, as in other areas, the chatbot also showed some weaknesses: sometimes its reasoning was "medically implausible or contradictory", according to the study, which could lead to "misinformation or misdiagnosis", with correspondingly serious consequences.
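The evaluation logic described above can be sketched in a few lines: each case yields a ranked list of five differential diagnoses, and a case counts as a hit if the confirmed diagnosis appears anywhere in that list. The cases below are invented for illustration and have nothing to do with the study's actual data.

```python
# Sketch of a top-5 hit-rate evaluation, as described in the Dutch study.
# The example cases are invented; only the scoring logic is of interest.

def top5_hit_rate(cases):
    """cases: list of (correct_diagnosis, [five suggested diagnoses]) pairs.
    Returns the percentage of cases whose suggestion list contains the
    correct diagnosis."""
    hits = sum(1 for correct, suggestions in cases if correct in suggestions[:5])
    return 100.0 * hits / len(cases)

example = [
    ("appendicitis", ["gastroenteritis", "appendicitis", "colitis", "ileus", "cystitis"]),
    ("pulmonary embolism", ["pneumonia", "pleuritis", "asthma", "bronchitis", "reflux"]),
]
print(top5_hit_rate(example))  # prints 50.0 for this invented data
```

Note that a top-5 hit rate is a deliberately generous metric: it rewards listing the correct diagnosis anywhere among the suggestions, which is exactly why the study's authors still stress the need for human review of the reasoning behind each suggestion.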

Study co-author Steef Kurstjens puts the study into perspective himself, stating that he does not expect ChatGPT or similar AI technologies to take over the overall management of an emergency department. But AI technologies can help doctors under pressure to reach a diagnosis, saving time and reducing waiting times in the emergency room.

The study was published in the journal “Annals of Emergency Medicine”.

My assessment of the study

First of all, I would like to point out that the study was conducted with ChatGPT, but that in reality real patient data must not be entered into ChatGPT under any circumstances.

This study also shows once again that ChatGPT delivers fundamentally good or correct results and diagnoses. However, when it comes to deeper details or logical explanations, AI technologies such as ChatGPT are usually worse than a human.

AI-supported diagnostics: less discrimination against women

The results of the feasibility study "Frau.Herz.KI – Gerechte Medizin für Frauen" (Woman.Heart.AI – Fair Medicine for Women) on the early detection of coronary heart disease in women using artificial intelligence show that AI technologies can sometimes detect coronary heart disease better than traditional diagnostic tools. Women die from heart attacks more frequently than men. One problem is that they often do not show the same typical symptoms, which can lead to misdiagnosis or delayed treatment. For this reason, gender medicine has long been concerned with the influence of gender on various diseases and treatment methods.

Patient data from the Klinikum rechts der Isar in Munich and the Osypka Heart Center were used for the "Frau.Herz.KI – Gerechte Medizin für Frauen" project. The data was exported, processed and then analyzed using various AI systems. The initial tests are promising: on the basis of the data used, coronary heart disease could be predicted up to 19 percent better than by expert assessment. Working together with doctors, the AI cardiologist could significantly improve the diagnosis of heart disease and thus enable faster and more precise therapies.

According to the study, it is conceivable to develop a kind of "digital assistant doctor" from the project results in the future, which would help doctors overcome the gender health gap. In a heart attack, women are more likely to suffer from shortness of breath, back pain, nausea or pain in the upper abdomen. Gender-specific AI applications for women are therefore becoming increasingly important, yet hardly exist today.

According to the study, the next steps include obtaining new, more comprehensive data sets that include more women and the corresponding female influencing factors. Only in this way can the trained models be optimized to effectively support individualized diagnostics and treatment and to improve prevention.

More information on the project “Frau.Herz.KI – Gerechte Medizin für Frauen” here.

My assessment of the study

I would apply the relevance of this study not only to gender medicine, but to medicine in general with regard to marginalized groups. There are many cultures or groups of people whose health data have hardly been represented in research to date. And due to cost and time pressure, not all groups of people can always be considered. However, AI technologies can close precisely this gap and include marginalized groups in their diagnostics in a cost-neutral and scalable way.

AI chatbots are transforming patient care

According to a report by Harald Witte, Tobias Blatter and Alexander B. Leichtle from the Computational Medicine Group at Inselspital Bern, chatbots can already perform a range of tasks in healthcare today. This includes activities such as arranging doctor’s appointments or recording patient data, as well as sub-steps in the processing of insurance inquiries. Outsourcing such comparatively simple but time-consuming functions is already taking a huge burden off healthcare employees.

In addition, chatbots can communicate reliable medical information "fatigue-free". According to the Computational Medicine Group, AI will in future also be able to improve communication at the expert level: between medical staff and patients, but also between specialists from different disciplines. After all, in-depth specialist knowledge is no guarantee of being able to communicate it simply.

The full study "Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models" can be read here.

My assessment of the study

I am convinced that chatbots and voicebots are good AI applications for the healthcare sector. In addition to the examples shown above, I would like to emphasize that chatbots can always communicate with their users at the users' own level. This ability is particularly important in health communication. Whether it is a sensitive explanation of illnesses and symptoms or effective communication as part of health education, chat- and voicebots can tailor any content to the target group. This increases both the quality of the dialog and the efficiency of the medical staff.

Using ChatGPT to evaluate cancer myths and misconceptions: artificial intelligence and cancer information

There is little data on the quality of cancer information provided by chatbots and other AI technologies. The study "Using ChatGPT to evaluate cancer myths and misconceptions: artificial intelligence and cancer information" evaluates the accuracy of ChatGPT's cancer information against the National Cancer Institute's (NCI) answers to the questions on its Common Cancer Myths and Misconceptions website.

The NCI answers and the ChatGPT answers to each question were blinded and then rated for accuracy (accurate: yes or no). The ratings were scored independently for each question and then compared between the blinded NCI and ChatGPT answers. In addition, the word count and the Flesch-Kincaid readability level were assessed for each individual answer.

After expert review, the percentage of overall agreement for accuracy was 100% for the NCI answers and 96.9% for the ChatGPT outputs for questions 1 through 13 (κ = -0.03, standard error = 0.08). There were hardly any significant differences in word count or readability between the NCI and ChatGPT answers. Overall, the results suggest that ChatGPT provides accurate information about common cancer myths and misconceptions.
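The core of this methodology, counting the share of answers rated accurate per source, can be sketched very simply. The ratings below are invented for illustration and do not reproduce the study's raw data; the published figures additionally aggregate multiple blinded reviewers.

```python
# Sketch of the accuracy-agreement comparison described above.
# Each rating is a boolean: True means the blinded answer was rated accurate.
# The rating lists here are invented for illustration.

def percent_accurate(ratings):
    """Return the percentage of answers rated accurate."""
    return 100.0 * sum(ratings) / len(ratings)

nci = [True] * 13                 # e.g. all 13 NCI answers rated accurate
chatgpt = [True] * 12 + [False]   # e.g. one ChatGPT answer rated inaccurate

print(round(percent_accurate(nci), 1), round(percent_accurate(chatgpt), 1))
# prints 100.0 92.3
```

A percentage of agreement alone says nothing about chance agreement, which is why the study also reports Cohen's kappa alongside the raw accuracy figures.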

The full study Using ChatGPT to evaluate cancer myths and misconceptions: artificial intelligence and cancer information can be read here.

My assessment of the study

There are already many studies on the quality and accuracy of ChatGPT's answers. It must always be borne in mind that ChatGPT only learns from existing data. In this case, the underlying data appears to be of very high quality, which is why ChatGPT also provides good answers. However, this need not remain so in the future: other studies show that the accuracy of ChatGPT's answers has in some cases deteriorated as the underlying data has become less clean.

Conclusion: Studies on ChatGPT and AI in medicine

In general, the studies on the use of ChatGPT are extremely positive and put ChatGPT's AI technology in a very favorable light. However, it is important to note that ChatGPT is not suitable for sharing personal or confidential data. There are already good alternatives to ChatGPT, such as SwissGPT, that comply with Swiss data protection requirements.

If you would like to know more about this or similar topics, send me a message with your wishes and questions, either via WhatsApp or by e-mail.

Or you can take a look at my AI offering specifically for the healthcare sector.

Book now
Your personal consultation

Do you need support or have questions? Then simply make an appointment with me and get a personal consultation. I look forward to hearing from you!

> Concept & Strategy

> Keynotes, workshops and expert contributions

> Chatbots, Voicebots, ChatGPT
