Healthcare Chatbots Provoke Unease Among AI Governance Analysts
When an AI chatbot suggests adding glue to pizza, the error is obvious. When it recommends eating more bananas, sound nutritional advice for most people but dangerous for someone with kidney failure because of bananas' high potassium content, the mistake is far harder to spot. Errors like these often go unnoticed, and they reach hundreds of millions of users with minimal regulatory oversight.
OpenAI recently launched ChatGPT Health, which lets users link medical records and wellness apps for personalized health guidance. The company reports that more than 230 million people ask ChatGPT health-related questions each week, and 40 million ask for medical advice daily.
Google, meanwhile, has partnered with the health data platform b.well, a sign that similar products may soon follow.
Expert Concerns
Even seasoned AI experts express skepticism regarding these developments. Diana Kelley, Chief Information Security Officer at Noma Security, articulated concerns about the inherent nature of AI models. “They are probabilistic, next-token generators that lack the ability to recognize when they lack sufficient information,” she stated. “These models are adept at producing text that appears plausible and authoritative, even when it is not.”
Using chatbots for healthcare amplifies the risk because of what Kelley calls verification asymmetry. In coding, a wrong output usually fails quickly and visibly; in medicine, guidance depends on patient-specific context that AI systems typically lack. An answer can sound reasonable and still be wrong for that particular patient.
AI Safety Evaluation
Standard AI safety evaluations often miss these high-risk outputs. Most focus on explicit policy violations or outright factual errors while rewarding fluency and empathy. Koustuv Saha, an assistant professor at the University of Illinois, noted that this approach lets subtly misleading advice pass safety checks unchallenged.
Shannon Germain Farraher, a senior healthcare analyst at Forrester, emphasized that healthcare organizations demand a far higher bar for accuracy. “Medical advice cannot tolerate the ‘coherent nonsense’ that is acceptable in less critical domains,” she said, adding that human oversight is needed to catch the subtle problems AI models miss.
Risks in Conversational AI
The conversational design of these models tends to reinforce what a user has already said rather than challenge it. “These risks are often implicit and arise from omitted details and smoothed-over uncertainty,” Saha explained. As conversations progress, especially when users express fear or trauma, models prioritize being supportive over strict adherence to rules.
Proposed Solutions
Naga, an AI governance advocate, suggests that mandatory citation could serve as a crucial technical safeguard. “We need systems that not only provide answers but also require highlighting the specific medical sources supporting those answers.” He also recommends product friction, such as keeping answers blurred until users acknowledge a disclaimer, to improve safety.
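To make the idea concrete, here is a minimal, hypothetical sketch of what such a citation-gated, friction-first response flow could look like. The names (HealthAnswer, deliver, render_blurred) and the placeholder guideline URL are invented for illustration and do not describe any vendor's actual implementation.

```python
# Illustrative sketch only: a hypothetical gate that withholds a health answer
# until it carries source citations and the user has acknowledged a disclaimer.
from dataclasses import dataclass, field

DISCLAIMER = "This is not medical advice. Consult a clinician for decisions about your care."

@dataclass
class HealthAnswer:
    text: str
    sources: list[str] = field(default_factory=list)  # e.g., links to clinical guidelines

def render_blurred(answer: HealthAnswer) -> str:
    """Show only a placeholder until the user accepts the disclaimer."""
    return f"[Answer hidden - {len(answer.sources)} cited source(s)]\n{DISCLAIMER}"

def deliver(answer: HealthAnswer, user_acknowledged: bool) -> str:
    # Mandatory citation: never surface guidance that has no supporting sources.
    if not answer.sources:
        return "No supporting medical sources were found; please consult a clinician."
    # Product friction: keep the answer blurred until the disclaimer is acknowledged.
    if not user_acknowledged:
        return render_blurred(answer)
    cited = "\n".join(f"- {s}" for s in answer.sources)
    return f"{answer.text}\n\nSources:\n{cited}"

if __name__ == "__main__":
    answer = HealthAnswer(
        text="Increasing potassium intake is not appropriate for everyone.",
        sources=["https://example.org/ckd-potassium-guideline"],  # placeholder URL
    )
    print(deliver(answer, user_acknowledged=False))  # blurred placeholder
    print(deliver(answer, user_acknowledged=True))   # full answer with citations
```

The design choice in this sketch is that an uncited answer is never shown at all, while a cited one is merely delayed behind an acknowledgment, which is the kind of deliberate friction Naga describes.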
However, implementing such friction can be challenging, as companies often prioritize creating a seamless user experience over ensuring safety. “Adding warnings or prompting the AI to say ‘I don’t know’ can reduce user engagement,” Naga noted.
Unresolved Liability Framework
The liability framework surrounding AI in healthcare remains ambiguous. There is currently no unified federal law or industry standard regulating consumer health chatbots. While the government promotes AI adoption through initiatives like America's AI Action Plan, those efforts focus more on fostering innovation than on imposing safeguards.
As a result, consumer health chatbots operate in a fragmented governance landscape with few proactive constraints. That gap poses strategic risk for organizations deploying health AI and underscores the need for comprehensive regulatory frameworks.