
## AI’s Hallucination Problem: Experts Question Reliability of Popular AI Models
**Sri City, April 17, 2025** – The growing prevalence of “hallucinations” in popular artificial intelligence (AI) models such as ChatGPT and DALL-E is raising serious concerns about their reliability, according to leading AI experts. Hallucinations, in which a model confidently fabricates an answer to a question it has not been trained to handle, have produced bizarre recommendations such as adding glue to pizza sauce or drinking urine to treat kidney stones.
A recent study revealed that ChatGPT-3.5 fabricated 55% of its references, and even the improved ChatGPT-4 still hallucinated in 18% of instances. This unreliability stems from AI models’ reliance on statistical associations learned during training rather than genuine language comprehension. Faced with an unfamiliar query, a model fills the gap with whatever associations it already has, often producing a factually incorrect but confidently worded response. DALL-E demonstrated the same failure mode when, despite repeated prompts, it could not generate an image of a room without any elephants.
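To see why this happens, consider a deliberately simplified sketch. The prompt and probability values below are invented for illustration; they stand in for the co-occurrence statistics a real model learns at vastly greater scale:

```python
import random

# Invented toy next-token distribution standing in for learned co-occurrence
# statistics; Atlantis has no capital, but the model has no way to know that.
next_token_probs = {
    "The capital of Atlantis is": {
        "Athens": 0.45,   # strong association with Greek legend
        "Paris": 0.30,    # generic "capital of X is Paris" pattern
        "unknown": 0.25,  # the honest answer carries the least probability mass
    }
}

def complete(prompt: str) -> str:
    """Sample a continuation weighted purely by statistical association."""
    probs = next_token_probs[prompt]
    tokens = list(probs)
    weights = list(probs.values())
    return random.choices(tokens, weights=weights)[0]

# Roughly three times out of four this prints a confident fabrication.
print(complete("The capital of Atlantis is"))
```

Nothing in this loop checks truth; the only criterion is which continuation was statistically most common in the training data.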
Experts highlighted two key criteria for evaluating AI model reliability: consistency (producing similar outputs for similar inputs) and factuality (giving correct answers or admitting a lack of knowledge). Hallucinations compromise factuality, and the problem is compounded by the use of flawed benchmarks to evaluate AI performance. Some models have been shown to score well simply because their training data included the benchmark’s test questions. This “gaming” of the system inflates performance metrics that then fail to carry over to real-world use.
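Detecting that kind of contamination is conceptually simple: check whether benchmark questions already appear in the training corpus. The sketch below uses a word n-gram overlap test; the corpus, questions, and n-gram length are all hypothetical:

```python
# Hypothetical contamination check: flag a benchmark item if any of its word
# n-grams appears verbatim in the training text, which would let a model
# memorize the test rather than demonstrate understanding.
def ngrams(text: str, n: int = 5) -> set:
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def is_contaminated(benchmark_item: str, training_text: str, n: int = 5) -> bool:
    """True if the item shares at least one n-gram with the training text."""
    return bool(ngrams(benchmark_item, n) & ngrams(training_text, n))

training_text = "trivia dump: the first person to walk on the moon was neil armstrong"
leaked = "Who was the first person to walk on the moon"
clean = "Which spacecraft carried the first lunar crew"

print(is_contaminated(leaked, training_text))  # True: shared 5-gram
print(is_contaminated(clean, training_text))   # False
```

Real decontamination pipelines work on the same principle, though at the scale of web-sized corpora the matching must be approximate and heavily optimized.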
While hallucinations are becoming less frequent for common queries as training data grows, experts caution that this is a temporary fix. Lasting improvement would require either a universally comprehensive, real-time knowledge base (which would amount to an all-knowing AI) or a shift in how models are developed. Proposed solutions include building specialized AI models for specific tasks, employing retrieval-augmented generation (RAG) to consult relevant databases at answer time, and using curriculum learning to improve model training. Even with these advances, human oversight and verification of AI-generated outputs will remain necessary.
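Of these, RAG is the most widely deployed. The sketch below shows the core idea with a toy keyword retriever; the documents and helper names are invented, and a production system would use an embedding index and an actual language model:

```python
# Minimal retrieval-augmented generation (RAG) sketch with illustrative data.
documents = [
    "Kidney stones are typically treated with hydration, pain relief, or lithotripsy.",
    "Pizza sauce is made from tomatoes, olive oil, garlic, and herbs.",
]

def retrieve(query: str, docs: list) -> str:
    """Toy retriever: pick the document sharing the most words with the query."""
    q = set(query.lower().split())
    return max(docs, key=lambda d: len(q & set(d.lower().split())))

def build_prompt(query: str) -> str:
    # Grounding the prompt in retrieved text gives the model facts to draw on,
    # instead of leaving it to fill the gap from statistical association alone.
    context = retrieve(query, documents)
    return f"Context: {context}\nQuestion: {query}\nAnswer using only the context."

# A real pipeline would send this prompt to a language model for generation.
print(build_prompt("How are kidney stones treated?"))
```

The design choice is the key point: by constraining the model to answer from retrieved text, RAG trades open-ended fluency for verifiable grounding, which is exactly the trade-off the experts are calling for.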