## OpenAI’s Whisper AI Model Found to Overestimate Accuracy for Indian Languages

**Thiruvananthapuram, November 15, 2024:** Researchers at Digital University Kerala (DUK) have identified a significant flaw in how OpenAI evaluates Whisper, its AI model for transcribing speech into text. Their research, presented at the ongoing Empirical Methods in Natural Language Processing (EMNLP) conference in Florida, shows that OpenAI’s accuracy-checking process fails to account for the vowel signs used in Indian languages, leading to an inflated estimate of the model’s performance.

DUK’s Virtual Resource Centre for Language Computing (VRCLC), led by Elizabeth Sherly, found that OpenAI’s assessment process ignores vowel signs and the ‘chandrakkala’ (virama) in Indian languages such as Malayalam. For instance, when these signs are stripped from the Malayalam rendering of “Digital University,” it becomes “ഡ ജ റ റ ൽ യ ണ വ ഴ സ റ റ,” which is no longer readable and hides any errors in the removed signs. As a result, the AI’s transcription accuracy is overestimated.
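The effect can be illustrated with a minimal Python sketch. It assumes the scoring step drops every Unicode combining mark (general category starting with `M`), which covers Malayalam vowel signs and the virama; the exact normalization OpenAI applies is not described in this article. The helper names, the character-level error rate, and the mis-transcribed hypothesis string are all hypothetical and chosen only for illustration.

```python
import unicodedata


def strip_marks(text: str) -> str:
    """Drop code points whose Unicode category starts with 'M' (assumed normalization)."""
    return "".join(ch for ch in text if not unicodedata.category(ch).startswith("M"))


def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in code points."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref: str, hyp: str) -> float:
    """Character-level error rate: edits divided by reference length."""
    return edit_distance(ref, hyp) / len(ref) if ref else 0.0


# Reference: the Malayalam word for "Digital" (DDA, sign I, JA, sign I, RRA, virama, RRA, chillu L).
REF = "\u0d21\u0d3f\u0d1c\u0d3f\u0d31\u0d4d\u0d31\u0d7d"
# Hypothetical transcription with two wrong vowel signs (sign U and sign AA).
HYP = "\u0d21\u0d41\u0d1c\u0d3e\u0d31\u0d4d\u0d31\u0d7d"

print(error_rate(REF, HYP))                            # 0.25: both vowel errors are counted
print(error_rate(strip_marks(REF), strip_marks(HYP)))  # 0.0: the errors vanish after stripping
```

In this sketch the two vowel-sign mistakes count as errors when the raw strings are compared, but disappear once the marks are stripped, so the stripped comparison reports a perfect score for an incorrect transcription.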

The researchers also found similar issues with Meta’s evaluation of its own AI models. This highlights a pressing need for AI companies to develop more comprehensive quality-checking methods that account for the unique complexities of regional languages.

DUK’s VRCLC is committed to developing AI systems that cater to the linguistic nuances of Indian languages, with the aim of improving accuracy and usability for a wider range of users.
