AI-Powered Pronunciation Mistake Detection Using Gemini 1.5 Flash: A Training-Free Approach

Supritha P O, Omkar Mahale, Shalya Gaonkar, Shetty Aditya Udaya, Sooraj Devadiga

doi:10.55524/ijircst.2026.14.1.10

International Journal of Innovative Research in Computer Science and Technology (IJIRCST), Volume 14, Issue 1, 2026

Pages: 79-88

Abstract:

Pronunciation accuracy is a fundamental factor in effective language learning; however, many existing systems face difficulties in delivering real-time error analysis without relying on computationally intensive acoustic model training. This paper introduces an AI-driven pronunciation mistake detection system developed using Google Gemini 1.5 Flash, a low-latency multimodal large language model capable of directly processing spoken input. Unlike conventional approaches based on MFCC features or task-specific deep learning pipelines, the proposed system employs prompt-guided reasoning combined with algorithmic scoring methods to detect pronunciation errors at the word, phoneme, and prosodic levels. Learner speech is transmitted to the Gemini API, which generates a structured pronunciation analysis that includes phoneme-level interpretations and word-level discrepancies. These outputs are further processed by a custom scoring framework to evaluate pronunciation quality and produce clear, actionable feedback. Experimental evaluation using diverse English utterances demonstrates the system’s effectiveness in identifying vowel–consonant substitutions, omitted syllables, and stress-related errors. The findings underscore the potential of LLM-based audio reasoning as a lightweight, scalable, and real-time solution for automated pronunciation assessment.
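The abstract does not spell out the scoring framework, so the following is only a minimal sketch of one way word-level discrepancies could be scored after the model's analysis is available. It assumes the analysis yields a plain transcript of what the learner was heard to say; the function name word_discrepancies, the 0-100 scale, and the use of Python's difflib alignment are illustrative assumptions, not the authors' method.

```python
from difflib import SequenceMatcher


def word_discrepancies(reference: str, heard: str):
    """Align the target sentence with the heard transcript (hypothetical helper).

    Returns (score, issues), where score is the percentage of reference
    words matched exactly and issues lists the word-level differences.
    """
    ref = reference.lower().split()
    hyp = heard.lower().split()
    issues = []
    matched = 0
    # autojunk=False keeps the alignment exact for short word lists.
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            matched += i2 - i1
        elif tag == "replace":
            issues.append(("substituted", ref[i1:i2], hyp[j1:j2]))
        elif tag == "delete":
            issues.append(("omitted", ref[i1:i2], []))
        elif tag == "insert":
            issues.append(("inserted", [], hyp[j1:j2]))
    # Assumed scoring rule: share of reference words reproduced exactly.
    score = 100.0 * matched / len(ref) if ref else 0.0
    return score, issues


if __name__ == "__main__":
    score, issues = word_discrepancies("she sells sea shells", "she sell sea")
    print(score)
    for issue in issues:
        print(issue)
```

A sketch like this covers only the word level; the phoneme-level and prosodic checks named in the abstract would need the model's structured output and are not reproduced here.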

Keywords:

Pronunciation Error Detection; Gemini 1.5 Flash; Speech Processing; Multimodal LLMs; Prompt Engineering; Phoneme Analysis; Real-Time Pronunciation Feedback; AI-Assisted Learning
