An Intelligent Speech Recognition Framework Using Hidden Markov Models and Actor–Critic Deep Reinforcement Learning for Low-Resource African Languages

Samson Isaac; Muhammad Aminu Ahmad; Peter Ayuba

doi:10.70882/josrar.2026.v3i3.151

Authors

Samson Isaac, Federal University of Applied Sciences Kachia
Muhammad Aminu Ahmad, Kaduna State University
Peter Ayuba, Kaduna State University

DOI:

https://doi.org/10.70882/josrar.2026.v3i3.151

Keywords:

Automatic Speech Recognition (ASR), Hidden Markov Model (HMM), Actor–Critic Deep Reinforcement Learning, Deep Deterministic Policy Gradient (DDPG), Mel-Frequency Cepstral Coefficients (MFCC)

Abstract

Automatic speech recognition (ASR) plays a crucial role in assistive technologies for individuals with disabilities. It enables them to manage their surroundings more effectively by dialing phone numbers, operating light switches, and controlling home appliances, and it thereby contributes to the development of smart home systems. This study extracts features from isolated speech using Mel-Frequency Cepstral Coefficients (MFCC) and Bidirectional Long Short-Term Memory (BiLSTM) networks to ensure speaker invariance and enhance feature localization. Deep learning techniques were employed to explicitly normalize speech spectral features, and LSTM-based architectures have proven effective in numerous pattern recognition and regression tasks. The novelty of this study lies in integrating a hybrid MFCC–DNN–HMM framework to achieve high recognition accuracy for isolated words. The model achieved an accuracy of 0.945 (94.5%), meaning that it correctly classified 94.5% of all instances. The precision was 0.901, meaning that 90.1% of the instances identified as positive were correctly classified. The recall was 0.92, indicating that the system detected 92% of the actual positive instances. The F1-score was 0.909, reflecting a balanced measure of precision and recall.
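The abstract names a hybrid MFCC–DNN–HMM framework for isolated words but does not spell out how the network and the HMMs are combined. A minimal sketch of the standard hybrid recipe follows, under stated assumptions: the network's per-frame state posteriors are divided by state priors to give scaled likelihoods, and each word's HMM is scored with the log-space forward algorithm. The shared state inventory, the left-to-right topology, `p_stay=0.6`, and the function names `log_forward`, `left_to_right`, and `recognise` are all hypothetical. The MFCC front end and the trained network are replaced by a toy posterior matrix, so this is not the authors' implementation.

```python
import numpy as np

def log_forward(log_emis, log_trans, log_init):
    """Log-space forward algorithm; returns log p(O | model) for paths ending in the last state.

    log_emis:  (T, S) per-frame log emission scores for each state of this word's HMM
    log_trans: (S, S) log transition matrix, rows = from-state, columns = to-state
    log_init:  (S,)   log initial-state probabilities
    """
    alpha = log_init + log_emis[0]
    for t in range(1, log_emis.shape[0]):
        # alpha_t(j) = logsumexp_i(alpha_{t-1}(i) + log a_ij) + log b_j(o_t)
        alpha = np.logaddexp.reduce(alpha[:, None] + log_trans, axis=0) + log_emis[t]
    return alpha[-1]  # isolated word: require the path to end in the final state

def left_to_right(n_states, p_stay=0.6):
    """Hypothetical left-to-right topology: each state may loop or advance one step."""
    trans = np.zeros((n_states, n_states))
    for i in range(n_states - 1):
        trans[i, i], trans[i, i + 1] = p_stay, 1.0 - p_stay
    trans[-1, -1] = 1.0
    init = np.zeros(n_states)
    init[0] = 1.0
    with np.errstate(divide="ignore"):  # log(0) -> -inf marks forbidden moves
        return np.log(trans), np.log(init)

def recognise(posteriors, priors, word_models):
    """Return the word whose HMM best explains the frames, plus all word scores.

    posteriors:  (T, S) network softmax outputs P(state | frame) over a shared state inventory
    priors:      (S,)   state priors P(state), e.g. estimated from training alignments
    word_models: {word: list of state indices used by that word's HMM}
    """
    # Hybrid trick: p(frame | state) is proportional to P(state | frame) / P(state)
    scaled = np.log(posteriors) - np.log(priors)
    scores = {}
    for word, states in word_models.items():
        log_trans, log_init = left_to_right(len(states))
        scores[word] = log_forward(scaled[:, states], log_trans, log_init)
    return max(scores, key=scores.get), scores

# Toy stand-in for network output: 6 frames, 6 shared states;
# "yes" uses states 0-2 and "no" uses states 3-5 (hypothetical vocabulary).
T, S = 6, 6
posteriors = np.full((T, S), 0.06)
for t, s in enumerate([0, 0, 1, 1, 2, 2]):
    posteriors[t, s] = 0.7  # each row still sums to 1.0
priors = np.full(S, 1.0 / S)
best, scores = recognise(posteriors, priors, {"yes": [0, 1, 2], "no": [3, 4, 5]})
print(best)  # yes
```

Scoring each word model separately and taking the maximum is what makes this an isolated-word recognizer. Continuous recognition would instead require a single decoding graph over word sequences.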
