Automatic Detection of Hijaiyah Letters Pronunciation using Convolutional Neural Network Algorithm


  • Yana Aditia Gerhana, Faculty of Information and Communication Technology, Asia e-University, Malaysia; Department of Informatics, UIN Sunan Gunung Djati Bandung, Indonesia
  • Aaz Muhammad Hafidz Azis, UIN Sunan Gunung Djati Bandung, Indonesia
  • Diena Rauda Ramdania, UIN Sunan Gunung Djati Bandung, Indonesia
  • Wildan Budiawan Dzulfikar, UIN Sunan Gunung Djati Bandung, Indonesia
  • Aldy Rialdy Atmadja, UIN Sunan Gunung Djati Bandung, Indonesia
  • Deden Suparman, UIN Sunan Gunung Djati Bandung, Indonesia
  • Ayu Puji Rahayu, Faculty of Education, Fujian Normal University, Fuzhou, China



Keywords: Hijaiyah, speech recognition, MFCC, CNN, CRISP-DM


Abstract: Speech recognition technology can support learning to read the letters of the Qur'an. This study implements a convolutional neural network (CNN) to recognize the pronunciation of the hijaiyah letters. Features are extracted from each recorded pronunciation as Mel-frequency cepstral coefficients (MFCC) and then classified by a deep learning model based on the CNN algorithm. The system was developed following the CRISP-DM process model. In tests on 616 voice recordings covering the 28 hijaiyah letters, the best results were an accuracy of 62.45%, a precision of 75%, a recall of 50%, and an F1-score of 58%.
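
The MFCC extraction step named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the sample rate (16 kHz), frame length (25 ms), hop (10 ms), FFT size, number of mel filters, and number of coefficients are all assumed values that the abstract does not state, and the input is a synthetic tone rather than a recorded hijaiyah letter. The function names `mel_filterbank` and `mfcc` are hypothetical.

```python
import numpy as np

def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)

def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale from 0 Hz to Nyquist."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):      # rising edge of the triangle
            bank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):     # falling edge of the triangle
            bank[i - 1, k] = (right - k) / max(right - center, 1)
    return bank

def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_coeffs=13):
    """Return an (n_frames, n_coeffs) MFCC matrix; all defaults are assumptions.

    The signal must contain at least frame_len samples.
    """
    # Pre-emphasis boosts high frequencies before framing.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice the signal into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum of each frame (zero-padded to n_fft points).
    power = np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 / n_fft
    # Mel filterbank energies, then log compression (epsilon avoids log(0)).
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(energies + 1e-10)
    # Unnormalized DCT-II over the filter axis; keep the first n_coeffs coefficients.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_coeffs)[:, None] * (2 * n[None, :] + 1)
                   / (2 * n_filters))
    return log_energies @ basis.T

# Synthetic one-second 440 Hz tone with light noise stands in for a recording.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000
tone = np.sin(2 * np.pi * 440 * t) + 0.01 * rng.standard_normal(16000)
features = mfcc(tone)
print(features.shape)
```

Under these assumed settings, a one-second clip yields a 98 x 13 feature matrix. A matrix of this kind can be treated as a single-channel image and passed to a CNN classifier with 28 output classes, one per hijaiyah letter; the network architecture itself is not described in the abstract and is therefore not sketched here.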








