YOLOv5 and U-Net-based Character Detection for Nusantara Script

Authors

  • Agi Prasetiadi, Institut Teknologi Telkom Purwokerto, Indonesia
  • Julian Saputra, Institut Teknologi Telkom Purwokerto, Indonesia
  • Iqsyahiro Kresna, Institut Teknologi Telkom Purwokerto, Indonesia
  • Imada Ramadhanti, Institut Teknologi Telkom Purwokerto, Indonesia

DOI:

https://doi.org/10.15575/join.v8i2.1180

Keywords:

Nusantara Script, Character Detection, Bounding Box, YOLO, U-Net

Abstract

Indonesia has a diverse range of indigenous scripts, collectively called Nusantara scripts, which include the Bali, Batak, Bugis, Javanese, Kawi, Kerinci, Lampung, Pallava, Rejang, and Sundanese scripts. However, prevailing character detection techniques cater predominantly to Latin or Chinese scripts. Our prior work concentrated on classifying script types and recognizing characters within Nusantara script systems. This study extends that work with object detection: we employ the YOLOv5 model to locate the basic characters of Nusantara scripts in input document images, and incorporate the U-Net model to improve performance. We then investigate rearranging the detected character positions to follow the distinctive writing styles of Nusantara scripts. In our experiments, YOLOv5 reaches a loss of approximately 0.05 in character location detection, while the U-Net model predicts character regions with an accuracy ranging from 75% to 90%. Although YOLOv5 does not detect every Nusantara script flawlessly, integrating the U-Net model improves the detection rate by 1.2%.
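The abstract does not specify how the U-Net output is combined with the YOLOv5 boxes, or how character positions are rearranged. The following is only a minimal sketch of one plausible post-processing step, under stated assumptions: detections arrive as pixel boxes `(x1, y1, x2, y2)`, the region prediction is a binary mask (1 = character region), boxes with too little mask coverage are discarded, and the remaining boxes are ordered top to bottom and left to right. The function names, the `min_coverage` threshold, and the `row_tolerance` parameter are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# (x1, y1, x2, y2) in pixels; x2 and y2 are exclusive. Assumed box format.
Box = Tuple[int, int, int, int]


def mask_coverage(box: Box, mask: Sequence[Sequence[int]]) -> float:
    """Fraction of pixels inside `box` that the binary mask marks as 1."""
    x1, y1, x2, y2 = box
    area = (x2 - x1) * (y2 - y1)
    if area <= 0:
        return 0.0
    hits = sum(mask[y][x] for y in range(y1, y2) for x in range(x1, x2))
    return hits / area


def filter_boxes(boxes: List[Box], mask: Sequence[Sequence[int]],
                 min_coverage: float = 0.5) -> List[Box]:
    """Keep only boxes that the region mask sufficiently supports."""
    return [b for b in boxes if mask_coverage(b, mask) >= min_coverage]


def reading_order(boxes: List[Box], row_tolerance: float) -> List[Box]:
    """Group boxes into rows by vertical centre, then sort each row by x1.

    Assumes a left-to-right, top-to-bottom layout (an assumption of this sketch).
    """
    rows: List[Tuple[float, List[Box]]] = []
    for box in sorted(boxes, key=lambda b: (b[1] + b[3]) / 2):
        centre_y = (box[1] + box[3]) / 2
        # Join the current row if the centre is close to that row's first box.
        if rows and abs(centre_y - rows[-1][0]) <= row_tolerance:
            rows[-1][1].append(box)
        else:
            rows.append((centre_y, [box]))
    return [b for _, row in rows for b in sorted(row, key=lambda b: b[0])]


# Toy example: two character regions on the first text line, none below.
mask = [
    [1, 1, 0, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0],
]
boxes = [(4, 0, 6, 2), (0, 0, 2, 2), (2, 2, 4, 4)]
kept = filter_boxes(boxes, mask)
ordered = reading_order(kept, row_tolerance=1.0)
print(ordered)
```

In this toy input the third box lies entirely outside the masked regions and is dropped, and the two surviving boxes are returned in left-to-right order. A real pipeline would also need to handle diacritics stacked above or below a base character, which this row-based ordering does not attempt.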

Published

2023-12-28

Section

Article
