YOLOv5 and U-Net-based Character Detection for Nusantara Script
DOI:
https://doi.org/10.15575/join.v8i2.1180
Keywords:
Nusantara Script, Character Detection, Bounding Box, YOLO, U-Net
Abstract
Indonesia has a diverse range of indigenous scripts, collectively called Nusantara scripts, including Bali, Batak, Bugis, Javanese, Kawi, Kerinci, Lampung, Pallava, Rejang, and Sundanese. However, prevailing character detection techniques cater predominantly to Latin or Chinese scripts. Our prior work concentrated on classifying script types and recognizing characters within Nusantara script systems. This study extends that work with object detection: it employs the YOLOv5 model, supported by a U-Net model, to locate the basic characters of Nusantara scripts in input document images. It then investigates rearranging the detected character positions to match the distinctive writing styles of Nusantara scripts. Experimental results show that YOLOv5 reaches a loss of approximately 0.05 in character location detection, while the U-Net model achieves an accuracy of 75% to 90% in predicting character regions. Although YOLOv5 may not detect all Nusantara scripts flawlessly, integrating the U-Net model raises the detection rate by 1.2%.
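The rearranging step mentioned in the abstract can be pictured with a minimal sketch. It is not the authors' method: it assumes detected characters arrive as `(x_min, y_min, x_max, y_max)` boxes, that the script is written in horizontal lines read left to right, and that a box belongs to a line when its vertical centre lies within a tolerance of that line's mean centre. The names `Box`, `reading_order`, and `line_tol` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical box format: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def reading_order(boxes: List[Box], line_tol: float = 0.5) -> List[Box]:
    """Group boxes into text lines by vertical centre, then sort each line by x.

    Assumption: horizontal lines read left to right, top to bottom.
    """
    if not boxes:
        return []
    # Sort by vertical centre so boxes on the same line end up adjacent.
    by_y = sorted(boxes, key=lambda b: (b[1] + b[3]) / 2)
    lines: List[List[Box]] = [[by_y[0]]]
    for box in by_y[1:]:
        current = lines[-1]
        # Mean vertical centre and mean height of the line built so far.
        line_cy = sum((b[1] + b[3]) / 2 for b in current) / len(current)
        line_h = sum(b[3] - b[1] for b in current) / len(current)
        cy = (box[1] + box[3]) / 2
        if abs(cy - line_cy) <= line_tol * line_h:
            current.append(box)      # same text line
        else:
            lines.append([box])      # start a new text line
    # Within each line, order characters from left to right.
    return [b for line in lines for b in sorted(line, key=lambda b: b[0])]


if __name__ == "__main__":
    detected = [(50, 0, 60, 10), (0, 1, 10, 11), (20, 30, 30, 40), (0, 29, 10, 39)]
    for box in reading_order(detected):
        print(box)
```

A fixed tolerance is the simplest choice here; a clustering step over the vertical centres would be an alternative when line spacing varies across the page.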
License
Copyright (c) 2023 Agi Prasetiadi, Julian Saputra, Iqsyahiro Kresna, Imada Ramadhanti
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.
You are free to:
- Share: copy and redistribute the material in any medium or format for any purpose, even commercially.
- The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms:
- Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- NoDerivatives: If you remix, transform, or build upon the material, you may not distribute the modified material.
- No additional restrictions: You may not apply legal terms or technological measures that legally restrict others from doing anything the license permits.
Notices:
- You do not have to comply with the license for elements of the material in the public domain or where your use is permitted by an applicable exception or limitation.
- No warranties are given. The license may not give you all of the permissions necessary for your intended use. For example, other rights such as publicity, privacy, or moral rights may limit how you use the material.