Evaluating RAG Performance on Small Language Models for Low-Resource Devices through Chunking and Retrieval Methods
DOI: https://doi.org/10.15575/join.v11i1.1733

Keywords: Chunking Technique, Low-Resource Device, Retrieval-Augmented Generation (RAG), Retrieval Approaches, Small Language Model (SLM)

Abstract
Retrieval-Augmented Generation (RAG) combines the generative capabilities of language models with external document retrieval to answer questions grounded in reference texts. However, deploying RAG on low-resource devices such as Android smartphones is challenging because small language models (SLMs) have limited computational capacity and depend heavily on efficient chunking and retrieval. Although interest in on-device processing is growing, research on RAG configurations for SLMs under strict resource constraints, especially for domain-specific tasks, remains limited. This study therefore investigates which combinations of chunking technique, chunk size, overlap, and retrieval strategy best balance accuracy and speed on low-resource devices. The evaluation uses 148 Indonesian questions sourced from an official Hajj guidebook. The study consists of two phases: retrieval and generation. Retrieval is evaluated using BLEU, ROUGE-L, MRR, MAP, and Hit@k, while answer quality is measured with BERTScore. The experiments compare chunking methods (fixed-size or semantic), chunk sizes (128 or 256 tokens), overlaps (25, 50, or 100 tokens), and retrieval methods (dense, sparse, or hybrid). Results show that sparse retrieval with 256-token chunks and 100-token overlap yields the best answer quality (F1 = 0.726), while 128-token chunks with the same overlap give the fastest generation time (69.737 seconds). The main contribution of this study is a systematic evaluation of RAG configurations for fully on-device SLMs on a domain-specific Hajj and Umrah dataset not explored in prior research. The findings provide practical guidance for designing efficient and accurate RAG-based question-answering systems on low-resource devices.
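To make the fixed-size chunking configurations evaluated in the abstract concrete (e.g. 256-token chunks with 100-token overlap), the sketch below shows one minimal way such a splitter can be implemented. The function name, parameter defaults, and the use of a pre-tokenized input list are illustrative assumptions, not the authors' implementation.

```python
def chunk_fixed(tokens, chunk_size=256, overlap=100):
    """Split a token list into fixed-size chunks that share `overlap`
    tokens between consecutive chunks (hypothetical helper)."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # last window already reached the end of the document
    return chunks

# Example: a 600-token document with the paper's best-quality setting.
tokens = [f"t{i}" for i in range(600)]
chunks = chunk_fixed(tokens, chunk_size=256, overlap=100)
# Consecutive chunks share exactly 100 tokens of context.
```

With these settings each new chunk advances 156 tokens, so a question whose answer straddles a chunk boundary still has up to 100 tokens of shared context in the neighboring chunk, which is the trade-off between overlap size and index size that the experiments vary.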
License
Copyright (c) 2026 Amelia Dewi Agustiani, Salsabila Maharani Putri, Jonner Hutahaean, Muhammad Rizqi Sholahuddin, Muhammad Riza Alifi, Ade Hodijah

This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.