Regression Analysis for Crop Production Using CLARANS Algorithm


  • Arie Vatresia, 1) Research Center for Computing, The National Research and Innovation Agency, Bogor, Indonesia; 2) Informatics, Engineering Faculty, Universitas Bengkulu, Indonesia
  • Ruvita Faurina, Informatics, Engineering Faculty, Universitas Bengkulu, Indonesia
  • Yanti Simanjuntak, Informatics, Engineering Faculty, Universitas Bengkulu, Indonesia



CLARANS, Clusters, Crop Production, Regression, Rainfall


Crop production in Rejang Lebong district depends on rainfall, yet the data show a discrepancy between rising crop production and rainfall in the district, and the spatiotemporal distribution of this dependency remains unclear. This study analyses the relationship between rainfall and crop production rate in Rejang Lebong district using machine learning methods. Rainfall and crop production data are first clustered, and regression analysis is then performed within each cluster. This sequence shows how strongly the rainfall variable influences the crop production rate in each cluster. Using the Elbow, CLARANS, Simple Linear Regression, and Silhouette Coefficient methods, the study analysed 231 rainfall records from the Bengkulu BMKG and 110 crop production records from BPS Bengkulu Province covering 2000-2022. The optimal number of clusters was 3. C1 contains 106 records, with the largest regression value for chili (0.127); C2 contains 15 records, with the largest regression value for mustard greens (0.135); and C3 contains 110 records, with the largest regression values for cabbage (0.408), eggplant (0.197), and carrots (0.201). The study also found that the crops with the strongest, highly significant relationship to rainfall were cabbage (Y = 0.4114X + 0.2013) and chili, the latter with a high RMSE (0.9897).
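The cluster-then-regress workflow described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`total_cost`, `clarans`, `fit_line`), the parameter values (`numlocal`, `maxneighbor`), the Euclidean distance, and the synthetic data are all assumptions made for illustration, and the Elbow and Silhouette steps are omitted. The CLARANS part follows the randomized medoid-swap search described by Ng and Han; the regression part is an ordinary least-squares line with its RMSE.

```python
import numpy as np

def total_cost(X, medoid_idx):
    # Sum of Euclidean distances from every point to its nearest medoid.
    d = np.linalg.norm(X[:, None, :] - X[medoid_idx][None, :, :], axis=2)
    return d.min(axis=1).sum()

def clarans(X, k, numlocal=5, maxneighbor=50, seed=0):
    # Randomized search over medoid sets (hypothetical defaults, not from the paper).
    n = len(X)
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < number of points")
    rng = np.random.default_rng(seed)
    best_medoids, best_cost = None, np.inf
    for _ in range(numlocal):
        current = [int(i) for i in rng.choice(n, size=k, replace=False)]
        current_cost = total_cost(X, current)
        failures = 0
        while failures < maxneighbor:
            # A neighbour differs from the current set in exactly one medoid.
            pos = int(rng.integers(k))
            candidate = int(rng.integers(n))
            if candidate in current:
                continue
            neighbour = current.copy()
            neighbour[pos] = candidate
            cost = total_cost(X, neighbour)
            if cost < current_cost:
                current, current_cost = neighbour, cost
                failures = 0  # restart the neighbour count after an improvement
            else:
                failures += 1
        if current_cost < best_cost:
            best_medoids, best_cost = current, current_cost
    d = np.linalg.norm(X[:, None, :] - X[best_medoids][None, :, :], axis=2)
    return np.array(best_medoids), d.argmin(axis=1), best_cost

def fit_line(x, y):
    # Simple linear regression y = slope * x + intercept, plus its RMSE.
    slope, intercept = np.polyfit(x, y, 1)
    rmse = float(np.sqrt(np.mean((y - (slope * x + intercept)) ** 2)))
    return float(slope), float(intercept), rmse

if __name__ == "__main__":
    # Synthetic stand-in data: column 0 plays the role of rainfall,
    # column 1 the role of production. These are not the study's data.
    rng = np.random.default_rng(42)
    rain = rng.uniform(0.0, 1.0, size=90)
    prod = 0.5 * rain + rng.normal(0.0, 0.05, size=90)
    X = np.column_stack([rain, prod])
    medoids, labels, cost = clarans(X, k=3)
    for c in range(3):
        mask = labels == c
        if mask.sum() >= 2:  # a line needs at least two points
            print(c, int(mask.sum()), fit_line(rain[mask], prod[mask]))
```

In this sketch the number of clusters is fixed at 3 to match the reported result; in practice one would repeat the clustering over a range of k values and compare the costs (Elbow) or silhouette scores before fitting the per-cluster regressions.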








