CatBoost Optimization Using Recursive Feature Elimination
DOI: https://doi.org/10.15575/join.v9i2.1324
Keywords: CatBoost, Feature Selection, RFE
License
Copyright (c) 2024 Agus Hadianto; Wiranto Herry Utomo
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.