A Consolidated Tree Structure Combining Multiple Regression Trees With Varying Depths, Resulting in an Efficient Ensemble Model
DOI:
https://doi.org/10.31449/inf.v47i9.3844
Abstract
Regression is a commonly used technique for predicting a continuous target value from a set of input features. Decision trees are hierarchical models that offer high interpretability and fast, precise reasoning, and they are also used for regression tasks. However, determining the optimal stopping conditions for decision trees is a complex problem that has attracted significant research interest. Ensemble-based modeling is an effective alternative to hyper-parameter tuning: instead of searching for the single best value, base models built with different parameter values are combined. Random forests are a classic example of an ensemble model that combines decision trees generated from different perspectives. This paper proposes a novel approach that generates base trees using the same tree-generation procedure but with different stopping conditions. Unlike a random forest, this ensemble can be efficiently integrated into a single tree structure. Additionally, the paper proposes several aggregation methods based on weighting the base models. Experimental results on standard datasets demonstrate that the proposed method outperforms trees built with well-known stopping conditions.
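The consolidation idea can be illustrated with a minimal Python sketch. It is not the authors' implementation, and the abstract does not specify the stopping conditions or weighting schemes, so the following are assumptions made for illustration: the stopping condition is a maximum depth, splits are chosen greedily by squared error, and the weights are supplied by the caller. Under these assumptions, a tree stopped at depth d is a prefix of the deepest tree, so one tree that stores the mean target at every node can reproduce the prediction of every shallower base tree. The names Node, build, path_means, and predict are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class Node:
    mean: float                     # mean target of the training rows reaching this node
    feature: Optional[int] = None   # index of the split feature (None for a leaf)
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def sse(ys: Sequence[float]) -> float:
    """Sum of squared errors around the mean."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys)


def build(X: List[List[float]], y: List[float], depth: int, max_depth: int) -> Node:
    """Greedy regression tree; every node, not only leaves, keeps its mean."""
    node = Node(mean=sum(y) / len(y))
    if depth >= max_depth or len(y) < 2:
        return node
    best = None  # (error, feature, threshold)
    for f in range(len(X[0])):
        values = sorted(set(row[f] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            err = sse(left) + sse(right)
            if best is None or err < best[0]:
                best = (err, f, t)
    if best is None:  # all rows identical, nothing to split on
        return node
    _, f, t = best
    li = [i for i, row in enumerate(X) if row[f] <= t]
    ri = [i for i, row in enumerate(X) if row[f] > t]
    node.feature, node.threshold = f, t
    node.left = build([X[i] for i in li], [y[i] for i in li], depth + 1, max_depth)
    node.right = build([X[i] for i in ri], [y[i] for i in ri], depth + 1, max_depth)
    return node


def path_means(root: Node, x: Sequence[float]) -> List[float]:
    """Means along the root-to-leaf path; entry d is the depth-d base tree's prediction."""
    means = [root.mean]
    node = root
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
        means.append(node.mean)
    return means


def predict(root: Node, x: Sequence[float],
            depths: Sequence[int], weights: Sequence[float]) -> float:
    """Weighted average of the base-tree predictions, using one traversal."""
    means = path_means(root, x)
    preds = [means[min(d, len(means) - 1)] for d in depths]
    return sum(w * p for w, p in zip(weights, preds)) / sum(weights)


# Toy data (made up for illustration only).
X = [[1.0], [2.0], [3.0], [4.0]]
y = [1.0, 1.0, 3.0, 5.0]
root = build(X, y, depth=0, max_depth=2)
combined = predict(root, [4.0], depths=[1, 2], weights=[1.0, 3.0])
```

A single root-to-leaf traversal yields the predictions of all base trees at once, which is why such an ensemble costs little more than one tree at prediction time. Other stopping conditions, such as a minimum node size, would need a different rule for deciding where each base tree's path ends.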
License
I assign to Informatica, An International Journal of Computing and Informatics ("Journal") the copyright in the manuscript identified above and any additional material (figures, tables, illustrations, software or other information intended for publication) submitted as part of or as a supplement to the manuscript ("Paper") in all forms and media throughout the world, in all languages, for the full term of copyright, effective when and if the article is accepted for publication. This transfer includes the right to reproduce and/or to distribute the Paper to other journals or digital libraries in electronic and online forms and systems.
I understand that I retain the rights to use the pre-prints, off-prints, accepted manuscript and published journal Paper for personal use, scholarly purposes and internal institutional use.
In certain cases, I can ask for retaining the publishing rights of the Paper. The Journal can permit or deny the request for publishing rights, to which I fully agree.
I declare that the submitted Paper is original, has been written by the stated authors and has not been published elsewhere nor is currently being considered for publication by any other journal and will not be submitted for such review while under review by this Journal. The Paper contains no material that violates proprietary rights of any other person or entity. I have obtained written permission from copyright owners for any excerpts from copyrighted works that are included and have credited the sources in my article. I have informed the co-author(s) of the terms of this publishing agreement.
Copyright © Slovenian Society Informatika