Optimizing Sequential Forward Selection on Classification using Genetic Algorithm
DOI: https://doi.org/10.31449/inf.v46i9.4964
Abstract
With the digital transformation of modern technologies, the amount of data is increasing significantly, driving novel knowledge discovery techniques in data analytics and data mining. Such data usually contain noise or non-informative features that degrade the analysis results. Approaches for eliminating these features, known as feature selection, have been studied extensively over the past few decades. Feature selection is a significant preprocessing step of the mining process that selects only the informative features from the original feature set; the selected features improve the efficiency of the learning model. This study proposes a sequential forward feature selection method called Forward Selection with Genetic Algorithm (FS-GA). FS-GA consists of three major steps. First, it creates the preliminarily selected subsets. Second, it improves on those subsets. Third, it optimizes the selected subset using a genetic algorithm. In this way, it maximizes the classification accuracy during feature addition. We performed experiments on ten standard UCI datasets using three popular classification models: the Decision Tree, Naive Bayes, and K-Nearest Neighbour classifiers. The results are compared with state-of-the-art methods. FS-GA achieved the best results among the sequential forward selection methods on all the tested datasets, with O(n^2) time complexity.
References
Zeng, Z., Zhang, H, Zhang, R. and Zhang, Y. (2014). Hybrid Feature Selection Method based on Rough Conditional Mutual Information and Naïve Bayesian Classifier, Hindawi Publishing Corporation, ISRN Applied Mathematics.
https://doi.org/10.1155/2014/382738
Somol, P., Pudil, P. and Kittler, J. (2004). Fast Branch & Bound Algorithms for Optimal Feature Selection, IEEE Transactions on Pattern Analysis and Machine Intelligence, 26(7), pp. 900-912.
https://doi.org/10.1109/tpami.2004.28
Nakariyakul, S. and Casasent, D. P. (2007). Adaptive branch and bound algorithm for selecting optimal features, Pattern Recognition Letters, 28, pp. 1415-1427.
Cai, J., Luo, J., Wang, S. and Yang, S. (2018). Feature selection in machine learning: A new perspective, Neurocomputing, pp. 70-79.
https://doi.org/10.1016/j.neucom.2017.11.077
Chandrashekar, G. and Sahin, F. (2014). A survey on feature selection methods, Computers and Electrical Engineering, 40, pp. 16-28.
Sutha, K. and Tamilselvi, J. J. (2015). A Review of Feature Selection Algorithms for Data Mining Techniques, International Journal on Computer Science and Engineering (IJCSE), pp. 63-67.
Jovic, A., Brkic, K. and Bogunovic, N. (2015). A review of feature selection methods with applications, International Convention on Information and Communication Technology.
Pudil, P., Novovicova, J. and Kittler, J. (1994). Floating search methods in feature selection, Pattern Recognition Letters, pp. 1119-1125.
https://doi.org/10.1016/0167-8655(94)90127-9
Pavya, K. and Srinivasan, B. (2017). Feature Selection Techniques in Data Mining: A Study, International Journal of Scientific Development and Research (IJSDR), 2(6), pp. 594-598.
Whitney, A. W. (1971). A Direct Method of Nonparametric Measurement Selection, IEEE Transactions on Computers, pp. 1100-1103.
https://doi.org/10.1109/t-c.1971.223410
Somol, P., Pudil, P., Novovicova, J. and Paclik P. (1999). Adaptive floating search methods in feature selection, Pattern Recognition Letters, pp. 1157-1163.
Nakariyakul, S. and Casasent, D. P. (2009). An improvement on floating search algorithms for feature subset selection, Pattern Recognition, pp. 1932-1940.
Lv, J., Peng, Q. and Sun, Z. (2015). A modified sequential deep floating search algorithm for feature selection, International Conference on Information and Automation, pp. 2988-2933.
Pudil, P., Ferri, F. J., Novovicova, J. and Kittler, J. (1994). Floating Search Methods for Feature Selection with Nonmonotonic Criterion Functions, Proceedings of the 12th IAPR International Conference on Pattern Recognition, pp. 279-283.
https://doi.org/10.1109/icpr.1994.576920
Chotchantarakun, K. and Sornil, O. (2021). An Adaptive Multi-levels Sequential Feature Selection, International Journal of Computer Information Systems and Industrial Management Applications (IJCISIM), 13, pp. 010-019.
Chotchantarakun, K. and Sornil, O. (2021). Adaptive Multi-level Backward Tracking for Sequential Feature Selection, Journal of ICT Research and Applications, 15, pp. 1-20.
https://doi.org/10.5614/itbj.ict.res.appl.2021.15.1.1
Holland, J. H. (1992). Adaptation in Natural and Artificial Systems: An Introductory Analysis with Applications to Biology, Control, and Artificial Intelligence, MIT Press: Cambridge, MA.
El-Shafiey, M. G., Hagag, A., El-Dahshan, E. A. and Ismail, M. A. (2022). A hybrid GA and PSO optimized approach for heart-disease prediction based on random forest, Multimedia Tools and Applications, 81, pp. 18155-18179.
Homsapaya, K. and Sornil, O. (2017). Improving Floating Search Feature Selection using Genetic Algorithm, Journal of ICT Research and Applications, 11(3), pp. 299-317.
Ileberi, E., Sun, Y. and Wang, Z. (2022). A machine learning based credit card fraud detection using the GA algorithm for feature selection, Journal of Big Data, 9(24).
https://doi.org/10.1186/s40537-022-00573-8
Aswal, S., Jyothi, A. and Mehra, R. (2023). Feature Selection Method Based on Honeybee-SMOTE for Medical Data Classification. Informatica, 46(9), pp. 111-118.
https://doi.org/10.31449/inf.v46i9.4098
Alija, S., Beqiri, E., Gaafar, A. S. and Hamoud, A. K. (2023). Predicting Students Performance Using Supervised Machine Learning Based on Imbalanced Dataset and Wrapper Feature Selection. Informatica, 47(1), pp. 11-20.
https://doi.org/10.31449/inf.v47i1.4519
Al-jadir, I., Wong, K. W., Fung, C. C. and Xie, H. (2017). Text Document Clustering Using Memetic Feature Selection, Proceedings of the 9th International Conference on Machine Learning and Computing (ICMLC), pp. 415-420.
https://doi.org/10.1145/3055635.3056603
Panda, D., Panda, D., Dash, S. R. and Parida, S. (2021). Extreme Learning Machines with Feature Selection Using GA for Effective Prediction of Fetal Heart Disease: A Novel Approach. Informatica, 45(3), pp. 381-392.
https://doi.org/10.31449/inf.v45i3.3223
Dua, D. and Graff, C. (2019). UCI Machine Learning Repository (http://archive.ics.uci.edu/ml), Irvine, CA: University of California, School of Information and Computer Science.
https://archive.ics.uci.edu/datasets
License
I assign to Informatica, An International Journal of Computing and Informatics ("Journal") the copyright in the manuscript identified above and any additional material (figures, tables, illustrations, software or other information intended for publication) submitted as part of or as a supplement to the manuscript ("Paper") in all forms and media throughout the world, in all languages, for the full term of copyright, effective when and if the article is accepted for publication. This transfer includes the right to reproduce and/or to distribute the Paper to other journals or digital libraries in electronic and online forms and systems.
I understand that I retain the rights to use the pre-prints, off-prints, accepted manuscript and published journal Paper for personal use, scholarly purposes and internal institutional use.
In certain cases, I can ask for retaining the publishing rights of the Paper. The Journal can permit or deny the request for publishing rights, to which I fully agree.
I declare that the submitted Paper is original, has been written by the stated authors and has not been published elsewhere nor is currently being considered for publication by any other journal and will not be submitted for such review while under review by this Journal. The Paper contains no material that violates proprietary rights of any other person or entity. I have obtained written permission from copyright owners for any excerpts from copyrighted works that are included and have credited the sources in my article. I have informed the co-author(s) of the terms of this publishing agreement.
Copyright © Slovenian Society Informatika