Feature Selection Method Based on Honeybee-SMOTE for Medical Data Classification
DOI:
https://doi.org/10.31449/inf.v46i9.4098
Abstract
Biomedical data analysis plays an important role in clinical practice. Biomedical data often suffer from problems such as skewness and redundant or irrelevant attributes. Redundant and irrelevant features frequently degrade classifier accuracy on imbalanced datasets, which makes feature selection critical. The key goal of feature selection is to establish a feature subspace that maintains classifier accuracy while reducing the computational cost of learning and discarding noise. The effectiveness of a feature selection approach depends heavily on how well it matches the problem context and uncovers fundamental patterns in the data. The main goal of this study is to construct a disease detection model that uses a hybrid feature selection strategy based on Honeybee-SMOTE, with classification by the C4.5 algorithm. The empirical results show that the proposed hybrid method outperforms competing methods in accuracy, precision, recall, F1-score, and G-Mean. Statistical analysis of the results confirms that the proposed hybrid method is competitive with, and in the reported comparisons outperforms, existing state-of-the-art algorithms.
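The five evaluation measures named in the abstract can be illustrated with a minimal sketch. It assumes a binary task and the G-Mean definition common in imbalanced learning, the square root of recall times specificity; the paper may define it differently. The function name and the confusion counts are hypothetical and are not taken from the study.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, precision, recall, F1 and G-Mean from confusion counts.

    Assumption: G-Mean = sqrt(recall * specificity). Other definitions exist.
    """
    def safe_div(num, den):
        # Return 0.0 instead of raising when a denominator is zero.
        return num / den if den else 0.0

    accuracy = safe_div(tp + tn, tp + fp + tn + fn)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)          # sensitivity on the minority class
    specificity = safe_div(tn, tn + fp)     # recall on the majority class
    f1 = safe_div(2 * precision * recall, precision + recall)
    g_mean = math.sqrt(recall * specificity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "g_mean": g_mean,
    }


# Hypothetical imbalanced test set: 10 diseased, 90 healthy cases.
balanced_model = binary_metrics(tp=8, fp=2, tn=88, fn=2)

# A model that always predicts "healthy" on the same data.
majority_only = binary_metrics(tp=0, fp=0, tn=90, fn=10)

for name, scores in [("balanced", balanced_model), ("majority", majority_only)]:
    print(name, {k: round(v, 3) for k, v in scores.items()})
```

In this made-up example the always-healthy model still reaches 0.9 accuracy while its recall and G-Mean are 0. This suggests why accuracy alone can mislead on skewed medical data and why G-Mean is reported alongside it.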
License
I assign to Informatica, An International Journal of Computing and Informatics ("Journal") the copyright in the manuscript identified above and any additional material (figures, tables, illustrations, software or other information intended for publication) submitted as part of or as a supplement to the manuscript ("Paper") in all forms and media throughout the world, in all languages, for the full term of copyright, effective when and if the article is accepted for publication. This transfer includes the right to reproduce and/or to distribute the Paper to other journals or digital libraries in electronic and online forms and systems.
I understand that I retain the rights to use the pre-prints, off-prints, accepted manuscript and published journal Paper for personal use, scholarly purposes and internal institutional use.
In certain cases, I can ask for retaining the publishing rights of the Paper. The Journal can permit or deny the request for publishing rights, to which I fully agree.
I declare that the submitted Paper is original, has been written by the stated authors and has not been published elsewhere nor is currently being considered for publication by any other journal and will not be submitted for such review while under review by this Journal. The Paper contains no material that violates proprietary rights of any other person or entity. I have obtained written permission from copyright owners for any excerpts from copyrighted works that are included and have credited the sources in my article. I have informed the co-author(s) of the terms of this publishing agreement.
Copyright © Slovenian Society Informatika