A Hybrid Feature Selection Based on Fisher score and SVM-RFE for Microarray Data
DOI:
https://doi.org/10.31449/inf.v48i1.4759
Abstract
Over the last two decades, the analysis of microarray data has played a critical role in disease diagnosis and in the identification of different tumors. However, microarray data are difficult to classify because of the curse of dimensionality: the number of features is huge while the number of samples is small. Dimension reduction techniques such as feature selection therefore play a vital role in eliminating non-informative features and improving cancer classification. In this paper, we propose a filter-embedded hybrid feature selection method for the gene selection problem. First, the proposed method selects the top-ranked features by Fisher score to provide a candidate subset for the embedded stage. Second, Support Vector Machine-Recursive Feature Elimination (SVM-RFE) is applied to the candidate subset to find the optimal subset. We assess the performance of the proposed method on ten high-dimensional microarray datasets. The results show that the proposed method improves classification accuracy, reduces the number of selected features, and decreases computational time.
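The two stages described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the common Fisher score definition (between-class scatter divided by within-class scatter, per feature), uses scikit-learn's `RFE` with a linear-kernel `SVC` as the SVM-RFE stage, and runs on synthetic data. The function names `fisher_score` and `fisher_svm_rfe` and the values `n_candidates=100` and `n_selected=10` are hypothetical choices made only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.svm import SVC


def fisher_score(X, y):
    """Per-feature Fisher score (assumed definition):
    sum_k n_k (mu_kj - mu_j)^2 / sum_k n_k var_kj."""
    overall_mean = X.mean(axis=0)
    between = np.zeros(X.shape[1])
    within = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xk = X[y == label]
        n_k = Xk.shape[0]
        between += n_k * (Xk.mean(axis=0) - overall_mean) ** 2
        within += n_k * Xk.var(axis=0)
    # Small constant avoids division by zero for constant features.
    return between / (within + 1e-12)


def fisher_svm_rfe(X, y, n_candidates=100, n_selected=10, step=1):
    """Filter stage (Fisher score) followed by embedded stage (SVM-RFE).

    n_candidates and n_selected are illustrative defaults, not values
    taken from the paper.
    """
    # Stage 1: keep the top-ranked features as the candidate subset.
    scores = fisher_score(X, y)
    candidates = np.argsort(scores)[::-1][:n_candidates]

    # Stage 2: recursively eliminate candidates using linear SVM weights.
    rfe = RFE(SVC(kernel="linear", C=1.0),
              n_features_to_select=n_selected, step=step)
    rfe.fit(X[:, candidates], y)

    # Map the surviving positions back to original feature indices.
    return np.sort(candidates[rfe.support_])


# Synthetic stand-in for a microarray dataset: few samples, many features.
X, y = make_classification(n_samples=60, n_features=2000, n_informative=10,
                           n_redundant=0, shuffle=False, random_state=0)
selected = fisher_svm_rfe(X, y)
print("Selected feature indices:", selected)
```

Running the filter first means the comparatively expensive recursive elimination only refits the SVM on the candidate subset rather than on all features, which is consistent with the reduction in computational time that the abstract reports.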
Copyright © Slovenian Society Informatika