Optimizing Network Intrusion Detection Systems Through Ensemble Learning and Feature Selection Using the CIC-IDS2017 Dataset
Abstract
The increasing complexity of cyber threats demands Network Intrusion Detection Systems (NIDS) that are both accurate and efficient. This study presents an optimized NIDS framework that combines feature selection with ensemble learning. Experiments were performed on the CIC-IDS2017 dataset using a stratified 70/30 train/test split. Three feature selection methods were compared: Information Gain (24 features), Chi-square (χ², 25 features), and Principal Component Analysis (PCA, 20 features). Bagging classifiers (Random Forest, Extra Trees, Bagged Decision Tree) and boosting classifiers (XGBoost, Gradient Boosting, LightGBM, AdaBoost, CatBoost) were evaluated. With the 24 features selected by Information Gain, Extra Trees achieved 99.98% accuracy with near-perfect precision, recall, and F1-score, and false positive and false negative rates of 0.0001397 and 0.0002597, respectively. Boosting-based models were more sensitive to minority attack classes, which improved performance under class imbalance. These results indicate that integrating feature selection with diverse ensemble techniques produces a scalable, interpretable, and highly effective NIDS suitable for practical cybersecurity applications.
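The workflow summarized in the abstract can be sketched in a few lines of scikit-learn. This is a minimal illustration under stated assumptions, not the authors' implementation: the data are synthetic (generated with `make_classification`, not CIC-IDS2017), the task is reduced to two classes (benign vs. attack), `mutual_info_classif` stands in for Information Gain, and all hyperparameters are placeholders.

```python
from functools import partial

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic, imbalanced stand-in for flow features (assumption: NOT CIC-IDS2017).
X, y = make_classification(
    n_samples=2000, n_features=40, n_informative=12,
    weights=[0.8, 0.2], random_state=0,
)

# Stratified 70/30 train/test split, as stated in the abstract.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# Keep the 24 top-ranked features, then fit Extra Trees.
# Assumption: mutual information is used as the Information Gain estimate.
model = make_pipeline(
    SelectKBest(score_func=partial(mutual_info_classif, random_state=0), k=24),
    ExtraTreesClassifier(n_estimators=100, random_state=0),
)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Binary confusion matrix: rows are true labels, columns are predictions.
tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
accuracy = (tp + tn) / (tp + tn + fp + fn)
fpr = fp / (fp + tn)  # share of benign samples flagged as attacks
fnr = fn / (fn + tp)  # share of attack samples that were missed
print(f"accuracy={accuracy:.4f} FPR={fpr:.4f} FNR={fnr:.4f}")
```

Fitting the selector inside the pipeline means the feature ranking is computed from the training split only, so the held-out 30% does not influence which features are kept. The printed numbers describe the synthetic data and are not comparable to the results reported above.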
DOI: https://doi.org/10.31449/inf.v49i4.7678
License
I assign to Informatica, An International Journal of Computing and Informatics ("Journal") the copyright in the manuscript identified above and any additional material (figures, tables, illustrations, software or other information intended for publication) submitted as part of or as a supplement to the manuscript ("Paper") in all forms and media throughout the world, in all languages, for the full term of copyright, effective when and if the article is accepted for publication. This transfer includes the right to reproduce and/or to distribute the Paper to other journals or digital libraries in electronic and online forms and systems.
I understand that I retain the rights to use the pre-prints, off-prints, accepted manuscript and published journal Paper for personal use, scholarly purposes and internal institutional use.
In certain cases, I can ask for retaining the publishing rights of the Paper. The Journal can permit or deny the request for publishing rights, to which I fully agree.
I declare that the submitted Paper is original, has been written by the stated authors and has not been published elsewhere nor is currently being considered for publication by any other journal and will not be submitted for such review while under review by this Journal. The Paper contains no material that violates proprietary rights of any other person or entity. I have obtained written permission from copyright owners for any excerpts from copyrighted works that are included and have credited the sources in my article. I have informed the co-author(s) of the terms of this publishing agreement.
Copyright © Slovenian Society Informatika







