A Framework for Malicious Domain Names Detection using Feature Selection and Majority Voting Approach

Authors

  • Dharmaraj Rajaram Patil, R.C. Patel Institute of Technology, Shirpur

DOI:

https://doi.org/10.31449/inf.v48i3.5824

Abstract

As cyber attacks become more sophisticated, identifying and mitigating malicious domain names has become critical to securing online environments. This paper presents a framework for detecting malicious domain names using feature selection and majority voting. The proposed methodology begins by extracting features from domain names and their related characteristics, followed by a rigorous feature selection procedure to determine the most discriminating attributes. Several feature selection techniques are used, including chi-square statistics, information gain, gain ratio, and correlation-based feature selection, to assess the value of each feature in distinguishing benign from malicious domain names. In addition, a majority voting strategy is used to improve the detection system’s overall accuracy and reliability by combining the predictions of different classifiers such as AdaBoost, logistic regression, k-nearest neighbours, naive Bayes, and multilayer perceptron. The ensemble of classifiers is trained on the selected features, yielding a robust model capable of accurately recognising malicious domain names while minimising false positives. The proposed approach is evaluated on real-world examples of malicious domain names. Using chi-square feature selection and majority voting, the framework detects malicious domain names with an accuracy of 99.44%, precision of 99.44%, recall of 99.44%, and F-measure of 99.44%. The use of feature selection and majority voting improves the system’s adaptability and resilience in the face of emerging cyber threats.
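The two building blocks named in the abstract, chi-square feature scoring and majority voting, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `chi_square_score` and `majority_vote`, the toy feature columns, and the hard-coded classifier votes are all hypothetical and chosen only for illustration. The sketch assumes categorical (here binary) features and hard, unweighted voting.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Chi-square statistic of one categorical feature against the class labels.

    Sums (observed - expected)^2 / expected over every cell of the
    feature-value x class contingency table. A larger score means the
    feature is more strongly associated with the class.
    """
    n = len(labels)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    score = 0.0
    for f_value, f_count in feature_totals.items():
        for c_value, c_count in label_totals.items():
            expected = f_count * c_count / n
            score += (observed[(f_value, c_value)] - expected) ** 2 / expected
    return score


def majority_vote(predictions):
    """Return the label chosen by the most classifiers for a single sample."""
    return Counter(predictions).most_common(1)[0][0]


# Toy data (hypothetical): 1 = malicious, 0 = benign.
labels = [1, 1, 1, 0, 0, 0]
feature_a = [1, 1, 1, 0, 0, 0]  # perfectly aligned with the labels
feature_b = [1, 0, 1, 0, 1, 0]  # only weakly aligned with the labels

# Rank features by score; a selection step would keep the top-ranked ones.
scores = {
    "feature_a": chi_square_score(feature_a, labels),
    "feature_b": chi_square_score(feature_b, labels),
}
ranked = sorted(scores, key=scores.get, reverse=True)

# Hypothetical votes from five classifiers for one domain name.
votes = ["malicious", "benign", "malicious", "malicious", "benign"]
decision = majority_vote(votes)
```

With an odd number of classifiers and two classes, as in the five-classifier ensemble the abstract describes, a hard vote can never tie, so no tie-breaking rule is needed in this sketch.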

Author Biography

Dharmaraj Rajaram Patil, R.C. Patel Institute of Technology, Shirpur

Dharmaraj R. Patil received his Master of Engineering in Computer Science and Engineering from the Government College of Engineering, Aurangabad, Maharashtra, India, and his PhD in Computer Engineering from Kavayitri Bahinabai Chaudhari North Maharashtra University, Jalgaon, Maharashtra, India. He is an Assistant Professor in the Computer Engineering Department at R.C. Patel Institute of Technology, Shirpur, Maharashtra, India, and has 20 years of teaching experience. His research interests are web security, intrusion detection, and web mining. He has published many papers in international and national conferences and journals.

Published

2024-09-09

How to Cite

Patil, D. R. (2024). A Framework for Malicious Domain Names Detection using Feature Selection and Majority Voting Approach. Informatica, 48(3). https://doi.org/10.31449/inf.v48i3.5824

Section

Regular papers