Enhanced Cybercrime Detection on Twitter Using Aho-Corasick Algorithm and Machine Learning Techniques

Authors

  • Romil Rawat Shri Vaishnav Vidyapeeth Vishwavidyalaya
  • A Samson Arun Raj School of Computer Science and Technology, Karunya Institute of Technology and Sciences, Tamil Nadu, Coimbatore, India
  • Dr. Rajesh Kumar Chakrawarti Department of Computer Science and Engineering, Sushila Devi Bansal College (SDBC), Bansal Group of Institutions (BGI), Indore (MP), India
  • Dr. Krishnan Sakthidasan Sankaran Department of ECE, Hindustan Institute of Technology and Science, Chennai, India.
  • Dr. Sanjaya Kumar Sarangi Utkal University, Coordinator and Adjunct Professor, Department of Computer Science and Engineering, Utkal University, Bhubaneswar, India
  • Hitesh Rawat Research Scholar, Department of Business Management, University of Extremadura, Spain
  • Anjali Rawat Research Scholar, Department of Computer and Communication Technology, University of Extremadura, Spain

DOI:

https://doi.org/10.31449/inf.v48i18.6272

Abstract

Online social networking (OSN) is a type of interactive, computer-mediated technology that allows people to share information through virtual networks; the objective of the proposed work is to adapt OSN analysis to cybercrime detection. The microblogging feature of Twitter makes it a prominent part of cyberspace (usually accessed via the dark web). The work scraped Twitter data (tweets) in Python using the SN-Scrape module and in Java using the Twitter4J API, extracting social data based on hashtags; tweets were selected and accessed from profiles on the Twitter platform by location, keyword, and hashtag to design the datasets. The experiments use two datasets. The first contains over 1,700 tweets with location as the key point (hacking-for-fun, cyber-violence, and vulnerability-injector data), whereas the second comprises only 370 tweets with the reposting of tweet status as the key point. The method is based on a new system model for analysing Twitter data and detecting terrorist attacks. The weights of susceptible keywords are found using a ternary search with the Aho-Corasick algorithm (ACA), which performs signature and pattern matching: ACA matches the words extracted from each tweet against known signatures and assigns them weights. Machine learning (ML) is then used to classify patterns in the Twitter data and determine behaviour in order to identify whether a person is a terrorist. SVM (Support Vector Machine) proved to be a more accurate classifier for predicting terrorist attacks than the other classifiers, KNN (K-Nearest Neighbour) and NB (Naïve Bayes). On the first dataset, KNN achieved an accuracy of 98.38% and SVM 98.85%; on the second dataset, KNN achieved 91.68% and SVM 93.97%. The work concludes that the generated weights are classified into three categories (cyber-violence, vulnerability injector, and hacking-for-fun) for further feature classification.
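The keyword-weighting step can be illustrated with a minimal Aho-Corasick sketch in Python. It is not the authors' implementation: the lexicon, its weights, and the names `AhoCorasick`, `LEXICON`, and `keyword_weights` are hypothetical, and the trie uses plain dictionaries rather than the ternary search structure the abstract mentions. The automaton finds every lexicon keyword in a tweet in a single pass, and each match adds its weight to that keyword's category.

```python
from collections import deque


class AhoCorasick:
    """Minimal Aho-Corasick automaton: finds all patterns in one pass."""

    def __init__(self, patterns):
        self.goto = [{}]  # goto[state][char] -> next state (state 0 is the root)
        self.fail = [0]   # failure link of each state
        self.out = [[]]   # patterns recognised on reaching each state
        for pattern in patterns:
            self._add(pattern)
        self._build_failure_links()

    def _add(self, pattern):
        """Insert one pattern into the trie."""
        state = 0
        for ch in pattern:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append(pattern)

    def _build_failure_links(self):
        """Breadth-first pass; depth-1 states keep their failure link to the root."""
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                # Follow failure links until some state can be extended by ch.
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit the matches of the longest proper suffix state.
                self.out[nxt] = self.out[nxt] + self.out[self.fail[nxt]]

    def search(self, text):
        """Yield (end_index, pattern) for every occurrence of a pattern in text."""
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for pattern in self.out[state]:
                yield i, pattern


# Hypothetical signature lexicon: keyword -> (category, weight). Illustrative only.
LEXICON = {
    "hack": ("hacking-for-fun", 2),
    "ddos": ("hacking-for-fun", 3),
    "threat": ("cyber-violence", 2),
    "kill": ("cyber-violence", 3),
    "exploit": ("vulnerability-injector", 3),
    "sql injection": ("vulnerability-injector", 4),
}
_AUTOMATON = AhoCorasick(LEXICON)  # iterating a dict yields its keys


def keyword_weights(tweet):
    """Sum the weights of all lexicon matches in a tweet, per category."""
    scores = {"hacking-for-fun": 0, "cyber-violence": 0, "vulnerability-injector": 0}
    for _, keyword in _AUTOMATON.search(tweet.lower()):
        category, weight = LEXICON[keyword]
        scores[category] += weight
    return scores


print(keyword_weights("Going to hack and exploit their SQL injection bug, DDoS tonight"))
```

For the sample tweet this gives 5 for hacking-for-fun (hack + ddos) and 7 for vulnerability-injector (exploit + sql injection). Because matching is substring-based, `hack` also fires inside `hacker` or `hacking`; a real lexicon might add word-boundary checks if that is not wanted.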
Machine learning models (KNN and SVM) are used to predict the occurrence of crime incidents, and the model's accuracy and efficacy are evaluated using several parameters.
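The classification and evaluation steps can be sketched in the same spirit, assuming each tweet has already been reduced to a vector of category weights. The sketch below is a hypothetical pure-Python K-Nearest Neighbour classifier (Euclidean distance, majority vote) together with standard confusion-matrix metrics; the toy data, `k=3`, and the names `knn_predict`, `evaluate`, `TRAIN_X`, and `TRAIN_Y` are illustrative assumptions. SVM is omitted because a practical implementation would normally come from a library such as scikit-learn.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label x by majority vote among its k nearest training vectors."""
    # Sort (distance, label) pairs by Euclidean distance to x.
    neighbours = sorted(
        (math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]


def evaluate(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 for a binary labelling."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical toy data: (hacking-for-fun, cyber-violence, vulnerability-injector)
# weight vectors, labelled 1 = suspicious and 0 = benign.
TRAIN_X = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (5, 0, 7), (4, 3, 2), (0, 6, 1)]
TRAIN_Y = [0, 0, 0, 1, 1, 1]

print([knn_predict(TRAIN_X, TRAIN_Y, x) for x in [(0, 0, 1), (5, 1, 6)]])
print(evaluate([1, 1, 1, 0, 0, 0, 0, 1], [1, 1, 0, 0, 0, 1, 0, 1]))
```

KNN has no training phase, which is why it fits in a few lines; its cost is paid at prediction time, when every query is compared with every training vector.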

Author Biographies

Romil Rawat, Shri Vaishnav Vidyapeeth Vishwavidyalaya

Research Scholar, Department of Computer Science Engineering, Shri Vaishnav Institute of Information Technology, Shri Vaishnav Vidyapeeth Vishwavidyalaya, Indore, India

A Samson Arun Raj, School of Computer Science and Technology, Karunya Institute of Technology and Sciences, Tamil Nadu, Coimbatore, India

Dr. A. Samson Arun Raj is currently working as an Assistant Professor (Grade I) in the Division of Computer Science and Engineering, School of Computer Science and Technology, Karunya Institute of Technology and Sciences, Coimbatore, Tamil Nadu, India. His research focuses on smart city development using drone networks and energy grids across various applications, and his areas of expertise are wireless sensor networks, vehicular ad-hoc networks, and intelligent transportation systems.

Dr. Rajesh Kumar Chakrawarti, Department of Computer Science and Engineering, Sushila Devi Bansal College (SDBC), Bansal Group of Institutions (BGI), Indore (MP), India

Dr. Rajesh Kumar Chakrawarti is Dean and Professor in the Department of Computer Science & Engineering, Sushila Devi Bansal College (SDBC), Bansal Group of Institutions (BGI), Indore (MP), India. He has more than 20 years of professional experience in academia and industry and teaches courses at both undergraduate and postgraduate levels. He is actively involved in teaching, training, research, and department, institution, and university development activities. He has guided 17 postgraduate dissertations, with one more ongoing. He has organized around 100 seminars, workshops, FDPs, SDPs, conferences, and certifications, has attended around 100 seminars, workshops, FDPs, conferences, and certifications, and has presented or published more than 100 research papers, book chapters, and abstracts in national and international conferences and journals. From 2012 to 2014 he served as Remote Centre Coordinator, Workshop Coordinator, Aakash Project Coordinator, and Spoken Tutorial Resource Centre Coordinator for the NMEICT project of the MHRD, Government of India, New Delhi, conducted by IIT Bombay and IIT Kharagpur, at the Remote Centre of Shri Vaishnav Institute of Technology and Science (RCID-1117), Indore. His main research interests include compiler design, natural language processing, computational linguistics, machine translation, and word-sense disambiguation.

Dr. Krishnan Sakthidasan Sankaran, Department of ECE, Hindustan Institute of Technology and Science, Chennai, India.

Dr. Krishnan Sakthidasan Sankaran is a professor in the Department of Electronics and Communication Engineering at Hindustan Institute of Technology and Science, India. He received his B.E. degree from Anna University in 2005, his M.Tech. degree from SRM University in 2007, and his Ph.D. degree from Anna University in 2016. He has been a senior member of IEEE for the past 10 years and is a member of various professional bodies. He is an active reviewer for Elsevier journals and an editorial board member of various international journals. His research interests include image processing, wireless networks, cloud computing, and antenna design. He has published more than 70 papers in refereed journals and international conferences, as well as three books.

Dr. Sanjaya Kumar Sarangi, Utkal University, Coordinator and Adjunct Professor, Department of Computer Science and Engineering, Utkal University, Bhubaneswar, India

Dr. Sanjaya Kumar Sarangi is an Adjunct Professor and Coordinator at Utkal University, with a distinguished background in the academic, research, and industry sectors and 23 years of experience applying his skills and knowledge to the institution's objectives and to nourishing the global educational system. He has qualified GATE and holds a Master of Technology in Computer Science & Engineering and a Ph.D. in Computer Science. He was a Research Fellow in Science and Technology at the UGC, Government of India, and a visiting Doctoral Fellow at the University of California, USA. His research areas include wireless ad hoc and sensor networks, IDS, mobile communications, cloud computing, geospatial science and remote sensing, climate change, and disaster prediction systems. He has a number of publications in journals and conferences, has authored many textbooks and book chapters, and has more than 30 national and international patents. He is an active and life member of many associations, and serves as an editor, Technical Program Committee member, and reviewer for reputed journals and conferences. He successfully coordinated the NMICT project conducted by IIT Bombay and IIT Kharagpur, and has contributed digital resources to the Government of Odisha, IGNOU, and CSTA, USA. As academic coordinator, he administers various components of the internationalization of higher education, including the university's international profile-building and academic exchange agreements with international universities. As International Student Advisor, he manages International Student and Scholar Services, raising visibility within the international community and promoting educational and cultural exchange programs. He also oversees Information and Communication Technology (ICT) to enhance and optimize information and worldwide research that can lead to improved student learning and teaching methods, and provides ICT-based support for the smooth functioning of the university's e-learning systems.

Hitesh Rawat, Research Scholar, Department of Business Management, University of Extremadura, Spain

Hitesh Rawat is a Research Scholar in the Department of Business Management and Economics at the University of Extremadura, Spain, and currently also works as a researcher engaged in research and promotional activities. He has several years of consulting, teaching, and research experience. His research is aligned with business frameworks for secure communication, cyber security, management, tourism studies, operations, marketing, and financial supply chains. He has also chaired international conferences and hosted several research events, including national and international research schools, PhD colloquia, workshops, and training programs.

Anjali Rawat, Research Scholar, Department of Computer and Communication Technology, University of Extremadura, Spain

Anjali Rawat is a Research Scholar working in the domain of cyber threats in online social networks, with teaching, research, and promotional activities. She has several years of consulting, teaching, and research experience. Her research is aligned with business frameworks for secure communication and cyber security. She has also chaired international conferences and hosted several research events, including national and international research schools, PhD colloquia, workshops, and training programmes.



Published

2024-11-08

How to Cite

Rawat, R., Arun Raj, A. S., Chakrawarti, R. K., Sankaran, K. S., Sarangi, S. K., Rawat, H., & Rawat, A. (2024). Enhanced Cybercrime Detection on Twitter Using Aho-Corasick Algorithm and Machine Learning Techniques. Informatica, 48(18). https://doi.org/10.31449/inf.v48i18.6272