Sentiment Analysis of Algerian Dialect Using Machine Learning and Deep Learning with Word2vec
DOI: https://doi.org/10.31449/inf.v46i6.3340
Abstract
In this paper, we address sentiment analysis of dialectal comments extracted from social media. These comments are written in the language spoken in Algeria, using Arabic and/or Latin characters, and may be in Modern Standard Arabic, French or the local dialect. This mix raises a large number of text-processing issues. The contributions of this work are fourfold. First, we build an Algerian dialect sentiment dataset of 11,760 comments collected from diverse social media platforms. Second, we train Skip-Gram and CBOW word2vec models on a corpus of 466,424 comments; these models are then used to enhance the sentiment dataset with semantically similar words. Third, we propose a set of preprocessing steps adapted to dialectal texts. Finally, we implement several machine learning classifiers (SVM and three Naive Bayes variants: Bernoulli NB, Gaussian NB and Multinomial NB) and two deep learning architectures (CNN and RNN), and use them to evaluate and compare three versions of the dataset: the original version, a version transcribed to Latin characters, and a version semantically enhanced by the word2vec models. On the dataset transcribed to Latin characters, the classifiers reach accuracies of 84.21% (MNB) and 64.11% (CNN); on the transcribed dataset enhanced by the word2vec models, they reach 83.70% (SVM) and 65.21% (RNN).
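The abstract does not spell out exactly how the word2vec models enrich the comments. The following is a minimal sketch, assuming the enrichment appends, for each token, its nearest neighbours in embedding space whose cosine similarity reaches a threshold. The toy vectors in `EMBEDDINGS`, the Latin-script tokens, and the `topn` and `threshold` values are hypothetical placeholders, not taken from the paper; a real pipeline would query Skip-Gram or CBOW vectors trained on the 466,424-comment corpus instead.

```python
import math

# HYPOTHETICAL toy embeddings standing in for a trained Skip-Gram/CBOW model.
# Real vectors would come from word2vec trained on the dialect corpus.
EMBEDDINGS = {
    "mlih":   [0.90, 0.10, 0.00],   # "good" (Algerian dialect, Latin script)
    "zwin":   [0.85, 0.15, 0.05],   # "nice"
    "bien":   [0.80, 0.20, 0.10],   # French "good"
    "khayeb": [-0.90, 0.10, 0.00],  # "bad"
    "machi":  [0.00, 0.00, 1.00],   # negation "not"
}


def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar(word, topn=2, threshold=0.9):
    """Return up to `topn` vocabulary words whose similarity to `word` is >= threshold."""
    if word not in EMBEDDINGS:
        return []  # out-of-vocabulary token: nothing to add
    scores = [
        (other, cosine(EMBEDDINGS[word], vec))
        for other, vec in EMBEDDINGS.items()
        if other != word
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [w for w, s in scores[:topn] if s >= threshold]


def enhance(comment, topn=2, threshold=0.9):
    """Append semantically similar words to a whitespace-tokenized comment.

    This is one ASSUMED enrichment strategy (append, no duplicates);
    the paper's exact procedure may differ.
    """
    tokens = comment.split()
    extra = []
    for tok in tokens:
        for sim in most_similar(tok, topn, threshold):
            if sim not in tokens and sim not in extra:
                extra.append(sim)
    return tokens + extra


print(enhance("machi mlih"))  # ['machi', 'mlih', 'zwin', 'bien']
```

Appending neighbours rather than replacing tokens keeps the original wording available to the classifier while adding dialectal, French or spelling variants with a similar meaning; whether the authors append or substitute words is not stated in the abstract.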
License
I assign to Informatica, An International Journal of Computing and Informatics ("Journal") the copyright in the manuscript identified above and any additional material (figures, tables, illustrations, software or other information intended for publication) submitted as part of or as a supplement to the manuscript ("Paper") in all forms and media throughout the world, in all languages, for the full term of copyright, effective when and if the article is accepted for publication. This transfer includes the right to reproduce and/or to distribute the Paper to other journals or digital libraries in electronic and online forms and systems.
I understand that I retain the rights to use the pre-prints, off-prints, accepted manuscript and published journal Paper for personal use, scholarly purposes and internal institutional use.
In certain cases, I can request to retain the publishing rights of the Paper. The Journal may grant or deny this request, and I fully agree to its decision.
I declare that the submitted Paper is original, has been written by the stated authors and has not been published elsewhere nor is currently being considered for publication by any other journal and will not be submitted for such review while under review by this Journal. The Paper contains no material that violates proprietary rights of any other person or entity. I have obtained written permission from copyright owners for any excerpts from copyrighted works that are included and have credited the sources in my article. I have informed the co-author(s) of the terms of this publishing agreement.
Copyright © Slovenian Society Informatika