Discriminating Between Closely Related Languages on Twitter

Nikola Ljubešić; Denis Kranjčić

Authors

Nikola Ljubešić
Denis Kranjčić

Abstract

In this paper we tackle the problem of discriminating Twitter users by the language they tweet in, taking into account very similar South-Slavic languages – Bosnian, Croatian, Montenegrin and Serbian. We apply the supervised machine learning approach by annotating a subset of 500 users from an existing Twitter collection by the language the users primarily tweet in. We show that by using a simple bag-of-words model, univariate feature selection, 320 strongest features and a standard classifier, we reach user classification accuracy of 98%. Annotating the whole 63,160 users strong Twitter collection with the best performing classifier and visualizing it on a map via tweet geo-information, we produce a Twitter language map which clearly depicts the robustness of the classifier.
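
The pipeline named in the abstract (bag-of-words, univariate feature selection, a fixed number of strongest features, a standard classifier) can be illustrated with a minimal pure-Python sketch. The abstract does not say which selection statistic or classifier was used, so the chi-square score and the multinomial naive Bayes classifier below are assumptions chosen for illustration, as are the whitespace tokenizer, all function names, and the four-user toy data set. The toy data relies on well-known Croatian/Serbian word pairs and is not taken from the paper's Twitter collection.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    # Lowercased whitespace tokens (assumed; the paper's tokenization is not given here).
    return text.lower().split()

def chi2_scores(docs, labels):
    """Chi-square of token presence vs. class membership, maximised over classes."""
    n = len(docs)
    class_sizes = Counter(labels)
    doc_freq = Counter()                  # token -> number of users whose text contains it
    doc_freq_by_class = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for tok in set(tokens):
            doc_freq[tok] += 1
            doc_freq_by_class[tok][label] += 1
    scores = {}
    for tok, n_tok in doc_freq.items():
        best = 0.0
        for label, n_cls in class_sizes.items():
            a = doc_freq_by_class[tok][label]   # token present, in class
            b = n_tok - a                       # token present, other class
            c = n_cls - a                       # token absent, in class
            d = n - n_tok - c                   # token absent, other class
            denom = (a + b) * (c + d) * (a + c) * (b + d)
            if denom:
                best = max(best, n * (a * d - b * c) ** 2 / denom)
        scores[tok] = best
    return scores

def select_features(docs, labels, k=320):
    # Keep the k highest-scoring tokens; ties are broken alphabetically.
    scores = chi2_scores(docs, labels)
    ranked = sorted(scores, key=lambda tok: (-scores[tok], tok))
    return set(ranked[:k])

class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing over the selected vocabulary."""

    def fit(self, docs, labels, vocab):
        self.vocab = vocab
        self.classes = sorted(set(labels))
        self.log_prior, self.log_lik = {}, {}
        for cls in self.classes:
            cls_docs = [d for d, y in zip(docs, labels) if y == cls]
            self.log_prior[cls] = math.log(len(cls_docs) / len(docs))
            counts = Counter(t for d in cls_docs for t in d if t in vocab)
            total = sum(counts.values()) + len(vocab)
            self.log_lik[cls] = {t: math.log((counts[t] + 1) / total) for t in vocab}
        return self

    def predict(self, tokens):
        def score(cls):
            return self.log_prior[cls] + sum(
                self.log_lik[cls][t] for t in tokens if t in self.vocab)
        return max(self.classes, key=score)

# Toy data: one concatenated text per "user", labelled by primary language.
users = [
    ("danas je lijepo vrijeme tko ide van", "hr"),
    ("kupujem kruh i mlijeko", "hr"),
    ("danas je lepo vreme ko ide napolje", "sr"),
    ("kupujem hleb i mleko", "sr"),
]
docs = [tokenize(text) for text, _ in users]
labels = [label for _, label in users]

vocab = select_features(docs, labels, k=12)   # the abstract reports k = 320 on real data
clf = NaiveBayes().fit(docs, labels, vocab)
print(clf.predict(tokenize("lijepo mlijeko")))
```

On this toy set the selection step discards tokens shared by both classes (such as "danas" or "kupujem") and keeps only the discriminating ones, which is the intuition behind using a small set of strong features for closely related languages.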

Issue

Vol. 39 No. 1 (2015)

Section

Regular papers

License

I assign to Informatica, An International Journal of Computing and Informatics ("Journal") the copyright in the manuscript identified above and any additional material (figures, tables, illustrations, software or other information intended for publication) submitted as part of or as a supplement to the manuscript ("Paper") in all forms and media throughout the world, in all languages, for the full term of copyright, effective when and if the article is accepted for publication. This transfer includes the right to reproduce and/or to distribute the Paper to other journals or digital libraries in electronic and online forms and systems.

I understand that I retain the rights to use the pre-prints, off-prints, accepted manuscript and published journal Paper for personal use, scholarly purposes and internal institutional use.

In certain cases, I can ask for retaining the publishing rights of the Paper. The Journal can permit or deny the request for publishing rights, to which I fully agree.

I declare that the submitted Paper is original, has been written by the stated authors and has not been published elsewhere nor is currently being considered for publication by any other journal and will not be submitted for such review while under review by this Journal. The Paper contains no material that violates proprietary rights of any other person or entity. I have obtained written permission from copyright owners for any excerpts from copyrighted works that are included and have credited the sources in my article. I have informed the co-author(s) of the terms of this publishing agreement.

How to Cite

Ljubešić, N., & Kranjčić, D. (2015). Discriminating Between Closely Related Languages on Twitter. Informatica, 39(1). https://puffbird.ijs.si/index.php/informatica/article/view/746
