String Transformation Based Morphology Learning
DOI:
https://doi.org/10.31449/inf.v43i4.2520Abstract
There are several morphological methods that can solve the morphological rule induction problem. For different languages this task represents different difficulty levels. In this paper we propose a novel method that can learn prefix, infix and suffix transformations as well. The test language is Hungarian, and we chose a previously generated word pair set of accusative case for evaluating the method, comparing its training time, memory requirements, average inflection time and correctness ratio with some of the most popular models like dictionaries, finite state transducers, the tree of aligned suffix rules and a lattice based method. We also provide multiple training and searching strategies, introducing parallelism and the concept of prefix trees to optimize the number of rules that need to be processed for each input word. This newly created novel method can be applied not only for morphology, but also for any problems in the field of bioinformatics and data mining that can benefit from string transformations learning.References
A. Gelbukh, M. Alexandrov and S.-Y. Han (2004) Detecting inflection patterns in natural language by minimization of morphological model, Iberoamerican Congress on Pattern Recognition, Springer, pp. 432–-438.
G. Satta and J. C. Henderson (1997) String transformation learning, In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and Eighth Conference of the European Chapter of the Association for Computational Linguistics, Stroudsburg, PA, USA, pp. 444-451.
J. Hajic (1988) Formal morphology, In Proceedings of International Conference on Computational Linguistics, pp. 223-229.
K. Koskenniemi (1983) Two-level morphology: A General Computational Model for Word-Form Recognition and Production, Department of General Linguistics, University of Helsinki, Finland.
L. Bauer (2003) Introducing linguistic morphology, Edinburgh University Press.
J. Oncina, P. García and E. Vidal (1993) Learning subsequential transducers for pattern recognition interpretation tasks, IEEE Transactions on Pattern Analysis and Machine Intelligence, Volume 15, Number 5, pp. 448-458.
C. De la Higuera (2010) Grammatical inference: learning automata and grammars, Cambridge University Press.
D. Gildea and D. Jurafsky (1995) Automatic Induction of Finite State Transducers for Simple Phonological Rules, Proceedings of the 33rd Annual Meeting on Association for Computational Linguistics, Cambridge, Massachusetts, pp. 9-15.
P. Theron and I. Cloete (1997) Automatic acquisition of two-level morphological rules, Proceedings of the fifth conference on Applied natural language processing, pp. 103-110.
J. Goldsmith (2006) An algorithm for the unsupervised learning of morphology, Natural Language Engineering, Volume 12, Number 4, pp. 353-371.
J. Lee and J. Goldsmith (2016) Linguistica 5: Unsupervised Learning of Linguistic Structure, HLT-NAACL Demos, pp. 22-26.
K. Shalonova and P. Flach (2007) Morphology learning using tree of aligned suffix rules, In ICML Workshop: Challenges and Applications of Grammar Induction.
G. Szabó and L. Kovács (2018) Lattice based morphological rule induction, Acta Universitas Apulensis, Number 53, pp. 93-110.
C. Fellbaum (1998) WordNet: An electronic lexical database, MIT Press, Cambridge.
M. Miháltz, Cs. Hatvani, J. Kuti, Gy. Szarvas, J. Csirik, G. Prósz{'e}ky and T. Váradi (2007) Methods and results of the Hungarian WordNet project, In Proceedings of GWC 2008: 4th Global WordNet Conference, University of Szeged, pp. 311-320.
G. Szabó and L. Kovács (2015) Efficiency analysis of inflection rule induction, In Proceedings of the 2015 16th International Carpathian Control Conference (ICCC), IEEE, pp. 521-525.
N. C. Jones (2004) An introduction to bioinformatics algorithms, MIT press.
E. Mourad and Z. Y. Albert (2011) Algorithms in computational molecular biology: techniques, approaches and applications, Volume 21, John Wiley & Sons.
V. I. Levenshtein (1966) Binary codes capable of correcting deletions, insertions, and reversals, Soviet physics doklady, Volume 10, Number 8, pp. 707-710.
S. Cucerzan and E. Brill (2004) Spelling Correction as an Iterative Process that Exploits the Collective Knowledge of Web Users, EMNLP, Volume 4, pp. 293-300.
Downloads
Published
How to Cite
Issue
Section
License
I assign to Informatica, An International Journal of Computing and Informatics ("Journal") the copyright in the manuscript identified above and any additional material (figures, tables, illustrations, software or other information intended for publication) submitted as part of or as a supplement to the manuscript ("Paper") in all forms and media throughout the world, in all languages, for the full term of copyright, effective when and if the article is accepted for publication. This transfer includes the right to reproduce and/or to distribute the Paper to other journals or digital libraries in electronic and online forms and systems.
I understand that I retain the rights to use the pre-prints, off-prints, accepted manuscript and published journal Paper for personal use, scholarly purposes and internal institutional use.
In certain cases, I can ask for retaining the publishing rights of the Paper. The Journal can permit or deny the request for publishing rights, to which I fully agree.
I declare that the submitted Paper is original, has been written by the stated authors and has not been published elsewhere nor is currently being considered for publication by any other journal and will not be submitted for such review while under review by this Journal. The Paper contains no material that violates proprietary rights of any other person or entity. I have obtained written permission from copyright owners for any excerpts from copyrighted works that are included and have credited the sources in my article. I have informed the co-author(s) of the terms of this publishing agreement.
Copyright © Slovenian Society Informatika