Using Semi-Supervised Learning and Wikipedia to Train an Event Argument Extraction System

Authors

  • Patrik Zajec Jožef Stefan Institute and Jožef Stefan International Postgraduate School
  • Dunja Mladenić Jožef Stefan Institute and Jožef Stefan International Postgraduate School

DOI

https://doi.org/10.31449/inf.v46i1.3577

Abstract

The paper presents a methodology for training an event argument extraction system in a semi-supervised setting. We use Wikipedia and Wikidata to automatically obtain a small, noisily labeled dataset and a large unlabeled dataset. The data consists of event clusters, each containing Wikipedia pages in multiple languages. The unlabeled data is labeled iteratively using semi-supervised learning combined with probabilistic soft logic, which infers the pseudo-label of each example from the predictions of multiple base learners. The proposed methodology is applied to Wikipedia pages about earthquakes and terrorist attacks in a cross-lingual setting. Our experiments show that the proposed methodology improves the results: the system achieves an F1-score of 0.79 when trained only on the automatically labeled dataset, and an F1-score of 0.84 when trained according to the methodology with semi-supervised learning combined with probabilistic soft logic.
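The abstract does not spell out the logic rules or the inference procedure, so the following is only a minimal sketch of the general idea of combining several base learners' predictions into one pseudo-label with probabilistic-soft-logic-style reasoning. It assumes, for illustration, one hypothetical weighted rule per base learner, `Pred_i(x, y) -> Label(x, y)`, a weak negative prior `!Label(x, y)`, squared hinge-loss potentials, and a constraint that an example's label truth values sum to 1. Dedicated PSL implementations ground rules over a whole dataset and use their own solvers; this sketch substitutes plain projected gradient descent for a single example. All function names, weights, the prior weight, and the confidence threshold are hypothetical and are not taken from the paper.

```python
import numpy as np


def project_to_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection onto {x : x >= 0, sum(x) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = k[u - (css - 1.0) / k > 0][-1]  # largest index satisfying the condition
    theta = (css[rho - 1] - 1.0) / rho
    return np.maximum(v - theta, 0.0)


def infer_label_truth(preds, weights, prior_weight=0.1, steps=500):
    """MAP inference for a tiny PSL-style model of ONE example (hypothetical rules).

    preds: (n_learners, n_labels) soft predictions of the base learners, in [0, 1].
    Rules with squared hinge potentials:
        weights[i]   : Pred_i(x, y) -> Label(x, y)   distance max(0, p_iy - l_y)
        prior_weight : !Label(x, y)                  distance l_y
    The label truth values l are constrained to sum to 1.
    """
    preds = np.asarray(preds, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n_labels = preds.shape[1]
    l = np.full(n_labels, 1.0 / n_labels)  # start from a uniform assignment
    step = 1.0 / (2.0 * (weights.sum() + prior_weight))  # 1 / Lipschitz constant
    for _ in range(steps):
        gap = np.maximum(preds - l, 0.0)  # distance to satisfaction of each rule
        grad = -2.0 * (weights[:, None] * gap).sum(axis=0) + 2.0 * prior_weight * l
        l = project_to_simplex(l - step * grad)
    return l


def pseudo_label(preds, weights, threshold=0.6):
    """Return (label, confidence), or None if the inferred label is not confident enough."""
    l = infer_label_truth(preds, weights)
    label = int(np.argmax(l))
    conf = float(l[label])
    return (label, conf) if conf >= threshold else None


def self_training_round(unlabeled_preds, weights, threshold=0.6):
    """One hypothetical round: pseudo-label confident examples, keep the rest unlabeled."""
    labeled, remaining = [], []
    for idx, preds in enumerate(unlabeled_preds):
        result = pseudo_label(preds, weights, threshold)
        if result is None:
            remaining.append(idx)
        else:
            labeled.append((idx, result[0]))
    return labeled, remaining


# Three equally weighted learners agree on label 1.
print(pseudo_label([[0.1, 0.9]] * 3, [1.0, 1.0, 1.0]))
# A highly weighted learner outvotes two weaker ones.
print(pseudo_label([[0.8, 0.2], [0.3, 0.7], [0.3, 0.7]], [2.0, 0.5, 0.5]))
```

In a complete semi-supervised loop of this kind, the base learners would be retrained on the noisily labeled data plus the newly pseudo-labeled examples, and the round would be repeated on the remaining unlabeled examples. Squared hinge potentials are one of the two potential types PSL supports (the other is linear); they are used here because they make the per-example objective smooth, so a fixed step size suffices.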

Author Biographies

  • Patrik Zajec, Jožef Stefan Institute and Jožef Stefan International Postgraduate School
    E3, Student
  • Dunja Mladenić, Jožef Stefan Institute and Jožef Stefan International Postgraduate School
    E3, Department Leader

Published

2022-03-15

Section

Student papers

How to Cite

Zajec, P., & Mladenić, D. (2022). Using Semi-Supervised Learning and Wikipedia to Train an Event Argument Extraction System. Informatica, 46(1). https://doi.org/10.31449/inf.v46i1.3577