Semantic Annotation of Documents Based on Wikipedia Concepts

Authors

  • Janez Brank, Jozef Stefan Institute
  • Gregor Leban, Jozef Stefan Institute
  • Marko Grobelnik, Jozef Stefan Institute

Abstract

Semantic annotation is the task of augmenting an unstructured textual document with semantic information, such as concepts from an ontology. In wikification, Wikipedia is used as the ontology, and its pages (articles) are regarded as representations of concepts. We describe an efficient approach for annotating a document with relevant Wikipedia concepts. A global disambiguation method, based on constructing a mention-concept graph and computing PageRank over it, identifies a coherent set of relevant concepts by considering the input document as a whole. The presented approach is suitable for parallel processing and can support any language for which a sufficiently large Wikipedia is available. We discuss several heuristics involved in the disambiguation of candidate annotations and present an experimental evaluation of their influence.
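
The global disambiguation step can be illustrated with a minimal sketch. The abstract does not specify how the graph is built or weighted, so everything below is an assumption made for illustration: mention nodes link to their candidate concepts with a prior weight, related concepts link to each other symmetrically, and the function names (`pagerank`, `disambiguate`), the damping factor, and the toy input are all hypothetical rather than the authors' implementation.

```python
from collections import defaultdict


def pagerank(nodes, edges, damping=0.85, iterations=50):
    """Plain power-iteration PageRank on a weighted directed graph.

    edges maps a source node to a dict {target node: weight}.
    """
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        new_rank = {v: (1.0 - damping) / n for v in nodes}
        for src in nodes:
            out = edges.get(src, {})
            total = sum(out.values())
            if total == 0:
                # Dangling node: spread its rank uniformly over all nodes.
                for v in nodes:
                    new_rank[v] += damping * rank[src] / n
            else:
                for dst, weight in out.items():
                    new_rank[dst] += damping * rank[src] * weight / total
        rank = new_rank
    return rank


def disambiguate(candidates, relatedness):
    """Pick one concept per mention using PageRank scores.

    candidates:  mention -> {concept: prior weight}          (assumed input)
    relatedness: (concept_a, concept_b) -> symmetric weight  (assumed input)
    """
    nodes = set()
    edges = defaultdict(dict)
    for mention, concepts in candidates.items():
        m = ("mention", mention)
        nodes.add(m)
        for concept, prior in concepts.items():
            c = ("concept", concept)
            nodes.add(c)
            edges[m][c] = prior  # mention -> candidate concept
    for (a, b), weight in relatedness.items():
        ca, cb = ("concept", a), ("concept", b)
        if ca in nodes and cb in nodes:
            edges[ca][cb] = weight  # concept <-> concept, both directions
            edges[cb][ca] = weight
    rank = pagerank(sorted(nodes), edges)
    return {
        mention: max(concepts, key=lambda c: rank[("concept", c)])
        for mention, concepts in candidates.items()
    }


# Toy input (made up): "jaguar" is ambiguous, "engine" is not.
candidates = {
    "jaguar": {"Jaguar (animal)": 0.5, "Jaguar Cars": 0.5},
    "engine": {"Internal combustion engine": 1.0},
}
relatedness = {("Jaguar Cars", "Internal combustion engine"): 1.0}
annotations = disambiguate(candidates, relatedness)
print(annotations)
```

In this toy input both senses of "jaguar" start with the same prior, but only the car sense is linked to another candidate concept, so it accumulates more rank and is selected. That is the sense in which the score depends on the document as a whole rather than on each mention in isolation.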

Published

2018-03-26

How to Cite

Brank, J., Leban, G., & Grobelnik, M. (2018). Semantic Annotation of Documents Based on Wikipedia Concepts. Informatica, 42(1). Retrieved from https://puffbird.ijs.si/index.php/informatica/article/view/2228

Section

AI in Slovenia