Evaluating Semantic Vectors for Norwegian

  • Cathrine Stadsnes
  • Lilja Øvrelid
  • Erik Velldal

Abstract

In this article, we present two benchmark data sets for evaluating models of semantic word similarity for Norwegian. While such resources are available for English, none existed for Norwegian prior to this work. Furthermore, we produce large-coverage semantic vectors trained on the Norwegian Newspaper Corpus using several popular word embedding frameworks. Finally, we demonstrate the usefulness of the created resources for evaluating the performance of different word embedding models on the tasks of analogical reasoning and synonym detection. The benchmark data sets and word embeddings are all made freely available.

Published
2018-08-08
Section
Articles