PUBLICATIONS

Slovene Translation of the Atomic 2020 data set SloATOMIC 2020

Authors

Adrian Mladenić Grobelnik

Erik Novak

Dunja Mladenić

Publication date

2022

Publisher

Jožef Stefan Institute

Description

The SloATOMIC 2020 corpus contains the Slovene translations of the ATOMIC 2020 data set, a commonsense knowledge graph with 1.33M everyday inferential knowledge tuples about entities and events. The translations were produced with the DeepL translation service, and a selection of about 10k examples was also manually inspected and corrected. The corpus consists of 1,331,114 examples distributed across the train, validation, and test data sets. It was created as part of work package 4 of the Slovene in the Digital Environment project. The corpus consists of the following files:

- sloatomic_train.tsv: The training set.
- sloatomic_dev.tsv: The validation set.
- sloatomic_test.tsv.automatic_all: The test set containing all of the automatically translated examples.
- sloatomic_test.tsv.automatic_10k: The selection of 10k examples from the complete test set.
- sloatomic_test.tsv …
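
Since the corpus is distributed as tab-separated files, a short sketch of how such a file might be read could be useful. The description does not state the column layout, so the three-column head, relation, tail order below is an assumption carried over from the original ATOMIC 2020 release, and the sample rows are invented for illustration only. The helper names read_tsv_rows and count_by_column are hypothetical.

```python
import csv
import io
from collections import Counter
from typing import Iterable, Iterator, List, TextIO


def read_tsv_rows(handle: TextIO) -> Iterator[List[str]]:
    """Yield each non-empty line of a tab-separated file as a list of fields."""
    # QUOTE_NONE keeps quotation marks inside the text as ordinary characters.
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row:
            yield row


def count_by_column(rows: Iterable[List[str]], column: int) -> Counter:
    """Count how often each value appears in the given column."""
    return Counter(row[column] for row in rows if len(row) > column)


# In-memory stand-in for a file such as sloatomic_train.tsv.
# ASSUMPTION: columns are head, relation, tail; these rows are made up.
sample = io.StringIO(
    "oseba X gre v trgovino\txNeed\timeti denar\n"
    "oseba X gre v trgovino\txWant\tkupiti kruh\n"
    "kruh\tObjectUse\tnarediti sendvič\n"
)

rows = list(read_tsv_rows(sample))
relation_counts = count_by_column(rows, column=1)
print(len(rows), dict(relation_counts))
```

To run the same helpers on a downloaded file, the in-memory sample can be swapped for open("sloatomic_train.tsv", encoding="utf-8", newline=""), after checking the actual column order in the file.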
