Generating training data for word sense disambiguation in Russian | [Automatic collection and labelling of a training collection for word sense disambiguation in Russian]

Authors: Loukachevitch N.V., Bolshina A.S.
Published in: Proceedings of the conference Komp'yuternaya lingvistika i intellektual'nyye tekhnologii Dialogue-2020
Year of publication: 2020
First page: 119
Last page: 132
DOI: 10.28995/2075-7182-2020-19-119-132
Abstract: The best approaches to Word Sense Disambiguation (WSD) are supervised and rely on large amounts of hand-labelled data, which is costly to create and not always available. For the Russian language there is no sense-tagged resource of sufficient size to train supervised word sense disambiguation algorithms. In our work we describe an approach for creating an automatically labelled collection based on monosemous relatives (related unambiguous entries). The main contribution of our work is that we extracted monosemous relatives that can be located at relatively long distances from a target ambiguous word and ranked them according to their similarity to the target sense. The selected candidates are then used to extract training samples from a news corpus. We evaluated word sense disambiguation models based on nearest neighbor classification over BERT and ELMo embeddings. Our work relies on the Russian wordnet RuWordNet.
Added to the system by: Natalia Valentinovna Loukachevitch
