ИСТИНА
Intelligent System for Thematic Investigation of Scientometric Data
Detecting human values, beliefs, or tonality in large text collections, such as publications in social networks, requires machine learning algorithms and interdisciplinary expertise. Narratives and worldviews can be uncovered through context-dependent markup of the information. The markup is formalized as a hypergraph model: the vertices correspond to text spans, and the edges correspond to markup elements labeled with value or emotion concepts taken from classifiers. A markup element may contain arbitrary text fragments, and the set of fragments correlates with the manifestation of values or sentiments. Once a sufficient number of marked-up documents has been collected, a model is trained to determine automatically the values or emotions expressed in the texts. The authors illustrate their methodology with the case of finding cultural codes in a collection of social media publications. The first section of the paper reviews the scientific schools concerned with exposing value codes and argues for the relevance of the task in its humanitarian, mathematical, and software aspects. The second section introduces the mathematical definitions and the problem statement, together with the natural language processing algorithms applicable to its solution. The third section gives an overview of a project of the MSU IAI laboratory of machine learning and semantic analysis. The outcomes of the project are software for context-dependent markup with various applications in the media analysis industry, training samples prepared by linguists, sociologists, economists, and political science experts, and some relevant preliminary statistics on the contemporary cultural landscape.
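The hypergraph representation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' software: the class names (`Span`, `MarkupElement`, `MarkedUpText`), the sample sentence, and the labels `family`, `tradition`, and `pride` are all hypothetical, chosen only to show how one markup element (a hyperedge) can group several text spans (vertices) under a single concept label.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Span:
    """A vertex of the hypergraph: a character range [start, end) in the text."""
    start: int
    end: int


@dataclass
class MarkupElement:
    """A hyperedge: a concept label attached to an arbitrary set of spans."""
    label: str
    spans: frozenset


@dataclass
class MarkedUpText:
    """A document together with its markup elements (hypothetical container)."""
    text: str
    elements: list = field(default_factory=list)

    def add(self, label, *fragments):
        """Create one markup element from the first occurrence of each fragment."""
        spans = set()
        for fragment in fragments:
            start = self.text.find(fragment)
            if start < 0:
                raise ValueError(f"fragment not found: {fragment!r}")
            spans.add(Span(start, start + len(fragment)))
        self.elements.append(MarkupElement(label, frozenset(spans)))

    def vertices(self):
        """All distinct spans referenced by at least one markup element."""
        return set().union(*(e.spans for e in self.elements))

    def label_counts(self):
        """How many markup elements carry each concept label."""
        return Counter(e.label for e in self.elements)

    def fragments(self, element):
        """Text of the spans in one markup element, in reading order."""
        ordered = sorted(element.spans, key=lambda s: s.start)
        return [self.text[s.start:s.end] for s in ordered]


# Hypothetical example: labels and sentence are illustrative only.
doc = MarkedUpText("Family comes first, and I am proud of our traditions.")
doc.add("family", "Family comes first")
doc.add("tradition", "proud", "our traditions")  # one element, two fragments
doc.add("pride", "I am proud")                   # overlaps the span "proud"
```

Spans may overlap and one element may join fragments that are not adjacent, which is why a hypergraph, rather than a flat sequence of tags, is a natural fit. Per-document label counts such as `doc.label_counts()` are one simple way such markup could be turned into training targets for a classifier, though the abstract does not specify how the authors do this.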