While analysis at the level of papers has already been performed (Mryglod et al., 2021), many interesting questions can be posed at the level of authors. For example, typical individual productivity and authors' collaboration patterns must be known in order to set benchmarks for comparing and assessing publishing behavior and for detecting unusual cases. It is also important that the authors' gender is typically (and in this work) inferred from given names, so a gender label cannot be assigned if only initials are given instead of the full name. This limitation can be partly overcome by merging the different records that refer to the same person, which increases the number of papers with genderized authors. For example, gender can be assigned to KAFKA S. (?) once this record is merged with KAFKA SOFIYA (Female).

However, merging requires solving the widely known problem of name disambiguation, which arises whenever unique digital identifiers are not commonly used. With names alone, there is no guarantee that two identically written names belong to the same person, and the uncertainty grows when only initials are available. The situation is even more complicated for publications by authors who are not native English speakers: our data set contains numerous alternative transliterations of the Cyrillic names of the same authors. Moreover, Ukrainian names are subject to the tradition of “translating” given names, and sometimes even surnames, into Russian. For example, an author
Bosovskaya can also appear as Bosovska, Orlovskaya as Orlovska, and Mostenskaya as Mostenska. Many given names can be transliterated into English from either their Ukrainian or their Russian Cyrillic form; some of the most common pairs are
Mykola/Nikolay, Oleksandr/Aleksandr, Kateryna/Ekaterina, and Olena/Elena. Note that the names in each pair start with different letters (M/N, O/A, K/E, and O/E, respectively), so even their initials do not match. The space of possible alternatives is further expanded by different short forms of names:
Olena → Lena, Oleksiy → Alex, Anastasiya → Nastya, Tetyana → Tanya, etc. The same name can also be spelled in many ways, and each spelling is automatically treated as a separate name. Moreover, metadata for Ukrainian economics journals can be deposited not only in English but also in Ukrainian (or Russian). Last but not least, Latin and Cyrillic letters are mixed carelessly: homoglyphs, that is, letters that look the same on screen but are encoded differently, are used arbitrarily. As a result, 40 variants of the name Eugen are found in the data set. Taken together, these peculiarities of the metadata of Ukrainian (non-native English) publications complicate the disambiguation of author names.
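The sources of variation listed above can be illustrated with a minimal Python sketch. It is not the disambiguation procedure used in this work. The lookup tables hold only the name examples mentioned in the text plus a small, hypothetical list of Cyrillic letters that look like Latin ones, and all names (`AuthorRecord`, `normalize_surname`, `compatible`, `propagate_gender`) are hypothetical. The sketch shows how homoglyph replacement, folding of the Russian surname ending -skaya to -ska, and mapping of given-name variants to one canonical form could let an initials-only record such as KAFKA S. inherit the gender of a compatible full-name record.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Hypothetical lookup tables for illustration only; a real pipeline would need
# much larger, curated dictionaries.

# Cyrillic letters that are visually identical to Latin ones (homoglyphs).
HOMOGLYPHS = str.maketrans({
    "А": "A", "В": "B", "Е": "E", "К": "K", "М": "M", "Н": "H", "О": "O",
    "Р": "P", "С": "C", "Т": "T", "Х": "X", "І": "I",
    "а": "a", "е": "e", "о": "o", "р": "p", "с": "c", "у": "y", "х": "x", "і": "i",
})

# Russian-style transliterations and short forms mapped to one canonical
# (Ukrainian-style) given name; pairs taken from the examples in the text.
GIVEN_NAME_CANON = {
    "nikolay": "mykola", "aleksandr": "oleksandr", "ekaterina": "kateryna",
    "elena": "olena", "lena": "olena", "alex": "oleksiy",
    "nastya": "anastasiya", "tanya": "tetyana",
}


@dataclass
class AuthorRecord:
    surname: str
    given: str                    # full given name or an initial such as "S."
    gender: Optional[str] = None  # e.g. "Female"; None if not inferable


def normalize_surname(surname: str) -> str:
    """Replace homoglyphs, lowercase, and fold the Russian -skaya ending to -ska."""
    s = surname.translate(HOMOGLYPHS).strip().lower()
    if s.endswith("skaya"):
        s = s[: -len("skaya")] + "ska"
    return s


def normalize_given(given: str) -> str:
    """Replace homoglyphs, lowercase, drop a trailing dot, map variants to a canonical form."""
    g = given.translate(HOMOGLYPHS).strip().lower().rstrip(".")
    return GIVEN_NAME_CANON.get(g, g)


def possible_initials(canon: str) -> Set[str]:
    """First letters of all known spellings of a canonical name (e.g. m and n for mykola)."""
    spellings = {canon} | {v for v, c in GIVEN_NAME_CANON.items() if c == canon}
    return {s[0] for s in spellings}


def compatible(a: AuthorRecord, b: AuthorRecord) -> bool:
    """True if two records could plausibly refer to the same person."""
    if normalize_surname(a.surname) != normalize_surname(b.surname):
        return False
    ga, gb = normalize_given(a.given), normalize_given(b.given)
    if not ga or not gb:
        return False
    if len(ga) == 1 and len(gb) == 1:  # both initials: require the same letter
        return ga == gb
    if len(ga) == 1:                   # initial vs. full name
        return ga in possible_initials(gb)
    if len(gb) == 1:
        return gb in possible_initials(ga)
    return ga == gb                    # both full names: compare canonical forms


def propagate_gender(records: List[AuthorRecord]) -> List[AuthorRecord]:
    """Copy a gender label onto unlabeled records when all compatible labeled records agree."""
    labeled = [r for r in records if r.gender is not None]
    for r in records:
        if r.gender is None:
            genders = {o.gender for o in labeled if compatible(r, o)}
            if len(genders) == 1:
                r.gender = genders.pop()
    return records


if __name__ == "__main__":
    # The KAFKA example from the text.
    records = [AuthorRecord("KAFKA", "S."), AuthorRecord("KAFKA", "SOFIYA", "Female")]
    propagate_gender(records)
    print(records[0].gender)  # Female
```

The sketch assigns a gender only when all compatible labeled records agree, so an initial such as O. shared by an Olena and an Oleksandr with the same surname stays unlabeled. Full transliteration of Cyrillic-only records, and deciding whether two compatible records really belong to one person, are outside its scope.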