After removing punctuation marks, numeric values, articles, prepositions, conjunctions, and auxiliary verbs, 4,854 unique words in ETIS, 4,421 unique words in SICRIS, and 3,950 unique words in FP were identified (Table 2). Co-occurrence analyses were based on the top 200 words. Across all funding instruments, about a quarter of the top words account for half of the word occurrences (Figure 1).
Figure 1. The curve of the top 200 most frequent words of projects in FP, Estonia, and Slovenia during 2007-2018.
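The counting step described above can be illustrated with a minimal Python sketch. It is not the procedure used in the study: the stop list is a tiny hypothetical stand-in, the function names `tokenize`, `top_words`, and `words_covering` are invented for illustration, and the share is computed within the top list only, which is an assumption about how "half of the word occurrences" was measured.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop list (articles, prepositions,
# conjunctions, auxiliary verbs). The list used in the study is not given.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "and", "or",
              "is", "are", "was", "to", "for"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens only, drop stop words.

    Matching letters only removes punctuation marks and numeric values.
    """
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def top_words(texts, n=200):
    """Return the n most frequent words over all texts as (word, count) pairs."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts.most_common(n)


def words_covering(top, share=0.5):
    """Number of leading words whose counts reach `share` of all counts in `top`."""
    total = sum(count for _, count in top)
    running = 0
    for rank, (_, count) in enumerate(top, start=1):
        running += count
        if running >= share * total:
            return rank
    return len(top)


texts = [
    "The project aims to study culture and culture policy in 2007.",
    "Research project on culture.",
]
top = top_words(texts)
print(top[:2], words_covering(top))
```

On real project abstracts, dividing `words_covering(top)` by `len(top)` gives the fraction of top words needed to reach half of the occurrences, which is the quantity summarized in Figure 1.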
Word frequency is an important measure in content analysis. It is used to identify the most important research topics or concepts in a field by focusing on the most frequently occurring words (Milojević et al., 2011). One aim of this paper was to examine to what extent FP affects national programs. Figure 2 shows that in the majority of cases the top words do not overlap. The overlap is largest between SL and EE (20.85%), followed by SL and FP (15.6%); between EE and FP it is roughly half of the SL-FP figure (8.05%).
Figure 2. The overlapping and unique words in the top 200 of CORDIS, ETIS, and SICRIS.
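A comparison of this kind can be sketched with plain set operations. The example below is illustrative only: the text does not state which denominator underlies the reported percentages, so the sketch assumes the share of shared words in the union of the two lists (a Jaccard-style measure), and the word sets are toy data rather than the actual top 200 lists. The names `pairwise_overlap` and `unique_words` are hypothetical.

```python
from itertools import combinations


def pairwise_overlap(top_by_dataset):
    """Percentage of shared words per dataset pair, relative to the pair's union.

    Assumption: the denominator is the union of the two word sets; the
    study may have used a different one.
    """
    result = {}
    for (a, words_a), (b, words_b) in combinations(top_by_dataset.items(), 2):
        shared = words_a & words_b
        union = words_a | words_b
        result[(a, b)] = 100.0 * len(shared) / len(union) if union else 0.0
    return result


def unique_words(top_by_dataset, name):
    """Words that appear in the named dataset and in no other dataset."""
    others = set().union(
        *(words for key, words in top_by_dataset.items() if key != name)
    )
    return top_by_dataset[name] - others


# Toy data standing in for the top 200 lists.
toy = {
    "FP": {"culture", "policy", "migration"},
    "EE": {"culture", "school", "policy"},
    "SL": {"culture", "art", "family"},
}
print(pairwise_overlap(toy))
print(unique_words(toy, "SL"))
```

With the real lists, the same two functions would give the pairwise shares and the unique words per database; the exact percentages depend on the denominator chosen.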
As stated by Milojević et al. (2011), all words are specific or nonspecific to some degree, depending on the context. Adjectives and nouns play a major role in understanding the content of a text. Three types of words were distinguished in the sample.
The first group consists of the so-called project classics. Since project preparation is subject to certain standards, the texts contain terms that are unrelated to the content but are necessary to meet the given criteria (project, research, study, analysis, focus, deliver, evaluate, implement, network, develop, publish, area, findings, process, platform, etc.). These terms make up the majority of the top 200 words, and the most common word pairs are formed from them (research project, proposed project, proposed research, long term, project aims).
The second group consists of content words, which form the core of the projects and make it possible to follow research trends in the given time frame. Future work could analyze the data based solely on content words.
Geographic locations form the third group. Mentioning the target area (Europe, Estonia, and Slovenia) appears to be good practice in project writing. At the same time, research topics are influenced by history and proximity: in the case of Estonia, the top 200 words include “Russia,” “Baltic,” “Livonia,” and “German”; in the case of Slovenia, “Yugoslavia”; and in the case of FP, “Mediterranean,” “China,” and “Africa.”
From the point of view of our study, the content words deserve closer examination. There is a definite set of content words that overlap in all datasets throughout the period: culture, education, environment, history, human, identity, innovation, national, policy, social.
Throughout the period, some words overlap between pairs of databases: EE-FP (age, east, public) and SL-EE (individual, language, literature). For the most part, the words that overlap across periods are specific to a single database. In the case of Estonia: children, school, student, linguistic; in the case of Slovenia: art, legal, spatial; in the case of FP: citizen, employment, migration, mobility, urban, young. Unique words form a separate group. In the case of Estonia: dialogue, interdisciplinary, music, semiotics, teacher, ancient, collection, genetic, infrastructure, medieval; in the case of Slovenia: family, minority, territory, tradition, values, memory, tourism, war. The largest number of unique words appears in FP, where the periods also differ. FP7 (2007-2013): carbon, exclusion, foresight, gender, humanities, lifestyle, peace, poverty, rural, security, SSH, transition, unemployment, welfare; H2020 (2014-2020): investment, job, justice, cohesion, crises, emergence, inclusion, inequalities, reflective, responsible, transparency.
However, co-word analysis shows that the meaning of the overlapping words varies from one database to another (Figure 3). Taking as an example “culture,” one of the most commonly used words, we see differences across the datasets. The most commonly used co-words in the case of FP are “heritage,” “scientific,” “program,” “Europe,” “identity”; in the case of Slovenia: “national,” “media,” “relations,” “religion,” “history,” “saints”; in the case of Estonia: “narrative,” “conflicts,” “semiotic,” “social,” “history,” “Estonian,” “memory,” etc. This also applies to words that denote a geographic location. For example, the word “Europe” is related in the Estonian dataset to “Central Eastern European,” “union,” “public administration”; in the Slovenian dataset to “political,” “court”; and in the FP dataset to “union,” “research,” “policy,” and “integration.”
Figure 3. The most commonly used co-words in the three datasets for the terms “Culture” and “Innovation.”
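The idea behind such a co-word comparison can be sketched as follows. This is a simplified illustration, not the tool or settings used in the study: it assumes that two words co-occur when they appear in the same project text, counts each co-word once per text, and uses toy token lists. The function name `co_words` is hypothetical.

```python
from collections import Counter


def co_words(token_lists, target, n=5):
    """Most frequent words appearing in the same text as `target`.

    Assumption: the co-occurrence unit is the whole project text, and a
    co-word is counted once per text regardless of how often it appears.
    """
    counts = Counter()
    for tokens in token_lists:
        vocabulary = set(tokens)
        if target in vocabulary:
            counts.update(vocabulary - {target})
    return counts.most_common(n)


# Toy, already tokenized project texts from one hypothetical dataset.
docs = [
    ["culture", "heritage", "europe"],
    ["culture", "heritage", "identity"],
    ["innovation", "policy"],
]
print(co_words(docs, "culture"))
```

Running the function separately on each dataset for the same target word gives the lists that can then be compared side by side, as in Figure 3.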
At the same time, the specifics of SSH have to be kept in mind: a one-to-one meaning of terms is not as important as it is, for example, in the exact sciences. Thus, even in co-word analysis, the actual content may go unnoticed. For example, one of the most commonly used co-words with “culture” was “landscape.” Going back to the original data, we selected the projects containing these keywords. The words were interpreted in many ways: “Where Land Meets the Sea. Maritime Cultural Landscapes in Prehistoric and Medieval Estonia,” or “Positioning life-writing on Estonian literary landscapes,” or “The Politics of Peace and Conflict Knowledge: Syria and the Diverse Landscape of Local Knowledge/Experience,” or “Orthodox People in Estonia and Orthodox Churches in Estonian Landscape (18th-21st Century).” The projects belonged to a variety of fields, ranging from archaeology, cultural anthropology, ethnology, general and comparative literature, literary criticism and literary theory, and political and administrative sciences to social geography.