Research Papers

Research evaluation reform and the heterogeneity of researchers’ metric-wiseness

  • Sandra Rousseau 1,†
  • Cinzia Daraio 2
  • 1CEDON, KU Leuven, Brussel B-1000, Belgium
  • 2DIAG, Sapienza University of Rome, Rome 00185, Italy
†Sandra Rousseau (Email: ).

Received date: 2024-06-19

  Revised date: 2024-11-21

  Accepted date: 2024-12-20

  Online published: 2025-01-09

Abstract

Purpose: We aimed to measure the variation in researchers’ knowledge and attitudes towards bibliometric indicators. The focus is on mapping the heterogeneity of this metric-wiseness within and between disciplines.

Design/methodology/approach: An exploratory survey was administered to researchers at Sapienza University of Rome, one of Europe’s oldest and largest generalist universities. To measure metric-wiseness, we used attitude statements evaluated on a 5-point Likert scale. Moreover, we analyzed documents of recent initiatives on assessment reform to shed light on how researchers’ heterogeneous attitudes towards, and knowledge of, bibliometric indicators are taken into account.

Findings: We found great heterogeneity in researchers’ metric-wiseness across scientific disciplines. In addition, within each discipline, we observed both supporters and critics of bibliometric indicators. From the document analysis, we found no reference to individual heterogeneity concerning researchers’ metric-wiseness.

Research limitations: We used a self-selected sample of researchers from one Italian university as an exploratory case. Further research is needed to check the generalizability of our findings.

Practical implications: To gain sufficient support for research evaluation practices, it is key to consider researchers’ diverse attitudes towards indicators.

Originality/value: We contribute to the current debate on reforming research assessment by providing a novel empirical measurement of researchers’ knowledge and attitudes towards bibliometric indicators and discussing the importance of the obtained results for improving current research evaluation systems.

Cite this article

Sandra Rousseau, Cinzia Daraio. Research evaluation reform and the heterogeneity of researchers’ metric-wiseness[J]. Journal of Data and Information Science, 2025, 10(1): 47-73. DOI: 10.2478/jdis-2025-0012

1 Introduction and contribution

In a context where bibliometric indicators are omnipresent, large strands of the literature have discussed the design and quality of these indicators as well as the appropriateness of using bibliometric indicators for the assessment of researchers, research output, research institutions, and research quality (Gingras, 2016; Haustein & Larivière, 2014; Rousseau et al., 2018). Indicators are used to understand the basic features of science and the social networks of scientists and scientific institutes, as well as to inform evaluation practices. However, it is also important to investigate how researchers react to the presence of these indicators and the use of indicators in assessment procedures.
Bibliometric indicators can help with research evaluation, as they provide structure in a context where assessments are challenging (Rousseau & Rousseau, 2021). They can improve researchers’ upward mobility based on merit rather than networking (van Dalen & Henkens, 2012). In addition, the use of bibliometric indicators may be reproducible and objective. However, this objectivity may be challenged in practice: for example, authors may cite for many reasons besides recognizing scientific quality, and databases can contain errors (e.g. Franceschini et al., 2016). If appropriately defined, standard bibliometric indicators can be used in most disciplines, with the possible exception of the humanities and some social sciences (Hicks, 2004). Publication and reference practices in the humanities and social sciences differ from those of the natural and medical sciences, for example, because of the importance of monographs, limiting the value of citation-based indicators. However, by normalizing indicators based on field and time (Bornmann & Wohlrabe, 2019), assessments within one discipline and across disciplines become possible. To provide evidence that scientific results have societal relevance, altmetrics or social media metrics can be useful (Bornmann, 2014). However, these social media indicators include a large variety of incomparable data (Bornmann, 2014; Lin & Fenner, 2013; Penny, 2016), which raises the challenging question of normalization and data quality control (Haustein, 2016). Furthermore, bibliometric indicators are versatile and can be applied to various research outcomes. Therefore, their use in research evaluations is feasible and cost-effective for large-scale research assessments (Anfossi et al., 2016).
Alongside the many strengths of bibliometric indicators, several weaknesses have been identified (Biagioli & Lippman, 2020; Rousseau & Rousseau, 2021; Thelwall & Kousha, 2021). Clearly, intellectual achievements are not the same as writing many papers or receiving numerous citations. Indeed, not all scientific results lead to text, and other outcomes such as patents or software may be relevant. Moreover, care is needed when interpreting citation-based indicators, as they are often related to visibility and not necessarily originality. In addition, these indicators are by definition backward-looking and are thus likely to be poor predictors of future original achievements by the research unit or individual researcher. Therefore, using mainstream indicators, such as the h-index, for high-impact decisions about individual researchers should be avoided (Hicks et al., 2015; Moed, 2020). Furthermore, citation numbers can be manipulated, for example, via self-citations at the author and journal levels (Necker, 2014). Here it is interesting to mention Goodhart’s law (Goodhart, 1975), which states that when a measure is picked as an indicator, it ceases to function as a good indicator because people start to game it. Researchers with better knowledge of bibliometric indicators may also have an edge over their peers, without this necessarily reflecting higher-quality research. Not all databases are able to reflect the local research context, such as the multitude of languages spoken and cultural differences. The dominance of English and Western cultures in academic research may imply an under-appreciation of research outcomes in other languages and cultures (Mason et al., 2021). Besides this language bias, research evaluation is also susceptible to multiple other biases, such as gender bias (Corsi et al., 2019) and a tendency to reward mainstream research (Rafols et al., 2012), reinforcing a discipline’s hierarchy and systematically favoring mono-disciplinary research (Brooks & Schopohl, 2018; Tourish & Willmott, 2015). As mentioned by Biagioli and Lippman (2020), publishing in itself is no longer sufficient within the academic world; publications should have an impact, and this impact is measured by a variety of metrics, including citations, views, and downloads. The book by Biagioli and Lippman (2020) examined how the increasing reliance on impact metrics to evaluate publications has produced new forms of academic fraud and misconduct, such as citation rings and rigged peer reviews (e.g. Biagioli, 2016; Ferguson et al., 2014).
In response to decades of this and other criticisms (e.g. Gingras, 2016; Haustein & Larivière, 2014), several calls for reform were launched (see the review by Curry et al. (2020), which lists 15 such initiatives). At the European level, an “Agreement on Reforming Research Assessment” was signed in July 2022. This agreement builds on the main principles introduced in the scoping paper of the European Commission (2021) and sets out ten main principles for assessment reform, which include, among other elements, the recognition of the diversity of contributions and careers in research, support for the responsible use of quantitative indicators while focusing on qualitative evaluation, and the abandonment of inappropriate uses of journal- and publication-based metrics. The signatories of this Agreement support “the need to reform research assessment practices.” They share the vision that “the assessment of research, researchers and research organizations should recognize the diverse outputs, practices and activities that maximize the quality and impact of research.” After all, research evaluation plays an important role in deciding which researchers to recruit, promote, or reward, selecting which research proposals to fund, and identifying which research units and organizations to support. An important question in this context of research assessment reform is how to apply in practice the “responsible use of quantitative indicators” called for in the Agreement.
An important element to consider regarding the responsible use of research indicators is researchers’ perceptions and awareness of bibliometric indicators. Many studies have investigated researchers’ familiarity with, adoption of, and opinions about research-related metrics (e.g. Buela-Casal & Zych, 2012; Chen & Lin, 2018; Haddow & Hammarfelt, 2019; Hammarfelt & Haddow, 2018; Hammarfelt & Rushforth, 2017; Kamrani et al., 2021; Ma & Ladisch, 2019; Rousseau & Rousseau, 2017). For example, researchers’ perceptions of citation-based research evaluation are often ambiguous since citations can be seen as a measure of impact, while at the same time, they are criticized for not reflecting actual scientific contribution (e.g. Aksnes, 2006; Aksnes & Rip, 2009; Lemke et al., 2019). As a result, some concepts have been put forward to capture this awareness aspect, including metric-wiseness (Rousseau & Rousseau, 2015) and metric literacy (Dorsch et al., 2021; Maggio et al., 2022). The latter concept has inspired the Metrics Literacies project, which is guided by the overarching question: “How can the understanding and use of scholarly metrics in academia be improved?”. Furthermore, Hammarfelt and Rushforth (2017) proposed the concept of “citizen bibliometrics” to remind researchers and research evaluators that indicators are constantly modified, (re)created, and criticized in the context of their usage. Thus, it is not only relevant to study researchers’ perceptions of bibliometrics, but also how they use and interpret bibliometric indicators in practice when evaluating funding applications, reviewing submissions, and hiring new colleagues (Derrick & Gillespie, 2013; Hammarfelt & Rushforth, 2017). Researchers’ perceptions and use of bibliometric indicators have been shown to depend on factors related to the researchers themselves, such as their age (e.g. Kamrani et al., 2021), the research fields in which they are active (e.g. Hammarfelt & Rushforth, 2017; Kulczycki et al., 2018; Söderlind & Geschwind, 2020), and institutional contexts (e.g. Cheung, 2008).
In this paper, we contribute to the current debate on reforming research assessment by providing a novel empirical measurement of researchers’ knowledge and attitudes towards bibliometric indicators, operationalizing the concept of “metric-wiseness”, and discussing the importance of the obtained results for improving current research evaluation systems. As an exploratory case, we analyzed scholars at Sapienza University of Rome at the faculty level. Founded in 1303, Sapienza University of Rome is one of the oldest and largest generalist European universities. Focusing on researchers from one university allows us to control for factors related to the regional and institutional context. We find evidence of large heterogeneity at the level of individual researchers, with both supporters and critics of bibliometric indicators present within each faculty. We then investigate whether these different perceptions are reflected in current research evaluation reform initiatives and discuss the importance of considering this diversity in the ongoing debate on research evaluation reforms.
The paper is organized as follows. Section 2 introduces the concept of metric-wiseness in detail. Section 3 presents the methods applied in this study. Section 4 reports the results of the analyses, and Section 5 discusses the obtained results. Section 6 reports some concluding remarks, and the Supplementary material reports all the detailed frequency analyses at the disciplinary level.

2 Background

In the previous section, several advantages and challenges related to the responsible use of indicators in research evaluation were discussed. In the current section, we focus on a specific concept, metric-wiseness, to obtain a proxy for researchers’ knowledge and attitudes towards bibliometric indicators. Next, we discuss the existence of disciplinary differences in academics’ views on metrics. Finally, we highlight how metrics are used in research evaluation in Italy and at Sapienza University.

2.1 The concept of metric-wiseness

Rousseau and Rousseau (2015) defined the metric-wiseness of researchers as: “a researcher’s capacity to use the characteristics and formats of scientometric indicators to present one’s true research value”. This definition comprises two aspects: knowing the existence, mathematical definition, and logical implications of scientometric indicators; and knowing their proper use.
The term “metric-wiseness” is based on the related concept of “test-wiseness.” As stated by Millman et al. (1965, p.707), “test-wiseness is defined as a subject’s capacity to utilize the characteristics and formats of the test and/or the test taking situation to receive a high score. Test-wiseness is logically independent of the examinee’s knowledge of the subject matter for which items are supposedly measures.” Similar to test-wiseness, metric-wiseness is logically independent of the researcher’s scientific capacities regarding his/her subject matter, which the indicators supposedly measure. Thus, a researcher can be metric-wise or not. However, being metric-wise does not depend on the quality of the researcher in his or her field. Note that this assumption does not hold for scientometric or informetric researchers, whom we assume to be metric-wise by default.
Rousseau and Rousseau (2017) discuss the impact of metric-wiseness not only on how and why researchers do research, but also on how they communicate about their research and research portfolio. Metric-wiseness can be seen as an additional tool useful in reporting one’s research portfolio. It contributes to leveling the playing field and providing a clearer picture of a researcher’s quality. In addition, metric-wiseness can change the research process by influencing the relative weight associated with intrinsic and external research motivations. A strong focus on bibliometric indicators may increase the risk of crowding out intrinsic motivation and magnify some of the adverse effects of a ‘publish or perish’ culture.
As a first approach to measuring a researcher’s degree of metric-wiseness, four components of metric-wiseness were identified (Rousseau & Rousseau, 2017). First, technical knowledge measures the extent to which participants have in-depth knowledge of indicators and can influence indicators. Second, the use of indicators matters, specifically participants’ views on how indicators should or should not be used. Third, researchers’ intrinsic motivation is relevant: the extent to which participants believe in indicators as quality measures, as well as their willingness to go beyond indicators and use other research evaluation approaches. Deci and Ryan (2013) described intrinsic motivation as finding the reward in the activity itself rather than in any external result or consequence of the action. Fourth, external pressure captures the extent to which participants are pushed by their institutions, funding agencies, colleagues, or co-authors to take bibliometric indicators into account in their research and publication activities. To obtain insights into these four components, Rousseau and Rousseau (2017) proposed a set of 18 test questions, listed in Table 1 of their paper.
Table 1. Description of the dataset (Adapted from Rousseau et al., 2021).
Sample Total Sapienza
Faculty Group N % in sample % in category N %
Mathematics, Physics and Natural Sciences Exact sciences 59 18.85 14.15 417 12.61
Architecture Engineering & Technology 9 2.88 5.36 168 5.08
Civil and Industrial Engineering 37 11.82 12.80 289 8.74
Information Engineering, Informatics and Statistics 38 12.14 17.12 222 6.72
School of Aerospace Engineering 2 0.64 20.00 10 0.30
Pharmacy and Medicine Medical sciences 36 11.50 7.81 461 13.94
Medicine and Dentistry 32 10.22 5.50 584 17.66
Medicine and Psychology 29 9.27 8.76 331 10.01
Arts and Humanities Humanities and Social sciences 38 12.14 10.11 376 11.37
Economics 11 3.51 6.15 179 5.41
Law 4 1.28 4.65 86 2.60
Political Science, Sociology and Communication Science 18 5.75 9.84 183 5.54
TOTAL 313 9.47 3,306

*We report the size of the sample (N), % of faculty in the sample (% in sample = column (3)/313), and % of faculty compared to the relevant group within Sapienza (% in category = column (3)/column (6)). The last two columns show the total number of academics in place in Sapienza on December 31, 2018 (“N”) and their % (“%”) distributed by faculty.

2.2 Disciplinary differences in academics’ views on metrics

Disciplinary differences significantly shape academics’ perceptions and usage of bibliometric indicators, influencing how metrics are interpreted, valued, and applied in research evaluation (e.g. Hammarfelt & Rushforth, 2017; Kulczycki et al., 2018; Lemke et al., 2019; Söderlind & Geschwind, 2020). In fields such as the natural and life sciences, where quantitative research outputs are prevalent and metrics such as the h-index or journal impact factors are readily applicable, bibliometric indicators are often seen as effective and relatively objective measures of impact and productivity. These fields typically emphasize peer-reviewed journal articles and citation counts, aligning well with established bibliometric measures (Bornmann & Wohlrabe, 2019; Hicks, 2004). Consequently, scholars in STEM fields generally express greater acceptance of metrics in evaluations, viewing them as straightforward proxies for research quality and impact.
Conversely, in the humanities and certain social sciences, reliance on bibliometric indicators is contentious. These disciplines often prioritize monographs, book chapters, and non-journal outputs, which are typically less well-covered by citation databases (Gingras, 2016; Hammarfelt & Rushforth, 2017). The evaluative focus on qualitative contributions over quantitative measures renders standard citation-based metrics less relevant or even inadequate for capturing the scholarly impact in these fields. Thus, humanities and social science scholars are more likely to be critical of metrics, viewing them as overly simplistic and not reflective of disciplinary norms, especially when they are indiscriminately applied across fields (Tourish & Willmott, 2015).
Another relevant factor is the difference in citation practices across disciplines. For example, engineering and applied sciences may have fewer citations owing to shorter citation windows and greater emphasis on conference proceedings over journal articles. This can lead to lower citation counts and impact factors compared to disciplines with longer citation half-lives, such as life sciences (Lin & Fenner, 2013; van Raan, 2005). Consequently, scholars in fields with differing publication and citation norms may view metrics as biased or insufficiently nuanced for fair evaluation across disciplinary boundaries.
Moreover, emerging interdisciplinary fields present additional challenges to metric standardization. These fields often merge methodologies and citation practices from multiple domains, leading to variability in publication types and citation norms (Rafols et al., 2012). Academics in interdisciplinary areas may be particularly concerned that standard metrics will fail to capture the full scope of their contributions, thereby risking undervaluation in institutional evaluations.
Overall, the literature underscores that disciplinary norms shape academics’ receptivity to bibliometric measures, with STEM fields generally showing more alignment with metrics-based evaluations than the humanities, social sciences, or interdisciplinary fields. These variations highlight the need for flexible, context-sensitive research evaluation systems that accommodate diverse disciplinary practices and perspectives on metrics (Hicks et al., 2015).

2.3 Research evaluation in Italy

In Italy, research evaluation has increasingly relied on quantitative metrics, particularly since the National Agency for the Evaluation of Universities and Research (ANVUR) began its activities in 2011 (Abramo & D’Angelo, 2023; Bonaccorsi, 2020a, 2020b). ANVUR’s implementation of the “VQR” (Research Quality Evaluation) system underscores Italy’s commitment to a metrics-based approach. The VQR relies heavily on bibliometric indicators such as citation counts and journal impact factors, especially for STEM disciplines (Abramo et al., 2013), while qualitative assessments are more common in the humanities and social sciences (Akbaritabar et al., 2021). This evaluation framework aims to ensure accountability, promote excellence, and effectively allocate funding. However, this reliance has also raised concerns about the overemphasis on quantitative metrics, potentially narrowing academic focus and fostering a “publish or perish” culture among Italian scholars. For this reason, the most recent VQR exercise, which is currently in progress, relies less on metrics and more on peer review.
At Sapienza University of Rome, one of Italy’s largest and oldest institutions, the application of ANVUR’s metrics has had notable effects on academic life. Sapienza faculty members are evaluated periodically based on their publication output, with particular emphasis on indexed journals and citation-based metrics. Metrics such as the h-index and impact factor influence hiring, promotion, and resource allocation decisions, aligning with ANVUR’s broader objectives. However, as a diverse generalist university, Sapienza experiences challenges in applying these metrics universally across disciplines, particularly in fields where traditional bibliometric measures may not accurately capture scholarly impact (Abramo & D’Angelo, 2023; Checchi et al., 2020).
This context underpins the empirical results in the study by illustrating the prevailing metric-driven environment in Italian academia, specifically at Sapienza. This reliance on metrics has significant implications for research practices, academic behavior, and career progression. Understanding these implications helps readers gauge the broader applicability of this study’s findings beyond the immediate case.

3 Methods

The methods used in this study include (i) an exploratory survey of researchers at Sapienza University of Rome, (ii) a descriptive frequency analysis of the survey responses, (iii) a series of chi²-tests to test for significant differences in response patterns across disciplines, and (iv) an exploratory document analysis of recent initiatives for responsible research evaluation.

3.1 Survey design

As a first step in developing a reliable and usable scale to measure researchers’ metric-wiseness, an online survey was developed in Qualtrics to gather data on the publication preferences of professors affiliated with Sapienza University of Rome. The survey was developed in English and asked for information about the socio-demographic characteristics of the participants, their function and research discipline as well as past publication behavior, and their attitudes towards and knowledge of bibliometric indicators; it also included a discrete choice experiment designed to learn about publication preferences. More information on the discrete choice experiment and its analysis can be found in Rousseau et al. (2021).
In order to measure metric-wiseness, we used the eighteen statements proposed by Rousseau and Rousseau (2017) as a starting point in our study. Besides making some minor changes to improve the clarity of the statements, we added two new statements to component 2 and five statements to component 4 after several brainstorming sessions with researchers at Sapienza University. The complete list of statements can be found in the tables in the Supplementary material.
Finally, as we aim to obtain a baseline measurement of researchers’ familiarity with and attitudes towards bibliometric indicators, we do not provide definitions and information about the specific indicators. The survey thus allowed us to create a picture of the situation “as is” at Sapienza University.

3.2 Data collection

In 2018, Sapienza University was home to 112,557 students, 3,411 academics, 2,306 employees, technicians, librarians, and 1,812 administrative staff in the associated university hospitals. After several test rounds, the survey was finalized and distributed via email on November 22, 2018. Using a similar approach to Kamrani et al. (2021), we focused on university professors as the sample population (excluding other academic staff, doctoral, and post-doctoral researchers) because we wanted to concentrate on people who have an established career path in academia and who are highly oriented towards publishing their research results. We contacted all faculty members of Sapienza (N=3,306) through a general mailing list. No reminders were sent. Data collection was terminated on January 31, 2019.
Overall, 502 participants (15.2%) started the survey, of which 43 stopped after reading the introduction, 33 stopped when describing their field of expertise (the 7th question), and 94 stopped when asked to provide information about their publications in the past three years. In the end, 313 responses (9.5%) were usable, though not fully complete, and 263 (8.0%) were complete. The response rate was thus quite low; however, this is not unusual for online surveys (Fan & Yan, 2010).
In total, 64% male and 36% female researchers participated in the study. All participants, except for three, had Italian nationality. The average age of the participants was 52.7 years, with a minimum of 29 and a maximum of 76. The participating researchers were affiliated with 11 different faculties plus the School of Aerospace Engineering, with the largest proportion (19%) from the Faculty of Mathematics, Physics, and Natural Sciences, followed by the Faculty of Arts and Philosophy (12%) and the Faculty of Information Engineering, Computer Science, and Statistics (12%) (Table 1). The participants had different functions, with associate professors as the largest group (39%), followed by assistant professors (35%) (see Table 2). To compare the faculty, function, and gender distributions of the sample with the population, we used chi-squared goodness-of-fit tests. These tests allow us to reject the null hypothesis that the distribution of the sample over faculties equals the population distribution (chi² = 46.11, p = 0.0000), while we cannot reject the corresponding null hypotheses for the distribution of functions (chi² = 13.77, p = 0.0554) and gender (chi² = 1.86, p = 0.3952).
Table 2. Distribution of participants by function and gender compared with the distribution of the population of Sapienza University on December 31, 2018 (“Total N” and “%”). (Adapted from Rousseau et al., 2021).
Function N. of resp. % in sample % sample compared to total N in category Total N %
Full professor 60 19.17 8.90 674 20.39
Associate professor 121 38.66 10.42 1,161 35.12
Full-time assistant professor 69 22.04 6.26 1,102 33.33
Contract professor (professori incaricati) 8 2.56 - - -
Temporary assistant professor L. 230/05 1 0.32 - - -
Temporary assistant professor L 240/10 Tipo A 23 7.35 10.85 212 6.41
Temporary assistant professor 240/10 Tipo B 16 5.11 10.19 157 4.75
Other (retired) 15 4.79 - - -
Gender
Male 199 63.58 10.04 1,982 59.95
Female 113 36.10 8.53 1,324 40.05
X 1 0.32 - - -

*We report the sample size (N of resp.), % of function/gender in the sample (% in sample = column (2)/313), and % of function/gender compared to the relevant group within Sapienza (% sample compared to total N in category = column (2)/column (5)). The last two columns show the total number of academics in place in Sapienza on December 31, 2018 (“N”) and their % ( “%”) distributed by function and gender.
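
To illustrate the representativeness check described above, the following is a minimal sketch, not the authors’ code, of a chi-squared goodness-of-fit test comparing the sample distribution of a categorical variable with its population distribution; all counts in the example are hypothetical placeholders rather than the figures reported in Tables 1 and 2.

```python
# Sketch of a chi-squared goodness-of-fit test comparing the sample distribution
# of a categorical variable (e.g. faculty, function, or gender) with the
# population distribution. All counts below are hypothetical.
import numpy as np
from scipy.stats import chisquare

observed_sample = np.array([60, 120, 83])        # hypothetical sample counts per category
population_counts = np.array([700, 1150, 1456])  # hypothetical population counts per category

# Expected sample counts under the null hypothesis that the sample follows
# the population distribution.
expected = population_counts / population_counts.sum() * observed_sample.sum()

stat, p_value = chisquare(f_obs=observed_sample, f_exp=expected)
print(f"chi2 = {stat:.2f}, p = {p_value:.4f}")
```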

3.3 Exploratory document analysis

To explore how recent initiatives for responsible research evaluation address the themes of diversity and heterogeneity in metrics use, we conducted an exploratory analysis of four influential documents: the DORA Declaration (2012), Leiden Manifesto (Hicks et al., 2015), Hong Kong Principles (Moher et al., 2020), and European Commission Scoping Document on Research Reform (2021). These documents were selected for their prominence in the field of research assessment reform, each representing a widely recognized framework that proposes principles and practices for more responsible, context-sensitive evaluation systems. The four selected documents served as representative frameworks within the discourse on responsible metrics, each contributing distinct perspectives.
- The DORA Declaration (2012) was one of the earliest formal statements advocating for a reduction in the reliance on journal-based metrics and proposing an alternative, multifaceted approach to research evaluation;
- The Leiden Manifesto (Hicks et al., 2015) introduces ten principles aimed at promoting the ethical and fair use of research metrics across fields and evaluation systems, widely endorsed by institutions worldwide;
- The Hong Kong Principles (Moher et al., 2020) emphasize transparency, accountability, and ethical considerations in research evaluation, with a focus on integrity in scholarly contributions;
- The European Commission Scoping Document (2021) represents the recent EU approach to research reform, calling for inclusive evaluation practices that recognize the diversity of research contributions and institutional contexts across member states.
This selection allows for an analysis that captures both foundational and recent perspectives on responsible metrics, offering a comprehensive view of the guiding principles currently shaping research evaluation reforms. To analyze how these initiatives treat the concepts of diversity and flexibility in research assessment, we examined keywords relevant to recognizing varied contributions: “heterogeneity,” “diversity,” “variety,” “multiplicity,” and “dissimilarity.” These terms were selected because they reflect the central themes of inclusiveness and adaptability in evaluation practices, aligning with our study’s focus on the need for metrics that respect disciplinary differences and individualized assessment approaches. Beyond counting occurrences, we conducted qualitative content analysis to understand the context in which each keyword appears. This approach enables us to identify not only the presence of these terms but also their role in broader evaluative frameworks. For example, we examined whether these terms appear in key sections of the documents, such as guiding principles, recommendations, or specific evaluative criteria, thereby highlighting the emphasis each document places on diversity in research evaluation.
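
As an illustration of the keyword step of this document analysis, the sketch below counts occurrences of the five keywords in plain-text versions of the four documents. The file names are hypothetical placeholders, and the qualitative reading of each occurrence in its context was done manually rather than automatically.

```python
# Sketch of the keyword-counting step of the document analysis.
# File names are hypothetical placeholders for plain-text versions of the documents.
import re
from pathlib import Path

KEYWORDS = ["heterogeneity", "diversity", "variety", "multiplicity", "dissimilarity"]
DOCUMENTS = ["dora_declaration.txt", "leiden_manifesto.txt",
             "hong_kong_principles.txt", "ec_scoping_2021.txt"]

for doc in DOCUMENTS:
    text = Path(doc).read_text(encoding="utf-8").lower()
    counts = {kw: len(re.findall(rf"\b{kw}\b", text)) for kw in KEYWORDS}
    print(doc, counts)
```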

4 Results: Measuring researchers’ heterogeneity

In this section, we show the results of the empirical estimates of the four components of metric-wiseness and some other relevant aspects of research evaluation for researchers at Sapienza University. First, we checked the internal consistency of the scale statements measuring the different components via Cronbach’s alpha (see Table 3). As the Cronbach’s alphas are lower than 0.7 for each of the components, the measurement scales are not internally consistent. Therefore, we used a descriptive approach to compare the patterns across disciplines by a series of chi²-tests and the patterns within disciplines through a frequency analysis of the answers.
Table 3. Cronbach’s alpha for the components capturing metric-wiseness.
Cronbach’s alpha
Component 1 Technical knowledge of indicators Not applicable (true/false statements)
Component 2 Use of indicators 0.3278
Component 3 Researchers’ intrinsic motivation 0.3735
Component 4 External pressure 0.6135
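For reference, the internal-consistency check reported in Table 3 can be computed as in the following minimal sketch, which is not the authors’ code; it assumes a respondents-by-statements matrix of Likert scores for one component and uses randomly generated data for illustration.

```python
# Sketch of Cronbach's alpha for one metric-wiseness component.
# `items` is a respondents x statements matrix of Likert scores (1-5).
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    k = items.shape[1]                            # number of statements
    item_var = items.var(axis=0, ddof=1).sum()    # sum of item variances
    total_var = items.sum(axis=1).var(ddof=1)     # variance of the summed scale
    return k / (k - 1) * (1 - item_var / total_var)

# Hypothetical data: 263 respondents answering a 6-statement component.
rng = np.random.default_rng(42)
scores = rng.integers(1, 6, size=(263, 6))
print(round(cronbach_alpha(scores), 4))
```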
In the main text, we present the aggregate responses for the entire sample based on 263 complete responses. In the Supplementary material, we provide all the detailed tables, reporting the distribution of all available responses per faculty. We use the following notation for scale statements in the remainder of the text (and in the Supplementary material): we refer to the four components of metric-wiseness with the capital C followed by a number, to indicators with the capital I followed by a number, and to statements with the capital S followed by a number. For example, C1.I1 refers to the first indicator (I1) of the first component of metric-wiseness (C1 - Technical knowledge of indicators), which is the ISI journal impact factor provided by the Web of Science. In another example, C2.S1 refers to the first statement (S1) of the second component of metric-wiseness (C2 - Use of indicators), which reads as “Bibliometric indicators are equally useful in evaluating disciplinary and interdisciplinary research.”

4.1 Component 1: Technical knowledge of indicators

Even with a sample of experienced researchers (all participants are faculty members of a university), Figure 1 shows that there is still a high level of ignorance about bibliometric indicators, especially the more recent ones. While four in five participants were familiar with the ISI journal impact factor provided by Clarivate through the Web of Science and with the Hirsch index, the reverse holds for the eigenfactor and altmetrics scores. For example, 191 respondents (72.6%) did not know what altmetrics are. Based on the detailed tables in the Supplementary material (Tables A.C1.I1 to A.C1.I5), we observe both researchers with high technical knowledge and researchers with limited technical knowledge in each faculty. Still, the chi²-tests reveal significant differences in the patterns across faculties, with researchers in the social sciences and humanities being less familiar with the technical aspects of bibliometric indicators than those in other disciplines.
Figure 1. Technical knowledge of indicators (N=263).
In addition to investigating knowledge about the calculation of indicators, we presented four true/false statements about bibliometric indicators. Table 4 describes the answers and reports the percentage of correct answers (the sum of participants who were sure and those who were unsure) in the last column. While a large majority (76%) of the sample knew that mainstream bibliometric indicators cannot easily be compared across disciplines, only a minority (38%) knew that citations received in conference proceedings are not always included in an article’s total number of received citations in the Web of Science database. Again, answer patterns include the full range of answer options for all faculties, while the patterns differ significantly across faculties based on chi²-tests.
Table 4. Technical knowledge of indicators (ranked according to the number of correct answers).
N = 263 True, I am sure True, I think I do not know False, I think False, I am sure Correct answers
C1.S1 - Bibliometric indicators can easily be compared across disciplines [FALSE] 8 21 34 95 105 200
(3.0%) (8.0%) (12.9%) (36.1%) (39.9%) (76.0%)
C1.S2 - Open Access journals never have a Web of Science impact factor [FALSE] 7 16 78 82 80 162
(2.7%) (6.1%) (29.7%) (31.2%) (30.4%) (61.6%)
C1.S3 - On average older researchers have higher h-indices [TRUE] 54 96 52 49 12 150
(20.5%) (36.5%) (19.8%) (18.6%) (4.6%) (57.0%)
C1.S4 - Citations received in conference proceedings are always included in an article’s total number of received citations in the Web of Science [FALSE] 12 35 117 68 31 99
(4.6%) (13.3%) (44.5%) (25.9%) (11.8%) (37.6%)
Thus, we cannot assume that all researchers have the same level of technical knowledge of bibliometric indicators. This implies that the first component of the metric-wiseness scale cannot be taken for granted for a substantial share of researchers.
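
The cross-faculty comparisons reported in this and the following subsections combine a frequency analysis of answers per faculty with a chi²-test of independence. The sketch below illustrates this under assumed column names (“faculty”, “answer”) and randomly generated responses; it is not the authors’ code.

```python
# Sketch of a chi-squared test comparing answer patterns across faculties for one
# statement. The data frame layout and column names are hypothetical.
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

rng = np.random.default_rng(7)
faculties = ["Exact sciences", "Engineering & Technology",
             "Medical sciences", "Humanities and Social sciences"]
answers = ["True, I am sure", "True, I think", "I do not know",
           "False, I think", "False, I am sure"]

# Hypothetical long-format responses: one row per respondent.
df = pd.DataFrame({
    "faculty": rng.choice(faculties, size=263),
    "answer": rng.choice(answers, size=263),
})

table = pd.crosstab(df["faculty"], df["answer"])   # answer frequencies per faculty
stat, p_value, dof, _ = chi2_contingency(table)
print(table)
print(f"chi2 = {stat:.2f}, dof = {dof}, p = {p_value:.4f}")
```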

4.2 Component 2: Attitudes towards the use of indicators

Besides having technical knowledge about indicators, researchers also hold views on how bibliometric indicators should or should not be used in the research evaluation process. Such deeper knowledge of the interpretation and usefulness of indicators can incentivize researchers to use indicators that reflect their quality as researchers. However, knowledge can also lead to misuse, such as manipulating publications and citations to present a more attractive research portfolio for review without an actual basis in research quality (Moed, 2006; Rousseau et al., 2018). Thus, to gain insight into researchers’ perspectives regarding the use of indicators, we presented them with six statements and measured their agreement or disagreement with these statements using a five-point Likert scale. The results are shown in Table 5.
Table 5. Component 2 statements, ordered by increasing number of participants who absolutely agreed.
N = 263 Absolutely agree Agree Neutral Disagree Absolutely disagree
C2.S1 - Bibliometric indicators are equally useful in evaluating disciplinary and interdisciplinary research 7 46 79 87 44
(2.7%) (17.5%) (30.0%) (33.1%) (16.7%)
C2.S2 - Besides citation-based indicators, one must, in applied fields, also take patent-based and similar indicators into account when evaluating researchers 22 94 108 22 17
(8.4%) (35.7%) (41.1%) (8.4%) (6.5%)
C2.S3 - The social influence of research must be taken into account in evaluating researchers 34 81 83 45 20
(12.9%) (30.8%) (31.6%) (17.1%) (7.6%)
C2.S4 - Besides citation-based indicators, one must also take journal standing within a field into account 41 130 64 18 10
(15.6%) (49.4%) (24.3%) (6.8%) (3.8%)
C2.S5 - A purely bureaucratic, automatic and quantitative approach to research evaluation is unbiased for an individual researcher 62 39 42 47 73
(23.6%) (14.8%) (16.0%) (17.9%) (27.8%)
C2.S6 - The quality of a researcher should be measured in relative terms within a field rather than in absolute terms 126 108 22 4 3
(47.9%) (41.1%) (8.4%) (1.5%) (1.1%)
Again, we find evidence of heterogeneity in researchers’ perceptions, as in each of the faculties the answers cover the whole range of the Likert scale. Based on the chi²-test results (see Supplementary material Tables A.C2.S1 - A.C2.S6), the pattern of heterogeneity is similar across disciplines for four statements (S2, S3, S4, and S5), while it seems to differ for two of the six statements (S1 and S6). Hardly any participating researcher (3%) disagreed with statement C2.S6 that the quality of a researcher should be measured in relative terms within a field rather than in absolute terms. Still, some differences between patterns per faculty can be observed, with participants from the social sciences and humanities (15%) and medical sciences (19%) more likely to disagree or to be neutral regarding this statement. As statement C2.S1 (bibliometric indicators are equally useful in evaluating disciplinary and interdisciplinary research) is related to statement C2.S6, we found a similar but less extreme (mirrored) picture, with only 20% of the sample agreeing with this statement. Opinions seem most divided for statement C2.S3, stating that the social influence of research must be taken into account when evaluating researchers, with 44% agreeing and 25% disagreeing.

4.3 Component 3: Intrinsic motivation of researchers

The extent of researchers’ metric-wiseness can affect the balance between intrinsic and extrinsic research motivations (Rousseau & Rousseau, 2017). This may lead to a crowding out of intrinsic motivational factors for conducting research. Thus, research topics and publication avenues are no longer selected out of a desire to contribute to the universal pool of knowledge but to optimize bibliometric indicator levels. To measure the importance of intrinsic research motivations (component 3), we use three statements (see Table 6), while the impact of external research pressures is captured in component 4 (see Section 4.4). Note that we interpret intrinsic motivation as a focus on doing research and looking for solutions for their own sake rather than a focus on external rewards (status, money, and publications). This leads to openness towards knowledge and technology transfer (Olaya Escobar et al., 2017). We found that a clear majority (77%) agreed with the statement that they selected research topics based on their potential to advance science, while only 30% agreed that they selected research problems inspired by their own curiosity (Table 6). Again, heterogeneity was present, and the full range of possible answers was used by participants. The chi²-tests reported in the Supplementary material (Tables A.C3.S1 - A.C3.S3) did not indicate differences in the response patterns across faculties.
Table 6. Component 3 statements, ordered by increasing number of participants who absolutely agreed.
N = 263 Absolutely agree Agree Neutral Disagree Absolutely disagree
C3.S1 - If I do not have the expertise to solve a particular problem, I do not hesitate to ask a colleague to collaborate with me 14 25 68 80 76
(5.3%) (9.5%) (25.9%) (30.4%) (28.9%)
C3.S2 - I select research problems inspired by my own curiosity 29 49 48 51 86
(11.0%) (18.6%) (18.3%) (19.4%) (32.7%)
C3.S3 - I select topics for research based on their potential to advance science 82 120 49 7 5
(31.2%) (45.6%) (18.6%) (2.7%) (1.9%)

4.4 Component 4: External pressure

The fourth and last component of metric-wiseness aims to measure the impact of external factors in the research decision process (Rousseau & Rousseau, 2017). External pressure can come from the researcher’s institution, funding institutions, and specific characteristics of the research evaluation process. We used 12 statements (see Table 7) as a proxy for a wide set of external factors and to cover publication practices. While a large majority confirms the key role of Sapienza University (90%, 236 respondents) and the Italian Ministry of Education and Research (73%, 193 respondents) in their publication strategies, they do not seem to feel constrained by this, as an equally large proportion (86%, 227 respondents) also agrees with the statement “I feel completely free to publish my research in any way I want.” The role of the Italian National Agency for the Evaluation of Universities and Research Institutes (ANVUR) is less clear and reveals a wide variety of answers (Table 7 - C4.S7).
Table 7. Component 4 statements, ordered by increasing number of participants who absolutely agreed.
N = 263 Absolutely agree Agree Neutral Disagree Absolutely disagree
C4.S1 - My institution influences how I communicate the results of my research 10 52 87 70 44
(3.8%) (19.8%) (33.1%) (26.6%) (16.7%)
C4.S2 - I feel ‘publish or perish’ pressure in carrying out my research 12 48 96 70 37
(4.6%) (18.3%) (36.5%) (26.6%) (14.1%)
C4.S3 - I select topics for research based on their potential to get published quickly 15 35 84 88 41
(5.7%) (13.3%) (31.9%) (33.5%) (15.6%)
C4.S4 - It is important to use social media (Twitter, blogs…) to share the results of my research 27 104 88 36 8
(10.3%) (39.5%) (33.5%) (13.7%) (3.0%)
C4.S5 - It is important to use academic research networks (Mendeley, ResearchGate…) to share the results of my research 30 90 65 59 19
(11.4%) (34.2%) (24.7%) (22.4%) (7.2%)
C4.S6 - My likelihood of being promoted depends only on the number of articles published in journals indexed in WoS or Scopus 38 96 55 55 19
(14.4%) (36.5%) (20.9%) (20.9%) (7.2%)
C4.S7 - ANVUR influences my publication strategies 41 58 51 49 64
(15.6%) (22.1%) (19.4%) (18.6%) (24.3%)
C4.S8 - Open Science (including publication, conservation and reuse of research data) is relevant for my research 42 95 101 21 4
(16.0%) (36.1%) (38.4%) (8.0%) (1.5%)
C4.S9 - My likelihood of being promoted depends mainly on the number of articles of which I am first or corresponding author 53 104 68 32 6
(20.2%) (39.5%) (25.9%) (12.2%) (2.3%)
C4.S10 - I feel completely free to publish my research in any way I want 99 128 27 7 2
(37.6%) (48.7%) (10.3%) (2.7%) (0.8%)
C4.S11 - The Ministry of Education and Research (MIUR) influences my publication strategies 107 86 32 29 9
(40.7%) (32.7%) (12.2%) (11.0%) (3.4%)
C4.S12 - My institution (Sapienza) influences my publication strategies 144 92 19 6 2
(54.8%) (35.0%) (7.2%) (2.3%) (0.8%)
In line with the results of the other three components, we observe a wide range of opinions and perceptions regarding the fourth component within faculties and, to a lesser extent, across faculties (see Tables A.C4.S1 - A.C4.S12 in Supplementary material). A surprising result is the response pattern related to statement C4.S2, “I feel ‘publish or perish’ pressure in carrying out my research”, as only 23% of the participants agreed with this statement.

4.5 Additional relevant results from the survey

In this section, we present some interesting results related to researchers’ motivations for publishing (Table 8) and familiarity with a set of four calls to reform the research evaluation process (Table 9).
Table 8. The most important motivations for publishing (participants could select 3 out of 16 listed items).
N = 313 N %
To contribute to the scientific progress in your discipline 177 57%
To share your research findings with the academic community 146 47%
To improve your chances of receiving research funding 95 30%
Your personal intrinsic motivation 84 27%
To increase your chances to be promoted 69 22%
To improve your standing among your peers 55 18%
To help others (e.g. doctoral students, project collaborators...) 54 17%
To increase your probability of finding a new position 32 10%
To improve your standing in your current institution 30 10%
To make your current position permanent 26 8%
To increase the prestige and the resources allocated to your department 23 7%
To improve the standing of your institution 19 6%
To share your research findings with policymakers and practitioners 18 6%
To fulfill project requirements 16 5%
To fulfill administrative requirements 16 5%
To get a monetary reward 2 1%
Table 9. Familiarity with calls to reform existing research evaluation practices.
Do you know…?
(N = 263)
Yes, I know this Yes, I have heard about this but do not know its content No, I don’t know this
The DORA declaration 27 (10.3%) 29 (11.0%) 207 (78.7%)
The Leiden Manifesto 25 (9.5%) 41 (15.6%) 197 (74.9%)
Responsible metrics 14 (5.3%) 42 (16.0%) 207 (78.7%)
Metric Tide report 7 (2.7%) 22 (8.4%) 234 (89.0%)
Looking at the reasons for publishing research results (Table 8), we see that intrinsic motivational elements dominate, with contributing to scientific progress being the most frequently selected (57%), followed by sharing findings with the academic community (47%). However, personal benefits, such as obtaining funding (30%) or increasing the likelihood of getting promoted (22%), also play an important role.
Next, we tested participants’ familiarity with four calls to reform research evaluation practices: the DORA declaration (DORA, 2015), the Leiden Manifesto (Hicks et al., 2015), the Metric Tide report (Wilsdon et al., 2015), and the concept of “responsible metrics” outlined in the latter report. Responsible research metrics refer to the appropriate and responsible use of quantitative indicators in the assessment of research performance, and the concept is thus closely related to the second component of metric-wiseness. Table 9 shows that a large majority of the sample is unfamiliar with any of these reform initiatives. In the next section, we analyze in more detail the contents of the DORA declaration (DORA, 2015) and the Leiden Manifesto (Hicks et al., 2015), as well as two recent initiatives, the Hong Kong Declaration (Moher et al., 2020) and the European Commission (2021) Scoping document on Research Reform.

4.6 Results of the qualitative content analysis

We used exploratory content analysis to check if and how the heterogeneity of researchers’ knowledge and perceptions of bibliometric indicators is included in recent initiatives. Table 10 shows the results of the keyword analysis carried out on four documents: the DORA declaration (DORA, 2015), the Leiden Manifesto (Hicks et al., 2015), the Hong Kong Declaration (Moher et al., 2020), and the European Commission (2021) Scoping document on Research Reform. This provides anecdotal evidence of the extent to which heterogeneity among researchers is taken into account in research evaluation reform initiatives.
Table 10. Keyword search analysis of four documents of recent research evaluation reform initiatives.
Initiative document Number of times the keyword “heterogeneity” or “diversity” or “variety” or “multiplicity” or “dissimilarity” is present in the document
DORA declaration (DORA, 2015) Variety 1 time cited (“6. Greatly reduce emphasis on the journal impact factor as a promotional tool, ideally by ceasing to promote the impact factor or by presenting the metric in the context of a variety of journal-based metrics”.)
Leiden Manifesto (Hicks et al., 2015) No keywords cited
Hong Kong Declaration (Moher et al., 2020) Variety 2 times cited but with a generic meaning of many not related to evaluation principles (“Selective publishing of research with positive results (i.e. publication bias) distorts science’s evidence base and has been demonstrated in a variety of disciplines including economics, psychology, and clinical and preclinical health research”; “The Center for Open Science’s Transparency and Openness Promotion initiative provides information on data transparency standards for a wide variety of discipline journals”.)
Diversity 3 times cited (“We present five principles: responsible research practices; transparent reporting; open science (open research); valuing a diversity of types of research; and recognizing all contributions to research and scholarly activity”; “Some funders have already recognized the relevance of a broad range of research activities. The Research Impact Assessment Platform (Researchfish) works to capture some of this diversity and can generate reports on the impact of a broad spectrum of funded research”; “The HKPs do not address gender and other forms of diversity, inclusiveness, and related issues”.)
European Commission (2021) Scoping document Variety 2 times cited (“Career assessment should take into account the variety of activities of academics such as teaching, research, entrepreneurship, management or leadership”; “To achieve excellent and relevant higher education, support is also needed to stimulate pedagogical innovation, focused on the learners, with a variety of learning spaces and flexible, interdisciplinary paths”.)
Diversity 10 times cited (“Foster diversity, inclusiveness and gender equality”, “Develop a European framework for diversity and inclusion, including on gender gaps, identifying challenges and solutions for universities, and the needed support of public authorities”, “To encourage universities to implement institutional change through concrete measures for diversity and inclusion, including voluntary, quantified targets for inclusion and inclusive gender equality plans…”; “Universities are key to promote active citizenship, tolerance, equality and diversity, openness and critical thinking for more social cohesion and social trust, and thus protect European democracies”; “To support the diversity within the European higher education sector”; “The diversity and international standing of the EU education systems” the different types of higher education institutions are all hallmarks of our European way of life. This diversity is a strength, as it allows for choice and for creativity and synergy through mobility and cooperation”, “The European Union and Member States have a shared interest in supporting the higher education sector by joining their forces around a joint vision for the higher education sector, building on the richness of its diversity”; “Diversity, inclusiveness and gender equality in the higher education sector have become more important than ever”; “support universities as lighthouses of our European way of life:… 2) diversity and inclusion”.)
In the DORA declaration, “variety” is mentioned once, but in a context that refers to publishers of bibliometric indicators rather than to researchers. In the Leiden Manifesto, none of the keywords referring to heterogeneity, diversity, and variety could be found. In the Hong Kong Declaration, diversity is mentioned in the explanation of Principle 5: “Value a range of other contributions to responsible research and scholarly activity, such as peer review for grants and publications, mentoring, outreach, and knowledge exchange.” However, again, this does not refer to the heterogeneity of researchers’ perceptions and knowledge of bibliometric indicators. Finally, the European Commission (2021) Scoping document includes more of the keywords than all the other initiatives, with 2 mentions of “variety” and 10 mentions of “diversity”, as detailed in Table 10. However, there is no reference in the Scoping document to the individual heterogeneity of researchers in terms of their perceptions and knowledge of bibliometric indicators.

5 Discussion

As we have seen in the previous section, the survey study at Sapienza University revealed a large amount of heterogeneity in researchers’ level of metric-wiseness for all four components. In contrast to previous literature (e.g. Hammarfelt & Rushforth, 2017; Kulczycki et al., 2018; Söderlind & Geschwind, 2020), the heterogeneity observed in this study is not clearly linked with observable criteria, such as the faculty to which the researchers belong. The non-probabilistic sampling technique used for this survey does not allow us to generalize our findings to the full population of researchers. These insights should therefore be viewed as indicative and interpreted with care.
The studied researchers have different knowledge, opinions, and perceptions about bibliometric indicators and how they should be used. For each of the 30 statements used in this study, each of the scale anchors was selected by at least one respondent (see Figure 1, Table 4, Table 5, Table 6, and Table 7), revealing the wide and consistent variation in both researchers’ knowledge and perceptions. In addition, this variation was found to be statistically different across faculties for the first component of metric-wiseness, that is, technical knowledge, with the exception of the definition of altmetrics, which the majority of the sample did not know (see Supplementary material). The variation was not statistically different across faculties for the third component, that is, intrinsic motivation. The results for the second and fourth components show a mixed picture, with similar patterns for some statements (e.g. C2.S4 - Besides citation-based indicators, one must also take journal standing within a field into account, and C4.S2 - I feel ‘publish or perish’ pressure in carrying out my research) and not for others (e.g. C2.S1 - Bibliometric indicators are equally useful in evaluating disciplinary and interdisciplinary research, and C4.S6 - My likelihood of being promoted depends only on the number of articles published in journals indexed in the Web of Science and/or Scopus).
As discussed in the literature review, the heterogeneity in researchers’ metric-wiseness—the varying levels of knowledge, attitudes, and application of bibliometric indicators across academic disciplines and individual researchers—can be explained through a combination of disciplinary norms, career incentives, and cultural-cognitive factors that shape researchers’ beliefs and values about metrics. These components often determine whether researchers view metrics as beneficial tools for evaluation or overly simplistic measures that may distort academic priorities. We discuss four relevant determinants of heterogeneity in researchers’ metric-wiseness:
• Disciplinary norms and publication practices
The disciplinary context strongly influences how metrics are perceived and valued. In fields such as life sciences, engineering, and physics, where publication and citation norms align closely with journal-based indicators, metrics such as the h-index or journal impact factors are more commonly accepted. In these fields, researchers often support bibliometric indicators as useful measures of visibility, relevance, and research impact because they correspond well with existing disciplinary practices that emphasize frequent publications in high-impact journals (Hammarfelt & Rushforth, 2017; Hicks et al., 2015). In contrast, disciplines such as the humanities and parts of the social sciences, where the value of work is often assessed qualitatively through peer review or based on broader societal impacts, view bibliometric indicators as less relevant or even reductive. For these researchers, metrics fail to capture essential outputs such as monographs or policy contributions, leading to greater criticism of metrics-based evaluations.
• Career stage and institutional incentives
Support for or opposition to metrics can also be explained by career-related incentives. Early-career researchers in metrics-intensive fields may view bibliometric indicators positively, as they align with clear pathways to career advancement based on publication and citation counts. Conversely, senior researchers, who may be more established and less dependent on meeting metric-based performance indicators, might be more critical, particularly if they have witnessed the unintended consequences of over-reliance on metrics, such as a focus on quantity over quality. Additionally, institutional pressures can amplify metric-wiseness when institutions emphasize metrics in evaluations, promotions, and funding decisions. Researchers in institutions that strongly promote metrics may show greater acceptance owing to pragmatism, even if they question the broader implications of metric-centric evaluations (Ma & Ladisch, 2019).
• Cultural cognition and normative beliefs
Cultural-cognitive frameworks provide another layer of explanation, as discussed by Guba (2024) in the context of sociologists who adopt bibliometric indicators despite disciplinary misalignments. According to Guba, cultural cognition and normative beliefs can shape whether researchers view metrics as fair and legitimate evaluation tools. Researchers who see value in standardized, ostensibly objective measures may support bibliometric indicators as tools that ensure accountability and comparability across contexts. This perspective aligns with normative beliefs that favor merit-based evaluations, where metrics provide transparent and reproducible standards. Conversely, critics may prioritize values such as academic freedom, intellectual diversity, and epistemic richness, leading them to reject the notion that research quality can be fully quantified through metrics alone (Hicks et al., 2015).
• Interest-based explanations
Interest-based perspectives, as outlined by Guba (2024), suggest that researchers’ stances on metrics may also be driven by self-interest. Those whose research is more easily captured by existing metrics, and thus stands to gain from metrics-focused evaluations, may show greater support. Conversely, researchers whose work aligns poorly with standard bibliometric measures, such as interdisciplinary research, policy-driven work, or locally focused studies, may view metrics as insufficient and biased. Thus, researchers’ interests, shaped by how well metrics align with their work’s visibility and evaluative success, can influence their stance on bibliometric indicators.
Given the complex and multifaceted nature of metric-wiseness heterogeneity, reform efforts should consider flexible approaches to research evaluation that respect disciplinary differences while acknowledging the normative and interest-based dimensions of metric support. Our findings suggest that any reform of the academic evaluation system should account for this heterogeneity for several reasons. First, to be fair and inclusive, any reform should be shared and agreed upon by the majority of scholars. In addition, general acceptance and support for the proposed reforms are key to guaranteeing effective implementation: researchers need to be aware of the issues the reforms aim to tackle and agree with the general approach set out by the reform in order to be willing to accept the funding, hiring, and promotion decisions resulting from the reformed research evaluation strategies. Taking researchers’ metric-wiseness into account may be helpful in the context of reforming research assessment because it plays a relevant role in the academic environment both as a tool and as a motivator. Viewed as a tool, metric-wiseness enables researchers, for example, to create a unique researcher identifier (e.g. ORCID) and to present their complete research portfolio in a convincing way. If every researcher became metric-wise, research assessment processes would potentially be less distorted, and the advantage enjoyed by more knowledgeable researchers would be reduced. Viewed as a motivator, metric-wiseness relates to the Hawthorne effect, that is, the observation that knowing one will be evaluated may already lead to a change in behavior (Adair, 1984). Finally, metric-wiseness can magnify the adverse effects associated with the ‘publish or perish’ culture (Rousseau & Rousseau, 2017).
From our analysis of the documents summarized in Table 10, we find no reference to individual heterogeneity concerning researchers’ metric-wiseness. The analysis of these initiatives provides a context for understanding researchers’ perceptions of metrics within our survey data. Specifically, while these documents support a move towards diverse evaluation practices, the degree to which they address researchers’ actual experiences and concerns regarding bibliometric indicators remains variable. Our survey results, which highlight significant heterogeneity in researchers’ attitudes towards metrics, underscore the importance of not only advocating for diversity in principle but also implementing policies that accommodate this diversity at the level of individual and disciplinary needs. While this document analysis provides initial insights into how responsible metric initiatives address themes of diversity and flexibility, a more comprehensive study could involve a detailed thematic analysis of each document. Such an analysis would examine how terms related to diversity and heterogeneity are contextualized within each framework, providing a deeper understanding of how these initiatives operationalize the principles of responsible metrics. This analysis is left to future studies that could complement our survey findings by offering a richer view of how institutional and policy-level efforts align or misalign with the lived experiences of researchers regarding metric-based evaluations.
Some implications of measuring the four components of metric-wiseness for reform initiatives include the following. First, it is important to provide a strong foundation and strengthen researchers’ basic knowledge of bibliometric indicators. For example, technical training on bibliometric indicators and research evaluation challenges should be included in doctoral courses. State-of-the-art training should also be provided for practitioners and others involved in research evaluation processes. However, it is important to balance the benefits of such training against the additional burden placed on research staff, who are often overworked and dissatisfied after decades of reform within the New Public Management framework. Academics feel limited in their autonomy and often perceive quality and performance measurement as an administrative burden of ticking boxes (e.g. Broucker & De Wit, 2015; Schubert, 2009). Second, it is important to allow for flexibility and a tailored assessment approach by creating a menu of evaluation tools that balances quantitative and qualitative measures: for example, narrative CVs, biographical sketches (also called biosketches) with a structured approach focusing on a limited number of key messages or competences, a list of relevant publications, and bibliometric indicators. The weight of a specific tool can be adapted depending on the context of the research assessment. For instance, career decisions for individual researchers should not depend only on quantitative bibliometric indicators (Hicks et al., 2015) but should be complemented with qualitative approaches such as a biosketch. Finally, it is important to note that our case study reveals that allowing disciplinary specialization in research assessment is insufficient to address heterogeneity, as we found evidence of different opinions and knowledge about bibliometric indicators both within and across disciplines.
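As a purely illustrative reading of such a menu with adaptable weights, the short Python sketch below combines hypothetical, already-normalized scores from several evaluation tools into a single weighted score. The tool names, scores, and weights are assumptions made for illustration and are not taken from the study or from any existing assessment scheme.

# Minimal sketch (hypothetical values): a menu of evaluation tools whose
# weights can be adapted to the context of the research assessment.
def combined_score(scores, weights):
    """Weighted average of normalized tool scores (all on a 0-1 scale)."""
    total_weight = sum(weights.values())
    return sum(scores[tool] * weights[tool] for tool in weights) / total_weight

# Hypothetical, already-normalized scores for one researcher
scores = {"bibliometrics": 0.9, "narrative_cv": 0.5, "biosketch_review": 0.6}

# Context-dependent weights: an individual career decision may down-weight
# purely quantitative indicators; a large-scale exercise may up-weight them.
career_weights = {"bibliometrics": 0.2, "narrative_cv": 0.4, "biosketch_review": 0.4}
large_scale_weights = {"bibliometrics": 0.6, "narrative_cv": 0.2, "biosketch_review": 0.2}

print(round(combined_score(scores, career_weights), 2))       # qualitative emphasis
print(round(combined_score(scores, large_scale_weights), 2))  # quantitative emphasis

Shifting the weights between the two settings mirrors the context-dependent weighting of tools described above, without prescribing any particular values.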

6 Some concluding remarks

Our exploratory analysis of the Sapienza sample and of the documents about recent initiatives to reform research evaluation systems provides strong evidence of heterogeneity in researchers’ metric-wiseness, and this heterogeneity has to be taken into account to put the principles stated in the reform initiatives into practice. Our results suggest that there can be no universal support for a specific reform as long as the heterogeneity of researchers’ perceptions and knowledge about bibliometric indicators is not taken into account. However, other cases must be studied before scaling up these insights. Still, we advocate that any reform should allow flexibility and avoid extreme positions, such as fully quantitative or fully qualitative assessments of research and researchers. Thus, it is important not to rely blindly on bibliometric indicators alone, but also not to focus exclusively on peer review or qualitative evaluations. For example, Guba et al. (2023) investigated the use of indicators as a screening device in the evaluation of grant proposals. There is a need to balance the use of indicators, taking into account the aims of the evaluation. We support a menu of assessment possibilities (ranging from purely quantitative indicators to purely qualitative and narrative approaches) as opposed to a one-size-fits-all approach. The selection of items from this menu should be made according to the aim of the assessment, taking into account the diversity of scholars’ attitudes towards, and cultures around, bibliometric indicators.
Furthermore, we present some suggestions for addressing the heterogeneity in metric-wiseness. First, it is important to tailor metrics to disciplinary contexts. Research institutions should use metrics selectively and adaptively, with discipline-specific benchmarks that account for differing publication practices and impact criteria. This would alleviate the concerns of researchers who feel that standard metrics overlook the unique contributions of their fields. Second, the introduction of balanced evaluation models is key and can be achieved by combining quantitative indicators with qualitative evaluations, such as narrative CVs and peer assessments. This allows researchers in diverse fields to showcase their achievements without relying solely on bibliometric indicators. Such a balance can help avoid reducing research quality to metric-based measures alone, preserving room for the qualitative assessments valued in the humanities and social sciences. Third, supporting metric literacy and awareness can play an important role. Training researchers on the appropriate use and limitations of metrics can help cultivate a critical but informed approach to bibliometric indicators. By fostering metric literacy, institutions can support researchers in understanding how metrics may affect their evaluations, thereby reducing the likelihood of uncritical metric adoption or of outright rejection based on misconceptions.
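To make the idea of a discipline-specific benchmark concrete, the following minimal Python sketch computes a field-normalized citation score, that is, the citations of a publication divided by the average citations of comparable publications in the same field and year. The field baselines and citation counts are hypothetical values chosen purely for illustration.

# Minimal sketch (hypothetical numbers): a field-normalized citation score as
# one possible discipline-specific benchmark.
def field_normalized_score(citations, field_baseline):
    """Citations divided by the average citations of comparable publications."""
    return citations / field_baseline if field_baseline > 0 else 0.0

# Hypothetical baselines: average citations per publication in each field/year
baselines = {"history": 3.0, "cell_biology": 25.0}

# Six citations mean very different things in the two fields
print(field_normalized_score(6, baselines["history"]))       # 2.0, above the field average
print(field_normalized_score(6, baselines["cell_biology"]))  # 0.24, below the field average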
By understanding and addressing the varying bases for the support and criticism of bibliometric indicators, evaluation systems can become more adaptable and sensitive to disciplinary, career-related, and cultural factors that shape academic views on metrics. This approach can foster broader acceptance and a more nuanced integration of metrics within research evaluation practices. Overall, this study supports a multidimensional evaluation approach (Moed, 2020) tailored to different contexts, creating an evaluation environment that has at least a minimal level of acceptance from all the researchers involved.

Author contributions

Sandra Rousseau (sandra.rousseau@kuleuven.be): Conceptualization (Equal), Investigation (Equal), Methodology (Lead), Writing - original draft (Lead), Writing - review & editing (Equal).
Cinzia Daraio (daraio@diag.uniroma1.it): Conceptualization (Equal), Investigation (Equal), Methodology (Supporting), Writing - original draft (Supporting), Writing - review & editing (Equal).

Funding information

This study was supported by the Sapienza Università di Roma Sapienza Awards no. 6H15XNFS.

Supplementary material

The supplementary material is available as a PDF file.
The data can be accessed at: https://doi.org/10.57760/sciencedb.19424
[1]
Abramo G., Cicero T., & D’Angelo C.A. (2013). National peer-review research assessment exercises for the hard sciences can be a complete waste of money: The Italian case. Scientometrics, 95, 311-324.

[2]
Abramo G., & D’Angelo C.A. (2023). The impact of Italian performance-based research funding systems on the intensity of international research collaboration. Research Evaluation, 32(1), 47-57.

[3]
Adair J.G. (1984). The Hawthorne effect: A reconsideration of the methodological artifact. Journal of Applied Psychology, 69(2), 334-345.

[4]
Akbaritabar A., Bravo G., & Squazzoni F. (2021). The impact of a national research assessment on the publications of sociologists in Italy. Science and Public Policy, 48(5), 662-678.

[5]
Aksnes D. W. (2006). Citation rates and perceptions of scientific contribution. Journal of the American Society for Information Science and Technology, 57(2), 169-185.

[6]
Aksnes D. W., & Rip A. (2009). Researchers’ perceptions of citations. Research Policy, 38(6), 895-905.

[7]
Anfossi A., Ciolfi A., Costa F., Parisi G., & Benedetto S. (2016). Large-scale assessment of research outputs through a weighted combination of bibliometric indicators. Scientometrics, 107(2), 671-683.

[8]
Biagioli M. (2016). Watch out for cheats in citation game. Nature, 535(7611), 201.

[9]
Biagioli M., & Lippman A. (Eds.). (2020). Gaming the metrics: Misconduct and manipulation in academic research. MIT Press.

[10]
Bonaccorsi A. (2020a). Two decades of experience in research assessment in Italy. Scholarly Assessment Reports, 2(1), 16.

[11]
Bonaccorsi A. (2020b). Two decades of research assessment in Italy. Addressing the criticisms. Scholarly Assessment Reports, 2(1), 17.

[12]
Bornmann L. (2014). Do altmetrics point to the broader impact of research? An overview of benefits and disadvantages of altmetrics. Journal of Informetrics, 8(4), 895-903.

[13]
Bornmann L. & Wohlrabe K. (2019). Normalization of citation impact in economics. Scientometrics, 120(2), 841-884.


[14]
Brooks C., & Schopohl L. (2018). Topics and trends in finance research: What is published, who publishes it and what gets cited? The British Accounting Review, 50(6), 615-637.

[15]
Broucker B., & De Wit K. (2015). New public management in higher education. In The Palgrave international handbook of higher education policy and governance (pp. 57-75). London: Palgrave Macmillan UK.

[16]
Buela-Casal G., & Zych I. (2012). What do the scientists think about the impact factor? Scientometrics, 92(2), 281-292.

[17]
Checchi D., Mazzotta I., Momigliano S., & Olivanti F. (2020). Convergence or polarisation? The impact of research assessment exercises in the Italian case. Scientometrics, 124, 1439-1455.

[18]
Chen C. M.-L., & Lin W.-Y. C. (2018). What indicators matter? The analysis of perception towards research assessment indicators and Leiden Manifesto: The case study of Taiwan. In R. Costas, T. Franssen, & A. Yegros-Yegros (Eds.), Proceedings of the 23rd International Conference on Science and Technology Indicators (STI 2018) (pp. 688-698). Leiden, Netherlands: Centre for Science and Technology Studies (CWTS). https://openaccess.leidenuniv.nl/bitstream/handle/1887/65192/STI2018_paper_121.pdf?sequence=1

[19]
Cheung W.W. (2008). The economics of post-doc publishing. Ethics in Science and Environmental Politics, 8(1), 41-44.

[20]
Corsi M., D’Ippoliti C. & Zacchia G. (2019). On the evolution of the glass ceiling in Italian academia: the case of economics. Science in Context, 32(4), 411-430.


[21]
Curry S., de Rijcke S., Hatch A., Pillay D. G., van der Weijden I., & Wilsdon J. (2020). The changing role of funders in responsible research assessment: progress, obstacles and the way ahead. Research on Research Institute Working Paper, No. 3. https://doi.org/10.6084/m9.figshare.13227914.v1

[22]
Deci E.L., & Ryan R.M. (2013). Intrinsic motivation and self-determination in human behavior. NY, USA: Springer Science & Business Media.

[23]
Derrick G.E., & Gillespie J. (2013). “A number you just can’t get away from”: Characteristics of adoption and the social construction of metrics use by researchers. In S. Hinze & A. Lottman (Eds.), Proceedings of the 18th international conference on science and technology indicators (pp. 104-116).

[24]
DORA. (2012). San Francisco Declaration on Research Assessment. Retrieved April 20, 2023, from https://sfdora.org/read

[25]
Dorsch I., Jeffrey A., Ebrahimzadeh S., Maggio L.A., & Haustein S. (2021). Metrics literacies: On the State of the Art of Multimedia Scholarly Metrics Education. In Proceedings of the 18th international conference on scientometrics and informetrics (pp. 1465-1466). Leuven, Belgium: Zenodo. https://doi.org/10.5281/ZENODO.5101306

[26]
European Commission. (2021). Towards a reform of the research assessment system: scoping report. Luxembourg: Publications Office of the European Union.

[27]
Fan W., & Yan Z. (2010). Factors affecting response rates of the web survey: A systematic review. Computers in Human Behavior, 26(2), 132-139.

[28]
Ferguson C., Marcus A., & Oransky I. (2014). The peer-review scam. Nature, 515(7528), 480.

[29]
Franceschini F., Maisano D. & Mastrogiacomo L. (2016). Empirical analysis and classification of database errors in Scopus and Web of Science. Journal of Informetrics, 10(4), 933-953.

[30]
Gingras Y. (2016). Bibliometrics and research evaluation: Uses and abuses. MIT Press.

[31]
Goodhart C. A. E. (1975). Problems of monetary management: The UK experience. In C. A. E. Goodhart (Ed.), Monetary theory and practice: The UK experience. Papers in monetary economics (Vol. 1, pp. 91-121). Sydney, Australia: Reserve Bank of Australia.

[32]
Guba K. (2024). Why do sociologists on academic periphery willingly support bibliometric indicators? Scientometrics, 129(1), 497-518.

[33]
Guba K., Zheleznov A., & Chechik E. (2023). Evaluating grant proposals: Lessons from using metrics as screening device. Journal of Data and Information Science, 8(2), 66-92.


[34]
Haddow G., & Hammarfelt B. (2019). Quality, impact, and quantification: Indicators and metrics use by social scientists. Journal of the Association for Information Science and Technology, 70(1), 16-26.


[35]
Hammarfelt B., & Haddow G. (2018). Conflicting measures and values: How humanities scholars in Australia and Sweden use and react to bibliometric indicators. Journal of the Association for Information Science and Technology, 69(7), 924-935.

[36]
Hammarfelt B., & Rushforth A.D. (2017). Indicators as judgment devices: An empirical study of citizen bibliometrics in research evaluation. Research Evaluation, 26(3), 169-180.

[37]
Haustein S. (2016). Grand challenges in altmetrics: Heterogeneity, data quality and dependencies. Scientometrics, 108, 413-423.

[38]
Haustein S., & Larivière V. (2014). The use of bibliometrics for assessing research: Possibilities, limitations and adverse effects. In I. M. Welpe, J. Wollersheim, S. Ringelhan, & M. Osterloh (Eds.), Incentives and performance: Governance of research organizations (pp. 121-139). Cham: Springer International Publishing.

[39]
Hicks D. (2004). The four literatures of social science. In H.F. Moed, W. Glänzel, & U. Schmoch (Eds.), Handbook of quantitative science and technology research: The use of publication and patent statistics in studies of S&T systems (pp. 473-496). Dordrecht, Netherlands: Springer.

[40]
Hicks D., Wouters P., Waltman L., de Rijcke S., & Rafols I. (2015). The Leiden Manifesto for research metrics. Nature, 520(7548), 429-431. https://doi.org/10.1038/520429a

[41]
Kamrani P., Dorsch I., & Stock W.G. (2021). Do researchers know what the h-index is? And how do they estimate its importance? Scientometrics, 126(7), 5489-5508.

[42]
Kulczycki E., Engels T.C., Pölönen J., Bruun K., Dušková M., Guns R., ... & Zuccala A. (2018). Publication patterns in the social sciences and humanities: Evidence from eight European countries. Scientometrics, 116, 463-486.

[43]
Lemke S., Mehrazar M., Mazarakis A., & Peters I. (2019). “When you use social media you are not working”: Barriers for the use of metrics in Social Sciences. Frontiers in Research Metrics and Analytics, 3, 39.

[44]
Lin J. & Fenner M. (2013). Altmetrics in evolution: Defining and re-defining the ontology of article-level metrics. Information Standards Quarterly, 25(2), 19-26.

[45]
Ma L., & Ladisch M. (2019). Evaluation complacency or evaluation inertia? A study of evaluative metrics and research practices in Irish universities. Research Evaluation, 28(3), 209-217.

[46]
Maggio L.A., Jeffrey A., Haustein S., & Samuel A. (2022). Becoming metrics literate: An analysis of brief videos that teach about the h-index. PLoS One, 17(5), e0268110.

[47]
Mason S., Merga M.K., Canche M.S.G., & Roni S.M. (2021). The internationality of published higher education scholarship: How do the ‘top’ journals compare? Journal of Informetrics, 15(2), 101155.

[48]
Millman J., Bishop C. H., & Ebel R. (1965). An analysis of test-wiseness. Educational and Psychological Measurement, 25(3), 707-726.

[49]
Moed H.F. (2006). Citation analysis in research evaluation. Springer Science & Business Media.

[50]
Moed H.F. (2020). Appropriate use of metrics in research assessment of autonomous academic institutions. Scholarly Assessment Reports, 2(1), 1. http://doi.org/10.29024/sar.8

[51]
Moher D., Bouter L., Kleinert S., Glasziou P., Sham M.H., Barbour V.,... & Dirnagl U. (2020). The Hong Kong Principles for assessing researchers: Fostering research integrity. PLoS Biology, 18(7), e3000737. https://doi.org/10.1371/journal.pbio.3000737

[52]
Necker S. (2014). Scientific misbehavior in economics. Research Policy, 43(10), 1747-1759. https://doi.org/10.1016/j.respol.2014.05.002

[53]
Olaya Escobar E.S., Berbegal‐Mirabent J., Alegre I., & Duarte Velasco O.G. (2017). Researchers’ willingness to engage in knowledge and technology transfer activities: An exploration of the underlying motivations. R&D Management, 47(5), 715-726.

[54]
Penny D. (2016). What matters where? Cultural and geographical factors in science. Slides presented at the 3rd Altmetrics Conference, Bucharest, Romania. Retrieved from https://figshare.com/articles/What_matters_where_Cultural_and_geographical_factors_in_science/3969012

[55]
Rafols I., Leydesdorff L., O’Hare A., Nightingale P. & Stirling A. (2012). How journal rankings can suppress interdisciplinary research: A comparison between Innovation Studies and Business and Management. Research Policy, 41(7), 1262-1282.

[56]
Rousseau R., Egghe L. & Guns R. (2018). Becoming metric-wise. A bibliometric guide for researchers. Kidlington: Chandos (Elsevier).

[57]
Rousseau S., Catalano G., & Daraio C. (2021). Can we estimate a monetary value of scientific publications? Research Policy, 50(1), 104116.

[58]
Rousseau S., & Rousseau R. (2015). Metric‐wiseness. Journal of the Association for Information Science and Technology, 66(11), 2389.

[59]
Rousseau S., & Rousseau R. (2017). Being metric-wise: Heterogeneity in bibliometric knowledge. El Profesional de la Información, 26(3), 480-487.

[60]
Rousseau S., & Rousseau R. (2021). Bibliometric techniques and their use in business and economics research. Journal of Economic Surveys, 35(5), 1428-1451.

[61]
Schubert T. (2009). Empirical observations on new public management to increase efficiency in public research—Boon or bane? Research Policy, 38(8), 1225-1234.

[62]
Söderlind J., & Geschwind L. (2020). Disciplinary differences in academics’ perceptions of performance measurement at Nordic universities. Higher Education Governance and Policy, 1(1), 18-31.

[63]
Thelwall M., & Kousha K. (2021). Researchers’ attitudes towards the h-index on Twitter 2007-2020: Criticism and acceptance. Scientometrics, 126(6), 5361-5368.

[64]
Tourish D. & Willmott H. (2015). In defiance of folly: Journal rankings, mindless measures and the ABS Guide. Critical Perspectives on Accounting, 26, 37-46.

[65]
van Dalen H.P. & Henkens K. (2012). Intended and unintended consequences of a publish-or-perish culture: A worldwide survey. Journal of the American Society for Information Science and Technology, 63(7), 1282-1293.

[66]
van Raan A.F. (2005). Fatal attraction: Conceptual and methodological problems in the ranking of universities by bibliometric methods. Scientometrics, 62, 133-143.

[67]
Wilsdon J., Allen L., Belfiore E., Campbell P., Curry S., Hill S., Jones R., Kain R., Kerridge S., Thelwall M., Tinkler J., Viney I., Wouters P., Hill J. & Johnson B. (2015). The Metric Tide: Report of the Independent Review of the Role of Metrics in Research Assessment and Management. https://doi.org/10.13140/RG.2.1.4929.1363
