Recent trends in research policy and its development include, among others:
• The explosion of the assessments in the “evaluation” society (Dahler-Larsen, 2012; Whitley and Gläser, 2007).
• The need of policy-makers to have a comprehensive framework. We refer to STAR METRICS(1)((1) STAR METRICS is a data platform that is voluntarily and collaboratively developed by US federal science agencies and research institutions to describe investments in science and their results (Largent and Lane, 2012).) in the US (Largent and Lane, 2012) and to the European Commission (2014) “Expert Group to support the development of tailor-made impact assessment methodologies for ERA (European Research Area)” in Europe(2)((2) The first objective of the European Commission (2014) Expert Group, in which the author of the present paper took part, was indeed to “propose an analytical framework for identifying how the implementation of different ERA priorities and components observed at institutional level (i.e. research performing organizations) and national level (i.e. national policies and funding organizations policies) impact the research system performance (at institutional and national level).”).
• The criticisms of the traditional assessment metrics. The traditional methods of research evaluation have recently come under attack in different contexts, in particular by the San Francisco Declaration on Research Assessment (DORA) and the Leiden Manifesto (Hicks et al., 2015), for the inherent problems of the evaluation of research, although some of the crucial limits and problems had already been known to the specialized community for decades; see e.g. Glänzel and Schoepflin (1994), Glänzel (1996), and Moed and van Leeuwen (1996). A recent review on the role of metrics in research assessment and management (Wilsdon et al., 2015) found, as one of its main conclusions, that: “There is considerable scepticism among researchers, universities, representative bodies, and learned societies about the broader use of metrics in research assessment and management.”
• The crisis of science. Benessia et al. (2016) identify the most heated points of discussion in reproducibility (see also Munafò et al. (2017)), peer review, publication metrics, scientific leadership, scientific integrity, and the use of science for policy (see also Saltelli and Funtowicz (2015) in The End of the Cartesian Dream). The transmission channel of this crisis from science to scientific advice is attributed to the collapse of the dual legitimacy system that was the basis of modernity, namely the arrangement by which science provided legitimate facts, and policy, legitimate norms. The obsolescence of the classical opposition between the scientific approach and the dogmatic approach, generated by the problems of empirical evidence (Saltelli and Funtowicz, 2015), may be a possible root of this crisis.
• The recent debate on modeling of research and innovation activities and on the use of qualitative or quantitative models for the analysis of science and innovation policies (Martin, 2016).
The advent of the big data era is another main recurring trend. Innovative data sources and tools now offer new ways of studying science and technology and more data-driven knowledge discovery (Ding and Stirling, 2016). At the same time, these sources cast some doubt on the extensive use of the traditional data sources employed by scholars in the field. The results obtained are obviously linked to the intrinsic potential or limitations of the kind of data used in the analysis. This tendency has led to the “computerization” of bibliometrics, which has been linked to the development of altmetrics approaches (Moed, 2016).
Is science really becoming increasingly data-driven? Are we moving toward a data-driven science (Kitchin, 2014), supporting “the end of theory” (Anderson, 2008), or will theory-driven scientific discoveries remain unavoidable (Frické, 2015)? There is little agreement in the literature. More balanced views emerging from a critical analysis of the current literature are also available (Debackere, 2016; Ekbia et al., 2015), leading the information systems community to analyze more deeply the critical challenges posed by big data development (Agarwal and Dhar, 2014).
Data sources indeed “are not simply addenda or second-order artifacts; rather, they are the heart of much of the narrative literature, the protean stuff that allows for inference, interpretation, theory building, innovation, and invention” (Cronin, 2013, p. 435). Making data widely available is very important for scientific research, as it relates to the responsibilities of the research community toward transparency, standardization, and data archiving. However, to make data available, researchers have to face the huge amount, complexity, and variety of the data being produced (Hanson, Sugden, and Alberts, 2011). Moreover, the availability of data is not homogeneous across disciplines, and cases of “little data” and “no data” are not exceptions (Borgman, 2015).
These recent trends and the issues they underline require a new framework for analysis. The theoretical framework (intended as a group of related ideas) that we propose in this paper is designed to be a reference for the development of models for the assessment of research activities and their impacts. A framework is required to develop models of metrics. Models of metrics are necessary to assess the meaning, validity, and robustness of metrics.
We claim that our framework can support the development of appropriate metrics for a given research assessment problem, or the understanding of existing metrics. This is a very difficult task because, among other things, it refers to a complex phenomenon for which there is no reference or benchmark against which to compare the metrics. The purpose of our proposed framework is exactly to offer such a reference for developing models of research assessment.
Often, indicators and metrics are used as synonyms (see also Wilsdon et al. (2015)). In this paper, indicators are combinations of data that produce values, while metrics are parameters or measures of quantitative assessment used for measurement, comparison, or performance tracking. Hence, an indicator is a metric if it is used as a parameter in a research assessment. It is more difficult to develop metrics than indicators due to the “implementation” problem (see Daraio (2017a) for further details).
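To make the distinction concrete, the following minimal Python sketch (ours, not taken from the paper; the function names, data, and threshold are hypothetical) computes an indicator as a value derived from data and treats it as a metric only when it is used as a parameter of an assessment decision.

```python
# Hypothetical sketch: an "indicator" is a combination of data producing a value;
# it becomes a "metric" only when used as a parameter in a research assessment.

def citations_per_publication(citations: list[int]) -> float:
    """Indicator: a value computed from raw data (citation counts)."""
    return sum(citations) / len(citations) if citations else 0.0

def assess_unit(citations: list[int], threshold: float) -> bool:
    """The indicator is used here as a parameter of an assessment decision,
    i.e. it plays the role of a metric (the threshold is an arbitrary example)."""
    return citations_per_publication(citations) >= threshold

value = citations_per_publication([3, 0, 12, 5])    # indicator only
decision = assess_unit([3, 0, 12, 5], threshold=4)  # indicator used as a metric
```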
It is important to develop models for different reasons, including:
• Learning, to learn about the explicit consequences of assumptions, test the assumptions, and highlight relevant relations;
• Improving, to better operate, document/verify the assumptions, decompose analysis and synthesis, systematize the problem and the evaluation/choice made, and state clearly and in detail the dependence of the choice on the scenario.
More specifically, a model is an abstract representation which, from some points of view and for some ends, represents an object or real phenomenon(3)((3) Some interesting readings on modeling can be found in Morris (1967), Pollock (1976), Willemain (1994), Myung (2000), and Zucchini (2000).). The representation of reality is achieved through the analogy established between aspects of reality and aspects of the model.
For quantitative models, the analogy with the real world takes place in two steps (illustrated by the sketch after this list):
1) Quantification of objects, facts, and phenomena in an appropriate way; and
2) Identification of the relationships existing between the previously identified objects, as close as possible to the reality that is the object of the model.
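As a purely illustrative sketch of these two steps (the variables, the numbers, and the linear functional form below are assumptions made for this example, not part of the framework), one might write:

```python
import numpy as np

# Step 1: quantification of objects, facts, and phenomena.
# E.g. represent research units by quantified attributes such as inputs
# (funding) and outputs (publications); values here are invented.
funding = np.array([1.0, 2.5, 4.0, 5.5])         # inputs, arbitrary units
publications = np.array([3.0, 7.0, 11.0, 14.0])  # outputs, counts

# Step 2: identification of a relationship between the quantified objects,
# kept as close as possible to the reality being modeled. Here a simple
# linear relation output = a * input + b serves as the analogy.
a, b = np.polyfit(funding, publications, deg=1)
predicted = a * funding + b
```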
The practical use of a model depends on the different roles that the model can have and on the different steps of the decision process in which the model can be used. A model can be considered a tool for understanding reality. The potential of models can be expressed in description, interpretation, forecasting, and intervention. These different roles may or may not be correlated, depending on the objective of the analysis and the way the model is built. To be successful, the modeling has to take into account the specificities of the processes and systems under investigation, and in particular consider that their behavior is free and directed toward given aims; history and evolution matter, as the behavior of systems and processes changes over time (see e.g. Georgescu-Roegen (1971)).
Hence, the modeling activity related to the assessment of research involves several methodological challenges. What is required today is to develop models able to characterize strongly connected or interdependent model components dominated by their interactions, including complex model behavior, emergent collective behavior that implies new and often unexpected model behavior, counter-intuitive behavior, extreme events with less predictable outcomes, and management based on setting rules for bottom-up self-organization (Helbing and Carbone, 2012, p. 15). This is very different from traditional models, characterized by independent model components and simple model behavior, in which the sum of the properties of the individual components characterizes the model behavior, conventional wisdom works well, and the model is well predictable and controllable in a top-down way; such traditional models seem inappropriate to capture the complexity and dynamics involved in research assessment.
Evaluation(4)((4) In this paper evaluation and assessment are used as synonyms.) is a complex activity that consists of at least three levels of analysis: outputs, processes, and purposes.
Orienting the analysis to the specific evaluation problem can help to specialize and simplify components, identifying the aspects relevant for the purpose. This orientation may encourage a functional analysis of the systems involved in the assessment. The external behavior of the systems may be explained by focusing the analysis on the aims and the ways of interacting with the environment, without entering into the details of the internal structures and organization (the organization may become relevant only if it limits the pursuit of the system's objectives).
Some pitfalls of models are:
• Theoretical limits (limitation of the concepts and their relations considered relevant in the models);
• Interpretative and forecasting limits (uncertainty of the phenomena, necessity of exogenous assumptions, errors in the estimates, approximation between model and theory, deviations between theory and reality, and evolution of behaviors);
• Limits in the decision context (quantifiability of the objectives, multiplicity and variety of objectives, predictability of the external effects of the decisions, interdependencies with decisions of other subjects, computational complexity, and implementation of the decisions).
There are some difficulties that arise in modeling:
• The possibility that the targets are not quantifiable, or are multiple and conflicting, or that there are several decision-makers with different interests;
• Complexity, uncertainty, and changeability of the environment in which the observed system works and, after environmental stimuli, the difficulty of predicting the consequences of certain actions and relative responses;
• The limits (in particular of an organizational nature) within which the analyzed system adapts to the directives of the decision-maker; and
• The intrinsic complexity of calculation of the objective of the analysis.
The ambition of our framework is to be a general basis able to frame the main dimensions (features) relevant to developing multidimensional and multilevel models for the evaluation of research and its impacts(5)((5) Vinkler (2010) presents a systematic view of units and levels of analysis in research assessment.).
We propose a framework, illustrated in Figure 1, based on three dimensions:
1) Theory, broadly speaking, identifies the conceptual content of the analysis, answering the question of “what” is the domain of interest, and delineating the perimeter of the investigation;
2) Methodology, generally, refers to “how” the investigation is handled and what kinds of tools can be applied to the domain of interest; these tools represent the means by which the analyses are carried out; and
3) Data, broadly and roughly, refers to instances coming from the domain of interest and represents the means on (or through) which the analyses are carried out.
We detail each dimension in three main building blocks and identify three operational factors for implementation purposes. The main building blocks of theory are: 1) education, 2) research, and 3) innovation. See Table 1 for their definition.
Figure 1. An illustration of our framework including its three implementation factors (tailorability, transparency, and openness) and its three enabling conditions: convergence, mixed methods, and knowledge infrastructures.
Table 1. Definitions of education, research, and innovation.
Term | Definition |
Education | In general, education is the process of facilitating the acquisition or assignment of special knowledge or skills, values, beliefs, and habits. The methods applied are varied and may include storytelling, discussion, teaching, training, and direct research. It is often done under the guidance of teachers, but students can also learn by themselves. It can take place in formal or informal settings and can embrace every experience that has a formative effect. Education is commonly organized into stages: preschool, primary school, secondary school, and after that higher education level. See the International Standard Classification of Education (ISCED, 2011) for a more technical presentation. |
Research | According to the OECD’s Frascati Manual (2002), research and development (R&D) is the “creative work undertaken on a systematic basis in order to increase the stock of knowledge, including knowledge of man, culture, and society, and the use of this stock of knowledge to devise new applications.” The term R&D covers three activities: “basic research, applied research and experimental development. Basic research is experimental or theoretical work undertaken primarily to acquire new knowledge of the underlying foundation of phenomena and observable facts, without any particular application or use in view. Applied research is also original investigation undertaken in order to acquire new knowledge. It is, however, directed primarily toward a specific practical aim or objective. Experimental development is systematic work, drawing on existing knowledge gained from research and/or practical experience, which is directed to producing new materials, products, or devices, to installing new processes, systems, and services, or to improving substantially those already produced or installed. R&D covers both formal R&D in R&D units and informal or occasional R&D in other units.” See also the more recent Frascati Manual (OECD, 2015b). |
Innovation | According to the OECD (2005), an innovation is “the implementation of a new or significantly improved product (good or service), or process, a new marketing method, or a new organizational method in business practices, workplace organization or external relations. The minimum requirement for an innovation is that the product, process, marketing method or organizational method must be new (or significantly improved) to the firm. Innovation activities are all scientific, technological, organizational, financial and commercial steps which actually, or are intended to, lead to the implementation of innovations. Innovation activities also include R&D that is not directly related to the development of a specific innovation.” |
The main building blocks of methodology are: 1) efficiency, 2) effectiveness, and 3) impact. The main building blocks of data are: 1) availability, 2) interoperability, and 3) unit-free property.
The problem of the evaluation of research activities is framed, in our set-up, in a systematic way, also taking into account education and innovation together with the other components of the methodology and data dimensions.
The three main implementation factors (see Section 4) that we propose are the following (a schematic sketch of the framework follows this list):
1) Tailorability (broadly, the adaptability to the features of the problem at hand);
2) Transparency (approximately, the description of the choices made and of the underlying hypotheses masked in the proposed/selected theory/methodology/data combination); and
3) Openness (roughly, accessibility to the main elements of the modeling).
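For implementation purposes, the skeleton of the framework can be recorded as a simple data structure; the following Python dictionary is only our own checklist-style illustration, not a prescribed schema.

```python
# Hypothetical encoding of the framework: three dimensions, their main
# building blocks, and the three implementation factors.
FRAMEWORK = {
    "dimensions": {
        "theory":      {"building_blocks": ["education", "research", "innovation"]},
        "methodology": {"building_blocks": ["efficiency", "effectiveness", "impact"]},
        "data":        {"building_blocks": ["availability", "interoperability",
                                            "unit-free property"]},
    },
    "implementation_factors": ["tailorability", "transparency", "openness"],
}
```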
The deeper we are able to go toward the fine grain of the most atomic unit of analysis, the higher the levels of tailorability, transparency, and openness may be, and the better the conceptualization and formalization of quality within a model will be.
In this paper, we assert that the ability to develop (and afterward understand and effectively use) models for the assessment of research is linked to and depends on, among other factors, the degree or depth of the conceptualization (intended here as the formulation of the content of the general ideas and of the most important details) and formalization (intended here as “making it official” or explicit), in an unambiguous way, of the underlying idea of Quality. Quality, here, is intended as “fitness for use.”
The level of conceptualization and formalization of Quality, however, is neither objective nor unique. It depends on the purposes and the subject or unit of the analysis (e.g. scholars, groups, institutions, up to meso or macro aggregated units such as regional or national entities) and relates, in the end, to the specific evaluation problem under investigation.
We propose, finally, three enabling conditions that foster the connection of our framework with the empirical and policy worlds. The three enabling conditions are:
1) Convergence (as an evolution of the transdisciplinary approach, which allows for overcoming the traditional paradigms and increasing the dimensional space of thinking);
2) Mixed methods (as an intelligent combination of quantitative and qualitative approaches); and
3) Knowledge infrastructures (as networks of people that interact with artifacts, tools, and data infrastructures).
We maintain that these three enabling conditions contribute to the conceptualization and formalization of the idea of Quality, which relates to and fosters the overlap of the different perspectives, namely the modeling world, the empirical world, and the policy world (see Section 4 and Figure 2 in Section 5).
Figure 2. An illustration of the relationship between modeling world, empirical world, and policy world: they are all somewhat overlapping visions or projections of the real worlds.
Summing up, evaluating research and its impacts is a truly complex task. Perhaps the key problem is that research performance is not fully quantifiable. Hence, research assessment has to deal with concepts that are not fully quantifiable.
There are several approaches to evaluating research. In order to adopt and use our framework, the following three postulates, intended as general validity conditions or principles, have to be accepted.
Postulate 1: Models of metrics
Each metric is based on at least one model. The model can be implicitly or explicitly defined and discussed.
This postulate is a proposition that we assume to be true because it is obvious. The implication of Postulate 1 is that if the model underlying a metric is not described, this does not mean that the metric is more robust to modeling choices. It simply means that the underlying theoretical choices, methodological assumptions, and data limitations are not clearly and fully stated and accounted for. Put in other words, the metric cannot be more robust than the model, and it is possible to assess the robustness of the model only if the model is explicitly described.
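As an illustration of Postulate 1 (the example, the code, and its comments are ours and are not discussed in the paper), even a familiar indicator such as the h-index embeds theoretical, methodological, and data choices that can be made explicit:

```python
# Illustrative only: the h-index as a metric whose implicit model is spelled out.
def h_index(citations: list[int]) -> int:
    # Implicit model choices, stated explicitly:
    #  - theory: "quality" is proxied by citation counts of publications;
    #  - methodology: every publication and every citation counts equally
    #    (no field normalization, no co-author weighting, no time window);
    #  - data: the publication/citation list is assumed complete and correct.
    ranked = sorted(citations, reverse=True)
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)

print(h_index([10, 8, 5, 4, 3]))  # 4: four papers have at least 4 citations each
```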
Postulate 2: Conceptualization and formalization of “Quality”
The accuracy, completeness, and consistency of the research assessment depends on the level (degree) of conceptualization and formalization, in an unambiguous way, of the “Quality” and its different layers and meanings.
This is the cornerstone postulate of our framework. The accuracy, completeness, and consistency of the research assessment depends upon, and is limited by, among other factors, the complexity of the research evaluation. A further discussion of this issue can be found in Daraio (2017a).
Postulate 3: “Responsible” metrics
A metric developed according to a model that conceptualizes and formalizes in an unambiguous way the idea of Quality in its different layers and meanings is able to substantiate and give content to the concept of “responsible” metrics.
Postulate 3 should be considered an open conjecture that needs to be further studied and demonstrated (see further discussion in Section 5).
The main contributions of the paper are:
• To introduce a simple framework that could be helpful in developing models for metrics of research assessment (e.g. a kind of checklist when practitioners plan an assessment);
• To propose a basis for research on the ethics of research evaluation; and
• To outline directions for further research.
Our framework acts as a common denominator for different analytical levels and relevant aspects and is able to embrace many different and heterogeneous streams of literature. An outline is presented in the next section.