Special Collections
Statistics in Science of Science
Statistics have been widely used in bibliometric analyses. Yet it is well known that not all uses follow best practices, which can lead to invalid conclusions or irreproducible results. This Research Topic aims to provide insights into the proper use of statistics in bibliometrics.
  • Research Paper
    Mingyue Sun, Mingliang Yue, Tingcan Ma
    Journal of Data and Information Science. 2023, 8(3): 47-60. https://doi.org/10.2478/jdis-2023-0017

    Purpose: This paper aims to investigate the differences between conference papers and journal papers in the field of computer science using a Bayesian network.

    Design/methodology/approach: This paper investigated the differences between conference papers and journal papers in the field of computer science using a Bayesian network, a knowledge-representation framework that can model relationships among all variables in the network. We defined the variables required for Bayesian network modeling, calculated the value of each variable from the Aminer dataset (a literature dataset in the field of computer science), learned the Bayesian network, and derived findings through network inference.

    Findings: The study found that conferences are more attractive to senior scholars, that the academic impact of conference papers is slightly higher than that of journal papers, and that it is uncertain whether conference papers are more innovative than journal papers.

    Research limitations: The study was limited to the field of computer science and used the Aminer dataset as its sample. Further studies involving more diverse datasets and different fields could provide a more complete picture of the matter.

    Practical implications: By demonstrating that Bayesian networks can effectively analyze issues in scientometrics, the study offers insights that may enhance researchers’ understanding of the differences between journal papers and conference papers in computer science.

    Originality/value: Academic conferences play a crucial role in facilitating scholarly exchange and knowledge dissemination within the field of computer science. Several studies have examined the distinctions between conference papers and journal papers in terms of factors such as authors, citations, and h-index. Those studies were carried out from different, independent perspectives and lacked a systematic examination of the connections and interactions among them. This paper addresses that gap using Bayesian network modeling.
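
The workflow described in this abstract (define variables, estimate the network from data, then query it) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the three binary variables (`senior`, `conference`, `high_impact`), the fixed structure senior → venue → impact with senior → impact, and the simulated data. It does not use the Aminer dataset, does not perform structure learning, and is not the authors' model.

```python
import numpy as np

# Simulated (hypothetical) data for a three-node network:
# senior -> conference, senior -> high_impact, conference -> high_impact
rng = np.random.default_rng(1)
n = 20000
senior = rng.random(n) < 0.4
conference = rng.random(n) < np.where(senior, 0.7, 0.5)
high_impact = rng.random(n) < (0.2 + 0.1 * senior + 0.05 * conference)

def prob(event, given=None):
    """Empirical P(event | given) estimated from boolean arrays."""
    if given is None:
        return event.mean()
    return event[given].mean()

def p_impact_given_venue(venue_value):
    """Inference by enumeration: P(high_impact | venue), summing out seniority."""
    numerator = 0.0
    denominator = 0.0
    for s in (False, True):
        p_s = prob(senior == s)                                    # P(S = s)
        p_v = prob(conference == venue_value, given=(senior == s)) # P(V = v | S = s)
        p_i = prob(high_impact,
                   given=(senior == s) & (conference == venue_value))  # P(I | S, V)
        numerator += p_s * p_v * p_i
        denominator += p_s * p_v
    return numerator / denominator

p_conf_impact = p_impact_given_venue(True)
p_journal_impact = p_impact_given_venue(False)
print(f"P(high impact | conference) = {p_conf_impact:.3f}")
print(f"P(high impact | journal)    = {p_journal_impact:.3f}")
```

Because the conditional probability tables are estimated by simple counting, the enumerated query coincides with the raw conditional frequency in the data; a real analysis would learn the structure as well and typically use a dedicated library.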

  • Research Paper
    Meiling Li, Yang Zhang, Yang Wang
    Journal of Data and Information Science. 2023, 8(2): 43-65. https://doi.org/10.2478/jdis-2023-0008

    Purpose: With the availability of large-scale scholarly datasets, scientists from various domains hope to understand the underlying mechanisms behind science, forming a vibrant area of inquiry in the emerging “science of science” field. As results from the science of science often have strong policy implications, understanding the causal relationships between variables becomes important. However, Regression Discontinuity Design (RDD), the most credible quasi-experimental method among causal inference methods and a highly valuable tool in the empirical toolkit, has not been fully exploited in the science of science. In this paper, we provide a systematic survey of the RDD method and its practical applications in the science of science.

    Design/methodology/approach: First, we introduce the basic assumptions, the mathematical notation, and the two types of RDD, i.e., sharp and fuzzy RDD. Second, we use the Web of Science and Microsoft Academic Graph datasets to study the evolution and citation patterns of RDD papers. Moreover, we provide a systematic survey of the applications of RDD methodologies in various scientific domains, as well as in the science of science. Finally, we present a case study that estimates the effect of Head Start Funding Proposals on child mortality.

    Findings: RDD was almost neglected for 30 years after it was first introduced in 1960. Afterward, scientists used mathematical and economic tools to develop the RDD methodology. After 2010, RDD methods showed strong applications in various domains, including medicine, psychology, political science and environmental science. However, we also notice that the RDD method has not been well developed in science of science research.

    Research limitations: This work uses a keyword search to obtain RDD papers, which may miss some related work. Additionally, our work does not aim to develop rigorous mathematical and technical details of RDD but rather focuses on its intuitions and applications.

    Practical implications: This work proposes how to use the RDD method in science of science research.

    Originality/value: This work systematically introduces RDD and calls for greater awareness of the method in the field of science of science.
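
To make the sharp RDD idea concrete, here is a minimal sketch of a local linear estimate of the jump in an outcome at a cutoff. It is an illustration under stated assumptions, not the procedure used in the paper: the data are simulated, the function name `sharp_rdd_estimate`, the cutoff, and the bandwidth are hypothetical, and a uniform kernel with a hand-picked bandwidth stands in for the data-driven choices a real analysis would make.

```python
import numpy as np

def sharp_rdd_estimate(x, y, cutoff=0.0, bandwidth=1.0):
    """Local linear sharp-RDD estimate of the jump in y at the cutoff.

    Fits y ~ 1 + D + (x - c) + D * (x - c) on observations within the
    bandwidth, where D = 1 if x >= c. The coefficient on D is the jump.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    centered = x - cutoff
    keep = np.abs(centered) <= bandwidth          # uniform kernel window
    xc, yk = centered[keep], y[keep]
    treated = (xc >= 0).astype(float)             # sharp assignment rule
    design = np.column_stack([np.ones_like(xc), treated, xc, treated * xc])
    coef, *_ = np.linalg.lstsq(design, yk, rcond=None)
    return coef[1]

# Simulated data (assumption): a true jump of 2.0 at x = 0.
rng = np.random.default_rng(0)
n = 5000
x = rng.uniform(-1, 1, n)
true_jump = 2.0
y = 1.0 + 0.5 * x + true_jump * (x >= 0) + rng.normal(0, 0.5, n)

estimate = sharp_rdd_estimate(x, y, cutoff=0.0, bandwidth=0.5)
print(f"estimated jump at cutoff: {estimate:.3f}")
```

Allowing separate slopes on each side of the cutoff keeps a change in trend from being mistaken for a discontinuity. A fuzzy RDD, where crossing the cutoff only changes the probability of treatment, would need an additional first-stage step that this sketch omits.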

  • Opinion
    Fan Chao, Guang Yu
    Journal of Data and Information Science. 2023, 8(1): 21-28. https://doi.org/10.2478/jdis-2023-0006

    Regression is a widely used econometric tool in research. In observational studies, based on a number of assumptions, regression-based statistical control methods attempt to analyze the causation between treatment and outcome by adding control variables. However, this approach may not produce reliable estimates of causal effects. In addition to the shortcomings of the method, this lack of confidence is mainly related to ambiguous formulations in econometrics, such as the definition of selection bias, selection of core control variables, and method of testing for robustness. Within the framework of the causal models, we clarify the assumption of causal inference using regression-based statistical controls, as described in econometrics, and discuss how to select core control variables to satisfy this assumption and conduct robustness tests for regression estimates.
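
The role of control variables can be sketched with a small simulation. This is a hedged illustration, not the authors' analysis: the variable names (`z`, `treatment`, `outcome`), the effect sizes, and the helper `ols_coefficients` are all assumptions, and the example presumes the single confounder `z` is observed and the relationships are linear.

```python
import numpy as np

def ols_coefficients(columns, y):
    """Ordinary least squares with an intercept; returns [intercept, slopes...]."""
    design = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

# Simulated data (assumption): z confounds treatment and outcome.
rng = np.random.default_rng(2)
n = 5000
z = rng.normal(size=n)                              # confounder
treatment = 0.8 * z + rng.normal(size=n)            # treatment depends on z
true_effect = 1.0
outcome = true_effect * treatment + 2.0 * z + rng.normal(size=n)

naive = ols_coefficients([treatment], outcome)[1]        # omits the confounder
adjusted = ols_coefficients([treatment, z], outcome)[1]  # controls for z

print(f"naive estimate:    {naive:.3f}")
print(f"adjusted estimate: {adjusted:.3f}")
```

Omitting `z` inflates the estimated effect because `z` drives both variables; including it recovers a value close to the simulated effect. This only works when the chosen controls actually block the confounding paths, which is why the selection of core control variables matters.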

  • Research Paper
    Yurui Huang, Chaolin Tian, Yifang Ma
    Journal of Data and Information Science. 2023, 8(1): 29-46. https://doi.org/10.2478/jdis-2023-0003

    Purpose: In recent decades, with the availability of large-scale scientific corpus datasets, difference-in-difference (DID) has been increasingly used in science of science and bibliometrics studies. The DID method yields unbiased estimates only if several assumptions hold, especially the common trend assumption. In this paper, we give a systematic demonstration of DID in the science of science and discuss potential ways to improve the accuracy of the DID method.

    Design/methodology/approach: First, we reviewed the statistical assumptions, the model specification, and the application procedures of the DID method. Second, to make the necessary assumptions more plausible before conducting DID regression and to improve the accuracy of estimation, we introduced matching techniques as a pre-selection step for the DID design: control individuals are matched to treated individuals who are equivalent on observed variables before the intervention. Lastly, we performed a case study to estimate the effects of prizewinning on the scientific performance of Nobel laureates, by comparing the yearly citation impact after the prizewinning year between Nobel laureates and their prizewinning-work coauthors.

    Findings: We introduced the procedures for conducting a DID estimation and demonstrated the effectiveness of using matching methods to improve the results. In the case study, we found no significant increase in citations for Nobel laureates compared to their prizewinning coauthors.

    Research limitations: This study omitted the rigorous mathematical derivation of DID and focused instead on its practical aspects.

    Practical implications: This work provides practical experience and potential guidelines for using the DID method in science of science and bibliometrics studies.

    Originality/value: This study offers insights into the use of econometric tools in the science of science.
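
The core DID calculation can be sketched as a comparison of before/after changes between a treated and a control group. The example below is a minimal illustration under stated assumptions: the data are simulated so that the common trend assumption holds by construction, the function name `did_estimate` and all effect sizes are hypothetical, and the matching step described in the abstract is not shown.

```python
import numpy as np

def did_estimate(y, treated, post):
    """Two-group, two-period DID: (treated change) minus (control change)."""
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    post = np.asarray(post, dtype=bool)
    change_treated = y[treated & post].mean() - y[treated & ~post].mean()
    change_control = y[~treated & post].mean() - y[~treated & ~post].mean()
    return change_treated - change_control

# Simulated data (assumption): both groups share a time trend of 0.3,
# the treated group starts 0.8 higher, and the true effect is 1.5.
rng = np.random.default_rng(3)
n = 4000
treated = rng.random(n) < 0.5
post = rng.random(n) < 0.5
true_effect = 1.5
y = (1.0 + 0.8 * treated + 0.3 * post
     + true_effect * (treated & post) + rng.normal(size=n))

estimate = did_estimate(y, treated, post)
print(f"DID estimate: {estimate:.3f}")
```

The group-level gap (0.8) and the shared time trend (0.3) both cancel in the double difference, leaving only the treatment effect. If the two groups followed different trends before the intervention, the same calculation would be biased, which is the motivation for pre-selecting comparable controls.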