Special Collections
Research Integrity
Research integrity and responsible research practices are increasingly being discussed by the academic community and the public. In recent years, a number of significant cases of academic misconduct have been reported worldwide. As scientific papers and their related data become more readily available, our understanding of academic misconduct is also improving. Nevertheless, we still notice a gap in the understanding of academic misconduct, and of related policies and measures, among policymakers, researchers, and societal participants. To provide insights into research integrity and academic misconduct from multiple perspectives, this research topic aims to answer the question: what can policymakers, scientometricians, publishers, institutions and researchers do to counter academic misconduct?
In addition to organizing special paper solicitations, JDIS also facilitates broader exchanges among stakeholders on research integrity issues through symposiums.
Promoting Research Integrity: 4th Data-Driven Knowledge Discovery Symposium successfully held.
  • Opinion
    Sichao Tong, Zhesi Shen, Tian-Yuan Huang, Liying Yang
    Journal of Data and Information Science. 2022, 7(2): 4-5. https://doi.org/10.2478/jdis-2022-0013
  • Research Note
    Jaime A. Teixeira da Silva, Serhii Nazarovets
    Journal of Data and Information Science. 2023, 8(2): 118-125. https://doi.org/10.2478/jdis-2023-0009

    Cancer research is occasionally described as being in a reproducibility crisis. The cancer literature has ample papers retracted due to misconduct, including the use of paper mills, invalid authorship, or fake data. The objective of this paper was to gain an appreciation of the balance of retractions and associated retraction notices of 23 retracted Cancer Biotherapy and Radiopharmaceuticals papers associated with paper mills. By 23 March 2023, these retracted papers had already accumulated 287 citations according to the Web of Science Core Collection, 253 according to Scopus, and 365 according to Google Scholar, i.e., metrically speaking, they were highly rewarded. All authors had an affiliation in China (71% being hospitals). Most (12/21; 57%) corresponding authors had emails with a @163.com suffix. Four of the retraction notices (i.e., 17%) explicitly indicated paper mills as a reason for retraction, although, in general, the retraction notices lacked details and background that could assist readers’ understanding of the retractions.

  • Perspective
    Sabina Alam, Laura Wilson
    Journal of Data and Information Science. 2023, 8(3): 1-14. https://doi.org/10.2478/jdis-2023-0018

    It is imperative that all stakeholders within the research ecosystem take responsibility to improve research integrity and reliability of published research. Based on the unique experiences of a specialist publishing ethics and research integrity team within a major publisher, this article provides insights into the observed trends of misconduct and how those have evolved over time, and addresses key actions needed to improve the interface between researchers, funders, institutions and publishers to collectively improve research integrity on a global scale.

  • Research Papers
    Zi-han Yuan, Yi Liu
    Journal of Data and Information Science. 2023, 8(4): 84-101. https://doi.org/10.2478/jdis-2023-0022

    Purpose: The number of retracted papers from Chinese university-affiliated hospitals is increasing, which has raised much concern. The aim of this study is to analyze the retracted papers from university-affiliated hospitals in mainland China from 2000 to 2021.

    Design/methodology/approach: Data for 1,031 retracted papers were identified from the Web of Science Core Collection database. Information on the hospitals involved was obtained from their official websites. We analyzed the chronological changes, journal distribution, discipline distribution and retraction reasons for the retracted papers. The grade and geographic locations of the hospitals involved were explored as well.

    Findings: We found a rapid increase in the number of retracted papers, while the retraction time interval is decreasing. The main reasons for retraction are plagiarism/self-plagiarism (n=255), invalid data/images/conclusions (n=212), fake peer review (n=175) and honest error (n=163). The disciplines are mainly distributed in oncology (n=320), pharmacology & pharmacy (n=198) and research & experimental medicine (n=166). About 43.8% of the retracted papers were from hospitals affiliated with prestigious universities.

    Research limitations: This study fails to differentiate between retractions due to honest error and retractions due to research misconduct. We believe that there is a fundamental difference between honest error retractions and misconduct retractions. Another limitation is that authors of the retracted papers have not been analyzed in this study.

    Practical implications: This study provides a reference for addressing research misconduct in Chinese university-affiliated hospitals. It is our recommendation that universities and hospitals should educate all their staff about the basic norms of research integrity, penalize authors whose papers were retracted for scientific misconduct, and reform unreasonable evaluation systems.

    Originality/value: Based on the analysis of retracted papers, this study further analyzes the characteristics of institutions of retracted papers, which may deepen the research on retracted papers and provide a new perspective to understand the retraction phenomenon.

  • Research Papers
    Jun Zhang, Jianhua Liu, Haihong E, Tianyi Hu, Xiaodong Qiao, ZiChen Tang
    Journal of Data and Information Science. 2025, 10(1): 167-187. https://doi.org/10.2478/jdis-2025-0003

    Purpose: In this paper, we construct a heterogeneous graph from citation relations between papers and their basic information, centered on the “Paper mills” papers under retraction observation, and we train graph neural network models and classifiers on this heterogeneous graph to classify paper nodes.

    Design/methodology/approach: Our proposed citation network-based “Paper mills” detection model (PDCN model for short) integrates textual features extracted from the paper titles using the BERT model with structural features obtained from analyzing the heterogeneous graph through the heterogeneous graph attention network model. Subsequently, these features are classified using LGBM classifiers to identify “Paper mills” papers.

    Findings: On our custom dataset, the PDCN model achieves an accuracy of 81.85% and an F1-score of 80.49% in the “Paper mills” detection task, representing a significant improvement in performance compared to several baseline models.

    Research limitations: We considered only the title of the article as a text feature and did not obtain features for the entire article.

    Practical implications: The PDCN model we developed can effectively identify “Paper mills” papers and is suitable for the automated detection of “Paper mills” during the review process.

    Originality/value: We incorporated both text and citation detection into the “Paper mills” identification process. Additionally, the PDCN model offers a basis for judgment and scientific guidance in recognizing “Paper mills” papers.
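The core of the PDCN pipeline described above is feature fusion: per-paper text embeddings and graph-derived node embeddings are concatenated and fed to a gradient-boosted classifier. The sketch below illustrates only that fusion-and-classify step, with random placeholder vectors standing in for the BERT title embeddings and heterogeneous-graph-attention features, and scikit-learn's GradientBoostingClassifier standing in for LGBM to keep the example dependency-light; it is not the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

# Placeholder features: in the PDCN model these would come from a BERT
# encoding of paper titles and a heterogeneous graph attention network.
n_papers = 200
text_feats = rng.normal(size=(n_papers, 32))   # stand-in for BERT title embeddings
graph_feats = rng.normal(size=(n_papers, 16))  # stand-in for graph node embeddings
labels = rng.integers(0, 2, size=n_papers)     # 1 = suspected paper-mill paper (synthetic)

# Feature fusion: concatenate text and structural features per paper node
X = np.concatenate([text_feats, graph_feats], axis=1)

# Gradient-boosted trees as a stand-in for the LGBM classifier in the paper
clf = GradientBoostingClassifier().fit(X[:150], labels[:150])
preds = clf.predict(X[150:])
print(preds.shape)  # (50,)
```

With real embeddings in place of the random vectors, the same two lines of fusion and fitting carry over unchanged; only the feature extractors differ.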

  • Research Notes
    Zhesi Shen, Li Li, Yu Liao
    Journal of Data and Information Science. 2024, 9(3): 1-3. https://doi.org/10.2478/jdis-2024-0024
  • Research Papers
    Menghui Li, Fuyou Chen, Sichao Tong, Liying Yang, Zhesi Shen
    Journal of Data and Information Science. 2024, 9(2): 41-55. https://doi.org/10.2478/jdis-2024-0012

    Purpose: The notable increase in retraction papers has attracted considerable attention from diverse stakeholders. Various sources are now offering information related to research integrity, including concerns voiced on social media, disclosed lists of paper mills, and retraction notices accessible through journal websites. However, despite the availability of such resources, there remains a lack of a unified platform to consolidate this information, thereby hindering efficient searching and cross-referencing. Thus, it is imperative to develop a comprehensive platform for retracted papers and related concerns. This article aims to introduce “Amend,” a platform designed to integrate information on research integrity from diverse sources.

    Design/methodology/approach: The Amend platform consolidates concerns and lists of problematic articles sourced from social media platforms (e.g., PubPeer, For Better Science), retraction notices from journal websites, and citation databases (e.g., Web of Science, CrossRef). Moreover, Amend includes investigation and punishment announcements released by administrative agencies (e.g., NSFC, MOE, MOST, CAS). Each related paper is marked and can be traced back to its information source via a provided link. Furthermore, the Amend database incorporates various attributes of retracted articles, including citation topics, funding details, open access status, and more. The reasons for retraction are identified and classified as either academic misconduct or honest errors, with detailed subcategories provided for further clarity.

    Findings: Within the Amend platform, a total of 32,515 retracted papers indexed in SCI, SSCI, and ESCI between 1980 and 2023 were identified. Of these, 26,620 (81.87%) were associated with academic misconduct. The retraction rate stands at 6.64 per 10,000 articles. Notably, the retraction rate for non-gold open access articles significantly differs from that for gold open access articles, with this disparity progressively widening over the years. Furthermore, the reasons for retractions have shifted from traditional individual behaviors like falsification, fabrication, plagiarism, and duplication to more organized large-scale fraudulent practices, including Paper Mills, Fake Peer-review, and Artificial Intelligence Generated Content (AIGC).

    Research limitations: The Amend platform may not fully capture all retracted and concerning papers, thereby impacting its comprehensiveness. Additionally, inaccuracies in retraction notices may lead to errors in tagged reasons.

    Practical implications: Amend provides an integrated platform for stakeholders to enhance monitoring, analysis, and research on academic misconduct issues. Ultimately, the Amend database can contribute to upholding scientific integrity.

    Originality/value: This study introduces a globally integrated platform for retracted and concerning papers, along with a preliminary analysis of the evolutionary trends in retracted papers.
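The headline figures in the Amend findings above are internally consistent, which a quick back-of-envelope check makes visible. The implied size of the indexed corpus is an inference from the quoted rate, not a number stated in the abstract.

```python
# Figures quoted in the Amend abstract above
retracted = 32_515    # retracted papers in SCI, SSCI, ESCI, 1980-2023
misconduct = 26_620   # of which tagged as academic misconduct
rate_per_10k = 6.64   # retraction rate per 10,000 articles

misconduct_share = 100 * misconduct / retracted
implied_corpus = retracted / rate_per_10k * 10_000  # inferred, not stated

print(f"{misconduct_share:.2f}%")             # 81.87%, matching the abstract
print(f"~{implied_corpus / 1e6:.0f} million indexed articles")
```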

  • Research Papers
    Er-Te Zheng, Hui-Zhen Fu
    Journal of Data and Information Science. 2024, 9(2): 22-40. https://doi.org/10.2478/jdis-2024-0010

    Purpose: Recently, global science has shown an increasingly open trend; however, the research-integrity characteristics of open access (OA) publications have rarely been studied. The aim of this study is to compare the characteristics of retracted articles across different OA levels and to discover whether OA level influences the characteristics of retracted articles.

    Design/methodology/approach: The research conducted an analysis of 6,005 retracted publications between 2001 and 2020 from the Web of Science and Retraction Watch databases. These publications were categorized based on their OA levels, including Gold OA, Green OA, and non-OA. The study explored retraction rates, time lags and reasons within these categories.

    Findings: The findings of this research revealed distinct patterns in retraction rates among different OA levels. Publications with Gold OA demonstrated the highest retraction rate, followed by Green OA and non-OA. A comparison of retraction reasons between the Gold OA and non-OA categories indicated similar proportions, while Green OA exhibited a higher proportion of falsification and manipulation issues, along with a lower occurrence of plagiarism and authorship issues. The retraction time lag was shortest for Gold OA, followed by non-OA, and longest for Green OA. The prolonged retraction time for Green OA could be attributed to its atypical distribution of retraction reasons.

    Research limitations: There is no exploration of a wider range of OA levels, such as Hybrid OA and Bronze OA.

    Practical implications: The outcomes of this study suggest the need for increased attention to research integrity within the OA publications. The occurrences of falsification, manipulation, and ethical concerns within Green OA publications warrant attention from the scientific community.

    Originality/value: This study contributes to the understanding of research integrity in the realm of OA publications, shedding light on retraction patterns and reasons across different OA levels.

  • Research Papers
    Ping Ni, Lianhui Shan, Yong Li, Xinying An
    Journal of Data and Information Science. 2023, 8(4): 36-46. https://doi.org/10.2478/jdis-2023-0024

    Purpose: To reveal the typical features of text duplication in papers from four medical fields: basic medicine, health management, pharmacology and pharmacy, and public health and preventive medicine. To analyze the reasons for duplication and provide suggestions for the management of medical academic misconduct.

    Design/methodology/approach: In total, 2,469 representative Chinese journal papers were included in our research, which were submitted by researchers in 2020 and 2021. A plagiarism check was carried out using the Academic Misconduct Literature Check System (AMLC). We generated a corrected similarity index based on the AMLC general similarity index for further analysis. We compared the similarity indices of papers in four medical fields and revealed their trends over time; differences in similarity index between review and research articles were also analyzed according to the different fields. Further analysis of 143 papers suspected of plagiarism was also performed from the perspective of sections containing duplication and according to the field of research.

    Findings: Papers in the field of pharmacology and pharmacy had the highest similarity index (8.67 ± 5.92%), which was significantly higher than that in the other fields, except health management. The similarity index of review articles (9.77 ± 10.28%) was significantly higher than that of research articles (7.41 ± 6.26%). In total, 143 papers were suspected of plagiarism (5.80%), with similarity indices ≥ 15%; most were papers on health management (78, 54.55%), followed by public health and preventive medicine (38, 26.58%). Of the 143 papers, 90.21% had duplication in multiple sections, while only 9.79% had duplication in a single section. The distribution of sections with duplication varied among fields: papers in pharmacology and pharmacy were more likely to have duplication in the data/methods and introduction/background sections, whereas papers in health management were more likely to contain duplication in the introduction/background or results/discussion sections. Different paper structures in different fields may have caused these differences.

    Research limitations: There were three limitations to our research. Firstly, we observed that a small number of papers had already been checked before submission. It is unknown who conducted these checks, as plagiarism checks can be included in other evaluations, such as applications for science and technology projects or awards. If the authors carried out the checks themselves, text with high similarity indices may have been removed before submission, meaning the similarity indices in our research may be lower than the original values. Secondly, only four medical fields were included in our research; additional analysis on a wider scale is required in the future. Thirdly, only a general similarity index was calculated in our study; other similarity indices were not tested.

    Practical implications: A comprehensive analysis of similarity indices in four medical fields was performed. We made several recommendations for the supervision of medical academic misconduct and the formation of criteria for defining suspected plagiarism for medical papers, as well as for the improved accuracy of text duplication checks.

    Originality/value: We quantified the differences between the AMLC general similarity index and the corrected index, described the situation around text duplication and plagiarism in papers from four medical fields, and revealed differences in similarity indices between different article types. We also revealed differences in the sections containing duplication for papers with suspected plagiarism among different fields.
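The similarity indices discussed above come from the AMLC, a proprietary system whose algorithm is not described here. Purely as an illustration of the general idea behind such an index, the hypothetical sketch below measures what fraction of a document's character shingles also appear in a reference document, and applies the ≥ 15% flagging threshold used in the study; it is not the AMLC's method.

```python
def shingles(text, k=5):
    """Character k-grams of a lowercased, whitespace-stripped text
    (a common unit for text-duplication detection)."""
    t = "".join(text.lower().split())
    return {t[i:i + k] for i in range(len(t) - k + 1)}

def similarity_index(doc, reference, k=5):
    """Percentage of doc's k-gram shingles that also occur in reference.
    Note this is asymmetric: it measures how much of doc is duplicated."""
    s, r = shingles(doc, k), shingles(reference, k)
    return 100.0 * len(s & r) / len(s) if s else 0.0

# Hypothetical pair of near-duplicate sentences
score = similarity_index(
    "The patient cohort was selected from the hospital registry.",
    "The patient cohort was selected according to the hospital registry.",
)
suspected = score >= 15.0  # flagging threshold used in the study above
```

A production checker would compare against a whole corpus and weight matches by section, but the threshold logic is the same shape as the ≥ 15% criterion applied to the 143 suspected papers.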