Online First

Accepted, unedited articles published online and citable. The final edited and typeset version of record will appear in the future.
  • Research Article
    Ying Lou, Zhengyi Zhou, Menghui Li
    Journal of Data and Information Science. https://doi.org/10.1515/jdis-2026-0064
    Accepted: 2026-06-26
    Abstract
    Purpose

    Retraction count is a widely used metric for research integrity, but it overlooks the heterogeneous impact of retracted papers. Erroneous information from retracted papers tends to spread to subsequent research, compromising downstream reliability. The citation impact and lifespan of retracted papers represent a critical yet underexplored dimension of research integrity, warranting inclusion in evaluation frameworks.

    Design/methodology/approach

    This study proposes a framework for analyzing the citation impact and lifespan of retracted papers, applied to over 50,000 retracted articles indexed in the Web of Science (WoS).

    Findings

    Retracted papers accumulate substantial citations, yet their distribution is highly skewed, with 20 % of articles capturing 75 % of all citations. While the majority exhibit short citation lifespans, a small subset sustains influence for decades. Significant disparities exist across disciplines: papers in Clinical & Life Science tend to attract more citations and maintain them over longer periods, whereas papers in Computer Science generally receive fewer citations that decline rapidly. Network-level analysis further reveals that retractions are not randomly scattered but tend to cluster within citation networks, forming patterns resembling propagation along citation pathways. Articles citing retracted works are associated with substantially higher subsequent retraction rates, and this association appears to strengthen with the number of retracted papers cited.

    Research limitations

    The analysis was confined to WoS data, which may underestimate citation lifespan. Citation counts do not distinguish between positive, negative, and neutral citations. Because this is an observational study, we identify associations rather than establish causation.

    Practical implications

    These findings provide a nuanced understanding of how retracted papers influence subsequent research across disciplines. The framework supports targeted integrity governance.

    Originality/value

    This study proposes integrating the citation impact and lifespan of retracted papers into scientific integrity metrics, thereby expanding the dimensional framework for evaluating research integrity beyond simple retraction counts.
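
    The concentration figure reported in the Findings above (20 % of articles capturing 75 % of all citations) is a top-share statistic. The sketch below shows one way such a share could be computed. The function name `top_share` and the toy counts are illustrative assumptions, not the authors' code or data.

```python
def top_share(citations, fraction=0.2):
    """Share of all citations received by the most-cited `fraction` of papers.

    Hypothetical helper for illustration; not taken from the study.
    """
    if not citations:
        raise ValueError("citations must be non-empty")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    ranked = sorted(citations, reverse=True)
    # Number of papers in the top group (at least one paper).
    k = max(1, round(len(ranked) * fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Toy citation counts for ten hypothetical retracted papers (not real data).
toy_counts = [120, 30, 10, 8, 6, 5, 4, 3, 2, 2]
share = top_share(toy_counts, 0.2)
print(f"Top 20% of papers hold {share:.1%} of citations")
```

    A share far above the chosen fraction indicates a skewed distribution. A perfectly even distribution would return a share equal to the fraction itself.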

  • Research Article
    Borja González-Albo, Luz Moreno-Solano, María Bordons
    Journal of Data and Information Science. https://doi.org/10.1515/jdis-2026-0005
    Accepted: 2026-06-26
    Abstract
    Purpose

    The Open Researcher and Contributor Identifier (ORCID) is becoming the de facto standard for researcher identification in scholarly communication, providing a persistent unique identifier and a registry that functions as a digital CV. This study analyses the ORCID profiles of a selected group of leading researchers to examine the creation, completion, and updating of their records, with particular attention to the Works section.

    Design/methodology/approach

    We focus on the 357 grants awarded by the European Research Council (ERC) to researchers with a Spanish host institution between 2014 and 2020, for whom a high degree of ORCID adoption has been reported. Data included in their ORCID records were downloaded, and the completion and dynamics of the Personal Information and Activities sections were studied. Differences by domain and researcher career stage were explored.

    Findings

    All ERC researchers have an ORCID iD, and in most cases, their records are publicly available. ORCID profile completion is quite high in the Activities sections, particularly Works and Employment, and lower in the Personal Information sections. Most of the profiles were created by the users themselves and had been updated recently (75 % in the last three months). Sections that allow for automatic input tend to show higher completion rates and are more recently updated. Although ORCID accepts all kinds of research outputs, journal articles form the majority (84 %) in our study. Works are added to profiles mainly by commercial entities (especially Elsevier and Clarivate), with non-profit organisations (e.g., Crossref) a distant second. Only 10 % of works are included by researchers themselves.

    Research limitations

    As the study examines a particular group of elite researchers, their practices regarding profile completion and updating cannot be generalized to other researcher populations.

    Practical implications

    ERC-funded researchers show quite high engagement with the ORCID system, but there is uneven completion of record sections and data quality issues that need to be addressed to improve the system and consolidate identifier use.

    Originality/value

    ORCID record completion, work collection and updating behaviour have been insufficiently explored to date. The study of a population of leading researchers helps reveal ORCID record completion practices, as well as data quality issues and controversial approaches to work collection.

  • Research Article
    Daniela De Filippo, Borja González-Albo, Fernanda Morillo, María Bordons
    Journal of Data and Information Science. https://doi.org/10.1515/jdis-2025-0410
    Accepted: 2026-06-26
    Abstract
    Purpose

    This article assumes that embracing the values of open science is central to the ongoing reform of research assessment. Two main objectives are pursued: (1) to analyze whether responsible research assessment (RRA) and open science principles are being implemented in the Severo Ochoa (SO) call for proposals for centers of excellence in Spain; (2) to study the engagement of a selection of centers holding the excellence seal with open science practices.

    Methodology

    First, a longitudinal study of the content of the SO calls from 2011 to 2025 is conducted to identify changes reflecting adoption of the RRA and open science principles. Second, the open science practices of the selected centers are analyzed across two operational dimensions: i) open scientific knowledge and infrastructure and ii) openness to non-academic actors.

    Findings

    The call has evolved over the years, adapting its evaluation criteria to the principles of RRA: a drastic decrease in the role of publication-based indicators, a wider range of contributions considered, and the use of narrative CVs. Open science practices are increasingly promoted, particularly in centers’ strategic plans. Centers with the seal of excellence show higher levels of open publications and societal impact than the average in their fields. These centers show quite a strong commitment to the values of open science, although there is still room for improvement.

    Research limitations

    The analysis is limited by the lack of solid, reliable sources and indicators for some open science practices. Enhanced standardization and interoperability of information sources are needed to generate reliable indicators for monitoring these practices.

    Practical implications

    The RRA principles are being integrated into the SO call, but some open science practices still require greater promotion. This is relevant, because centers of excellence can set the benchmarks for the country’s entire scientific community.

    Originality/value

    Studying how RRA and open science principles are implemented by funding agencies is crucial, as funders have a clear influence on researchers’ behavior.

  • Research Article
    Hemn Barzan Abdalla, Yulia Kumar, J. Jenny Li, Dov Kruger, Kennedy Ehimwenma
    Journal of Data and Information Science. https://doi.org/10.1515/jdis-2026-0038
    Accepted: 2026-06-18
    Abstract
    Purpose

    This paper presents a unified framework for evaluating synthetic data across utility, fidelity, and privacy, with the goal of improving trust and reliability in scientific applications where realism alone is not sufficient.

    Design/methodology/approach

    The article defines synthetic data generation as a constraint-aware process guided by domain knowledge and introduces a structured pipeline that includes data auditing, controlled generation, and multi-criteria evaluation. The framework is validated through a Harmful Algal Bloom case study comparing statistical, deep, and quantum generative models under consistent training and evaluation settings.

    Findings

    Results show that no single model dominates across all criteria. Statistical models best preserve correlation structure and achieve high predictive performance, deep models increase variability but reduce fidelity, and quantum models improve privacy by increasing separation from real data at the cost of accuracy. Combining real and synthetic data improves segmentation results, with the best model achieving mIoU of 0.553 and Dice of 0.668. Overall, synthetic data quality depends on trade-offs rather than a single metric.

    Research limitations

    The evaluation focuses on one environmental dataset and a limited set of generative models. Results depend on data quality and chosen configurations, and fairness is not fully explored.

    Practical implications

    Model selection should match the application goal: high-risk scientific tasks require strong fidelity, while privacy-sensitive settings benefit from models that reduce reconstruction risk. The framework supports structured, repeatable deployment of synthetic data pipelines. In the analysis, synthetic data is reliable and performance is as good as with observation data.

    Originality/value

    This paper introduces a multi-objective evaluation framework that integrates utility, fidelity, and privacy into a single decision process, shifting synthetic data from a realism-focused task to a governed, context-dependent system.
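
    The segmentation results above are reported as mIoU and Dice. As a reminder of how these two overlap measures relate, here is a minimal sketch over flat label sequences. The function names, the toy masks, and the convention of scoring an absent class as 1.0 are assumptions made for illustration and are not described in the abstract.

```python
def iou_and_dice(pred, truth, label):
    """Return (IoU, Dice) for one class label over two equal-length sequences."""
    if len(pred) != len(truth):
        raise ValueError("pred and truth must have the same length")
    inter = sum(1 for p, t in zip(pred, truth) if p == label and t == label)
    n_pred = sum(1 for p in pred if p == label)
    n_truth = sum(1 for t in truth if t == label)
    union = n_pred + n_truth - inter
    # Assumed convention: a class absent from both masks counts as a perfect match.
    iou = inter / union if union else 1.0
    dice = 2 * inter / (n_pred + n_truth) if (n_pred + n_truth) else 1.0
    return iou, dice


def mean_iou(pred, truth, labels):
    """Unweighted mean of per-class IoU over the given labels."""
    return sum(iou_and_dice(pred, truth, c)[0] for c in labels) / len(labels)


# Toy flattened masks (1 = bloom pixel, 0 = background); not real data.
pred = [1, 1, 0, 0, 1, 0]
truth = [1, 0, 0, 0, 1, 1]
iou_1, dice_1 = iou_and_dice(pred, truth, 1)
print(iou_1, round(dice_1, 3), mean_iou(pred, truth, [0, 1]))
```

    For a single class, Dice is never lower than IoU, which is consistent with the reported Dice exceeding the reported mIoU.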

  • Research Article
    Mike Thelwall, Kayvan Kousha, Guoxiu He
    Journal of Data and Information Science. https://doi.org/10.1515/jdis-2025-0465
    Accepted: 2026-06-17
    Abstract
    Purpose

    Academic documents require expert time to evaluate, and Large Language Models (LLMs) might support this through score or decision predictions. For confidential structured academic texts, such as grants and Impact Case Studies (ICSs), medium-sized LLMs can be run offline without expensive computing infrastructures, enhancing security.

    Design/methodology/approach

    This study evaluates for the first time how well medium-sized LLMs can score structured academic documents using the UK Research Excellence Framework (REF) 2021 ICSs, and whether LLMs can guess scores from individual sections. We obtained score estimates from five recent popular LLMs (DeepSeek R1 32B, Qwen 3 32B, Magistral Small 24B, Gemma 3 27B, and Llama 4 Scout 27B) across 6,010 REF 2021 ICSs, correlating the scores with a proxy quality rating (departmental average score).

    Findings

    Scoring the full texts was only moderately effective (in terms of correlations with the proxy quality rating) and Llama 4 failed to score most of the longest ICSs. Surprisingly, all LLMs except Magistral were able to make guesses at ICS scores that were statistically significantly better than random from each of the individual component sections (summary, underpinning research, references, details of the impacts, and sources to support the impact). A logical two-stage approach mimicking the human reviewer instructions did not outperform focusing on impact alone. The best strategy was to score the summary and the details of the impact sections combined (five times, averaged) with Gemma 3. This gave the highest Spearman correlation (0.37) with departmental average proxy quality scores (0.55 for department-level correlations).

    Practical implications

    Medium-sized LLMs can be used to score structured academic documents to support research assessments.

    Research limitations

    This uses a single large case study with a public, albeit obscured, gold standard.

    Originality/value

    This improves on the state of the art despite the additional restrictions and with a much cheaper and potentially private open weights LLM approach.
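
    The evaluation described above averages five repeated LLM scores per document and then reports a Spearman correlation with a proxy quality rating. The sketch below illustrates that two-step calculation in plain Python. All names and numbers are hypothetical, and the tie handling (average ranks) is an assumption rather than a detail given in the abstract.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Five repeated scores for each of four hypothetical documents (not real data).
repeated_scores = [[3, 3, 4, 3, 3], [2, 2, 2, 3, 2], [4, 4, 4, 4, 3], [1, 2, 1, 1, 1]]
mean_scores = [sum(r) / len(r) for r in repeated_scores]
proxy = [3.1, 2.4, 3.0, 2.0]  # hypothetical departmental average ratings
rho = spearman(mean_scores, proxy)
print(round(rho, 2))
```

    Averaging repeated runs before ranking reduces the number of ties that integer scores would otherwise produce, which is one plausible reason repeated scoring helps a rank correlation.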

  • Research Article
    Hao Li, Jianhua Hou, Yang Zhang
    Journal of Data and Information Science. https://doi.org/10.1515/jdis-2025-0474
    Accepted: 2026-06-05
    Abstract
    Purpose

    This study uses scientometric methods to investigate the interdisciplinary trend of human-computer interaction (HCI), which has been widely discussed in academia but still lacks quantitative evidence.

    Design/methodology/approach

    In this study, combining scientometric measures of disciplinary diversity with network coherence, we examined the evolution of interdisciplinarity of HCI over the past 20 years from the perspective of knowledge integration.

    Findings

    Our findings indicate that the disciplines contributing knowledge to HCI have become increasingly diverse. While tending towards high disciplinary heterogeneity, this knowledge has also led to a more even distribution of disciplinary sources within HCI. In terms of network coherence, the structure of knowledge sources of HCI has tended towards decentralization. The phenomenon of the ‘rich-club’ has shown a trend of ‘obvious presence-fluctuation-absence-obvious presence’.

    Research limitations

    The study has only analyzed the evolution of the interdisciplinarity of HCI from the perspective of knowledge integration, and has not yet analyzed the subsequent knowledge diffusion of HCI research. Meanwhile, this study focuses on the development trend of human-computer interaction as an interdisciplinary field, and does not further analyze the “rich-club” nodes in the annual knowledge integration network, whose attributes and changes may be related to the evolution of HCI.

    Practical implications

    Practitioners and policymakers in HCI should foster collaborations across distant disciplines through targeted seminars, funding initiatives, and open research environments to reduce knowledge barriers and unlock innovative integration. Additionally, leveraging the cyclical “rich-club” dynamics in knowledge networks can aid in balancing core knowledge reliance with interdisciplinary inclusivity.

    Originality/value

    The revealed trends in the evolution of interdisciplinarity can provide insights for interdisciplinary research and the construction of a knowledge system in HCI.
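
    The "rich-club" phenomenon mentioned above is commonly quantified with the rich-club coefficient: the density of links among nodes whose degree exceeds a threshold k. The sketch below computes the unnormalised coefficient for a simple undirected graph. It is a generic illustration under that assumption. The abstract does not say whether the study's network is weighted or directed, or whether the coefficient was normalised against random graphs.

```python
def rich_club_coefficient(edges, k):
    """Unnormalised rich-club coefficient phi(k) of a simple undirected graph.

    phi(k) = 2 * E_k / (N_k * (N_k - 1)), where N_k is the number of nodes
    with degree > k and E_k is the number of edges among those nodes.
    Returns None when fewer than two nodes exceed the threshold.
    """
    degree = {}
    edge_set = set()
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        e = frozenset((u, v))
        if e in edge_set:
            continue  # ignore duplicate edges
        edge_set.add(e)
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    rich = {n for n, d in degree.items() if d > k}
    if len(rich) < 2:
        return None
    links = sum(1 for e in edge_set if e <= rich)
    return 2 * links / (len(rich) * (len(rich) - 1))


# Toy network: three densely linked hub disciplines, each with one peripheral field.
toy_edges = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "D"), ("B", "E"), ("C", "F")]
print(rich_club_coefficient(toy_edges, 0))
print(rich_club_coefficient(toy_edges, 1))
```

    In the toy network the coefficient rises as the threshold increases, which is the signature of a rich club. Tracking this curve year by year is one way a "presence, fluctuation, absence, presence" pattern could be observed.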

  • Research Article
    Dangyi Zou, Jin Mao, Zhentao Liang, Xi Chen
    Journal of Data and Information Science. https://doi.org/10.1515/jdis-2025-0357
    Accepted: 2026-06-05
    Abstract
    Purpose

    This study aims to systematically compare patent-to-patent and patent-to-paper citations, and to examine how their differences reflect distinct modes of knowledge flow between technological development and scientific research.

    Design/methodology/approach

    Using United States patent data from PATSTAT, combined with the Reliance on Science dataset and the Microsoft Academic Graph, we conduct a multi-dimensional analysis across four measures: citation frequency, citation time lag, semantic similarity, and science intensity. Patents are further classified by technological domain, innovation type, and citation source to capture heterogeneity in citation patterns.

    Findings

    Patent-to-patent citations are more frequent and exhibit higher semantic similarity, whereas patent-to-paper citations tend to occur with shorter time lags. Patents with higher impact are more likely to be associated with stronger linkages to scientific knowledge, particularly within a moderate range of influence. Emerging technological fields show a greater tendency to integrate recent scientific knowledge, while exploratory innovations draw on more technologically similar and interdisciplinary sources and exhibit shorter citation lags. Applicant citations are slightly more frequent, whereas examiner citations are more temporally proximate and semantically aligned with the citing patents.

    Research limitations

    The analysis is based on citation data, which capture observable linkage patterns but do not directly identify causal mechanisms. In addition, classification of technological domains and innovation types may not fully account for within-field heterogeneity.

    Practical implications

    The findings provide empirical insights for policymakers, research institutions, and corporate R&D practitioners seeking to better understand and manage the interaction between scientific research and technological innovation, and to design more effective innovation and patent strategies.

    Originality/value

    This study offers a comprehensive and systematic comparison of patent-to-patent and patent-to-paper citations across multiple dimensions and contexts, contributing to a more nuanced understanding of science–technology linkages and the structure of knowledge flows in innovation systems.
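
    One of the four measures above is citation time lag. The abstract does not define it, so the sketch below assumes the simplest reading: the citing patent's year minus the cited item's year, summarised by the median per citation type. The record layout, field names, and values are hypothetical.

```python
from statistics import median


def median_lag_by_type(citations):
    """Median citation lag in years, grouped by citation type.

    Each record is a dict with hypothetical keys
    'type', 'citing_year' and 'cited_year'.
    """
    lags = {}
    for c in citations:
        lags.setdefault(c["type"], []).append(c["citing_year"] - c["cited_year"])
    return {t: median(v) for t, v in lags.items()}


# Toy citation records (not real data).
toy_citations = [
    {"type": "patent-to-patent", "citing_year": 2015, "cited_year": 2005},
    {"type": "patent-to-patent", "citing_year": 2015, "cited_year": 2009},
    {"type": "patent-to-patent", "citing_year": 2018, "cited_year": 2010},
    {"type": "patent-to-paper", "citing_year": 2015, "cited_year": 2011},
    {"type": "patent-to-paper", "citing_year": 2018, "cited_year": 2013},
]
lag_summary = median_lag_by_type(toy_citations)
print(lag_summary)
```

    The median is used here rather than the mean because citation lags are typically right-skewed. That choice is a design assumption of this sketch.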

  • Research Article
    Chongjun Xi, Xiaoting Chen, Dongmei Ye
    Journal of Data and Information Science. https://doi.org/10.1515/jdis-2025-0455
    Accepted: 2026-06-01
    Abstract
    Purpose

    To address the limitations of traditional patent metrics in capturing technical substance and the high cost of expert review, this study proposes a hybrid evaluation framework integrating Large Language Models (LLMs) with machine learning to achieve automated, highly accurate identification of high-value patents.

    Design/methodology/approach

    Adopting a “Virtual Assessor” paradigm, we constructed a dataset based on the China Patent Gold Awards. The study integrated semantic scores from three diverse LLMs (DeepSeek, Qwen, GLM) under zero-shot and few-shot prompt strategies into a Stacking ensemble learning model (combining XGBoost, Random Forest, and SVM) to predict patent value across nine comparative experimental setups.

    Findings

    Direct LLM evaluation revealed a “Knowledge Injection Paradox,” where explicit expert prior knowledge caused negative transfer and reduced accuracy due to over-conditioning. However, the Stacking model successfully rectified these biases, transforming subjective LLM evaluations into robust predictive features. The hybrid model achieved over 97 % accuracy in identifying high-value patents, demonstrating strong robustness even in high-noise environments.

    Research limitations

    The study relies on a binary classification of extreme samples (Gold Award vs. non-awarded), potentially oversimplifying the continuous distribution of patent value. Furthermore, the interpretability of the “black box” feature fusion mechanism requires further exploration.

    Practical implications

    The proposed framework offers IP managers and policymakers a scalable, cost-effective tool for automated patent screening, effectively bridging the gap between qualitative expert intuition and quantitative data precision.

    Originality/value

    This research introduces a “Semantic Enhancement + Algorithmic Rectification” paradigm. It empirically demonstrates how machine learning can correct LLM hallucinations and biases, marking a significant shift from data-driven perception to AI-driven cognitive decision-making in patent valuation.

  • Research Article
    Emanuel Kulczycki, José Octavio Alonso-Gamboa, Fernanda Beigel, Luciano Digiampietri, Mikael Laakso, Janne Pölönen, Zehra Taşkın, Gabriel Vélez Cuartas
    Journal of Data and Information Science. https://doi.org/10.1515/jdis-2025-0440
    Accepted: 2026-06-01
    Abstract
    Purpose

    This study investigates the diversity of national scholarly journal publishing ecosystems in seven countries across Europe and Latin America: Argentina, Brazil, Colombia, Finland, Mexico, Poland, and Türkiye. It challenges the common perception that global scholarly publishing is dominated by international commercial publishers by examining national publishing structures beyond English-speaking contexts.

    Design/methodology/approach

    Using ISSN Centre data and national sources, we analyse journal-level publishing structures rather than article- or citation-level outputs. Publishers were categorized according to their institutional and organizational characteristics. The analysis focuses on active journals, defined as those with a recorded start year and no identified termination date. Journal coverage in Web of Science, Scopus, and OpenAlex was examined to assess how national publishing landscapes are represented in major bibliometric databases.

    Findings

    Educational institutions emerge as the primary publishers in most countries, representing more than 75 % of journals in Colombia and Brazil and more than 50 % in Mexico, Argentina, and Poland. Finland stands out, with scientific and professional associations leading journal publication at 62 %. Commercial publishers hold comparatively small shares, reaching their highest levels in Türkiye at 12.1 % and Poland at 8.2 %. In terms of database representation, OpenAlex indexes over half of the journals in most countries, whereas Web of Science (WoS) and Scopus cover only a small portion.

    Research limitations

    The study relies on ISSN and national datasets, which differ in completeness and standardization. Variations in national reporting practices and database indexing policies may influence coverage comparisons. As the analysis is based on currently active journals, historical trends reflect surviving journals only.

    Practical implications

    The results provide evidence for policymakers, database providers, and research evaluators to recognize the diversity of national publishing systems. They highlight the importance of improving data sources and analytical approaches to ensure more accurate assessment of scholarly communication outside heavily commercialized environments.

    Originality/value

    The study offers a comparative analysis of national journal publishing ecosystems across seven countries, revealing structural differences that challenge the assumption of a globally uniform publishing model. It underscores the need for bibliometric research frameworks that include and accurately represent national and regional publishing structures.

  • Research Article
    Vitor Miranda de Souza, Vinicius Muraro
    Journal of Data and Information Science. https://doi.org/10.1515/jdis-2025-0224
    Accepted: 2026-05-19
    Abstract
    Purpose

    This study aims to assess the diffusion and integration of Circular Economy (CE) knowledge across academic disciplines, by examining how CE concepts spread within mainstream academic and disciplinary discourses.

    Design/Methodology/Approach

    CE is conceptualized as a knowledge object, distinguishing between mainstream academic discourse (MAD) and mainstream disciplinary discourse (MDD). The research combines a dataset of CE-related articles from OpenAlex with Scimago’s indexed journals list, employing bibliometric mapping and topic analysis to chart the spread of CE knowledge. Integration is evaluated using a qualitative analysis of selected disciplinary articles, applying a four-stage typology: rhetorical uptake, instrumental uptake, selective translation, and conceptual integration.

    Findings

    Results indicate that the MAD of CE is largely interdisciplinary and concentrated, with 20 journals accounting for nearly half of the 8,582 CE articles. CE first appeared in MDD in 2008 within Medicine, reaching all academic fields by 2024. Certain MDDs predominantly publish CE-related papers in interdisciplinary journals. Nineteen topic clusters were identified and mapped across 23 academic areas. The exploratory analysis reveals that disciplines primarily engage with CE in rhetorical or instrumental terms, utilizing it as a managerial tool or symbolic reference. There are limited instances of selective translation and no evidence of full conceptual integration.

    Research Limitations

    The study is based on bibliographic metadata, with conclusions restricted to articles within MAD and MDD and to the exploratory disciplinary sample analyzed qualitatively.

    Practical Implications

    The findings provide new perspectives on the challenges disruptive concepts like CE face in penetrating and transforming disciplinary knowledge, informing both CE scholarship and broader science mapping methodologies.

    Originality/Value

    This research introduces a novel framework for evaluating how interdisciplinary concepts are recontextualized within disciplinary boundaries, offering valuable insights into the integration of CE and advancing approaches in science mapping.

  • Research Article
    Yuxian Liu, Hongrui Yang, Ronald Rousseau, Raf Guns, Sisi Li, Yafang Fan, Helan Wu, Sanfa Cai
    Journal of Data and Information Science. https://doi.org/10.1515/jdis-2025-0478
    Accepted: 2026-05-12
    Abstract
    Purpose

    This study seeks to understand how the map of science has evolved and to identify the forces driving that evolution. By examining changes in these maps over time, we track the development of science through the growing interdisciplinarity of subject categories and the way they cluster together.

    Design/methodology/approach

    We integrate multiple classification schemes from Web of Science products to build a multilevel framework that connects journals, categories, groups, and broad domains. Using Journal Citation Reports (JCR) data from 2011, 2016, and 2024, we construct two types of maps of science: one based on citation relationships and another based on the sharing of journals across categories. We then examine how these maps evolve, identify factors influencing their development, and analyze how knowledge percolates through a multilevel structure.

    Findings

    The map of science has evolved from a bipolar structure into a more interconnected, rounded triangular configuration. In 2016, Arts & Humanities and the Social Sciences comprised a single cluster; by 2024, they had separated, while Biological Sciences and Medical Sciences, once distinct, had merged into a unified cluster. At the same time, categories such as education, special education, and applied psychology shifted toward hearing and speech pathology, forming a new special education cluster at the intersection of the social sciences, biomedicine, and technology. The expansion of Arts & Humanities categories, along with the addition of new categories in the Journal Citation Reports (JCR), and the resulting growth in interdisciplinarity across categories and journals, has played a key role in reshaping the overall map of science. Although categories within a cluster may disperse across multiple groups and broader domains through knowledge percolation in a multilevel system, they nonetheless remain relatively concentrated.

    Research limitations

    As with any empirical investigation, this study has some limitations. Most notably, our analysis is based on only three points in time and relies on a single data source, namely the Web of Science (WoS). Yet, we described our results in far more detail than is usually done.

    Practical implications

    Maps of science serve as tools for navigating the research landscape, helping to inform strategic investments and shape future research directions. Examining how these maps evolve over time and how such changes influence research trajectories is a central concern of the science of science.

    Originality/value

    Mapping clusters to their respective groups and broad categories reveals a hierarchical classification system in which clusters extend beyond disciplinary boundaries. Overlaps among categories, groups and broad categories indicate that scientific knowledge percolates through traditional field divisions.

  • Research Article
    Myroslava Hladchenko
    Journal of Data and Information Science. https://doi.org/10.1515/jdis-2025-0374
    Accepted: 2026-05-06
    Abstract
    Purpose

    This study explores how EU integration, globalisation, and geopolitical disruptions have influenced scientific collaboration among European countries at different stages of EU membership. Specifically, it distinguishes between the EU-14 (long-standing Western and Southern European member states prior to the 2004 enlargement), the EU-13 (the Central and Eastern European countries that joined the EU in 2004 or later), and EU candidate countries, reflecting differing historical trajectories, institutional capacities, and levels of integration into European and global research networks.

    Design/methodology

    Using articles from the Scopus database, the study analyses Relative Intensity of Collaboration (RIC) among three distinct groups of countries: EU-14, EU-13, and EU candidate countries, as well as with China, Latin America, the UK, the USA and Russia.

    Findings

    Findings indicate increasing integration within European groups and with global partners, yet persistent hierarchical structures remain. EU-14 countries form the core of the network, exhibiting stable and cohesive collaboration, including with the UK despite Brexit. EU-13 countries occupy an intermediate position, showing moderate collaboration with EU-14 but stronger collaboration within their own group, with EU candidate countries and Russia. EU candidate countries demonstrate even weaker integration with EU-14, focusing on intra-group ties and links with EU-13 and Russia. RIC peaks in 2012 and 2018 for EU-13 and EU candidate countries correspond to Horizon 2020 and Horizon Europe cycles, highlighting the role of EU Framework Programmes. Collaboration with Russia increased following 2014 and only marginally declined after 2022. For EU-14, it exceeds collaboration with the USA. Collaboration with China remains limited due to network and cultural constraints, with similar intensity across all three groups. Overall, funding and policy initiatives are critical for stable international collaboration.

    Research limitations

    The analysis is limited by the Scopus database coverage.

    Policy implications

    Findings suggest that to strengthen the EU’s scientific position, policymakers should prioritise targeted funding and strategic initiatives that bridge collaboration gaps between EU-13, EU-14, EU candidate countries, and global partners.

    Originality/value

    This study provides a comprehensive, longitudinal analysis of European scientific collaboration, highlighting hierarchical structures, the differential roles of EU-14, EU-13, and candidate countries, and the resilience of networks with global partners such as the UK and Russia, while linking collaboration dynamics to EU Framework Programmes.

  • Research Article
    Kang Wang, Xin Zhang, Qiwei Liu, Meng Han, Yuqi Wang
    Journal of Data and Information Science. https://doi.org/10.1515/jdis-2025-0448
    Accepted: 2026-05-06
    Abstract
    Purpose

    Identifying technological opportunities in the field of sixth-generation mobile communications (6G) is crucial for research institutions and enterprises seeking to anticipate technological trajectories, and for policymakers formulating forward-looking innovation strategies.

    Design/methodology/approach

    Grounded in the notion that scientific knowledge drives technological R&D, this study integrates scientific publications and patent data to develop a framework for identifying technological opportunities in the 6G domain. We first employ a pretrained SBERT model to generate vector representations of mixed paper–patent data, followed by dimensionality reduction and visualization to characterize the structural features of science and technology. We then apply the HDBSCAN algorithm to identify thematic clusters across both corpora. Based on the clustering results and the semantic relationships between scientific themes and patented technologies, we construct two indicators – scientific knowledge reserve rate and technological invention competitiveness – to capture scientific and technological dynamics, respectively. A portfolio map is used to systematically identify potential technological opportunities within 6G.

    Findings

    The results demonstrate the feasibility and effectiveness of the proposed framework: eight potential technological opportunities are identified and validated to possess strong technological promise, confirming the robustness of our approach.

    Research limitations

    The analysis relies on scientific publications from WoS and patent data from the Derwent Innovations Index (DII), which may not fully capture the entire landscape of emerging 6G knowledge.

    Practical implications

    The framework provides actionable insights for 6G technology planning, enabling R&D institutions, enterprises, and policymakers to anticipate emerging trajectories and allocate innovation resources more effectively.

    Originality/value

    This study advances technology opportunity identification by bridging scientific research and technological development through semantic analysis. It offers a novel integrative framework that uncovers deep science–technology linkages and supports the cultivation of a global 6G innovation ecosystem and next-generation intelligent communication infrastructures.