Research Papers

A new evolutional model for institutional field knowledge flow network

  • Jinzhong Guo,
  • Kai Wang ,
  • Xueqin Liao ,
  • Xiaoling Liu
  • School of Information Management, Xinjiang University of Finance and Economics, Urumqi, China
†Jinzhong Guo (Email: ).

Received date: 2023-12-12

  Revised date: 2024-01-14

  Accepted date: 2024-01-23

  Online published: 2024-01-30

Abstract

Purpose: This paper aims to address the limitations in existing research on the evolution of knowledge flow networks by proposing a meso-level institutional field knowledge flow network evolution model (IKM). The purpose is to simulate the construction process of a knowledge flow network using knowledge organizations as units and to investigate its effectiveness in replicating institutional field knowledge flow networks.

Design/Methodology/Approach: The IKM model enhances the preferential attachment and growth observed in scale-free BA networks, while incorporating three adjustment parameters to simulate the selection of connection targets and the types of nodes involved in the network evolution process. The PageRank algorithm is used to calculate the significance of nodes within the knowledge flow network. To compare its performance, the BA and DMS models are also employed for simulating the network. Pearson coefficient analysis is conducted on the simulated networks generated by the IKM, BA and DMS models, as well as on the actual network.

Findings: The research findings demonstrate that the IKM model outperforms the BA and DMS models in replicating the institutional field knowledge flow network. It provides comprehensive insights into the evolution mechanism of knowledge flow networks in the scientific research realm. The model also exhibits potential applicability to other knowledge networks that involve knowledge organizations as node units.

Research Limitations: This study has some limitations. Firstly, it primarily focuses on the evolution of knowledge flow networks within the field of physics, neglecting other fields. Additionally, the analysis is based on a specific set of data, which may limit the generalizability of the findings. Future research could address these limitations by exploring knowledge flow networks in diverse fields and utilizing broader datasets.

Practical Implications: The proposed IKM model offers practical implications for the construction and analysis of knowledge flow networks within institutions. It provides a valuable tool for understanding and managing knowledge exchange between knowledge organizations. The model can aid in optimizing knowledge flow and enhancing collaboration within organizations.

Originality/value: This research highlights the significance of meso-level studies in understanding knowledge organization and its impact on knowledge flow networks. The IKM model demonstrates its effectiveness in replicating institutional field knowledge flow networks and offers practical implications for knowledge management in institutions. Moreover, the model has the potential to be applied to other knowledge networks, which are formed by knowledge organizations as node units.

Cite this article

Jinzhong Guo, Kai Wang, Xueqin Liao, Xiaoling Liu. A new evolutional model for institutional field knowledge flow network[J]. Journal of Data and Information Science, 2024, 9(1): 101-123. DOI: 10.2478/jdis-2024-0009

1 Introduction

The knowledge network was first formally proposed by Beckmann (1995), who described it as a network structure that creates relationships among numerous organizations to facilitate knowledge exchange among them, thereby resulting in knowledge innovation, integration and sharing. Garfield (1955) found that various standards of reference exist for identifying knowledge, leading to the construction of different levels of knowledge networks through the establishment of diverse knowledge units. Numerous scholars have studied whether and how knowledge units should be divided. Börner et al. (2004) proposed a process model named TARL (Topics, Aging, and Recursive Linking), which puts forward the idea of network layering and hierarchy. The model considers connections within homogeneous and heterogeneous networks, combined with a topic hierarchy to represent knowledge growth and diffusion. Liu et al. (2002) expanded the network hierarchy concept to the “article-author-institution-country” structure and utilized path analysis to detect significant nodes at each entity level. Zhang and Li (2016) contend that introducing citation relationships within or between these different entity levels could yield fresh insights.
The classification of knowledge units consists of three main levels (Chen et al., 2013): the micro-level, which examines knowledge exchange and transfer between individuals, using the paper or scholar (Li et al., 2019; Monechi et al., 2019) as the knowledge unit; the meso-level, which focuses on knowledge exchange and transfer between organizations, with fields (Battiston et al., 2019; Shen et al., 2016; Van Noorden, 2015), institutions (Clauset et al., 2015), cities and so on as the knowledge units; and the macro-level, which analyses knowledge exchange and transfer between communities, using the country (Daraio et al., 2018; Li, 2017) as the knowledge unit. Studying the internal relationships and attributes of knowledge networks at varying levels offers significant value. By analyzing the correlation and connectivity between nodes, one can uncover hidden patterns, trends, and new associations, which helps to facilitate knowledge diffusion and innovation. At the same time, revealing the structure of knowledge diffusion (Yan, 2016), influence (Shen et al., 2016; Wu et al., 2018), and knowledge networks provides important clues for policy and decision makers, helping them to better understand the pathways of knowledge diffusion and social influence, and thus to formulate more effective policies and strategies.
In recent years, the rapid development of cutting-edge science and the widespread application of information technology have led to exponential growth in knowledge and information (Lyu et al., 2022). Studying only the current internal structure of a knowledge network can therefore no longer meet practical needs, and a thorough investigation into the dynamic evolution of knowledge networks is crucial. Price (1965, 1976) constructed the earliest network model with scientific literature as the knowledge unit, and found that the number of nodes and edges continues to grow and shows a cumulative advantage. Scholars have since extended their analysis beyond the current internal structure of networks and begun to use the evolution of knowledge networks to trace and assess the development of knowledge and innovation trends. Barabási and Albert (1999) studied multiple large-scale networks, including the World Wide Web and citation networks, and found that vertex degrees follow a scale-free power-law distribution. The BA model they proposed serves as a fundamental basis for examining the evolution of knowledge networks. It explains, to some extent, the mechanism of uneven connection formation and is characterized by growth and preferential attachment. Consequently, this model is commonly used in the investigation of the evolutionary processes of knowledge or cooperative networks.
However, the limitations of the BA model remain noteworthy, and a considerable gap persists between its simulated knowledge network and the actual network. Specifically, in actual institutional field knowledge flow networks neither the number of new joining nodes nor the number of newly added edges per new joining node is fixed, whereas the model treats both as constants. In the actual network, there appears to be a discernible functional relationship between the number of newly added edges per new joining node and the number of new joining nodes. Therefore, numerous scholars have subsequently adapted the BA model to replicate the progression of knowledge networks at varying levels.
At the micro-level, Bianconi and Barabási (2001) considered the intrinsic properties of nodes in the citation network and suggested a fitness model in which the probability of preferential attachment is proportional to the product of the node’s degree and fitness. Meanwhile, Liu et al. (2002) observed that links between nodes can also be random at the micro level and proposed a hybrid model of random and preferential links. Krapivsky et al. (2000) presented a nonlinear preferential attachment model that takes into account the pre-existing edges of the sink node. Fortunato et al. (2018) referred to a DMS model that acknowledges pre-existing nodes as a potential source of new links.
At the macro-level, Li and Chen (2003) analyse the evolution of knowledge in the local world. Chen et al. (2013) also examine the evolution of knowledge in the local world, based on a study of 2,541 articles published in the international journal Scientometrics. The study investigates how collaboration at the author level impacts collaboration at the institutional and national levels. It analyses the evolution of three networks, leading to a corresponding discussion.
However, there are relatively few studies at the meso-level compared to the micro and macro levels, particularly with regard to corresponding evolutionary models. This gap makes it challenging to obtain a complete understanding of a field’s development status. It also prevents us from revealing the flow and transmission of knowledge among organizations, such as fields, institutions, and cities, and from providing effective recommendations for the development of knowledge organizations at the intermediate level. Knowledge exchange and transfer rely on paper dissemination and author collaboration (Guan & Zhu, 2014). When a paper is cited, it indicates that the knowledge within it is being used in a new paper. Co-authorship of papers by different authors facilitates the exchange and collision of ideas and knowledge. Following this reasoning, micro-level research takes papers and authors as knowledge units to explore the path of knowledge exchange and transmission (Sun & Latora, 2020). At the macro level, countries are viewed as knowledge units, with a focus on knowledge transfer and potential impact between them. While micro and macro-level studies examine knowledge networks from different perspectives, neither can fully address the gap at the meso-level, which is broader than the micro level but not as extensive as the macro level.
Furthermore, research conducted at the meso-level can offer strategic guidance to the government, enabling it to play a more significant role in promoting the development of the disciplinary field and enhancing the country’s innovation capacity. By analyzing knowledge networks, the government can gain a better understanding of research hotspots and frontier trends in different fields. This allows for a more accurate grasp of the actual situation and needs of academic and research institutions, and enables targeted allocation of scientific research resources and support in key areas. The government can promote the transfer of knowledge and the generation of innovation by facilitating academic exchanges and cooperation, and building platforms and channels to strengthen collaboration among different knowledge organizations. In the current era of the ‘knowledge economy’, innovation is considered the core driving force for promoting economic transformation and upgrading (Cooke & Leydesdorff, 2006). Innovation is closely linked to the exchange and collision of knowledge. Simultaneously, the network society has facilitated the exchange and flow of knowledge beyond geographical limitations, leading to a constant reconstruction of the spatial scale of knowledge networks (Castells, 1996). As a result, the study of the meso-level has become increasingly significant. Institutions and fields play a crucial role in knowledge dissemination. Cooperation among academic institutions, research institutions, and enterprises can effectively promote knowledge exchange and transmission, leading to the development and improvement of the field. Therefore, studying the knowledge network at the meso-level is essential.
In this paper, we examine the citation connections between 379,310 papers published in APS journals during 1993-2013, together with the institutions and fields to which the papers belong, to form a meso-level directed knowledge flow network with institutional fields as knowledge units. An enhanced evolutionary model, the institutional field knowledge flow network evolution model (IKM), is proposed for modelling the development of the knowledge flow network in terms of both network expansion and connection preferences. The IKM model’s evolution results are compared with those of traditional evolution models to assess the rationality and superiority of the former. Additionally, a classical algorithm for key node identification in flow networks is used to compare the simulated network with the actual network in order to evaluate the precision of the IKM model. The aim is to advance the study of knowledge flow network evolution with knowledge organizations as the unit of analysis and to offer reference and inspiration for related academic fields.

2 Data and methods

2.1 Construction of institutional field knowledge flow network

2.1.1 Data source and collection

The data in this paper are derived from papers published in APS journals during 1993-2013. The publication time, PACS codes, author address information and citation relationships between papers were processed, and the names of research institutions were extracted from the original address information. After manual correction of non-standard English institution names and spelling errors, addresses in which the institution still could not be identified were labelled “other”. On this basis, the number of papers published by each institution was counted.
Data processing identified 10,623 research institutions, most of which published few papers. Of these, 4,138 institutions published only a single paper during the period and 7,808 published no more than ten, while only a small fraction published more than one thousand. Figure 1A shows how quickly output falls with rank: among the top 100 research institutions the number of publications has already dropped to 603 articles. Accordingly, the cumulative probability distribution in Figure 1A grows steeply at low publication counts and gradually plateaus once the number of publications exceeds 13. This indicates a distinct disparity in publication output among research institutions, with most institutions contributing very few articles.
Figure 1. Statistics on the number of publications by research institutions.
Figure 1B reveals a long-tailed pattern, indicating substantial variation in publication output among the research institutions. The observed pattern is consistent with a power-law distribution, which aligns with the Pareto principle (Harvey & Sotardi, 2018). This principle states that a minority of entities contribute the majority of the output. Specifically, it suggests that only approximately 20% of the research institutions can be classified as significant contributors with a large publication record, while the remaining 80% have a considerably smaller output. Consequently, to ensure the comprehensiveness of the data and focus on the crucial institutions with a high volume of publications, this paper treats the extracted organizational addresses in the following manner:
Institutions in the top 1,000 by number of papers together account for 79.91% of all papers; these are treated as large-scale institutions and labelled in the form “city - institution”. Institutions ranked below 1,000 each published few papers, and several of them combined only match the output of a single large-scale institution; these are treated as small-scale institutions, consolidated according to the provinces and cities to which they belong, and labelled in the form “city - other”. Records for which the author’s institution is not available are likewise consolidated according to provinces and cities and labelled in the form “Others-city”. This yields a total of 3,185 addresses listed in three different formats.
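The labelling rule can be illustrated with a minimal sketch. The function name `build_labels`, the `(city, institution)` record layout and the use of `None` for an unidentified institution are assumptions made for illustration; the paper does not describe its implementation.

```python
from collections import Counter

def build_labels(records, top_n=1000):
    """Label each address record according to the three formats in the text.

    records: list of (city, institution) tuples, one per paper address;
             institution is None when it could not be identified
             (hypothetical layout, not taken from the paper).
    top_n:   rank threshold separating large-scale from small-scale
             institutions (1,000 in the paper).
    """
    # Count papers per identified institution.
    counts = Counter(inst for _, inst in records if inst is not None)
    top = {inst for inst, _ in counts.most_common(top_n)}

    labels = []
    for city, inst in records:
        if inst is None:
            labels.append(f"Others-{city}")      # institution not available
        elif inst in top:
            labels.append(f"{city} - {inst}")    # large-scale institution
        else:
            labels.append(f"{city} - other")     # small-scale, merged by city
    return labels
```

With `top_n=1`, an institution that appears twice keeps its own label, while a single-paper institution in the same city is merged into that city's “other” label.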
The Physics and Astronomy Classification Scheme (PACS) categorizes the fields of modern physics. Its highest level divides modern physics into 10 major subfields: 00, 10, 20, 30, 40, 50, 60, 70, 80 and 90. For example, PACS code “47.56.+r” belongs to field 40 (Electromagnetism, Optics, Classical Mechanics), so the field corresponding to each paper can be obtained from its PACS codes. After identifying the institutions and fields of each paper, the “institution-field” pairs to which the paper belongs are obtained by combining every institution with every field. The flow of knowledge between institutional fields occurs through the citation relationship between papers: knowledge flows from the cited institutional field to the citing institutional field. For instance, in Figure 2, paper C cites papers A and B, which both belong to multiple institutions and fields, so knowledge flows from the institutional fields of A and B to those of C. The flow is the same for every pair of cited and citing institutional fields of a citation, with each pair receiving a flow of $X_{\text{Citing}}^{\text{Cited}}$.
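As an illustration of the mapping just described, the sketch below derives the top-level field from a PACS code and enumerates the institution-field pairs of one paper. The helper names `pacs_field` and `institution_fields` are hypothetical, and the rule "first digit followed by 0" is inferred from the example that “47.56.+r” belongs to field 40.

```python
from itertools import product

def pacs_field(code: str) -> str:
    """Map a PACS code such as '47.56.+r' to its top-level field, e.g. '40'."""
    head = code.strip()[:2]
    if len(head) < 2 or not head.isdigit():
        raise ValueError(f"unexpected PACS code: {code!r}")
    return head[0] + "0"

def institution_fields(institutions, pacs_codes):
    """Return every (institution, field) pair a paper belongs to."""
    fields = sorted({pacs_field(c) for c in pacs_codes})
    insts = sorted(set(institutions))
    return list(product(insts, fields))
```

A paper with two institutions and PACS codes falling in two distinct top-level fields therefore belongs to four institutional fields.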
Figure 2. Knowledge flow network construction.
$X_{\text{Citing}}^{\text{Cited}}=\frac{1}{\text{Citing}_{p} \times \text{Cited}_{p} \times \text{Citing}_{n} \times \text{Citing}_{I} \times \text{Cited}_{I}}$
$\text{Citing}_{p}$ represents the number of fields, as divided by PACS codes, of the citing paper, $\text{Citing}_{n}$ represents the number of references in the citing paper, and $\text{Citing}_{I}$ represents the number of institutions of the citing paper. $\text{Cited}_{p}$ represents the number of fields, as divided by PACS codes, of the cited paper, and $\text{Cited}_{I}$ represents the number of institutions of the cited paper.
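A small worked sketch may help to read the formula. The function names and the dictionary layout of a paper are assumptions for illustration only. One consequence that follows directly from the formula: a single citation spans $\text{Cited}_{p} \times \text{Cited}_{I} \times \text{Citing}_{p} \times \text{Citing}_{I}$ pairs of institutional fields, so its total contribution is $1/\text{Citing}_{n}$, and all references of one citing paper together contribute a flow of 1.

```python
from collections import defaultdict
from itertools import product

def flow_weight(citing_p, cited_p, citing_n, citing_i, cited_i):
    """Flow assigned to one (cited institutional field, citing institutional field) pair."""
    counts = (citing_p, cited_p, citing_n, citing_i, cited_i)
    if any(c < 1 for c in counts):
        raise ValueError("all counts must be at least 1")
    return 1.0 / (citing_p * cited_p * citing_n * citing_i * cited_i)

def add_citation(edges, cited, citing, citing_n):
    """Accumulate the flow of one citation into a weighted edge dictionary.

    cited / citing: dicts with 'institutions' and 'fields' lists
                    (hypothetical layout, not taken from the paper).
    citing_n:       number of references in the citing paper.
    """
    w = flow_weight(len(citing["fields"]), len(cited["fields"]), citing_n,
                    len(citing["institutions"]), len(cited["institutions"]))
    # Edges point from the cited institutional field to the citing one.
    for src in product(cited["institutions"], cited["fields"]):
        for dst in product(citing["institutions"], citing["fields"]):
            edges[(src, dst)] += w
    return w

edges = defaultdict(float)
paper_a = {"institutions": ["Inst 1", "Inst 2"], "fields": ["70"]}
paper_c = {"institutions": ["Inst 3"], "fields": ["40", "70"]}
weight = add_citation(edges, paper_a, paper_c, citing_n=2)
```

Here the citing paper has two fields, one institution and two references, and the cited paper has one field and two institutions, so each of the four edges receives 1/(2 × 1 × 2 × 1 × 2) = 0.125.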
In this way, the institutional field knowledge flow network is constructed, consisting of 8,900 nodes that represent 8,900 institutional fields. Because the number of institutional fields is too large to display in full, Figure 3 shows, for reference, the knowledge flow among the top 100 institutional fields with the highest number of publications in 2013. Each node in the network corresponds to a specific institutional field, and edges are directed from the cited institutional field to the citing institutional field; in other words, knowledge flows from the cited side to the citing side. Nodes can act as source nodes, from which knowledge is disseminated outward, or as sink nodes, where knowledge is received. Edges indicate the flow of knowledge between the source and sink nodes, which represent the two institutional fields, and the edge weights indicate the amount of knowledge flow. The size of a node in the graph represents the total publications of the institutional field. Different colors represent different fields. The top 100 institutional fields with the highest number of publications are mainly concentrated in field 70, with some in fields 00, 10, 20, 60, and 90. The remaining fields have relatively few publications.
Figure 3. Part of the Institutional field knowledge Flow network in 2013.

2.2 Institutional field knowledge flow network evolution model (IKM)

2.2.1 Evolution features of institutional field knowledge flow network

When conducting a comparative and analytical examination of the evolution of institutional field and the knowledge transfer between them within the institutional field knowledge flow network over time, it becomes evident that the network’s evolutionary characteristics primarily manifest in the augmentation of both the quantity and selection of edges (knowledge transfer). In particular, Figure 4(A) illustrates the growth trends of nodes and edges, revealing fluctuations in the number of new joining nodes and inconsistent variations in the number of newly added edges per node. This dynamic nature implies that the size and complexity of the institutional field knowledge flow network undergo continuous transformations throughout its development. Furthermore, the expansion of the institutional field knowledge flow network can be attributed to two distinct methods. Firstly, newly added edges are created by leveraging pre-existing nodes as sources, facilitating the integration and dissemination of knowledge within the network. Secondly, new joining nodes, which originate from outside the existing network, act as source nodes, generating newly added edges and fostering network growth. These mechanisms contribute to the evolution and enrichment of the institutional field knowledge flow network. Figure 4(B) provides further insights into the establishment of newly added edges within the network, highlighting four discernible pathways through which this occurs:
Figure 4. Evolution features of nodes and edges in institutional field knowledge flow network.
(1) A new edge is created between a pre-existing source node and another pre-existing node, that is, between two pre-existing nodes that previously had no connection. This connection indicates that new knowledge is still being exchanged between pre-existing institutional fields, with continuous cross-collaboration and information sharing across institutional fields.
(2) New joining nodes act as source nodes to generate new edges with pre-existing nodes. This situation reflects the inclusiveness of pre-existing institutional fields towards the knowledge of emerging institutional fields, and their willingness to accept and assimilate this knowledge. It showcases the constant expansion and growth of knowledge, as well as the acknowledgement of diversity and cross-disciplinarity in academia.
(3) New joining nodes act as source nodes to generate new edges with other new joining nodes, suggesting that there is also knowledge exchange between emerging institutional fields, facilitating cross-fertilization of disciplines and knowledge exchange through exposure to new ideas and methodologies.
(4) The pre-existing nodes act as source nodes to generate new edges with new joining nodes, reflecting the need and urgency for the new institutional field to absorb knowledge from the old institutional field in order to promote its own development.
Moreover, it is imperative to highlight that, when adopting a meso-level perspective, a substantial portion of the network’s expansion occurs within the actual pre-existing nodes. This aspect often eludes the attention of many academic scholars, but it plays a pivotal role in elucidating the intricate dynamics and configuration of knowledge flow networks. By focusing on the interactions and connections between established nodes, we gain valuable insights into the continuous evolution and refinement of the network. These inter-nodal exchanges, although less conspicuous, contribute significantly to the overall growth and transformation of the knowledge flow network. Neglecting this aspect would lead to an incomplete understanding of the network’s development trajectory and its underlying mechanisms.

2.2.2 IKM model construction

The evolutionary mechanism of the BA model and its extended version is based on the principle that new joining nodes tend to preferentially connect to pre-existing nodes with higher degrees. This results in earlier joined nodes having higher degrees and consequently making fewer connections to newer nodes. However, this mechanism diverges from the evolutionary characteristics exhibited by the institutional field knowledge flow network. It fails to account for factors such as the variable number of new joining nodes, the random nature of edge formation, and the utilization of pre-existing nodes as sources for generating newly added edges within the evolutionary process. In order to more accurately capture the evolution traits of the institutional field knowledge flow network, this study introduces an Institutional Field Knowledge Flow Network Evolution Model (IKM). This model enhances the preferential attachment and growth observed in scale-free BA networks, while incorporating three adjustment parameters to simulate the selection of connection targets and the types of nodes involved in the network evolution process. The IKM integrates various connection targets and their respective preferred connection probabilities into the connection strategy, providing a more nuanced approach to modeling the evolution of the institutional field knowledge flow network. By accounting for the diverse factors at play in the evolution process, this model aims to offer a more comprehensive and accurate representation of the network’s development and growth dynamics. The algorithmic flow is illustrated in Figure 5.
Figure 5. IKM model algorithm flow chart.
1. Network Growth Mode: The number of new joining nodes in the network at each time step is determined by counting the source nodes of the newly added edges. The number of new joining nodes, N, in each year can thus be calculated by comparing the network data year by year. Statistical analysis and observation show that the number of new joining nodes grows linearly with time, with a coefficient of 0.2. In the evolution of the institutional field knowledge flow network, four distinct forms of newly added edges are observed. Therefore, each new edge in the evolution process must be assigned one of four edge types, with the probability of each type calculated from the types of newly added edges in the actual network by adjusting three parameters ($Q_1$, $Q_2$, $Q_3$). This determines the type of each new edge in the simulated network. First, $Q_1$ is used to classify the source node of a new edge as either a pre-existing or a new joining node. Then, $Q_2$ and $Q_3$ classify the sink nodes connected by the two types of source nodes, respectively. This results in four types of edges, namely pre-existing nodes to pre-existing nodes, pre-existing nodes to new joining nodes, new joining nodes to pre-existing nodes, and new joining nodes to new joining nodes.
The specific parameters are set as follows:
$Q_{1}=\frac{S N_{O}}{S N_{O}+S N_{N}}$
$SN_{O}$ represents the sum of newly added edges created by pre-existing nodes as source nodes, while $SN_{N}$ represents the sum of newly added edges created by new joining nodes acting as source nodes. A random floating-point number $F$ between 0 and 1 is generated. If $F<Q_{1}$, the source node is classified as a pre-existing one. Conversely, if $F>Q_{1}$, the source node is classified as a new joining one.
$Q_{2}=\frac{O N_{O}}{S N_{O}}$
$ON_{O}$ represents the sum of newly added edges created by pre-existing nodes as source nodes and new joining nodes as sink nodes. A random floating-point number $F_{1}$ between 0 and 1 is generated. If $F_{1}<Q_{2}$, the sink node is classified as a new joining one. Conversely, if $F_{1}>Q_{2}$, the sink node is classified as a pre-existing one.
$Q_{3}=\frac{N O_{N}}{S N_{N}}$
$NO_{N}$ represents the sum of newly added edges created by new joining nodes as source nodes and pre-existing nodes as sink nodes. A random floating-point number $F_{2}$ between 0 and 1 is generated. If $F_{2}<Q_{3}$, the sink node is classified as a pre-existing one. Conversely, if $F_{2}>Q_{3}$, the sink node is classified as a new joining one.
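The three parameters and the two-stage draw can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names, the `"old"`/`"new"` type labels and the edge counts in the usage example are hypothetical. The text leaves the case where the random number equals the threshold unspecified; the sketch assigns it to the second branch.

```python
import random

def edge_type_params(counts):
    """Compute (Q1, Q2, Q3) from observed counts of newly added edges.

    counts: dict keyed by (source_type, sink_type), each 'old' or 'new'.
    Assumes both source types occur at least once (no division by zero).
    """
    sn_o = counts[("old", "old")] + counts[("old", "new")]  # pre-existing sources
    sn_n = counts[("new", "old")] + counts[("new", "new")]  # new joining sources
    q1 = sn_o / (sn_o + sn_n)
    q2 = counts[("old", "new")] / sn_o   # pre-existing source, new joining sink
    q3 = counts[("new", "old")] / sn_n   # new joining source, pre-existing sink
    return q1, q2, q3

def sample_edge_type(q1, q2, q3, rng=random):
    """Draw the (source_type, sink_type) of one new edge."""
    if rng.random() < q1:                 # F < Q1: pre-existing source
        source = "old"
        sink = "new" if rng.random() < q2 else "old"   # F1 against Q2
    else:                                 # otherwise: new joining source
        source = "new"
        sink = "old" if rng.random() < q3 else "new"   # F2 against Q3
    return source, sink

# Hypothetical counts, for illustration only.
example_counts = {("old", "old"): 60, ("old", "new"): 20,
                  ("new", "old"): 15, ("new", "new"): 5}
q1, q2, q3 = edge_type_params(example_counts)
```

For these illustrative counts the parameters are 0.8, 0.25 and 0.75, so about 80% of simulated edges start from a pre-existing node.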
2. Connection Probability Based on Preferences: Once the types of the source and sink nodes are determined, one of two connection mechanisms is applied, depending on the type of node selected as the connection target: the degree priority connection mechanism or the random connection mechanism. If the sink node is a pre-existing node, the degree priority connection mechanism is employed. New joining nodes have a greater tendency to connect with nodes of higher output degree, with the connection probability $P_i$ being proportional to the total output of the node. If the sink node is a new joining node, the random connection mechanism is employed, and pre-existing nodes connect to the new joining node with a random probability $P_r$.
Degree priority connection mechanism: When a node chooses a pre-existing node as the connection target, the preferential attachment mechanism of the BA model is applied, and the probability is calculated from the output degree of nodes in the network. The node connects to a pre-existing node $i$ with probability $P_i$, which is proportional to the output degree $K_i$ of the existing node $i$ (output priority connection). This relation is fulfilled:
$P_{i}=\frac{K_{i}}{\sum_{j} K_{j}}$
Random connection mechanism: If a node selects a new joining node as the connection target, it randomly connects to a new joining node $r$ with probability $P_r$. With $S$ denoting the number of new joining nodes, the connection probability $P_r$ is:
$P_{r}=\frac{1}{S}$
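Both selection rules map onto standard-library sampling calls. The sketch below assumes the output degrees $K_i$ are held in a dictionary; the function names are hypothetical.

```python
import random

def pick_preexisting(out_degree, rng=random):
    """Degree priority connection: choose node i with P_i = K_i / sum_j K_j.

    out_degree: dict {node: K_i}; at least one K_i must be positive,
                otherwise random.choices raises ValueError.
    """
    nodes = list(out_degree)
    weights = [out_degree[n] for n in nodes]
    return rng.choices(nodes, weights=weights, k=1)[0]

def pick_new(new_nodes, rng=random):
    """Random connection: choose one of the S new joining nodes with P_r = 1/S."""
    return rng.choice(list(new_nodes))
```

With output degrees of 9 and 1 for two pre-existing nodes, the first node is expected to be selected in about 90% of draws.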

3 Results

3.1 Comparison of IKM model evolution results with actual networks

Previous research has often relied on node neighbor-based methods to evaluate simulated networks against actual networks. However, this approach fails to capture the full complexity of network dynamics and may overlook important features and nodes within the network. To address this limitation, this paper proposes a novel technique that utilizes a feature vector and the PageRank algorithm (Page et al., 1998; Souma et al., 2020) to identify significant nodes in the network. The PageRank algorithm is a widely used method for determining the significance of web pages by computing the PageRank value to evaluate the importance of a node in relation to other nodes. In this study, the PageRank algorithm is utilized to assess the significance of each institutional field in relation to other fields. This allows for a more comprehensive and scientific evaluation of the effectiveness of the simulation network. To compare the actual network with the simulated network, the study evaluates the number of total publications, total citations, and PageRank value of each institutional field. The results reveal significant variations in the ranking of institutional fields between the actual and simulated networks. The proposed technique offers a more nuanced and effective means of evaluating the alignment between simulated and real networks, providing valuable insights into the dynamics and structure of institutional field knowledge flow networks. Specific findings are presented in Figure 6.
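For readers unfamiliar with PageRank, a minimal weighted power-iteration sketch is given below. It is not the authors' implementation: the damping factor of 0.85, the uniform handling of nodes without outgoing edges, and the convergence tolerance are conventional choices assumed here, and the text does not state whether PageRank was run along the knowledge flow direction or its reverse, so the sketch simply takes the edge direction as given.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Weighted PageRank by power iteration.

    edges: dict {(src, dst): weight} with non-negative weights.
    Returns {node: score}; scores sum to 1.
    """
    nodes = sorted({n for edge in edges for n in edge})
    n = len(nodes)
    out_w = {u: 0.0 for u in nodes}
    for (u, _), w in edges.items():
        out_w[u] += w

    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Rank held by nodes without outgoing weight is spread uniformly.
        dangling = sum(rank[u] for u in nodes if out_w[u] == 0.0)
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for (u, v), w in edges.items():
            if out_w[u] > 0.0:
                new[v] += damping * rank[u] * w / out_w[u]
        delta = sum(abs(new[u] - rank[u]) for u in nodes)
        rank = new
        if delta < tol:
            break
    return rank

# Toy network: a -> b -> c -> a, plus d -> a.
toy = {("a", "b"): 1.0, ("b", "c"): 1.0, ("c", "a"): 1.0, ("d", "a"): 1.0}
scores = pagerank(toy)
```

In the toy network, node `a` receives flow from two nodes and ranks highest, while `d`, which receives none, ranks lowest. Sorting nodes by score gives the kind of ranking that is compared between the actual and simulated networks.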
Figure 6. Comparison of the rank of total publication, total citations and PageRank value between actual and simulated networks.
The figure shows that the institutional field rankings in the actual and simulated networks lie mainly on both sides of the diagonal line, with a relatively minor degree of bias. The PageRank rankings are the most tightly concentrated around the diagonal, followed by the number of total citations, while the ranking based on the number of total publications is comparatively less accurate. This pattern implies that although the simulated networks do not precisely replicate the actual networks, they achieve a similar overall structure. Despite some disparities, the variations among the three ranking indicators are relatively small, indicating that the IKM model proposed in this paper effectively simulates the network dynamics.
Furthermore, the findings suggest that the IKM model captures the essential features of the institutional field knowledge flow network, as evidenced by the close alignment between simulated and actual network rankings. These results underscore the effectiveness of the proposed model in replicating the complex dynamics of institutional field knowledge flow networks, thereby providing valuable insights into the evolution and structure of institutional fields.
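The PageRank-based ranking used in this comparison can be sketched as a plain power iteration. This is a minimal illustration rather than the authors' code: the damping factor of 0.85, the uniform redistribution of rank from nodes without outgoing edges, the edge direction, and the toy edge list are all assumptions.

```python
def pagerank(edges, damping=0.85, tol=1e-10, max_iter=200):
    """Compute PageRank on a directed edge list by power iteration.

    Rank held by nodes with no outgoing edges (dangling nodes) is
    spread uniformly over all nodes, which is one common convention.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_links = {node: [] for node in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1.0 - damping) / n + damping * dangling / n
                    for v in nodes}
        for v in nodes:
            if out_links[v]:
                share = damping * rank[v] / len(out_links[v])
                for w in out_links[v]:
                    new_rank[w] += share
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


def ranking(scores):
    """Map each node to its rank position (1 = highest score)."""
    ordered = sorted(scores, key=lambda v: (-scores[v], v))
    return {v: position + 1 for position, v in enumerate(ordered)}


# Hypothetical toy network of three fields.
edges = [("A", "B"), ("C", "B"), ("B", "A")]
scores = pagerank(edges)
positions = ranking(scores)
```

Applying `ranking` to the PageRank scores of the actual and the simulated network gives two rank lists per field, which is the kind of pairing plotted against the diagonal in Figure 6.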

3.2 Comparison of IKM model evolution results with the traditional model

The BA model, a well-known scale-free network generation model, is founded on the preferential attachment mechanism: new joining nodes prefer to connect to nodes with higher degrees, which gives rise to the Matthew effect. This mechanism plays a pivotal role in producing scale-free network properties. Dorogovtsev, Mendes, and Samukhin proposed the Dorogovtsev-Mendes-Samukhin (DMS) model as an extension of the BA model for directed networks. It rests on two key assumptions: the number of new joining nodes and the number of edges added at each step are both fixed constants. The source node of a newly added edge can be either a new joining node or a pre-existing node. By incorporating the direction of edges and allowing the source node of new edges to vary, the DMS model improves on the BA model and aligns more closely with the evolutionary processes observed in actual networks.
In light of the aforementioned background, the present study seeks to leverage the institutional field knowledge flow network constructed in this paper to assess the BA and DMS models. The primary objective is to compare the evolutionary outcomes with those of the IKM model, thereby exploring the rationality behind the improvements made by the IKM model in relation to the BA model, and determining whether the IKM model can more accurately simulate the evolution process of the institutional field knowledge flow network compared to the BA and DMS models. This comparative analysis aims to shed light on the effectiveness and realism of the IKM model in capturing the intricate dynamics of institutional field knowledge flow networks.
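For orientation, the growth and preferential attachment steps of the baseline BA model can be sketched as follows. This is a generic textbook-style sketch, not the configuration used in this study: the seed graph (a complete graph on m + 1 nodes), the parameter values, and the function name are assumptions, and the DMS variant is not shown.

```python
import random


def ba_network(n_nodes, m, seed=0):
    """Grow an undirected BA network with n_nodes nodes.

    Each new node attaches to m distinct existing nodes, chosen with
    probability proportional to their current degree.
    """
    rng = random.Random(seed)
    # Seed graph: complete graph on the first m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Every node appears in `stubs` once per incident edge, so a
    # uniform draw from `stubs` is a degree-proportional draw.
    stubs = [node for edge in edges for node in edge]

    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for target in sorted(targets):
            edges.append((new, target))
            stubs.extend((new, target))
    return edges


edges = ba_network(n_nodes=200, m=2)
```

Because every edge in this sketch originates from a new joining node, it cannot create edges between two pre-existing nodes, which is the limitation of the BA model discussed later in the paper.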

3.2.1 Comparison of total publication

The comparison above shows a notable disparity in the ranking of total publications between the actual and the simulated network, whereas the number of total citations and the PageRank value align relatively well. To examine this variation in more detail, the total publications simulated by the BA and DMS models are also analyzed. The distribution of total publications in the networks simulated by the three models is compared with that in the actual network, and the results are presented in Figure 7.
Figure 7. Comparison of total publication in the simulated networks and the actual network.
It becomes apparent that the distribution of total publications yields distinct patterns across the IKM, DMS, and BA models in comparison to the actual network. Notably, when the number of total publications is at a minimal value of 1, the IKM model demonstrates results that closely resemble those of the actual network, with the DMS model following closely behind. In contrast, the BA model exhibits the most significant deviation in terms of the frequency of total publications. Furthermore, within the range of 10-100 total publications, both the IKM and DMS models present nearly identical outcomes that closely approximate the actual network, while the BA model showcases a higher level of disparity. As the total number of publications surpasses 100, all three simulated networks exhibit alignment with the actual network, indicating convergence in this higher range.
Figure 7 also presents the cumulative probability distributions of total publications. Here, the IKM model mirrors the characteristics of the actual network most accurately. The DMS model is slightly less accurate, while the network simulated by the BA model deviates more markedly from the actual network. Taking the frequency and the cumulative probability distribution of total publications together, the IKM model performs best among the three models. This suggests that the IKM model simulates the actual network well across the full range of total publications, whereas the BA model is better suited to the range of larger total publications. These findings show that extending the BA model with the growth and selection mechanisms of the IKM model improves the model fit and the accuracy with which real-world network dynamics are represented.
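The two quantities compared in Figure 7 can be computed as in the following sketch. It assumes the cumulative distribution is taken as P(X >= x), which the text does not specify, and the function name and the sample of total publications are hypothetical.

```python
from collections import Counter


def frequency_and_cumulative(values):
    """Return the relative frequency of each distinct value and the
    cumulative probability P(X >= x) for each distinct value x."""
    counts = Counter(values)
    n = len(values)
    distinct = sorted(counts)

    frequency = {x: counts[x] / n for x in distinct}
    cumulative = {}
    remaining = n  # number of observations that are >= current x
    for x in distinct:
        cumulative[x] = remaining / n
        remaining -= counts[x]
    return frequency, cumulative


# Hypothetical total publications of eight institutional fields.
total_publications = [1, 1, 1, 2, 2, 5, 10, 100]
frequency, cumulative = frequency_and_cumulative(total_publications)
```

Computing these two dictionaries for the actual network and for each simulated network yields the curves that are compared model by model.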
To further assess how closely the distribution of total publications in the simulated institutional field knowledge flow networks matches that of the actual network, the two-sample KS test (Berger & Zhou, 2014) is employed. This non-parametric method evaluates whether two independent samples are drawn from the same distribution. Here it is used to determine whether the number of total publications in the networks simulated by the IKM, BA, and DMS models follows the same distribution as in the actual network. The results are reported in Table 1.
Table 1. KS test results of total publication between simulated networks and actual network

| Type of simulated network model | D | P | $H_0$ |
| --- | --- | --- | --- |
| IKM model | 0.016 | 0.142 | Accepted |
| BA model | 0.348 | $2.972\times10^{-309}$ | Rejected |
| DMS model | 0.021 | 0.013 | Rejected |

*Note: Null hypothesis $H_0$: the two groups of samples are identically distributed.

According to the test results, the p-values of the BA and DMS models are $2.972\times10^{-309}$ and 0.013, respectively. Both are below the significance level of 0.05, so the null hypothesis is rejected: the distribution of total publications in the networks simulated by the BA and DMS models does not match that of the actual network. The p-value of the IKM model is 0.142, which exceeds 0.05, so the null hypothesis cannot be rejected. The distribution of total publications in the network simulated by the IKM model is therefore consistent with that of the actual network. This provides further evidence that the IKM model can effectively describe the evolution of the institutional field knowledge flow network.
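The decision rule applied in Table 1 can be reproduced with a small two-sample KS routine. The sketch below computes D exactly from the two empirical distribution functions and approximates the p-value with the asymptotic Kolmogorov series plus a common small-sample correction. Whether the authors used this approximation or an exact p-value is not stated, so the p-value formula, the function names, and the sample data are assumptions.

```python
import math


def ks_statistic(x, y):
    """Return D, the largest gap between the two empirical CDFs."""
    x, y = sorted(x), sorted(y)
    n, m = len(x), len(y)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        v = min(x[i], y[j])
        # Advance past all observations equal to v in both samples.
        while i < n and x[i] == v:
            i += 1
        while j < m and y[j] == v:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value for the two-sample KS statistic (assumed
    approximation, not necessarily the one used in the paper)."""
    if d <= 0.0:
        return 1.0
    effective_n = n * m / (n + m)
    lam = (math.sqrt(effective_n) + 0.12 + 0.11 / math.sqrt(effective_n)) * d
    if lam < 0.2:  # the series converges slowly here; p is ~1
        return 1.0
    total = 0.0
    for k in range(1, terms + 1):
        total += (-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
    return min(1.0, max(0.0, 2.0 * total))


def ks_two_sample(x, y, alpha=0.05):
    """Return (D, p, reject_h0) for the two-sample KS test."""
    d = ks_statistic(x, y)
    p = ks_pvalue(d, len(x), len(y))
    return d, p, p < alpha


# Hypothetical samples: clearly different distributions.
actual = list(range(100))
simulated = list(range(100, 200))
d, p, reject = ks_two_sample(actual, simulated)
```

A model whose simulated totals give `reject == False` at the 0.05 level corresponds to a row marked as accepted in Table 1.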

3.2.2 Correlation evolution contrast

To gain further insight into the network dynamics, PageRank was also used to rank node importance in the networks simulated by the BA and DMS models. A Pearson correlation test (Benesty et al., 2009) was then conducted between each simulated network and the actual network on the number of total publications, the number of total citations, and the PageRank values. Figure 8 shows the results.
Figure 8. Evolution of correlation coefficient from 1994 to 2013.
Over the years, the number of fields within the network steadily increased. The correlations between the three indicators simulated by the BA model and the actual network show a distinct downward trend. Prior to 1995, the correlation of the BA model exceeded that of the IKM and DMS models and was close to 1. From 1996 to 1998, the correlations of the IKM and DMS models were similar, after which they began to diverge. This suggests that the BA model can replicate a structure that closely resembles the actual network in its initial stages, but the quality of the simulation deteriorates over time and the gap between the model and the actual network widens. The BA model is therefore suitable only for short simulation durations. These findings clarify the performance and limitations of the BA model in replicating real-world network behavior and provide information for future model enhancements.
For both the IKM and DMS models, the PageRank values and the number of total citations are strongly and positively correlated with those of the actual network, with a consistent upward trend throughout the simulation period. For PageRank, the correlation increased from 0.893 to 0.952 in the IKM model and from 0.902 to 0.950 in the DMS model. For the number of total citations, it increased from 0.840 to 0.939 in the IKM model and from 0.851 to 0.937 in the DMS model.
The two models differ more clearly in how well they reproduce the number of total publications. The correlation between the number of total publications in the network replicated by the IKM model and the actual network remains around 0.880 throughout. In the DMS model, this correlation gradually improves from an initial value of 0.801 to 0.878, but it remains slightly lower than that of the IKM model. These findings indicate that the IKM model ends the period marginally ahead of the DMS model in terms of PageRank and total citations, and also holds an advantage in replicating the number of total publications.
Additionally, the fact that PageRank yields the highest correlation coefficient of the three indicators provides insight into the evolution mechanism of knowledge flow networks. It suggests that nodes that contribute more knowledge tend to acquire more connections as the network evolves. In summary, the IKM model improves on the BA and DMS models in several respects, and its simulation effectiveness generally improves over time.
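The correlation coefficients discussed in this subsection follow the standard Pearson formula, which can be sketched as below. The function name and the per-field values are hypothetical. In practice one coefficient would be computed per year and per indicator, pairing each field's value in the simulated network with its value in the actual network.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length
    sequences of paired observations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical total citations per field in one year.
actual = {"f1": 10, "f2": 20, "f3": 30, "f4": 80}
simulated = {"f1": 12, "f2": 19, "f3": 33, "f4": 75}

fields = sorted(actual)
r = pearson([actual[f] for f in fields],
            [simulated[f] for f in fields])
```

Repeating this calculation for every year from 1994 to 2013 produces one curve per model and indicator, as plotted in Figure 8.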

4 Discussion and conclusion

This paper details the construction of a meso-level knowledge flow network with institutional fields as knowledge units. The institutional field knowledge flow network evolution model (IKM) was then developed to simulate the evolution process, with its growth mode and preferential connection designed to match the evolutionary characteristics of the institutional field knowledge flow network. Comparison with the actual network shows that the IKM model accurately replicates the total publication distribution of the actual network and correlates strongly with it in terms of total citations and node importance ranking. Compared with the BA model and its extension, the DMS model, the IKM model is more effective, which supports its rationality, scientific soundness, and superiority.
Both the BA and DMS models have limitations. When simulating the evolution of the institutional field knowledge flow network, the BA model cannot generate new edges between pre-existing nodes, nor can it capture how edges between pre-existing and new joining nodes evolve. It is therefore the least effective of the three models. The DMS model, by contrast, allows new joining nodes, pre-existing nodes, or even external networks to act as the source node of a new edge. However, its fixed numbers of new joining nodes and added edges cannot regulate how the connections of new joining and pre-existing nodes change, and its randomly generated connections do not match the actual evolution process. Furthermore, the DMS model generates edges randomly according to node in-degree and a probability, whereas in knowledge flow network evolution the preferential attachment of the BA model proves more effective than in-degree-based and randomly generated connections. When determining the connection probability of a node, it is more appropriate to use its out-degree than its in-degree, because the evolution of knowledge networks is primarily expressed through the outward diffusion and transmission of knowledge. The network is more likely to evolve when nodes inherit knowledge and pass it on, although transmission does not always lead to network evolution.
Additionally, the results indicate that within knowledge flow networks composed of knowledge organizations as knowledge units, new edges are produced more frequently among pre-existing nodes during the network's evolution. This implies that new connections between pre-existing nodes are the main driver of the evolution of knowledge flow networks at the meso-level. Greater attention should therefore be given to changes in pre-existing nodes. The model is applicable to other studies of the development of knowledge flow networks that comprise knowledge organizations as knowledge units.

Funding information

This work was supported in part by the National Natural Science Foundation of China under Grant 72264036, in part by the West Light Foundation of The Chinese Academy of Sciences under Grant 2020-XBQNXZ-020, Social Science Foundation of Xinjiang under Grant 2023BGL077 and the Research Program for High-level Talent Program of Xinjiang University of Finance and Economics 2022XGC041, 2022XGC042.

Author contributions

Jinzhong Guo (guojz@xjufe.edu.cn): Conceptualization (Equal), Funding acquisition (Equal), Project administration (Equal), Supervision (Equal), Writing - review & editing (Equal); Kai Wang (1181260967@qq.com): Data curation (Equal), Formal analysis (Equal), Methodology (Equal), Software (Equal), Writing - original draft (Equal); Xueqin Liao (13319903515@163.com): Formal analysis (Equal), Investigation (Equal), Validation (Equal), Writing - review & editing (Equal); Xiaoling Liu (teach_liu@163.com): Funding acquisition (Equal), Methodology (Equal), Resources (Equal), Validation (Equal), Writing - review & editing (Equal).

References

[1]
Barabási, A., & Albert, R. (1999). Emergence of scaling in random networks. Science, 286(5439), 509-512.

[2]
Battiston, F., Musciotto, F., Wang, D., Barabási, A., Szell, M.,... Sinatra, R. (2019). Taking census of physics. Nature Reviews Physics, 1(1), 89-97.

[3]
Beckmann, M. J. (1995). Economic models of knowledge networks. In Networks in action: Communication, economics and human knowledge (pp. 159-174). Springer, Berlin, Heidelberg.

[4]
Berger, V. W., & Zhou, Y. Y. (2014). Kolmogorov-Smirnov test: Overview. Wiley StatsRef: Statistics Reference Online. https://doi.org/10.1002/9781118445112.stat06558.

[5]
Bianconi, G., & Barabási, A. (2001). Bose-Einstein condensation in complex networks. Physical review letters, 86(24), 5632.

[6]
Börner, K., Maru, J. T., & Goldstone, R. L. (2004). The simultaneous evolution of author and paper networks. Proceedings of the National Academy of Sciences, 101(suppl_1), 5266-5273.

[7]
Castells, M. (1996). The rise of the network society. Malden, MA: Blackwell Publishers, Inc.

[8]
Chen, Y. W., Börner, K., & Fang, S. (2013). Evolving collaboration networks in Scientometrics in 1978-2010: a micro-macro analysis. Scientometrics, 95, 1051-1070.

[9]
Clauset, A., Arbesman, S., & Larremore, D. B. (2015). Systematic inequality and hierarchy in faculty hiring networks. Science advances, 1(1), e1400005.

[10]
Benesty, J., Huang, Y. T., Chen, J. D., & Cohen, I. (2009). Pearson correlation coefficient. Noise reduction in speech processing, STSP, 2, 1-4.

[11]
Cooke, P., & Leydesdorff, L. (2006). Regional Development in the Knowledge-Based Economy: The Construction of Advantage. The Journal of Technology Transfer, 31(1), 5-15.

[12]
Daraio, C., Fabbri, F., Gavazzi, G., Izzo, M. G., Leuzzi, L., Quaglia, G.,... Ruocco, G. (2018). Assessing the interdependencies between scientific disciplinary profiles. Scientometrics, 116, 1785-1803.

[13]
Fortunato, S., Bergstrom, C. T., Börner, K., Evans, J. A., Helbing, D., Milojević, S.,... Uzzi, B. (2018). Science of science. Science, 359(6379), eaao0185.

[14]
Garfield, E. (1955). Citation indexes for science: A new dimension in documentation through association of ideas. Science, 122(3159), 108-111.

[15]
Guan, J. C., & Zhu, W. J. (2014). How knowledge diffuses across countries: a case study in the field of management. Scientometrics, 98(3), 2129-2144.

[16]
Harvey, H. B., & Sotardi, S. T. (2018). The pareto principle. Journal of the American College of Radiology, 15(6), 931.

[17]
Krapivsky, P. L., Redner, S., & Leyvraz, F. (2000). Connectivity of growing random networks. Physical review letters, 85(21), 4629.

[18]
Li, N. (2017). Evolutionary patterns of national disciplinary profiles in research: 1996-2015. Scientometrics, 111(1), 493-520.

[19]
Li, W. H., Aste, T., Caccioli, F., & Livan, G. (2019). Early coauthorship with top scientists predicts success in academic careers. Nature communications, 10(1), 5170.

[20]
Li, X., & Chen, G. R. (2003). A local-world evolving network model. Physica A: Statistical Mechanics and its Applications, 328(1-2), 274-286.

[21]
Liu, Z. H., Lai, Y. C., Ye, N., & Dasgupta, P. (2002). Connectivity distribution and attack tolerance of general networks with both preferential and random attachments. Physics Letters A, 303(5-6), 337-344.

[22]
Lyu, Y. S., Yin, M. Q., Xi, F. J., & Hu, X. J. (2022). Progress and Knowledge Transfer from Science to Technology in the Research Frontier of CRISPR Based on the LDA Model. Journal of Data and Information Science, 7(1), 1-19.

[23]
Monechi, B., Pullano, G., & Loreto, V. (2019). Efficient team structures in an open-ended cooperative creativity experiment. Proceedings of the National Academy of Sciences, 116(44), 22088-22093.

[24]
Page, L., Brin, S., Motwani, R., & Winograd, T. (1998). The PageRank citation ranking: Bringing order to the web. Technical Report SIDL-WP-1999-0120, Stanford Digital Library Technologies Project.

[25]
Price, D. D. S. (1976). A general theory of bibliometric and other cumulative advantage processes. Journal of the American society for Information science, 27(5), 292-306.

[26]
Price, D. J. D. S. (1965). Networks of scientific papers. Science, 149(3683), 510-515.

[27]
Shen, Z. S., Yang, L. Y., Pei, J. S., Li, M. H., Wu, C. S., Bao, J. Z., Wei, T., Di, Z. R., Rousseau, R., & Wu, J. S. (2016). Interrelations among scientific fields and their relative influences revealed by an input-output analysis. Journal of Informetrics, 10(1), 82-97.

[28]
Souma, W., Vodenska, I., & Chitkushev, L. (2020). Classification of Paper Values Based on Citation Rank and PageRank. Journal of Data and Information Science, 5(3), 57-70.

[29]
Sun, Y., & Latora, V. (2020). The evolution of knowledge within and across fields in modern physics. Scientific Reports, 10(1), 12097.

[30]
Van Noorden, R. (2015). Interdisciplinary research by the numbers. Nature, 525(7569), 306-307.

[31]
Wu, D. S., Li, J., Lu, X. L., & Li, J. P. (2018). Journal editorship index for assessing the scholarly impact of academic institutions: An empirical analysis in the field of economics. Journal of Informetrics, 12(2), 448-460. doi: https://doi.org/10.1016/j.joi.2018.03.008.

[32]
Yan, E. J. (2016). Disciplinary knowledge production and diffusion in science. Journal of the Association for Information Science and Technology, 67(9), 2223-2245.

[33]
Zhang, B., & Li, Y. T. (2016). A review of the evolution model of scientific knowledge network. J. China Libr. Sci, 42, 85-101.


Copyright © 2023 All rights reserved Journal of Data and Information Science
