1 Introduction
2 Literature review
2.1 Factors influencing scientists’ mobility
2.2 Effects of mobility on scientific performance
3 Data and method
3.1 Scientists’ mobility data and outcome variable definition
3.2 Propensity score matching (PSM)
Figure 1. Visualization of the PSM procedure. We collected all papers published by Chinese institutions from 2014 to 2017 to identify scientists’ mobility, and selected scientists who worked in two or more institutions and whose move occurred between 2014 and 2017. The control group consisted of scientists with only one employer university during the observed period, resulting in a data set of about 100,000 people. We then used PSM to find scientists with prior measures similar to those of each “moved” scientist and obtained 2,586 matched pairs of moved and unmoved individuals. The matched pre-movement variables are: (1) research age (i.e., the number of years since the scientist’s first publication); (2) # Publications; (3) # Citations; (4) # Coauthors; (5) DFC University (i.e., whether the university is Double First-Class); (6) University prestige (the university’s percentile by average citations per paper); (7) region (e.g., Beijing, Guangdong); (8) part (e.g., East China, North China); (9) discipline (e.g., Biology, Business); (10) Project 985/211 (the project type of the institution: 2 for both “Project 985” and “Project 211”, 1 for “Project 211” only, and 0 for neither). e(X) denotes the estimated propensity score of a scientist given the 10 covariates. PSM then uses these scores to pair similar “moved” and “unmoved” individuals with the nearest-neighbor algorithm.
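The matching step described in the Figure 1 caption can be illustrated with a minimal sketch. It assumes a logistic form for e(X) and greedy 1:1 nearest-neighbor matching without replacement; the caption does not say whether replacement or a caliper was used, so both choices are assumptions. The covariate values and coefficients are hypothetical and only two covariates are used for brevity.

```python
import math

def propensity(x, intercept, coefs):
    """Logistic propensity score e(X) = P(moved = 1 | X)."""
    z = intercept + sum(c * v for c, v in zip(coefs, x))
    return 1.0 / (1.0 + math.exp(-z))

def nearest_neighbor_match(treated_scores, control_scores):
    """Greedy 1:1 nearest-neighbor matching without replacement (an assumption).

    Returns a list of (treated_index, control_index) pairs.
    """
    available = set(range(len(control_scores)))
    pairs = []
    for i, score in enumerate(treated_scores):
        if not available:
            break  # no unmatched controls left
        # Closest remaining control; ties are broken by the lower index.
        j = min(available, key=lambda k: (abs(control_scores[k] - score), k))
        pairs.append((i, j))
        available.remove(j)
    return pairs

# Hypothetical toy data: two covariates per scientist.
moved = [(5.0, 2.0), (3.0, 1.0)]
unmoved = [(5.1, 2.1), (2.0, 0.5), (3.2, 1.1)]
intercept, coefs = -2.0, (0.3, 0.2)  # hypothetical coefficients

e_moved = [propensity(x, intercept, coefs) for x in moved]
e_unmoved = [propensity(x, intercept, coefs) for x in unmoved]
pairs = nearest_neighbor_match(e_moved, e_unmoved)
print(pairs)
```

In practice the coefficients would be estimated by fitting a logistic regression of the “moved” indicator on all 10 covariates rather than fixed by hand.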
3.3 Ordered logistic regression model
4 Results
4.1 Domestic scientific flow in China
Figure 2. Chinese scientific flows at the scale of regions and provinces. A. The Sankey diagram shows the scientific labor flow between 2014 and 2017 among regions in China (following Chinese administrative divisions). The left side represents the source regions, and the right side the destination regions. The width of a band represents the frequency of the labor flow. B. The directed circular chart shows the scientific labor flow between 2014 and 2017 among the main provinces in China. Only the 20 provinces or municipalities with the highest frequency of scientists’ mobility are displayed. The width of an arc represents the frequency.
Figure 3. Chinese scientific flow at the scale of regions and university prestige. A. Scatter plot of inflow vs. outflow in Chinese regions. B. Scatter plot of inflow vs. number of DFC institutions in Chinese regions. C. Pie chart of the proportion of transitions between other institutions and DFC institutions.
4.2 Matching contenders
Figure 4. Comparisons of multiple measures between moved and unmoved groups of scientists. Panels A-C illustrate the distributions (in logarithmic scale) of the number of publications, number of citations, and number of collaborators, and Panel D shows the level of the employer institution (as a percentage), for the “moved” and “unmoved” groups of scientists before and after the year of mobility. Panel E displays the estimated differences between the two experimental groups obtained using the t-test.
4.3 Scientific collaboration and scientists’ mobility
4.4 Citation impact and scientists’ mobility
4.5 University hierarchy of scientists’ mobility
4.6 Heterogeneity in mobility outcomes by career stage
Figure 5. Comparisons between moved and unmoved groups of scientists with short and long tenure. A. The estimated t-test coefficients of differences between the two experimental groups of scientists with short tenure. B. The estimated t-test coefficients of differences between the two experimental groups of scientists with long tenure.
4.7 Robustness check of PSM
Table 1. Balance table of covariates before and after PSM.
| Variable | Sample | Mean (“moved”) | Mean (“unmoved”) | bias% | \|bias\| | t (t-test) | p value (t-test) |
|---|---|---|---|---|---|---|---|
| Research age | After PSM | 5.2289 | 5.1983 | 0.57 | 33.3 | -0.2134 | 0.8310 |
| | Before PSM | 5.2289 | 5.2500 | -0.38 | | 0.2090 | 0.8345 |
| #Publications | After PSM | 2.2392 | 2.2184 | 3.44 | 8.14 | -1.2916 | 0.1965 |
| | Before PSM | 2.4497 | 2.4531 | -3.16 | | -0.1912 | 0.8484 |
| #Citations | After PSM | 1.8512 | 1.8479 | 0.31 | 98.39 | -0.1179 | 0.9061 |
| | Before PSM | 3.0165 | 3.2874 | 19.46 | | -10.2247 | 0.0000 |
| #Collaborators | After PSM | 2.6206 | 2.6206 | -0.01 | 99.98 | 0.0030 | 0.9976 |
| | Before PSM | 3.0468 | 2.6210 | -51.00 | | 21.6664 | 0.0000 |
| DFC University | After PSM | 0.6557 | 0.6661 | -2.18 | 61.79 | 0.8186 | 0.4131 |
| | Before PSM | 0.4751 | 0.6661 | 5.71 | | -14.7624 | 0.0000 |
| University prestige | After PSM | 0.9114 | 0.9107 | 1.23 | 91.29 | -0.4624 | 0.6438 |
| | Before PSM | 0.8669 | 0.9107 | 14.18 | | -21.0350 | 0.0000 |
| Project 985/211 | After PSM | 1.1302 | 1.1520 | -2.74 | 76.10 | 1.0279 | 0.3041 |
| | Before PSM | 0.6958 | 1.1520 | 11.46 | | -6.0741 | 0.0000 |
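As a reading aid for Table 1, the sketch below computes a standardized percentage bias and the percentage reduction in absolute bias. The formula is one common definition (mean difference divided by the square root of the average of the two group variances); the source does not state which formula was used, so this is an assumption. The #Collaborators row suggests that the |bias| column reports this reduction, since a bias of -51.00 before and -0.01 after matching gives about 99.98.

```python
import math
import statistics

def standardized_bias(treated, control):
    """Standardized percentage bias between two samples (assumed definition)."""
    mean_diff = statistics.fmean(treated) - statistics.fmean(control)
    # Pool the two sample variances by simple averaging.
    pooled_sd = math.sqrt(
        (statistics.variance(treated) + statistics.variance(control)) / 2
    )
    return 100.0 * mean_diff / pooled_sd

def bias_reduction(bias_before, bias_after):
    """Percentage reduction in absolute bias achieved by matching."""
    return 100.0 * (1.0 - abs(bias_after) / abs(bias_before))

# Hypothetical toy samples, not data from the study.
toy_bias = standardized_bias([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])

# bias% values of the #Collaborators row in Table 1.
collab_reduction = bias_reduction(-51.00, -0.01)
print(toy_bias, round(collab_reduction, 2))
```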
Figure 6. Distribution diagram of endogeneity test for variables. The figure displays the distribution of actual mean differences (black dashed lines) between the two groups and the mean differences generated from 100 random allocations separately. The variable names corresponding to the distribution maps A-D are consistent with those in Figure 4.
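The random-allocation comparison behind Figure 6 can be sketched as follows. The sketch pools the outcome values of both groups, reshuffles group membership 100 times, and records the mean difference each time, so that the actual difference can be compared against this reference distribution. The function name, the seed, and the toy values are hypothetical, and the source does not specify how its random allocations were drawn.

```python
import random
import statistics

def random_allocation_diffs(group_a, group_b, n_rounds=100, seed=0):
    """Mean differences obtained after randomly reassigning group labels."""
    rng = random.Random(seed)  # fixed seed for reproducibility (an assumption)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    diffs = []
    for _ in range(n_rounds):
        rng.shuffle(pooled)
        # The first n_a values act as the pseudo "moved" group.
        diffs.append(
            statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:])
        )
    return diffs

# Hypothetical toy outcome values for the two groups.
moved = [10.0, 11.0, 12.0, 13.0, 14.0]
unmoved = [1.0, 2.0, 3.0, 4.0, 5.0]

observed = statistics.fmean(moved) - statistics.fmean(unmoved)
null_diffs = random_allocation_diffs(moved, unmoved)
# Share of random allocations at least as extreme as the actual difference.
share_extreme = sum(abs(d) >= abs(observed) for d in null_diffs) / len(null_diffs)
print(observed, share_extreme)
```

An actual difference that lies far in the tail of the random-allocation distribution indicates that it is unlikely to arise from arbitrary group assignment.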
4.8 Drivers of prestige-related mobility
Table 2. Estimated coefficients of ordered logistic regression.
| | Model 1 | Model 2 | Model 3 | Model 4 |
|---|---|---|---|---|
| #Publications | -0.0188 | | | -0.1336* |
| | 0.0583 | | | 0.0664 |
| #Citations | | 0.0755* | | 0.0672 |
| | | 0.0351 | | 0.0390 |
| #Collaborators | | | 0.1449** | 0.1585** |
| | | | 0.0514 | 0.0580 |
| Region division | Yes | Yes | Yes | Yes |
| N | 2,896 | 2,896 | 2,896 | 2,896 |
| Pseudo R2 | 0.0433 | 0.0441 | 0.0446 | 0.0456 |
Note: *p<0.05, **p<0.01, ***p<0.001.
Figure 7. Estimated probabilities of institutional mobility by scientific performance indicators, based on separate and combined regression models. The x-axes represent the number of publications (A, B), citations (C, D), and collaborators (E, F). Panels A, C, and E show marginal effects from separate regression models, while Panels B, D, and F present results from a combined model including all three indicators. The y-axis indicates the predicted probability of a scientist moving from a non-DFC university to a DFC one (rank = 2), making a lateral move (rank = 1), or moving from a DFC to a non-DFC university (rank = 0).
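To make the rank coding and the predicted probabilities in Figure 7 concrete, the sketch below codes a move as rank 0, 1, or 2 and evaluates a single-predictor ordered logit (proportional-odds) model. The slope is the first #Collaborators coefficient reported in Table 2; the two cutpoints are not reported in the source and are hypothetical, as are the x values, so the printed probabilities are illustrative only.

```python
import math

def move_rank(source_is_dfc, dest_is_dfc):
    """Code a move as 2 (non-DFC to DFC), 1 (lateral), or 0 (DFC to non-DFC)."""
    if dest_is_dfc and not source_is_dfc:
        return 2
    if source_is_dfc and not dest_is_dfc:
        return 0
    return 1

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def ordered_logit_probs(x, beta, cut1, cut2):
    """Return [P(rank=0), P(rank=1), P(rank=2)] for one predictor x."""
    xb = beta * x
    p_le_0 = logistic(cut1 - xb)  # P(rank <= 0)
    p_le_1 = logistic(cut2 - xb)  # P(rank <= 1)
    return [p_le_0, p_le_1 - p_le_0, 1.0 - p_le_1]

BETA = 0.1449           # first #Collaborators coefficient in Table 2
CUT1, CUT2 = -1.0, 1.5  # hypothetical cutpoints, not from the source

for x in (0.0, 2.0, 4.0):
    probs = ordered_logit_probs(x, BETA, CUT1, CUT2)
    print(x, [round(p, 3) for p in probs])
```

With a positive slope, a larger predictor value shifts probability mass toward rank 2, which corresponds to the upward move from a non-DFC to a DFC university.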
5 Discussion
Figure 8. Comparison of university prestige between moved and unmoved groups of scientists. The violin plot shows the distribution of university prestige percentiles (the core indicator for measuring university prestige). Statistical results are from t-tests.


