Research Paper

Understanding the Correlations between Social Attention and Topic Trends of Scientific Publications

  • Xianlei Dong,
  • Jian Xu,
  • Ying Ding,
  • Chenwei Zhang,
  • Kunpeng Zhang & Min Song
  • 1 School of Management Science and Engineering, Shandong Normal University, Jinan 250014, China;
    2 School of Information Management, Sun Yat-sen University, Guangzhou 510006, China;
    3 Department of Information and Library Science, Indiana University, Bloomington, IN 47405, USA;
    4 Department of Information and Decision Sciences, University of Illinois at Chicago, IL 60607, USA;
    5 Department of Library and Information Science, Yonsei University, 50 Yonsei-ro, Seoul 120-749, Republic of Korea

Received date: 2016-01-18

  Revised date: 2016-02-26

  Online published: 2016-03-15

Supported by

This work was supported by the National Research Foundation of Korea Grant funded by the Korean Government (NRF-2012-2012S1A3A2033291) and by the Yonsei University Future-leading Research Initiative of 2014.

Abstract

Purpose: We propose and apply a simplified nowcasting model to understand the correlations between social attention and topic trends of scientific publications.

Design/methodology/approach: First, topics are generated from the obesity corpus using the latent Dirichlet allocation (LDA) algorithm, and time series of keyword search trends are obtained from Google Trends. We then establish the structural time series model using data from January 2004 to December 2012, and evaluate the model using data from January 2013. We employ a state-space model to separate the non-regression components of an observed time series (i.e., the tendency and the seasonality), and apply the “spike and slab prior” and stepwise regression to analyze the correlations between the regression component and social media attention. The two parts are combined using Markov chain Monte Carlo (MCMC) sampling techniques to obtain our results.
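
The abstract does not give the model equations. As a hedged illustration only, a structural time series of the kind described above is commonly written in the following state-space form; the symbols and the Gaussian noise assumptions are ours and are not taken from the paper:

```latex
\begin{aligned}
y_t &= \mu_t + \tau_t + \beta^{\top} x_t + \varepsilon_t, & \varepsilon_t &\sim \mathcal{N}(0, \sigma_\varepsilon^2) \\
\mu_{t+1} &= \mu_t + \delta_t + u_t, & u_t &\sim \mathcal{N}(0, \sigma_u^2) \\
\delta_{t+1} &= \delta_t + v_t, & v_t &\sim \mathcal{N}(0, \sigma_v^2) \\
\tau_{t+1} &= -\sum_{s=0}^{S-2} \tau_{t-s} + w_t, & w_t &\sim \mathcal{N}(0, \sigma_w^2)
\end{aligned}
```

Here y_t would be the monthly number of publications on a topic, \mu_t the tendency with slope \delta_t, \tau_t the seasonal effect with S seasons (S = 12 for monthly data), and x_t the vector of search-trend series. Under a spike and slab prior, each coefficient \beta_j is paired with an inclusion indicator \gamma_j \in \{0, 1\}, with \beta_j = 0 whenever \gamma_j = 0, so that only a few search series enter the regression.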

Findings: The results of our study show that (1) the number of publications on child obesity increases at a lower rate than that of diabetes publications; (2) the number of publications on a given topic may exhibit a relationship with the season or time of year; and (3) there exists a correlation between the number of publications on a given topic and its social media attention, i.e., the search frequency related to that topic as identified by Google Trends. We found that our model is also able to predict the number of publications related to a given topic.
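
To make the notions of tendency and one-step prediction concrete, the following is a minimal sketch of a Kalman filter for the simplest state-space model, a local level model (a random-walk level observed with noise). It is not the authors' model, which also includes seasonality and a regression component. The function name, the variance values, and the monthly counts are hypothetical and chosen only for illustration.

```python
def local_level_filter(y, obs_var, level_var, m0=0.0, p0=1e6):
    """Kalman filter for y_t = mu_t + eps_t, mu_{t+1} = mu_t + eta_t.

    obs_var is Var(eps_t), level_var is Var(eta_t); m0 and p0 describe a
    vague prior on the initial level (all values here are assumptions).
    Returns the one-step-ahead predictions and the filtered level estimates.
    """
    m, p = m0, p0
    predictions, filtered = [], []
    for obs in y:
        predictions.append(m)          # forecast of y_t made before seeing it
        gain = p / (p + obs_var)       # Kalman gain: weight given to the new observation
        m = m + gain * (obs - m)       # update the level estimate
        p = (1.0 - gain) * p           # update its variance
        filtered.append(m)
        p = p + level_var              # the level is a random walk, so uncertainty grows
    return predictions, filtered


# Hypothetical monthly publication counts (made-up numbers, not data from the paper)
counts = [10, 12, 11, 13, 15, 14, 16, 18, 17, 19, 21, 20]
predictions, filtered = local_level_filter(counts, obs_var=4.0, level_var=1.0)

# In a local level model the forecast for the next month is the last filtered level.
next_month_forecast = filtered[-1]
```

A larger level_var relative to obs_var makes the estimated level follow the observations more closely, while a smaller ratio yields a smoother tendency.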

Research limitations: First, we study correlation rather than causality between topic trends and social media attention. As a result, the relationships might not be robust, and we cannot make long-term predictions. Second, we cannot identify the reasons or conditions that drive obesity topics to present such tendencies and seasonal patterns, so field studies may be needed in the future. Third, we need to improve the efficiency of our model by finding more efficient variable selection methods, because stepwise regression is time consuming, especially for a large number of variables.

Practical implications: This paper analyzes publication topic trends from three perspectives: tendency, seasonality, and correlation with social media attention, providing a new perspective for identifying and understanding topical themes in academic publications.

Originality/value: To the best of our knowledge, we are the first to apply the state-space model to examine the relationships between healthcare-related publications and social media, that is, between a topic's evolution and people's search behavior in social media. This paper thus provides a new viewpoint in the correlation analysis area, and demonstrates the value of considering social media attention in the analysis of publication topic trends.


http://ir.las.ac.cn/handle/12502/8477

Cite this article

Xianlei Dong, Jian Xu, Ying Ding, Chenwei Zhang, Kunpeng Zhang & Min Song. Understanding the Correlations between Social Attention and Topic Trends of Scientific Publications[J]. Journal of Data and Information Science, 2016, 1(1): 28-49. DOI: 10.20309/jdis.201604
