Despite their promise, several limitations of AI technologies, and of our specific use case, are worth noting. First, large language models (LLMs) such as BERT, BART, T5, and PEGASUS were trained on vast amounts of human-written text and may struggle to ground their generated text in factual information (Petroni et al.,
2020). Due to the inherent biases present in the training data, these models might inadvertently perpetuate or exacerbate existing biases in generated headlines. Moreover, LLMs may incorporate factual inaccuracies, outdated information, or misleading content, leading to inconsistencies or contradictions in generated headlines (Tang et al.,
2023), especially when dealing with complex or ambiguous input texts. Consequently, ensuring factual grounding, accurately representing source content, and addressing biases in the models remain significant challenges for LLMs in producing reliable, bias-free headlines. Second, most NLP foundation models can handle only relatively short passages (i.e., 512-1,024 tokens), which suffices for comprehending a title and abstract but falls short of a full-length peer-reviewed journal article. More recent models, such as Big Bird and Longformer, accept longer inputs of up to 4,096 tokens. In general, however, today's mainstream NLP architectures, such as transformers and recurrent neural networks, are not optimized to consume or produce long passages (e.g., text exceeding 4,000 words). Third, the substantial growth in the number of parameters in LLMs has increased the computational cost of fine-tuning such large-scale models, which could render them impractical for real-world applications. Fourth, our application focuses solely on obesity-related scientific discoveries, owing to our team's domain expertise. To capitalize on LLMs' potential for generating news reports across distinct scientific domains, the models would need to be fine-tuned on large-scale, diverse text corpora pairing news reports with the scientific publications they cover. Constructing such datasets demands substantial human and financial resources. Fifth, this study built AI models to auto-generate news headlines, an initial step toward creating full-length news articles that report scientific findings. Substantial future effort and technological advancement may be required to fulfill that goal.
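The context-window limitation described above (the second point) can be illustrated with a minimal sketch. This example uses naive whitespace-delimited "tokens" as a stand-in for a real subword tokenizer (the `truncate_to_context` helper and the word counts are illustrative assumptions, not part of our pipeline); in practice, models such as BERT use subword tokenizers with hard limits of roughly 512-1,024 tokens, so any input beyond that window is silently discarded.

```python
# Hedged sketch: how a fixed context window forces truncation of long inputs.
# Whitespace splitting stands in for a real subword tokenizer; actual limits
# and token counts vary by model (e.g., 512 for BERT, 4,096 for Longformer).

def truncate_to_context(text: str, max_tokens: int = 512) -> str:
    """Keep only the first max_tokens whitespace-delimited tokens."""
    tokens = text.split()
    return " ".join(tokens[:max_tokens])

abstract = " ".join(["word"] * 300)    # a typical title/abstract fits
article = " ".join(["word"] * 6000)    # a full-length article does not

assert len(truncate_to_context(abstract).split()) == 300   # unchanged
assert len(truncate_to_context(article).split()) == 512    # tail discarded
```

Anything past the window, often including the results and discussion sections of a full article, never reaches the model, which is why longer-context architectures such as Big Bird and Longformer are attractive for this task.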