1 Introduction
2 Related work
2.1 SPO triples: a structured representation for knowledge claims
2.2 Nanopublication: a richer semantic representation for knowledge claims
2.3 Uncertainty: a fundamental role in representing and communicating claims in scientific literature
3 Methods
Figure 1. Research framework for extracting and measuring computable biomedical knowledge. |
3.1 Extracting structured biomedical knowledge
3.1.1 Collecting related publications
3.1.2 Extracting SPO triples
of MEDLINE publications with essential metadata. We constructed our dataset by consolidating the tables in the SemMedDB database, including information about each concept, SPO triple, sentence, and publication. SemMedDB version 43R was retrieved on February 8, 2021 and used in our study.
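The consolidation step described above can be sketched as a join across the predication, sentence, and citation tables. This is a minimal illustration and not the pipeline used in the study: the in-memory SQLite database, the reduced column sets, and the helper `load_triples` are assumptions made for the example, and the single row is taken from Table 2 with a placeholder sentence identifier.

```python
import sqlite3

# Toy stand-ins for three SemMedDB tables. The column subsets are assumptions
# chosen for illustration; the real schema carries many more fields.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE citations (pmid TEXT PRIMARY KEY, pyear INTEGER);
CREATE TABLE sentence (sentence_id INTEGER PRIMARY KEY, pmid TEXT, sentence TEXT);
CREATE TABLE predication (
    predication_id INTEGER PRIMARY KEY, sentence_id INTEGER, pmid TEXT,
    subject_name TEXT, predicate TEXT, object_name TEXT);
""")

# One example row per table (content from Table 2, row 1; IDs are placeholders).
conn.execute("INSERT INTO citations VALUES ('11748351', 2001)")
conn.execute(
    "INSERT INTO sentence VALUES (1, '11748351', "
    "'The high level of low-density lipoprotein (LDL) is a risk factor "
    "for cardiovascular disease.')"
)
conn.execute(
    "INSERT INTO predication VALUES (1, 1, '11748351', "
    "'Low-Density Lipoproteins', 'PREDISPOSES', 'Cardiovascular Diseases')"
)


def load_triples(connection):
    """Return one row per SPO triple with its sentence and publication year."""
    query = """
    SELECT p.pmid, c.pyear, s.sentence, p.subject_name, p.predicate, p.object_name
    FROM predication AS p
    JOIN sentence AS s ON s.sentence_id = p.sentence_id
    JOIN citations AS c ON c.pmid = p.pmid
    ORDER BY c.pyear, p.predication_id
    """
    return connection.execute(query).fetchall()


rows = load_triples(conn)
```

Keeping the sentence text next to each triple matters for the later steps, because uncertainty cues are detected in the sentence rather than in the triple itself.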
3.1.3 Classifying semantic types and relations
3.2 Identifying and measuring uncertainty of biomedical knowledge
3.2.1 Uncertainty types and uncertainty cue words
Table 1. Frequencies of the uncertainty cue words. |
Cue words | Frequency in all SemMedDB sentences | Rate (‱) | Frequency in our Triples Store sentences | Rate (‱) |
---|---|---|---|---|
Unknown lexicon | ||||
uncertain* | 227,014 | (10.57‱) | 191 | (12.87‱) |
unknown | 525,536 | (24.48‱) | 499 | (33.62‱) |
Hedging lexicon | ||||
maybe | 10,286 | (0.48‱) | 142 | (9.57‱) |
may | 5,946,955 | (276.96‱) | 10,050 | (677.19‱) |
might | 949,536 | (44.22‱) | 2,639 | (177.82‱) |
possible | 1,751,994 | (81.59‱) | 1,611 | (108.55‱) |
potential | 2,879,336 | (134.10‱) | 2,675 | (180.25‱) |
seems | 333,677 | (15.54‱) | 154 | (10.38‱) |
perhaps | 84,058 | (3.91‱) | 32 | (2.16‱) |
likely | 1,052,986 | (49.04‱) | 1,248 | (84.09‱) |
sometimes | 119,942 | (5.59‱) | 26 | (1.75‱) |
Conflicting lexicon | ||||
conflict* | 175,516 | (8.17‱) | 59 | (3.98‱) |
contradict* | 46,639 | (2.17‱) | 10 | (0.67‱) |
controvers* | 208,264 | (9.70‱) | 308 | (20.75‱) |
debat* | 122,332 | (5.70‱) | 54 | (3.64‱) |
no consensus | 17,907 | (0.83‱) | 9 | (0.61‱) |
questionable* | 21,159 | (0.99‱) | 5 | (0.34‱) |
refut* | 9,710 | (0.45‱) | 9 | (0.61‱) |
“*” stands for all of the possible derivations of the word. |
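As an illustration of how the cue words in Table 1 could be applied to a sentence, the sketch below turns each cue into a regular expression and reports which lexicons match. The matching rules are assumptions made for this example (case-insensitive, whole-word matching, with a trailing "*" expanded to any word ending); the study's actual matching procedure is not shown in this section. The helper names `compile_cue`, `uncertainty_types`, and `per_ten_thousand` are hypothetical.

```python
import re

# Cue lexicons copied from Table 1; a trailing "*" marks a stem whose
# derivations should also match.
LEXICONS = {
    "unknown": ["uncertain*", "unknown"],
    "hedging": ["maybe", "may", "might", "possible", "potential", "seems",
                "perhaps", "likely", "sometimes"],
    "conflicting": ["conflict*", "contradict*", "controvers*", "debat*",
                    "no consensus", "questionable*", "refut*"],
}


def compile_cue(cue):
    """Build a whole-word, case-insensitive pattern for one cue (assumed rule)."""
    if cue.endswith("*"):
        body = re.escape(cue[:-1]) + r"\w*"   # stem plus any word ending
    else:
        body = re.escape(cue)                 # exact word or phrase
    return re.compile(r"\b" + body + r"\b", re.IGNORECASE)


PATTERNS = {
    label: [compile_cue(cue) for cue in cues] for label, cues in LEXICONS.items()
}


def uncertainty_types(sentence):
    """Return the set of lexicon labels with at least one cue in the sentence."""
    return {
        label
        for label, patterns in PATTERNS.items()
        if any(p.search(sentence) for p in patterns)
    }


def per_ten_thousand(hits, total):
    """Express a count as a rate per ten thousand sentences (the unit of Table 1)."""
    return 10_000 * hits / total
```

Under these assumed rules, "may" does not fire on "maybe" and "likely" does not fire on "unlikely", because both require a word boundary on each side. Cues listed without "*" match only the exact form, so "potentially" would not count as "potential".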
3.3.2 Information entropy
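A minimal sketch of the measure, assuming IE is Shannon entropy in bits, H = -sum_i p_i log2(p_i), taken over the category labels attached to the sentences that support one SO pair. Which categories enter the distribution is not stated in this excerpt, so the labels used below ("certain", "unknown", "hedging") are hypothetical placeholders, as is the function name `information_entropy`.

```python
from collections import Counter
from math import log2


def information_entropy(labels):
    """Shannon entropy (bits) of the empirical distribution of the labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # p * log2(1/p) is the same term as -p * log2(p), written to avoid -0.0.
    return sum((n / total) * log2(total / n) for n in counts.values())


# Hypothetical example: six supporting sentences split 3 / 2 / 1.
example = ["certain"] * 3 + ["unknown"] * 2 + ["hedging"]
ie = information_entropy(example)
```

A 3/2/1 split gives about 1.459 bits, and a 2/1/1 split over four sentences gives exactly 1.5 bits. Both figures coincide numerically with entries in Table 3, which is consistent with a base-2 logarithm but does not by itself confirm the category scheme assumed here. When every sentence carries the same label the entropy is 0, so higher values indicate less agreement among the supporting sentences.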
4 Results
4.1 Overview of structured biomedical knowledge in cardiovascular research in China
Figure 2. The evolution of structured biomedical knowledge in cardiovascular research in China. |
Table 2. Examples of SPO triples extracted from scientific statements. |
# | PMID | Year | Sentence | Subject | Predicate | Object |
---|---|---|---|---|---|---|
1 | 11748351 | 2001 | 95232561. The high level of low-density lipoprotein (LDL) is a risk factor for cardiovascular disease. | Low-Density Lipoproteins | PREDISPOSES | Cardiovascular Diseases |
 | | | | (Chem. & Drugs) | (others) | (Disorders) |
2 | 16541193 | 2006 | 62140. Our finding suggests that the CYP2C9*3 gene variant significantly alters the plasma concentration and acute DBP response at the 6-h point following irbesartan treatment in Chinese hypertensive patients | Hypertensive | PROCESS_OF | Patients |
 | | | | (Disorders) | (functionally_related_to) | (Living Beings) |
 | | | | Therapeutic procedure | USES | irbesartan |
 | | | | (Procedures) | (functionally_related_to) | (Chem. & Drugs) |
 | | | | irbesartan | TREATS | Patients |
 | | | | (Chem. & Drugs) | (functionally_related_to) | (Living Beings) |
 | | | | irbesartan | TREATS | Hypertensive |
 | | | | (Chem. & Drugs) | (functionally_related_to) | (Disorders) |
3 | 27733220 | 2016 | 7853995. Subgroup analysis for each outcome measure was performed for the observing time point after the transplantation of MSCs. | Stem cells | PART_OF | Marrow |
 | | | | (Anatomy) | (physically_related_to) | (Anatomy) |
4 | 32421381 | 2020 | 345332630. Our data seem to suggest that COVID-19 is probably an additional risk factor for DVT in hospitalized patients. | COVID-19 | PREDISPOSES | Deep Vein Thrombosis |
 | | | | (Disorders) | (others) | (Physiology) |
5 | 32493073 | 2020 | 345298871. COVID-19 presented with deep vein thrombosis: an unusual presentation. | COVID-19 | COEXISTS_WITH | Deep Vein Thrombosis |
 | | | | (Disorders) | (others) | (Physiology) |
6 | 32351121 | 2020 | 345367243. In this brief review, we will elaborate on the role of RAS and ACE2 in pathogenesis of COVID-19. | Angiotensin converting enzyme 2 | CAUSES | COVID-19 |
 | | | | (Chem. & Drugs) | (functionally_related_to) | (Disorders) |
Terms in parentheses stand for semantic groups. "Chem. & Drugs" is short for "Chemicals & Drugs". |
4.2 Evaluation of uncertain biomedical knowledge in cardiovascular research
Table 3. Top 10 SO pairs with the highest IE value. |
# | Subject | _ | Object | Start year | End year | # Sentences | IE |
---|---|---|---|---|---|---|---|
1 | Polymorphism, Genetic (Genetic Function / Physiology) | _ | Coronary Arteriosclerosis (Disease or Syndrome / Disorders) | 2010 | 2019 | 9 | 1.837 |
2 | Fibrinogen (AAPP / Chem. & Drugs) | _ | Ischemic stroke (Disease or Syndrome / Disorders) | 2006 | 2015 | 5 | 1.522 |
3 | Vascular Diseases (Disease or Syndrome / Disorders) | _ | Human (Human / Living Beings) | 2003 | 2013 | 4 | 1.500 |
4 | Epinephrine (Hormone / Chem. & Drugs) | _ | Cardiopulmonary Arrest (Pathologic Function / Disorders) | 2007 | 2007 | 4 | 1.500 |
5 | Ischemic stroke (Disease or Syndrome / Disorders) | _ | Variation (Genetics) (NPOP / Phenomena) | 2008 | 2016 | 4 | 1.500 |
6 | Gene Expression (Genetic Function / Physiology) | _ | Population Group (Human / Living Beings) | 2016 | 2018 | 4 | 1.500 |
7 | Reactive Oxygen Species (BACS / Chem. & Drugs) | _ | Apoptosis (Cell Function / Physiology) | 2017 | 2020 | 4 | 1.500 |
8 | Basal Ganglia (BPOC / Anatomy) | _ | Hematoma (Pathologic Function / Disorders) | 2012 | 2017 | 4 | 1.500 |
9 | HMG-CoA Reductase Inhibitors (Organic Chemical / Chem. & Drugs) | _ | Acute Coronary Syndrome (Disease or Syndrome / Disorders) | 2009 | 2019 | 6 | 1.459 |
10 | Hyperuricemia (Disease or Syndrome / Disorders) | _ | Hypertensive Disease (Disease or Syndrome / Disorders) | 2012 | 2019 | 14 | 1.430 |
Terms in parentheses stand for semantic types / semantic groups. “Chem. & Drugs” is short for “Chemicals & Drugs;” “AAPP” is short for “Amino Acid, Peptide, or Protein;” “BACS” is short for “Biologically Active Substance;” “BPOC” is short for “Body Part, Organ, or Organ Component;” and “NPOP” is short for “Natural Phenomenon or Process.” |
Table 4. Sample sentences from SO pairs with high IE. |
Polymorphism, Genetic_Coronary Arteriosclerosis | ||||
---|---|---|---|---|
# | PMID | Year | Predicate | Sentence |
1.1 | 20668462 | 2010 | AFFECTS | To clarify whether polymorphisms of the RAGE gene were related to CAD, we performed a case-control study in Chinese Han patients. |
1.2 | 22363637 | 2012 | AFFECTS | Our findings failed to demonstrate a correlation between (CAG)(n) polymorphism with CAD; however, we concluded that the rare 21bp deletion might have a more compelling effect on CAD than the common (CAG)(n) polymorphism, and MEF2A genetic variant might be a rare but specific cause of CAD/MI. |
1.3 | 22345093 | 2012 | NEG_AFFECTS | The three other polymorphisms of the RAS do not seem to influence the development of CAD in type 2 diabetes. |
1.4 | 23583798 | 2013 | AFFECTS | However, given the limited number of studies and the potential biases, the influence of this (myeloperoxidase (MPO) G463A) polymorphism on CAD risk needs further investigation. |
1.5 | 24155913 | 2013 | CAUSES | Our findings provided strong evidence for the potentially contributory roles of RAGE multiple genetic polymorphisms, especially in the context of locus-to-locus interaction, in the pathogenesis of CAD among northeastern Han Chinese. |
1.6 | 24239227 | 2014 | AFFECTS | Some polymorphisms in the fibroblast growth factor receptor 4 gene (FGFR-4) have been correlated with coronary artery disease, however, the role of polymorphisms in the FGFR-4 gene in ischemic stroke remain unknown. |
1.7 | 27323132 | 2016 | AFFECTS | There is growing evidence that polymorphisms in NOS3 influence the progression of CAD; however, there is also a controversy regarding the association of polymorphisms in the gene encoding NOS3 and CAD. |
1.8 | 30826813 | 2019 | AFFECTS | Studies have reported that inflammatory cytokine interleukin-8 (IL-8) gene -251 A/T (rs4073) polymorphism is correlated with CAD susceptibility, but the result remains controversial. |
1.9 | 31770200 | 2019 | AFFECTS | Thus, a meta-analysis was conducted to reassess the effects of this (interleukin-8 gene) polymorphism on CAD risks. |
Hydroxymethylglutaryl-CoA Reductase Inhibitors_Acute Coronary Syndrome | ||||
# | PMID | Year | Predicate | Sentence |
9.1 | 19781407 | 2009 | TREATS | Rationale and design of China intensive lipid lowering with statins in acute coronary syndrome: the CHILLAS study. |
9.2 | 23990595 | 2014 | TREATS | The effect of statins in patients with acute coronary syndrome (ACS) at advanced age with lower low-density lipoprotein cholesterol (LDL-C) levels undergoing percutaneous coronary intervention (PCI) remains unknown. |
9.3 | 24216317 | 2014 | TREATS | Results of the current study merit further investigation of the early use of statins in patients with NSTE-ACS to delineate patient subgroups who may benefit from this therapy. |
9.4 | 25879728 | 2015 | TREATS | Combination therapy analysis of ezetimibe and statins in Chinese patients with acute coronary syndrome and type 2 diabetes. |
9.5 | 25879728 | 2015 | TREATS | The effects and safety of the combined treatment of ezetimibe (EZ) and statins in Chinese patients with acute coronary syndrome (ACS) and type 2 diabetes mellitus (T2DM) remain unknown. |
9.6 | 30986750 | 2019 | TREATS | This study aimed to assess use of prehospital statins and LDL-C levels at admission in ACS patients with history of MI or revascularization. |
Hyperuricemia_Hypertensive Disease | ||||
# | PMID | Year | Predicate | Sentence |
10.1 | 22377586 | 2012 | PREDISPOSES | Multivariate logistic regression showed that age, gender, overweight/obesity, dyslipidemia and alcohol use were risk factors for prehypertension, and age, overweight/obesity, dyslipidemia, alcohol use, family history of hypertension and hyperuricemia were risk factors for hypertension. |
10.2 | 24905962 | 2014 | PREDISPOSES | In addition, after adjusting for potential confounders, hyperuricemia was associated with increased risk of hypertension in both males and females, with odds ratios (95% CI) of 1.680 (1.110-2.543) and 1.065 (1.012-1.118), respectively. |
10.3 | 25437867 | 2014 | PREDISPOSES | Hyperuricemia may modestly increase the risk of hypertension incidence, consistent with a dose-response relationship. |
10.4 | 25863573 | 2015 | PREDISPOSES | Besides traditional risk factors, multiple logistic regression analysis indicated that obesity, diabetes, dyslipidemia, and hyperuricemia were becoming risk factors for hypertension in this rural area. The status of hypertension is grim currently in rural Northeast China. |
10.5 | 25919438 | 2015 | PREDISPOSES | Hyperuricemia is an independent risk factor for hypertension. |
10.6 | 27129957 | 2016 | PREDISPOSES | A Kaplan-Meier survival analysis showed that hyperuricemia predicted higher incidences of hypertension in a dose-dependent manner: hypertension onset significantly differed across SUA quartiles. |
10.7 | 28176036 | 2017 | PREDISPOSES | Whether hyperuricemia is an independent risk factor for hypertension in adults is still under debate. |
10.8 | 28808071 | 2017 | PREDISPOSES | Temporal Relationship Between Hyperuricemia and Insulin Resistance and Its Impact on Future Risk of Hypertension. |
10.9 | 28808071 | 2017 | PREDISPOSES | Although hyperuricemia and insulin resistance significantly correlated, their temporal sequence and how the sequence influence on future risk of hypertension are largely unknown. |
10.10 | 29390287 | 2017 | PREDISPOSES | HUA was also a risk factor for hypertension in this age group (odds ratio 1.425, 95% confidence interval, 1.217-1.668, P <.001). |
10.11 | 28445311 | 2017 | PREDISPOSES | Besides, after adjustment for confounding variables, hyperuricemia was associated with an increased risk of hypertension in both male and female patients, with odds ratios of 2.152 (95% confidence interval 1.324-3.498) and 2.133 (95% confidence interval 1.409-3.229), respectively. |
10.12 | 28445311 | 2017 | PREDISPOSES | Hyperuricemia was significantly associated with the risk of hypertension. |
10.13 | 30817449 | 2019 | PREDISPOSES | This study aimed to develop a cumulative score composed of seven risk factors: age, resting heart rate, overweight or obesity, dyslipidemia, hyperuricemia, impaired glucose regulation, and impaired estimated glomerular filtration rate (eGFR), to evaluate the risk of new-onset hypertension. |
10.14 | 31908434 | 2019 | AFFECTS | Hyperuricemia is an important potential pathogenic factor for hypertension, cardiovascular disease and stroke. |
4.3 Visual presentation of the SO pairs in cardiovascular literature
Figure 3. Network visualization of co-occurrence of SO pairs. |
5 Discussion and conclusion
5.1 Identification of uncertain biomedical knowledge from scientific statements in cardiovascular literature
5.2 Distribution characteristics of uncertainty types in different types of scientific statements
Figure 4. Distribution of Unknown/Hedging/Conflicting cue words in different parts of scientific statements. |
Table 5. Examples of uncertain sentences and triples from different parts of scientific statements. |
Statement Location | Supporting Sentences | SPO Triples |
---|---|---|
Premise (Background) | Liver X receptors (LXRs) play a central role in atherosclerosis; however, LXR activity of organic pollutants and associated potential risk of atherosclerosis have not yet been characterized. | liver X receptor _AFFECTS_ Atherosclerosis |
 | Lipoprotein-associated phospholipase A2 (Lp-PLA2) is considered to be a risk factor for acute coronary syndrome (ACS), but this remains controversial. | Phospholipase A2 _PREDISPOSES_ Acute coronary syndrome |
Hypothesis (Objective) | We therefore performed a case-control study investigating the possible relation between ACE gene polymorphisms and MVPS in Taiwan Chinese. | Mitral Valve Prolapse _ASSOCIATED_WITH_ gene polymorphism |
 | Given the uncertainty regarding the relationship of C-reactive protein (CRP) and homocysteine (Hcy) to atherosclerotic burden, our aim was to determine whether CRP and Hcy are related to the presence of subclinical coronary plaque and stenosis. | Stenosis _ASSOCIATED_WITH_ C-reactive protein; Stenosis _ASSOCIATED_WITH_ homocysteine |
Evidence (Results) | Ischemic heart disease was identified as the possible etiology of HF in a greater proportion of non-Chinese patients (47.7% vs. 35.3%; p < 0.001) whereas hypertension (26.1% vs. 16.1%; p < 0.001) and valvular heart disease (11.6% vs. 7.2%; p < 0.001) were relatively more common in Chinese patients. | Myocardial Ischemia _CAUSES_ Heart failure |
 | Genetic polymorphisms of four genes, methylenetetrahydrofolate reductase (MTHFR) and apolipoprotein E (ApoE) have been demonstrated to associate with the increased risk for both MDD and stroke, while the association between identified polymorphisms in angiotensin-converting enzyme (ACE) and serum paraoxonase (PON1) with depression is still under debate, for the existing studies are insufficient in sample size. | Peptidyl-Dipeptidase A _PREDISPOSES_ Cerebrovascular accident; Peptidyl-Dipeptidase A _PREDISPOSES_ Major Depressive Disorder; Arylesterase _PREDISPOSES_ Cerebrovascular accident; Arylesterase _PREDISPOSES_ Major Depressive Disorder |
Claims (Conclusions) | This study shows a significant association of hypertension susceptibility loci only in obese Chinese children, suggesting a likely influence of childhood obesity on the risk of hypertension. | Hypertensive disease _AFFECTS_ Obesity |
 | Our data demonstrate that TrkB protects endothelial integrity during atherogenesis by promoting Ets1-mediated VE-cadherin expression and plays a previously unknown protective role in the development of CAD | ETS1 gene, ETS1 _INTERACTS_WITH_ cadherin 5 |
5.3 Metrics to measure uncertainty of biomedical knowledge
Figure 5. Trends in IE of SO pairs by number of supporting sentences. |
to quantify uncertainty in the epistemic status of scientific knowledge represented at different levels, such as SPO triples (micro level) and semantic type pairs (macro level).
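The two levels can be illustrated by computing the same entropy over different grouping keys. The sketch below is an assumption-laden example rather than the study's procedure: the records, concept names (`drug_a`, `drug_b`, `disease_x`), and uncertainty labels are invented, the micro level is keyed on the subject-object pair as in Table 3, and the helpers `entropy` and `entropy_by` are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2
from typing import NamedTuple


class Record(NamedTuple):
    subject: str
    subject_type: str
    obj: str
    object_type: str
    label: str  # hypothetical uncertainty label of the supporting sentence


def entropy(labels):
    """Shannon entropy (bits) of the empirical label distribution."""
    counts = Counter(labels)
    total = sum(counts.values())
    return sum((n / total) * log2(total / n) for n in counts.values())


def entropy_by(records, key):
    """Group records with `key` and compute the label entropy of each group."""
    groups = defaultdict(list)
    for rec in records:
        groups[key(rec)].append(rec.label)
    return {k: entropy(v) for k, v in groups.items()}


# Invented records for illustration only.
records = [
    Record("drug_a", "Organic Chemical", "disease_x", "Disease or Syndrome", "certain"),
    Record("drug_a", "Organic Chemical", "disease_x", "Disease or Syndrome", "certain"),
    Record("drug_b", "Organic Chemical", "disease_x", "Disease or Syndrome", "hedging"),
    Record("drug_b", "Organic Chemical", "disease_x", "Disease or Syndrome", "unknown"),
]

# Micro level: one value per subject-object pair.
micro = entropy_by(records, key=lambda r: (r.subject, r.obj))
# Macro level: one value per semantic type pair.
macro = entropy_by(records, key=lambda r: (r.subject_type, r.object_type))
```

In this toy data the pair with two agreeing sentences scores 0 bits, the pair with two differing labels scores 1 bit, and pooling all four sentences at the semantic-type level gives 1.5 bits. Changing the grouping key is the only difference between the two levels.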