1 Introduction
2 Related Works
3 Dataset and Methodology
3.1 Dataset
Table 1 Examples of Citation Sentences.
No. | Citation sentence |
---|---|
1 | Their transcription is dependent on mouse Cebpe and human CEBPE [12]. |
2 | These changes may derive in a higher risk for type 2 diabetes development [8], [9]. |
3 | It interacts with a variety of transcriptional factors and MLL proteins [9]-[12]. |
4 | Most pathogens of humans, animals and plants are multi-host pathogens [1]-[3], [20]. |
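The rows of Table 1 show four marker shapes: a single reference, a comma-separated list, a range, and a mix of the two. The extraction procedure used to build the dataset is not shown in this section, so the following is only a minimal sketch of how such markers could be recognized, assuming numeric bracketed markers exactly as they appear in Table 1. The names `MARKER`, `MARKER_GROUP`, `cited_references`, and `strip_markers` are hypothetical.

```python
import re

# One bracketed number, optionally followed by "-[n]" to form a range.
MARKER = re.compile(r"\[(\d+)\](?:\s*-\s*\[(\d+)\])?")

# A whole run of markers joined by "-" or ",", including the leading space.
MARKER_GROUP = re.compile(r"\s*\[\d+\](?:\s*[-,]\s*\[\d+\])*")


def cited_references(sentence):
    """Return the sorted reference numbers cited in a sentence."""
    refs = set()
    for match in MARKER.finditer(sentence):
        start = int(match.group(1))
        # A missing second group means a single reference, not a range.
        end = int(match.group(2)) if match.group(2) else start
        refs.update(range(start, end + 1))
    return sorted(refs)


def strip_markers(sentence):
    """Return the sentence text with its citation markers removed."""
    return MARKER_GROUP.sub("", sentence)
```

Applied to row 4 of Table 1, `cited_references` expands the range `[1]-[3]` and keeps `[20]` as a single reference, while `strip_markers` leaves only the sentence text.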
3.2 Framework of CitationAS
Figure 1. Framework of CitationAS.
3.3 Key Modules and Technology of CitationAS
Table 2 Top 20 Phrases by Frequency.
Phrase (Frequency) | Phrase (Frequency) |
---|---|
cell line (37507) | reactive oxygen species (5160) |
gene expression (37001) | central nervous system (4418) |
amino acid (35165) | smooth muscle cell (3439) |
transcription factor (25626) | protein protein interaction (3286) |
cancer cell (25605) | single nucleotide polymorphism (2535) |
stem cell (22567) | tumor necrosis factor (2482) |
growth factor (17531) | genome wide association (2386) |
signaling pathway (16597) | case control study (2269) |
cell proliferation (14203) | false discovery rate (2209) |
meta analysis (12647) | innate immune response (2133) |
from each cluster, we design the following methods to calculate a sentence score. The higher the score, the higher the sentence ranks in the paragraph.
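The scoring formulas themselves are not reproduced in this section. Tables 5 and 6 name TF-IDF and MMR as the sentence-ranking methods, so the following is a minimal sketch of those two generic techniques, not the CitationAS implementation. The tokenizer, the smoothed IDF, the use of a query string as the relevance target, the trade-off weight `lam`, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(sentence):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    tokens = (w.strip(".,;:()[]").lower() for w in sentence.split())
    return [t for t in tokens if t]


def tfidf_vectors(sentences):
    """Return one {term: weight} dict per sentence (each sentence is a document)."""
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    # Document frequency: number of sentences containing each term.
    df = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({
            # Smoothed IDF (assumed) keeps every weight strictly positive.
            t: (c / total) * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t, c in doc.items()
        })
    return vectors


def tfidf_scores(sentences):
    """Score each sentence by the sum of its TF-IDF term weights."""
    return [sum(vec.values()) for vec in tfidf_vectors(sentences)]


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mmr_rank(sentences, query, lam=0.7):
    """Return sentence indices in Maximal Marginal Relevance order."""
    vectors = tfidf_vectors(sentences + [query])
    query_vec, sent_vecs = vectors[-1], vectors[:-1]
    remaining = list(range(len(sentences)))
    ranked = []
    while remaining:
        def mmr(i):
            relevance = cosine(sent_vecs[i], query_vec)
            # Redundancy: similarity to the closest already-selected sentence.
            redundancy = max(
                (cosine(sent_vecs[i], sent_vecs[j]) for j in ranked),
                default=0.0,
            )
            return lam * relevance - (1 - lam) * redundancy
        best = max(remaining, key=mmr)
        ranked.append(best)
        remaining.remove(best)
    return ranked
```

In this sketch, `lam=1.0` ranks purely by relevance to the query, while smaller values push near-duplicate sentences further down the ranking.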
4 Experiments and Results Analysis
4.1 Evaluation Method
Table 3 Evaluation Standards.
Score | Evaluation standards |
---|---|
5 | Sentences are very smooth. Paragraphs and surveys are very comprehensive, contain very little redundancy, and fully reflect the retrieval topics. The logical structure of the survey is reasonable. |
4 | Sentences are relatively smooth. Paragraphs and surveys are relatively comprehensive, contain relatively little redundancy, and reflect the retrieval topics relatively well. The logical structure of the survey is relatively reasonable. |
3 | Sentences are basically smooth. Paragraphs and surveys are basically comprehensive, contain some redundancy, and basically reflect the retrieval topics. The logical structure of the survey is basically reasonable. |
2 | Sentences are not smooth enough. Paragraphs and surveys are not comprehensive, contain relatively high redundancy, and do not adequately reflect the retrieval topics. The logical structure of the survey is confusing. |
1 | Sentences are far from smooth. Paragraphs and surveys are far from comprehensive, contain very high redundancy, and cannot fully reflect the retrieval topics. The survey has no logical structure. |
4.2 User Interface of CitationAS
Figure 2. User Interface of CitationAS.
4.3 Results Analysis
Table 4 Topic Distribution in Dataset.
Topic No. | Topic words |
---|---|
1 | protein, domain, binding, structure, membrane, residue, acid, interaction, site, amino |
2 | disease, patient, increase, risk, study, disorder, chronic, factor, blood, clinical |
3 | bacteria, gene, strain, plant, resistance, species, report, found, host, pathogen |
4 | study, health, patient, year, hiv, treatment, country, population, report, clinical |
5 | gene, sequence, data, analysis, based, identified, study, expression, number, region |
6 | model, method, data, based, test, analysis, value, number, calculated, approach |
7 | cell, expression, tissue, mice, differentiation, development, human, stem, bone, mouse |
8 | acid, increase, level, activity, glucose, concentration, stress, enzyme, insulin, effect |
9 | study, process, task, response, visual, effect, memory, information, social, related |
10 | cell, signalling, pathway, activation, receptor, role, factor, protein, expression, apoptosis |
Table 5 Rankings of the Six Methods by Two Volunteers.
Ranking | Volunteer A | Volunteer B |
---|---|---|
1 | STC-TF-IDF | Lingo-TF-IDF |
2 | Lingo-TF-IDF | Lingo-MMR |
3 | STC-MMR | STC-MMR |
4 | Lingo-MMR | STC-TF-IDF |
5 | bisecting K-means-MMR | bisecting K-means-MMR |
6 | bisecting K-means-TF-IDF | bisecting K-means-TF-IDF |
Table 6 Rankings of the Six Methods by Two Volunteers when Sentence Location Is Considered.
Ranking | Volunteer A | Volunteer B |
---|---|---|
1 | Lingo-MMR | Lingo-MMR |
2 | Lingo-TF-IDF | Lingo-TF-IDF |
3 | STC-TF-IDF | STC-MMR |
4 | STC-MMR | STC-TF-IDF |
5 | bisecting K-means-TF-IDF | bisecting K-means-MMR |
6 | bisecting K-means-MMR | bisecting K-means-TF-IDF |
Figure 3. Average Scores of Different Methods.