1 Introduction
Figure 1. Structured Model information as part of the research contribution highlights of a scholarly article (Lample et al., 2016) in the NlpContributionGraph scheme.
2 Related work
3 The NlpContributionGraph scheme: Preliminaries
3.1 Twelve information unit nodes
4 The NlpContributionGraph scheme: Annotation exercise
4.1 Stage 1: Pilot annotations
Table 1 Two examples illustrating the three different granularities of NlpContributionGraph data instances (viz., a. sentences, b. phrases, and c. triples), modeled for the Results information unit from a scholarly article (Cho et al., 2014).
[1a. sentence 159] As expected, adding features computed by neural networks consistently improves the performance over the baseline performance.
[1b. phrases from sentence 159] {adding features, computed by, neural networks, improves the performance, over baseline performance}
[1c. triples from entities above] {(Contribution, has, Results), (Results, improves the performance, adding features), (adding features, computed by, neural networks), (Results, improves the performance, over baseline performance)}

[2a. sentence 160] The best performance was achieved when we used both CSLM and the phrase scores from the RNN Encoder-Decoder.
[2b. phrases from sentence 160] {best performance was achieved, used both CSLM and the phrase scores, from, RNN Encoder-Decoder}
[2c. triples from entities above] {(Contribution, has, Results), (Results, best performance was achieved, used both CSLM and the phrase scores), (used both CSLM and the phrase scores, from, RNN Encoder-Decoder)}
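To make the three granularities concrete, the following is a minimal sketch (ours, not part of the paper's annotation tooling) that represents example 1 above as plain Python data; the `instance` dict layout and the printing loop are illustrative assumptions, while the sentence, phrases, and triples are taken verbatim from Table 1.

```python
# A minimal sketch of one NlpContributionGraph data instance at its three
# granularities: the sentence, its phrases, and the (subject, predicate,
# object) triples built from those phrases.

instance = {
    "sentence": (
        "As expected, adding features computed by neural networks "
        "consistently improves the performance over the baseline performance."
    ),
    "phrases": [
        "adding features",
        "computed by",
        "neural networks",
        "improves the performance",
        "over baseline performance",
    ],
    "triples": [
        ("Contribution", "has", "Results"),
        ("Results", "improves the performance", "adding features"),
        ("adding features", "computed by", "neural networks"),
        ("Results", "improves the performance", "over baseline performance"),
    ],
}

# Every triple chains back to the root "Contribution" node, so the
# contribution graph can be assembled by simply collecting the edges.
for subj, pred, obj in instance["triples"]:
    print(f"({subj}) -[{pred}]-> ({obj})")
```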
4.2 Stage 2: Adjudication annotations
Figure 2. Functional workflow of the annotation process to obtain the NlpContributionGraph data. |
4.2.1 The NCG scheme's five general annotation guidelines
Figure 3. Illustration of annotation guideline 5: forming triples without incorrectly repeating the extracted phrases. This Results IU is modeled from the research paper by Wang et al. (2018). If the phrases “in terms of” and “F1 measure” were modeled by sentence word order, they would have to be repeated under both the “ACE datasets” and “GENIA dataset” scientific terms. To avoid this repetition, they are annotated at the top of the triple hierarchy even though they appear at the end of the sentence.
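To make guideline 5 concrete, the sketch below contrasts the two triple placements described in the caption. The triples are our reconstruction from the caption alone; the `on` predicate and the exact hierarchy are assumptions, not the paper's actual annotation.

```python
# Guideline 5, illustrated with reconstructed triples (assumed structure).

repeated = [  # incorrect: "in terms of"/"F1 measure" duplicated per dataset
    ("Results", "on", "ACE datasets"),
    ("ACE datasets", "in terms of", "F1 measure"),
    ("Results", "on", "GENIA dataset"),
    ("GENIA dataset", "in terms of", "F1 measure"),
]

hoisted = [  # correct: the shared phrases appear once, at the top of the hierarchy
    ("Results", "in terms of", "F1 measure"),
    ("F1 measure", "on", "ACE datasets"),
    ("F1 measure", "on", "GENIA dataset"),
]
```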
5 The NlpContributionGraph scheme: Evaluating the annotations
5.1 Raw data and preprocessing tools
5.2 Annotated corpus statistics
Table 2 Annotated corpus characteristics for our trial dataset of 50 NLP articles annotated using the NlpContributionGraph model. “ann” stands for annotated and “IU” for information unit. The 50 articles are uniformly distributed across five NLP subfields, characterized at the sentence- and token-level granularities as follows: machine translation (MT): 2,596 total sentences, 9,581 total tokens; named entity recognition (NER): 2,295 sentences, 8,703 tokens; question answering (QA): 2,511 sentences, 10,305 tokens; relation classification (RC): 1,937 sentences, 10,020 tokens; text classification (TC): 2,071 sentences, 8,345 tokens.
| | MT | NER | QA | RC | TC | Overall |
|---|---|---|---|---|---|---|
| total IUs | 38 | 43 | 44 | 45 | 46 | 216 |
| ann Sentences | 209 | 157 | 176 | 194 | 164 | 900 |
| avg ann Sentences | 0.081 | 0.068 | 0.070 | 0.100 | 0.079 | - |
| ann Phrases | 956 | 770 | 960 | 978 | 1,038 | 4,702 |
| avg Toks per Phrase | 2.81 | 2.87 | 2.76 | 2.91 | 2.70 | - |
| avg ann Phrase Toks | 0.28 | 0.25 | 0.26 | 0.28 | 0.34 | - |
| ann Triples | 590 | 504 | 619 | 620 | 647 | 2,980 |
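The derived “avg” rows in Table 2 are normalizations of the annotated counts. The following sanity check spells out the assumption that reproduces the reported values: “avg ann Sentences” is the annotated-sentence count divided by the subfield's total sentences, and “avg ann Phrase Toks” is the annotated phrase tokens divided by the subfield's total tokens. This is our reading of the table, not code from the paper.

```python
# Recomputing Table 2's "avg" rows from the raw counts above and the
# per-subfield totals given in the caption (an assumption that matches
# the reported numbers).

totals = {  # subfield: (total sentences, total tokens)
    "MT": (2596, 9581), "NER": (2295, 8703), "QA": (2511, 10305),
    "RC": (1937, 10020), "TC": (2071, 8345),
}
ann_sentences = {"MT": 209, "NER": 157, "QA": 176, "RC": 194, "TC": 164}
ann_phrases = {"MT": 956, "NER": 770, "QA": 960, "RC": 978, "TC": 1038}
avg_toks_per_phrase = {"MT": 2.81, "NER": 2.87, "QA": 2.76, "RC": 2.91, "TC": 2.70}

for field, (n_sents, n_toks) in totals.items():
    frac_sents = ann_sentences[field] / n_sents   # e.g. MT: 209 / 2596 ~ 0.081
    phrase_toks = ann_phrases[field] * avg_toks_per_phrase[field]
    frac_toks = phrase_toks / n_toks              # e.g. MT: ~2686 / 9581 ~ 0.28
    print(f"{field}: avg ann Sentences = {frac_sents:.3f}, "
          f"avg ann Phrase Toks = {frac_toks:.2f}")
```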
Table 3 Annotated corpus statistics for the 12 information units in the NlpContributionGraph scheme, ordered by decreasing ratio of triples to papers.
| Information Unit | No. of triples | No. of papers | Ratio of triples to papers |
|---|---|---|---|
| Experiments | 168 | 3 | 56.00 |
| Tasks | 277 | 8 | 34.63 |
| ExperimentalSetup | 300 | 16 | 18.75 |
| Model | 561 | 32 | 17.53 |
| Hyperparameters | 254 | 15 | 16.93 |
| Results | 688 | 42 | 16.38 |
| Approach | 283 | 18 | 15.72 |
| Baselines | 148 | 10 | 14.80 |
| AblationAnalysis | 155 | 13 | 11.92 |
| Dataset | 8 | 1 | 8.00 |
| ResearchProblem | 169 | 50 | 3.38 |
| Code | 9 | 9 | 1.00 |
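The last column and the row order of Table 3 follow directly from the first two columns, as a short check confirms (ours, not the paper's code):

```python
# Recompute the triples-per-paper ratio for each information unit and
# reproduce Table 3's descending order.

units = {  # information unit: (no. of triples, no. of papers)
    "Experiments": (168, 3), "Tasks": (277, 8), "ExperimentalSetup": (300, 16),
    "Model": (561, 32), "Hyperparameters": (254, 15), "Results": (688, 42),
    "Approach": (283, 18), "Baselines": (148, 10), "AblationAnalysis": (155, 13),
    "Dataset": (8, 1), "ResearchProblem": (169, 50), "Code": (9, 9),
}

for name, (triples, papers) in sorted(units.items(),
                                      key=lambda kv: -kv[1][0] / kv[1][1]):
    print(f"{name}: {triples / papers:.2f} triples per paper")
```

Note the contrast the ratios expose: ResearchProblem occurs in all 50 papers but with few triples each, while Experiments occurs in only 3 papers yet with the densest triple sets.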
5.3 Intra-annotation agreement measures
Table 4 Intra-annotation evaluation results: the NlpContributionGraph pilot-stage annotations evaluated against the adjudicated gold-standard annotations on the trial dataset. P, R, and F1 denote precision, recall, and F1 score at each of the four evaluation granularities: information units (IU), sentences, phrases, and triples.
| # | Task | IU P | IU R | IU F1 | Sent. P | Sent. R | Sent. F1 | Phrase P | Phrase R | Phrase F1 | Triple P | Triple R | Triple F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | MT | 66.66 | 73.68 | 70.00 | 66.67 | 54.55 | 60.00 | 37.47 | 30.96 | 33.91 | 19.73 | 17.46 | 18.53 |
| 2 | NER | 79.55 | 81.40 | 80.46 | 60.89 | 69.43 | 64.88 | 44.09 | 42.60 | 43.34 | 22.34 | 21.63 | 21.98 |
| 3 | QA | 93.18 | 93.18 | 93.18 | 67.96 | 79.55 | 73.30 | 54.04 | 45.21 | 49.23 | 37.50 | 32.00 | 34.52 |
| 4 | RC | 70.21 | 73.33 | 71.74 | 64.64 | 60.31 | 62.40 | 35.31 | 29.24 | 32.00 | 12.59 | 11.45 | 11.99 |
| 5 | TC | 86.67 | 84.78 | 85.71 | 75.44 | 78.66 | 77.01 | 54.77 | 45.38 | 49.63 | 27.41 | 22.41 | 24.66 |
| Cum. | micro | 78.83 | 80.65 | 79.73 | 67.25 | 67.63 | 67.44 | 45.36 | 38.83 | 41.84 | 23.76 | 20.97 | 22.28 |
| Cum. | macro | 78.80 | 80.49 | 79.64 | 67.33 | 68.51 | 67.92 | 45.20 | 38.91 | 41.82 | 23.87 | 20.95 | 22.31 |
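The two “Cum.” rows aggregate the five tasks differently: micro pools the underlying match counts before scoring, while macro averages the per-task scores. Since Table 4 reports only the resulting P/R/F1, the sketch below illustrates the distinction with hypothetical per-task counts; the numbers are ours, chosen purely for demonstration.

```python
# Micro vs. macro averaging of precision/recall/F1, with hypothetical
# per-task (true positive, false positive, false negative) counts.

tasks = {  # task: (TP, FP, FN) -- illustrative numbers only
    "MT": (28, 14, 10), "NER": (35, 9, 8), "QA": (41, 3, 3),
    "RC": (33, 14, 12), "TC": (39, 6, 7),
}

def prf(tp, fp, fn):
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    return p, r, 2 * p * r / (p + r)

# Micro: pool the counts over all tasks, then compute P/R/F1 once.
tp, fp, fn = (sum(counts[i] for counts in tasks.values()) for i in range(3))
print("micro:", prf(tp, fp, fn))

# Macro: compute P/R/F1 per task, then average the five scores.
per_task = [prf(*counts) for counts in tasks.values()]
n = len(per_task)
print("macro:", tuple(sum(s[i] for s in per_task) / n for i in range(3)))
```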
6 The NlpContributionGraph scheme: Practical use case
6.1 Leveraging the Open Research Knowledge Graph framework
Figure 4. Annotated data from the paper “Sentence similarity learning by lexical decomposition and composition,” modeled under the Results information unit of the NlpContributionGraph scheme.
Figure 5. An Open Research Knowledge Graph paper view. The NlpContributionGraph scheme is employed to model the ResearchProblem and the Results information units of the paper.
Figure 6. Traversal of a Results graph branch in the ORKG down to its last level.
6.2 Automated NLP contribution comparisons
Figure 7. An NlpContributionGraph scheme data integration use case in the Open Research Knowledge Graph digital library: an automatically generated survey over part of a knowledge graph of scholarly contributions from four articles annotated with the NlpContributionGraph scheme proposed in this work. The comparison was customized in the Open Research Knowledge Graph framework to focus only on the Results information unit (it is accessible online at https://www.orkg.org/orkg/c/kM2tUq).
7 Conclusions and future directions
Acknowledgments
Author contributions
Appendix
1. NlpContributionGraph: Parent Node Names
Figure 8. Illustration of a parent node named ‘character-level LSTM’ serving as a conceptual reference selected from the article's running text, as opposed to the section names. The figure is part of the contribution from the article (B. Wang et al., 2018). When such encapsulation exists, coreference is applied for the child-node nesting (consider the coreference between ‘we incorporate a character-level LSTM to capture’ in sentence 1 and ‘this character-level component can also help’ in sentence 2).
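A minimal sketch of the nesting the caption describes, using illustrative phrase fragments from the two quoted sentences; the dict layout is our own, not the scheme's serialization format.

```python
# Parent-node nesting via coreference (our own layout, not the scheme's
# serialization): 'character-level LSTM' is named from the running text,
# and the phrases of sentence 2 attach under the same parent because
# "this character-level component" corefers with the LSTM of sentence 1.

model_unit = {
    "Contribution": {
        "Model": {
            "character-level LSTM": {
                # child phrases contributed by sentence 1 directly ...
                "sentence 1": ["we incorporate", "to capture"],
                # ... and by sentence 2 via coreference resolution
                "sentence 2": ["can also help"],
            }
        }
    }
}
```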
2. Improved Phrasal Granularity during Adjudication
Figure 9. Panels (a) and (b) depict the modeling of part of a Results information unit from a scholarly article (Ghaddar & Langlais, 2018) in the pilot and the adjudication stages, respectively.