Research Paper

Usage Count: A New Indicator to Detect Research Fronts

  • Guoqiang Liang
  • Haiyan Hou
  • Zhigang Hu
  • Fu Huang
  • Yajie Wang
  • Shanshan Zhang
  • WISE Lab, Dalian University of Technology, Dalian 116023, China
Corresponding authors: Haiyan Hou (E-mail: htieshan@dlut.edu.cn) and Zhigang Hu (E-mail: huzhigang@dlut.edu.cn).

Received date: 2016-09-26

Revised date: 2016-11-27

Accepted date: 2016-12-04

Online published: 2016-11-18

Open Access

Abstract

Purpose

Research fronts build on recent work, but using times cited as a traditional indicator to detect research fronts will inevitably result in a certain time lag. This study explores whether usage count, as a new indicator, can shorten the time lag that classic indicators introduce in research fronts detection.

Design/methodology/approach

An exploratory study was conducted where the new indicator “usage count” was compared to the traditional citation count, “times cited,” in detecting research fronts of the regenerative medicine domain. An initial topic search of the term “regenerative medicine” returned 10,553 records published between 2000 and 2015 in the Web of Science (WoS). We first ranked these records with usage count and times cited, respectively, and selected the top 2,000 records for each. We then performed a co-citation analysis in order to obtain the citing papers of the co-citation clusters as the research fronts. Finally, we compared the average publication year of the citing papers as well as the mean cited year of the co-citation clusters.

Findings

Within the same research front, the citing articles detected by usage count tend to be published more recently than those detected by times cited. Moreover, research fronts detected by usage count tend to fall within the last two years, which indicates higher immediacy and a more real-time character than times cited offers. There is approximately a three-year time span among the mean cited years (known as “intellectual base”) of all clusters generated by usage count and this figure is about four years in the network of times cited. In comparison to times cited, usage count is a dynamic and instant indicator.

Research limitations

We are trying to find the cutting-edge research fronts, but those generated based on co-citations may refer to the hot research fronts. The usage count of older highly cited papers was not taken into consideration, because the usage count indicator released by WoS only reflects usage logs after February 2013.

Practical implications

The article provides a new perspective on using usage count as a new indicator to detect research fronts.

Originality/value

Usage count can greatly shorten the time lag in research fronts detection and could be a promising complementary indicator for detecting the latest research fronts.

Cite this article

Guoqiang Liang, Haiyan Hou, Zhigang Hu, Fu Huang, Yajie Wang, Shanshan Zhang. Usage Count: A New Indicator to Detect Research Fronts[J]. Journal of Data and Information Science, 2017, 2(1): 89-104. DOI: 10.1515/jdis-2017-0005

1 Introduction

Research fronts detection has become the focus of global scientific and technological competition. By detecting and tracking research fronts in a timely and accurate manner, Japan (Kuwahara, 2007; Nagano, 2005) and the US (Porter, Guo, & Chiavatta, 2011) have made significant advancements in science and technology (S&T) policy-making and technological evaluation.
Research front was originally a term used by Price (1965). He concluded that a research front is a small part of earlier literature knitted together by the new year’s crop of papers, and used the phrases “epidermal layer” and “growing tip” to describe research fronts. Since then, scholars have made efforts to identify research fronts from various points of view. Small and Griffith (1974) considered co-citation clusters as research fronts; Vlachý (1984) summarized prior scientometric studies on research fronts detection and pointed out that “science grows from a very thin skin of its research front” and “a core body of seminal literature” constitutes “a sort of epidermal layer, an active research front” (p. 95). Garfield (1994) pointed out that research fronts are co-citation clusters plus citing articles; Morris et al. (2003) applied bibliographic coupling methods to identify research fronts; Shibata et al. (2008) proposed that research fronts are direct citation clusters. Presently, the views of Chen (2006), Braam, Moed, and van Raan (1991), and Persson (1994) are the mainstream in research fronts detection. They concurred that research fronts are clusters of citing papers sharing a common intellectual base.
The research fronts detection method of this paper is in accordance with the views of Chen, Braam et al., and Persson. We treat groups of citing articles that cite clusters of co-cited references as research fronts, and each research front is labeled with the title of the article that cites the most references in the cluster. In co-citation analysis, the usual method is first to set a threshold and find the representative highly cited papers by “times cited,” and then to build a co-citation matrix before clustering the network and identifying the research fronts. However, using times cited as an indicator in research fronts identification is problematic because of its large time lag. It might take up to two years for a paper to become highly cited (Shibata et al., 2008), and the situation varies among disciplines. Besides, times cited is affected by authors’ different citing motivations, article accessibility (Bollen et al., 2005), etc. Therefore, as a traditional indicator, times cited cannot reflect the current interests of the research community. Faced with the fast-paced development of S&T, new methods, tools, and indicators need to be developed to capture research fronts more precisely in order to support S&T policy-making.
Open access journal publishers such as PLoS (http://blogs.plos.org/plos/2009/09/article-level-metrics-at-plos-addition-of-usage-data/) took the lead in providing online usage data for published articles. As more and more publishers chose to disclose the usage data of academic articles, researchers proposed using articles’ usage data as potential complements, perhaps even alternatives, for research evaluation (Das & Mishra, 2014; Yan & Gerstein, 2011). They emphasized the advantages of usage data over citation data, such as ease of access and more convenient data collection. However, there are divergent opinions with regard to whether usage data such as numbers of views and downloads can be used as metrics of research evaluation. Some researchers reported significant correlation between specific usage types (Line & Sandison, 1975), especially downloads and citations, but others (Schloegl & Gorraiz, 2010; 2011) found only moderate or rather low correlation between downloads and citations.
On September 26, 2015, Thomson Reuters started to provide article-level usage data, called “usage count,” on the Web of Science (WoS) platform (Thomson Reuters, 2015). The new indicator, consisting of “U1” and “U2,” reflects the user’s level of interest by marking an article when it is read or downloaded by researchers. U1 is the number of times the full text of a record has been accessed or saved within the last 180 days. U2 is the number of times the full text of a record has been accessed or saved since February 1, 2013. The usage count is updated every second day and is triggered by a user’s full-text request for an article or by an export to bibliographic management tools (or to formats for later import into such tools); it therefore does not have to wait for the lengthy submission and publication process that times cited depends on.
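To make the two windows concrete, the following minimal sketch counts usage events under the definitions of U1 and U2 given above. The event log, the `usage_counts` helper, and the inclusive window boundaries are assumptions made for illustration only; the sketch is not a description of how WoS computes the counts internally.

```python
from datetime import date, timedelta

# U2 accumulates from this date onward, as stated in the text.
U2_START = date(2013, 2, 1)

def usage_counts(event_dates, today):
    """Return (U1, U2) for one record from a hypothetical list of usage dates.

    U1 counts events in the last 180 days, U2 counts events since 2013-02-01.
    Treating both window boundaries as inclusive is an assumption.
    """
    window_start = today - timedelta(days=180)
    u1 = sum(1 for d in event_dates if window_start <= d <= today)
    u2 = sum(1 for d in event_dates if U2_START <= d <= today)
    return u1, u2

# Hypothetical usage events for a single record (not data from the study).
events = [date(2012, 12, 1), date(2014, 5, 1), date(2015, 9, 1), date(2015, 11, 20)]
print(usage_counts(events, today=date(2015, 12, 1)))
```

In this toy log the 2012 event predates the U2 start date and is ignored by both counts, while only the two most recent events fall inside the 180-day window.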
There is limited research on the effects of usage data on research evaluation. So far, we have retrieved two relevant articles (Martín-Martín, 2016; Wang, Fang, & Sun, 2016) involving the relationship between WoS usage count and times cited. They discussed the usage patterns of articles and the correlation between usage count and times cited. In their research, Wang, Fang, and Sun (2016) discovered that citations play an important role in determining the usage count for old papers, and that highly cited old papers are more likely to be used even a long time after publication. Following their study, we are curious whether usage count can become a new and more effective indicator for detecting the latest research fronts, which are valuable for decision-makers who want to track the latest research trends and take the lead in scientific competition. In this article, we mainly discuss U1; in the subsequent sections, usage count refers to U1, the number of times the full text of a record has been accessed or saved within the last 180 days. To compare research fronts detected based on different indicators, we use “recentness” as a measure, and we define the recentness of a research front as the average publication year of the citing papers of the co-citation cluster. Our research questions are:
1) What is the difference in recentness of research fronts generated by usage count and times cited?
2) Are there any common research fronts detected by usage count and times cited, and if there are, what about the recentness?
3) What is the difference in recentness of the top 10 highly cited papers selected by usage count and times cited?
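The recentness measure used in these questions reduces to a simple mean. A minimal sketch, assuming the publication years of a cluster's citing papers are already available as a list (the `recentness` function name and the example years are hypothetical):

```python
from statistics import mean

def recentness(citing_years):
    """Average publication year of the citing papers of one co-citation cluster."""
    if not citing_years:
        raise ValueError("a research front needs at least one citing paper")
    return mean(citing_years)

# Hypothetical publication years of five citing papers (not data from the study).
example_years = [2012, 2013, 2013, 2014, 2015]
print(round(recentness(example_years), 2))
```

Each citing paper contributes once, regardless of how many references in the cluster it cites.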

2 Data and Methodology

2.1 Data Source

This article takes the regenerative medicine domain as an example to retrieve data. After using the topic search “regenerative medicine” in titles, abstracts, or keywords (including Keywords Plus), and filtering out less representative record types, such as proceedings papers, meeting abstracts, news items, letters, etc., a total of 10,545 records dated between 2000 and 2015 were downloaded from WoS. We then took the top 2,000 records under each ranking, once sorted by times cited and once sorted by usage count. We regard these records as a representative dataset that can reflect the total records downloaded. The exported data format, search strategy, indexes, time span, and document types remained consistent for the two indicators.
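The selection step can be sketched as follows. The record fields (`times_cited`, `usage_count`) and the `top_n` helper are hypothetical names chosen for illustration, and the four toy records stand in for the 10,545 downloaded records with n = 2,000.

```python
def top_n(records, key, n):
    """Return the n records with the largest value of `key` (ties keep input order)."""
    return sorted(records, key=lambda r: r[key], reverse=True)[:n]

# Hypothetical toy records, not data from the study.
records = [
    {"id": "A", "year": 2006, "times_cited": 900, "usage_count": 12},
    {"id": "B", "year": 2014, "times_cited": 15, "usage_count": 80},
    {"id": "C", "year": 2010, "times_cited": 300, "usage_count": 40},
    {"id": "D", "year": 2015, "times_cited": 2, "usage_count": 65},
]

by_times_cited = [r["id"] for r in top_n(records, "times_cited", 2)]
by_usage_count = [r["id"] for r in top_n(records, "usage_count", 2)]
print(by_times_cited, by_usage_count)
```

In this toy example the two rankings select disjoint sets of records, with the usage-count selection skewing toward later publication years.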

2.2 Methodology

Co-citation analysis is a typical bibliometric method, initially proposed by Small and Griffith (1974). Articles are clustered together based on their co-occurrence in the reference lists of papers. In other words, if articles A and B are both cited by article C, they are more likely to belong to the same research field and to share similar topics or methods.
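Counting co-citations from reference lists can be sketched in a few lines. The `cocitation_counts` helper and the toy reference lists are illustrative assumptions, not the procedure implemented in the software used in this study.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(reference_lists):
    """Count, for each unordered pair of references, how many papers cite both."""
    counts = Counter()
    for refs in reference_lists:
        # De-duplicate and sort so each pair has one canonical (a, b) form.
        for a, b in combinations(sorted(set(refs)), 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical citing papers C1..C3 and the references they cite.
papers = {
    "C1": ["A", "B"],
    "C2": ["A", "B", "D"],
    "C3": ["B", "D"],
}
counts = cocitation_counts(papers.values())
print(dict(counts))
```

Here A and B are co-cited by two papers, so they would be linked more strongly than A and D, which are co-cited only once.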
To identify clusters which represent the intellectual bases, spectral clustering algorithms have been used in this paper. Spectral clustering is a clustering method that uses eigenvectors of an affinity matrix derived from the data (Dhillon, 2004), and results derived by spectral clustering often outperform the traditional algorithms such as k-means or single linkage (von Luxburg, 2007). Besides, spectral clustering is easy to implement. This study uses CiteSpace 4.0.R5 (Chen, 2006) to identify the co-citation clusters and furthermore to find the citing articles of each cluster as well as the research front.
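As a rough illustration of the idea (not a description of the exact variant implemented in CiteSpace, which the text does not specify), one common formulation of unnormalized spectral clustering discussed by von Luxburg (2007) can be written as follows. The symbols $W$, $D$, $L$, $U$, $n$, and $k$ are notation introduced here for the sketch.

```latex
\begin{align*}
W &= (w_{ij}) \in \mathbb{R}^{n \times n}, \quad w_{ij} = w_{ji} \ge 0
  && \text{(affinity matrix, e.g. co-citation similarity)} \\
D &= \operatorname{diag}(d_1, \dots, d_n), \quad d_i = \sum_{j=1}^{n} w_{ij}
  && \text{(degree matrix)} \\
L &= D - W
  && \text{(unnormalized graph Laplacian)} \\
U &= [\,u_1, \dots, u_k\,], \quad L u_m = \lambda_m u_m, \quad \lambda_1 \le \dots \le \lambda_k
  && \text{(eigenvectors of the $k$ smallest eigenvalues)}
\end{align*}
```

Each reference $i$ is then represented by the $i$-th row of $U$, and these rows are grouped into $k$ clusters with a standard algorithm such as k-means.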
There are 84,315 and 113,339 references in dataset “times cited” and dataset “usage count,” respectively. In order to eliminate records that have little relationship with our research and pick out the most frequently used references, we select the top 1% most-cited records within the datasets to be further analyzed. In order to reduce disparities caused by absolute frequencies, we construct the co-citation matrix in terms of cosine coefficients. Additionally, a minimum spanning tree network is used in the software for network pruning in order to hide relatively weak citation links between item i and item j in the matrix, and improve the pruning efficiency as well. We compare the average publication year of citing articles (the recentness) as well as the mean cited year of the co-citation clusters. In this paper, average publication year of citing articles is equal to the mean publication year of the citing publications, and does not take the number of citations into account. Figure 1 presents organization of this study.
Figure 1. Organization and procedure of the present study. RF is the abbreviation for research front and TC for times cited. UC refers to usage count.
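The thresholding and normalization steps described above can be sketched as follows. The cosine coefficient is written in its usual form, the co-citation count divided by the geometric mean of the two citation counts; that the software uses exactly this form, and how it rounds the 1% cutoff, are assumptions of this sketch, as are the helper names.

```python
from math import sqrt

def top_fraction(citation_counts, fraction=0.01):
    """Keep the most-cited share of references (at least one); rounding down is an assumption."""
    k = max(1, int(len(citation_counts) * fraction))
    ranked = sorted(citation_counts.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])

def cosine_coefficient(cocited, cited_i, cited_j):
    """Co-citation count normalized by the geometric mean of the two citation counts."""
    if cited_i <= 0 or cited_j <= 0:
        return 0.0
    return cocited / sqrt(cited_i * cited_j)

# Hypothetical counts: A cited 2 times, B cited 3 times, co-cited 2 times.
print(round(cosine_coefficient(2, 2, 3), 4))

# Hypothetical citation counts for 200 references; the top 1% keeps two of them.
toy_counts = {f"ref{i}": i for i in range(200)}
print(top_fraction(toy_counts))
```

Normalizing in this way keeps a pair of very frequently cited references from dominating the matrix simply because both have large absolute counts.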

3 Results

3.1 Basic Statistic Comparison of Times Cited and Usage Count

We investigate the distribution of citing articles and cited references within the two datasets. Figure 2 shows that the citing articles of usage count have experienced an exponential growth, because researchers prefer to use newly published literature, while Figure 3 indicates a right-skewed distribution for citing articles of times cited. This phenomenon is an indication that it will take several years for a paper to become highly cited. The distribution of cited references in Figures 4 and 5 presents a right-skewed distribution, but the data collected by usage count present a relatively high real-time property when compared with that collected by times cited. Many cited references were gathered from 2010 to 2013 by usage count in comparison with 2005-2008 by times cited.
Figure 2. Yearly frequency distribution of citing articles with threshold of usage count. Y represents the usage count and R² reflects the goodness-of-fit of this equation. A value closer to 1 indicates a satisfactory goodness-of-fit.
Figure 3. Yearly frequency distribution of citing articles with threshold of times cited.
Figure 4. Yearly frequency distribution of cited references with threshold of usage count. The time span of cited references is 1638-2015, and the study integrated the data of 1638-1990 to facilitate reading.
Figure 5. Yearly frequency distribution of cited references with threshold of times cited. The time span of cited references is 1632-2014, and the study integrated the data of 1632-1990 to facilitate reading.
Compared with usage count, the data collected by times cited lag by approximately four years in the mean publication year of citing articles and by approximately three years in the mean publication year of cited references. As shown in Table 1, the recentness is 2009.0 for times cited and 2013.3 for usage count (a difference of 4.3 years). The mean publication year of cited references is 2002.6 for times cited and 2005.7 for usage count (a difference of 3.1 years).
Table 1 Overview of the dataset.
Measure | Times cited | Usage count
No. of cited references | 84,315 | 113,339
Recentness | 2009.0 | 2013.3
Mean publication year of cited references | 2002.6 | 2005.7

Note. Recentness means the average publication year of citing papers.

3.2 Comparison of the Recentness of Research Fronts

In this section, we compute the recentness of each research front. Table 2 lists the details of the two networks in which the number of articles in each cluster is above five, and the study ranks each cluster by recentness. The method to label the cluster is based on word profiles derived from the title of an article which cites the most co-cited articles within the cluster (Chen, 2006).
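The labeling rule can be illustrated with a short sketch that picks the citing article covering the most references of a cluster. Expressing coverage as the percentage of cluster references cited is an assumption about how such a figure could be computed, and `label_source` and the toy data are hypothetical.

```python
def label_source(cluster_refs, citing_articles):
    """Return (title, coverage %) of the citing article that cites the most cluster members.

    Coverage is taken here as the share of the cluster's references cited by
    that article, which is an assumption made for this sketch.
    """
    cluster = set(cluster_refs)
    best_title, best_hits = None, 0
    for title, refs in citing_articles.items():
        hits = len(cluster & set(refs))
        if hits > best_hits:
            best_title, best_hits = title, hits
    if best_title is None:
        return None, 0.0
    return best_title, 100.0 * best_hits / len(cluster)

# Hypothetical cluster of four references and two citing articles.
cluster = ["A", "B", "C", "D"]
citing = {
    "Paper X": ["A", "B", "Z"],
    "Paper Y": ["A", "B", "C"],
}
print(label_source(cluster, citing))
```

The cluster label would then be derived from the title of the returned article.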
Tables 2 and 3 show that 20 research fronts are detected by times cited and 26 by usage count. For a majority of the clusters detected by usage count, the recentness falls within the last four years, whereas the clusters detected by times cited are less recent. The mean recentness across clusters is 2011.59 for usage count and 2009.07 for times cited.
Table 2 Recentness of the research fronts detected by times cited.
Clusters | No. of references | Mean cited year | No. of citing articles | Recentness
Emerging peptide nanomedicine | 29 | 2008 | 54 | 2010.44
Pluripotent stem cell | 25 | 2007 | 56 | 2008.89
Adipose-derived stem cell | 25 | 2003 | 83 | 2010.14
Somatic cell | 22 | 2008 | 52 | 2009.33
Mesenchymal stem cell | 22 | 2003 | 55 | 2010.07
Induced pluripotent stem cell | 22 | 2008 | 41 | 2009.05
Embryonic stem cell | 22 | 2004 | 60 | 2010.08
Organ level tissue engineering | 21 | 2008 | 50 | 2009.80
Mesenchymal stromal cell | 21 | 2005 | 52 | 2009.33
Synthetic hydrogels | 21 | 2002 | 48 | 2010.45
Hippo pathway | 20 | 2008 | 52 | 2010.69
Human induced pluripotent stem cell | 19 | 2009 | 64 | 2010.36
Human embryonic stem cells | 18 | 2007 | 40 | 2008.05
Regenerative biology | 18 | 2001 | 51 | 2010.25
Marrow-derived mesenchymal cell | 18 | 2001 | 38 | 2007.34
Human wharton | 17 | 2006 | 27 | 2010.00
Genetic modification | 17 | 2002 | 31 | 2007.81
Generation | 16 | 2008 | 40 | 2007.40
Therapeutic application | 16 | 2002 | 41 | 2010.00
Review | 8 | 2000 | 8 | 2002.00
Mean value of all clusters | 19.85 | 2005 | 47.15 | 2009.07

Note. Recentness means the average publication year of citing papers.

Table 3 Recentness of the research fronts detected by usage count.
Clusters | No. of references | Mean cited year | No. of citing articles | Recentness
Clinic | 5 | 2012 | 39 | 2014.35
Whole organ engineering | 25 | 2011 | 39 | 2013.77
Expansion | 22 | 2011 | 55 | 2013.46
Hydrogel | 27 | 2010 | 52 | 2013.32
Overview | 26 | 2010 | 36 | 2013.21
Extracellular vesicle | 26 | 2010 | 49 | 2013.02
Regulating stem cell fate | 23 | 2010 | 61 | 2013.00
Induction | 23 | 2010 | 44 | 2013.00
Induced pluripotent stem cell differentiation | 21 | 2010 | 38 | 2013.00
Carbon nanotube | 19 | 2010 | 40 | 2012.72
Human pluripotent stem | 19 | 2010 | 22 | 2012.60
Stem cell application | 26 | 2009 | 39 | 2012.28
Peptide | 24 | 2009 | 32 | 2012.05
Porous scaffold | 21 | 2009 | 42 | 2011.81
Poly | 5 | 2009 | 35 | 2011.50
Layer | 27 | 2008 | 66 | 2011.49
Biomedicine | 25 | 2008 | 57 | 2011.31
Induced pluripotent stem cell | 25 | 2008 | 36 | 2011.13
Glycosaminoglycan-binding substratum | 24 | 2008 | 36 | 2011.09
Pro-angiogenic properties | 26 | 2007 | 27 | 2010.96
Nanotechnologies | 16 | 2007 | 34 | 2010.23
Water filtration | 31 | 2005 | 51 | 2010.20
Supramolecular design | 25 | 2005 | 20 | 2010.02
Biodegradable hydrogel | 24 | 2005 | 8 | 2009.90
Present status | 21 | 2004 | 20 | 2009.37
Biological characterization | 5 | 1998 | 3 | 2002.67
Mean value of all clusters | 21.58 | 2008.19 | 37.73 | 2011.59

Note. Recentness means the average publication year of citing papers.

As indicated in Tables 2 and 3, a majority of the research fronts generated by usage count are more recent than those generated by times cited. Usage count reflects researchers’ level of interest within the last 180 days, and most researchers pay more attention to achievements published within the previous two to three years in order to stay in step with their colleagues and keep abreast of what is going on across scholarly communities. In addition, usage count captures users’ full-text access behavior immediately, whereas times cited has to wait for the lengthy publishing process. It is therefore reasonable that a large proportion of the citing articles in each cluster created by usage count are newly published. We also observed that the articles detected by usage count were published almost two years later than those detected by times cited, because newly published citing articles are prone to cite recent achievements, given the rapid knowledge updates in the regenerative medicine field.
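The mean rows of Tables 2 and 3 allow a quick arithmetic check of how far each indicator's research fronts sit from their intellectual bases, and of how far apart the two sets of fronts are on average. The sketch below uses only those mean values; the `base_to_front_gap` helper name is hypothetical.

```python
# Mean rows of Tables 2 and 3: (mean cited year, mean recentness).
cluster_means = {
    "times cited": (2005.0, 2009.07),
    "usage count": (2008.19, 2011.59),
}

def base_to_front_gap(mean_cited_year, recentness):
    """Years between a cluster's intellectual base and its citing front."""
    return round(recentness - mean_cited_year, 2)

gaps = {name: base_to_front_gap(*vals) for name, vals in cluster_means.items()}
front_shift = round(
    cluster_means["usage count"][1] - cluster_means["times cited"][1], 2
)

print(gaps)         # gap between intellectual base and research front, per indicator
print(front_shift)  # how much later the usage-count fronts are on average
```

The per-indicator gaps come out at roughly four years for times cited and somewhat over three years for usage count.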

3.3 Comparison of the Recentness of the Common Research Fronts

The two indicators both detect the induced pluripotent stem cell (IPSc) as one of the research fronts in the regenerative medicine field. We calculated all the citing articles of the IPSc field detected by the two indicators (Table 4 only lists the top 10 citing articles), to compare the recentness of the common research fronts detected by usage count and times cited. There are 55 citing articles in the IPSc field detected by times cited and 22 by usage count. The recentness in the IPSc field created by usage count is 2011.09 and 2010.07 by times cited. Moreover, the two indicators found seven common papers in the IPSc field.
Table 4 Comparison of the top 10 citing papers of the common research front.
Detected by times cited
Coverage (%) | Citing article | Publishing year
55 | Wang, Y. A transcriptional roadmap to the induction of pluripotency in somatic cells. | 2010
50 | Kiskinis, E. Progress toward the clinical application of patient-specific pluripotent stem cells. | 2010
50 | Li, W.L. Small molecules that modulate embryonic stem cell fate and somatic cell reprogramming. | 2010
50 | Masip, M. Reprogramming with defined factors: From induced pluripotency to induced transdifferentiation. | 2010
45 | Warren, L. Highly efficient reprogramming to pluripotency and directed differentiation of human cells with synthetic modified mRNA. | 2010
41 | Cox, J.L. Induced pluripotent stem cells: What lies beyond the paradigm shift. | 2010
41 | Lengner, C.J. IPS cell technology in regenerative medicine. | 2010
41 | Tamaoki, N. Dental pulp cells for induced pluripotent stem cell banking. | 2010
36 | Chun, Y.S. Applications of patient-specific induced pluripotent stem cells; focused on disease modeling, drug screening and therapeutic potentials for liver disease. | 2010
36 | Nakagawa, M. Promotion of direct reprogramming by transformation-deficient myc. | 2010
Recentness: 2010.07

Detected by usage count
Coverage (%) | Citing article | Publishing year
40 | Patel, M. Advances in reprogramming somatic cells to induced pluripotent stem cells. | 2010
32 | Warren, L. Highly efficient reprogramming to pluripotency and directed differentiation of human cells with synthetic modified mRNA. | 2010
24 | Ben-David, U. The tumorigenicity of human embryonic and induced pluripotent stem cells. | 2011
20 | Lister, R. Hotspots of aberrant epigenomic reprogramming in human induced pluripotent stem cells. | 2011
20 | Tsuji, O. Therapeutic potential of appropriately evaluated safe-induced pluripotent stem cells for spinal cord injury. | 2010
20 | Wu, S.M. Harnessing the potential of induced pluripotent stem cells for regenerative medicine. | 2011
16 | Zhao, T.B. Immunogenicity of induced pluripotent stem cells. | 2011
12 | Young, R.A. Control of the embryonic stem cell state. | 2011
8 | Burridge, P.W. A universal system for highly efficient cardiac differentiation of human induced pluripotent stem cells that eliminates interline variability. | 2011
8 | Klim, J.R. A defined glycosaminoglycan-binding substratum for human pluripotent stem cells. | 2010
Recentness: 2011.09
As illustrated in Table 4, both indicators can detect research fronts in the IPSc field, and the citing articles of the IPSc field detected by usage count tend to be published more recently than those detected by times cited. Takahashi and Yamanaka (2006) introduced IPSc in 2006, showing that the introduction of four specific genes encoding transcription factors could convert adult cells into pluripotent stem cells; Yamanaka was awarded the 2012 Nobel Prize for this work. As the shortage of donor organs for treating end-stage organ failure highlights the need for generating organs from IPSc (Takebe et al., 2013), we can expect that more and more researchers will retrieve and download classical articles in this field, and thus usage count will capture and accumulate the usage logs accordingly. Articles listed in the IPSc field by usage count are therefore expected to be more recently published than those listed by times cited.

3.4 Comparison of the Recentness of the Top 10 Most Highly Cited Papers

In this section, we compare the top 10 most highly cited papers detected by usage count and times cited. Frequency refers to the times cited in the local datasets. The results indicate that four papers in Table 5 are common to both lists. Coincidentally, these four papers all rank within the top five by frequency under both indicators. Moreover, three of the articles detected by times cited were published before 2004, while none of the top 10 detected by usage count was published before 2004.
Table 5 Top 10 highly cited papers detected by usage count and times cited.
Times cited: Frequency | Times cited: Article | Times cited: Publishing year | Usage count: Frequency | Usage count: Article | Usage count: Publishing year
225 | Takahashi, K., Cell, V126, P663 | 2006 | 168 | Takahashi, K., Cell, V131, P861 | 2007
212 | Takahashi, K., Cell, V131, P861 | 2007 | 155 | Engler, A.J., Cell, V126, P677 | 2006
188 | Yu, J.Y., Science, V318, P1917 | 2007 | 110 | Slaughter, B.V., Adv Mater, V21, P3307 | 2009
115 | Engler, A.J., Cell, V126, P677 | 2006 | 107 | Yu, J.Y., Science, V318, P1917 | 2007
113 | Dominici, M., Cytotherapy, V8, P315 | 2006 | 104 | Takahashi, K., Cell, V126, P663 | 2006
106 | Jiang, Y.H., Nature, V418, P41 | 2002 | 77 | Dalby, M.J., Nat Mater, V6, P997 | 2007
99 | Okita, K., Nature, V448, P313 | 2007 | 74 | Ott, H.C., Nat Med, V14, P213 | 2008
91 | Park, I.H., Nature, V451, P141 | 2008 | 63 | Lutolf, M.P., Nat Biotechnol, V23, P47 | 2005
89 | Pittenger, M.F., Science, V284, P143 | 1999 | 61 | Discher, D.E., Science, V324, P1673 | 2009
86 | Thomson, J.A., Science, V282, P1145 | 1998 | 55 | Macchiarini, P., Lancet, V372, P2023 | 2008
Mean year | | 2004.6 | Mean year | | 2007.2

Note. Only the first author and the starting page of these articles are listed.

Table 5 shows that the top 10 most highly cited papers selected by usage count and by times cited are classical articles, but the papers selected by usage count tend to be more recently published than those selected by times cited. The mean publication year of the top 10 is 2004.6 for times cited and 2007.2 for usage count. This indicates an approximate three-year time span among the mean cited years (known as “intellectual base”) of all clusters generated by usage count in the regenerative medicine domain.
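The figures quoted from Table 5 can be re-derived with a short sketch. The dictionaries below transcribe the publication years from Table 5, keyed by first author, volume, and starting page; the variable names are illustrative.

```python
from statistics import mean

# Publication years from Table 5, keyed by "first author, volume, starting page".
times_cited_top10 = {
    "Takahashi, K., V126, P663": 2006,
    "Takahashi, K., V131, P861": 2007,
    "Yu, J.Y., V318, P1917": 2007,
    "Engler, A.J., V126, P677": 2006,
    "Dominici, M., V8, P315": 2006,
    "Jiang, Y.H., V418, P41": 2002,
    "Okita, K., V448, P313": 2007,
    "Park, I.H., V451, P141": 2008,
    "Pittenger, M.F., V284, P143": 1999,
    "Thomson, J.A., V282, P1145": 1998,
}
usage_count_top10 = {
    "Takahashi, K., V131, P861": 2007,
    "Engler, A.J., V126, P677": 2006,
    "Slaughter, B.V., V21, P3307": 2009,
    "Yu, J.Y., V318, P1917": 2007,
    "Takahashi, K., V126, P663": 2006,
    "Dalby, M.J., V6, P997": 2007,
    "Ott, H.C., V14, P213": 2008,
    "Lutolf, M.P., V23, P47": 2005,
    "Discher, D.E., V324, P1673": 2009,
    "Macchiarini, P., V372, P2023": 2008,
}

mean_tc = mean(times_cited_top10.values())
mean_uc = mean(usage_count_top10.values())
common = sorted(set(times_cited_top10) & set(usage_count_top10))
before_2004_tc = sum(1 for y in times_cited_top10.values() if y < 2004)
before_2004_uc = sum(1 for y in usage_count_top10.values() if y < 2004)

print(round(mean_tc, 1), round(mean_uc, 1), round(mean_uc - mean_tc, 1))
print(common)
print(before_2004_tc, before_2004_uc)
```

The difference between the two mean years is 2.6, and the intersection of the two lists contains the four shared papers.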

4 Discussion and Conclusion

The study collects the top 2,000 records ranked by times cited and by usage count to examine whether usage count can serve as a new indicator for detecting research fronts. We find that both indicators can be used to detect research fronts, but usage count detects more recent research fronts than times cited does. In comparing the effects of the two indicators, first, we note that the majority of research fronts generated by usage count tend to be newer than those generated by times cited. Second, we investigate the recentness of a common research front detected by both indicators; the results indicate that the citing articles detected by usage count are more recent than those detected by times cited. Third, we compare the top 10 most highly cited papers detected by usage count and times cited, and find that the top 10 papers selected by usage count are more recent than those selected by times cited. Moreover, we conclude that research fronts detected by usage count tend to be within the last two years and show higher immediacy than those detected by times cited. Usage count can greatly shorten the time lag in research fronts detection and could become a complementary indicator for detecting recent research fronts.
Usage count could be a new indicator for detecting recent research fronts. If paper A is cited frequently within a period of time, the citations are added to its times cited in WoS only once the citing articles are published online. Usage count, by contrast, captures researchers’ preference for various publications within the last 180 days. Generally, researchers prefer to use newly published papers, and therefore the usage data of publications from the last three years will reach a peak while their citations are still relatively few (Wang, Fang, & Sun, 2016). Therefore, the metadata collected by usage count are most likely to describe recent publications. In the research front detection process, cited references are clustered as the intellectual base and the citing articles form the “footprints” of research fronts accordingly. Citation activity can lag behind the publication of an article, and some research domains are slow to be cited. In this sense, there is a relatively larger time lag in research front detection based on times cited.
This paper represents preliminary work on the study of usage count in research fronts detection, and it has some limitations. For instance, the research fronts generated based on co-citations may refer to the hot research fronts, while we are trying to identify the cutting-edge research fronts. The usage count of older highly cited papers was not taken into consideration, because the new usage count indicator released by WoS only reflects usage logs after February 2013. In comparison to times cited, usage count is a dynamic and instant indicator. However, the correlation between usage count and times cited needs to be further discussed in the future.

Author Contributions

H.Y. Hou (htieshan@dlut.edu.cn, corresponding author) and Z.G. Hu (huzhigang@dlut.edu.cn, corresponding author) planned and designed the outline, jointly discussed the findings, and contributed to the final draft. G.Q. Liang (Liang_1988@mail.dlut.edu.cn) proposed the research idea, carried out the data collection and data analysis, and wrote the first draft. F. Huang (hf206@163.com), Y.J. Wang (yjwang55@mail.dlut.edu.cn), and S.S. Zhang (shann1027@sina.com) joined discussion of the findings and put forward valuable suggestions.

The authors have declared that no competing interests exist.

[1]
Bollen J., van de Sompel H., Smith J.A., & Luce R. (2005). Toward alternative metrics of journal impact: A comparison of download and citation data. Information Processing and Management, 41(6), 1419-1440.

[2]
Braam R.R., Moed H.F., & van Raan A.F.J. (1991). Mapping of science by combined co-citation and word analysis. I: Structural aspects. Journal of the American Society for Information Science, 42(4), 233-251.

[3]
Chen C. (2006). CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature. Journal of the American Society for Information Science and Technology, 57(3), 359-377. This article describes the latest development of a generic approach to detecting and visualizing emerging trends and transient patterns in scientific literature. The work makes substantial theoretical and methodological contributions to progressive knowledge domain visualization. A specialty is conceptualized and visualized as a time-variant duality between two fundamental concepts in information science: research fronts and intellectual bases. A research front is defined as an emergent and transient grouping of concepts and underlying research issues. The intellectual base of a research front is its citation and co-citation footprint in scientific literature, an evolving network of scientific publications cited by research-front concepts. Kleinberg's (2002) burst-detection algorithm is adapted to identify emergent research-front concepts. Freeman's (1979) betweenness centrality metric is used to highlight potential pivotal points of paradigm shift over time. Two complementary visualization views are designed and implemented: cluster views and time-zone views. The contributions of the approach are that (a) the nature of an intellectual base is algorithmically and temporally identified by emergent research-front terms, (b) the value of a co-citation cluster is explicitly interpreted in terms of research-front concepts, and (c) visually prominent and algorithmically detected pivotal points substantially reduce the complexity of a visualized network. The modeling and visualization process is implemented in CiteSpace II, a Java application, and applied to the analysis of two research fields: mass extinction (1981-2004) and terrorism (1990-2003). Prominent trends and pivotal points in visualized networks were verified in collaboration with domain experts, who are the authors of pivotal-point articles. Practical implications of the work are discussed. A number of challenges and opportunities for future studies are identified.

[4]
Das A.K., & Mishra S. (2014). The spread of scientific information: Insights from the web usage statistics in PLoS article-level metrics. Journal of Scientometric Research, 3(2), 1-16.

[5]
Dhillon I.S. (2004). Kernel k-means, Spectral clustering and normalized cuts. Compute, Cl(78712), 551-556. Kernel k-means and spectral clustering have both been used to identify clusters that are non-linearly separable in input space. Despite significant research, these methods have remained only loosely related. In this paper, we give an explicit theoretical connection between them. We show the generality of the weighted kernel k-means objective function, and derive the spectral clustering objective of normalized cut as a special case. Given a positive definite similarity matrix, our results lead to a novel weighted kernel k-means algorithm that monotonically decreases the normalized cut. This has important implications: a) eigenvector-based algorithms, which can be computationally prohibitive, are not essential for minimizing normalized cuts, b) various techniques, such as local search and acceleration schemes, may be used to improve the quality as well as speed of kernel k-means. Finally, we present results on several interesting data sets, including diametrical clustering of large gene expression matrices and a handwriting recognition data set.

[6]
Garfield E. (1994). Research fronts. Current Contents, 41(10), 3-7.

[7]
Kuwahara A.S.Y. (2007). Benchmarking S&T capacity and future direction of S&T development in Japan (Policy analysis of the science and technology basic plans). Journal of Science Policy & Research Management, 21(1), 28-34.

[8]
Line M.B., & Sandison A. (1975). Practical interpretation of citation and library use studies. College and Research Libraries, 36(5), 393-396. Considers the data required to guide (a) the librarian in acquisition (current and retrospective), discarding, and binding; and (b) the information system designer in selecting journals to be scanned for secondary services, selecting items from journals scanned, and retiring items from active files. (Author)

[9]
Martín-Martín A. (2016). Thomson Reuters utiliza altmétricas: Usage counts para los artículos indizados en la Web of Science. Anuario ThinkEPI, 10, 209-221.

[10]
Morris S.A., Yen G., Wu Z., & Asnake B. (2003). Time line visualization of research fronts. Journal of the American Society for Information Science and Technology, 54(5), 413-422. Research fronts, defined as clusters of documents that tend to cite a fixed, time invariant set of base documents, are plotted as time lines for visualization and exploration. Using a set of documents related to the subject of anthrax research, this article illustrates the construction, exploration, and interpretation of time lines for the purpose of identifying and visualizing temporal changes in research activity through journal articles. Such information is useful for presentation to members of expert panels used for technology forecasting.

[11]
Nagano H. (2005). Comprehensive analysis of science and technology benchmarking and foresight. NISTEP Report. Retrieved on November 27, 2016, from .

[12]
Persson O. (1994). The intellectual base and research fronts of JASIS 1986-1990. Journal of the American Society for Information Science, 45(1), 31-38. A citation analysis was applied to articles published in the Journal of the American Society for Information Science. The document set consisted of 209 genuine articles from the 1986–1990 SSCI CD-ROM. To find the intellectual base of these articles a cocitation analysis was made. A map of the most cocited authors shows considerable resemblance to a map of information science produced by other methods. Citation-based bibliographic coupling was applied to the same set of documents in order to define research fronts, i.e., clusters of articles using similar parts of the intellectual base. It is also shown that the research front map has a close correspondence with the map of the intellectual base.

[13]
Porter A.L., Guo Y., & Chiavatta D. (2011). Tech mining: Text mining and visualization tools, as applied to nanoenhanced solar cells. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 1(2), 172-181. ‘Tech mining’ is a multistep process for the analysis of science, technology, & innovation (‘ST&I’) information resources. It uses text mining, visualization, and communication tools to provide the empirical knowledge necessary to address management of technology questions. Tech mining can help assess mature or emerging fields of science and technology, such as nanotechnology. Here, we depict select analyses and visualizations of relevant ST&I data on the topics of nanoenhanced, thin-film solar cells and dye-sensitized solar cells. These analyses help identify complementary and competitive research activity, evaluate research productivity, assess research interdisciplinarity, understand nanotechnology developmental trajectories, and identify and forecast promising nanoapplications.

[14]
Price D.J. (1965). Networks of scientific papers. Science, 149(3683), 510.

[15]
Schloegl C., & Gorraiz J. (2010). Comparison of citation and usage indicators: The case of oncology journals. Scientometrics, 82(3), 567-580. It is the objective of this article to examine in which aspects journal usage data differ from citation data. This comparison is conducted both at journal level and on a paper by paper basis. At journal level, we define a so-called usage impact factor and a usage half-life in analogy to the corresponding Thomson’s citation indicators. The usage data were provided from Science Direct, subject category “oncology”. Citation indicators were obtained from JCR, article citations were retrieved from SCI and Scopus. Our study shows that downloads and citations have different obsolescence patterns. While the average cited half-life was 5.6 years, we computed a mean usage half-life of 1.7 years for the year 2006. We identified a strong correlation between the citation frequencies and the number of downloads for our journal sample. The relationship was lower when performing the analysis on a paper by paper basis because of existing variances in the citation-download-ratio among articles. Also the correlation between the usage impact factor and Thomson’s journal impact factor was “only” moderate because of different obsolescence patterns between downloads and citations.

[16]
Schloegl C., & Gorraiz J. (2011). Global usage versus global citation metrics: The case of pharmacology journals. Journal of the Association for Information Science and Technology, 62(1), 161-170. Following the transition from print journals to electronic (hybrid) journals in the past decade, usage metrics have become an interesting complement to citation metrics. In this article we investigate the similarities of and differences between usage and citation indicators for pharmacy and pharmacology journals and relate the results to a previous study on oncology journals. For the comparison at journal level we use the classical citation indicators as defined in the Journal Citation Reports and compute the corresponding usage indicators. At the article level we not only relate download and citation counts to each other but also try to identify the possible effect of citations upon subsequent downloads. Usage data were provided by ScienceDirect both at the journal level and, for a few selected journals, on a paper-by-paper basis. The corresponding citation data were retrieved from the Web of Science and Journal Citation Reports. Our analyses show that electronic journals have become generally accepted over the last decade. While the supply of ScienceDirect pharma journals rose by 50% between 2001 and 2006, the total number of article downloads (full-text articles [FTAs]) multiplied more than 5-fold in the same period. This also impacted the pattern of scholarly communication (strong increase in the immediacy index) in the past few years. Our results further reveal a close relation between citation and download frequencies. We computed a high correlation at the journal level when using absolute values and a moderate to high correlation when relating usage and citation impact factors. At the article level the rank correlation between downloads and citations was only medium-sized. Differences between downloads and citations exist in terms of obsolescence characteristics. While more than half of the articles are downloaded in the publication year or 1 year later, the median cited half-life was nearly 6 years for our journal sample. Our attempt to reveal a direct influence of citations upon downloads proved not to be feasible.

[17]
Shibata N., Kajikawa Y., Takeda Y., & Matsushima K. (2008). Detecting emerging research fronts based on topological measures in citation networks of scientific publications. Technovation, 28(11), 758-775. In this paper, we performed a comparative study in two research domains in order to develop a method of detecting emerging knowledge domains. The selected domains are research on gallium nitride (GaN) and research on complex networks, which represent recent examples of innovative research. We divided citation networks into clusters using the topological clustering method, tracked the positions of papers in each cluster, and visualized citation networks with characteristic terms for each cluster. Analyzing the clustering results with the average age and parent-children relationship of each cluster may be helpful in detecting emergence. In addition, topological measures, within-cluster degree z and participation coefficient P, succeeded in determining whether there are emerging knowledge clusters. There were at least two types of development of knowledge domains. One is incremental innovation as in GaN and the other is branching innovation as in complex networks. In the domains where incremental innovation occurs, papers changed their position to large z and large P. On the other hand, in the case of branching innovation, they moved to a position with large z and small P, because there is a new emerging cluster, and active research centers shift rapidly. Our results showed that topological measures are beneficial in detecting branching innovation in the citation network of scientific publications.

[18]
Small H., & Griffith B.C. (1974). The structure of scientific literatures I: Identifying and graphing specialties. Social Studies of Science, 4(1), 17-40.

[19]
Takahashi K., & Yamanaka S. (2006). Induction of pluripotent stem cells from mouse embryonic and adult fibroblast cultures by defined factors. Cell, 126(4), 663-676.

[20]
Takebe T., Zhang R.R., Koike H., Kimura M., Yoshizawa E., Enomura M., Koike N., Sekine K., & Taniguchi H. (2013). Generation of a vascularized and functional human liver from an iPSC-derived organ bud transplant. Nature Protocols, 9(2), 481-484. A critical shortage of donor organs for treating end-stage organ failure highlights the urgent need for generating organs from human induced pluripotent stem cells (iPSCs). Despite many reports describing functional cell differentiation, no studies have succeeded in generating a three-dimensional vascularized organ such as liver. Here we show the generation of vascularized and functional human liver from human iPSCs by transplantation of liver buds created in vitro (iPSC-LBs). Specified hepatic cells (immature endodermal cells destined to track the hepatic cell fate) self-organized into three-dimensional iPSC-LBs by recapitulating organogenetic interactions between endothelial and mesenchymal cells. Immunostaining and gene-expression analyses revealed a resemblance between in vitro grown iPSC-LBs and in vivo liver buds. Human vasculatures in iPSC-LB transplants became functional by connecting to the host vessels within 48 hours. The formation of functional vasculatures stimulated the maturation of iPSC-LBs into tissue resembling the adult liver. Highly metabolic iPSC-derived tissue performed liver-specific functions such as protein production and human-specific drug metabolism without recipient liver replacement. Furthermore, mesenteric transplantation of iPSC-LBs rescued the drug-induced lethal liver failure model. To our knowledge, this is the first report demonstrating the generation of a functional human organ from pluripotent stem cells. Although efforts must ensue to translate these techniques to treatments for patients, this proof-of-concept demonstration of organ-bud transplantation provides a promising new approach to study regenerative medicine.

[21]
Thomson Reuters. (2015). Usage count. Retrieved on September 10, 2016, from .

[22]
Urashima K. (2007). Comprehensive analysis of science and technology benchmarking and foresighting. International Journal of Plasma Environment Science & Technology, 1(1), 3-7. Japan's science and technology policy has been implemented according to the Science & Technology (S&T) Basic Plans, which are established every 5 years. Japan's share of world scientific papers has steadily grown since the 1980s and is now ranked second after the USA. We classified the 100 most important topics from 1992 and current surveys and the topics deemed important. In the electronics, nanotechnology, and materials fields, Japan is seen as above the EU overall and ahead of the USA in many areas. A significant number of foresight themes have been identified in relation to CO2, NOx, and other exhaust gases (seven themes) and a recycling-oriented society (five themes) for the environmental field.

[23]
Vlachý J. (1984). Priority choice and research front specialties in physics. Czechoslovak Journal of Physics B, 34(1), 95-98. More remarks on the priority choice should probably be added, against the background of existing literature reviews /23/ and bibliography /24/. Science is linked to its ultimate social effects in many nonobvious ways. Broad emphasis may be determined in terms of potential applicability, but the finer the scale of choice within a field, the more essential are the intrinsic criteria. Basic science is applied also by flow through other sciences which are themselves applied, physics standing high in the hierarchy. The conversion of laboratory methods and instrumentation into operational use and application opens another means of transfer. Basic science is also an important input to decision process about social and technological policy. It is a sort of search strategy, intellectual standards, cultural effects and future-oriented intimation /25/. The instrumental model of science and the concept of science as discovery are reflected also in personal research plans, individual priority decisions /26/. But for some the choice of research projects is largely a matter of fashion, social conformity /27/. The allocation of funds among researchers by one extreme method, the reaching down merit system (the dribbling down egalitarian system being the opposite extreme), would imply an elaborate mechanism of proposal evaluation which recognizes excellence and merit, bolsters morale and competition within the scientific community /28/. Two subfields may, of course, be in quite different stages of development and exhibit very different cognitive structures: one may be divided into small disconnected groups of researchers, the other may form a cohesive group working on highly significant items /29/. The hotter a field, subfield or specialty is, the more likely it is that the main work is being done in small, informal groups. Conversely, in quiet problem areas scientists are more reluctant to share informally their effort with their colleagues /30/. New approaches originated from scientometric findings that science grows from a very thin skin of its research front /31/. A core body of seminal literature is knitted together by conceptual citation links, it forms a sort of epidermal layer, an active research front /32/. The monitoring of the vital signs of science by co-citation clustering as a measure of cognitive association /33-38, etc./ (for library management cf. /39/) now extended as far back as to 1955 /40/. Since 1970, ISI has identified about 30 000 research fronts for disciplinary and specialized studies. The co-citation techniques help to localize the emerging research fronts and to track the evolution of specialties through cluster strings /41/. Some 50 research fronts in physics are among 1978 clusters ranked by the number of citing articles /36/. In combination with factor analysis, differences in the fine structure of nuclear and particle physics was revealed (a clear substantive content of the former vs. high theoretical integration on the latter subfield) /29/. Seven physics clusters were identified for the developing countries /42/. Perhaps thor inset citations maps exist for the theory of weak interactions /43-48/. The here tabulated 288 research fronts in physics were selected from some 2400 most-active specialties in the 1982 ISR volumes /49/. Ranking of the items e.g. by the number of citing articles may be known later, currently the number of published reviews is known for eight physics research fronts /50/. The physicists may be encouraged to become more explicitly concerned about choice in their research areas, as a good physics is possible with small, but well-placed means /51/.

[24]
von Luxburg U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4), 395-416. In recent years, spectral clustering has become one of the most popular modern clustering algorithms. It is simple to implement, can be solved efficiently by standard linear algebra software, and very often outperforms traditional clustering algorithms such as the k-means algorithm. On the first glance spectral clustering appears slightly mysterious, and it is not obvious to see why it works at all and what it really does. The goal of this tutorial is to give some intuition on those questions. We describe different graph Laplacians and their basic properties, present the most common spectral clustering algorithms, and derive those algorithms from scratch by several different approaches. Advantages and disadvantages of the different spectral clustering algorithms are discussed.

[25]
Wang X., Fang Z., & Sun X. (2016). Usage patterns of scholarly articles on Web of Science: A study on Web of Science usage count. Scientometrics, 109(2), 917-926. Usage data of scholarly articles provide a direct way to explore the usage preferences of users. Using the “Usage Count” provided by the Web of Science platform, we collect and analyze the usage data of five journals in the field of Information Science and Library Science, to investigate the usage patterns of scholarly articles on Web of Science. Our analysis finds that the distribution of usage fits a power law. And according to the time distribution of usage, researchers prefer to use more recent papers. As to those old papers, citations play an important role in determining the usage count. Highly cited old papers are more likely to be used even a long time after publication.

[26]
Yan K.K., & Gerstein M. (2011). The spread of scientific information: Insights from the web usage statistics in PLoS article-level metrics. PLoS ONE, 6(5), 1-7. The presence of web-based communities is a distinctive signature of Web 2.0. The web-based feature means that information propagation within each community is highly facilitated, promoting complex collective dynamics in view of information exchange. In this work, we focus on a community of scientists and study, in particular, how the awareness of a scientific paper is spread. Our work is based on the web usage statistics obtained from the PLoS Article Level Metrics dataset compiled by PLoS. The cumulative number of HTML views was found to follow a long tail distribution which is reasonably well-fitted by a lognormal one. We modeled the diffusion of information by a random multiplicative process, and thus extracted the rates of information spread at different stages after the publication of a paper. We found that the spread of information displays two distinct decay regimes: a rapid downfall in the first month after publication, and a gradual power law decay afterwards. We identified these two regimes with two distinct driving processes: a short-term behavior driven by the fame of a paper, and a long-term behavior consistent with citation statistics. The patterns of information spread were found to be remarkably similar in data from different journals, but there are intrinsic differences for different types of web usage (HTML views and PDF downloads versus XML). These similarities and differences shed light on the theoretical understanding of different complex systems, as well as a better design of the corresponding web applications that is of high potential marketing impact.

Copyright © 2023 All rights reserved Journal of Data and Information Science
