Expert Review

Science Mapping: A Systematic Review of the Literature

  • Chaomei Chen
  • College of Computing and Informatics, Drexel University, Philadelphia, PA 19104-2875, USA
Corresponding author: Chaomei Chen (E-mail: ).

Online published: 2017-02-25


Open Access

Abstract

Purpose: We present a systematic review of the literature concerning major aspects of science mapping to serve two primary purposes: First, to demonstrate the use of a science mapping approach to perform the review so that researchers may apply the procedure to the review of a scientific domain of their own interest, and second, to identify major areas of research activities concerning science mapping, intellectual milestones in the development of key specialties, evolutionary stages of major specialties involved, and the dynamics of transitions from one specialty to another.

Design/methodology/approach: We first introduce a theoretical framework of the evolution of a scientific specialty. Then we demonstrate a generic search strategy that can be used to construct a representative dataset of bibliographic records of a domain of research. Next, progressively synthesized co-citation networks are constructed and visualized to aid visual analytic studies of the domain’s structural and dynamic patterns and trends. Finally, trajectories of citations made by particular types of authors and articles are presented to illustrate the predictive potential of the analytic approach.

Findings: The evolution of science mapping research involves the development of a number of interrelated specialties. Four major specialties are discussed in detail in terms of four evolutionary stages: conceptualization, tool construction, application, and codification. Underlying connections between major specialties are also explored. The predictive analysis illustrates the citation trajectories of potentially transformative contributions.

Research limitations: The systematic review is primarily guided by citation patterns in the dataset retrieved from the literature. The scope of the data is limited by the source of the retrieval, i.e. the Web of Science, and by the composite query used. The query could be refined iteratively to improve data quality, although the current approach serves our purpose adequately. In-depth analyses of each specialty would be more revealing if they incorporated additional methods, such as citation context analysis and studies of other aspects of scholarly publications.

Practical implications: The underlying analytic process of science mapping serves many practical needs, notably bibliometric mapping, knowledge domain visualization, and visualization of scientific literature. In order to master such a complex process of science mapping, researchers often need to develop a diverse set of skills and knowledge that may span multiple disciplines. The approach demonstrated in this article provides a generic method for conducting a systematic review.

Originality/value: Incorporating the evolutionary stages of a specialty into the visual analytic study of a research domain is innovative. It provides a systematic methodology for researchers to achieve a good understanding of how scientific fields evolve, to recognize potentially insightful patterns from visually encoded signs, and to synthesize diverse information in order to capture the state of the art of the domain.

Cite this article

Chaomei Chen. Science Mapping: A Systematic Review of the Literature [J]. Journal of Data and Information Science, 2017, 2(2): 1-40. DOI: 10.1515/jdis-2017-0006

1 Introduction

Science mapping is a generic process of domain analysis and visualization. The scope of a science mapping study can be a scientific discipline, a field of research, or topic areas concerning specific research questions. In other words, the unit of analysis in science mapping is a domain of scientific knowledge that is reflected through an aggregated collection of intellectual contributions from members of a scientific community or more precisely defined specialties. The scope of the domain should contain major components that are relevant to the underlying research program. The structure of the domain may experience constant and possibly revolutionary changes.
A science mapping study typically consists of several components, notably a body of scientific literature, a set of scientometric and visual analytic tools, metrics, and indicators that can highlight potentially significant patterns and trends, and theories of scientific change that can guide the exploration and interpretation of visualized intellectual structures and dynamic patterns.
Commonly used sources of scientific literature include the Web of Science, Scopus, Google Scholar, and PubMed. Scientometric methods include author co-citation analysis (ACA) (Chen, 1999a; White & McCain, 1998), document co-citation analysis (DCA) (Chen, 2006; Small, 1973), co-word analysis (Callon et al., 1983), and many other variations. Visualization techniques include graph or network visualization (Herman, Melançon, & Marshall, 2000), visualizations of hierarchies or trees (Johnson & Shneiderman, 1991), visualizations of temporal structures (Morris et al., 2003), geospatial visualizations, and coordinated views of multiple types of visualizations. Metrics and indicators of research impact include citation counts (Garfield, 1955), the h-index (Hirsch, 2005) and its numerous extensions, and a rich set of altmetrics on social media (Thelwall et al., 2013). Theories of scientific change include the paradigmatic view of scientific revolutions, scientific advances driven by competition, and evolutionary stages of a scientific discipline. In order to conduct a science mapping study, researchers need to develop a good understanding of each of the categories of skills and knowledge outlined above. Furthermore, each of these categories is an active research area in its own right, for instance, ongoing research on the optimal field normalization method and debates over how potentially conflicting theories of scientific change may be used to reveal the underlying mechanisms by which science advances.
The complexity of science mapping is shared by many research fields. In this article, our aim is to demonstrate the process of a systematic review based on a series of visual analytic functions implemented in CiteSpace, a continuously evolving software tool (Chen, 2004; Chen, 2006; Chen, Ibekwe-SanJuan, & Hou, 2010). We demonstrate how to prepare a representative dataset, how to generate visualizations that can guide our review, and how to identify salient patterns at various levels of granularity. We also aim to set an example of a systematic review that can address questions commonly asked by researchers who need to grasp the state of the art of a fast-growing and complex scientific domain.
Since the origin and major milestones of the science mapping research will become clear as we visualize and explain our results, we will first describe the methodology and then present the results with our interpretations and reflections as the systematic review.

2 Theories of Scientific Change

The most widely known theory of scientific change is Thomas Kuhn's theory of scientific revolutions (Kuhn, 1962). According to Kuhn's The Structure of Scientific Revolutions, science advances through an iterative revolutionary process in which scientific paradigms compete for a predominant position, resulting in paradigm shifts. The process consists of several stages: normal science, crisis, and revolution. At the normal science stage, research in a field is dominated by a particular scientific paradigm, consisting of a set of theories, methods, and a consensus on the research agenda. At the crisis stage, anomalies become unavoidable and challenge the foundation of the current paradigm. Alternative and competing paradigms are developed to address the anomalies. At the revolutionary stage, compelling evidence accumulates and competing paradigms become mature enough to take over from the existing paradigm, which has evidently been incapable of handling the pressing crises. A new paradigm replaces the existing one and provides the overarching framework for the research community. The process repeats itself as the new paradigm becomes the norm.
A sociological theory of scientific change proposed by Fuchs (1993) challenges the Kuhnian paradigm shift model as an oversimplified view of a complex reality. Instead, Fuchs proposes that advances in science are driven by sociological factors, namely by scientists competing for recognition and reputation in their organizational settings. Fuchs suggests that the interaction of two variables gives rise to four types of scientific change. The two variables are task uncertainty and mutual dependence. Task uncertainty refers to the level of uncertainty involved in the course of scientific inquiry. Task uncertainty is high at scientific frontiers, where research is essentially exploratory and involves a large amount of tacit knowledge, as in highly creative scientific discoveries. In contrast, task uncertainty is low in areas where tasks are routinized. Mutual dependence refers to the social and organizational dependencies between scientists and their competing peers. A combination of high task uncertainty and high mutual dependence leads to original scientific discoveries, which bring a substantial degree of recognition and reputation, such as Nobel Prizes. A research area with intensified competition is also likely to have a high retraction rate (Chen et al., 2013). A combination of low task uncertainty and high mutual dependence results in specialization, which maintains the tension between scientists with high mutual dependence while they work on routinized research.
A relatively new theory of the evolution of a scientific discipline was proposed by Shneider (2009), who divides that evolution into four stages. A new discipline, or more generically a research specialty, begins with a conceptualization stage, Stage I. At this stage, the object of the research is established; in science mapping, for example, the object is a scientific knowledge domain. The goal of the new research is to answer a set of research questions concerning the newly identified object. The next stage, Stage II, is characterized by the development of research instruments, or tools, that enable researchers to investigate the underlying phenomena. Once researchers are equipped with special-purpose tools, the research moves to the third stage, Stage III: the investigation of the research questions, supported by the newly developed enabling techniques. This is a prolific stage in which many results are produced and the understanding of the research problems is substantially advanced. The deepened understanding and thorough examination may reveal previously unknown phenomena. The need to address these phenomena may lead to the emergence of a new line of research or a new research specialty, while the original specialty may continue to pursue its original research agenda. In addition, tools developed by the original specialty may contribute to the development of other subject domains. In other words, a Stage III specialty may contribute to another specialty that is in its own Stage II. The final stage of a specialty is Stage IV, which is characterized by the transformation of tacit knowledge into codified and routinized knowledge. Comprehensive textbooks are written. Accumulated domain knowledge is reviewed, synthesized, and conveyed to newcomers to the specialty as well as to its existing members.
There are other theories of scientific change. For example, a transition model of Exploration → Unification → Decline/Displacement was proposed by Mulkay, Gilbert, and Woolgar (1975). Nevertheless, the three theories outlined above are representative and sufficiently cover the major characteristics of the development of a scientific field for the purpose of our systematic review.
We can see obvious overlaps as well as clear distinctions among them. For example, Shneider's Stage III overlaps with Kuhn's crisis stage. Fuchs' high task uncertainty and high mutual dependence may characterize a Stage I specialty. As a specialty transitions from Stage III to Stage IV, it may transform itself from a high-high uncertainty-dependence mode to a low-high mode. A specialty may start to decay and may be forgotten, but it may be revitalized years later, as a sleeping beauty in the conceptual world (van Raan, 2003).
In addition to the four evolutionary stages, Shneider also proposes that each stage may suit a particular type of scientist better than the other stages, in terms of how a type of talent matches a particular stage. In fact, Shneider suggests that the better scientists understand the four stages of development, the more effectively they may optimize their career paths. Shneider also emphasizes that researchers at one stage may not have the mindset needed to evaluate a discipline at a different evolutionary stage.
Corresponding to the four stages, we may identify the scientists most suited to each stage: creative thinkers, visionaries, boundary spanners, or brokers for Stage I; inventors and tool builders for Stage II; adaptors and experimenters for Stage III; and synthesizers, codifiers, and educators for Stage IV. In this review, we adopt Shneider's four-stage model in our interpretation of the results.

3 Method

3.1 Data Collection

The input data of our review is generated by a combination of the results from multiple topic search queries to the Web of Science (Figure 1). The rationale of the query construction is as follows. First, we would like to ensure that currently widely used science mapping tools such as VOSViewer, CiteSpace, HistCite, SciMAT, and Sci2 are covered by our topic search query. Thus, publications that mention any of these software tools in their titles, abstracts, and/or keyword lists will be included. This query generates 135 records as Set #.
Second, since the goal of science mapping is to identify the intellectual structure of a scientific domain, the second query focuses on the object of science mapping, including topic terms such as intellectual structure, scientific change, research front, invisible college, and domain analysis. As we will see later, terms such as domain analysis may be ambiguous because they are also used in contexts irrelevant to science mapping. In practice, we recommend deferring the assessment of relevance until the analysis stage. This query produces 13,242 records as Set #2.
The third query focuses on scientometric and visual analytic techniques that are potentially relevant to science mapping. Topic terms include science mapping, knowledge domain visualization, information visualization, citation analysis, and co-citation analysis. This query leads to 4,772 records.
Queries #4-#10 aim to retrieve bibliographic records on the common data sources for science mapping, including Scopus (6,782 records), the Web of Science (15,401 records), Google Scholar (5,170 records), PubMed (46,760 records), and Medline (61,405 records).
The final dataset is Set #14, containing 17,731 bibliographic records of the types Article or Review in English. This query formation strategy is generic enough to be applied to other science mapping studies, unless, of course, one has access to the entire database.
Patents and research grants are other types of data sources one may consider, but for this particular review, we are limited to the scientific literature indexed by the Web of Science.
Figure 1. Topic search queries used for data collection.
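The composite query strategy described above can be sketched as a few string helpers. This is a hedged illustration, not the queries shown in Figure 1: the helper names are hypothetical, the term lists are taken from the prose, and we assume Web of Science advanced-search conventions in which TS= denotes a topic search and #n refers to a previously executed search set.

```python
def topic_query(terms):
    """Build a topic (TS=) query that matches any of the quoted terms."""
    quoted = " OR ".join(f'"{term}"' for term in terms)
    return f"TS=({quoted})"


def combine_sets(set_ids, operator="OR"):
    """Combine previously executed search sets, e.g. '#1 OR #2'."""
    return f" {operator} ".join(f"#{i}" for i in set_ids)


# Hypothetical term lists drawn from the prose, not the exact Figure 1 queries.
tools = ["VOSViewer", "CiteSpace", "HistCite", "SciMAT", "Sci2"]
objects = ["intellectual structure", "scientific change", "research front",
           "invisible college", "domain analysis"]

print(topic_query(tools))
print(topic_query(objects))
print(combine_sets([1, 2, 3]))
```

Restricting the combined set to Article and Review records in English would be applied as a separate refinement step, which this sketch does not model.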

3.2 Visualization and Analysis

We visualize and analyze the dataset with a new version of CiteSpace (5.0.R3 SE) (Figure 2). CiteSpace has been continuously developed to meet the needs of visual analytic tasks in science mapping. The new version will be released to the public shortly.
Figure 2. The main user interface of CiteSpace.
CiteSpace takes a set of bibliographic records as its input and models the intellectual structure of the underlying domain in terms of a synthesized network based on a time series of networks derived from each year’s publications. CiteSpace supports several types of bibliometric studies, including collaboration network analysis, co-word analysis, author co-citation analysis, document co-citation analysis, and text and geospatial visualizations. In this study, we focus on the document co-citation analysis within the period of time between 1995 and 2016.
Set #14 contains 16,250 records published between 1980 and 2017 (Figure 3). These records collectively cited 515,026 references. The document co-citation analysis function in CiteSpace constructs networks of cited references, in which connections between references represent co-citation strengths. CiteSpace uses a time-slicing technique to build a time series of network models and synthesizes these individual networks into an overview network for the systematic review of the relevant literature.
Figure 3. The distribution of the bibliographic records in Set #14.
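The time-slicing idea can be illustrated with a minimal sketch. It assumes that each year is one slice, that only the top N most cited references in a slice become nodes (the "top 100 per slice" setting used for Figure 7), and that merged link weights are simply summed across slices; CiteSpace's own selection and merging options may differ, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def slice_network(reference_lists, top_n):
    """Co-citation network for one time slice.

    reference_lists: one list of cited references per citing article.
    Only the top_n most cited references in the slice become nodes.
    """
    counts = Counter(ref for refs in reference_lists for ref in set(refs))
    # Rank by citation count, breaking ties alphabetically for determinism.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    nodes = {ref for ref, _ in ranked[:top_n]}
    links = Counter()
    for refs in reference_lists:
        kept = sorted(set(refs) & nodes)
        for a, b in combinations(kept, 2):
            links[(a, b)] += 1  # a and b are co-cited by this article
    return nodes, links


def synthesize(records_by_year, top_n):
    """Merge the per-year networks into a single overview network."""
    nodes, links = set(), Counter()
    for year in sorted(records_by_year):
        slice_nodes, slice_links = slice_network(records_by_year[year], top_n)
        nodes |= slice_nodes
        links.update(slice_links)  # assumption: merged weight = sum of slices
    return nodes, links


# Hypothetical toy data: citing year -> reference lists of citing articles.
records = {
    2001: [["A", "B"], ["A", "B", "C"], ["A", "D"]],
    2002: [["B", "C"], ["B", "C", "E"]],
}
nodes, links = synthesize(records, top_n=2)
print(sorted(nodes))  # ['A', 'B', 'C']
print(dict(links))    # {('A', 'B'): 2, ('B', 'C'): 2}
```

Reference D is cited in 2001 but falls outside that year's top 2, so it never enters the synthesized network even though it was cited.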
The synthesized network is divided into co-citation clusters of references. The articles citing these references are regarded as the research fronts associated with these clusters, while each cluster represents the intellectual base of the underlying specialty. In terms of Shneider's four-stage model, the intellectual base of a specialty and the corresponding research fronts provide valuable insights into the current stage of the specialty as well as the intellectual milestones in its evolution.
Our first step in the review is to make sense of the nature of major clusters and characteristics that may inform us about the stage of the underlying specialties. In this study, we consider a cluster as the embodiment of an underlying specialty. Thus, science mapping consists of multiple specialties that contribute to various aspects of the domain.
In each cluster, we focus on members identified by structural and temporal metrics of research impact and evolutionary significance. A commonly used structural metric is the betweenness centrality of a node in a network. Studies have shown that nodes with high betweenness centrality tend to indicate boundary-spanning potential that may lead to transformative discoveries (Chen et al., 2009). Burst detection is a computational technique that has been used to identify abrupt changes in streams of events and other types of information (Kleinberg, 2002). In CiteSpace, the sigma score of a node is a composite metric of the betweenness centrality and the citation burstness of the node, i.e. the cited reference. CiteSpace represents the strength of these metrics through visual encoding, so that articles that are salient in terms of these metrics are easy to see in the visualizations. For example, the citation history of a node is depicted as a series of treerings, each representing the number of citations received in the corresponding year. If a citation burst is detected for a cited reference, the treerings corresponding to the burst are colored red. Otherwise, treerings are colored along a spectrum ranging from cold colors such as blue to warm colors such as orange.
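Betweenness centrality and the sigma score can be computed in a few lines. The sketch below uses Brandes' algorithm for an unweighted, undirected graph; whether CiteSpace weights links when computing centrality is not stated here, so treat that as an open assumption. For sigma we assume the form (centrality + 1) ** burstness, which is how we read its definition in Chen et al. (2009); the burstness input would come from a separate burst detection step. Function names are hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    adj: dict mapping each node to the set of its neighbours.
    Returns raw betweenness: the number of node pairs whose shortest paths
    pass through each node (shared when several shortest paths exist).
    """
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        paths = dict.fromkeys(adj, 0)  # number of shortest paths from s
        paths[s] = 1
        dist = dict.fromkeys(adj, -1)
        dist[s] = 0
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):  # accumulate dependencies back towards s
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair was counted from both endpoints.
    return {v: score / 2 for v, score in bc.items()}


def normalize(bc):
    """Scale to [0, 1] by the number of node pairs excluding the node."""
    n = len(bc)
    pairs = (n - 1) * (n - 2) / 2 if n > 2 else 1
    return {v: score / pairs for v, score in bc.items()}


def sigma_score(centrality, burstness):
    """Assumed form of the sigma metric: (centrality + 1) ** burstness."""
    return (centrality + 1) ** burstness


chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
print(normalize(betweenness(chain)))  # {'A': 0.0, 'B': 1.0, 'C': 0.0}
print(sigma_score(0.5, 2))            # 2.25
```

Under this form, a node without a burst (burstness 0) has sigma 1 regardless of its centrality, so only references that combine brokerage with a burst stand out.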
The nature of a cluster is identified from three aspects: (1) a hierarchy of key terms in articles that cite the cluster (Tibély et al., 2013); (2) the prominent members of the cluster, which serve as intellectual milestones in its evolution and as the intellectual base of the specialty; and (3) recurring themes in the articles citing the cluster, which reflect the interrelationship between the intellectual base and the research fronts. In particular, we pay attention to indicators of the evolutionary stages of a specialty, such as the original conceptualization, research instruments, applications, and routinization of the domain knowledge of the specialty.
In addition to the study of citation-based patterns, we will demonstrate the concept of citation trajectories in the context of distinct clusters. According to the theory of structural variation, the transformative potential of an article may be reflected by the extent to which it varies the existing intellectual structure (Chen, 2012). For example, if an article adds many inter-cluster links, it may alter the overall structure. If the structural change is subsequently accepted and reinforced by other researchers, then transformative changes of the knowledge become significant in a socio-cognitive view of the domain.
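To make the idea of structural variation concrete, the sketch below counts how many new inter-cluster links an article's reference list would add to a baseline co-citation network. It is only a simplified proxy in the spirit of Chen (2012), which defines its own metrics (such as modularity change rate and cluster linkage) with more elaborate weighting; the function name and toy data are hypothetical.

```python
from itertools import combinations


def novel_inter_cluster_links(references, cluster_of, existing_links):
    """Count pairs of an article's references that connect two different
    clusters and were not already linked in the baseline network.

    references: IDs of the references cited by the new article.
    cluster_of: dict mapping reference ID -> cluster ID in the baseline.
    existing_links: set of frozenset({a, b}) links already in the baseline.
    """
    # References outside the baseline network cannot add inter-cluster links.
    known = sorted(ref for ref in set(references) if ref in cluster_of)
    novel = 0
    for a, b in combinations(known, 2):
        crosses = cluster_of[a] != cluster_of[b]
        if crosses and frozenset((a, b)) not in existing_links:
            novel += 1
    return novel


clusters = {"A": 0, "B": 0, "C": 1, "D": 2}
baseline = {frozenset(("A", "C"))}
print(novel_inter_cluster_links(["A", "B", "C", "D", "X"], clusters, baseline))  # 4
```

Here the pair A-B stays within one cluster and A-C already exists, so only A-D, B-C, B-D, and C-D count as novel boundary-spanning links.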

4 Results

A dual-map overlay of the science mapping literature represents the entire dataset in the context of a global map of science generated from over 10,000 journals indexed in the Web of Science (Chen & Leydesdorff, 2014). The dual-map overlay shows that science mapping papers are published in almost all major disciplines (Figure 4). Publications in information science (shown in the map as cyan curves) draw on at least four disciplines on the right-hand side of the map.
Figure 4. A dual-map overlay of the science mapping literature.
A hierarchical visualization of index terms, i.e. keywords, is generated to represent the coverage of the dataset (Figure 5). Five semantic types of nodes are annotated in the visualized hierarchy:
These semantic types are also used to identify the evolutionary stage of a specialty. For example, if a cluster contains several articles that report the development of software tools, the underlying specialty is considered to have reached at least Stage II. If a cluster's methodologies are applied to knowledge domains external to information science, such as regenerative medicine and strategic management research, we consider the specialty to have reached Stage III, in which tools developed by the specialty are applied to other subject domains. In the following analysis, we use the terms in the hierarchy as the primary vocabulary for identifying the role of a publication's contributions to a specialty.
Figure 5. A hierarchy of index terms derived from Set #14.
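The stage-assignment rules described above amount to a small decision procedure. The sketch encodes them as written, under stated assumptions: the function and parameter names are hypothetical, the threshold standing in for "several" tool articles is an arbitrary default, and the text gives no rule for Stage IV, so the function never returns it.

```python
def minimum_stage(tool_papers, application_domains,
                  home_domain="information science", min_tools=2):
    """Lowest Shneider stage suggested by a cluster's members and citers.

    tool_papers: number of cluster members that report software tools.
    application_domains: subject domains of citing articles that apply
        the specialty's methods.
    min_tools: hypothetical threshold standing in for "several" tool papers.
    Stage IV (codification) needs other evidence, so it is never returned.
    """
    external = set(application_domains) - {home_domain}
    if external:
        return 3  # methods have been applied to external subject domains
    if tool_papers >= min_tools:
        return 2  # tools have been built, i.e. at least Stage II
    return 1


print(minimum_stage(5, {"regenerative medicine", "information science"}))  # 3
print(minimum_stage(3, {"information science"}))                          # 2
```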
Major milestones in the development of science mapping can be identified from the list of references that have strong citation bursts between 1995 and 2016 (Figure 6). References with strong values in the Strength column tend to be significant milestones for the science mapping research. We label such references with high-level concepts. For example, the first milestone paper in the study is a landmark ACA study of information science (White & McCain, 1998). The next milestone is a major collection of seminal papers in information visualization (Card, Mackinlay, & Shneiderman, 1999). Other major milestones include visual analytics (Thomas & Cook, 2005), and the h-index (Hirsch, 2005).
Figure 6. 49 references with citation bursts of at least 5 years.
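Figure 6 is based on citation bursts. Kleinberg's (2002) method treats a stream as generated by a hidden automaton with a baseline state and one or more elevated states and finds the cheapest state sequence by dynamic programming. The sketch below is a two-state version of the batched (per-year) formulation. The defaults s = 2 and gamma = 1 are assumptions, the function name is hypothetical, and the sketch neither computes the burst strength reported in Figure 6 nor reproduces CiteSpace's implementation.

```python
import math


def burst_states(r, d, s=2.0, gamma=1.0):
    """Two-state burst detection in the style of Kleinberg's batched model.

    r[t]: citations received by one reference in year t.
    d[t]: citations received by all references in year t.
    s: ratio of the burst-state rate to the baseline rate (assumed 2).
    gamma: scales the cost of entering the burst state (assumed 1).
    Returns one state per year: 0 = baseline, 1 = burst.
    """
    n = len(r)
    if sum(r) == 0:
        return [0] * n
    base = sum(r) / sum(d)
    rates = [min(base, 0.9999), min(s * base, 0.9999)]  # keep log(1 - p) finite

    def cost(state, t):
        # Negative log-likelihood of r[t] out of d[t] under a binomial model,
        # dropping the log C(d, r) term, which is the same in both states.
        p = rates[state]
        return -(r[t] * math.log(p) + (d[t] - r[t]) * math.log(1 - p))

    enter = gamma * math.log(n)  # cost of moving from baseline to burst
    best = [cost(0, 0), cost(1, 0) + enter]
    back = []
    for t in range(1, n):
        to0 = min((best[0], 0), (best[1], 1))          # leaving a burst is free
        to1 = min((best[0] + enter, 0), (best[1], 1))  # entering one is not
        back.append((to0[1], to1[1]))
        best = [to0[0] + cost(0, t), to1[0] + cost(1, t)]
    state = 0 if best[0] <= best[1] else 1
    states = [state]
    for pointers in reversed(back):  # trace the cheapest path backwards
        state = pointers[state]
        states.append(state)
    return states[::-1]


cites = [1, 1, 8, 9, 1, 1]
totals = [20] * 6
print(burst_states(cites, totals))  # [0, 0, 1, 1, 0, 0]
```

The entry cost grows with the length of the series, so a single year of slightly elevated citations is usually not enough to switch into the burst state.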

4.1 Landscape View

The following landscape view is generated from publications between 1995 and 2016 (Figure 7). The top 100 most cited publications in each year are used to construct a network of references cited in that year, and the individual networks are then synthesized. The synthesized network contains 3,145 references and 603 co-citation clusters. The three largest connected components include 1,729 nodes, which account for 54% of the entire network. The network has a modularity of 0.8925, which is considered very high, suggesting that the specialties in science mapping are clearly defined in terms of co-citation clusters. The average silhouette score of 0.3678 is relatively low, mainly because of the numerous small clusters; the silhouette scores of the major clusters that we focus on in this review are sufficiently high.
The colors of different areas indicate when co-citation links in those areas appeared for the first time. Areas in blue were generated earlier than areas in green, areas in yellow were generated after the green areas, and so on. Each cluster can be labeled by title terms, keywords, and abstract terms of the articles citing the cluster. For example, the yellow area in the upper right quadrant is labeled #3 information visualization, indicating that Cluster #3 is cited by articles on information visualization. The largest node is the paper that introduces the h-index. Other nodes with red treerings are references with citation bursts (Figure 7).
Figure 7. A landscape view of the co-citation network, generated by top 100 per slice between 1995 and 2016 (LRF = 3, LBY = 8, and e = 1.0).
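The modularity and average silhouette values reported above can be illustrated with their standard definitions. Modularity Q compares the weight of within-cluster links with the weight expected in a random network with the same degrees; the silhouette of an item compares its mean distance to its own cluster with its mean distance to the nearest other cluster and ranges from -1 to 1. The text does not say which distance CiteSpace uses for silhouettes, so the sketch accepts any distance function; names and toy data are hypothetical.

```python
from collections import defaultdict


def modularity(links, cluster_of):
    """Newman modularity Q of a weighted, undirected network.

    links: dict mapping (u, v) pairs to positive weights, each link once.
    cluster_of: dict mapping node -> cluster label.
    Q = sum over clusters of (within-weight / m - (degree / 2m) ** 2).
    """
    m = sum(links.values())
    within = defaultdict(float)
    degree = defaultdict(float)
    for (u, v), w in links.items():
        degree[cluster_of[u]] += w
        degree[cluster_of[v]] += w
        if cluster_of[u] == cluster_of[v]:
            within[cluster_of[u]] += w
    return sum(within[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def mean_silhouette(points, labels, dist):
    """Average silhouette width; members of singleton clusters score 0."""
    clusters = defaultdict(list)
    for i, label in enumerate(labels):
        clusters[label].append(i)
    if len(clusters) < 2:
        return 0.0
    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)
            continue
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        b = min(sum(dist(points[i], points[j]) for j in members) / len(members)
                for other, members in clusters.items() if other != label)
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Two triangles joined by a single bridge link.
links = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
         ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1, ("c", "d"): 1}
groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(round(modularity(links, groups), 4))  # 0.3571

values = [0, 1, 10, 11]
print(round(mean_silhouette(values, [0, 0, 1, 1], lambda x, y: abs(x - y)), 4))  # 0.8997
```

With the convention that singleton members score 0, a network with many tiny clusters pulls the average silhouette down even when the large clusters are well separated.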

4.2 Timeline View

A timeline visualization in CiteSpace depicts clusters along horizontal timelines (Figure 8). Each cluster is displayed from left to right. The legend of the publication time is shown on top of the view. The clusters are arranged vertically in the descending order of their size. The largest cluster is shown at the top of the view. The colored curves represent co-citation links added in the year of the corresponding color. Large-sized nodes or nodes with red treerings are of particular interest because they are either highly cited or have citation bursts or both. Below each timeline the three most cited references in a particular year are displayed. The label of the most cited reference is placed at the lowest position. References published in the same year are placed so that the less cited references are shifted to the left. The new version of CiteSpace supports the function to generate labels of a cluster year by year based on terms identified by Latent Semantic Indexing (LSI) (Deerwester et al., 1990). The year-by-year labels can be displayed in a table or above the corresponding timeline. Users may control the displays interactively.
Figure 8. A timeline visualization of the largest clusters of the total of 603 clusters.
Clusters are numbered from 0, i.e. Cluster #0 is the largest cluster and Cluster #1 is the second largest. As the timeline overview shows, the sustainability of a specialty varies. Some clusters span more than 20 years, whereas others are relatively short-lived. Some clusters remain active until 2015, the most recent year of publication for a cited reference in this study.
Each of the largest five clusters has over 150 members (Table 1). The largest cluster’s homogeneity in terms of the silhouette score is slightly lower than that of the smaller clusters. The largest cluster represents 4.5% of the references from the entire network and 8.1% of the largest three connected components of the network (LCCs). In this study, our review will primarily focus on the largest five clusters.
Table 1 The five largest clusters of co-cited references of the network of 3,145 references. The largest three connected components include 1,729 of the references.
Cluster  Size  Mean (year)  Silhouette  % of the network  Accumulated % of the network  % of top 3 LCCs  Accumulated % of LCCs
0        214   2006         0.748       4.5               4.5                           8.1              8.1
1        209   1997         0.765       2.3               6.7                           4.1              12.2
2        190   2009         0.845       3.3               10.0                          6.0              18.2
3        160   2005         0.954       2.9               12.9                          5.3              23.5
4        152   1992         0.890       1.7               14.6                          3.0              26.5
The duration of a cluster is particularly interesting (Table 2). The largest cluster lasts 21 years and is still active. Cluster #3 spans a 19-year period and also remains active. In contrast, Cluster #6 on webometrics ends by 2006, but, as we will see, relevant research finds its way into new specialties, notably in the form of altmetrics.
Table 2 Temporal properties of major clusters.
Cluster ID  Size  Silhouette  From  To    Duration  Median  Sustainability  Activeness  Theme
0           214   0.748       1995  2015  21        2006    ++++++          Active      Science mapping
1           209   0.765       1990  2006  17        1997    ++              Inactive    Domain analysis
2           190   0.845       2000  2015  16        2009                    Active      Research evaluation
3           160   0.954       1996  2014  19        2005    ++++            Active      Information visualization / Visual analytics
4           152   0.890       1988  1999  12        1993                    Inactive    Applications of ACA
6           125   0.925       1995  2006  12        2001                    Inactive    Webometrics
8           93    0.882       1994  2010  17        2002    ++              Inactive    Bibliometric studies of social work in health
11          48    0.965       1994  2006  13        2000                    Inactive    Bibliometric studies of management research
12          44    0.966       1990  1999  10        1996                    Inactive    Graph visualization
16          29    0.977       1999  2007  9         2003                    Inactive    Bibliometric studies of information systems
28          15    0.995       2004  2013  10        2008                    Inactive    Global trend; Water resources
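The From, To, Duration, and Median columns of Table 2 can be derived from the publication years of each cluster's members. The sketch assumes that is how they are computed (the text does not define them formally) and that Duration counts both end years, which matches every row of the table. The Sustainability and Activeness columns are not modeled, and the function name and input years are hypothetical.

```python
from statistics import median


def temporal_profile(years):
    """From, To, Duration, and Median for a cluster's member publication years.

    Duration counts both end years, so 1995-2015 gives 21, as in Table 2.
    """
    start, end = min(years), max(years)
    return {"from": start, "to": end,
            "duration": end - start + 1, "median": median(years)}


# Hypothetical member years, not data from the study.
print(temporal_profile([1996, 1999, 2004, 2005, 2010, 2014, 2014]))
# {'from': 1996, 'to': 2014, 'duration': 19, 'median': 2005}
```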

4.3 Major Specialties

In the following discussion, we focus on the five largest clusters (Table 1). A research programme, or paradigm, in a field of research can be characterized by its intellectual base and research fronts. The intellectual base is the collection of scholarly works cited by the corresponding research community, whereas the research fronts are the works inspired by those in the intellectual base. A variety of research fronts may arise from a common intellectual base.
4.3.1 Cluster #0 - Science Mapping
Cluster #0 is the largest cluster, containing 214 references across a 21-year period from 1995 to 2015 (Table 2). The median year of all references in this cluster is 2006, but the median year of the 20 most representative articles citing this cluster is 2010. This cluster's silhouette value of 0.748 is the lowest among the major clusters, but it still indicates a relatively high level of homogeneity.
The primary focus of this large and currently active cluster is the intellectual structure of a scientific discipline, a field of research, or any sufficiently self-contained domain of scientific inquiry. Key concepts identified from the titles of articles citing this cluster can be algorithmically organized according to hierarchical relations derived from co-occurring concepts (Figure 9). The largest branch of such a hierarchy typically reflects the core concepts of the scholarly publications produced by the specialty behind the cluster. For example, concepts such as intellectual structure, co-citation analysis, and co-authorship network underscore the primary interest of this specialty.
Figure 9. A hierarchy of key concepts selected from citing articles of Cluster #0 by log-likelihood ratio test.
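Figure 9 selects key concepts from citing articles with a log-likelihood ratio test. A common formulation is Dunning's G-squared statistic over a 2x2 table that contrasts a term's frequency in the articles citing a cluster with its frequency in all other citing articles. The sketch assumes that formulation and plain term counts; how CiteSpace extracts terms and builds the hierarchy in Figure 9 is not covered, and the function names and counts are hypothetical.

```python
import math


def llr(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio (G-squared) for a 2x2 table.

    k11: occurrences of the term in articles citing the cluster
    k12: occurrences of the term in all other citing articles
    k21: occurrences of other terms in articles citing the cluster
    k22: occurrences of other terms in all other citing articles
    """
    table = ((k11, k12), (k21, k22))
    total = k11 + k12 + k21 + k22
    g2 = 0.0
    for i in range(2):
        for j in range(2):
            observed = table[i][j]
            expected = sum(table[i]) * (table[0][j] + table[1][j]) / total
            if observed > 0:  # 0 * log(0) is taken as 0
                g2 += observed * math.log(observed / expected)
    return 2 * g2


def label_terms(cluster_counts, other_counts, top=3):
    """Rank terms over-represented in the cluster by their LLR score."""
    n_in = sum(cluster_counts.values())
    n_out = sum(other_counts.values())
    scores = {}
    for term, k11 in cluster_counts.items():
        k12 = other_counts.get(term, 0)
        if k11 / n_in > k12 / n_out:  # keep only terms typical of the cluster
            scores[term] = llr(k11, k12, n_in - k11, n_out - k12)
    return sorted(scores, key=scores.get, reverse=True)[:top]


inside = {"co-citation analysis": 8, "network": 2}
outside = {"co-citation analysis": 1, "network": 9}
print(label_terms(inside, outside))  # ['co-citation analysis']
```

The over-representation filter matters because G-squared is large for any strong deviation, including terms that are unusually rare in the cluster.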
We can use a simple method to classify terms into two broad categories: domain-intrinsic or domain-extrinsic. Domain-intrinsic terms belong to the research field that aims to advance the conceptual and methodological capabilities of science mapping, for example, intellectual structure and co-citation analysis. In contrast, domain-extrinsic terms belong to the domain to which science mapping techniques are applied. In other words, they belong to the domain that is the object of a science mapping study. For example, stem cell research per se may not directly influence the advance of a specialty that is mainly concerned with how to identify the intellectual structure of a research field from scientific literature. Information science has a unique position. On the one hand, it is the discipline that hosts a considerable number of fields relevant to science mapping. On the other hand, it is the most frequent choice of knowledge domain on which to test-drive newly developed techniques and methods.
The timeline visualization reveals three periods in the development of this cluster (Figure 10). The first period is from 1995 to 2002. This period is relatively uneventful, without high-profile references in terms of citation counts or bursts. Two visualization-centric domain analysis articles (Boyack, Wylie, & Davidson, 2002; Chen et al., 2002) foreshadowed the subsequent wave of high-impact studies that appeared in the second period. This period also features the social network analysis tool UCINET (Borgatti, Everett, & Freeman, 2002).
Figure 10. High-impact members of Cluster #0
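Citation bursts, used above as a marker of high-profile references, are commonly detected with Kleinberg's burst-detection model. The sketch below is a simplified two-state, batched variant solved with the Viterbi algorithm. It assumes yearly citation counts `r[t]` to one reference and yearly totals `d[t]`; the scaling factor `s` and the cost parameter `gamma` are illustrative defaults, not CiteSpace's settings.

```python
import math


def detect_bursts(r, d, s=2.0, gamma=1.0):
    """Two-state burst detection on yearly counts (simplified Kleinberg model).

    r[t]: citations to one reference in year t; d[t]: all citations in year t.
    Returns a list of states per year: 0 = baseline rate, 1 = burst.
    """
    n = len(r)
    if sum(r) == 0:
        return [0] * n
    p0 = sum(r) / sum(d)              # baseline citation rate
    p1 = min(s * p0, 0.9999)          # elevated rate in the burst state
    enter = gamma * math.log(n)       # cost of moving up into the burst state

    def cost(p, k, m):
        # negative log of the binomial probability of k successes in m trials
        log_comb = math.lgamma(m + 1) - math.lgamma(k + 1) - math.lgamma(m - k + 1)
        return -(log_comb + k * math.log(p) + (m - k) * math.log(1 - p))

    best = [cost(p0, r[0], d[0]), enter + cost(p1, r[0], d[0])]
    back = []                         # back[t-1][j] = best predecessor of state j at t
    for t in range(1, n):
        from0 = (best[0], 0) if best[0] <= best[1] else (best[1], 1)
        from1 = (best[0] + enter, 0) if best[0] + enter < best[1] else (best[1], 1)
        back.append((from0[1], from1[1]))
        best = [from0[0] + cost(p0, r[t], d[t]), from1[0] + cost(p1, r[t], d[t])]
    state = 0 if best[0] <= best[1] else 1
    states = [state]
    for pointers in reversed(back):   # Viterbi backtrace
        state = pointers[state]
        states.append(state)
    return states[::-1]
```

For example, a reference cited 1, 1, 1, 1, 20, 25, 20, 1, 1, 1 times out of 100 citations per year is flagged as bursting in the fifth to seventh years only. Moving down from the burst state is free, while moving up costs `gamma * ln(n)`, which suppresses short, noisy bursts.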
The second period is from 2003 to 2010. Unlike the first period, the second period is full of high-impact contributions, marked by large citation tree rings and citation bursts colored in red. Several types of high-impact contributions appeared in this period, notably:
The third period is from 2011 to 2015. Although no citation bursts have been detected in this period so far, its themes shed additional light on the more recent developmental status of the specialty. The most cited publications in this period include a study of the cognitive structure of library and information science (Milojević et al., 2011) and a few studies of domains with no apparent overlap with computer and information science, for example, regenerative medicine (Chen, Dubin, & Kim, 2014; Chen et al., 2012) and strategic management (Vogel & Güttel, 2013).
A specialty may go through an initial conceptualization stage, a stage of growing research capabilities through the flourishing of research tools, an expansion stage in which researchers apply their methods to subject domains beyond the original research problems, and a final stage of decay (Shneider, 2009). The largest cluster is dominated by an overwhelming number of tool-related references. The 20 most cited members of the cluster include several software tools such as CiteSpace (Chen, 2006; Chen et al., 2010), UCINET (Borgatti et al., 2002), VOSviewer (van Eck & Waltman, 2010), and global maps of science (Leydesdorff & Rafols, 2009) (Figure 11). In terms of Shneider's four-stage model, the underlying specialty had evidently reached Stage II, the tool-building stage, by 2010.
Figure 11. Top 20 most cited references in the largest cluster.
The cluster also includes several author co-citation studies of disciplines and research areas such as information science (Chen et al., 2010; White, 2003) and strategic management (Nerur, 2008; Ramos-Rodriguez, 2004). White (2003) revisits the intellectual structure of information science. Instead of the multidimensional scaling technique used in an earlier study of the domain (White & McCain, 1998), the new study applied the Pathfinder network scaling technique and demonstrated its advantages. Pathfinder network scaling was first introduced to author co-citation analysis by Chen (1999b). Although the studies of strategic management research can be seen as applications outside the original specialty of author co-citation analysis, studies of information science typically involve the development of new tools as well as applications of existing tools.
Articles that cite members of the cluster provide additional clues to the dynamics of the specialty. The top 20 citing articles, ranked by their bibliographic overlap with the cluster, reveal similar types of contributions, namely software tools and techniques (1, 2, 5, 8, 14), new methods (9, 11, 16, 19, 20), surveys and reviews (3, 10, 13), and applications of bibliometric studies (6, 12, 17) (Figure 12).
Figure 12. Major citing articles to the largest cluster.
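A ranking of citing articles along these lines can be approximated with a short sketch. Here we assume, as a hypothetical reading of "bibliographic overlap", that an article's score is the number of cluster members it cites (its coverage), with ties broken by the share of its references that fall in the cluster; the function name and input format are illustrative.

```python
def rank_citing_articles(citing, cluster_members, top=20):
    """Rank citing articles by how many members of a cluster they cite.

    citing: dict mapping a citing-article id to the set of references it cites.
    cluster_members: set of reference ids assigned to the cluster.
    Returns (article, coverage, share of its references in the cluster) tuples.
    """
    scored = []
    for article, refs in citing.items():
        coverage = len(refs & cluster_members)
        if coverage:
            scored.append((article, coverage, coverage / len(refs)))
    # highest coverage first; ties broken by share, then by id for stability
    scored.sort(key=lambda item: (-item[1], -item[2], item[0]))
    return scored[:top]
```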
The timeline visualization suggests that the specialty represented by the largest cluster has accumulated sufficient research techniques and tools by the end of the third period and is likely in Stage III of its evolution. In other words, the specialty is currently dominated by the need to apply these new techniques to a broader range of scientific domains and to address research questions at new levels. According to Shneider's four-stage model, Stage III is also the evolutionary stage in which the specialty may encounter discoveries that could change the course of its development.
At a more pragmatic level, one may monitor the further development of the specialty by tracking research fronts that build on its early stages. One can monitor emerging trends and patterns along the major dimensions of the latent semantic space spanned by each year's publications connected to this cluster. For example, the growing number of domain-extrinsic terms such as nanotechnology, case study, and solar cell suggests an expansion of the research scope, a hallmark of a Stage III specialty.
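Latent semantic indexing, which yields such dimensions, can be sketched with a truncated singular value decomposition of a term-document matrix. This minimal example assumes raw term counts over pre-tokenized titles; real pipelines usually add stop-word removal and term weighting, and the function name is hypothetical.

```python
import numpy as np


def lsi_top_terms(docs, k=2, top=3):
    """Latent semantic indexing on a small term-document matrix.

    docs: list of token lists (e.g. title terms of one year's citing articles).
    Returns, for each of the k leading latent dimensions, the terms with the
    largest absolute loadings on that dimension.
    """
    vocab = sorted({t for doc in docs for t in doc})
    index = {t: i for i, t in enumerate(vocab)}
    X = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for t in doc:
            X[index[t], j] += 1.0
    # singular values come back in descending order; columns of U are the
    # term loadings on each latent dimension
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    dims = []
    for d in range(min(k, len(S))):
        order = np.argsort(-np.abs(U[:, d]))[:top]
        dims.append([vocab[i] for i in order])
    return dims
```

Running it separately on each year's citing articles and comparing the leading terms is one simple way to watch for the kind of domain-extrinsic drift described above.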
4.3.2 Cluster #1 - Domain Analysis
Cluster #1 is the second largest cluster, containing 209 references that span 17 years, from 1990 to 2006. The cluster, or its underlying specialty, is largely inactive at the temporal resolution of this study. This cluster is dominated by representative terms such as information retrieval, domain analysis, scholarly communication, and intellectual space (Figure 13). Although information retrieval is the root node in the hierarchy of key terms in this cluster, domain analysis forms its conceptual foundation, as we will see shortly.
Figure 13. A hierarchy of key concepts in Cluster #1.
Two outstanding references in the timeline visualization of this cluster have strong citation burstness (Figure 14). One is a domain analysis of information science (White & McCain, 1998), in which multidimensional scaling of an author co-citation space was used to visualize the intellectual structure of the domain. The other is a study of major approaches to domain analysis (Hjørland, 2002). In the early 1990s, Hjørland developed a domain-analytic approach, also known as the sociological-epistemological approach or the socio-cognitive view, as a methodological alternative to the then-prevailing methodological individualism and cognitive perspective in information science, which largely marginalized the roles of social, historical, and cultural factors in understanding a domain of scientific knowledge. Another article by Hjørland on domain analysis is also a member of the cluster (Hjørland, 1997).
Figure 14. Key members of Cluster #1.
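The classic ACA pipeline behind maps of this kind can be sketched as follows: raw co-citation counts are turned into Pearson correlations between author profiles, correlations into dissimilarities, and dissimilarities into coordinates by classical (Torgerson) multidimensional scaling. This is a minimal sketch, not a reconstruction of White and McCain's procedure. Filling each diagonal cell with the author's largest off-diagonal count is an assumption (diagonal handling varies across ACA studies), and Ahlgren, Jarneving, and Rousseau (2003) question whether Pearson's r is an appropriate similarity measure for ACA at all.

```python
import numpy as np


def aca_mds(cocitation, dims=2):
    """Classical (Torgerson) MDS of an author co-citation matrix.

    cocitation: symmetric matrix of raw co-citation counts between authors.
    Returns an (n_authors, dims) array of map coordinates.
    """
    C = np.array(cocitation, dtype=float)        # copy; the input is not modified
    np.fill_diagonal(C, 0.0)
    # assumption: diagonal = each author's largest off-diagonal count
    np.fill_diagonal(C, C.max(axis=1))
    R = np.corrcoef(C)                           # Pearson r between author profiles
    D = 1.0 - R                                  # correlation -> dissimilarity
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centred squared dissimilarities
    vals, vecs = np.linalg.eigh(B)               # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:dims]        # keep the largest ones
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))
```

Authors who are co-cited with the same colleagues end up close together on the resulting map, which is what makes the map readable as an intellectual landscape.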
The sigma score of a cited reference reflects its structural and temporal significance. In addition to the author co-citation analysis of information science (White & McCain, 1998), two more author co-citation studies are ranked highly by their sigma scores, namely an author co-citation study of information retrieval (Ding, Chowdhury, & Foo, 1999) and an author co-citation study of hypertext (Chen, 1999b) (Figure 15).
Figure 15. Key members of Cluster #1, sorted by sigma.
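A sigma score combining structural and temporal properties can be sketched as follows, assuming the definition given in the CiteSpace documentation, sigma = (betweenness centrality + 1)^burstness, and an unweighted, undirected co-citation network. Betweenness is computed with Brandes' algorithm and normalized as for undirected graphs; the function names and the toy network are hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Normalized betweenness centrality (Brandes) of an unweighted, undirected graph.

    adj: dict mapping each node to the set of its neighbours.
    """
    nodes = list(adj)
    score = {v: 0.0 for v in nodes}
    for s in nodes:
        order, preds = [], {v: [] for v in nodes}
        paths = {v: 0 for v in nodes}      # number of shortest paths from s
        dist = {v: -1 for v in nodes}
        paths[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                        # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in nodes}
        for w in reversed(order):           # accumulate pair dependencies
            for v in preds[w]:
                delta[v] += paths[v] / paths[w] * (1.0 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(nodes)
    # score sums over ordered (s, t) pairs; dividing by (n-1)(n-2) maps it to [0, 1]
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 1.0
    return {v: score[v] * scale for v in nodes}


def sigma_score(centrality, burstness):
    """Assumed CiteSpace-style sigma: (centrality + 1) ** burstness."""
    return (centrality + 1.0) ** burstness
```

Because the centrality term is raised to the power of the burst strength, a reference scores highly only if it both bridges parts of the network and attracted a burst of citations; a reference with zero centrality always gets sigma = 1.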
The review article by White and McCain (1997) on “visualization of literature” is an important member of the cluster, whereas Tabah’s review of the study of literature dynamics (Tabah, 1999) is a citing article to the cluster. Although the term domain analysis was not used consistently during the period of this cluster, the contributions consistently focus on holistic views of a knowledge domain. As Hjørland argued, domain analysis serves a fundamental role in information science because its goal is to understand the subject matter from a holistic view of sociological, cognitive, historical, and epistemological dimensions.
Citing articles to Cluster #1 include some of the earliest attempts to integrate information visualization techniques into the methodology of domain analysis (Börner et al., 2003; Boyack, Wylie, & Davidson, 2002; Chen et al., 2002) (Figure 16). Interestingly, some of these citing articles appear as cited references in Cluster #0. In other words, the downturn of Cluster #1 does not mean that researchers lost interest in domain analysis approaches. Rather, they shifted their focus to a new generation of domain analysis supported by a variety of computational and visualization techniques. In this sense, the specialty underlying Cluster #0 continues the vision conceived in the works of Cluster #1. The citers of Cluster #1 identify the group of researchers who would become the core members of this new generation of domain analysis.
Figure 16. Citing articles to Cluster #1.
Author co-citation analysis (ACA) plays an instrumental role in the development of the domain analysis specialty embodied in Cluster #1. It is not only a bibliometric method that has been adopted by researchers beyond information science, but also a research instrument that helps to reveal challenges that the next generation of domain analysis must deal with.
In their ACA study of information science, White and McCain (1998) masterfully demonstrated the power and the potential of what one may learn from a holistic view of the intellectual landscape of a discipline. They used multidimensional scaling as a vehicle for visualization and drew on their encyclopedic knowledge of information science to lead an intellectually rich guided tour of the literature. In an attempt to enrich and enhance the conventional methodology of ACA, Chen (1999b) introduced the Pathfinder network scaling technique. Pathfinder networks bring several advantages to ACA, including the ability to identify and preserve salient structural patterns and algorithmically derived visual cues that assist the navigation and interpretation of the resulting visualizations (Chen & Morris, 2003). White (2003) revisited the ACA study of information science with Pathfinder network scaling. A fast algorithm for computing Pathfinder networks was published in 2008 (Quirin et al., 2008).
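Pathfinder network scaling with parameters r = ∞ and q = n - 1, the setting typically used in ACA studies, keeps a link only if no alternative path between its endpoints has a smaller maximum link weight. The sketch below uses a straightforward O(n³) minimax variant of Floyd-Warshall; it is not the fast algorithm of Quirin et al. (2008), and it assumes the input is a distance matrix (smaller means more similar) rather than raw co-citation counts.

```python
import math


def pathfinder(dist):
    """PFNET(r = infinity, q = n - 1) from a symmetric distance matrix.

    dist[i][j] is the link weight (smaller = more similar); math.inf = no link.
    Returns the sorted list of surviving undirected edges (i, j) with i < j.
    """
    n = len(dist)
    minimax = [row[:] for row in dist]
    for k in range(n):                # Floyd-Warshall with max() in place of +
        for i in range(n):
            for j in range(n):
                via_k = max(minimax[i][k], minimax[k][j])
                if via_k < minimax[i][j]:
                    minimax[i][j] = via_k
    # an edge survives if no path has a smaller maximum link weight
    return sorted((i, j) for i in range(n) for j in range(i + 1, n)
                  if dist[i][j] < math.inf and dist[i][j] <= minimax[i][j])
```

With these parameters the result is the union of all minimum spanning trees of the network, which is why the resulting maps stay sparse while preserving the strongest links.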
The re-introduction of network thinking opens up a wider variety of computational techniques to ACA studies, notably network modeling and visualization. Furthermore, technical advances resulting from the improvement of ACA have been applied to a broader range of bibliometric studies, notably document co-citation analysis (DCA) (Chen, 2004; Small, 1973, 1999). As we will see shortly, the adaptation of network modeling and information visualization techniques in general results from a Stage III specialty of information visualization and visual analytics.
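The raw input to both ACA and DCA is a count of how often two items are cited together. A minimal sketch for document co-citation, assuming each citing article is given as a list of cited-reference identifiers (author co-citation follows the same pattern once references are mapped to authors); the function name is hypothetical:

```python
from collections import Counter
from itertools import combinations


def cocitation_counts(reference_lists):
    """Count how often each pair of references is cited together.

    reference_lists: one iterable of cited-reference ids per citing article.
    A pair gains one co-citation for every article that cites both members.
    Returns a Counter keyed by alphabetically sorted reference pairs.
    """
    counts = Counter()
    for refs in reference_lists:
        # set() ignores duplicate entries within one reference list
        for pair in combinations(sorted(set(refs)), 2):
            counts[pair] += 1
    return counts
```

The resulting counts can then be normalized or converted to distances before network scaling or clustering.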
4.3.3 Cluster #2 - Research Evaluation
Cluster #2 is the third largest cluster, with 190 cited references and a silhouette value of 0.845, which is slightly higher than those of the two larger clusters #0 and #1 and suggests higher homogeneity. In other words, one would consider this specialty more specialized than the previously identified ones. The cluster has been active over the 16-year period from 2000 to 2015, and it represents an active specialty.
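The silhouette value measures how well each member fits its own cluster compared with the nearest other cluster, ranging from -1 to 1. A minimal sketch, assuming a precomputed dissimilarity matrix between cited references (the specific dissimilarity CiteSpace uses is not reproduced here) and at least two clusters; the function name is hypothetical:

```python
def silhouette(dist, labels):
    """Mean silhouette value per cluster.

    dist: symmetric matrix of pairwise distances between cited references.
    labels: cluster label of each reference (at least two distinct labels).
    s(i) = (b - a) / max(a, b), where a is the mean distance from i to the
    other members of its own cluster and b is the smallest mean distance
    from i to the members of any other cluster.
    """
    clusters = sorted(set(labels))
    members = {c: [i for i, lab in enumerate(labels) if lab == c] for c in clusters}
    per_cluster = {c: [] for c in clusters}
    for i, own in enumerate(labels):
        mates = [j for j in members[own] if j != i]
        if not mates:                 # singleton clusters get s(i) = 0
            per_cluster[own].append(0.0)
            continue
        a = sum(dist[i][j] for j in mates) / len(mates)
        b = min(sum(dist[i][j] for j in members[c]) / len(members[c])
                for c in clusters if c != own)
        per_cluster[own].append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return {c: sum(v) / len(v) for c, v in per_cluster.items()}
```

Values close to 1 mean members sit much closer to their own cluster than to any other, which is the sense in which a higher silhouette indicates a more homogeneous specialty.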
The overarching theme of the cluster is suggested by the two major branches in its hierarchy of key terms: the information visualization branch and the much larger research evaluation branch (Figures 17 & 18). The information visualization branch highlights the recurring themes of intellectual structure and co-citation analysis. The research evaluation branch highlights numerous concepts that are central to measuring scholarly impact, notably h-index, bibliometric ranking, bibliometric indicator, sub-field normalization, Web indicator, citation distribution, social media metrics, and alternative metrics.
Figure 17. A hierarchy of key concepts in Cluster #2.
Figure 18. High-impact members of Cluster #2.
The six-year period from 2005 through 2010 is a highly active period for the cluster (Figure 19). The most prominent contributions in this period include the original article that introduced the now widely known h-index (Hirsch, 2005), the subsequent g-index, a refinement that gives more weight to highly cited papers (Egghe, 2006), a 2007 study that compares the effects of using the Web of Science, Scopus, and Google Scholar on citation-based ranking (Meho & Yang, 2007), a 2008 review entitled “What do citation counts measure?” (Bornmann & Daniel, 2008), and a study of the universality of citation distributions (Radicchi, Fortunato, & Castellano, 2008). These papers are also among the top sigma-ranked members of this cluster because of both their structural centrality and the strength of their citation burstness.
Figure 19. High-impact members of Cluster #2.
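Both indicators can be computed directly from a list of citation counts. The sketch below follows the standard definitions. The g-index variant shown does not pad the publication list with uncited papers, so g never exceeds the number of papers; this is an assumption, since some variants allow padding.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each (Hirsch, 2005)."""
    ranked = sorted(citations, reverse=True)
    # in a descending list, c >= rank holds for a prefix only
    return sum(1 for rank, c in enumerate(ranked, start=1) if c >= rank)


def g_index(citations):
    """Largest g such that the top g papers have at least g**2 citations in total (Egghe, 2006)."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, c in enumerate(ranked, start=1):
        total += c
        if total >= rank * rank:
            g = rank
    return g
```

For citation counts 10, 8, 5, 4, 3 the h-index is 4, while the g-index is 5, because the surplus citations of the two most cited papers lift the cumulative total (30) above 5².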
The top 20 citing articles of the cluster reveal a considerable level of thematic consistency (Figure 20). The overarching theme of research evaluation is evident in all of these articles, whose popular title terms identified by latent semantic indexing include citation impact, scientific impact, impact measures, bibliometric indicators, research evaluation, and Web indicators.
Figure 20. Citing articles of Cluster #2.
Some of the more recent and highly cited members of Cluster #2 include a comparative study of 11 altmetrics and counterpart articles matched in the Web of Science (Thelwall et al., 2013), the Leiden manifesto for research metrics (Hicks et al., 2015), and a study of power-law properties of citation distributions based on over 6 million Scopus records (Brzezinski, 2015).
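A common way to estimate the exponent of a power-law tail is maximum likelihood. The sketch below uses the discrete approximation popularized by Clauset, Shalizi, and Newman; it is not necessarily the procedure of Brzezinski (2015), and it assumes `xmin` is fixed by hand rather than fitted (for example, by minimizing a Kolmogorov-Smirnov distance). The function name is hypothetical.

```python
import math


def powerlaw_alpha(citations, xmin):
    """Approximate MLE of a discrete power-law exponent for counts >= xmin.

    Uses alpha ~= 1 + n / sum(ln(x / (xmin - 0.5))) over the n counts in the
    tail. xmin must be >= 1; counts below xmin are ignored.
    """
    tail = [x for x in citations if x >= xmin]
    if xmin < 1 or not tail:
        raise ValueError("need xmin >= 1 and at least one count >= xmin")
    return 1.0 + len(tail) / sum(math.log(x / (xmin - 0.5)) for x in tail)
```

The choice of `xmin` matters a great deal for citation data, because the body of the distribution usually departs from a power law even when the tail follows one.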
4.3.4 Cluster #3 - Information Visualization and Visual Analytics
Cluster #3 is the fourth largest cluster. It spans the period from 2004 through 2014. The topic hierarchy has two branches: information visualization and heart rate variability (Figures 21 & 22). Heart rate variability does not belong to domain analysis in the sense used in information science. In fact, it was included in the original results of the topic search because the term domain analysis is ambiguous across multiple disciplines. Pragmatically, it is easier and more efficient to simply skip an irrelevant branch than to keep refining the original topic search query until all noticeable irrelevant topics are eliminated. This is one of the fundamental challenges for information retrieval, and this is where domain analysis has an instrumental role to play (Hjørland, 2002).
Figure 21. A hierarchy of key concepts in Cluster #3.
Figure 22. High-impact members of Cluster #3.
The information visualization branch includes a mixture of information visualization techniques, such as fisheye view, group drawing, graph visualization, and visual analytics, and topics central to information science, such as citation analysis and information retrieval. The mixture reflects attempts to apply information visualization and visual analytic techniques to bibliometric studies of the intellectual structure of a research domain. The vision of information visualization is to identify insightful patterns in abstract information (Card, Mackinlay, & Shneiderman, 1999; Chen, 2005, 2010). The subsequently emerged field of visual analytics emphasizes the critical and more specific role of sense-making and analytic reasoning in accomplishing such goals (Chen, 2008; Keim et al., 2008; Thomas & Cook, 2005).
High-impact contributions in Cluster #3 include the collection of seminal works in information visualization (Card, Mackinlay, & Shneiderman, 1999), a survey of graph visualization techniques (Herman, Melançon, & Marshall, 2000), Cytoscape, a widely used software tool for visualizing biomolecular interaction networks (Shannon et al., 2003), the groundbreaking work on visual analytics (Thomas & Cook, 2005), Many Eyes, the popular Web-based visualization platform (Viégas et al., 2007), and a framework of seven types of interaction techniques in information visualization (Yi et al., 2007) (Figure 23).
Figure 23. Key members of Cluster #3.
In addition to the above high-impact contributions, this cluster features information visualization tools such as the InfoVis toolkit (Fekete, 2004), NodeTrix (Henry, Fekete, & McGuffin, 2007), the visual analytic tool Jigsaw (Stasko, Görg, & Liu, 2008), and D3 (Bostock, Ogievetsky, & Heer, 2011). The most widely used information visualization tools, such as Many Eyes and D3, became available between 2007 and 2011 (Figure 24).
Figure 24. Citing articles of Cluster #3.
According to Shneider’s four-stage model, the information visualization and visual analytics specialty in the context of domain analysis and literature visualization has demonstrated properties of a Stage IV specialty. For example, in the most recent few years of the cluster, researchers reflect on empirical evaluations of information visualization in various scenarios (Lam et al., 2012), revisit taxonomic organizations of abstract visualization tasks (Brehmer & Munzner, 2013), and synthesize and codify domain knowledge in the forms of textbooks (Munzner, 2014).
4.3.5 Remaining Clusters
The remaining clusters are either relatively small in size or short in terms of the length of their duration. We will omit detailed discussions of these clusters. Readers may refer to supplementary materials provided on the project website. We outline a few more relevant clusters as follows.
Cluster #4 focuses on applications of bibliometric studies to research domains such as decision support systems and information retrieval studies. Top cited references in this cluster are mostly articles published in the early 1990s.
Cluster #6 focuses on webometrics and is led by an article on methodological approaches to webometrics (Almind & Ingwersen, 1997). This cluster was active between 1995 and 2006. Leading contributors to this specialty, such as Mike Thelwall, continue to make active contributions to Cluster #2 Research Evaluation, especially in association with the development of altmetrics. A review of scholarly communication and bibliometrics by Borgman and Furner (2002) is also a key member of this cluster.

4.4 Trajectories of Citations across Cluster Boundaries

Cluster analysis helps us to understand the major specialties associated with science mapping. Now we turn our attention to the trajectories of several leading contributors in the landscape of these clusters. We are interested in what we may learn from citation links made in publications of a scholar, especially those links bridging distinct clusters.
4.4.1 Trajectories of Prolific Authors
The first example is the citation trajectory of Howard White (Figure 25, left). He is the author of seminal papers featured in several clusters. His citation trajectories move across the citation landscape from the left to the center, spanning #4 decision support system (applications of ACA), #1 domain visualization (domain analysis), and #8 social work (another cluster of bibliometric studies).
Figure 25. Novel co-citations made by 8 papers of White (left) and by 14 papers of Thelwall (right).
The second example is the citation trajectory of Mike Thelwall (Figure 25, right). He is a prolific researcher who has contributed to webometrics and altmetrics, among other areas of bibliometrics. An overlay of his citation trajectories on the citation landscape shows that they span clusters such as #6 university websites (webometrics) and # google scholar (research evaluation).
In both examples, the citation trajectories span a wide area of the citation landscape. Monitoring the movement of citation trajectories in this way provides an intuitive insight into the evolution of the underlying specialties and the context in which high-impact researchers make their contributions.
4.4.2 Articles with Transformative Potentials
A widely recognized limitation of citation-based indicators is their reliance on citations accumulated over time. As a result, citation-based indicators are likely to overlook newly published articles. An alternative method is to focus on the extent to which a newly published article changes the conceptual structure of the knowledge domain of interest (Chen, 2012). The idea is to identify the potential of an article to make extraordinary or unexpected connections across distinct clusters. According to theories of scientific discovery, many significant contributions result from boundary-spanning ideas.
Table 3 lists three articles for each of the last five years (2012-2016). These articles have the highest geometric mean of three structural variation variables generated by CiteSpace. For example, in 2016, the highest score goes to the review of citation impact indicators by Waltman (2016), followed by two bibliometric analyses, one contrasting two closely related but distinct domains and the other studying 20 years of research on a single topic (Figure 26). In 2015, two bibliometric studies are followed by a review of theory and practice in scientometrics (Mingers & Leydesdorff, 2015).
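The combined score in Table 3 is the geometric mean of ∆M, ∆CLw, and CKL. The sketch below, with a hypothetical function name, recomputes it from the three 2016 rows of the table and reproduces their published scores and ranking; rejecting negative inputs is an assumption of the sketch.

```python
def structural_variation_score(delta_m, delta_clw, ckl):
    """Geometric mean of the three structural variation metrics."""
    if min(delta_m, delta_clw, ckl) < 0:
        raise ValueError("metrics are assumed to be non-negative")
    return (delta_m * delta_clw * ckl) ** (1.0 / 3.0)


# the three 2016 rows of Table 3: (reference, delta M, delta CLw, CKL)
rows_2016 = [
    ("Heradio et al., 2016", 0.8207, 0.0017, 0.0640),
    ("Waltman, 2016", 6.0541, 0.0152, 0.0251),
    ("Kim, Zhu, & Chen, 2016", 0.9235, 0.0019, 0.3407),
]
ranked = sorted(rows_2016, key=lambda row: structural_variation_score(*row[1:]),
                reverse=True)
for ref, *metrics in ranked:
    print(f"{ref}: {structural_variation_score(*metrics):.4f}")
# Waltman, 2016: 0.1322
# Kim, Zhu, & Chen, 2016: 0.0842
# Heradio et al., 2016: 0.0447
```

A geometric mean rewards articles that score reasonably on all three metrics; a single near-zero metric pulls the combined score down sharply, as the small ∆CLw values in the table illustrate.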
Table 3. Potentially transformative papers published in recent years (2012-2016).
Year | ∆M | ∆CLw | CKL | Geometric Mean | GC | Title | Reference
2016 | 6.0541 | 0.0152 | 0.0251 | 0.1322 | 5 | A review of the literature on citation impact indicators | Waltman, 2016
2016 | 0.9235 | 0.0019 | 0.3407 | 0.0842 | 0 | How are they different? A quantitative domain comparison of information visualization and data visualization (2000-2014) | Kim, Zhu, & Chen, 2016
2016 | 0.8207 | 0.0017 | 0.0640 | 0.0447 | 2 | A bibliometric analysis of 20 years of research on software product lines | Heradio et al., 2016
2015 | 1.7498 | 0.0073 | 0.0380 | 0.0786 | 0 | Global ontology research progress: A bibliometric analysis | Zhu et al., 2015
2015 | 1.9873 | 0.0052 | 0.0397 | 0.0743 | 9 | Bibliometric methods in management and organization | Zupic, 2015
2015 | 1.9906 | 0.0029 | 0.0238 | 0.0516 | 13 | A review of theory and practice in scientometrics | Mingers & Leydesdorff, 2015
2014 | 1.6240 | 0.0087 | 0.0434 | 0.0850 | 3 | Research dynamics: Measuring the continuity and popularity of research topics | Yan, 2014
2014 | 1.1837 | 0.0031 | 0.0463 | 0.0554 | 1 | Making a mark: A computational and visual analysis of one researcher’s intellectual domain | Skupin, 2014
2014 | 0.4462 | 0.0024 | 0.0270 | 0.0307 | 12 | The knowledge base and research front of information science 2006-2010: An author cocitation and bibliographic coupling analysis | Zhao & Strotmann, 2014
2013 | 2.5398 | 0.0112 | 0.0643 | 0.1223 | 13 | Analysis of bibliometric indicators for individual scholars in a large data set | Radicchi & Castellano, 2013
2013 | 1.0781 | 0.0065 | 0.2180 | 0.1152 | 6 | A visual analytic study of retracted articles in scientific literature | Chen et al., 2013
2013 | 1.7978 | 0.0064 | 0.0542 | 0.0854 | 24 | Quantitative evaluation of alternative field normalization procedures | Li et al., 2013
2012 | 3.6274 | 0.0107 | 0.0811 | 0.1466 | 29 | SciMAT: A new science mapping analysis software tool | Cobo et al., 2012
2012 | 3.4380 | 0.0248 | 0.0259 | 0.1302 | 15 | A forward diversity index | Carley & Porter, 2012
2012 | 1.0719 | 0.0032 | 0.0321 | 0.0479 | 11 | Visualizing and mapping the intellectual structure of information retrieval | Rorissa & Yuan, 2012
Figure 26. Three examples of articles with high modularity change rates: 1) (Waltman, 2016), 2) (Zupic, 2015), and 3) (Zhu et al., 2015).
These highly ranked articles represent a few types of studies that may serve as predictive indicators, namely review papers (Mingers & Leydesdorff, 2015; Waltman, 2016), applications of bibliometric studies to specific domains, software tools for science mapping (Cobo et al., 2012), new metrics and indicators (Li et al., 2013), and visual analytic studies of unconventional topics such as retractions (Chen et al., 2013).

4.5 The Emergence of a Specialty

The emergence of a specialty is determined by two factors: the intellectual base and the research fronts associated with the intellectual base. The intellectual base is what the specialty cites, whereas the research fronts are what the specialty is currently addressing. As we have seen, on the one hand, a research front may remain in the same co-citation cluster as in the case of Cluster #2 Research Evaluation. On the other hand, a research front may belong to a different specialty and become the intellectual base of a new specialty as in the case of Cluster #1 Domain Analysis and Cluster #0 Science Mapping.
The citation trajectories of a researcher's publications and the positions of these publications as cited references can be shown simultaneously by overlaying the trajectories (dashed lines for novel links, solid lines for existing links) and by drawing citing papers as stars if they also appear in a co-citation cluster as cited references. For example, the series of stars in Figure 27 tells us two things: first, the author is connecting topics in two clusters (Cluster #0 Science Mapping and Cluster #2 Research Evaluation), and second, the author belongs to the specialty of science mapping.
Figure 27. Stars indicate articles that are both cited and citing articles. Dashed lines indicate novel co-citation links. Based on 15 of the author's own publications.
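Novel co-citation links of the kind drawn as dashed lines can be detected by remembering every reference pair co-cited so far. A minimal sketch, assuming citing papers are processed in publication-year order, that same-year papers are taken in input order, and that every cited reference has a cluster label; the function name is hypothetical and CiteSpace's own bookkeeping may differ.

```python
from itertools import combinations


def novel_links(citing_papers, cluster_of):
    """Find co-citation links that a paper introduces for the first time.

    citing_papers: list of (paper_id, year, set_of_cited_refs); a pair counts
    as novel if no earlier paper has co-cited it.
    cluster_of: dict mapping every cited reference to its cluster label.
    Returns {paper_id: [(ref_a, ref_b, 'within' or 'between'), ...]}.
    """
    seen = set()
    result = {}
    for paper, _year, refs in sorted(citing_papers, key=lambda p: p[1]):
        pairs = list(combinations(sorted(refs), 2))
        result[paper] = [
            (a, b, "within" if cluster_of[a] == cluster_of[b] else "between")
            for a, b in pairs if (a, b) not in seen
        ]
        seen.update(pairs)            # later papers no longer count these as novel
    return result
```

Counting how many of a paper's novel links are of the "between" kind gives a rough measure of how strongly it bridges clusters.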
The next example illustrates the citation trajectories of Howard White's publications and their own positions in the cluster timelines (Figure 28). His publications appear in the early stage of the science mapping cluster (#0) and make novel connections between science mapping and domain analysis (Cluster #1), between domain analysis (Cluster #1) and applications of ACA (Cluster #4), and between domain analysis (Cluster #1) and webometrics (Cluster #6).
Figure 28. Citation trajectories of Howard White’s publications and their own locations.
The following example depicts the novel co-citation links made by a review paper of informetrics (Bar-Ilan, 2008) (Figure 29). These novel links include within-cluster links as well as between-cluster links. It is easy to see that the scope of the review is essentially limited to research papers published about six to seven years before the review. Furthermore, the review systematically emphasizes the diversity of topics instead of tracing the origin of any particular specialty.
Figure 29. Novel links made by a review paper of informetrics (Bar-Ilan, 2008).

5 Discussions and Conclusions

We present a visual domain analysis of science mapping research. Our intention is twofold. First, we aim to demonstrate the depth of a systematic review that one can reach by applying a science mapping approach to science mapping itself. In addition to applying the computational functions available in the new version of the CiteSpace software, we enrich the procedure of producing a systematic review of a knowledge domain by incorporating evolutionary models of a scientific specialty, especially the four-stage model of a scientific discipline, into the interpretation of the identified specialties. Our interpretation not only identifies thematic milestones of major streams of science mapping research, but also characterizes the developmental stages of the underlying specialties and the dynamics of transitions from one specialty to another.
Second, our goal is to provide a reliable historiographic survey of the science mapping research. The survey identifies the major clusters in terms of their high-impact members and citing articles that form new research fronts. We also demonstrate new insights that one can intuitively obtain through an inspection of citation trajectories and the positions of citing papers. The enhanced science mapping procedure introduced in this article is applicable to the analysis of other domains of interest. Researchers can utilize these visual analytic tools to perform timely surveys of the literature as frequently as they wish and find relevant publications more effectively.
The most active areas of scientific inquiry are also where the level of uncertainty is the highest (Chen, 2016; Fuchs, 1993). The evidence revealed in our study suggests that science mapping is a Stage III specialty. Research instruments are becoming increasingly powerful and accessible. A wider range of applications of existing techniques will in turn widen our horizon and deepen our understanding of the challenges that we need to overcome in order to advance the state of the art of science mapping.
Dr. Chaomei Chen is a Professor of Informatics in the College of Computing and Informatics at Drexel University, USA. He received a B.Sc. in Mathematics (Nankai University, China), an M.Sc. in Computation (University of Oxford, UK) and a Ph.D. in Computer Science (University of Liverpool, UK). He served as a Visiting Professor at Brunel University, UK and a Chang Jiang Scholar at Dalian University of Technology, China. He served as a member of the Thomson Reuters Strategic Advisory Board, the Research Portfolio Analysis Subcommittee of the CISE/SBE Advisory Committee of the National Science Foundation of the USA, a reviewer of the Chang Jiang Scholars Program of the Chinese Ministry of Education, and an expert reviewer for national funding agencies of countries such as Austria, Canada, Ireland, the Netherlands, and the USA. Dr. Chen is the founding editor and the Editor-in-Chief of the journal Information Visualization, the founding editor and the Specialty Chief Editor of Frontiers in Research Metrics and Analytics, and serves on the editorial board of the Journal of Data and Information Science. His research and scholarly expertise is in the visual analytic reasoning and assessment of critical information in complex adaptive systems. His work focuses on identifying emerging trends and potentially transformative changes in the development of science and technology, especially through computational and visual analytic approaches. He is the author of The Fitness of Information: Quantitative Assessments of Critical Information (Wiley, 2014), Turning Points: The Nature of Creativity (Springer, 2011), Information Visualization: Beyond the Horizon (Springer, 2004, 2006) and Mapping Scientific Frontiers: The Quest for Knowledge Visualization (Springer, 2003, 2013). Dr. Chen has published over 200 peer-reviewed articles in multiple disciplines, including computer science and information science. His work has been cited over 12,000 times on Google Scholar.
His research has been supported by the National Science Foundation (NSF) and other government agencies as well as industrial sponsors such as Elsevier, IMS Health, Lockheed Martin, and Pfizer. His earlier research was funded by the European Commission, the Engineering and Physical Sciences Research Council (UK), and the Library and Information Commission (UK). Dr. Chen has designed and developed the widely used visual analytics software CiteSpace for visualizing and analyzing structural and temporal patterns in scientific literature.

The authors have declared that no competing interests exist.

Ahlgren, P., Jarneving, B., & Rousseau, R. (2003). Requirements for a cocitation similarity measure, with special reference to Pearson’s correlation coefficient. Journal of the American Society for Information Science and Technology, 54(6), 550-560.

[2]
Almind,T.C., & Ingwersen,P. (1997). Informetric analyses on the world wide web: Methodological approaches to “webometrics”. Journal of Documentation, 53(4), 404-426.This article introduces the application of informetric methods to the World Wide Web (WWW), also called Webometrics. A case study presents a workable method for general informetric analyses of the WWW. In detail, the paper describes a number of specific informetric analysis parameters. As a case study the Danish proportion of the WWW is compared to those of other Nordic countries. The methodological approach is comparable with common bibliometric analyses of the ISI citation databases. Among other results the analyses demonstrate that Denmark would seem to fall seriously behind the other Nordic countries with respect to visibility on the Net and compared to its position in scientific databases.

[3]
Börner, K., Chen, C., & Boyack, K.W. (2003). Visualizing knowledge domains. Annual Review of Information Science and Technology, 37(1), 179-255.

[4]
Bar-Ilan, J. (2008). Informetrics at the beginning of the 21st century: A review. Journal of Informetrics, 2(1), 1-52.
This paper reviews developments in informetrics between 2000 and 2006. At the beginning of the 21st century we witness considerable growth in webometrics, mapping and visualization, and open access. A new topic is the comparison between citation databases, prompted by the introduction of two new citation databases, Scopus and Google Scholar. There is renewed interest in indicators as a result of the introduction of the h-index. Traditional topics like citation analysis and informetric theory also continue to develop. The impact factor debate, especially outside the informetric literature, continues to thrive. Ranked lists (of journals, highly cited papers, or educational institutions) are of great public interest.

[5]
Borgatti, S.P., Everett, M.G., & Freeman, L.C. (2002). Ucinet for Windows: Software for social network analysis. Harvard, MA: Analytic Technologies.

[6]
Borgman, C.L., & Furner, J. (2002). Scholarly communication and bibliometrics. Annual Review of Information Science and Technology, 36, 3-72.

[7]
Bornmann, L., & Daniel, H.D. (2008). What do citation counts measure? A review of studies on citing behavior. Journal of Documentation, 64(1), 45-80.

[8]
Bostock, M., Ogievetsky, V., & Heer, J. (2011). D³: Data-driven documents. IEEE Transactions on Visualization and Computer Graphics, 17(12), 2301-2309.
Data-Driven Documents (D3) is a novel representation-transparent approach to visualization for the web. Rather than hide the underlying scenegraph within a toolkit-specific abstraction, D3 enables direct inspection and manipulation of a native representation: the standard document object model (DOM). With D3, designers selectively bind input data to arbitrary document elements, applying dynamic transforms to both generate and modify content. We show how representational transparency improves expressiveness and better integrates with developer tools than prior approaches, while offering comparable notational efficiency and retaining powerful declarative components. Immediate evaluation of operators further simplifies debugging and allows iterative development. Additionally, we demonstrate how D3 transforms naturally enable animation and interaction with dramatic performance improvements over intermediate representations.

[9]
Boyack, K.W., Klavans, R., & Börner, K. (2005). Mapping the backbone of science. Scientometrics, 64(3), 351-374.
This paper presents a new map representing the structure of all of science, based on journal articles, including both the natural and social sciences. Similar to cartographic maps of our world, the map of science provides a bird’s-eye view of today’s scientific landscape. It can be used to visually identify major areas of science, their size, similarity, and interconnectedness. In order to be useful, the map needs to be accurate on a local and on a global scale. While our recent work has focused on the former aspect, this paper summarizes results on how to achieve structural accuracy. Eight alternative measures of journal similarity were applied to a data set of 7,121 journals covering over 1 million documents in the combined Science Citation and Social Science Citation Indexes. For each journal similarity measure we generated two-dimensional spatial layouts using the force-directed graph layout tool, VxOrd. Next, mutual information values were calculated for each graph at different clustering levels to give a measure of structural accuracy for each map. The best co-citation and inter-citation maps according to local and structural accuracy were selected and are presented and characterized. These two maps are compared to establish robustness. The inter-citation map is then used to examine linkages between disciplines. Biochemistry appears as the most interdisciplinary discipline in science.
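
The entry above scores structural accuracy with mutual information computed at several clustering levels. The Python below is a minimal sketch of that building block only: the mutual information between two labelings of the same journals, for example a map's clusters and a reference classification. What the authors compared their clusters against is not stated in this entry, so the reference labeling, like the labels themselves, is a hypothetical assumption.

```python
import math
from collections import Counter


def mutual_information(labels_a, labels_b):
    """Mutual information, in nats, between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must cover the same items")
    n = len(labels_a)
    joint = Counter(zip(labels_a, labels_b))
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    mi = 0.0
    for (a, b), n_ab in joint.items():
        # p(a, b) * log(p(a, b) / (p(a) * p(b))), written with raw counts
        mi += (n_ab / n) * math.log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Hypothetical: cluster labels from a map vs. a reference classification
clusters = ["c1", "c1", "c2", "c2"]
reference = ["bio", "bio", "phys", "phys"]
print(mutual_information(clusters, reference))
```

For this perfect two-cluster match the value equals ln 2 (about 0.693); labelings that are statistically independent score 0, so higher values indicate that the map's clusters recover more of the reference structure.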

[10]
Boyack, K.W., Wylie, B.N., & Davidson, G.S. (2002). Domain visualization using VxInsight® for science and technology management. Journal of the American Society for Information Science and Technology, 53(9), 764-774.
We present the application of our knowledge visualization tool, VxInsight®, to enable domain analysis for science and technology management within the enterprise. Data mining from sources of bibliographic information is used to define subsets of information relevant to a technology domain. Relationships between the individual objects (e.g., articles) are identified using citations, descriptive terms, or textual similarities. Objects are then clustered using a force-directed placement algorithm to produce a terrain view of the many thousands of objects. A variety of features, which allow exploration and manipulation of the landscapes and give detail on demand, enable quick and powerful analysis of the resulting landscapes. Examples of domain analyses used in S&T management at Sandia are given.

[11]
Brehmer, M., & Munzner, T. (2013). A multi-level typology of abstract visualization tasks. IEEE Transactions on Visualization and Computer Graphics, 19(12), 2376-2385.
The considerable previous work characterizing visualization usage has focused on low-level tasks or interactions and high-level tasks, leaving a gap between them that is not addressed. This gap leads to a lack of distinction between the ends and means of a task, limiting the potential for rigorous analysis. We contribute a multi-level typology of visualization tasks to address this gap, distinguishing why and how a visualization task is performed, as well as what the task inputs and outputs are. Our typology allows complex tasks to be expressed as sequences of interdependent simpler tasks, resulting in concise and flexible descriptions for tasks of varying complexity and scope. It provides abstract rather than domain-specific descriptions of tasks, so that useful comparisons can be made between visualization systems targeted at different application domains. This descriptive power supports a level of analysis required for the generation of new designs, by guiding the translation of domain-specific problems into abstract tasks, and for the qualitative evaluation of visualization usage. We demonstrate the benefits of our approach in a detailed case study, comparing task descriptions from our typology to those derived from related work. We also discuss the similarities and differences between our typology and over two dozen extant classification systems and theoretical frameworks from the literatures of visualization, human-computer interaction, information retrieval, communications, and cartography.

[12]
Brzezinski, M. (2015). Power laws in citation distributions: Evidence from Scopus. Scientometrics, 103(1), 213-228.
Modeling distributions of citations to scientific papers is crucial for understanding how science develops. However, there is a considerable empirical controversy on which statistical model fits the citation distributions best. This paper is concerned with rigorous empirical detection of power-law behaviour in the distribution of citations received by the most highly cited scientific papers. We have used a large, novel data set on citations to scientific papers published between 1998 and 2002 drawn from Scopus. The power-law model is compared with a number of alternative models using a likelihood ratio test. We have found that the power-law hypothesis is rejected for around half of the Scopus fields of science. For these fields of science, the Yule, power-law with exponential cut-off and log-normal distributions seem to fit the data better than the pure power-law model. On the other hand, when the power-law hypothesis is not rejected, it is usually empirically indistinguishable from most of the alternative models. The pure power-law model seems to be the best model only for the most highly cited papers in physics and astronomy. Overall, our results seem to support theories implying that the most highly cited scientific papers follow the Yule, power-law with exponential cut-off or log-normal distribution. Our findings also suggest that power laws in citation distributions, when present, account only for a very small fraction of the published papers (less than 1% for most fields of science) and that the power-law scaling parameter (exponent) is substantially higher (from around 3.2 to around 4.7) than found in the older literature.
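
The study in this entry fits a power law to the tail of the citation distribution before comparing it with alternatives through likelihood ratio tests. The Python below is a minimal sketch of only the first step, under stated assumptions: the threshold x_min is given rather than selected from the data, citation counts are treated as continuous (an approximation for discrete counts), and the exponent is estimated with the standard continuous maximum-likelihood formula alpha = 1 + n / Σ ln(x_i / x_min). Neither the choice of x_min nor the likelihood ratio comparison is shown, and the counts are hypothetical.

```python
import math


def powerlaw_alpha(citations, x_min):
    """Continuous maximum-likelihood estimate of a power-law exponent.

    Only observations >= x_min (the tail) enter the fit; x_min must be positive.
    Returns the estimate and its asymptotic standard error (alpha - 1) / sqrt(n).
    """
    tail = [x for x in citations if x >= x_min]
    log_sum = sum(math.log(x / x_min) for x in tail)
    if not tail or log_sum == 0:
        raise ValueError("need tail observations strictly above x_min")
    alpha = 1.0 + len(tail) / log_sum
    return alpha, (alpha - 1.0) / math.sqrt(len(tail))


# Hypothetical citation counts of highly cited papers
counts = [120, 150, 200, 340, 560, 900, 45, 30]
alpha, se = powerlaw_alpha(counts, x_min=100)
print(f"alpha = {alpha:.2f} +/- {se:.2f}")
```

Papers below x_min (here 45 and 30) are simply excluded, which mirrors the entry's focus on the most highly cited papers rather than the whole distribution.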

[13]
Callon, M., Courtial, J.P., Turner, W.A., & Bauin, S. (1983). From translations to problematic networks: An introduction to co-word analysis. Social Science Information Sur Les Sciences Sociales, 22(2), 191-235.
Recent empirical studies on research laboratories and the mechanism of policy elaboration in the field of science and technology have called into question some of the basic hypotheses guiding the sociology of the sciences over the past ten years. First, it now

[14]
Card, S.K., Mackinlay, J.D., & Shneiderman, B. (1999). Readings in information visualization: Using vision to think. San Francisco, CA: Morgan Kaufmann Publishers.

[15]
Carley, S., & Porter, A.L. (2012). A forward diversity index. Scientometrics, 90(2), 407-427.
We introduce an indicator to measure the diffusion of scientific research. Consistent with Stirling’s 3-factor diversity model, the diffusion score captures not only variety and balance, but also disparity among citing article cohorts. We apply it to benchmark article samples from six 1995 Web of Science subject categories (SCs) to trace trends in knowledge diffusion over time since publication. Findings indicate that, for most SCs, diffusion scores steadily increase with time. Mathematics is an outlier. We employ a typology of citation trends among benchmark SCs and correlate this with diffusion scores. We also find that self-cites do not, in most cases, significantly influence diffusion scores.
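
Stirling's framework, cited in this entry, combines variety, balance, and disparity. The exact formula of Carley and Porter's forward diversity index is not given here, so the Python below is a hedged sketch of the Rao-Stirling index, one common way to fold all three factors into a single number: Σ d_ij p_i p_j over ordered pairs of distinct categories, where p_i is the share of citing articles in category i and d_ij is a disparity between categories. The subject categories, shares, and distances are hypothetical.

```python
from itertools import permutations


def rao_stirling(shares, distance):
    """Rao-Stirling diversity: sum over ordered pairs i != j of d_ij * p_i * p_j.

    shares: category -> proportion of citing articles (should sum to 1).
    distance: frozenset({i, j}) -> disparity between two categories, in [0, 1].
    """
    return sum(distance[frozenset((i, j))] * shares[i] * shares[j]
               for i, j in permutations(shares, 2))


# Hypothetical shares of citing articles by subject category
shares = {"Mathematics": 0.5, "Physics": 0.3, "Economics": 0.2}
distance = {
    frozenset(("Mathematics", "Physics")): 0.4,
    frozenset(("Mathematics", "Economics")): 0.7,
    frozenset(("Physics", "Economics")): 0.8,
}
print(round(rao_stirling(shares, distance), 3))
```

Summing over ordered pairs counts each unordered pair twice; some authors sum over unordered pairs instead, which halves the value without changing any ranking. A cohort concentrated in a single category scores 0 regardless of the distances.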

[16]
Chen, C. (1999a). Visualising semantic spaces and author co-citation networks in digital libraries. Information Processing & Management, 35(2), 401-420.

[17]
Chen, C. (1999b). Visualising semantic spaces and author co-citation networks in digital libraries. Information Processing & Management, 35(3), 401-420.
This paper describes the development and application of visualisation techniques for users to access and explore information in a digital library effectively and intuitively. Salient semantic structures and citation patterns are extracted from several collections of documents, including the ACM SIGCHI Conference Proceedings (1995–1997) and ACM Hypertext Conference Proceedings (1987–1998), using Latent Semantic Indexing and Pathfinder Network Scaling. The unique spatial metaphor leads to a natural combination of search and navigation within the same semantic space in a 3-dimensional virtual world. Author co-citation patterns are visualised through a number of author co-citation maps in attempts to reveal the structure of the hypertext, including an overall co-citation map of 367 authors and three periodical maps. These maps highlight predominant research areas in the field. This approach provides a means for transcending the boundaries of collections of documents and visualising more profound patterns in terms of semantic structures and co-citation networks.
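
Pathfinder Network Scaling, mentioned in this entry, prunes a dense similarity network down to its most salient links. The Python below is a minimal sketch of the PFNET(r = ∞, q = n − 1) setting: a path's length is its largest link distance, all paths are considered, and a direct link survives only if no alternative path has a smaller maximum link distance. It assumes co-citation similarities have already been converted into distances (how to convert them is left open here), and the node names and distances are hypothetical.

```python
import math


def pathfinder_inf(nodes, dist):
    """Prune links to a Pathfinder network PFNET(r = inf, q = n - 1).

    dist: frozenset({u, v}) -> distance of each existing link (smaller = closer).
    Self-links are not allowed. Returns the set of surviving links.
    """
    # minimax[u][v]: smallest achievable largest-link distance over all u-v paths
    minimax = {u: {v: math.inf for v in nodes} for u in nodes}
    for u in nodes:
        minimax[u][u] = 0.0
    for pair, d in dist.items():
        u, v = tuple(pair)
        minimax[u][v] = minimax[v][u] = min(minimax[u][v], d)
    # Floyd-Warshall with max() in place of + when extending a path through k
    for k in nodes:
        for i in nodes:
            for j in nodes:
                via_k = max(minimax[i][k], minimax[k][j])
                if via_k < minimax[i][j]:
                    minimax[i][j] = via_k
    kept = set()
    for pair, d in dist.items():
        u, v = tuple(pair)
        if d <= minimax[u][v]:  # no alternative path beats the direct link
            kept.add(pair)
    return kept


# Hypothetical distances: A-C is "longer" than the detour through B
nodes = ["A", "B", "C"]
dist = {frozenset(("A", "B")): 1.0, frozenset(("B", "C")): 1.0,
        frozenset(("A", "C")): 3.0}
print(sorted(sorted(link) for link in pathfinder_inf(nodes, dist)))
```

In this example the A-C link is removed because the path A-B-C never uses a link longer than 1.0; links tied with the best alternative path are kept.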

[18]
Chen, C. (2004). Searching for intellectual turning points: Progressive knowledge domain visualization. Proceedings of the National Academy of Sciences of the United States of America, 101(Suppl. 1), 5303-5310.
This article introduces a previously undescribed method for progressively visualizing the evolution of a knowledge domain's cocitation network. The method first derives a sequence of cocitation networks from a series of equal-length time interval slices. These time-registered networks are merged and visualized in a panoramic view in such a way that intellectually significant articles can be identified based on their visually salient features. The method is applied to a cocitation study of the superstring field in theoretical physics. The study focuses on the search for articles that triggered two superstring revolutions. Visually salient nodes in the panoramic view are identified, and the nature of their intellectual contributions is validated by leading scientists in the field. The analysis has demonstrated that a search for intellectual turning points can be narrowed down to visually salient nodes in the visualized network. The method provides a promising way to simplify otherwise cognitively demanding tasks to a search for landmarks, pivots, and hubs.
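
The first step described in this entry, deriving a sequence of cocitation networks from equal-length time slices, can be sketched as follows. This Python is a minimal illustration under assumptions, not the published implementation: each citing article is a hypothetical (year, cited references) record, co-citation counts are kept separately per slice, and the later merging, node selection, and panoramic visualization are not shown.

```python
from collections import Counter
from itertools import combinations


def sliced_cocitation(records, start, end, slice_len):
    """Count co-cited reference pairs separately for each equal-length time slice.

    records: iterable of (year, cited_reference_ids) for citing articles.
    Returns {first_year_of_slice: Counter({(ref_a, ref_b): count})}.
    """
    slices = {y: Counter() for y in range(start, end + 1, slice_len)}
    for year, refs in records:
        if not start <= year <= end:
            continue  # outside the analysed period
        slice_start = start + (year - start) // slice_len * slice_len
        # Each unordered pair of distinct references cited together counts once.
        for pair in combinations(sorted(set(refs)), 2):
            slices[slice_start][pair] += 1
    return slices


# Hypothetical citing records: (publication year, cited references)
records = [(2000, ["A", "B"]), (2001, ["A", "B", "C"]), (2003, ["B", "C"])]
for first_year, counts in sliced_cocitation(records, 2000, 2003, 2).items():
    print(first_year, dict(counts))
```

Keeping one count table per slice is what makes the networks time-registered: a link can later be traced to the slice in which it first appeared before the slices are merged.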

[19]
Chen, C. (2005). Top 10 unsolved information visualization problems. IEEE Computer Graphics and Applications, 25(4), 12-16.
The author presents a revised and extended version of the top unsolved problems of information visualization that he outlined in an IEEE Visualization 2004 panel. These problems are not necessarily imposed by technical barriers; rather, they are problems that might hinder the growth of information visualization as a field. The first three problems highlight issues from a user-centered perspective. The fifth, sixth, and seventh problems are technical in nature. The last three are the ones that need tackling at the disciplinary level.

[20]
Chen, C. (2006). CiteSpace II: Detecting and visualizing emerging trends and transient patterns in scientific literature. Journal of the American Society for Information Science and Technology, 57(3), 359-377.
This article describes the latest development of a generic approach to detecting and visualizing emerging trends and transient patterns in scientific literature. The work makes substantial theoretical and methodological contributions to progressive knowledge domain visualization. A specialty is conceptualized and visualized as a time-variant duality between two fundamental concepts in information science: research fronts and intellectual bases. A research front is defined as an emergent and transient grouping of concepts and underlying research issues. The intellectual base of a research front is its citation and co-citation footprint in scientific literature: an evolving network of scientific publications cited by research-front concepts. Kleinberg's (2002) burst-detection algorithm is adapted to identify emergent research-front concepts. Freeman's (1979) betweenness centrality metric is used to highlight potential pivotal points of paradigm shift over time. Two complementary visualization views are designed and implemented: cluster views and time-zone views. The contributions of the approach are that (a) the nature of an intellectual base is algorithmically and temporally identified by emergent research-front terms, (b) the value of a co-citation cluster is explicitly interpreted in terms of research-front concepts, and (c) visually prominent and algorithmically detected pivotal points substantially reduce the complexity of a visualized network. The modeling and visualization process is implemented in CiteSpace II, a Java application, and applied to the analysis of two research fields: mass extinction (1981-2004) and terrorism (1990-2003). Prominent trends and pivotal points in visualized networks were verified in collaboration with domain experts, who are the authors of pivotal-point articles. Practical implications of the work are discussed. A number of challenges and opportunities for future studies are identified.
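
The entry above uses Freeman's betweenness centrality to flag potential pivotal points. The Python below is a minimal sketch, not CiteSpace's implementation: it computes unnormalized betweenness with Brandes' algorithm for an unweighted, undirected graph, so link weights and any normalization a real tool may apply are ignored. The graph and node names are hypothetical.

```python
from collections import deque


def betweenness(adj):
    """Brandes' algorithm for unweighted, undirected graphs.

    adj maps each node to a list of its neighbours (symmetric). Scores are
    unnormalized and count each unordered pair of endpoints once.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        stack, preds = [], {v: [] for v in adj}
        sigma, dist = {v: 0 for v in adj}, {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate pair dependencies in order of decreasing distance from s.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Every unordered pair was visited once from each endpoint.
    return {v: score / 2.0 for v, score in bc.items()}


# Hypothetical network: two triangles joined only through node "p"
adj = {"a1": ["a2", "p"], "a2": ["a1", "p"], "p": ["a1", "a2", "b1", "b2"],
       "b1": ["b2", "p"], "b2": ["b1", "p"]}
print(betweenness(adj))
```

Node p, the only bridge between the two triangles, scores 4 (one for each of the four cross-cluster pairs), while every other node scores 0; this is the kind of boundary-spanning position the entry associates with pivotal points.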

[21]
Chen, C. (2008). An information-theoretic view of visual analytics. IEEE Computer Graphics and Applications, 28(1), 18-23.
Visual analytics is an emerging discipline that helps connect dots. It facilitates analytical reasoning and decision making through integrated and highly interactive visualization of complex and dynamic data and situations. Solving mysteries is only part of the game. Visual analytics must augment analyst and decision-maker capabilities to assimilate complex situations and reach informed decisions. Information theory offers a framework for keeping focused on the right questions.

[22]
Chen, C. (2010). Information visualization. Wiley Interdisciplinary Reviews: Computational Statistics, 2(4), 387-403.

[23]
Chen, C. (2012). Predictive effects of structural variation on citation counts. Journal of the American Society for Information Science and Technology, 63(3), 431-449.
A critical part of a scientific activity is to discern how a new idea is related to what we know and what may become possible. As new scientific publications arrive at a rate that rapidly outpaces our capacity for reading, analyzing, and synthesizing scientific knowledge, we need to augment ourselves with information that can effectively guide us through the rapidly growing intellectual space. In this article, we address a fundamental issue concerning what kinds of information may serve as early signs of potentially valuable ideas. In particular, we are interested in information that is routinely available and derivable upon the publication of a scientific paper without assuming the availability of additional information such as its usage and citations. We propose a theoretical and computational model that predicts the potential of a scientific publication in terms of the degree to which it alters the intellectual structure of the state of the art. The structural variation approach focuses on the novel boundary-spanning connections introduced by a new article to the intellectual space. We validate the role of boundary-spanning in predicting future citations using three metrics of structural variation, namely modularity change rate, cluster linkage, and Centrality Divergence, along with more commonly studied predictors of citations such as the number of coauthors, the number of cited references, and the number of pages. Main effects of these factors are estimated for five cases using zero-inflated negative binomial regression models of citation counts. Key findings indicate that (a) structural variations measured by cluster linkage are a better predictor of citation counts than are the more commonly studied variables such as the number of references cited, (b) the number of coauthors and the number of references are both good predictors of global citation counts to a lesser extent, and (c) the Centrality Divergence metric is potentially valuable for detecting boundary-spanning activities at interdisciplinary levels. The structural variation approach offers a new way to monitor and discern the potential of newly published papers in context. The boundary-spanning mechanism offers a conceptually simplified and unifying explanation of the roles played by commonly studied extrinsic properties of a publication in the study of citation behavior.
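
One of the structural variation metrics named in this entry is the modularity change rate. Its exact definition is not given here, so the Python below is only a hedged sketch of the underlying quantity: Newman-Girvan modularity Q of a fixed partition, computed before and after a hypothetical new article adds a boundary-spanning co-citation link. The drop in Q illustrates the intuition; it is not Chen's metric.

```python
def modularity(edges, community):
    """Newman-Girvan modularity Q for an undirected, unweighted graph.

    edges: list of (u, v) pairs without self-loops or duplicates.
    community: node -> community label.
    Q = sum over communities c of (L_c / m - (d_c / 2m) ** 2), where L_c is the
    number of edges inside c and d_c is the total degree of c's nodes.
    """
    m = len(edges)
    inside, degree = {}, {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inside[cu] = inside.get(cu, 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


# Hypothetical co-citation network with two clusters {a, b, c} and {x, y, z}
community = {"a": 1, "b": 1, "c": 1, "x": 2, "y": 2, "z": 2}
before = [("a", "b"), ("b", "c"), ("a", "c"), ("x", "y"), ("y", "z"), ("x", "z")]
# A new article that cites both c and x adds a co-citation link across clusters
after = before + [("c", "x")]
print(round(modularity(before, community), 3), round(modularity(after, community), 3))
```

With the partition held fixed, Q falls from 0.5 to about 0.357 (exactly 5/14) once the cross-cluster link is added, which is the kind of structural change a boundary-spanning article introduces.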

[24]
Chen, C. (2016). Grand challenges in measuring and characterizing scholarly impact. Frontiers in Research Metrics and Analytics, 1(4).

[25]
Chen, C., Chen, Y., Horowitz, M., Hou, H., Liu, Z., & Pellegrino, D. (2009). Towards an explanatory and computational theory of scientific discovery. Journal of Informetrics, 3(3), 191-209.
We propose an explanatory and computational theory of transformative discoveries in science. The theory is derived from a recurring theme found in a diverse range of scientific change, scientific discovery, and knowledge diffusion theories in philosophy of science, sociology of science, social network analysis, and information science. The theory extends the concept of structural holes from social networks to a broader range of associative networks found in science studies, especially including networks that reflect underlying intellectual structures such as co-citation networks and collaboration networks. The central premise is that connecting otherwise disparate patches of knowledge is a valuable mechanism of creative thinking in general and transformative scientific discovery in particular. In addition, the premise consistently explains the value of connecting people from different disciplinary specialties. The theory not only explains the nature of transformative discoveries in terms of the brokerage mechanism but also characterizes the subsequent diffusion process as optimal information foraging in a problem space. Complementary to epidemiological models of diffusion, foraging-based conceptualizations offer a unified framework for arriving at insightful discoveries and optimizing subsequent pathways of search in a problem space. Structural and temporal properties of potentially high-impact scientific discoveries are derived from the theory to characterize the emergence and evolution of intellectual networks of a field. Two Nobel Prize-winning discoveries (the discovery of Helicobacter pylori and gene targeting techniques) and a discovery in string theory demonstrated such properties. Connections to and differences from existing approaches are discussed. The primary value of the theory is that it provides not only a computational model of intellectual growth, but also concrete and constructive explanations of where one may find insightful inspirations for transformative scientific discoveries.

[26]
Chen, C., Cribbin, T., Macredie, R., & Morar, S. (2002). Visualizing and tracking the growth of competing paradigms: Two case studies. Journal of the American Society for Information Science and Technology, 53(8), 678-689.
In this article we demonstrate the use of an integrative approach to visualizing and tracking the development of scientific paradigms. This approach is designed to reveal the long-term process of competing scientific paradigms. We assume that a cluster of highly cited and cocited scientific publications in a cocitation network represents the core of a predominant scientific paradigm. The growth of a paradigm is depicted and animated through the rise of citation rates and the movement of its core cluster towards the center of the cocitation network. We study two cases of competing scientific paradigms in the real world: (1) the causes of mass extinctions, and (2) the connections between mad cow disease and a new variant of a brain disease in humans (CJD). Various theoretical and practical issues concerning this approach are discussed.

[27]
Chen, C., Dubin, R., & Kim, M.C. (2014). Emerging trends and new developments in regenerative medicine: A scientometric update (2000-2014). Expert Opinion on Biological Therapy, 14(9), 1295-1317.
Introduction: Our previous scientometric review of regenerative medicine provides a snapshot of the fast-growing field up to the end of 2011. The new review identifies emerging trends and new developments appearing in the literature of regenerative medicine based on relevant articles and reviews published between 2000 and the first month of 2014.
Areas covered: Multiple datasets of publications relevant to regenerative medicine are constructed through topic search and citation expansion to ensure adequate coverage of the field. Networks of co-cited references representing the literature of regenerative medicine are constructed and visualized based on a combined dataset of 71,393 articles published between 2000 and 2014. Structural and temporal dynamics are identified in terms of most active topical areas and cited references. New developments are identified in terms of newly emerged clusters and research areas. Disciplinary-level patterns are visualized in dual-map overlays.
Expert opinion: While research in induced pluripotent stem cells remains the most prominent area in the field of regenerative medicine, research related to clinical and therapeutic applications in regenerative medicine has experienced a considerable growth. In addition, clinical and therapeutic developments in regenerative medicine have demonstrated profound connections with the induced pluripotent stem cell research and stem cell research in general. A rapid adaptation of graphene-based nanomaterials in regenerative medicine is evident. Both basic research represented by stem cell research and application-oriented research typically found in tissue engineering are now increasingly integrated in the scientometric landscape of regenerative medicine. Tissue engineering is an interdisciplinary field in its own right. Advances in multiple disciplines such as stem cell research and graphene research have strengthened the connections between tissue engineering and regenerative medicine.

[28]
Chen, C., Hu, Z., Liu, S., & Tseng, H. (2012). Emerging trends in regenerative medicine: A scientometric analysis in CiteSpace. Expert Opinion on Biological Therapy, 12(5), 593-608.
INTRODUCTION: Regenerative medicine involves research in a number of fields and disciplines such as stem cell research, tissue engineering and biological therapy in general. As research in these areas advances rapidly, it is critical to keep abreast of emerging trends and critical turns in the development of the collective knowledge.
AREAS COVERED: A progressively synthesized network is derived from 35,963 original research and review articles that cite 3875 articles obtained from an initial topic search on regenerative medicine between 2000 and 2011. CiteSpace is used to facilitate the analysis of the intellectual structure and emerging trends.
EXPERT OPINION: A major ongoing research trend is concerned with finding alternative reprogramming techniques as well as refining existing ones for induced pluripotent stem cells (iPSCs). A more recent emerging trend focuses on the structural and functional equivalence between iPSCs and human embryonic stem cells and potential clinical and therapeutic implications for regenerative medicine in the long run. The two trends overlap in terms of what they cite, but they are distinct and have different implications for future research. Visual analytics of the literature provides a valuable, timely, repeatable and flexible approach in addition to traditional systematic reviews so as to track the development of new emerging trends and identify critical evidence.

[29]
Chen, C., Hu, Z., Milbank, J., & Schultz, T. (2013). A visual analytic study of retracted articles in scientific literature. Journal of the American Society for Information Science and Technology, 64(2), 234-253.
Retracting published scientific articles is increasingly common. Retraction is a self-correction mechanism of the scientific community to maintain and safeguard the integrity of scientific literature. However, a retracted article may pose a profound and long-lasting threat to the credibility of the literature. New articles may unknowingly build their work on false claims made in retracted articles. Such dependencies on retracted articles may become implicit and indirect. Consequently, it becomes increasingly important to detect implicit and indirect threats. In this article, our aim is to raise the awareness of the potential threats of retracted articles even after their retraction and demonstrate a visual analytic study of retracted articles with reference to the rest of the literature and how their citations are influenced by their retraction. The context of highly cited retracted articles is visualized in terms of a co-citation network as well as the distribution of articles that have high-order citation dependencies on retracted articles. Survival analyses of time to retraction and postretraction citation are included. Sentences that explicitly cite retracted articles are extracted from full-text articles. Transitions of topics over time are depicted in topic-flow visualizations. We recommend that new visual analytic and science mapping tools should take retracted articles into account and facilitate tasks specifically related to the detection and monitoring of retracted articles.

[30]
Chen, C., Ibekwe-SanJuan, F., & Hou, J. (2010). The structure and dynamics of cocitation clusters: A multiple-perspective cocitation analysis. Journal of the American Society for Information Science and Technology, 61(7), 1386-1409.
A multiple-perspective co-citation analysis method is introduced for characterizing and interpreting the structure and dynamics of co-citation clusters. The method facilitates analytic and sense-making tasks by integrating network visualization, spectral clustering, automatic cluster labeling, and text summarization. Co-citation networks are decomposed into co-citation clusters. The interpretation of these clusters is augmented by automatic cluster labeling and summarization. The method focuses on the interrelations between a co-citation cluster's members and their citers. The generic method is applied to a three-part analysis of the field of Information Science as defined by 12 journals published between 1996 and 2008: 1) a comparative author co-citation analysis (ACA), 2) a progressive ACA of a time series of co-citation networks, and 3) a progressive document co-citation analysis (DCA). Results show that the multiple-perspective method increases the interpretability and accountability of both ACA and DCA networks.

[31]
Chen, C., & Leydesdorff, L. (2014). Patterns of connections and movements in dual-map overlays: A new method of publication portfolio analysis. Journal of the American Society for Information Science and Technology, 65(2), 334-351.
Portfolio analysis of the publication profile of a unit of interest, ranging from individuals and organizations to a scientific field or interdisciplinary programs, aims to inform analysts and decision makers about the position of the unit, where it has been, and where it may go in a complex adaptive environment. A portfolio analysis may aim to identify the gap between the current position of an organization and a goal that it intends to achieve or identify competencies of multiple institutions. We introduce a new visual analytic method for analyzing, comparing, and contrasting characteristics of publication portfolios. The new method introduces a novel design of dual-map thematic overlays on global maps of science. Each publication portfolio can be added as one layer of dual-map overlays over two related, but distinct, global maps of science: one for citing journals and the other for cited journals. We demonstrate how the new design facilitates a portfolio analysis in terms of patterns emerging from the distributions of citation threads and the dynamics of trajectories as a function of space and time. We first demonstrate the analysis of portfolios defined on a single source article. Then we contrast publication portfolios of multiple comparable units of interest, namely colleges in universities and corporate research organizations. We also include examples of overlays of scientific fields. We expect that our method will provide new insights into portfolio analysis.

[32]
Chen, C., & Morris, S. (2003). Visualizing evolving networks: Minimum spanning trees versus Pathfinder networks. Paper presented at the IEEE Symposium on Information Visualization, Seattle, Washington.

[33]
Cobo, M.J., López-Herrera, A.G., Herrera-Viedma, E., & Herrera, F. (2012). SciMAT: A new science mapping analysis software tool. Journal of the American Society for Information Science and Technology, 63(8), 1609-1630.

[34]
Deerwester, S., Dumais, S.T., Landauer, T.K., Furnas, G.W., & Harshman, R.A. (1990). Indexing by latent semantic analysis. Journal of the American Society for Information Science, 41(6), 391-407.

[35]
Ding, Y., Chowdhury, G., & Foo, S. (1999). Mapping the intellectual structure of information retrieval studies: An author co-citation analysis, 1987-1997. Journal of Information Science, 25(1), 67-78.

[36]
Egghe, L. (2006). Theory and practice of the g-index. Scientometrics, 69(1), 131-152.

[37]
Fekete, J.-D. (2004). The InfoVis Toolkit. Paper presented at the IEEE Symposium on Information Visualization, Austin, Texas.

[38]
Fuchs, S. (1993). A sociological theory of scientific change. Social Forces, 71(4), 933-953.

[39]
Garfield, E. (1955). Citation indexes for science: A new dimension in documentation through association of ideas. Science, 122(3159), 108-111.

[40]
Henry, N., Fekete, J.-D., & McGuffin, M.J. (2007). NodeTrix: A hybrid visualization of social networks. IEEE Transactions on Visualization and Computer Graphics, 13(6), 1302-1309.

[41]
Heradio, R., Perez-Morago, H., Fernandez-Amoros, D., Cabrerizo, F.J., & Herrera-Viedma, E. (2016). A bibliometric analysis of 20 years of research on software product lines. Information and Software Technology, 72, 1-15.

[42]
Herman, I., Melançon, G., & Marshall, M.S. (2000). Graph visualization and navigation in information visualization: A survey. IEEE Transactions on Visualization and Computer Graphics, 6(1), 24-44.

[43]
Hicks, D., Wouters, P., Waltman, L., de Rijcke, S., & Rafols, I. (2015). Bibliometrics: The Leiden Manifesto for research metrics. Nature, 520(7548), 429-431.

[44]
Hirsch, J.E. (2005). An index to quantify an individual’s scientific research output. Proceedings of the National Academy of Sciences of the United States of America, 102(46), 16569-16572.

[45]
Hjørland, B. (1997). Information seeking and subject representation: An activity-theoretical approach to information science. Westport, CT: Greenwood Press.

[46]
Hjørland, B. (2002). Epistemology and the socio-cognitive perspective in information science. Journal of the American Society for Information Science and Technology, 53(4), 257-270.

[47]
Johnson, B., & Shneiderman, B. (1991, October). Tree-maps: A space-filling approach to the visualization of hierarchical information structures. Paper presented at IEEE Visualization '91.

[48]
Keim, D., Mansmann, F., Schneidewind, J., Thomas, J., & Ziegler, H. (2008). Visual analytics: Scope and challenges. In S.J. Simoff, M.H. Böhlen, & A. Mazeika (Eds.), Visual Data Mining (pp. 76-90). Berlin: Springer-Verlag.

[49]
Kim, M.C., Zhu, Y., & Chen, C. (2016). How are they different? A quantitative domain comparison of information visualization and data visualization (2000-2014). Scientometrics, 107(1), 123.

[50]
Kleinberg, J. (2002). Bursty and hierarchical structure in streams. Paper presented at the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Edmonton, Alberta, Canada. Retrieved on February 19, 2017, from .

[51]
Kuhn, T.S. (1962). The structure of scientific revolutions. Chicago: University of Chicago Press.

[52]
Lam, H., Bertini, E., Isenberg, P., Plaisant, C., & Carpendale, S. (2012). Empirical studies in information visualization: Seven scenarios. IEEE Transactions on Visualization and Computer Graphics, 18(9), 1520-1536.

[53]
Leydesdorff, L., & Rafols, I. (2009). A global map of science based on the ISI subject categories. Journal of the American Society for Information Science and Technology, 60(2), 348-362.

[54]
Li, Y., Radicchi, F., Castellano, C., & Ruiz-Castillo, J. (2013). Quantitative evaluation of alternative field normalization procedures. Journal of Informetrics, 7(3), 746-755.

[55]
Meho, L.I., & Yang, K. (2007). Impact of data sources on citation counts and rankings of LIS faculty: Web of Science versus Scopus and Google Scholar. Journal of the American Society for Information Science and Technology, 58(13), 2105-2125.

[56]
Milojević, S., Sugimoto, C.R., Yan, E., & Ding, Y. (2011). The cognitive structure of Library and Information Science: Analysis of article title words. Journal of the American Society for Information Science and Technology, 62(10), 1933-1953.

[57]
Mingers, J., & Leydesdorff, L. (2015). A review of theory and practice in scientometrics. European Journal of Operational Research, 246(1), 1-19.

[58]
Morris, S.A., Yen, G., Wu, Z., & Asnake, B. (2003). Timeline visualization of research fronts. Journal of the American Society for Information Science and Technology, 55(5), 413-422.

[59]
Mulkay, M.J., Gilbert, G.N., & Woolgar, S. (1975). Problem areas and research networks in science. Sociology, 9(2), 187-203.

[60]
Munzner, T. (2014). Visualization analysis and design. Natick, MA: A K Peters/CRC Press.

[61]
Nerur, S.P., Rasheed, A.A., & Natarajan, V. (2008). The intellectual structure of the strategic management field: An author co-citation analysis. Strategic Management Journal, 29(3), 319-336.

[62]
Quirin, A., Cordón, O., Guerrero-Bote, V.P., Vargas-Quesada, B., & Moya-Anegón, F. (2008). A quick MST-based algorithm to obtain Pathfinder networks (∞, n − 1). Journal of the American Society for Information Science and Technology, 59(2), 1912-1924.

[63]
Radicchi, F., & Castellano, C. (2013). Analysis of bibliometric indicators for individual scholars in a large data set. Scientometrics, 97(3), 627-637.

[64]
Radicchi, F., Fortunato, S., & Castellano, C. (2008). Universality of citation distributions: Toward an objective measure of scientific impact. Proceedings of the National Academy of Sciences of the United States of America, 105(45), 17268-17272.

[65]
Ramos-Rodríguez, A.R., & Ruiz-Navarro, J. (2004). Changes in the intellectual structure of strategic management research: A bibliometric study of the Strategic Management Journal, 1980-2000. Strategic Management Journal, 25(10), 981-1004.

[66]
Rorissa, A., & Yuan, X. (2012). Visualizing and mapping the intellectual structure of information retrieval. Information Processing & Management, 48(1), 120-135.

[67]
Shannon, P., Markiel, A., Ozier, O., Baliga, N.S., Wang, J.T., Ramage, D., ... Ideker, T. (2003). Cytoscape: A software environment for integrated models of biomolecular interaction networks. Genome Research, 13(11), 2498-2504.

[68]
Shneider, A.M. (2009). Four stages of a scientific discipline: Four types of scientists. Trends in Biochemical Sciences, 34(5), 217-223.

[69]
Skupin, A. (2014). Making a mark: A computational and visual analysis of one researcher’s intellectual domain. International Journal of Geographical Information Science, 28(6), 1209-1232.

[70]
Small, H. (1973). Co-citation in the scientific literature: A new measure of the relationship between two documents. Journal of the American Society for Information Science, 24(4), 265-269.

[71]
Small, H. (1999). Visualizing science by citation mapping. Journal of the American Society for Information Science, 50(9), 799-813.

[72]
Stasko, J., Görg, C., & Liu, Z. (2008). Jigsaw: Supporting investigative analysis through interactive visualization. Information Visualization, 7(2), 118-132.

DOI

[73]
Tabah, A.N. (1999). Literature dynamics: Studies on growth, diffusion, and epidemics. Annual Review of Information Science and Technology, 34(1), 249-286.

[74]
Thelwall, M., Haustein, S., Larivière, V., & Sugimoto, C.R. (2013). Do altmetrics work? Twitter and ten other social web services. PLoS ONE, 8(5), e64841.

[75]
Thomas, J.J., & Cook, K.A. (2005). Illuminating the path: The research and development agenda for visual analytics. Los Alamitos, CA: IEEE Computer Society Press.

[76]
Tibély, G., Pollner, P., Vicsek, T., & Palla, G. (2013). Extracting tag-hierarchies. PLoS ONE, 8(12), e84133.

[77]
van Eck, N.J., & Waltman, L. (2010). Software survey: VOSviewer, a computer program for bibliometric mapping. Scientometrics, 84(2), 523-538.

[78]
Van Raan, A.F.J. (2003). Sleeping beauties in science. Scientometrics, 59(3), 461-466.

[79]
Viégas, F.B., Wattenberg, M., van Ham, F., Kriss, J., & McKeon, M. (2007). Many Eyes: A site for visualization at Internet scale. IEEE Transactions on Visualization and Computer Graphics, 13(6), 1121-1128.

[80]
Vogel, R., & Güttel, W.H. (2013). The dynamic capability view in strategic management: A bibliometric review. International Journal of Management Reviews, 15(4), 426-446.

[81]
Waltman, L. (2016). A review of the literature on citation impact indicators. Journal of Informetrics, 10(2), 365-391.

[82]
White, H.D. (2003). Pathfinder networks and author cocitation analysis: A remapping of paradigmatic information scientists. Journal of the American Society for Information Science and Technology, 54(5), 423-434.

[83]
White, H.D., & McCain, K.W. (1997). Visualization of literatures. Annual Review of Information Science and Technology, 32, 99-168.

[84]
White, H.D., & McCain, K.W. (1998). Visualizing a discipline: An author co-citation analysis of information science, 1972-1995. Journal of the American Society for Information Science, 49(4), 327-356.

[85]
White, H.D., & McCain, K.W. (1998). Visualizing a discipline: An author co-citation analysis of information science, 1972-1995. Journal of the American Society for Information Science and Technology, 49(4), 327-355.

[86]
Yan, E. (2014). Research dynamics: Measuring the continuity and popularity of research topics. Journal of Informetrics, 8(1), 98-110.

[87]
Yi, J.S., Kang, Y.A., Stasko, J.T., & Jacko, J.A. (2007). Towards a deeper understanding of the role of interaction in information visualization. IEEE Transactions on Visualization and Computer Graphics, 13(6), 1224-1231.

[88]
Zhao, D., & Strotmann, A. (2014). The knowledge base and research front of information science 2006-2010: An author cocitation and bibliographic coupling analysis. Journal of the Association for Information Science and Technology, 65(5), 995-1006.

[89]
Zhu, Q., Kong, X., Hong, S., Li, J., & He, Z. (2015). Global ontology research progress: A bibliometric analysis. Aslib Journal of Information Management, 67(1), 27-54.

[90]
Zupic, I., & Čater, T. (2015). Bibliometric methods in management and organization. Organizational Research Methods, 18(3), 429-472.

Copyright © 2023 All rights reserved Journal of Data and Information Science