With the development of natural language processing and information extraction techniques, text mining research has attracted increasing attention.
Swan and Jensen (2000) presented the TimeMines system, which detects, ranks, and groups terms or keywords from date-tagged free-text corpora based on their statistical properties.
Havre et al. (2002) developed the ThemeRiver visualization system to depict thematic variations over time within a large collection of documents. These systems, however, are neither open source nor freely available to other researchers. In recent years, many methods for topic evolution research based on probabilistic topic models have been published, notably those built on the seminal models PLSA (probabilistic latent semantic analysis) and LDA (latent Dirichlet allocation).
Zhou, Yu, and Hu (2017) reviewed notable research from the past decade on topic evolution based on probabilistic topic models, covering multiple aspects. Establishing and improving probabilistic topic models, however, requires a certain level of computer programming skill. Moreover, to make the analysis results more accurate, synonyms and abbreviations of the same term must be standardized and certain terms must be excluded, a step that is difficult to automate.
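To illustrate the kind of term standardization described above, the following is a minimal sketch in Python. The synonym table, the exclusion list, and the function name `normalize_terms` are hypothetical and chosen only for illustration; they are not taken from any of the cited systems. In practice both tables would have to be curated by hand, which is why this step resists full automation.

```python
# Hypothetical mapping from variant forms to one canonical term (illustrative only).
SYNONYMS = {
    "latent dirichlet allocation": "lda",
    "probabilistic latent semantic analysis": "plsa",
    "topic models": "topic model",
}

# Hypothetical list of terms judged too generic to keep in the analysis.
EXCLUDED = {"study", "method"}


def normalize_terms(terms):
    """Lowercase each term, map known variants to a canonical form, drop excluded terms."""
    cleaned = []
    for term in terms:
        key = " ".join(term.lower().split())  # collapse case and repeated whitespace
        key = SYNONYMS.get(key, key)          # unknown terms pass through unchanged
        if key not in EXCLUDED:
            cleaned.append(key)
    return cleaned


if __name__ == "__main__":
    print(normalize_terms(["Latent Dirichlet  Allocation", "LDA", "Study", "topic models"]))
```

Any variant that is missing from the table passes through unchanged, so the quality of the result depends entirely on how complete the manually maintained tables are.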