Research Papers

Co-occurrence of Cell Lines, Basal Media and Supplementation in the Biomedical Research Literature

  • Jessica Cox†,
  • Darin McBeath,
  • Corey Harper,
  • Ron Daniel
  • Elsevier Labs, 230 Park Avenue, New York, NY 10169, USA
†Jessica Cox (E-mail: j.cox@elsevier.com).

Received date: 2020-01-31

  Accepted date: 2020-04-23

  Online published: 2020-09-04

Copyright

Copyright reserved © 2020

Abstract

Purpose: The use of in vitro cell culture and experimentation is a cornerstone of biomedical research. However, more attention has recently been given to the potential consequences of using artificial basal medias and undefined supplements. As a first step towards better understanding and measuring the impact these systems have on experimental results, we use text mining to capture typical research practices and trends around cell culture.

Design/methodology/approach: To measure the scale of in vitro cell culture use, we have analyzed a corpus of 94,695 research articles that appear in biomedical research journals published in ScienceDirect from 2000-2018. Central to our investigation is the observation that studies using cell culture describe conditions using the typical sentence structure of cell line, basal media, and supplemented compounds. Here we tag our corpus with a curated list of basal medias and the Cellosaurus ontology using the Aho-Corasick algorithm. We also processed the corpus with Stanford CoreNLP to find nouns that follow the basal media, in an attempt to identify supplements used.

Findings: Interestingly, we find that researchers frequently use DMEM even if a cell line’s vendor recommends less concentrated media. We see long-tailed distributions for the usage of media and cell lines, with DMEM and RPMI dominating the media, and HEK293, HEK293T, and HeLa dominating cell lines used.

Research limitations: Our analysis was restricted to documents in ScienceDirect, and our text mining method achieved high recall but low precision and mandated manual inspection of many tokens.

Practical implications: Our findings document current cell culture practices in the biomedical research community, which can be used as a resource for future experimental design.

Originality/value: No other work has taken a text mining approach to surveying cell culture practices in biomedical research.

Cite this article

Jessica Cox, Darin McBeath, Corey Harper, Ron Daniel. Co-occurrence of Cell Lines, Basal Media and Supplementation in the Biomedical Research Literature[J]. Journal of Data and Information Science, 2020, 5(3): 161-177. DOI: 10.2478/jdis-2020-0016

1 Introduction

Experimentation using in vitro cell culture serves as a foundation of biomedical research. Immortalized cell lines and primary cells are cultured and maintained in cell culture media, a specially formulated mixture of metabolites that supports cellular growth and proliferation. Widely used basal medias (e.g. DMEM, RPMI, MEM) are manufactured with the intent of minimizing variability between culture techniques (Asayama, 2017). These basal media are further supplemented with various compounds to fit the needs of the specific cell line or type. However, the composition of these medias rarely resembles that of the true physiologic conditions these cells would encounter in vivo.
The development of defined culture media began as a mission to identify the minimal required metabolites to sustain cell viability outside of the body. Experiments conducted by Harry Eagle in the 1950s resulted in the development of Basal Medium Eagle (BME) (Eagle, 1955) and its more nutritionally dense derivative, Minimum Essential Medium (MEM) (Eagle, 1959). In 1959 Dulbecco and Freeman reported on the use of a concentrated version of MEM, which is now known as DMEM (Dulbecco & Freeman, 1959). RPMI 1640 was developed in 1967 specifically for the culture of white blood cells but supports growth of various immune cell types (Moore, Gerner, & Franklin, 1967). Typically, these basal medias are supplemented with fetal bovine/calf serum (FBS/FCS), an undefined supplement first described in 1958 (Puck, Cieciura, & Robinson, 1958). FBS is a source of growth factors, trace elements, vitamins, hormones, and proteins that stimulate and sustain in vitro cell growth. Aside from the ethical concerns around the collection of serum, there is variability between batches of serum, which has downstream effects on experimental results (Sikora et al., 2016; Zheng et al., 2008). Beyond FBS, basal media may be additionally supplemented with antibiotics and/or non-essential amino acids, amongst other compounds necessary for cellular growth within the specific system. The combination of cell line, basal media and supplements used to build cell culture systems is seemingly limitless.
Several studies have shown that the composition of basal medias has a significant impact on experimental results (Ariffin et al., 2016; Kim et al., 2015; Pirsko et al., 2018; Selenius et al., 2019; Tomoya Kawakami, 2016). However, culture media and supplementation remain a largely overlooked experimental parameter. Within the past decade, there has been an increase in awareness of how these environmental conditions influence cell metabolism, and more questions are being raised on whether the simplistic environments in which cells are cultured can really be appropriately used to model in vivo conditions (Adams, 2019; Cantor, 2019; Hirsch & Schildknecht, 2019; Mckee & Komarova, 2017; Vande Voorde et al., 2019).
These questions are particularly significant given the broader attention given to reproducibility in science in the past decade. While there is some debate over how serious the issue is in biomedical research, the community agrees that quality work must be reproducible. In 2012, Amgen reported an 11% reproducibility rate of 53 landmark papers in cancer biology (Begley & Ellis, 2012), and in 2018 The Reproducibility Project: Cancer Biology cut the number of studies they worked to replicate from 50 to 18, due to a variety of factors, and of these could only replicate 5 (Kaiser, 2018). Experiments may fail to reproduce because of dishonest reporting, but unreported laboratory and methodological conditions are a more likely cause. These conditions may in turn include variability in the lots or batches of materials used to culture cells.
In order to understand how biomedical researchers use cell media in practice, and the subsequent downstream effects of these choices, we have analyzed a corpus of nearly 100,000 biomedical research articles published in ScienceDirect since 2000. From that larger corpus we selected 12,732 sentences from full-length articles that contain mentions of known basal media and cell lines. We found only one study that assessed use of media types in biomedical research (Arora, 2013), and believe our work will strengthen the findings of that review. Our contributions are to provide basic counts of the media types, cell lines, and supplement types; to provide information on the co-occurrences of those items; and to provide data on how the usage of those items has changed since the year 2000.
Understanding at a high level what the most frequent co-occurring combinations are can pave the way towards understanding and establishing community standards, and fuel a larger discussion of the value, or risk, of using such artificial media in biomedical research. This also provides a baseline of current practice that can help identify future trends towards more physiologically representative media.

2 Methodology

2.1 Corpus

We sourced articles from a list of manually curated biology journals, developed in 2017 by the authors (Groth & Cox, 2017). From these journals, we selected all full-length research articles that were published in the year 2000 or later. From these 174,971 papers, we selected all of the sentences that appeared in the methods section of the paper. To do this, we first used our open source AnnotationQuery tool (AQ) (McBeath, 2017) to filter section titles that contained the terms “experiment”, “procedure”, “method”, “in vitro”, “cell culture”, or “cell”. Using AQ we selected all of the sentences that appeared in these sections. This returned 6.99 million sentences from 94,695 unique documents.

2.2 Dictionary development

We downloaded version 29 of Cellosaurus in February 2019. Cellosaurus is an ontology of cell lines, developed by the Swiss Institute of Bioinformatics (Bairoch, 2018). In total, the ontology covers 109,135 cell lines. Each of these cell lines also has associated synonyms. For example, HeLa cells may also be represented as “Hela”, “He La”, “hela”, etc. Inspection of the entries revealed several noisy tokens. To improve precision of our tagging, we filtered out terms if they were represented by all numerals, a single or double or triple letter string (e.g. “A” or “AB” or “ABC”), a combination of a single letter and a single digit (e.g. “1H” or “H1”), a string of numbers with a “-” or “.” in between them (e.g. “2-2” or “2.2”), a single letter followed by a single digit separated by a “-” or “.” (e.g. “A.1”), or two letters followed by a single number (e.g. “CH3”). There was also a significant amount of manual review of the remaining terms and their synonyms. We filtered out any names (e.g. “Fisher”) or overlap with medium types (e.g. “F12”). If any inappropriate tokens appeared in our analysis, they were reviewed by JC and excluded. One example is the cell line RERF-LC-MS, which was annotated several thousand times in our corpus. One of its synonyms is “LC-MS”, which is more commonly used to refer to “liquid chromatography-mass spectrometry” in the biomedical literature. After removal of this synonym, the cell line did not appear in our corpus at all. Our final cleaned dictionary contained 104,625 cell lines and 167,823 synonyms (Cox, Media and Cell Line Dictionaries, 2020).
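The pattern-based filtering rules listed above can be sketched as regular expressions. The block below is an illustration only: the patterns are a paraphrase of the rules in the text, and the names `NOISY_PATTERNS`, `is_noisy` and `clean_terms` are hypothetical rather than taken from the published dictionaries or code.

```python
import re

# Assumed regex paraphrases of the filtering rules described in the text.
NOISY_PATTERNS = [
    r"\d+",                    # all numerals, e.g. "123"
    r"[A-Za-z]{1,3}",          # one to three letters, e.g. "A", "AB", "ABC"
    r"[A-Za-z]\d|\d[A-Za-z]",  # one letter and one digit, e.g. "H1", "1H"
    r"\d+[-.]\d+",             # numbers joined by "-" or ".", e.g. "2-2", "2.2"
    r"[A-Za-z][-.]\d",         # letter, separator, digit, e.g. "A.1"
    r"[A-Za-z]{2}\d",          # two letters then one digit, e.g. "CH3"
]
NOISY = re.compile("|".join(f"(?:{p})" for p in NOISY_PATTERNS))


def is_noisy(term: str) -> bool:
    """Return True if the whole term matches one of the noisy patterns."""
    return NOISY.fullmatch(term) is not None


def clean_terms(terms):
    """Drop noisy terms, keeping the original order of the rest."""
    return [t for t in terms if not is_noisy(t)]


print(clean_terms(["HeLa", "AB", "2-2", "HEK293", "CH3"]))
```

Rules of this kind only catch terms with a recognizable shape. Synonyms such as “LC-MS” or “Fisher” pass these patterns, which is consistent with the manual review described above.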
In addition to cell lines, we developed a manually curated list of commonly used cell medias (Cox, Media and Cell Line Dictionaries, 2020). This was sourced from the commercial vendors Thermo-Fisher and Sigma-Aldrich. We also included search terms to capture more biologically relevant media. Specifically, those terms are “Plasmax” (Vande Voorde et al., 2019) and “human plasma-like medium” (Cantor et al., 2017). As with Cellosaurus, we included synonym terms for each parent term. For example, the entry DMEM had synonyms “Dulbecco’s modified eagle media”, “Dulbecco’s modified eagle medium”, etc. In total, we had a list of 26 parent cell media terms and 508 synonyms.

2.3 Annotation

We used the Aho-Corasick algorithm to tag the papers in our corpus with these two dictionaries. The corpus was annotated so that we maintained case-sensitivity but did not maintain punctuation-sensitivity. An advantage of this method is that we are able to find terms that are in the middle of a sentence in between punctuation, such as commas or periods. The tradeoff is that we lose precision, and there are more noisy tokens in our dataset. The corpus was also processed with Stanford CoreNLP for sentence breaks and part-of-speech (POS) tags (Manning et al., 2014).
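For readers unfamiliar with Aho-Corasick, the following is a minimal, self-contained sketch of the algorithm in Python. It is not the implementation used in this study: the function names are hypothetical, matching is plain case-sensitive substring matching, and the punctuation handling described above is not reproduced.

```python
from collections import deque


def build_automaton(terms):
    """Build the trie, failure links and output lists for a set of terms."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link for each state
    out = [[]]    # terms that end at each state
    for term in terms:
        state = 0
        for ch in term:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(term)
    # Breadth-first pass; depth-1 states keep a failure link to the root.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit terms that end at the failure state (proper suffixes).
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_terms(text, automaton):
    """Return (start_index, term) for every dictionary hit in text."""
    goto, fail, out = automaton
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for term in out[state]:
            hits.append((i - len(term) + 1, term))
    return hits


automaton = build_automaton(["HeLa", "DMEM", "RPMI"])
print(find_terms("HeLa cells were cultured in DMEM, supplemented with 10% FBS.", automaton))
```

Because the automaton reports every dictionary entry that occurs as a substring, a dictionary containing both “DMEM” and “MEM” yields two hits on the string “DMEM”. Behaviour of this kind is one source of the noisy tokens noted above.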

2.4 Cell and media co-occurrence

We hypothesized that papers using traditional in vitro cell culture methods will describe their system first describing the cell type, then the media type, followed by any supplementations. A hypothetical example may read, “HeLa cells were cultured in DMEM and supplemented with 10% FBS and 1% penicillin-streptomycin.” To find these statements, we selected sentences that had co-occurrence of a cell and media annotation, in part because we did not have a curated list of supplements. This returned a list of 12,732 sentences from 8,790 unique documents.
We took a random sample of 5% (n = 638) of those sentences and counted how many followed our hypothesized structure (cell line, basal media, supplement). We found that 78% (n = 500) adhered to our hypothesized structure. That meant 22% (n = 138) did not follow our hypothesized structure and may produce errors.
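The selection step and the adherence percentages can be illustrated with a short sketch. The record layout (`cell_lines` and `media` keys) and the function names are assumptions made for the example. Only the counts 638, 500 and 138 come from the text.

```python
def select_cooccurring(sentences):
    """Keep sentences annotated with at least one cell line and one medium."""
    return [s for s in sentences if s["cell_lines"] and s["media"]]


def share(part, whole):
    """Percentage of `whole` represented by `part`."""
    return 100 * part / whole


# Hypothetical annotated sentences, for illustration only.
sentences = [
    {"text": "HeLa cells were cultured in DMEM.", "cell_lines": ["HeLa"], "media": ["DMEM"]},
    {"text": "Cells were washed with PBS.", "cell_lines": [], "media": []},
    {"text": "Medium was replaced with RPMI.", "cell_lines": [], "media": ["RPMI"]},
]
print(len(select_cooccurring(sentences)))

# Adherence in the manually reviewed sample reported above.
print(round(share(500, 638)), round(share(138, 638)))
```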

2.5 Compound annual growth rate (CAGR) calculation

We calculated the compound annual growth rate (CAGR) using the following equation:
$\mathrm{CAGR}=\left(\frac{\text{ending article count}}{\text{beginning article count}}\right)^{\frac{1}{\text{number of years}}}-1$
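The equation translates directly into code. The sketch below is illustrative: the function name `cagr` and the input validation are assumptions, and the counts in the example call are made up rather than drawn from the corpus.

```python
def cagr(beginning_count, ending_count, years):
    """Compound annual growth rate between two counts over `years` years."""
    if beginning_count <= 0 or years <= 0:
        raise ValueError("beginning_count and years must be positive")
    return (ending_count / beginning_count) ** (1 / years) - 1


# Made-up example: a count that quadruples over two years doubles each year.
print(cagr(100, 400, 2))
```

Read in the other direction, a CAGR of 5.3% sustained over 18 years corresponds to a final count roughly 2.5 times the starting count, since 1.053 raised to the 18th power is about 2.53.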

2.6 Data visualization

All visualizations were built in Tableau Desktop version 2018.2.

3 Results

To understand the basic characteristics of the reporting of culture media in this corpus, we count the number of distinct articles that mention one or more of the basal medias. Of the 26 basal medias curated in our list, only 21 appeared in our corpus. The 5 that did not appear were: Plasmax, human plasma-like media (HPLM), Swim’s S77 Media, NCTC and CMRL 1066. In the context of the larger corpus of ~95k documents, media are mentioned in only about 27.5% of the biomedical articles. There were 126,409 annotations of media made in 64,637 methods section sentences in 26,036 unique documents. There were 68,216 document-level annotations of media, meaning each term (such as “DMEM”) was counted only once per document, reducing multiple mentions or synonyms in a document to 1. Table 1 provides the count of the 5 most commonly mentioned media types, and their proportion of the corpus. DMEM was annotated in 60.6% of all the documents mentioning media, followed by RPMI at 31.2% and MEM at 20.6%. These percentages do not sum to 100%, as a single document can mention multiple media. The top 5 most commonly mentioned media types accounted for 89.4% of all media annotations.
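The difference between raw and document-level counts can be shown with a small sketch. The input format, a list of (document id, matched term) pairs plus a synonym-to-parent mapping, is an assumption made for illustration and does not reflect the format of the study's annotations.

```python
from collections import Counter


def document_level_counts(annotations, synonyms):
    """Count each parent term at most once per document.

    annotations: iterable of (doc_id, matched_term) pairs.
    synonyms: dict mapping a matched synonym to its parent term.
    """
    seen = {(doc_id, synonyms.get(term, term)) for doc_id, term in annotations}
    return Counter(parent for _, parent in seen)


# Hypothetical annotations: four raw hits across two documents.
annotations = [
    ("doc1", "DMEM"),
    ("doc1", "Dulbecco’s modified eagle medium"),
    ("doc1", "RPMI"),
    ("doc2", "DMEM"),
]
synonyms = {"Dulbecco’s modified eagle medium": "DMEM"}
print(document_level_counts(annotations, synonyms))

# Share of media-mentioning documents that mention DMEM, from the counts above.
print(round(100 * 15770 / 26036, 1))
```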
Table 1 Document level count of top 5 basal medias and their representation in biomedical research mentioning cell culture, published since 2000.
| Basal media | Raw annotation counts (n = 126,409) | Count of unique documents that mention media | Percentage of documents that mention media (n = 26,036) | CAGR (2000-2018) |
| --- | --- | --- | --- | --- |
| DMEM | 40,347 | 15,770 | 60.6% | 13.4% |
| RPMI | 36,490 | 8,114 | 31.2% | 9.2% |
| MEM | 17,154 | 5,369 | 20.6% | 10.4% |
| F12 | 9,413 | 3,660 | 14.1% | 15.4% |
| DMEM/F12 | 9,604 | 2,221 | 8.5% | 21% |

Of the 26 basal medias in our list, these 5 represent the most commonly used medias within our corpus, with DMEM and RPMI representing the majority.

To understand if the mentions of particular culture media are increasing or decreasing over time, the document level annotation counts were grouped by year and visualized. Figure 1A shows the document level mentions of each of the top 5 media types over time, where we observe an increase in the raw counts, with DMEM at the top. Note that because the number of articles published grows each year, we need to discount the raw counts if we want to understand how media type mentions are changing as a proportion of the biomedical literature. To do so, Figure 1B shows the CAGR of these media types over time. As a baseline, the graph includes the CAGR of the articles published in this subset of biomedical journals, which was 5.3% overall (line highlighted in gray). DMEM had an overall CAGR of 13.4%, RPMI was 9.2%, DMEM/F12 was 21%, F12 was 15.4%, and MEM was 10.4%.
Figure 1. Trends in top 5 basal media mentions since the year 2000.

A: We plotted the unique document count of our top 5 medias (DMEM, RPMI, MEM, F12, DMEM/F12) over time, from 2000-2018, to observe changes in counts over time. Most articles published referenced DMEM. B: We plotted the compound annual growth rate (CAGR) of the number of mentions from 2000-2018. The media curves are plotted in black, and as a baseline, the CAGR of all full length articles published in the same journals is plotted in gray. Articles grew at a rate of 5.3% over the 18 year period. DMEM/F12 had the highest growth rate, around 21%, while the others maintained between 10-18%.

We performed a slightly different analysis for cell lines, due to the frequency of false positives in our annotations, described in section 2.2. After annotating our corpus, we used AQ to extract sentences that contained a co-occurrence of cell lines and media, with the assumption that these sentences would have higher precision of true cell line mentions than those without a co-occurring media type. This produced a set of 12,732 sentences. We found 2,149 unique cell lines occurring within this set of sentences. The 10 most frequently occurring cell lines and their counts are shown in Table 2. Cumulatively, the top 10 cell lines make up 23.6% of all the cell line annotations of this corpus. This is reflective of the diversity of research occurring in the field, where no one cell line is dominating the entire landscape, unlike the media types. Like cell media, we plotted the document level counts and CAGR of each cell line by year. Figure 2A shows that mentions of the top 10 cell lines are increasing during this period of time, with HEK293T, HEK293 and HeLa at the top. In Figure 2B we plot the CAGR of the cell line mentions, as well as the CAGR of the articles published in this subset of biomedical journals (5.3%; in gray). HEK293 had a CAGR of 24.2% and HEK293T had 20.7%.
Table 2 Document level count of top 10 cell lines and their representation in sentences mentioning a cell line and basal media.
| Cell line | Count of unique documents that have a cell line and media co-occur | Percentage of documents that have a cell line and media co-occur (n = 12,732) | CAGR (2000-2018) |
| --- | --- | --- | --- |
| HEK293T | 1,314 | 10.3% | 20.7% |
| HeLa | 1,220 | 9.6% | 11.2% |
| HEK293 | 848 | 6.7% | 24.2% |
| MCF-7 | 771 | 6.1% | 14.9% |
| Hep-G2 | 523 | 4.1% | 16.2% |
| MDA-MB-231 | 490 | 3.9% | 22.9% |
| THP-1 | 374 | 3% | 23.1% |
| RAW 264.7 | 358 | 2.8% | 20.1% |
| NIH 3T3 | 342 | 2.7% | 10.2% |
| SH-SY5Y | 335 | 2.6% | 19.8% |

Of the 2,174 cell lines mentioned in our dataset of sentences mentioning a media and cell line together, these 10 represent the most commonly used cell lines. HEK293T and HeLa were the most prevalent in this dataset. These cell types include both human and mouse cells.

Figure 2. Trends in top 10 cell line mentions since the year 2000.

A: We plotted the unique document count of our top 10 cell lines (HEK293, HEK293T, HeLa, Hep-G2, MCF-7, MDA-MB-231, NIH 3T3, RAW 264.7, SH-SY5Y, THP-1) over time, from 2000-2018, to observe changes in counts over time. Most articles published referenced HEK293, HEK293T or HeLa. B: We plotted the compound annual growth rate (CAGR) of the number of mentions from 2000-2018. The cell line curves are plotted in black, and as a baseline, the CAGR of all full length articles published in the same journals is plotted in gray. Articles grew at a rate of 5.3% over the 18 year period. HEK293 cells had the highest growth rate, at 24.2%.

Figure 3 is a bar graph of the breakdown of the co-occurrence of the top 5 basal media with the top 10 cell lines. The height of each column represents the total number of sentences containing the cell line, given on the x-axis. Those columns are broken down proportionally by how many also contained reference to a specific basal media. Most cell lines co-occurred most frequently with DMEM, the most frequently mentioned basal media. The one exception is THP-1 cells, which co-occurred most often with RPMI. This is to be expected, because RPMI was specially formulated for immune-derived cell types, such as THP-1. MCF-7, Hep-G2, and MDA-MB-231 co-occurred most commonly with DMEM, though they were more evenly distributed amongst other basal medias as well.
Figure 3. Co-occurrence of cell line and basal media.

Each bar represents the total number of sentences that mentioned the cell line (x-axis), broken down by count that referenced a specific basal media. Note that DMEM is the dominant media in all but one cell line. THP-1, an immune-type cell, is the only cell type to occur most frequently with RPMI, a media developed for these cell types.

The final part of our analysis was to capture the nouns following basal media and quantify their occurrence, operating under the hypothesis that these nouns represented media supplements. We expected to see tokens that represented commonly used supplements like serum, antibiotics and growth factors. We identified 7,497 unique tokens that came after mentions of basal media. We manually filtered out noisy tokens, such as vendor names, numbers and countries. Table 3 contains raw counts of the 10 most frequently occurring tokens in these sentences. It should be noted, in identifying tokens, we limited to single nouns anywhere after the mention of the basal media. As an example, we would capture “fetal”, “bovine”, and “serum” as three different tokens, and “FBS” would also be captured as a single token. We chose to include “serum” as its own token because we did observe sentences that did not specify what kind of serum was used.
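The token capture described above can be sketched as follows, assuming each sentence is already available as a list of (token, POS tag) pairs using Penn Treebank tags, where noun tags begin with “NN”. The tags in the example are assigned by hand for illustration and are not CoreNLP output. The function name and the choice to start from the first media mention are likewise assumptions.

```python
from collections import Counter


def nouns_after_media(tagged_tokens, media_terms):
    """Return lowercased noun tokens that follow the first media mention."""
    for i, (token, _) in enumerate(tagged_tokens):
        if token in media_terms:
            return [
                tok.lower()
                for tok, pos in tagged_tokens[i + 1:]
                if pos.startswith("NN")  # NN, NNS, NNP, NNPS
            ]
    return []


# Hand-tagged example sentence fragment (tags are illustrative).
tagged = [
    ("HeLa", "NNP"), ("cells", "NNS"), ("were", "VBD"), ("cultured", "VBN"),
    ("in", "IN"), ("DMEM", "NNP"), ("with", "IN"), ("10", "CD"),
    ("FBS", "NNP"), ("and", "CC"), ("penicillin", "NN"),
]
tokens = nouns_after_media(tagged, {"DMEM", "RPMI"})
print(Counter(tokens))
```

Nouns that precede the media mention, such as “cells” here, are not captured. This is the behaviour the hypothesized sentence structure relies on.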
Table 3 Raw count of top 10 noun tokens following basal media mentions and their proportion of sentences describing cell culture conditions.
| Token | Raw count | Proportion of sentences mentioning cell and media (n = 12,732) |
| --- | --- | --- |
| serum | 10,685 | 83.9% |
| fbs | 9,283 | 72.9% |
| penicillin | 4,708 | 37% |
| streptomycin | 4,632 | 36.4% |
| l-glutamine | 2,739 | 21.5% |
| fcs | 2,598 | 20.4% |
| calf | 2,497 | 19.6% |
| penicillin-streptomycin | 1,667 | 13% |
| antibiotic | 1,232 | 9.7% |
| acid | 1,217 | 9.6% |
This table lists the most commonly occurring tokens (tagged as nouns by the Stanford CoreNLP POS tagger) following mention of basal media in sentences that have co-occurrence of a cell line and media. These tokens represent supplements that are added to the cell culture system. The majority of these sentences mention use of serum, followed by antibiotics.

We were most interested in visualizing the most frequently occurring cell line/basal media/supplement combinations. Figure 4 represents these combinations as a heatmap, with the top 5 occurring medias and top 10 occurring cell lines comprising the rows, and the top 5 supplement tokens comprising the columns. The values within the cells are the counts of sentences containing that combination of tokens, and the shading of the heatmap represents the percentage of the sentences within that media grouping. In order to interpret this data, Table 4 provides an outline of the recommended cell culture conditions, provided by the ATCC, the largest cell line vendor in the US (Berns, Bond, & Manning, 1996). This table presents the cell line, cell type and species, and ATCC’s recommended basal media and supplementations. It also presents the most frequently mentioned basal media type from the sentences mentioning the cell line. Of the 12,732 sentences that contained these three elements, 10,443 mentioned one of the top 5 media. Breaking these groups down by cell line and supplement, an overwhelming majority of 7,107 sentences mentioned DMEM. RPMI was mentioned in 1,606 of these sentences, MEM in 882, F12 in 554 and DMEM/F12 in 284. Of the DMEM sentences, most co-occurred with HEK293, HEK293T and HeLa, which was to be expected, and the prevailing supplement types were “serum” (2,196) and “fbs” (1,865). In each of the media groups, “serum” and “fbs” were the most commonly co-occurring supplement tokens, followed by “streptomycin” and “penicillin”. There was high concordance between streptomycin and penicillin counts, which we expected as they are typically packaged together. There were a few cell-line specific points that came out of the data, such as THP-1 cells being cultured predominantly with RPMI and 2-mercaptoethanol, which matches ATCC recommendations. THP-1 cells were the only cell type to co-occur with 2-mercaptoethanol (data too rare to appear in figures and tables).
The most frequently co-occurring cell line for F12 was SH-SY5Y cells, which is somewhat in line with ATCC recommendations to culture these cells in MEM:F12 (1:1 ratio). L-glutamine was the least frequent supplement token, but had highest co-occurrence with HEK293T cells, which is consistent with ATCC cell culture recommendations. Interestingly, most of the cell lines co-occur with DMEM, despite only HEK293T and NIH 3T3 cells being recommended to be cultured in DMEM.
Figure 4. Co-occurrence heatmap of cell line, basal media and supplement tokens.

In this figure, the values represent the raw count of sentences that contain that combination of words (cell line, basal media and supplement token). The shading represents the percent of the total sentences within that pane, which is defined by media, in the left column.

Table 4 ATCC Cell Culture recommendations for the 10 most frequently occurring cell lines in our corpus.
| Cell Line | Species and cell type | Recommended basal media | Recommended supplements | Most frequently co-occurring basal media |
| --- | --- | --- | --- | --- |
| HEK293 | Homo sapiens, embryonic kidney | MEM | 10% FBS | DMEM |
| HEK293T | Homo sapiens, embryonic kidney | DMEM | 10% FBS, 2 mM l-glutamine | DMEM |
| HeLa | Homo sapiens, cervix, epithelial | MEM | 10% FBS | DMEM |
| Hep-G2 | Homo sapiens, liver, epithelial | MEM | 10% FBS | DMEM |
| MCF-7 | Homo sapiens, mammary glands, epithelial | MEM | 10% FBS, 0.01 mg/ml insulin | DMEM |
| MDA-MB-231 | Homo sapiens, mammary glands, epithelial | L-15 | 10% FBS | DMEM |
| NIH 3T3 | Mus musculus, embryo, fibroblasts | DMEM | 10% FBS | DMEM |
| RAW 264.7 | Mus musculus, monocyte/macrophage | RPMI | 10% FBS | DMEM |
| SH-SY5Y | Homo sapiens, bone marrow, epithelial | MEM:F12 | 10% FBS | F12 |
| THP-1 | Homo sapiens, peripheral blood, monocyte | RPMI | 0.05 mM 2-mercaptoethanol, 10% FBS | RPMI |

This table presents the cell culture recommendations for the top 10 cell lines within our corpus. We describe the species and cell type, recommended basal media and supplements. The last column presents the most commonly co-occurring media type in sentences mentioning these cell lines. Note that DMEM is the most frequently co-occurring media for eight of the ten cell lines, although ATCC recommends it for only two of them (HEK293T and NIH 3T3).

4 Discussion & conclusion

This study presents a high-level view of the cell culture methodology being employed in biomedical research. The goal of the study was to use text mining to tabulate the cell line, basal media and supplement combinations present in the literature and their trends over time. We observed that cell lines are frequently being cultured in media inconsistent with manufacturers’ recommendations.
The most frequently occurring cell lines and media types were unsurprising. DMEM appears to have become an all-purpose basal cell media, able to support the growth of various cell types, while RPMI remains the basal media of choice for immune-derived cells. The most popular cell lines included those that have been used for decades. HEK293 cells are robust and easy to grow in culture. They are easily transfected, enabling various kinds of assays evaluating gene and protein expression and activity, and the generation of adenoviral vectors. HEK293T cells are a derivative of HEK293 cells that have been transformed with SV40 large T antigen (DuBridge et al., 1987), which facilitates higher copy numbers of recombinant protein or retrovirus being produced by these cells. HeLa cells are the oldest cell line and are also fast growers and easy to maintain (Gey, Coffman, & Kubicek, 1952). MCF-7 cells are a breast cancer cell line that has also been around for decades (Suole et al., 1973). Their popularity may also reflect the high level of breast cancer research activity in the US and abroad.
Typically, cell line vendors provide manufacturers’ recommendations on basal media and supplements to be used to promote optimal cell growth. We relied on the ATCC as a source of this information. Because these manufacturers’ recommendations exist and are easily accessible, we assumed there would be high concordance between the recommendations and what was reported. We were surprised to see that the majority of our sentences contained reference to DMEM, even though that is only recommended for two of the cell lines in our top 10: NIH 3T3 and HEK293T cells. There may be a few reasons for this. DMEM is a variation of MEM, but it contains four times more vitamins, amino acids and glucose. These metabolites are necessary for growth, and scientists may be taking a “more is better” approach in terms of providing complete media for their cell lines. The superfluous amount of metabolites allows for less maintenance of these cell lines, and may explain the high adoption of DMEM, despite it being recommended for very few. We hope that researchers may begin to reflect on the unintended consequences that providing cells with highly artificial quantities of metabolites may have on their experimentation.
Another reason we may be seeing such a large skew towards DMEM is that not every sentence in our corpus is structured how we hypothesized. There are more complicated sentence structures that contain references to more than one cell culture system. One example is the sentence, “HEK293T and Ba/F3 cells were grown in Dulbecco’s modified Eagle’s medium and RPMI-1640, supplemented with 10% fetal bovine serum.” (Wasag et al., 2011). In this case, HEK293T was cultured with DMEM and 10% FBS, and Ba/F3 cells were cultured with RPMI-1640 and 10% FBS. However, using our initial methodology, we would record all combinations: [HEK293T, DMEM, FBS], [HEK293T, RPMI, FBS], [Ba/F3, DMEM, FBS] and [Ba/F3, RPMI, FBS], and we know two of these not to be correct. As mentioned in our methods section, we found that 78% of our sentences adhered to our structure. A deeper analysis showed that only 5% of the random sample mentioned multiple cell lines and media types, which would result in errors like those above. We do not believe this error rate is enough to account for the very high rates of DMEM being used in cell culture.
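The over-counting in this example is a Cartesian product of the annotations found in the sentence. The short sketch below reproduces the four recorded triplets and separates out the two that do not reflect the actual culture conditions. The variable names are illustrative, and the `intended` set is simply the reading of the sentence given above.

```python
from itertools import product

cell_lines = ["HEK293T", "Ba/F3"]
media = ["DMEM", "RPMI"]
supplements = ["FBS"]

# Every cell line is paired with every medium found in the same sentence.
recorded = list(product(cell_lines, media, supplements))

# The pairings the sentence actually describes.
intended = {("HEK293T", "DMEM", "FBS"), ("Ba/F3", "RPMI", "FBS")}
spurious = [triplet for triplet in recorded if triplet not in intended]

print(len(recorded))
print(spurious)
```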
The basal medias that we did not see in the literature include two recent additions, Plasmax and HPLM. The most likely explanation is that they have not been in the literature long enough to have been cited by any works in our corpus. We are optimistic that as more researchers consider these variables in the age of reproducible science, we will see higher adoption rates of these more physiologically relevant media types. The other medias we did not observe (Swim’s S77, NCTC and CMRL 1066) may have fallen out of favor with researchers due to their specialized but incomplete formulas.
There are a few limitations of our study. The first is in our text mining method. The way our annotation scheme was configured, we achieved high recall, but lower precision, allowing for noisy tokens to slip through. In the manual inspection of the cell culture triplet sentences, we observed many false hits for F10 and RPMI. These came from cell line names that contain one of these media names joined by a hyphen, and so they were not excluded. While further constraint of our parameters would eliminate such noisy tokens, it would also eliminate perfectly valid hits. Unfortunately, there is no perfect tool to account for the noisy names of cell lines that often contain characters like periods and hyphens, and combinations of digits and letters. Relatedly, due to the very complete nature of Cellosaurus, there were several noisy tokens that appeared. Cell lines had synonyms like “cancer” or those of gene names, which needed cleaning to achieve better precision. Again, in some instances a cell line may have the same name as a media or disease, and there is no way short of true AI to distinguish the two.
Another limitation is that we did not consider neuronal-derived cell lines or medias, and constrained our dataset to cancer and immortalized cell lines. This was done simply to constrain the dataset, and because of a lack of expertise on the authors’ part in cell culture systems development for these cell types, which can often require considerably different conditions.
We also did not consider cell culture systems that use the newer approach of induced pluripotent stem cells (iPSCs). After the discovery of the induction of various cell types from common progenitor cells (Takahashi & Yamanaka, 2006), more studies are using cell culture systems derived from primary stem cells, completely bypassing the use of immortalized cell lines. However, because there is no central ontology capturing the nomenclature of these primary cells, we did not include them in this study. It would be interesting to understand how the different cell culture conditions in that field have grown and evolved over time, particularly in contrast to more classical cell culture systems.
Our hope is that this paper sheds light on the high variability of cell culture conditions reported in the literature, and on how difficult it becomes to compare studies given these fundamental differences. Many before us have called attention to the significant impact that differences in media have on experimental results. This paper goes a step further by measuring the scale of that variability within a large corpus of literature. Our intention is not to suggest that cell culture has no value; we owe much of the progress of biomedical research to the very existence of these systems. Instead, it is to highlight that these basic, typically overlooked experimental parameters may not be as consistent as researchers assume, and that they should be carefully considered when drawing conclusions about one's work in the context of all that has come before it.

Author contributions

Jessica Cox (j.cox@elsevier.com) proposed the research problem, performed the research, designed the framework, collected and analyzed the data, and wrote and revised the manuscript. Darin McBeath (d.mcbeath@elsevier.com) designed the research framework. Corey Harper (c.harper@elsevier.com) designed the research framework and revised the manuscript. Ron Daniel, Jr. (r.daniel@elsevier.com) designed the research framework and wrote and revised the manuscript. This work is an extension of a poster presented at ISSI 2019 (Cox, McBeath, Harper, & Daniel, Jr., 2019).

References

1
Adams J.C. (2019). A new initiative for AJP-Cell Physiology: “Making Cell Culture More Physiological”. American Journal of Physiology-Cell Physiology, 316(6), C828-C829. https://doi.org/10.1152/ajpcell.00157.2019

2
Ariffin S., Rozali N., Wahab R., Senafi S., Abidin I., & Ariffin Z. (2016). Analyses of basal media and serum for in vitro expansion of suspension peripheral blood mononucleated stem cell. Cytotechnology, 68(4), 675-686. https://doi.org/10.1007/s10616-014-9819-8

3
Arora M. (2013). Cell Culture Media: A Review. Materials and Methods, 3, 175. http://dx.doi.org/10.13070/mm.en.3.175

4
Bairoch A. (2018). The Cellosaurus, a Cell-Line Knowledge Resource. Journal of Biomolecular Techniques, 29(2), 25-38.

5
Begley C., & Ellis L.M. (2012). Raise standards for preclinical cancer research. Nature, 483, 531-533.

6
Berns K., Bond E., & Manning F. (1996). The American Type Culture Collection. In Resource Sharing in Biomedical Research. Washington DC: National Academies Press.

7
Cantor J.R. (2019). The Rise of Physiologic Media. Trends in Cell Biology, 29(11), 854-861.

8
Cantor J.R., Abu-Remaileh M., Kanarek N., Freinkman E., Gao X., Louissaint Jr., A., … Sabatini D.M. (2017). Physiologic Medium Rewires Cellular Metabolism and Reveals Uric Acid as an Endogenous Inhibitor of UMP Synthase. Cell, 258-272.

9
Cox J. (2020). Media and Cell Line Dictionaries. Mendeley Data. https://doi.org/10.17632/3nnsyxdsvd.1

10
Cox J., McBeath D., Harper C., & Daniel Jr., R. (2019). Co-occurrence of Cell Lines, Basal Media and Supplementation in the Biomedical Research Literature. 17th International Conference on Scientometrics & Informetrics ISSI2019 with a Special STI Indicators Conference Track (p. 2545). Edizioni Efesto.

11
DuBridge R.B., Tang P., Hsia H.C., Leong P.-M., Miller J.H., & Calos M.P. (1987). Analysis of Mutation in Human Cells by Using an Epstein-Barr Virus Shuttle System. Molecular and Cellular Biology, 7(1), 379-387.

12
Dulbecco R., & Freeman G. (1959). Plaque production by the polyoma virus. Virology, 8(3), 396-397.

13
Eagle H. (1955). Nutrition needs of mammalian cells in tissue culture. Science, 122(3168), 501-504.

14
Eagle H. (1959). Amino acid metabolism in mammalian cell cultures. Science, 130(3373), 432-437.

15
Gey G.O., Coffman W.D., & Kubicek M.T. (1952). Tissue Culture Studies of the Proliferative Capacity of Carcinoma and Normal Epithelium. Scientific Proceedings American Association for Cancer Research, Inc. New York.

16
Groth P., & Cox J. (2017). Indicators for the use of robotic labs in basic biomedical research: A literature analysis. PeerJ, 5, e3997. https://doi.org/10.7717/peerj.3997

17
Hirsch C., & Schildknecht S. (2019). In Vitro Research Reproducibility: Keeping Up High Standards. Frontiers in Pharmacology, 10(1484).

18
Kaiser J. (2018). Plan to replicate 50 high-impact cancer papers shrinks to just 18. Retrieved from Science: https://www.sciencemag.org/news/2018/07/plan-replicate-50-high-impact-cancer-papers-shrinks-just-18

19
Kim S.W., Kim S.-J., Langley R.R., & Fidler I.J. (2015). Modulation of the cancer cell transcriptome by culture media formulations and cell density. International Journal of Oncology, 46(5), 2067-2075.

20
Manning C.D., Surdeanu M., Bauer J., Finkel J., Bethard S., & McClosky D. (2014). The Stanford CoreNLP Natural Language Processing Toolkit. Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations (pp. 55-60).

21
McBeath D. (2017). AnnotationQuery [Computer Software]. https://github.com/elsevierlabs-os/AnnotationQuery

22
McKee T.J., & Komarova S.V. (2017). Is it time to reinvent basic cell culture medium? American Journal of Physiology-Cell Physiology, 312(5), C624-C626. https://doi.org/10.1152/ajpcell.00336.2016

23
Moore G.E., Gerner R.E., & Franklin A. (1967). Culture of Normal Human Leukocytes. JAMA, 199(8), 519-524.

24
Pirsko V., Cakstina I., Priedite M., Dortane R., Feldmane L., Nakazawa-Miklasevica M., … Miklasevics E. (2018). An Effect of Culture Media on Epithelial Differentiation Markers in Breast Cancer Cell Lines MCF7, MDA-MB-436 and SkBr3. Medicina.

25
Puck T.T., Cieciura S.J., & Robinson A. (1958). Genetics of somatic mammalian cells III. Long-term cultivation of euploid cells from human and animal subjects. Journal of Experimental Medicine, 108(6), 945-956.

26
Selenius L.A., Lundgren M.W., Jawad R., Danielsson O., & Bjornstedt M. (2019). The Cell Culture Medium Affects Growth, Phenotype Expression and the Response to Selenium Cytotoxicity in A549 and HepG2 Cells. Antioxidants, 8(130).

27
Sikora M.J., Johnson M.D., Lee A.V., & Oesterreich S. (2016). Endocrine Response Phenotypes Are Altered by Charcoal-Stripped Serum Variability. Endocrinology, 157(10), 3760-3766.

28
Suole H., Vazquez J., Long A., & Brennan M. (1973). A Human Cell Line From a Pleural Effusion Derived From a Breast Carcinoma. Journal of the National Cancer Institute, 51(5), 1409-1416.

29
Takahashi K., & Yamanaka S. (2006). Induction of Pluripotent Stem Cells from Mouse Embryonic and Adult Fibroblast Cultures by Defined Factors. Cell, 126(4), 663-676.

30
Tomoya Kawakami K.K. (2016). Influence of the culture medium on the production of nitric oxide and expression of inducible nitric oxide synthase by activated macrophages in vitro. Biochemistry and Biophysics Reports, 328-334.

31
Vande Voorde J., Ackerman T., Pletzer N., Sumpton D., Mackay G., Kalna G., … Tardito S. (2019). Improving the metabolic fidelity of cancer models with a physiological cell culture medium. Science Advances, 5(1), eaau7314. https://doi.org/10.1126/sciadv.aau7314

32
Wasag B., Niedoszytko M., Piskorz A., Lange M., Renke J., Jassem E., … Limon J. (2011). Novel, activating KIT-N822I mutation in familial cutaneous mastocytosis. Experimental Hematology, 39(8), 859-865.

33
Yao T., & Asayama Y. (2017). Animal-cell culture media: History, characteristics, and current issues. Reproductive Medicine and Biology, 16(2), 99-117.

34
Yong E. (2019). Scientists Have Been Studying Cancers in a Very Strange Way for Decades. Retrieved from The Atlantic: https://www.theatlantic.com/science/archive/2019/01/cancer-culture-media-plasmax/579283/

35
Zheng X., Baker H., Hancock W.S., Fawaz F., McCaman M., & Pungor Jr., E. (2008). Proteomic Analysis for the Assessment of Different Lots of Fetal Bovine Serum as a Raw Material for Cell Culture. Part IV. Application of Proteomics to the Manufacture of Biological Drugs. Biotechnology Progress, 22(5), 1294-1300.

Copyright © 2023 All rights reserved Journal of Data and Information Science