1 Introduction
2 Methodology
2.1 Corpus
2.2 Dictionary development
2.3 Annotation
2.4 Cell and media co-occurrence
2.5 Compound annual growth rate (CAGR) calculation
2.6 Data visualization
3 Results
Table 1 Document-level counts of the top 5 basal media and their representation in biomedical research articles mentioning cell culture published since 2000.
Basal media | Raw annotation counts (n = 126,409) | Count of unique documents that mention media | Percentage of documents that mention media (n = 26,036) | CAGR (2000-2018) |
---|---|---|---|---|
DMEM | 40,347 | 15,770 | 60.6% | 13.4% |
RPMI | 36,490 | 8,114 | 31.2% | 9.2% |
MEM | 17,154 | 5,369 | 20.6% | 10.4% |
F12 | 9,413 | 3,660 | 14.1% | 15.4% |
DMEM/F12 | 9,604 | 2,221 | 8.5% | 21% |
Of the 26 basal media in our list, these 5 are the most commonly used within our corpus, with DMEM and RPMI accounting for the majority.
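As a quick check on how the percentage column is derived, the following minimal sketch recomputes it from the unique-document counts and the corpus size (n = 26,036) given in Table 1. The function name `percent_of_documents` is illustrative and not taken from the original analysis.

```python
# Unique-document counts and corpus size copied from Table 1.
unique_docs = {
    "DMEM": 15770,
    "RPMI": 8114,
    "MEM": 5369,
    "F12": 3660,
    "DMEM/F12": 2221,
}
n_documents = 26036


def percent_of_documents(count, total):
    """Share of documents mentioning a medium, as a percentage (one decimal)."""
    return round(100 * count / total, 1)


percentages = {
    medium: percent_of_documents(count, n_documents)
    for medium, count in unique_docs.items()
}

for medium, pct in percentages.items():
    print(f"{medium}: {pct}%")
```

The five counts sum to more than 26,036, so the percentages add up to more than 100%: a single document can mention more than one basal medium.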
Figure 1. Trends in top 5 basal media mentions since the year 2000. A: We plotted the unique document count of the top 5 media (DMEM, RPMI, MEM, F12, DMEM/F12) for each year from 2000 to 2018. Most published articles referenced DMEM. B: We plotted the compound annual growth rate (CAGR) of the number of mentions from 2000 to 2018. The media curves are plotted in black and, as a baseline, the CAGR of all full-length articles published in the same journals is plotted in gray. The number of articles grew at a rate of 5.3% over the 18-year period. DMEM/F12 had the highest growth rate, around 21%, while the others ranged from roughly 9% to 15% (Table 1).
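For readers unfamiliar with CAGR, the sketch below uses the standard textbook definition, (end / start)^(1 / periods) - 1. It is an assumption that the growth rates reported here follow exactly this form, and the counts in the example are hypothetical: they are constructed to grow at 5.3% per year rather than drawn from the corpus.

```python
def cagr(start_value, end_value, periods):
    """Compound annual growth rate: (end / start) ** (1 / periods) - 1."""
    if start_value <= 0 or periods <= 0:
        raise ValueError("start_value and periods must be positive")
    return (end_value / start_value) ** (1 / periods) - 1


# Hypothetical yearly document counts (not from the paper): a series that
# grows by 5.3% per year for the 18 yearly steps between 2000 and 2018.
start_count = 1000.0
end_count = start_count * (1 + 0.053) ** 18

rate = cagr(start_count, end_count, periods=18)
print(f"CAGR: {rate:.1%}")
```

Because CAGR depends only on the first and last values, it summarizes the overall trend and ignores year-to-year fluctuations in between.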
Table 2 Document-level counts of the top 10 cell lines and their representation in sentences mentioning a cell line and a basal medium.
Cell line | Count of unique documents that have a cell line and media co-occur | Percentage of documents that have a cell line and media co-occur (n = 12,732) | CAGR (2000-2018) |
---|---|---|---|
HEK293T | 1,314 | 10.3% | 20.7% |
HeLa | 1,220 | 9.6% | 11.2% |
HEK293 | 848 | 6.7% | 24.2% |
MCF-7 | 771 | 6.1% | 14.9% |
Hep-G2 | 523 | 4.1% | 16.2% |
MDA-MB-231 | 490 | 3.9% | 22.9% |
THP-1 | 374 | 3% | 23.1% |
RAW 264.7 | 358 | 2.8% | 20.1% |
NIH 3T3 | 342 | 2.7% | 10.2% |
SH-SY5Y | 335 | 2.6% | 19.8% |
Of the 2,174 cell lines mentioned in our dataset of sentences mentioning a medium and a cell line together, these 10 are the most commonly used. HEK293T and HeLa were the most prevalent in this dataset. These cell lines include both human and mouse cells.
Figure 2. Trends in top 10 cell line mentions since the year 2000. A: We plotted the unique document count of the top 10 cell lines (HEK293, HEK293T, HeLa, Hep-G2, MCF-7, MDA-MB-231, NIH 3T3, RAW 264.7, SH-SY5Y, THP-1) for each year from 2000 to 2018. Most published articles referenced HEK293, HEK293T or HeLa. B: We plotted the compound annual growth rate (CAGR) of the number of mentions from 2000 to 2018. The cell line curves are plotted in black and, as a baseline, the CAGR of all full-length articles published in the same journals is plotted in gray. The number of articles grew at a rate of 5.3% over the 18-year period. HEK293 cells had the highest growth rate, 24.2% (Table 2).
Figure 3. Co-occurrence of cell line and basal media. Each bar represents the total number of sentences that mentioned the cell line (x-axis), broken down by the count that referenced each basal medium. Note that DMEM is the dominant medium in all but one cell line. THP-1, an immune-type cell, is the only cell type to occur most frequently with RPMI, a medium developed for these cell types.
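The sentence-level counting behind a chart like this can be sketched as follows. This is a simplified illustration rather than the annotation pipeline used in the study: the term lists are tiny hypothetical stand-ins for the cell line and media dictionaries, the sentences are invented, and matching is done with plain case-insensitive regular expressions.

```python
import re
from collections import Counter

# Hypothetical, heavily reduced dictionaries (the real ones are far larger).
CELL_LINES = ["HeLa", "THP-1", "HEK293T"]
MEDIA = ["DMEM", "RPMI"]


def find_terms(sentence, terms):
    """Return the dictionary terms that occur in the sentence as whole tokens."""
    found = []
    for term in terms:
        # Reject matches embedded in a longer word or hyphenated name.
        pattern = r"(?<![\w-])" + re.escape(term) + r"(?![\w-])"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            found.append(term)
    return found


def cooccurrence_counts(sentences):
    """Count sentences in which a cell line and a basal medium co-occur."""
    counts = Counter()
    for sentence in sentences:
        media_found = find_terms(sentence, MEDIA)
        for cell in find_terms(sentence, CELL_LINES):
            for medium in media_found:
                counts[(cell, medium)] += 1
    return counts


# Invented example sentences, for illustration only.
sentences = [
    "HeLa cells were maintained in DMEM supplemented with 10% FBS.",
    "THP-1 cells were cultured in RPMI 1640 with 10% FBS.",
    "HEK293T and HeLa cells were grown in DMEM.",
    "Cells were washed twice with PBS.",
]

counts = cooccurrence_counts(sentences)
for (cell, medium), n in sorted(counts.items()):
    print(cell, medium, n)
```

A sentence that names two cell lines contributes to both bars, which is why the third example sentence increments the count for HEK293T and for HeLa.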
Table 3 Raw count of top 10 noun tokens following basal media mentions and their proportion of sentences describing cell culture conditions.
Token | Raw count | Proportion of sentences mentioning cell and media (n = 12,732) |
---|---|---|
serum | 10,685 | 83.9% |
fbs | 9,283 | 72.9% |
penicillin | 4,708 | 37% |
streptomycin | 4,632 | 36.4% |
l-glutamine | 2,739 | 21.5% |
fcs | 2,598 | 20.4% |
calf | 2,497 | 19.6% |
penicillin-streptomycin | 1,667 | 13% |
antibiotic | 1,232 | 9.7% |
acid | 1,217 | 9.6% |
This table lists the most commonly occurring tokens (tagged as nouns by the Stanford CoreNLP POS tagger) following the mention of a basal medium in sentences where a cell line and a medium co-occur. These tokens represent supplements that are added to the cell culture system. The majority of these sentences mention the use of serum, followed by antibiotics.
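The token counting described here can be sketched on pre-tagged input. The example assumes sentences have already been tokenized and tagged with Penn Treebank-style tags, where noun tags begin with `NN`. The tagged sentences, the small `MEDIA` set, and the choice to lowercase tokens and start counting after the first media mention are all illustrative assumptions, not details confirmed by the source.

```python
from collections import Counter

# Hypothetical reduced media dictionary, lowercased for matching.
MEDIA = {"dmem", "rpmi", "mem", "f12"}


def nouns_after_media(tagged_sentence):
    """Return lowercased noun tokens that follow the first media mention."""
    lowered = [token.lower() for token, _ in tagged_sentence]
    positions = [i for i, token in enumerate(lowered) if token in MEDIA]
    if not positions:
        return []
    start = positions[0] + 1
    return [
        token.lower()
        for token, tag in tagged_sentence[start:]
        if tag.startswith("NN") and token.lower() not in MEDIA
    ]


# Invented, hand-tagged sentences (token, POS tag), for illustration only.
tagged_sentences = [
    [("HeLa", "NNP"), ("cells", "NNS"), ("were", "VBD"), ("cultured", "VBN"),
     ("in", "IN"), ("DMEM", "NNP"), ("with", "IN"), ("fetal", "JJ"),
     ("bovine", "JJ"), ("serum", "NN"), ("and", "CC"), ("penicillin", "NN"),
     (".", ".")],
    [("THP-1", "NNP"), ("cells", "NNS"), ("were", "VBD"), ("grown", "VBN"),
     ("in", "IN"), ("RPMI", "NNP"), ("containing", "VBG"), ("serum", "NN"),
     ("and", "CC"), ("streptomycin", "NN"), (".", ".")],
]

token_counts = Counter()
for sentence in tagged_sentences:
    token_counts.update(nouns_after_media(sentence))

print(token_counts.most_common())
```

Nouns that precede the media mention, such as "cells" in both sentences, are excluded, which keeps the counts focused on what is added to the medium.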
Figure 4. Co-occurrence heatmap of cell line, basal media and supplement tokens. In this figure, the values represent the raw count of sentences that contain that combination of words (cell line, basal media and supplement token). The shading represents the percent of the total sentences within that pane, which is defined by media, in the left column.
Table 4 ATCC cell culture recommendations for the 10 most frequently occurring cell lines in our corpus.
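The relationship between the displayed values (raw counts) and the shading (percent within a media pane) can be illustrated with a small normalization sketch. The counts below are hypothetical and the function name `percent_within_pane` is illustrative. The sketch assumes the pane total is the sum of all cells shown in that pane.

```python
def percent_within_pane(pane_counts):
    """Convert raw counts in one media pane to percentages of the pane total."""
    total = sum(pane_counts.values())
    if total == 0:
        return {key: 0.0 for key in pane_counts}
    return {key: 100 * value / total for key, value in pane_counts.items()}


# Hypothetical raw sentence counts keyed by (cell line, supplement token),
# grouped into one pane per basal medium. Not taken from the paper.
raw_counts = {
    "DMEM": {
        ("HeLa", "serum"): 30,
        ("HeLa", "penicillin"): 10,
        ("HEK293T", "serum"): 60,
    },
    "RPMI": {
        ("THP-1", "serum"): 8,
        ("THP-1", "penicillin"): 2,
    },
}

shading = {
    medium: percent_within_pane(pane) for medium, pane in raw_counts.items()
}

for medium, pane in shading.items():
    for (cell, token), pct in pane.items():
        print(f"{medium} | {cell} | {token}: {pct:.0f}%")
```

Normalizing within each pane means a cell with a small raw count in a rarely used medium can be shaded as darkly as a much larger count in a common one.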
Cell Line | Species and cell type | Recommended basal media | Recommended supplements | Most frequently co-occurring basal media |
---|---|---|---|---|
HEK293 | Homo sapiens, embryonic kidney | MEM | 10% FBS | DMEM |
HEK293T | Homo sapiens, embryonic kidney | DMEM | 10% FBS, 2 mM l-glutamine | DMEM |
HeLa | Homo sapiens, cervix, epithelial | MEM | 10% FBS | DMEM |
Hep-G2 | Homo sapiens, liver, epithelial | MEM | 10% FBS | DMEM |
MCF-7 | Homo sapiens, mammary glands, epithelial | MEM | 10% FBS, 0.01 mg/ml insulin | DMEM |
MDA-MB-231 | Homo sapiens, mammary glands, epithelial | L-15 | 10% FBS | DMEM |
NIH 3T3 | Mus musculus, embryo, fibroblasts | DMEM | 10% FBS | DMEM |
RAW 264.7 | Mus musculus, monocyte/macrophage | RPMI | 10% FBS | DMEM |
SH-SY5Y | Homo sapiens, bone marrow, epithelial | MEM:F12 | 10% FBS | F12 |
THP-1 | Homo sapiens, peripheral blood, monocyte | RPMI | 0.05 mM 2-mercaptoethanol, 10% FBS | RPMI |
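
To summarize how often the recommended medium agrees with the most frequently co-occurring one, the sketch below compares the two columns of Table 4 directly. The values are copied from the table. The comparison is an exact string match, which is a simplifying assumption: a partial overlap such as MEM:F12 versus F12 for SH-SY5Y is counted as a mismatch.

```python
# (recommended basal medium, most frequently co-occurring basal medium),
# copied from Table 4.
table4 = {
    "HEK293": ("MEM", "DMEM"),
    "HEK293T": ("DMEM", "DMEM"),
    "HeLa": ("MEM", "DMEM"),
    "Hep-G2": ("MEM", "DMEM"),
    "MCF-7": ("MEM", "DMEM"),
    "MDA-MB-231": ("L-15", "DMEM"),
    "NIH 3T3": ("DMEM", "DMEM"),
    "RAW 264.7": ("RPMI", "DMEM"),
    "SH-SY5Y": ("MEM:F12", "F12"),
    "THP-1": ("RPMI", "RPMI"),
}

# Exact string comparison; partial overlaps are treated as mismatches.
matches = [
    cell for cell, (recommended, observed) in table4.items()
    if recommended == observed
]
mismatches = [cell for cell in table4 if cell not in matches]

print(f"Agreement: {len(matches)} of {len(table4)} cell lines")
print("Matching:", ", ".join(matches))
print("Differing:", ", ".join(mismatches))
```

Under this strict comparison, only HEK293T, NIH 3T3 and THP-1 co-occur most often with the medium listed as recommended.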