1 Introduction
2 Related work
2.1 Concept drift on Twitter data
2.2 Detecting concept drift
3 Concept drift: Background, challenges, and our approach
3.1 Concept drift
3.1.1 Class drift (Changes in P(Ci|X))
3.1.2 Covariant drift (Changes in P(X))
Figure 1. Class and covariant drifts, and how our approach eventually addresses domain concept drift in AIS.
Figure 2. High-level overview of our iterative process.
Figure 3. Current research focuses on the prediction of concept drift, while future work aims at addressing the drift.
3.2 A motivating example
Figure 4. Sudden drift in P(X): the frequency of pedestrian safety-related topics on March 18, 2018, the day a self-driving Uber ran over a pedestrian in Arizona.
4 Concept drift specification and analysis
4.1 Defining pedestrian concept features (CP)
4.1.1 Linguistically-important features
Table 1. Top five words returned by different search queries on Google Books N-gram.
“pedestrian” + [verb] | “pedestrian” + [noun] | [verb] + “pedestrian” | [noun] + “pedestrian” |
---|---|---|---|
pedestrian crossing | pedestrian traffic | protect pedestrian | child pedestrian |
pedestrian walks | pedestrian mall | warn pedestrian | adult pedestrian |
pedestrian killed | pedestrian bridge | hurrying pedestrian | level pedestrian |
pedestrian pass | pedestrian street | involving pedestrian | street pedestrian |
pedestrian moving | pedestrian zone | encourage pedestrian | block pedestrian |
Table 2. Top ten similar words to pedestrian from Wikipedia and Google News corpora.
Wiki terms | Similarity | Google terms | Similarity |
---|---|---|---|
walkway | 0.6928 | bicyclist | 0.6166 |
lanes | 0.6808 | crosswalk | 0.5942 |
sidewalks | 0.6572 | motorist | 0.5460 |
roadway | 0.6411 | bike lanes | 0.5416 |
vehicular | 0.6380 | pedestrian walkways | 0.5328 |
thoroughfare | 0.6337 | bicycle lanes | 0.5256 |
subway | 0.6296 | bikeway | 0.5248 |
underpass | 0.6193 | traffic calming | 0.5239 |
overpass | 0.6157 | roadway | 0.5181 |
parking | 0.6129 | traffic | 0.5173 |
4.1.2 Visually-important features
Figure 5. Model-generated captions and detected objects.
4.1.3 The overlap
4.2 Community-based context specification (RD)
4.2.1 Data collection
Table 3. Collected datasets for autonomous car accidents.
Date | Accident | # Tweets |
---|---|---|
29 July 2016 | Tesla | 89,881 |
18 March 2018 | Uber | 119,121 |
26 April 2019 | Tesla | 154,916 |
4.2.2 Data cleaning
4.2.3 Structuring the community-based context
4.3 Temporal analysis of concept drift (t)
4.3.1 Mapping significant features to community topics
4.3.2 Identifying significant topic drifts
Figure 6. The change in mean similarity scores for “pedestrian” in social topics before, during, and after car accidents.
4.4 Event-based drift specification (X)
Figure 7. The Gaussian probability density function shows the interval probabilities as areas under the curve.
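The interval probability that Figure 7 depicts can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes a single Gaussian with mean `mu` and standard deviation `sigma`, and the function names `normal_cdf` and `interval_probability` are hypothetical. The area under the density between `a` and `b` is the difference of the cumulative distribution function at the two endpoints.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def interval_probability(a, b, mu=0.0, sigma=1.0):
    """P(a <= X <= b): the area under the Gaussian density between a and b."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    if a > b:
        raise ValueError("a must not exceed b")
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)


# Illustrative values only (assumed, not taken from the paper's data):
# probability mass within one standard deviation of the mean.
within_one_sigma = interval_probability(-1.0, 1.0)
```

Comparing such interval probabilities for a term across time windows is one plausible way to express a probability shift, but the exact windows and parameters used in the paper are not stated here.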
Figure 8. The set of terms and their probability shifts in the car accident data.
5 Evaluation
5.1 Dataset
5.2 Topic drifts
Figure 9. The change in mean similarity scores for “pedestrian” in social topics before, during, and after Halloween.
5.3 Specifying X
Figure 10. The set of terms and their probability shifts in the Halloween data.
6 Generalizability of the framework
6.1 Experimental setup
Table 4. Summary of the GDELT dataset for airplane crash events.
Airplane Crash | Date of Crash | Total News Articles | Articles Containing “Airplane” |
---|---|---|---|
California 2020 | Jan 26, 2020 | 1,813,710 | 3,900 (before: 1,327, during: 1,313, after: 1,260) |
Washington 2022 | Sep 4, 2022 | 1,126,899 | 1,979 (before: 576, during: 768, after: 635) |
Washington 2025 | Jan 29, 2025 | 1,886,173 | 11,541 (before: 1,437, during: 6,984, after: 3,120) |
6.2 Data processing and analysis
6.3 Results and observations
Figure 11. Probability shifts of terms in the airplane crash data.
Figure 12. The change in mean similarity scores for “airplane” in social topics before, during, and after the plane crash.
6.4 Implications for generalizability
7 Discussion
Table 5. Qualitative comparison of concept drift detection methods.
Metric | Proposed Framework | FiCSUM | DDM/EDDM |
---|---|---|---|
Proactivity | High (Proactive) | Moderate (Recurring Drifts) | Low (Post Hoc) |
Adaptability | High (Domain-Agnostic) | Moderate (Recurring Drifts) | Low (Frequent Retraining) |
Feature | Semantic + Visual | Meta-Features | Error-Based Only |
Efficiency | High | Moderate | Moderate |
Detection Accuracy | High | High | Moderate |