Dynamic domain analysis for predicting concept drift in engineering AI-enabled software

  • Department of Computer Science, Northern Illinois University, Illinois 60115, USA
†Murtuza Shahzad (Email: z1819332@students.niu.edu; ORCID: 0000-0001-7630-1617).

Received date: 2024-11-15

Revised date: 2025-02-20

Accepted date: 2025-03-11

Online published: 2025-04-11

Abstract

Purpose: This research addresses concept drift in AI-enabled software, particularly in autonomous vehicle systems, where drift in object recognition (such as pedestrian detection) can lead to misclassifications and safety risks. The study introduces a proactive framework that detects early signs of domain-specific concept drift by leveraging domain analysis and natural language processing techniques. The framework is designed to keep domain knowledge relevant and to prevent failures in AI systems caused by evolving concept definitions.
Design/methodology/approach: The proposed framework integrates natural language processing and image analysis to continuously monitor and update key domain concepts against evolving external data sources, such as social media and news. By identifying the terms and features most closely associated with core concepts, the system anticipates and flags significant changes. The framework was tested in the automotive domain on the pedestrian concept and evaluated for its capacity to detect shifts in the recognition of pedestrians, particularly around events such as Halloween and specific car accidents.
Findings: The framework demonstrated an ability to detect shifts in the domain concept of pedestrians, as evidenced by contextual changes around major events. While it successfully identified pedestrian-related drift, its accuracy varied when that drift overlapped with larger social events. The results indicate the model’s potential to foresee relevant shifts before they impact autonomous systems, although further refinement is needed to handle high-impact concurrent events.
Research limitations: This study focused on detecting concept drift for the pedestrian concept in autonomous vehicles, and results vary across domains. To assess generalizability, we also tested the framework on airplane-related incidents, where it demonstrated adaptability. However, unpredictable events and data biases in social media and news may obscure domain-specific drifts. Further evaluation across diverse applications is needed to enhance robustness in evolving AI environments.
Practical implications: The proactive detection of concept drift has significant implications for AI-driven domains, especially safety-critical applications like autonomous driving. By identifying early signs of drift, the framework provides actionable insights for AI system updates, potentially reducing misclassification risks and enhancing public safety. It also enables timely interventions and reduces costly, labor-intensive retraining by focusing only on the relevant aspects of evolving concepts. This method offers a streamlined approach for maintaining AI system performance in environments where domain knowledge changes rapidly.
Originality/value: This study contributes a novel domain-agnostic framework that combines natural language processing with image analysis to predict concept drift early. With its focus on real-time data sources, the approach offers an effective and scalable solution for addressing the evolving nature of domain-specific concepts in AI applications.
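
As an illustration of the monitoring step described under Design/methodology/approach, the following is a minimal sketch, not the authors' implementation. It assumes that "terms closely associated with a core concept" can be approximated by words co-occurring within a fixed token window, and that a change can be flagged when the cosine similarity between a baseline profile and a recent profile falls below a threshold. The function names, the stopword list, the window size, and the threshold value are all hypothetical choices made for this example.

```python
import math
import re
from collections import Counter

# Illustrative stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = {"a", "an", "the", "in", "on", "of", "and", "is", "to"}


def cooccurrence_profile(docs, concept, window=5):
    """Count non-stopword terms appearing within `window` tokens of `concept`."""
    counts = Counter()
    for doc in docs:
        tokens = re.findall(r"[a-z']+", doc.lower())
        for i, tok in enumerate(tokens):
            if tok == concept:
                lo, hi = max(0, i - window), i + window + 1
                counts.update(
                    t for t in tokens[lo:hi]
                    if t != concept and t not in STOPWORDS
                )
    return counts


def cosine_similarity(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def flag_drift(baseline_docs, recent_docs, concept, threshold=0.5):
    """Return (drift_flag, similarity, terms new to the recent period)."""
    base = cooccurrence_profile(baseline_docs, concept)
    recent = cooccurrence_profile(recent_docs, concept)
    similarity = cosine_similarity(base, recent)
    new_terms = sorted(set(recent) - set(base))
    return similarity < threshold, similarity, new_terms


if __name__ == "__main__":
    # Toy texts invented for this example, loosely echoing the Halloween scenario.
    baseline = [
        "a pedestrian crossing the street",
        "the pedestrian walks on the sidewalk",
    ]
    recent = [
        "a pedestrian in a costume carrying candy",
        "pedestrian wearing a mask costume",
    ]
    drifted, similarity, new_terms = flag_drift(baseline, recent, "pedestrian")
    print(drifted, round(similarity, 3), new_terms)
```

In this sketch, the terms that are new to the recent period are returned alongside the flag, so a reviewer can see which associations changed. The abstract notes that accuracy varied when pedestrian-related drift overlapped with larger social events, so a fixed threshold like the one assumed here would likely need tuning per domain.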

Cite this article

Murtuza Shahzad, Hamed Barzamini, Joseph Wilson, Hamed Alhoori, Mona Rahimi. Dynamic domain analysis for predicting concept drift in engineering AI-enabled software[J]. Journal of Data and Information Science, 0: 1-8. DOI: 10.2478/jdis-2025-0020
