1 Introduction
2 Methods and materials
2.1 Data
2.2 LLM-generated text detectors
2.3 Alternative detector
Figure 1. A schematic view of the LLM-Assisted Writing (LAW) detector. The detection process consists of two phases. First, during training, an author's manuscripts are converted into vectors representing the author's writing style using the technique of Lazebnik & Rosenfeld (2023). The mean and standard deviation of the change in writing style between manuscripts are measured to capture the dynamics of that style. Then, during inference, each manuscript is examined for two conditions: whether the change in its author's writing style is large enough to be considered an anomaly, and whether this anomaly is aligned with the style of an LLM-generated manuscript with the same title and abstract. If both conditions are met, the manuscript is deemed LLM-assisted.
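The two-phase procedure described in Figure 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the style vectors are taken as given (the embedding technique is not reproduced here), and the Euclidean distance, the threshold multiplier `k`, the cosine-based alignment test, and the `min_alignment` cutoff are all hypothetical choices.

```python
import numpy as np


def fit_style_dynamics(style_vectors):
    """Training phase: mean and std of the style change between consecutive manuscripts."""
    vecs = np.asarray(style_vectors, dtype=float)
    # Size of each step between consecutive manuscripts (Euclidean distance is an assumption).
    steps = np.linalg.norm(np.diff(vecs, axis=0), axis=1)
    return steps.mean(), steps.std()


def is_llm_assisted(history, candidate, llm_reference, k=2.0, min_alignment=0.5):
    """Inference phase: flag a manuscript only if both conditions hold.

    history       -- style vectors of the author's earlier manuscripts (at least two)
    candidate     -- style vector of the manuscript under examination
    llm_reference -- style vector of an LLM-generated manuscript with the same title and abstract
    k, min_alignment -- hypothetical thresholds, not taken from the paper
    """
    history = np.asarray(history, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    llm_reference = np.asarray(llm_reference, dtype=float)

    mean_step, std_step = fit_style_dynamics(history)

    # Condition 1: the style change is anomalously large for this author.
    shift = candidate - history[-1]
    shift_size = np.linalg.norm(shift)
    if shift_size <= mean_step + k * std_step:
        return False

    # Condition 2: the change points toward the LLM-generated style.
    toward_llm = llm_reference - history[-1]
    denom = shift_size * np.linalg.norm(toward_llm)
    if denom == 0:
        return False
    alignment = float(shift @ toward_llm) / denom
    return bool(alignment >= min_alignment)


if __name__ == "__main__":
    past = [[0.0, 0.0], [0.1, 0.0], [0.2, 0.05], [0.3, 0.05]]  # toy data
    print(is_llm_assisted(past, [2.0, 2.0], [2.1, 1.9]))
```

Requiring both conditions is what separates this scheme from a plain anomaly detector: a large but unrelated change in style (for example, a new co-author or a new field) would fail the alignment test.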
3 Results
Table 1. The performance of the examined detectors (columns). The first four rows report the accuracy, F1-score, recall, and precision on the assessment set; the last row reports the false-positive rate on the false-positive set.
Model | LLMDet | DetectLLM | ZipPy | ConDA | LAW |
---|---|---|---|---|---|
Accuracy | 0.546 | 0.591 | 0.637 | 0.637 | 0.727 |
F1-score | 0.286 | 0.471 | 0.600 | 0.600 | 0.700 |
Recall | 0.334 | 0.534 | 0.627 | 0.627 | 0.700 |
Precision | 0.250 | 0.421 | 0.575 | 0.575 | 0.700 |
False-positive rate | 17.2% | 13.8% | 9.7% | 8.8% | 3.1% |
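As a consistency check on Table 1, the F1-score is the harmonic mean of precision and recall, so each reported F1 value can be recomputed from the two rows below it. The short sketch below does this with the values copied from the table; the function and variable names are illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from Table 1.
table_1 = {
    "LLMDet": (0.250, 0.334, 0.286),
    "DetectLLM": (0.421, 0.534, 0.471),
    "ZipPy": (0.575, 0.627, 0.600),
    "ConDA": (0.575, 0.627, 0.600),
    "LAW": (0.700, 0.700, 0.700),
}

if __name__ == "__main__":
    for name, (precision, recall, reported) in table_1.items():
        print(f"{name}: recomputed {f1_score(precision, recall):.3f}, reported {reported:.3f}")
```

The recomputed values agree with the reported F1-scores to three decimal places.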
4 Discussion
Data availability
Author contributions
Statement of using AIGC
Appendix
1 Assessment set: List of publications
Table A1. List of manuscripts included in the assessment set.
LLM-assisted Writing | Counterpart |
---|---|
Osterrieder, J., GPTChat, A Primer on Deep Reinforcement Learning for Finance, SSRN (2023) | Finance, F., Osterrieder, J., Generative Adversarial Networks in finance: an overview, arXiv (2021) |
Biswas, S., Will ChatGPT take my Job? Replies and Advice by ChatGPT, SSRN (2023) | Biswas, S., Role of Sonography in Ocular Trauma: A Study, ARC Journal of Surgery (2021) |
Askr, H., Darwish, A., Hassanien, A.E., ChatGPT, The Future of Metaverse in the Virtual Era and Physical World: Analysis and Applications. Studies in Big Data (2023) | Gad, I., Hassanien, A. E., A wind turbine fault identification using machine learning approach based on pigeon inspired optimizer, Tenth International Conference on Intelligent Computing and Information Systems (2021) |
King, M. R., chatGPT, A Conversation on Artificial Intelligence, Chatbots, and Plagiarism in Higher Education, Cellular and Molecular Bioengineering (2023) | King, M. R., CMBE Moves to the Structured Abstract Format: A Note from the Editor, Cellular and Molecular Bioengineering (2017) |
Kung et al., Performance of ChatGPT on USMLE: Potential for AI-Assisted Medical Education Using Large Language Models, medRxiv (2022) | Kung, H. K., Host physician perspectives to improve predeparture training for global health electives, Medical Education (2017) |
O’Connor S., Open artificial intelligence platforms in nursing education: Tools for academic progress or abuse?, Nurse Education in Practice (2022) | O’Connor S., Exoskeletons in Nursing and Healthcare: A Bionic Future, Clinical Nursing Research (2021) |
Rossoni, L., A inteligência artificial e eu: escrevendo o editorial juntamente com o ChatGPT, Revista Eletrônica de Ciência Administrativa (2022) | Rossoni, L., Editorial: A RECADM no Redalyc e o Dilema das Bases e Indexadores, Revista Eletrônica de Ciência Administrativa (2021) |
chatGPT, Zhavoronkov, A., Rapamycin in the context of Pascal’s Wager: generative pre-trained transformer perspective, Oncoscience (2022) | Zhavoronkov, A., The inherent challenges of classifying senescence, Science (2020) |
Biswas, S., ChatGPT and the Future of Medical Writing, Radiology (2023) | Biswas, S., Biswas, S., A Study on penile doppler, MedCrave Online Journal of Surgery (2017) |
Lazebnik, T., ChatGPT, The Impact of Fruit and Vegetable Consumption and Physical Activity on Diabetes Risk among Adults, arXiv (2022) | Lazebnik, T., Bunimovich-Mendrazitsky, S., The Signature Features of COVID-19 Pandemic in a Hybrid Mathematical Model—Implications for Optimal Work-School Lockdown Policy, Advanced Theory and Simulations (2021) |
BaHammam, A. S., Trabelsi, K., Pandi-Perumal, S. R., Jahrami, H., Adapting to the Impact of AI in Scientific Writing: Balancing Benefits and Drawbacks while Developing Policies and Regulations, Journal of Nature and Science of Medicine (2023) | Akhtar, N., Ravi Gupta, S.R. Pandi-Perumal, Ahmed S. BaHammam: Clinical Atlas of Polysomnography: A Book Review, Sleep and Vigilance (2021) |
2 False-positive set: Curation process
3 Further statistical analysis
Pair-wise comparisons
Table A2. Pair-wise comparison between the five detectors. The results are shown as the p-value with the test statistic in brackets. Each cell contains the result for the assessment set on the left and the result for the false-positive set on the right.
Detector | LLMDet | DetectLLM | ZipPy | ConDA |
---|---|---|---|---|
DetectLLM | 0.66 (0.19) / <0.01 (10.45) | | | |
ZipPy | 0.38 (0.78) / <0.01 (69.63) | 0.66 (0.20) / <0.01 (20.96) | | |
ConDA | 0.06 (3.67) / <0.01 (95.71) | 0.66 (0.20) / <0.01 (34.21) | 1.0 (0.0) / 0.28 (1.13) | |
LAW | 0.01 (0.03) / <0.01 (729.19) | 0.15 (2.06) / <0.01 (34.21) | 0.34 (0.92) / <0.01 (161.74) | 0.34 (0.92) / <0.01 (120.46) |
Agreement levels
Table A3. Pairwise Cohen’s κ values calculated for the five detectors. Each cell contains the result for the assessment set on the left and the result for the false-positive set on the right.
Detector | DetectLLM | ZipPy | ConDA | LAW |
---|---|---|---|---|
LLMDet | 0.86 / 0.82 | 0.68 / 0.74 | 0.67 / 0.72 | 0.63 / 0.69 |
DetectLLM | | 0.72 / 0.76 | 0.67 / 0.75 | 0.59 / 0.62 |
ZipPy | | | 0.86 / 0.96 | 0.77 / 0.88 |
ConDA | | | | 0.81 / 0.90 |
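For readers unfamiliar with the agreement measure in Table A3, Cohen's κ compares the observed agreement between two raters with the agreement expected by chance: κ = (p_o − p_e) / (1 − p_e). The sketch below computes it for two detectors' binary verdicts. The example labels are made up for illustration and are not the detectors' actual outputs.

```python
def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equally long sequences of categorical labels."""
    labels_a, labels_b = list(labels_a), list(labels_b)
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(labels_a)

    # Observed agreement: fraction of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Chance agreement: product of the raters' marginal frequencies, summed over categories.
    categories = set(labels_a) | set(labels_b)
    expected = sum(
        (labels_a.count(c) / n) * (labels_b.count(c) / n) for c in categories
    )

    if expected == 1.0:
        # Both raters used a single identical category; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical verdicts (1 = flagged as LLM-assisted, 0 = not flagged).
    detector_x = [1, 1, 0, 0, 1, 0, 1, 0]
    detector_y = [1, 1, 0, 0, 1, 0, 0, 1]
    print(cohens_kappa(detector_x, detector_y))
```

In this toy example the detectors agree on 6 of 8 manuscripts (p_o = 0.75) while chance agreement is p_e = 0.5, giving κ = 0.5.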