Google Gemini 1.5 Flash was compared with ChatGPT 4o-mini on evaluations of (a) 51 of the author’s journal articles and (b) up to 200 articles in each of 34 field-based Units of Assessment (UoAs) from the UK Research Excellence Framework (REF) 2021. From (a), the results suggest that Gemini 1.5 Flash, unlike ChatGPT 4o-mini, may work better when fed a PDF or the article full text rather than just the title and abstract. From (b), Gemini 1.5 Flash seems to be marginally less able than ChatGPT 4o-mini to predict an article’s research quality (using a departmental quality proxy indicator), although the differences are small and both show similar disciplinary variations in this ability. Averaging the scores from multiple runs of Gemini 1.5 Flash improves their accuracy.
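A minimal sketch of the kind of analysis the abstract describes: averaging repeated model scores per article and correlating the averages with a proxy quality indicator. The article names, score values, the 1*-4* scale, and the use of a Spearman correlation here are illustrative assumptions, not the paper's actual data or code.

```python
# Illustrative sketch only: average quality scores from multiple LLM runs per
# article, then correlate the averages with a proxy quality indicator.
# All values below are hypothetical placeholders, not data from the paper.
from statistics import mean

from scipy.stats import spearmanr

# Hypothetical scores from three Gemini 1.5 Flash runs per article (1*-4* style scale).
runs = {
    "article_a": [2.0, 3.0, 2.0],
    "article_b": [3.0, 4.0, 3.0],
    "article_c": [1.0, 2.0, 2.0],
}
# Hypothetical departmental quality proxy values for the same articles.
proxy = {"article_a": 2.4, "article_b": 3.1, "article_c": 1.8}

# Average the repeated scores for each article.
averaged = {article: mean(scores) for article, scores in runs.items()}

# Rank correlation between the averaged LLM scores and the proxy indicator.
articles = sorted(averaged)
rho, p = spearmanr([averaged[a] for a in articles], [proxy[a] for a in articles])
print(f"Spearman correlation with proxy indicator: {rho:.2f} (p={p:.2f})")
```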
Mike Thelwall. Is Google Gemini better than ChatGPT at evaluating research quality?[J]. Journal of Data and Information Science, 0: 1-1.
DOI: 10.2478/jdis-2025-0014