Exploring variation in research priorities generated by AI tools.
Artificial intelligence (AI) tools based on large language models (LLMs) are increasingly used by researchers and may play a role in health-related research priority-setting exercises (RPSEs). However, little is known about how these tools may differ in the types of research priorities they generate.
We examined research priorities aimed at improving treatments for four diseases: cancer, COVID-19, HIV, and Alzheimer's disease. We compared the outputs from five AI tools (DeepSeek, ChatGPT, Claude, Perplexity, and Gemini) using SBERT-BioBERT embeddings and cosine similarity scores, and assessed the stability of differences between them by re-running identical prompts and slightly modified versions.
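The comparison method described above can be sketched as follows. This is a minimal illustration, not the study's actual pipeline: it assumes each tool's priorities have already been converted to embedding vectors (in practice, with an SBERT-BioBERT model via a library such as sentence-transformers), and shows how pairwise cosine similarity scores between two tools' outputs could then be aggregated.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def mean_pairwise_similarity(embeddings_a, embeddings_b):
    """Average cosine similarity across all pairs of priority embeddings
    from two AI tools -- a simple aggregate of how alike their outputs are."""
    scores = [cosine_similarity(a, b) for a in embeddings_a for b in embeddings_b]
    return float(np.mean(scores))

# Toy usage with made-up 3-dimensional "embeddings" (real SBERT vectors
# have several hundred dimensions):
tool_a = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
tool_b = [[1.0, 0.1, 0.0], [0.0, 1.0, 0.0]]
print(mean_pairwise_similarity(tool_a, tool_b))
```

A higher mean score would indicate that two tools generate semantically closer sets of priorities; identical vectors score 1.0 and orthogonal (unrelated) vectors score 0.0.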
We found that the outputs produced by Gemini were highly similar to those produced by the other tools. The two most dissimilar outputs were those produced by DeepSeek and Perplexity: the former tended to emphasise technical medical issues, while the latter emphasised public health concerns. This substantive distinction between DeepSeek and Perplexity remained stable across repeated and tweaked prompts.
Our exploratory analysis suggests that Gemini performs well for researchers who prefer to generate health-related research priorities using a single AI model. For those planning to draw on multiple models, Perplexity and DeepSeek offer complementary perspectives.