NIH develops AI agent to improve accuracy of gene set analysis

The system, called GeneAgent, cross-checks its own initial predictions for accuracy against information from established, expert-curated databases

Researchers at the National Institutes of Health (NIH) have developed an artificial intelligence (AI) agent, powered by a large language model (LLM), that produces more accurate and informative descriptions of biological processes and functions in gene set analysis than current systems do.

The system, called GeneAgent, cross-checks its own initial predictions for accuracy against information from established, expert-curated databases and returns a verification report detailing its successes and failures. The AI agent can help researchers interpret high-throughput molecular data and identify relevant biological pathways or functional modules, which can lead to a better understanding of how different diseases and conditions affect groups of genes individually and together.

AI-generated content is produced by LLMs trained on enormous amounts of text data from across the internet. LLMs use these data to learn patterns and predict which words are likely to follow one another in a sentence. However, LLMs are not designed to verify facts, so AI-generated content can be false, misleading, or fabricated, a phenomenon known as AI hallucination. LLMs are also prone to circular reasoning: they fact-check their generated results against their own data, which makes them sound more confident in the output even when the information is false.

Staving off AI hallucinations is important when using LLM tools for gene set analysis, the process of generating collective functional descriptions of grouped genes and their potential interactions. Previous studies that trained LLMs to answer genomic questions or to summarize the biological processes in a given gene set did not explicitly address hallucinations in the generated content.

GeneAgent mitigates this problem by independently comparing its own claims with established knowledge compiled in external, expert-curated databases. The research team first tested GeneAgent on 1,106 gene sets sourced from existing databases, each with known functions and process names. For each gene set, GeneAgent first generated an initial list of functional claims. It then used its self-verification agent module to cross-check these claims independently against the curated databases and produced a verification report noting whether each claim was supported, partially supported, or refuted.
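The claim-checking step described above can be illustrated with a minimal Python sketch. It is not GeneAgent's implementation: it assumes a toy in-memory "curated database" that maps each gene to the processes it is annotated with, and a simple counting rule for the three verdicts (every gene annotated means supported, some genes means partially supported, none means refuted). The names `Claim`, `Verdict`, `verify_claim` and `verification_report`, as well as the gene names and annotations, are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    SUPPORTED = "supported"
    PARTIALLY_SUPPORTED = "partially supported"
    REFUTED = "refuted"


@dataclass(frozen=True)
class Claim:
    """A functional claim: the listed genes take part in the named process."""
    process: str
    genes: frozenset


def verify_claim(claim: Claim, curated: dict) -> Verdict:
    """Check one claim against a curated gene -> set-of-processes mapping.

    Hypothetical rule (an assumption, not GeneAgent's logic): count how many
    of the claimed genes carry the claimed process annotation.
    """
    hits = sum(1 for g in claim.genes if claim.process in curated.get(g, set()))
    if claim.genes and hits == len(claim.genes):
        return Verdict.SUPPORTED
    if hits > 0:
        return Verdict.PARTIALLY_SUPPORTED
    return Verdict.REFUTED


def verification_report(claims, curated):
    """Return one (claim, verdict) pair per initial claim, in input order."""
    return [(c, verify_claim(c, curated)) for c in claims]


if __name__ == "__main__":
    # Toy stand-in for an expert-curated database (hypothetical genes and annotations).
    curated = {
        "GENE_A": {"DNA repair", "cell cycle"},
        "GENE_B": {"DNA repair"},
        "GENE_C": {"apoptosis"},
    }
    # Toy stand-in for the agent's initial list of functional claims.
    claims = [
        Claim("DNA repair", frozenset({"GENE_A", "GENE_B"})),
        Claim("apoptosis", frozenset({"GENE_A", "GENE_C"})),
        Claim("glycolysis", frozenset({"GENE_B"})),
    ]
    for claim, verdict in verification_report(claims, curated):
        print(f"{claim.process}: {verdict.value}")
```

With the toy data, the three claims come out supported, partially supported and refuted, respectively. A real system would replace the dictionary lookup with queries to curated resources and would need looser matching of process names than exact string equality.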

To assess the accuracy of the self-verification step, the researchers then asked two human experts to manually review 10 randomly selected gene sets, containing 132 claims in total, and to judge whether GeneAgent’s self-verification reports were correct, partially correct, or incorrect. The experts determined that 92% of the decisions in GeneAgent’s self-verification reports were correct, indicating strong self-verification performance, particularly in comparison with GPT-4. Their detailed review confirmed the model’s effectiveness in minimizing hallucinations and generating more reliable analytical narratives.