Google DeepMind · December 9, 2025 · Source: Google DeepMind

FACTS Benchmark Suite: Systematically evaluating the factuality of large language models

Summary

FACTS is a benchmark suite designed to systematically evaluate the factuality of large language models.