Google DeepMind · December 9, 2025 · Source: Google DeepMind
FACTS Benchmark Suite: Systematically evaluating the factuality of large language models
Summary
FACTS is a benchmark suite designed to systematically evaluate the factual accuracy of large language models.