1️⃣2️⃣ arXiv · RAGPerf: End-to-End RAG Benchmarking Framework (⭐⭐⭐ Reference)

Reusable information

- Tags: Agentic RAG Enterprise FinanceBench Evaluation
- Link: https://arxiv.org/html/2510.13910v2
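- Fine-grained evaluation of Agentic RAG component capabilities (the intermediate planning, retrieval, and reasoning steps) rather than end-to-end QA alone; shows how errors in intermediate steps cascade into the final answer
- Tags: Agentic RAG Benchmark Component Evaluation
- Link: https://arxiv.org/html/2603.10765v1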
- End-to-end RAG benchmark covering Wikipedia (6.41M entries), Arxiv (30K PDFs), GitHub Code (11M), and The People's Speech (300K audio items); supports data-, tensor-, and pipeline-parallel configurations
- Tags: RAG Benchmark End-to-End Performance
- Link: https://recsys.substack.com/p/is-bm25-enough-for-agentic-deep-research
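
The component-level evaluation idea noted above (scoring planning, retrieval, and reasoning separately, then checking how an early failure affects the final answer) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `Trace` record, the stage names, and the helper functions are hypothetical and are not taken from any of the linked papers or their code.

```python
from dataclasses import dataclass

# Hypothetical stage names, mirroring the planning/retrieval/reasoning split.
STAGES = ("planning", "retrieval", "reasoning")


@dataclass
class Trace:
    """One evaluated question: per-stage correctness plus final-answer correctness."""
    planning: bool
    retrieval: bool
    reasoning: bool
    answer: bool


def stage_accuracy(traces):
    """Fraction of traces in which each intermediate stage was judged correct."""
    return {s: sum(getattr(t, s) for t in traces) / len(traces) for s in STAGES}


def first_failure(trace):
    """Name of the earliest failed stage, or None if every stage succeeded."""
    for s in STAGES:
        if not getattr(trace, s):
            return s
    return None


def answer_accuracy_by_first_failure(traces):
    """Final-answer accuracy grouped by the earliest failed stage.

    A low value for an early stage suggests that its errors cascade
    into the final answer.
    """
    buckets = {}
    for t in traces:
        key = first_failure(t) or "none"
        buckets.setdefault(key, []).append(t.answer)
    return {k: sum(v) / len(v) for k, v in buckets.items()}


if __name__ == "__main__":
    # Toy, hand-written judgments; not real benchmark results.
    traces = [
        Trace(True, True, True, True),
        Trace(True, False, True, False),
        Trace(False, False, False, False),
        Trace(True, True, True, True),
    ]
    print(stage_accuracy(traces))
    print(answer_accuracy_by_first_failure(traces))
```

Grouping by the earliest failed stage, rather than by every failed stage, keeps each trace in exactly one bucket, which makes the per-bucket answer accuracy easier to compare against the end-to-end number.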