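Engineering Practice Triage · Jay · 2026-06-21 morning (round 2 supplement)

Topic of this round

GitHub Trending 2026 AI engineering ecosystem snapshot + field data on inference optimization

High-value items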

1. OSSInsight: Trending AI Repositories (live star ranking)

Source: ossinsight.io/trending/ai
URL: https://ossinsight.io/trending/ai
Credibility: high (live GitHub API data)
Engineering value: ⭐⭐⭐⭐

Core content - live star ranking (as of 2026-06):

Rank	Repo	Stars	28d Growth	Category
1	Significant-Gravitas/AutoGPT	175,297	+30	AI Agents
2	ollama/ollama	147,846	+124	Inference
3	langchain-ai/langchain	116,717	+96	AI Agents
4	langgenius/dify	111,505	+133	LLM Tools
5	open-webui/open-webui	106,019	+139	Inference
6	ggml-org/llama.cpp	90,551	+192	Inference
7	nomic-ai/gpt4all	73,245	+4	Inference
8	zed-industries/zed	68,663	+58	Coding Agents
9	infiniflow/ragflow	61,470	+63	RAG
...	...	...	...	...
19	Mintplex-Labs/anything-llm	48,120	+48	RAG
20	openai/codex	44,720	+286	Coding Agents
21	anthropics/claude-code	44,651	+252	Coding Agents
26	mem0ai/mem0	40,018	+88	LLM Tools
27	lm-sys/FastChat	37,878	+2	Inference
28	crewAIInc/crewAI	37,647	+66	AI Agents
...	...	...	...	...
30	milvus-io/milvus	40,697	+12	Vector DB

Top Movers (28-day star growth):
- opencode (anomalyco): +405 ⭐ (Coding Agents)
- openai/codex: +286 ⭐
- anthropics/claude-code: +252 ⭐
- ggml-org/llama.cpp: +192 ⭐
- block/goose: +168 ⭐ (Coding Agents)
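
Absolute star growth favors repos that are already large, so it can help to also look at growth relative to the starting star count. The sketch below does that for a few rows copied from the table above. The helper names (`relative_growth`, `rank_by_relative_growth`) are illustrative and not part of OSSInsight, and the baseline is assumed to be the current star count minus the 28-day growth.

```python
# Rows copied from the snapshot table: (repo, stars, 28-day star growth).
SNAPSHOT = [
    ("Significant-Gravitas/AutoGPT", 175_297, 30),
    ("ollama/ollama", 147_846, 124),
    ("langgenius/dify", 111_505, 133),
    ("open-webui/open-webui", 106_019, 139),
    ("ggml-org/llama.cpp", 90_551, 192),
    ("openai/codex", 44_720, 286),
    ("anthropics/claude-code", 44_651, 252),
]


def relative_growth(stars: int, growth_28d: int) -> float:
    """28-day growth as a percentage of the star count 28 days earlier.

    Assumption: the earlier count equals stars - growth_28d.
    """
    baseline = stars - growth_28d
    return 100.0 * growth_28d / baseline


def rank_by_relative_growth(rows):
    """Sort rows so the fastest relative grower comes first."""
    return sorted(rows, key=lambda r: relative_growth(r[1], r[2]), reverse=True)


for repo, stars, growth in rank_by_relative_growth(SNAPSHOT):
    print(f"{repo:32s} +{growth:4d}  {relative_growth(stars, growth):.3f}%")
```

On this subset the two coding agents lead by a wide margin in relative terms, which is consistent with the Top Movers list.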

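Why keep: Live GitHub star data that reflects which tools the engineering community is actually adopting. llama.cpp's +192 stars in 28 days suggests demand for local inference is still growing fast; dify (+133) and open-webui (+139) suggest low-code agent tooling remains popular; the rapid growth of codex and claude-code supports the "year one of coding agents" framing.

Suggested destination: ai-ecosystem-tools topic page, or as a weekly trending snapshot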

2. Spheron: Why Your LLM Inference Is Slow (real before/after comparison table)

Source: Spheron Blog
URL: https://www.spheron.network/blog/llm-inference-slow
Credibility: high (measured by the engineering team)
Engineering value: ⭐⭐⭐⭐

Core content - Optimization Before/After Benchmarks:

Optimization	Before	After	Improvement
70B: CPU offload → H100 FP8	~2 tok/s	~40 tok/s	~20x
Static batching → continuous batching (vLLM)	40% GPU util	85% GPU util	~2x throughput
FP16 → FP8 on H100 (70B)	~25 tok/s	~42 tok/s	~1.7x
FP8 → FP4 on B200 (70B, MLPerf estimate)	~4,350 tok/s	~12,840 tok/s	~2.9x
Standard attention → FlashAttention 3 @ 32K ctx	~120 ms TTFT	~55 ms TTFT	~2.2x TTFT
No PagedAttention → PagedAttention	15 concurrent requests	60 concurrent requests	4x concurrency
Model downloaded on every cold start → cached	20 min cold start	<30 s cold start	~40x startup
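
The improvement column can be rechecked with simple arithmetic. The sketch below recomputes each factor from the before/after values in the table; the `improvement` helper and the tuple layout are illustrative choices, not something from the Spheron post. For latency-style metrics (TTFT, cold start) lower is better, so the ratio is inverted.

```python
# (label, before, after, higher_is_better), values taken from the table.
ROWS = [
    ("CPU offload -> H100 FP8 (tok/s)", 2.0, 40.0, True),
    ("static -> continuous batching (GPU util %)", 40.0, 85.0, True),
    ("FP16 -> FP8 on H100 (tok/s)", 25.0, 42.0, True),
    ("FP8 -> FP4 on B200 (tok/s)", 4350.0, 12840.0, True),
    ("FlashAttention 3 @ 32K ctx (TTFT ms)", 120.0, 55.0, False),
    ("PagedAttention (concurrent requests)", 15.0, 60.0, True),
    # The table gives "<30 s" after caching, so 30 s makes this a lower bound.
    ("model caching (cold start s)", 20 * 60.0, 30.0, False),
]


def improvement(before: float, after: float, higher_is_better: bool) -> float:
    """Return the improvement factor; inverted when smaller values are better."""
    return after / before if higher_is_better else before / after


for label, before, after, higher in ROWS:
    print(f"{label:45s} {improvement(before, after, higher):5.2f}x")
```

The recomputed factors line up with the table's rounded figures. Note that the batching row compares GPU utilization, so its "~2x throughput" assumes throughput scales roughly with utilization.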

Additional key points:
- FlashAttention 3 (Hopper/H100, H200): further optimized for HBM3
- FlashAttention 4 (Blackwell/SM100+): default backend in vLLM v0.17.0+ for B200
- vLLM v0.17+ (released March 2026) has FA 3/4 auto-enabled
- Diagnostic command: python -c "import vllm; print(vllm.__version__)"
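
The diagnostic command above only prints the version. A minimal sketch of turning it into a pass/fail check follows. It takes the 0.17.0 threshold from the notes above without verifying it, and the helpers (`parse_version`, `meets_minimum`, `installed_vllm_version`) are hypothetical names introduced here. Pre-release and local suffixes such as `rc1` or `+cu128` are handled crudely by keeping only the leading integers.

```python
from importlib.metadata import PackageNotFoundError, version

# Threshold quoted in the notes; not independently verified here.
MIN_VERSION = (0, 17, 0)


def parse_version(text: str) -> tuple:
    """Reduce '0.17.1', '0.17.0rc1' or '0.16.5+cu128' to a tuple of leading integers."""
    parts = []
    for piece in text.split("+")[0].split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break  # stop at the first non-numeric component, e.g. 'post1'
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(text: str, minimum: tuple = MIN_VERSION) -> bool:
    """True if the version string is at least `minimum` (release candidates count)."""
    parsed = parse_version(text)
    padded = parsed + (0,) * (len(minimum) - len(parsed))
    return padded >= minimum


def installed_vllm_version():
    """Return the installed vllm version string, or None if vllm is not installed."""
    try:
        return version("vllm")
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_vllm_version()
    if found is None:
        print("vllm is not installed in this environment")
    else:
        print(f"vllm {found}: meets {MIN_VERSION} -> {meets_minimum(found)}")
```

Reading the version through `importlib.metadata` avoids importing vllm itself, which keeps the check cheap on machines without a GPU.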

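Why keep: Quantifies the actual payoff of each optimization technique with explicit before/after numbers. The most valuable parts are the measured gains from continuous batching (~2x throughput) and PagedAttention (4x concurrency), plus the mapping between FlashAttention versions and GPU generations.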
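
To see why continuous batching raises utilization, the toy simulation below compares the two scheduling policies. It is a deliberately simplified model and not vLLM's scheduler. It assumes each request needs a fixed, positive number of decode steps, one token per slot per step, and it ignores prefill and memory limits. The request lengths are made up for illustration.

```python
def static_batching_utilization(lengths, batch_size):
    """Each batch holds all its slots until its longest request finishes."""
    busy = 0      # slot-steps that produced a token
    reserved = 0  # slot-steps a slot was held, busy or idle
    for start in range(0, len(lengths), batch_size):
        batch = lengths[start:start + batch_size]
        busy += sum(batch)
        reserved += batch_size * max(batch)
    return busy / reserved


def continuous_batching_utilization(lengths, batch_size):
    """A finished request frees its slot and the next queued request takes it."""
    queue = list(lengths)
    slots = []  # remaining decode steps for each running request
    busy = 0
    steps = 0
    while queue or slots:
        # Refill free slots from the queue before every decode step.
        while queue and len(slots) < batch_size:
            slots.append(queue.pop(0))
        busy += len(slots)
        steps += 1
        # Advance every running request by one token and drop finished ones.
        slots = [remaining - 1 for remaining in slots if remaining > 1]
    return busy / (batch_size * steps)


# Hypothetical output lengths (in tokens) with a wide spread.
LENGTHS = [10, 100, 20, 90, 30, 80, 40, 70]

static_util = static_batching_utilization(LENGTHS, batch_size=4)
continuous_util = continuous_batching_utilization(LENGTHS, batch_size=4)
print(f"static:     {static_util:.1%}")
print(f"continuous: {continuous_util:.1%}")
```

The gap widens as output lengths become more varied, because static batching idles every slot until the longest request in the batch completes. The numbers from this toy are not expected to match the 40% → 85% figures in the table.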

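Suggested destination: inference-optimization topic page (optimization gains table)

3. nanochat (Andrej Karpathy) - minimal trainable full-stack LLM

Source: GitHub (Andrej Karpathy)
URL: https://github.com/... (see original)
Stars: 55k
Credibility: high (authored by Karpathy)
Engineering value: ⭐⭐⭐⭐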

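Core content:
- Consolidates a complete LLM training stack into one cohesive, minimal, readable, hackable repository
- The whole pipeline lives in one repo: tokenization, pretraining, finetuning, evaluation, inference, chat UI
- Every step is visible and editable rather than hidden behind abstraction layers

Why keep: Very high educational value, and also usable as an engineering reference for understanding a minimal implementation of a full-stack LLM system. Compared with industrial-grade inference engines such as vLLM, nanochat sits at the opposite end of the spectrum: it is built for understanding the underlying principles.

Suggested destination: llm-fundamentals or learning-resources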

4. andrej-karpathy-skills: Claude Code behavior rules

Source: GitHub
Stars: 156k (one of the fastest-growing AI workflow repos of 2026)
Credibility: high (Karpathy's viral observation → community response)
Engineering value: ⭐⭐⭐

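Core content:
- Encodes Karpathy's observations about the pain points of LLM coding agents as 4 hard rules in CLAUDE.md
- 1 file, 0 runtime dependencies

Why keep: 156k stars indicate that this resonates strongly with the engineering community. The 4 rules can serve as a best-practice reference for working with coding agents. It is essentially a single .md file, though, so its value lies in the insight into engineering culture rather than in the code itself.

Suggested destination: ai-engineering-practices or agent-tooling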

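Discarded items

Item	Reason discarded
ByteByteGo AI GitHub Repos 2026	Heavily overlaps with OSSInsight and has no live star counts
Firecrawl Best GitHub Repos	Mostly subjective recommendations, with little objective data behind them
Kunal Ganglani Open Source AI Projects	Reads as promotional; content is thin

Tags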

github-trending ollama vllm llama.cpp dify open-webui coding-agents inference-optimization

Suggested write paths

/shared/research-kb/inbox/jay/2026-06-21-engineering-inference-round2-supplement.md (this document)
GitHub star ranking data → ai-ecosystem-overview.md or github-trending-weekly.md
Optimization Before/After table → inference-optimization-practical.md

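Note on what was not written this round

This round was a supplementary scan and found nothing new that needs immediate close reading. Suggestions for the round 2 material:
- The OSSInsight ranking data can be kept as a weekly trending snapshot
- Spheron's Before/After table is practically useful and should go on the inference optimization topic page
- nanochat and karpathy-skills are high-impact community projects and are worth listing on the resources page

Follow-up actions

⭐ nanochat deserves a deeper analysis of its full-stack architecture, to understand a minimal pretraining → serving implementation
⭐ The +252/+286 28-day growth of Claude Code / Codex deserves a separate study: has the year of coding agents really arrived?
⭐ Build an "optimization gains table" in the knowledge base as a reference document for engineering decisions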