AI Agent (From Chatbot to Production-Grade Agent)
An AI Agent is an advanced form of LLM application: it can plan autonomously, execute tasks, and interact with its environment.
Classic papers:
- Yao, S., et al. (2023). “ReAct: Synergizing Reasoning and Acting in Language Models.” ICLR. The ReAct framework, the core paradigm for agents.
- Wei, J., et al. (2022). “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.” NeurIPS.
- Wang, L., et al. (2024). “A Survey on Large Language Model based Autonomous Agents.” arXiv. A survey of agents.
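I. Evolution Path
Chatbot → Tool Use → Agent → Production Agent (industrial grade)
What each stage adds:
- Chatbot: conversation only; cannot perform external actions
- Tool Use: can call a single tool
- Agent: can autonomously plan and execute multi-step tasks
- Production Agent: industrial grade, with error handling, access control, monitoring, and similar capabilities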
II. Core Concepts
1️⃣ Chatbot
LLM + Prompt
Characteristics:
- Conversation only
- Cannot perform external actions
- Stateless or weakly stateful
Classic papers:
- Vinyals, O., & Le, Q. (2015). “A Neural Conversational Model.” ICML Deep Learning Workshop.
2️⃣ Function / Tool / Function Calling
Function
The smallest unit of code that actually executes.
Tool
The interface description of a Function that the LLM can read and call.
Classic papers:
- Schick, T., et al. (2023). “Toolformer: Language Models Can Teach Themselves to Use Tools.” NeurIPS. A pioneering work on tool learning.
- Parisi, A., et al. (2022). “TALM: Tool Augmented Language Models.” arXiv.
Function Calling
The LLM generates a structured call request; the Runtime executes the function.
Implementations:
- OpenAI Function Calling API
- Anthropic Tool Use
- LangChain Tools
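The split between Function, Tool, and Function Calling can be sketched in a few lines of Python. This is a minimal illustration, not any vendor's API: `get_weather`, `WEATHER_TOOL`, `REGISTRY`, and `execute_tool_call` are hypothetical names, the JSON-Schema-style description only resembles what hosted APIs accept, and the model output is a hard-coded string rather than a real LLM response.

```python
import json


def get_weather(city: str) -> str:
    """The Function: the code that actually runs (canned result here)."""
    return f"Sunny in {city}"


# The Tool: a description of the function that an LLM can read.
# The exact schema differs between providers; this shape is an assumption.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Return the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# Maps tool names to the functions that implement them.
REGISTRY = {"get_weather": get_weather}


def execute_tool_call(raw_call: str) -> str:
    """Function Calling, runtime side: parse the structured request and run it."""
    call = json.loads(raw_call)
    func = REGISTRY[call["name"]]
    return func(**call["arguments"])


# Stand-in for what a model might emit; no real API is called.
model_output = '{"name": "get_weather", "arguments": {"city": "Paris"}}'
result = execute_tool_call(model_output)
```

The model never executes anything itself: it only produces the JSON string, and the runtime decides whether and how to run the matching function.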
III. The Essence of an Agent
LLM + Tools + Loop
Classic papers:
- Yao, S., et al. (2023). “ReAct: Synergizing Reasoning and Acting in Language Models.” ICLR.
- Shinn, N., et al. (2023). “Reflexion: Language Agents with Verbal Reinforcement Learning.” NeurIPS.
Agent Loop
Thought → Action → Observation → Thought → ...
ReAct example:
Question: What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?
Thought 1: I need to search Colorado orogeny, find the area that the eastern sector of the Colorado orogeny extends into, then find the elevation range of that area.
Action 1: Search[Colorado orogeny]
Observation 1: The Colorado orogeny was an episode of mountain building...
Thought 2: I need to search eastern sector of the Colorado orogeny.
Action 2: Search[eastern sector of Colorado orogeny]
...
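The Thought → Action → Observation cycle in the trace above can be sketched as a small loop. This is a simplified illustration under stated assumptions, not the ReAct authors' implementation: `scripted_llm` is a hypothetical stand-in that returns canned replies, `search` is a hypothetical tool with a canned result, and the `Action N: Name[arg]` parsing rule is inferred from the trace format.

```python
import re
from typing import Callable, Tuple

# Matches lines such as "Action 1: Search[Colorado orogeny]".
ACTION_RE = re.compile(r"Action \d+: (\w+)\[(.*)\]")


def search(query: str) -> str:
    """Hypothetical tool; returns a canned observation."""
    return f"(canned result for '{query}')"


TOOLS = {"Search": search}


def scripted_llm(transcript: str) -> str:
    """Hypothetical model stand-in: searches once, then finishes."""
    if "Observation 1" not in transcript:
        return "Thought 1: I need to search Colorado orogeny.\nAction 1: Search[Colorado orogeny]"
    return "Thought 2: I have enough information.\nAction 2: Finish[done]"


def react_loop(question: str, llm: Callable[[str], str], max_steps: int = 5) -> Tuple[str, str]:
    """Run the loop until the model emits Finish[...] or the step limit is hit."""
    transcript = f"Question: {question}"
    for step in range(1, max_steps + 1):
        reply = llm(transcript)                  # Thought + Action
        transcript += "\n" + reply
        match = ACTION_RE.search(reply)
        if match is None:
            return "no action found", transcript
        name, arg = match.groups()
        if name == "Finish":
            return arg, transcript
        if name in TOOLS:
            observation = TOOLS[name](arg)       # execute the tool
        else:
            observation = f"unknown tool {name}"
        transcript += f"\nObservation {step}: {observation}"
    return "step limit reached", transcript


answer, transcript = react_loop("What is the Colorado orogeny?", scripted_llm)
```

Each observation is appended to the transcript, so the next model call sees everything that has happened so far. The step limit guards against a model that never emits `Finish`.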
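IV. Agent Runtime
The Agent Runtime is the core infrastructure that executes agent logic.
Core responsibilities:
- Parsing tool calls
- Executing functions and APIs
- Controlling the loop (iteration limits, timeouts)
- Error handling
- Access control
- State management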
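Most of these responsibilities fit into one small class. The sketch below is illustrative only: `Runtime`, `run_call`, and the error strings are hypothetical names, tool calls are assumed to arrive as JSON with `name` and `arguments` keys, and timeout control is left out to keep the example short.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Runtime:
    tools: dict                                   # tool name -> callable
    allowed: set                                  # access control: callable tool names
    max_steps: int = 3                            # loop control: cap on executed calls
    history: list = field(default_factory=list)   # state management

    def run_call(self, raw_call: str) -> str:
        """Parse one tool call, enforce limits and permissions, execute, record."""
        if len(self.history) >= self.max_steps:
            return "error: step limit reached"
        try:
            call = json.loads(raw_call)           # parse the tool call
            name, args = call["name"], call["arguments"]
        except (json.JSONDecodeError, KeyError, TypeError) as exc:
            result = f"error: malformed call ({type(exc).__name__})"
        else:
            if name not in self.tools:
                result = f"error: unknown tool {name}"
            elif name not in self.allowed:
                result = f"error: permission denied for {name}"
            else:
                try:
                    result = str(self.tools[name](**args))
                except Exception as exc:          # report failures instead of crashing
                    result = f"error: {type(exc).__name__}: {exc}"
        self.history.append((raw_call, result))
        return result


rt = Runtime(
    tools={"add": lambda a, b: a + b, "rm": lambda path: "deleted"},
    allowed={"add"},
)
```

Errors are returned as strings rather than raised, so they can be fed back to the LLM as observations and the model gets a chance to correct its next call.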
Popular frameworks:
- LangChain
- AutoGPT
- BabyAGI
- CrewAI
- Microsoft AutoGen
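V. Tool Retrieval
Use embeddings for semantic matching to select the Top-K tools from a large tool library and pass only those to the LLM.
Classic papers:
- Qin, Y., et al. (2023). “Tool Learning with Foundation Models.” arXiv. A survey of tool learning.
Flow:
User question → Embedding → Tool library retrieval → Top-K tools → LLM selects and calls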
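The retrieval flow can be sketched with a toy embedding. Everything here is an assumption made for illustration: `embed` uses bag-of-words counts in place of a learned embedding model, the tool names and descriptions are invented, and a real system would typically precompute the tool vectors and store them in a vector index.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A real system would use a learned model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical tool library: name -> natural-language description.
TOOL_DESCRIPTIONS = {
    "get_weather": "get the current weather forecast for a city",
    "search_web": "search the web for pages about a topic",
    "send_email": "send an email message to a recipient",
    "query_db": "run a sql query against the sales database",
}


def retrieve_tools(question: str, k: int = 2) -> list:
    """Return the names of the k tools whose descriptions best match the question."""
    q = embed(question)
    scored = [(cosine(q, embed(desc)), name) for name, desc in TOOL_DESCRIPTIONS.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


top = retrieve_tools("what is the weather forecast in paris", k=1)
```

Only the retrieved tool descriptions are placed in the prompt, which keeps the context short when the library holds hundreds of tools.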
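VI. Skill
Skill = a capability composed of multiple Tools
A Skill packages the ability to carry out a complex task, much like a human skill.
Examples:
- “Write a blog post” Skill = search tool + writing tool + image generation tool
- “Data analysis” Skill = data query tool + visualization tool + report generation tool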
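As a rough sketch, the blog-writing Skill can be expressed as a fixed pipeline over three tools. All function names here are hypothetical, and each tool returns canned text instead of calling a real search, writing, or image service.

```python
# Hypothetical tools; each returns canned text instead of calling a real service.
def search_tool(topic: str) -> str:
    return f"notes about {topic}"


def writing_tool(notes: str) -> str:
    return f"Draft based on: {notes}"


def image_tool(topic: str) -> str:
    return f"cover_{topic.replace(' ', '_')}.png"


def write_blog_skill(topic: str) -> dict:
    """The blog-writing Skill: search, then write, then generate a cover image."""
    notes = search_tool(topic)
    draft = writing_tool(notes)
    cover = image_tool(topic)
    return {"draft": draft, "cover": cover}


post = write_blog_skill("ai agents")
```

Because the order of tool calls is fixed in code, the LLM only has to decide to invoke the Skill, not to re-plan the three steps every time.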
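VII. Production-Grade Agent Architecture
User → Retriever → LLM → Runtime → Tool → Observation → LLM → Response
Production requirements:
- Error handling and retries
- Monitoring and logging
- Access control
- Cost control
- Security auditing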
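Three of these requirements (retries, logging, and cost control) can be illustrated with the standard library alone. This is a sketch under assumptions: `call_with_retry`, `CostTracker`, and `BudgetExceeded` are hypothetical names, cost is an abstract number rather than a real pricing model, and access control and security auditing are out of scope here.

```python
import logging
import time

logger = logging.getLogger("agent")


class BudgetExceeded(Exception):
    """Raised when a charge would push spending past the limit."""


class CostTracker:
    """Cost control: refuse any charge that would exceed the budget."""

    def __init__(self, limit: float):
        self.limit = limit
        self.spent = 0.0

    def charge(self, amount: float) -> None:
        if self.spent + amount > self.limit:
            raise BudgetExceeded(f"budget {self.limit} would be exceeded")
        self.spent += amount


def call_with_retry(func, *args, retries: int = 3, base_delay: float = 0.0, **kwargs):
    """Call func, retrying on any exception with exponential backoff."""
    for attempt in range(1, retries + 1):
        try:
            result = func(*args, **kwargs)
            logger.info("call succeeded on attempt %d", attempt)
            return result
        except Exception as exc:
            logger.warning("attempt %d failed: %s", attempt, exc)
            if attempt == retries:
                raise                      # out of attempts: surface the error
            time.sleep(base_delay * 2 ** (attempt - 1))
```

In practice `base_delay` would be set above zero and only errors known to be transient would be retried; retrying every exception, as this sketch does, can mask real bugs.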
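VIII. Key Principles
- The model sets the agent's ceiling; the Runtime sets its floor. Model capability determines what the agent can do, and Runtime quality determines how stable and reliable it is.
- Retrieve before reasoning. Before executing a task, retrieve the relevant information and tools first.
- Use Skills to encapsulate complex capabilities. Wrapping common complex tasks as Skills improves reuse and reliability.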
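IX. The Complete Model
| Component | Analogy | Role |
|---|---|---|
| LLM | Brain | Reasoning, planning, decision-making |
| Tools | Hands | Performing concrete operations |
| Skills | Muscle memory | Packaged complex capabilities |
| Retriever | Sensory system | Fetching information and tools |
| Runtime | Nervous system | Coordinating execution, passing information |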
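X. Summary
Agent = LLM + Tools + Loop + Runtime + Retrieval (+ Skills)
AI Agents are an advanced form of LLM application: they have evolved from simple conversational systems into systems that plan autonomously and carry out complex tasks. As the technology matures, agents are likely to be applied in a growing range of scenarios.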
References
Prompt Engineering
- Brown, T. B., et al. (2020). “Language Models are Few-Shot Learners.” NeurIPS.
- Wei, J., et al. (2022). “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.” NeurIPS.
- Kojima, T., et al. (2022). “Large Language Models are Zero-Shot Reasoners.” NeurIPS.
- Zhou, Y., et al. (2023). “Large Language Models Are Human-Level Prompt Engineers.” ICLR.
RAG
- Lewis, P., et al. (2020). “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.” NeurIPS.
- Guu, K., et al. (2020). “REALM: Retrieval-Augmented Language Model Pre-Training.” ICML.
- Karpukhin, V., et al. (2020). “Dense Passage Retrieval for Open-Domain Question Answering.” EMNLP.
- Johnson, J., Douze, M., & Jégou, H. (2019). “Billion-scale similarity search with GPUs.” IEEE Transactions on Big Data.
AI Agent
- Yao, S., et al. (2023). “ReAct: Synergizing Reasoning and Acting in Language Models.” ICLR.
- Schick, T., et al. (2023). “Toolformer: Language Models Can Teach Themselves to Use Tools.” NeurIPS.
- Shinn, N., et al. (2023). “Reflexion: Language Agents with Verbal Reinforcement Learning.” NeurIPS.
- Wang, L., et al. (2024). “A Survey on Large Language Model based Autonomous Agents.” arXiv.