AI Agent (From Chatbot to Production-Grade Agent)

An AI agent is an advanced form of LLM application: it plans on its own, executes tasks, and interacts with its environment.

Key papers:

  • Yao, S., et al. (2022). “ReAct: Synergizing Reasoning and Acting in Language Models.” ICLR. [paper link] - the ReAct framework, the core agent paradigm
  • Wei, J., et al. (2022). “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.” NeurIPS. [paper link]
  • Wang, L., et al. (2024). “A Survey on Large Language Model based Autonomous Agents.” arXiv. [paper link] - survey of LLM-based agents

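I. Overall Evolution Path

Chatbot → Tool Use → Agent → Production Agent (industrial grade)

Stages:

  • Chatbot: conversation only; cannot perform external actions
  • Tool Use: can call a single tool
  • Agent: can plan and execute multi-step tasks on its own
  • Production Agent: industrial grade, with error handling, permission control, monitoring, and similar capabilities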

II. Core Concepts

1️⃣ Chatbot

LLM + Prompt

Characteristics:

  • Conversation only
  • Cannot perform external actions
  • Stateless or weakly stateful

Key papers:

  • Vinyals, O., & Le, Q. (2015). “A Neural Conversational Model.” ICML Deep Learning Workshop. [paper link]

2️⃣ Function / Tool / Function Calling

Function

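The smallest unit of code that actually executes.

Tool

The interface description that makes a Function callable by an LLM: its name, its purpose, and its parameters.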
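
The difference between a Function and a Tool is easier to see side by side. The sketch below is only illustrative: `get_weather`, its stubbed data, and the `GET_WEATHER_TOOL` dictionary are hypothetical names, and the description layout follows the common JSON Schema style rather than any one provider's exact format.

```python
import json


def get_weather(city: str) -> str:
    """The Function: real code that does the work (stubbed here with fixed data)."""
    fake_data = {"Paris": "18C, cloudy", "Tokyo": "24C, sunny"}
    return fake_data.get(city, "unknown")


# The Tool: a description the LLM reads to decide when and how to call the function.
# Field names are an assumption; providers differ in the exact layout they expect.
GET_WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Return the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string", "description": "City name"}},
        "required": ["city"],
    },
}

print(json.dumps(GET_WEATHER_TOOL, indent=2))
print(get_weather("Paris"))
```

The model only ever sees the description; the function body stays on the application side.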

Key papers:

  • Schick, T., et al. (2023). “Toolformer: Language Models Can Teach Themselves to Use Tools.” NeurIPS. [paper link] - pioneering work on tool learning
  • Parisi, A., et al. (2022). “TALM: Tool Augmented Language Models.” arXiv. [paper link]

Function Calling

The LLM generates a structured call request; the Runtime executes the function. The model itself never runs code.
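
A minimal sketch of that split, assuming the model emits a JSON object with `name` and `arguments` fields. The `REGISTRY` mapping, the `execute_tool_call` helper, and the hard-coded `model_output` string are hypothetical stand-ins; no real model or provider API is called here.

```python
import json


def add(a: float, b: float) -> float:
    """An ordinary function the runtime is able to execute."""
    return a + b


# Hypothetical registry mapping tool names to Python functions.
REGISTRY = {"add": add}


def execute_tool_call(raw: str) -> str:
    """Parse a structured call emitted by the model and run the matching function."""
    call = json.loads(raw)
    func = REGISTRY[call["name"]]
    result = func(**call["arguments"])
    # The result goes back to the model as text, here serialized as JSON.
    return json.dumps({"name": call["name"], "result": result})


# Stand-in for what a model might emit; in practice this comes from the LLM response.
model_output = '{"name": "add", "arguments": {"a": 2, "b": 3}}'
print(execute_tool_call(model_output))
```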

Implementations:

  • OpenAI Function Calling API
  • Anthropic Tool Use
  • LangChain Tools

III. The Essence of an Agent

LLM + Tools + Loop

Key papers:

  • Yao, S., et al. (2022). “ReAct: Synergizing Reasoning and Acting in Language Models.” ICLR. [paper link]
  • Shinn, N., et al. (2023). “Reflexion: Language Agents with Verbal Reinforcement Learning.” NeurIPS. [paper link]

Agent Loop

Thought → Action → Observation → Thought ...

ReAct example:

Question: What is the elevation range for the area that the eastern sector of the Colorado orogeny extends into?

Thought 1: I need to search Colorado orogeny, find the area that the eastern sector of the Colorado orogeny extends into, then find the elevation range of that area.
Action 1: Search[Colorado orogeny]
Observation 1: The Colorado orogeny was an episode of mountain building...
Thought 2: I need to search eastern sector of the Colorado orogeny.
Action 2: Search[eastern sector of Colorado orogeny]
...
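
The Thought → Action → Observation cycle can be sketched in a few lines. This is a toy under stated assumptions, not the ReAct authors' implementation: `scripted_llm` is a hypothetical stand-in for a real model call, `search` is a stub tool over a one-entry corpus, and the `Action: Name[argument]` text format is parsed with a simple regular expression.

```python
import re
from typing import Callable


def search(query: str) -> str:
    """Hypothetical tool backed by a tiny fixed corpus."""
    corpus = {"capital of France": "Paris is the capital of France."}
    return corpus.get(query, "No result.")


TOOLS: dict[str, Callable[[str], str]] = {"Search": search}
ACTION_RE = re.compile(r"Action: (\w+)\[(.*)\]")


def scripted_llm(transcript: str) -> str:
    """Stand-in for a model call: answers once an observation is in the transcript."""
    if "Observation:" in transcript:
        return "Thought: I have the answer.\nAction: Finish[Paris]"
    return "Thought: I should look this up.\nAction: Search[capital of France]"


def run_agent(question: str, llm: Callable[[str], str], max_steps: int = 5) -> str:
    """Loop: ask the model, execute its action, append the observation, repeat."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += step + "\n"
        match = ACTION_RE.search(step)
        if match is None:
            return "Agent produced no action."
        name, arg = match.groups()
        if name == "Finish":
            return arg
        observation = TOOLS[name](arg) if name in TOOLS else f"Unknown tool: {name}"
        transcript += f"Observation: {observation}\n"
    return "Stopped: step limit reached."


print(run_agent("What is the capital of France?", scripted_llm))
```

The `max_steps` cap matters: without it, a model that never emits `Finish` would loop indefinitely.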

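IV. Agent Runtime

The Agent Runtime is the core infrastructure that executes agent logic.

Core responsibilities:

  • Parse tool calls
  • Execute functions/APIs
  • Control the loop (iteration limits, timeouts)
  • Handle errors
  • Enforce permissions
  • Manage state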
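
Several of these responsibilities fit into one small class. The sketch below is a hypothetical design, not the API of any framework listed in this section: the `Runtime` class, its `allowed` set, and the `max_calls` cap are illustrative names, and timeout handling is deliberately left out to keep the example short.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Runtime:
    tools: dict[str, Callable[..., Any]]
    allowed: set[str]                                   # permission control
    max_calls: int = 3                                  # loop control: cap on calls per run
    history: list[dict] = field(default_factory=list)   # state management

    def handle(self, raw_call: str) -> dict:
        """Check the call limit, execute one tool call, and record the outcome."""
        if len(self.history) >= self.max_calls:
            outcome = {"ok": False, "error": "call limit reached"}
        else:
            outcome = self._execute(raw_call)
        self.history.append(outcome)
        return outcome

    def _execute(self, raw_call: str) -> dict:
        try:
            call = json.loads(raw_call)                 # parse the tool call
            name, args = call["name"], call.get("arguments", {})
        except (json.JSONDecodeError, KeyError, TypeError) as exc:
            return {"ok": False, "error": f"malformed call: {exc}"}
        if name not in self.allowed:
            return {"ok": False, "error": f"permission denied: {name}"}
        if name not in self.tools:
            return {"ok": False, "error": f"unknown tool: {name}"}
        try:
            return {"ok": True, "result": self.tools[name](**args)}
        except Exception as exc:  # tool failures become observations, not crashes
            return {"ok": False, "error": f"{type(exc).__name__}: {exc}"}


runtime = Runtime(
    tools={"divide": lambda a, b: a / b, "delete_all": lambda: "deleted"},
    allowed={"divide"},
)
print(runtime.handle('{"name": "divide", "arguments": {"a": 6, "b": 3}}'))
print(runtime.handle('{"name": "divide", "arguments": {"a": 1, "b": 0}}'))
print(runtime.handle('{"name": "delete_all"}'))
print(runtime.handle('{"name": "divide", "arguments": {"a": 1, "b": 1}}'))
```

Returning errors as data rather than raising lets the loop feed them back to the model as observations, so the model can correct its next call.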

Popular frameworks:

  • LangChain
  • AutoGPT
  • BabyAGI
  • CrewAI
  • Microsoft AutoGen

Reference projects:

  • Richards, T. (2023). “AutoGPT.” [GitHub]
  • Nakajima, Y. (2023). “BabyAGI.” [GitHub]

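V. Tool Retrieval

Use embeddings for semantic matching to select the Top-K tools from a large tool library, and pass only those to the LLM.

Key papers:

  • Qin, Y., et al. (2023). “Tool Learning with Foundation Models.” arXiv. [paper link] - survey of tool learning

Pipeline:

User query → Embedding → Tool library retrieval → Top-K Tools → LLM selects and calls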
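
The pipeline above can be sketched with a toy embedding. One loud assumption: `embed` below is a bag-of-words counter, used only so the example runs without a model; a real system would swap in a learned embedding model and usually a vector index. The tool names and descriptions are hypothetical.

```python
import math
from collections import Counter

# Hypothetical tool library: name -> natural-language description.
TOOL_DESCRIPTIONS = {
    "get_weather": "get the current weather forecast for a city",
    "search_flights": "search flights between two airports on a date",
    "convert_currency": "convert an amount of money between two currencies",
    "send_email": "send an email message to a recipient",
}


def embed(text: str) -> Counter:
    """Toy embedding: word counts. A stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve_tools(query: str, k: int = 2) -> list[str]:
    """Rank every tool description against the query and keep the Top-K names."""
    query_vec = embed(query)
    ranked = sorted(
        TOOL_DESCRIPTIONS,
        key=lambda name: cosine(query_vec, embed(TOOL_DESCRIPTIONS[name])),
        reverse=True,
    )
    return ranked[:k]


print(retrieve_tools("what is the weather forecast in Berlin"))
```

Only the returned tool descriptions would then be placed in the prompt, which keeps the context small when the library holds hundreds of tools.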

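VI. Skill

Skill = a capability composed of multiple Tools

A Skill packages the ability to carry out a complex task, much like a human "skill".

Examples:

  • “Write a blog post” Skill = search tool + writing tool + image generation tool
  • “Data analysis” Skill = data query tool + visualization tool + report generation tool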
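
As a rough illustration of the first example, a Skill can be written as a fixed pipeline over several tools. All three tools below are hypothetical stubs; real ones would call a search API, a language model, and an image model.

```python
# Hypothetical stub tools standing in for real search, writing, and image services.
def search_tool(topic: str) -> list[str]:
    return [f"fact about {topic} #1", f"fact about {topic} #2"]


def writing_tool(topic: str, facts: list[str]) -> str:
    return f"# {topic}\n" + "\n".join(f"- {fact}" for fact in facts)


def image_tool(topic: str) -> str:
    return f"{topic.replace(' ', '_')}.png"


def write_blog_skill(topic: str) -> dict[str, str]:
    """The Skill: a reusable composition of three tools behind one entry point."""
    facts = search_tool(topic)
    draft = writing_tool(topic, facts)
    cover = image_tool(topic)
    return {"draft": draft, "cover_image": cover}


post = write_blog_skill("AI agents")
print(post["draft"])
print(post["cover_image"])
```

Because the order of tool calls is fixed in code, the agent only has to decide to invoke the Skill, not to re-plan the three steps each time.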

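VII. Production-Grade Agent Architecture

User → Retriever → LLM → Runtime → Tool → Observation → LLM → Response

Production requirements:

  • Error handling and retries
  • Monitoring and logging
  • Permission control
  • Cost control
  • Security auditing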
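
Two of these requirements, retries with logging and cost control, can be sketched briefly. The names `call_with_retry`, `TokenBudget`, and `flaky_tool` are hypothetical, and the token counts are made-up numbers; a real deployment would read usage from the model provider's responses and ship logs to a monitoring system.

```python
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")
logging.basicConfig(level=logging.INFO)
log = logging.getLogger("agent")


def call_with_retry(func: Callable[[], T], attempts: int = 3, base_delay: float = 0.01) -> T:
    """Retry a flaky call with exponential backoff, logging every failure."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:
            log.warning("attempt %d/%d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")


class TokenBudget:
    """Cost control: refuse further work once a token limit would be exceeded."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def charge(self, tokens: int) -> None:
        if self.used + tokens > self.limit:
            raise RuntimeError("token budget exceeded")
        self.used += tokens


calls = {"n": 0}


def flaky_tool() -> str:
    """Fails twice, then succeeds, to exercise the retry path."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


budget = TokenBudget(limit=1000)
budget.charge(400)
print(call_with_retry(flaky_tool), budget.used)
```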

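VIII. Key Principles

  • The model sets an agent's ceiling; the Runtime sets its floor

    Model capability determines what the agent can do; Runtime quality determines how stable and reliable it is.

  • Retrieve first, then reason

    Before executing a task, retrieve the relevant information and tools.

  • Use Skills to encapsulate complex capabilities

    Package frequently used complex tasks as Skills to improve reuse and reliability.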


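IX. The Ultimate Model

| Component | Analogy        | Role                                  |
|-----------|----------------|---------------------------------------|
| LLM       | Brain          | Reasoning, planning, decision-making  |
| Tools     | -              | Performing concrete operations        |
| Skills    | Muscle memory  | Encapsulated complex capabilities     |
| Retriever | Sensory system | Acquiring information and tools       |
| Runtime   | Nervous system | Coordinating execution, relaying data |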

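X. Summary

Agent = LLM + Tools + Loop + Runtime + Retrieval (+ Skills)

AI agents are an advanced form of LLM application: simple conversational systems have evolved into agents that plan on their own and execute complex tasks. As the technology matures, agents are likely to take on a role in a growing range of scenarios.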


References

Prompt Engineering

  1. Brown, T. B., et al. (2020). “Language Models are Few-Shot Learners.” NeurIPS.
  2. Wei, J., et al. (2022). “Chain-of-Thought Prompting Elicits Reasoning in Large Language Models.” NeurIPS.
  3. Kojima, T., et al. (2022). “Large Language Models are Zero-Shot Reasoners.” NeurIPS.
  4. Zhou, Y., et al. (2022). “Large Language Models Are Human-Level Prompt Engineers.” ICLR.

RAG

  1. Lewis, P., et al. (2020). “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.” NeurIPS.
  2. Guu, K., et al. (2020). “REALM: Retrieval-Augmented Language Model Pre-Training.” ICML.
  3. Karpukhin, V., et al. (2020). “Dense Passage Retrieval for Open-Domain Question Answering.” EMNLP.
  4. Johnson, J., Douze, M., & Jégou, H. (2019). “Billion-scale similarity search with GPUs.” IEEE TBD.

AI Agent

  1. Yao, S., et al. (2022). “ReAct: Synergizing Reasoning and Acting in Language Models.” ICLR.
  2. Schick, T., et al. (2023). “Toolformer: Language Models Can Teach Themselves to Use Tools.” NeurIPS.
  3. Shinn, N., et al. (2023). “Reflexion: Language Agents with Verbal Reinforcement Learning.” NeurIPS.
  4. Wang, L., et al. (2024). “A Survey on Large Language Model based Autonomous Agents.” arXiv.