Prompt Engineering

https://www.coursera.org/learn/chatgpt-prompt-engineering-for-developers-project/home/welcome

Prompt Engineering is the practice of designing and optimizing input prompts to steer large language models toward the desired output. The field took off with the release of GPT-3 and is the key bridge between user intent and model capability.

Key papers:

  • Brown, T. B., et al. (2020). "Language Models are Few-Shot Learners." NeurIPS. [paper link] - the GPT-3 paper, the first systematic demonstration of the potential of prompt engineering
  • Wei, J., et al. (2022). "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models." NeurIPS. [paper link]

1. Prompting Principles

Principle 1: Write clear and specific instructions

Clarity and specificity are the core principles of prompt design. Vague instructions make the model's output less predictable, while specific instructions significantly improve output quality.

  • Tactic 1: Use delimiters to clearly indicate distinct parts of the input

    Delimiters (such as ```, """, ---, <>) help the model identify the distinct parts of the input and avoid confusion. This matters most for tasks whose input contains multiple passages of text or code.

  • Tactic 2: Ask for a structured output

    Asking for structured output (such as JSON, HTML, or a Markdown table) makes downstream programmatic processing easier and is a best practice when building AI applications.

  • Tactic 3: Ask the model to check whether conditions are satisfied

    Having the model verify that the required conditions hold before performing the task reduces erroneous output.

  • Tactic 4: “Few-shot” prompting - give examples

    Few-shot prompting guides the model toward the expected task format and output by providing a small number of worked examples.

    Key papers:

    • Brown, T. B., et al. (2020). "Language Models are Few-Shot Learners." NeurIPS. [paper link] - established few-shot prompting for large language models
    • Min, S., et al. (2022). "Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?" EMNLP. [paper link]
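The four tactics above can be sketched as plain prompt strings. The template wording and the sample `text` below are illustrative, not the course's exact prompts:

```python
# Sketch of the Principle 1 tactics as plain prompt strings.

text = "Alice ordered a lamp. It arrived broken, and support never replied."

# Tactic 1: delimiters mark off the text to be processed.
delimited = (
    "Summarize the text delimited by triple backticks in one sentence.\n"
    f"```{text}```"
)

# Tactic 2: ask for structured (JSON) output.
structured = (
    "Extract the product and the complaint from the review below. "
    'Respond as JSON with keys "product" and "complaint".\n'
    f"Review: ```{text}```"
)

# Tactic 3: have the model check a condition before acting.
conditional = (
    "If the text below contains a sequence of instructions, rewrite them "
    'as numbered steps. If not, simply write "No steps provided."\n'
    f"```{text}```"
)

# Tactic 4: few-shot - show the model an example of the desired style.
few_shot = (
    "Answer in a consistent style.\n"
    "<child>: Teach me about patience.\n"
    "<grandparent>: The river that carves the deepest valley flows from a modest spring.\n"
    "<child>: Teach me about resilience."
)
```

Each string would be sent to the model as-is; only the instruction changes, while the delimited input text stays the same.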

Principle 2: Give the model time to "think"

This principle comes from Chain-of-Thought research: the model solves complex problems by reasoning step by step instead of answering immediately.

Key papers:

  • Wei, J., et al. (2022). "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models." NeurIPS. [paper link] - the original CoT paper

  • Kojima, T., et al. (2022). "Large Language Models are Zero-Shot Reasoners." NeurIPS. [paper link] - origin of the "Let's think step by step" prompt

  • Wang, X., et al. (2022). "Self-Consistency Improves Chain of Thought Reasoning in Language Models." ICLR. [paper link]

  • Tactic 1: Specify the steps required to complete a task

    Explicitly list the steps of the task so the model executes them in order.

  • Tactic 2: Instruct the model to work out its own solution before rushing to a conclusion

    Ask the model to derive its own solution first and only then draw a conclusion, rather than guessing directly.
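Both tactics can be sketched as prompt templates. The step list, the story, and the arithmetic problem below are illustrative examples, not from the course:

```python
# Tactic 1: spell out the steps the model should perform, in order.
story = "In a quiet village, siblings Jack and Jill set out to fetch water from a hilltop well."

step_prompt = (
    "Perform the following actions on the text delimited by triple backticks:\n"
    "1 - Summarize the text in one sentence.\n"
    "2 - Translate the summary into French.\n"
    "3 - List each name in the French summary.\n"
    "4 - Output a JSON object with keys: french_summary, num_names.\n"
    f"```{story}```"
)

# Tactic 2: make the model work out its own answer before judging another.
grading_prompt = (
    "First work out your own solution to the problem below. Then compare "
    "your solution to the student's solution, and only after that decide "
    "whether the student's solution is correct.\n"
    "Problem: What is 17 + 25?\n"
    "Student's solution: 17 + 25 = 41"
)
```

The second template deliberately withholds the verdict request until the model has produced its own derivation, which is the point of the tactic.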

About Hallucinations

A hallucination is model output that sounds plausible but is factually wrong or fabricated.

Key papers:

  • Maynez, J., et al. (2020). “On Faithfulness and Factuality in Abstractive Summarization.” ACL. [论文链接]
  • Zhang, Y., et al. (2023). “Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models.” arXiv. [论文链接]

Mitigation strategies:

  • Ask the model to find relevant information first and answer the question based on that information
  • Ask the model to cite the sources of its claims
  • Use a RAG system to supply reliable context
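The first strategy can be sketched as a grounding prompt in which the model must quote relevant evidence before answering. The document and question below are made-up examples:

```python
# Grounding sketch: gather evidence first, then answer only from it.

document = "The Model X blender has a 900 W motor and a 1.5 L jar."
question = "What is the motor wattage of the Model X blender?"

grounded_prompt = (
    "First find the sentences in the document below that are relevant to "
    "the question, and quote them. Then answer the question using only "
    "those quotes. If the document does not contain the answer, say "
    '"I don\'t know."\n'
    f"Document: ```{document}```\n"
    f"Question: {question}"
)
```

Giving the model an explicit "I don't know" escape hatch is part of the mitigation: without it, the model is pushed toward inventing an answer.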

2. Iterative Prompt Development

Prompt engineering is an iterative process: prompts must be tested and refined repeatedly.

Key methodology:

  • Zhou, Y., et al. (2022). "Large Language Models Are Human-Level Prompt Engineers." ICLR. [paper link] - APE (Automatic Prompt Engineer)

Iteration loop:

  • Try something (write an initial prompt)
  • Analyze where the result does not give what you want
  • Clarify the instructions and give the model more time to think
  • Refine the prompt against a batch of examples
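The loop above can be sketched in code. `call_model` is a stub standing in for a real LLM API call, and the length check is one illustrative failure criterion:

```python
# Sketch of the iterate-and-refine loop with a stubbed model call.

def call_model(prompt: str) -> str:
    # Placeholder: a real implementation would call an LLM API here.
    return "stub summary under 20 words"

def passes(output: str, max_words: int) -> bool:
    # One concrete failure mode to check for: the output is too long.
    return len(output.split()) <= max_words

prompt = "Summarize the review below."
examples = ["Review A ...", "Review B ..."]

for round_ in range(3):
    results = [call_model(f"{prompt}\n```{ex}```") for ex in examples]
    if all(passes(r, max_words=20) for r in results):
        break
    # Otherwise, clarify the instruction based on the observed failure.
    prompt += " Use at most 20 words."
```

The key point is that refinement is driven by a batch of examples, not a single anecdotal test.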

3. Summarizing

Text summarization is one of the most common NLP tasks, and LLMs perform very well at it.

Key papers:

  • See, A., et al. (2017). “Get To The Point: Summarization with Pointer-Generator Networks.” ACL. [论文链接]
  • Liu, Y., & Lapata, M. (2019). “Text Summarization with Pretrained Encoders.” EMNLP. [论文链接]

Key techniques:

  • Summarize with a word/sentence/character limit
  • Summarize with a focus on certain topics, such as shipping and delivery
  • Try "extract" instead of "summarize" when you want the literal information on a specific topic
  • Summarize multiple product reviews (multi-document summarization)
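The techniques above differ only in the instruction attached to the same input. A minimal sketch with an invented review:

```python
# Three summarization variants over the same review text.

review = ("Got this panda plush for my daughter. It arrived a day early, "
          "but it's a bit small for the price.")

# Variant 1: hard length limit.
limit_prompt = (
    "Summarize the review below in at most 20 words.\n"
    f"Review: ```{review}```"
)

# Variant 2: focus the summary on one topic.
focus_prompt = (
    "Summarize the review below, focusing on any aspects that mention "
    "shipping and delivery. At most 30 words.\n"
    f"Review: ```{review}```"
)

# Variant 3: extract rather than summarize.
extract_prompt = (
    "From the review below, extract only the information relevant to "
    "shipping and delivery.\n"
    f"Review: ```{review}```"
)
```

For multi-document summarization, the same templates are applied to each review in a loop and the results collected.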

4. Inferring

LLMs are very good at extracting specific information from a piece of text, turning unstructured text into structured data.

Key paper:

  • Radford, A., et al. (2019). "Language Models are Unsupervised Multitask Learners." [paper link] - the GPT-2 paper, which demonstrated zero-shot capabilities

Applications:

  • Sentiment: positive or negative (sentiment analysis)

    Key paper: Socher, R., et al. (2013). "Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank." EMNLP.

  • Identify types of emotions: happy, grateful, angry, etc. (emotion classification)

    Key paper: Mohammad, S. M., & Turney, P. D. (2013). "Crowdsourcing a Word-Emotion Association Lexicon." Computational Intelligence.

  • Identify a specific emotion, such as 'Is the writer expressing anger?'

  • Extract product and company names from customer reviews (named entity recognition)

    Key paper: Nadeau, D., & Sekine, S. (2007). "A survey of named entity recognition and classification." Lingvisticae Investigationes.

  • Infer topics and ask the LLM to answer in JSON format (topic inference)

    Key paper: Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). "Latent Dirichlet Allocation." JMLR. - the original LDA topic-model paper

  • Build a news alert for certain topics: zero-shot learning

    Key papers:

    • Radford, A., et al. (2021). "Learning Transferable Visual Models From Natural Language Supervision." ICML. - the CLIP paper [paper link]
    • Xian, Y., et al. (2017). “Zero-Shot Learning - A Comprehensive Evaluation of the Good, the Bad and the Ugly.” TPAMI. [论文链接]
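Several of these inference tasks can be combined into one prompt that returns a single JSON object. The key names and the review below are illustrative:

```python
# One prompt performing sentiment, emotion, anger, and entity inference.

review = "I love my new lamp from Lumina; fast shipping and great support!"

infer_prompt = (
    "Identify the following items from the review text:\n"
    "- Sentiment (positive or negative)\n"
    "- A list of emotions the writer is expressing\n"
    "- Is the writer expressing anger? (true or false)\n"
    "- Item purchased by the reviewer\n"
    "- Company that made the item\n"
    'Format your response as a JSON object with keys "sentiment", '
    '"emotions", "anger", "item", "brand".\n'
    f"Review: ```{review}```"
)
```

Packing several inferences into one call is cheaper than one call per task, and the JSON keys make the result directly machine-readable.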

5. Transforming

LLMs excel at text transformation tasks, including language translation, style transfer, and format conversion.

Key papers:

  • Vaswani, A., et al. (2017). "Attention Is All You Need." NeurIPS. [paper link] - the original Transformer paper, the foundation of modern NLP

Transformation types:

  • Language translation

    Key paper: Wu, Y., et al. (2016). "Google's Neural Machine Translation System." arXiv. [paper link]

  • Tone: informal to formal

  • Format conversion: JSON to HTML

  • Spellcheck/grammar check, or rewriting in a given style such as APA

    Key paper: Yuan, Z., et al. (2021). "Synthesizing Coherent Story with Generative Pre-trained Transformer." IJCAI.
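Each transformation type maps to a one-line instruction over the input. A minimal sketch with invented inputs:

```python
# One illustrative prompt per transformation type.

data = {"employees": [{"name": "Shyam", "email": "shyam@example.com"}]}

translate_prompt = (
    "Translate the following English text to French: "
    "```Hi, I would like to order a blender.```"
)
tone_prompt = (
    "Translate the following from slang to a formal business letter: "
    "```Dude, check out the spec on this standing lamp.```"
)
format_prompt = (
    "Translate the following Python dict to an HTML table with column "
    f"headers:\n{data}"
)
proof_prompt = (
    "Proofread and correct the following text, following APA style: "
    "```Their going to the store tomorrow.```"
)
```

Note that "translate" is used loosely here: the same verb covers language, tone, and format changes, which is part of why LLMs handle all three uniformly.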

6. Expanding

Expansion turns a short input into a longer, more detailed output.

Key paper:

  • Fan, A., et al. (2018). "Hierarchical Neural Story Generation." ACL. [paper link]

Key techniques:

  • Customize the automated reply to a customer email
  • Remind the model to use details from the customer's email
  • Use temperature to control the output: more creative or more stable
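The first two points can be sketched as a reply-generation prompt. The sentiment label and the review are invented examples:

```python
# Customized customer-service reply prompt, conditioned on sentiment.

sentiment = "negative"
review = "The blender's motor died after a month, and the warranty had expired."

reply_prompt = (
    "You are a customer service AI assistant. Write an email reply to the "
    f"customer review below. The review sentiment is {sentiment}; if it is "
    "negative, apologize and suggest that the customer reach out to "
    "customer service. Make sure to use specific details from the review. "
    "Sign the email as 'AI customer agent'.\n"
    f"Review: ```{review}```"
)
```

Embedding the review verbatim and explicitly instructing the model to use its details is what keeps the generated reply specific rather than generic.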

The Temperature Parameter

Temperature is the key parameter controlling the randomness of LLM output; the concept originates in statistical mechanics and simulated annealing.

Key papers:

  • Ackley, D. H., Hinton, G. E., & Sejnowski, T. J. (1985). "A Learning Algorithm for Boltzmann Machines." Cognitive Science. - theoretical basis of simulated annealing
  • Ficler, J., & Goldberg, Y. (2017). "Controlling Linguistic Style Aspects in Neural Language Generation." EMNLP. [paper link]
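Mechanically, temperature divides the logits before the softmax: T < 1 sharpens the distribution toward the top token, T > 1 flattens it toward uniform. A minimal sketch in pure Python (the logit values are arbitrary):

```python
import math

def softmax_with_temperature(logits, temperature):
    # Scale logits by 1/T, then apply a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max to avoid overflow in exp()
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]
cold = softmax_with_temperature(logits, 0.2)  # near-greedy: mass piles on the top token
hot = softmax_with_temperature(logits, 2.0)   # flatter: closer to uniform sampling
```

At temperature 0 (in practice, greedy decoding) the model always picks the highest-logit token, which is why low temperatures give stable output and high temperatures give creative but less predictable output.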

7. Chatbot

Chatbots are the most direct application of LLMs.

Key papers:

  • Vinyals, O., & Le, Q. (2015). "A Neural Conversational Model." ICML Deep Learning Workshop. [paper link]
  • Zhang, Y., et al. (2020). "DIALOGPT: Large-Scale Generative Pre-training for Conversational Response Generation." ACL. [paper link]

Three Roles

Modern chat systems typically use a three-role message architecture:

  • System: defines the assistant's overall behavior and persona
  • User: the user's input
  • Assistant: the model's replies

The model gives the system role the highest priority, so the system message can be used to steer the chatbot toward specific tasks.

Example:

```python
messages = [
    {'role': 'system', 'content': 'You are an assistant that speaks like Shakespeare.'},
    {'role': 'user', 'content': 'tell me a joke'},
    {'role': 'assistant', 'content': 'Why did the chicken cross the road'},
    {'role': 'user', 'content': "I don't know"},
]
```

Context

Context: to generate its next response, the chatbot needs the full message history across all three roles. Managing this context, that is, making effective use of the conversation history within the model's limits, is a core technical challenge in chatbot systems.

Key paper:

  • Bae, S., et al. (2022). "Keep Me Updated! Memory Management in Long-term Conversations." EMNLP. [paper link]
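A minimal sketch of context accumulation: every turn is appended to the message list, so each new request carries the full history. `stub_model` stands in for a real chat API call:

```python
# Context accumulation for a multi-turn chat; the model call is stubbed.

def stub_model(messages):
    # Placeholder for a real API call; just echoes the last user turn.
    return f"(reply to: {messages[-1]['content']})"

context = [
    {'role': 'system', 'content': 'You are a friendly chatbot.'},
]

def ask(context, user_text):
    # Append the user turn, get a reply, append the assistant turn.
    context.append({'role': 'user', 'content': user_text})
    reply = stub_model(context)
    context.append({'role': 'assistant', 'content': reply})
    return reply

ask(context, "Hi, my name is Isa.")
ask(context, "Can you remind me what my name is?")
```

By the second call, `context` already contains the earlier exchange; that stored history, not any internal memory of the model, is what lets a real chatbot recall the name.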