Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- Large language models can use chain-of-thought prompting to find answers.
- Creating many prompts by hand is costly.
- Synthetic prompting uses a few handcrafted examples to generate more examples.
- Synthetic prompting alternates between backward and forward processes.
- Evaluations show Synthetic prompting outperforms existing prompting techniques.
Paper Content
Introduction
- Few-shot demonstrations can enable LLMs to perform tasks without fine-tuning
- Chain-of-thought prompting can further improve LLMs’ performance
- Quality of demonstrations is important for complex reasoning tasks
- SYNTHETIC PROMPTING augments a limited set of demonstrations with self-synthesized examples
- In-cluster complexity based scheme is proposed to select diverse and informative demonstrations
- SYNTHETIC PROMPTING achieves up to 15.6% absolute gains over state-of-the-art methods
Related work
- LLMs can learn to perform tasks by mimicking in-context demonstrations
- Instruction tuning trains a language model on diverse tasks to generate desirable outputs
- In-context learning-based applications include text generation, dialogue generation, and resource construction
- Chain-of-thought prompting prompts LLMs to arrive at an answer after a step-by-step reasoning process
- Metrics for selecting demonstrations include diversity, reasoning complexity, and similarity with a test input
- Knowledge distillation from LLMs into symbolic knowledge is possible
Synthetic prompting
Overview
- LLMs can be used to perform reasoning tasks with a few examples
- A backward-forward procedure is used to automatically synthesize more examples
- During inference, the LLM is prompted with self-synthesized demonstrations
- A new scheme is used to ensure diversity and informativeness of the demonstrations
Example synthesis phase
- Automatically synthesize more examples by repeating a backward-forward process
- Each synthetic example is a question and reasoning chain pairs
- Reasoning chains are snippets of code, answers are obtained by executing the code
- Backward process generates a reasoning chain and then a question
- Question is conditioned on a given topic word, target reasoning complexity, and self-generated reasoning chain
- Forward process generates a more precise reasoning chain for the question
- Question and reasoning chain constitute a synthetic example
- Topic words control the type of reasoning
- Target complexity controls the complexity of the synthesized questions
- Majority voting used to ensure answer is confident
Inference phase
- Selecting demonstrations based on complexity can improve model performance
- Selecting demonstrations based on similarity may introduce biases
- Selecting demonstrations that are complementary to each other may help the model fuse knowledge
- Proposed an in-cluster complexity based scheme to select demonstrations that are both complex and complementary
Experiments
- Prompting baselines use all provided gold examples to construct prompts for inference.
- Synthetic prompting and its variants synthesize examples using the provided examples.
- 8 synthetic demonstrations are selected based on in-cluster complexity.
- Seed examples and synthetic prompts are provided in the Supplementary Materials.
Baselines
- Direct prompting prompts LLMs to generate answers with input-answer pairs.
- Chain-of-thought prompting prompts LLMs to generate natural language reasoning steps followed by an answer.
- PAL prompting is a variant of chain-of-thought prompting that improves reasoning with structured code.
- Vanilla Synthetic Prompting synthesizes new questions by mimicking seed questions.
Implementation details
- Used PAL-style reasoning chains
- Used InstructGPT for synthesis and inference
- Used top-p sampling for synthesis and greedy decoding for inference
- Annotated seed examples with CoT-style and PAL-style reasoning chains
- Repeated backward-forward synthesis for 1,000 times
- Used all-mpnet-base-v2 encoder for clustering
Main results
- Increasing the number of seed examples from 2 to 8 does not significantly improve performance.
Quality analysis of synthetic examples
- Evaluated 25 random examples synthesized by SYNTHETIC PROMPTING and vanilla SYNTHETIC PROMPTING
- SYNTHETIC PROMPTING synthesizes questions of higher complexities and lower error rate than vanilla SYNTHETIC PROMPTING
Conclusion
- Introducing SYNTHETIC PROMPTING, a technique for reasoning with large language models
- SYNTHETIC PROMPTING uses models as generators of additional examples
- SYNTHETIC PROMPTING improves reasoning performance on numerical, symbolic, and algorithmic tasks
- SYNTHETIC PROMPTING uses gold examples as seeds to automatically synthesize more examples