Link to paper
The full paper is available here.
You can also find the paper on PapersWithCode here.
Abstract
- LLMs enable zero-shot task generalization
- Instruction learning has been approached as a fine-tuning problem
- In-Context Instruction Learning (ICIL) improves zero-shot task generalization
- ICIL uses a single fixed prompt to evaluate all tasks

Paper Content

Introduction
- LLMs can adapt to target tasks during inference
- LLMs have emergent capabilities, including the ability to generalize to unseen tasks by following instructions
- Instruction learning methods have been proposed to improve this ability
- In-Context Instruction Learning (ICIL) involves learning to follow instructions during inference
- ICIL uses a prompt that consists of multiple cross-task demonstrations
- ICIL is a zero-shot learning method
- ICIL significantly enhances the zero-shot task generalization performance of various pretrained LLMs
- ICIL improves the zero-shot instruction-following ability of LLMs
- LLMs learn the correspondence between the answer choice included in the instruction and the output of each demonstration during inference

In-context instruction learning
- ICIL consists of cross-task demonstrations
- Demonstrations are a concatenation of instruction, input, and output instance
- Fixed demonstration set is constructed to evaluate various tasks in a zero-shot manner
- Advantages of applying ICIL during inference of LLMs mentioned

Demonstration set construction
- Filter tasks using heuristics
- Sample K tasks from N tasks
- Heuristics include task type, answer choice overlap, demonstration length, and demonstration ordering

In-context instruction learning during inference
- ICIL uses a single fixed prompt to adapt to different tasks
- ICIL improves zero-shot task generalization performance for various LLMs
- ICIL also assists LLMs for zero-shot generalization after instruction tuning or RLHF
- Model-generated demonstration set is effective for ICIL

Experiments

Experimental setup
- Constructed demonstrations for ICIL from English training tasks of SUPER-NATURALINSTRUCTIONS (SUPERNI) benchmark
- Used held-out tasks from SUPERNI for testing, consisting of 119 tasks across 12 different categories
- Selected SUPERNI as evaluation benchmark because it offers diverse set of tasks with varying levels of complexity
- Evaluated 4 LLMs with various model sizes, including GPT-3, OPT, GPT-NeoX, and GPT-J

Results
- Pretrained LLMs benefit from ICIL
- ICIL increases performance of pretrained LLMs by over 50%
- ICIL outperforms LLMs with much larger parameters
- ICIL gain is comparable to instruction tuning
- ICIL improves performance of LLMs fine-tuned through instruction tuning or RLHF
- Irrelevant ICIL does not harm performance much

Analysis
- ICIL significantly improves the zero-shot task generalization performance of both pretrained and instruction-fine-tuned LLMs
- Constructing the demonstration set with classification tasks is important for ICIL
- LLMs learn the correspondence between answer choice in the instruction and the label of the demonstrations during ICIL
- ICIL reinforces the correspondence between the instruction and the label of the demonstrations during inference
- ICIL does not require any backpropagation and uses the pretrained model checkpoint without any gradient update
- Increasing the number of demonstrations improves the performance
- Ordering the demonstrations by the number of answer choices reduces the variance
- Answer choice overlap between demonstrations harms the performance
- ICIL is effective for machine-generated demonstrations
- Performance of ICIL is comparable to adaptive in-context learning methods
- There is still a large gap between ICIL and few-shot in-context learning
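
To make the demonstration set construction and the single fixed prompt more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation. The `Demo` container, the function names, the toy `POOL`, the field labels (`Instruction:`, `Input:`, `Output:`), and the choice to sort in ascending order are all hypothetical. Only the general ideas come from the summary above: sample K classification demonstrations from different tasks, avoid answer choice overlap, order by number of answer choices, and concatenate instruction, input, and output.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Demo:
    """One cross-task demonstration (hypothetical container, not from the paper)."""
    task: str
    instruction: str
    input_text: str
    output: str
    answer_choices: tuple  # label strings that appear in the instruction


def build_demo_set(pool, k, seed=0):
    """Greedily pick up to k demos: one per task, no shared answer choices."""
    rng = random.Random(seed)
    candidates = list(pool)
    rng.shuffle(candidates)  # stands in for "sample K tasks from N tasks"
    chosen, seen_tasks, seen_choices = [], set(), set()
    for demo in candidates:
        if len(chosen) == k:
            break
        if demo.task in seen_tasks or seen_choices & set(demo.answer_choices):
            continue  # skip repeated tasks and overlapping answer choices
        chosen.append(demo)
        seen_tasks.add(demo.task)
        seen_choices.update(demo.answer_choices)
    # Assumption: ascending order; the summary only says demos are ordered
    # by the number of answer choices.
    chosen.sort(key=lambda d: len(d.answer_choices))
    return chosen


def format_block(instruction, input_text, output=None):
    """Concatenate instruction, input, and (optionally) output."""
    last = "Output:" if output is None else f"Output: {output}"
    return "\n".join([f"Instruction: {instruction}", f"Input: {input_text}", last])


def build_icil_prompt(demos, instruction, input_text):
    """Prepend the same fixed demonstrations to any target task instance."""
    blocks = [format_block(d.instruction, d.input_text, d.output) for d in demos]
    blocks.append(format_block(instruction, input_text))
    return "\n\n".join(blocks)


# Toy pool of classification demonstrations (invented for illustration).
POOL = [
    Demo("sentiment", "Classify the review as positive or negative.",
         "The plot was dull.", "negative", ("positive", "negative")),
    Demo("sentiment", "Classify the review as positive or negative.",
         "A wonderful film.", "positive", ("positive", "negative")),
    Demo("topic", "Label the headline as sports, business, or science.",
         "Stocks rally after earnings.", "business",
         ("sports", "business", "science")),
    Demo("polarity", "Is the tweet positive or neutral?",
         "Meeting moved to 3pm.", "neutral", ("positive", "neutral")),
    Demo("entailment", "Answer yes or no: does the premise entail the hypothesis?",
         "Premise: A dog runs. Hypothesis: An animal moves.", "yes", ("yes", "no")),
]

FIXED_DEMOS = build_demo_set(POOL, k=3)
prompt = build_icil_prompt(
    FIXED_DEMOS,
    "Decide whether the sentence is a question or a statement.",
    "Where is the station?",
)
print(prompt)
```

Because `FIXED_DEMOS` is built once and reused, every target task sees the same prompt prefix; only the final instruction and input change. The resulting string would be passed to a pretrained LLM as-is, with no gradient update.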