arxiv-summary: AI-summarized AI papers

KNIFE: Knowledge Distillation with Free-Text Rationales

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
Free-text rationales (FTRs) are natural language explanations of reasoning processes. Recent works have studied how to use FTRs to improve language model (LM) generalization. KNIFE distills FTR knowledge from an FTR-augmented teacher LM to a student LM. KNIFE outperforms existing FTR learning methods on two question answering datasets.

Paper Content
Introduction
Pretrained language models (LMs) are typically finetuned on downstream tasks using only task labels as supervision....
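The excerpt only says that KNIFE distills FTR knowledge from a teacher to a student. One natural way to realize such a transfer is to align the student's hidden states with the teacher's. The Python sketch below shows an alignment loss of that kind on plain lists. The function names, the mean-squared-error choice, the `weight` parameter, and the idea of adding the term to the task loss are assumptions for illustration, not the paper's exact objective.

```python
from typing import List

Vector = List[float]


def mse(a: Vector, b: Vector) -> float:
    """Mean squared error between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def hidden_alignment_loss(student: List[Vector], teacher: List[Vector]) -> float:
    """Average per-token MSE between student and teacher hidden states.

    Hypothetical sketch: assumes both models emit one hidden vector per token
    of the task input, so the two sequences have the same length. The teacher
    states are treated as fixed targets (no gradient flows to the teacher).
    """
    if len(student) != len(teacher):
        raise ValueError("student and teacher must cover the same tokens")
    return sum(mse(s, t) for s, t in zip(student, teacher)) / len(student)


def student_objective(task_loss: float, student: List[Vector],
                      teacher: List[Vector], weight: float = 1.0) -> float:
    """Task loss plus a weighted hidden-state distillation term (assumed form)."""
    return task_loss + weight * hidden_alignment_loss(student, teacher)


teacher = [[0.0, 1.0], [2.0, 2.0]]
student = [[0.0, 0.0], [2.0, 4.0]]
print(hidden_alignment_loss(student, teacher))  # 1.25
```

In this setup the student only ever sees the task input, so no rationale is needed at inference time; the FTR-augmented teacher serves purely as a source of training targets.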

The case for 4-bit precision: k-bit Inference Scaling Laws

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
Quantization methods reduce the number of bits required to represent each parameter in a model, so the final model size depends on both the number of parameters and the rate of compression. This work studies the trade-off between accuracy and total model size in bits. 35,000 zero-shot experiments were run with 16-bit inputs and k-bit parameters. 4-bit precision is almost universally optimal for the trade-off between total model bits and zero-shot accuracy....
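To make "total model bits" concrete: a model's size in bits is roughly its parameter count times the bits per parameter, plus any quantization metadata. The Python sketch below uses blockwise absmax integer quantization, which is only one of many k-bit schemes. The per-block 16-bit scale overhead, the function names, and the block sizes are assumptions for illustration, not the paper's experimental setup.

```python
import math
from typing import List, Optional, Tuple


def quantize_absmax(values: List[float], k: int) -> Tuple[List[int], float]:
    """Map floats to signed k-bit integers in the symmetric range [-L, L]."""
    levels = 2 ** (k - 1) - 1                    # L = 7 for k = 4
    scale = max(abs(v) for v in values) or 1.0   # avoid dividing by zero
    return [round(v / scale * levels) for v in values], scale


def dequantize(q: List[int], scale: float, k: int) -> List[float]:
    levels = 2 ** (k - 1) - 1
    return [x * scale / levels for x in q]


def quantize_blockwise(values: List[float], k: int,
                       block_size: int) -> List[Tuple[List[int], float]]:
    """Quantize each block independently so an outlier only affects its block."""
    return [quantize_absmax(values[i:i + block_size], k)
            for i in range(0, len(values), block_size)]


def total_model_bits(n_params: int, k: int, block_size: Optional[int] = None,
                     scale_bits: int = 16) -> int:
    """Parameter bits plus one stored scale per block (assumed overhead)."""
    bits = n_params * k
    if block_size is not None:
        bits += math.ceil(n_params / block_size) * scale_bits
    return bits


q, s = quantize_absmax([0.3, -1.0, 0.6], k=4)
print(q, s)
print(total_model_bits(1_000_000_000, 16), total_model_bits(4_000_000_000, 4))
```

Under this accounting, a 4-bit model can hold four times as many parameters as a 16-bit model in the same number of bits (ignoring block-scale overhead), which is the kind of trade-off the paper measures against zero-shot accuracy.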

Continual Learning for Instruction Following from Realtime Feedback

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
The paper studies the problem of training an instruction-following agent through user feedback. Human users instruct the agent using natural language and provide binary feedback. Learning is cast as a contextual bandit problem. The approach yields a 15.4% absolute improvement in instruction execution accuracy over time and is robust to design variations. The feedback signal is roughly equivalent to the learning signal of supervised demonstration data.

Paper Content
Introduction
Human-agent interactions expose language learning signals, for example explicit feedback from users. Learning from this signal reduces data costs and enables continual improvement, but the signal differs from gold-standard annotated data. This paper studies learning from user feedback in human-agent interactions. In the setup, two participants collaborate towards a common goal in a shared world. The main challenge is the complexity of the learning signal; the approach treats learning as a contextual bandit scenario, and experiments show dramatic improvements in agent behavior.

Technical overview
Two participants collaborate to collect sets of matching cards in a 3D environment. The leader plans and describes the follower's part of the plan using natural language instructions, and the follower's role is to follow those instructions. The leader can provide binary feedback signals to the follower. The agent's task is to map natural language instructions and observations to follower actions, generating a sequence of observations and actions that ends with STOP. Agent parameters are optimized through rounds of continual learning. The main metric is instruction execution accuracy, evaluated through human judgments and user feedback.

Continual learning
Policy parameters are estimated from user feedback in a process that progresses in rounds. Each round includes deploying the agent policy, computing rewards from user feedback, and optimizing the policy parameters. The process is initialized with a policy parameterized by θ_1, estimated on human demonstration data.

Deployment interactions
Users collaborate with the agent and give it tasks by typing natural language instructions....
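The round-based loop described above (deploy the policy, turn binary user feedback into rewards, update the parameters) can be illustrated with a tabular contextual bandit. The Python sketch below keeps a softmax policy per context and applies a REINFORCE-style update with rewards of +1 or -1. The simulated user, the string contexts, the learning rate, and the plain policy-gradient estimator are assumptions for illustration, not the paper's agent or its exact learning objective.

```python
import math
import random
from typing import Dict, Hashable, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class BanditPolicy:
    """Tabular softmax policy with one logit vector per context.

    Starts uniform here; the paper instead initializes from a policy
    estimated on human demonstration data.
    """

    def __init__(self, n_actions: int, lr: float = 0.5):
        self.n_actions = n_actions
        self.lr = lr
        self.logits: Dict[Hashable, List[float]] = {}

    def probs(self, context: Hashable) -> List[float]:
        return softmax(self.logits.setdefault(context, [0.0] * self.n_actions))

    def act(self, context: Hashable, rng: random.Random) -> int:
        return rng.choices(range(self.n_actions), weights=self.probs(context))[0]

    def update(self, context: Hashable, action: int, reward: float) -> None:
        # One-step REINFORCE: d log pi(a) / d z_i = [i == a] - pi_i
        p = self.probs(context)
        z = self.logits[context]
        for i in range(self.n_actions):
            z[i] += self.lr * reward * ((1.0 if i == action else 0.0) - p[i])


def run_rounds(policy: BanditPolicy, contexts: Sequence[Hashable],
               correct: Dict[Hashable, int], n_rounds: int,
               per_round: int, rng: random.Random) -> None:
    """Deploy, collect simulated binary feedback, then update, round by round."""
    for _ in range(n_rounds):
        batch = []
        for _ in range(per_round):  # deployment with the current policy
            c = rng.choice(contexts)
            a = policy.act(c, rng)
            reward = 1.0 if a == correct[c] else -1.0  # simulated user feedback
            batch.append((c, a, reward))
        for c, a, reward in batch:  # optimization on this round's feedback
            policy.update(c, a, reward)


rng = random.Random(0)
policy = BanditPolicy(n_actions=3)
correct = {"turn left": 0, "go to the red card": 2}
run_rounds(policy, list(correct), correct, n_rounds=30, per_round=20, rng=rng)
print(policy.probs("turn left"))
```

Updates within a round use the current policy rather than the one that was deployed, with no importance weighting; a learner that must stay faithful to the logged behavior would need to correct for that.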

Unnatural Instructions: Tuning Language Models with (Almost) No Human Labor

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
Instruction tuning enables language models to perform new tasks from natural language descriptions. Unnatural Instructions is a large dataset of creative and diverse instructions, collected with virtually no human labor. It contains approximately 240,000 examples of instructions, inputs, and outputs. Training on Unnatural Instructions rivals the effectiveness of training on open-source manually-curated datasets.

Paper Content
Introduction
Instruction tuning enables pretrained language models to generalize to unseen tasks in a zero-shot setting. Instruction data can be collected by reformulating existing NLP datasets into an explicit instruction-input-output format, or by collecting user-generated prompts. In contrast, the process used here requires only 15 seed instruction examples to produce 64,000 diverse triplets of instructions, inputs, and outputs, which are then expanded by generating paraphrases of each instruction. More than 50% of the generated examples are correct, and even incorrect examples contain valuable information. Unnatural Instructions contains highly creative tasks and a more diverse set of instructions than Super-Natural Instructions. Experiments show that models fine-tuned on Unnatural Instructions outperform models such as T0++ and Tk-Instruct on several benchmarks. There is a log-linear relationship between the number of generated examples and downstream task performance. Language models can produce creative and diverse data faster and more cheaply than human labor.

Data collection
Unnatural Instructions is a dataset of 240,670 natural language instructions....
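The seed-based collection described above works by prompting a language model with a few seed examples and letting it continue with a new one. The Python sketch below builds such a prompt and filters duplicate generations. The template wording, the choice of three examples per prompt, the dictionary fields, and the exact-match deduplication are assumptions for illustration, and the language-model call itself is left out.

```python
import random
from typing import Dict, List, Set

Seed = Dict[str, str]  # hypothetical fields: "instruction", "input"


def build_generation_prompt(seeds: List[Seed], rng: random.Random, k: int = 3) -> str:
    """Show k randomly chosen seed examples and leave slot k + 1 open for the LM."""
    parts = []
    for i, ex in enumerate(rng.sample(seeds, k), start=1):
        parts.append(f"Example {i}\nInstruction: {ex['instruction']}\nInput: {ex['input']}\n")
    parts.append(f"Example {k + 1}\n")
    return "\n".join(parts)


def add_if_new(pool: List[Seed], candidate: Seed, seen: Set[str]) -> bool:
    """Keep a generated example only if its normalized instruction is unseen."""
    key = candidate["instruction"].strip().lower()
    if key in seen:
        return False
    seen.add(key)
    pool.append(candidate)
    return True


seeds = [{"instruction": f"Seed task {i}", "input": f"input {i}"} for i in range(15)]
print(build_generation_prompt(seeds, random.Random(0)))
```

Sampling a different subset of seeds for each prompt is what lets a small fixed seed set yield many varied generations.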

A Natural Bias for Language Generation Models

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
After only a few hundred training updates, a standard probabilistic model for language generation has not yet learned enough about natural language to estimate the probability distribution over next tokens well. Yet by this point such models have identified a simple, loss-minimising behaviour: outputting the unigram distribution of the target training corpus. Instead of having the model learn this behaviour, a separate module can reflect unigram frequency statistics as prior knowledge. Initialising the bias term in a model's final linear layer with the log-unigram distribution is a simple way to do this, and it improves learning efficiency and overall performance....
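The initialisation in the abstract can be written down directly: set the final layer's bias to the log of the training corpus' unigram distribution, so that whenever the W h term contributes little, the softmax output is already the unigram distribution. The Python sketch below uses add-alpha smoothing so unseen tokens do not get log(0). The smoothing, the function names, and the toy corpus are assumptions for illustration.

```python
import math
from collections import Counter
from typing import List, Sequence


def log_unigram_bias(corpus: Sequence[str], vocab: Sequence[str],
                     alpha: float = 1.0) -> List[float]:
    """Log of add-alpha smoothed unigram frequencies, one entry per vocab item."""
    counts = Counter(corpus)
    total = sum(counts[t] for t in vocab) + alpha * len(vocab)
    return [math.log((counts[t] + alpha) / total) for t in vocab]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]


def output_logits(weights: List[List[float]], hidden: List[float],
                  bias: List[float]) -> List[float]:
    """Final linear layer: logits = W h + b, with one row of W per vocab item."""
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(weights, bias)]


vocab = ["the", "cat", "sat"]
bias = log_unigram_bias("the the the cat".split(), vocab)
# With zero weights the output is exactly the smoothed unigram distribution.
print(softmax(output_logits([[0.0, 0.0]] * 3, [0.5, -1.0], bias)))
```

Because the bias already encodes token frequencies, the weights are free to model context-dependent deviations from the unigram prior rather than spending early updates learning it.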

Multilingual Sequence-to-Sequence Models for Hebrew NLP

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
Recent work attributes much of the progress in NLP to large language models (LMs) with increased model size and large quantities of pretraining data. Current state-of-the-art LMs for Hebrew are both under-parameterized and under-trained compared to LMs in other languages. Multilingual sequence-to-sequence models present a promising building block for NLP in morphologically rich languages.

Paper Content
Introduction
Large pretrained language models have been used in a variety of NLP tasks, domains, and languages....

MatCha: Enhancing Visual Language Pretraining with Math Reasoning and Chart Derendering

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
Visual language data, such as plots, charts, and infographics, is ubiquitous in the human world. However, state-of-the-art vision-language models do not perform well on these data. MatCha is proposed to enhance visual language models' capabilities in jointly modeling charts or plots and language data. On standard benchmarks such as PlotQA and ChartQA, MatCha pretraining outperforms state-of-the-art methods by nearly 20%. MatCha pretraining is also useful for broader visual language tasks.

Paper Content
Introduction
Visual language is a system that uses text and visuals to convey meaning. It is found in textbooks, scientific papers, web pages, etc....

Visconde: Multi-document QA with GPT-3 and Neural Reranking

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
The paper proposes a question-answering system that can answer questions whose evidence is spread over multiple documents. The system uses a three-step pipeline: decompose, retrieve, and aggregate. It is evaluated on three datasets: IIRC, Qasper, and StrategyQA. Results suggest that current retrievers are the main bottleneck, and that the model is more effective when it gives explanations before answering a question.

Paper Content
Introduction
QA tasks that use short contexts have seen progress in multiple domains, but the information needed to answer a question is often spread over multiple documents or over long ones. QA models are typically based on a pipeline with a retriever and a reader component. LLMs can reduce the cost of solving QA tasks because they allow QA systems to be implemented for different domains without a domain-specific annotated dataset. Adding a chain-of-thought reasoning step before answering significantly improves LLMs' zero- or few-shot effectiveness. Visconde is a QA system that combines a retriever with a few-shot approach that induces an LLM to generate the answer. Visconde rivals state-of-the-art supervised models on three datasets. Future work should focus on improving retrievers.

Related work
Most approaches for multi-document QA use a retriever and a reader component: the retriever selects relevant documents for a given question, and the reader infers the final answer. Retrievers include dense retrievers, commercial search engines, etc....
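The decompose, retrieve, and aggregate steps can be sketched with the language model abstracted as a function from prompt to text. In this Python sketch the prompts, the word-overlap retriever (a stand-in for the real retriever and neural reranker), the "Answer:" convention, and the fake model in the usage example are all hypothetical; it only illustrates how the three steps fit together, including asking for an explanation before the answer.

```python
import re
from typing import Callable, List, Set

LLM = Callable[[str], str]  # hypothetical: prompt in, completion out


def tokens(text: str) -> Set[str]:
    return set(re.findall(r"\w+", text.lower()))


def decompose(question: str, llm: LLM) -> List[str]:
    """Step 1: ask the LM for simpler sub-questions, one per line."""
    prompt = ("Decompose the question into simpler sub-questions, one per line.\n"
              f"Question: {question}\nSub-questions:")
    subs = [line.strip() for line in llm(prompt).splitlines() if line.strip()]
    return subs or [question]  # fall back to the original question


def retrieve(query: str, docs: List[str], k: int = 2) -> List[str]:
    """Step 2: toy retriever ranking documents by word overlap with the query."""
    q = tokens(query)
    return sorted(docs, key=lambda d: len(q & tokens(d)), reverse=True)[:k]


def aggregate(question: str, evidence: List[str], llm: LLM) -> str:
    """Step 3: request reasoning first, then keep the text after 'Answer:'."""
    context = "\n".join(f"- {e}" for e in evidence)
    prompt = (f"Context:\n{context}\nQuestion: {question}\n"
              "Explain your reasoning, then write 'Answer:' followed by the answer.")
    return llm(prompt).split("Answer:")[-1].strip()


def answer(question: str, docs: List[str], llm: LLM, k: int = 2) -> str:
    evidence: List[str] = []
    for sub in decompose(question, llm):
        for doc in retrieve(sub, docs, k):
            if doc not in evidence:  # keep each passage once, in first-seen order
                evidence.append(doc)
    return aggregate(question, evidence, llm)


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real LLM call."""
    if prompt.startswith("Decompose"):
        return "Where was Ada born?\nWhen was Ada born?"
    return "The context says Ada was born in London.\nAnswer: London"


docs = ["Ada Lovelace was born in London.", "Ada was born in 1815.", "Paris is in France."]
print(answer("Where and when was Ada born?", docs, fake_llm))
```

Keeping the model behind a plain callable makes it easy to swap the toy retriever for a stronger one, which matters given the finding that retrieval is the main bottleneck.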

NusaCrowd: Open Source Initiative for Indonesian NLP Resources

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
NusaCrowd is a collaborative initiative to collect and unify existing resources for Indonesian languages. It has brought together 137 datasets and 117 standardized data loaders. The quality of the datasets has been assessed manually and automatically. NusaCrowd enables the creation of the first zero-shot benchmarks for natural language understanding and generation in Indonesian and its local languages....

Optimal Transport for Unsupervised Hallucination Detection in Neural Machine Translation

Link to paper
The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract
Neural machine translation (NMT) has become the standard technology for machine translation applications. However, NMT models can produce severely pathological translations, known as hallucinations, so it is important to implement strategies that guarantee these systems function properly. This paper addresses the problem of hallucination detection in NMT, building on the intuition that hallucinations exhibit encoder-decoder attention patterns that differ from those of good quality translations....
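The attention-based intuition in the abstract can be made concrete by measuring, with optimal transport, how far a translation's encoder-decoder attention is from a reference distribution. The Python sketch below averages cross-attention over target tokens and computes the one-dimensional Wasserstein-1 distance to a uniform distribution over source positions. The averaging, the positional cost |i - j|, and the uniform reference are assumptions inspired by the paper's framing, not its exact detector, and any threshold for flagging a hallucination would have to be chosen on data.

```python
from typing import List, Sequence


def wasserstein_1d(p: Sequence[float], q: Sequence[float]) -> float:
    """Wasserstein-1 distance between two distributions on positions 0..n-1.

    With cost |i - j| on a line, W1 equals the sum of absolute differences
    between the two cumulative distribution functions.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same support")
    cdf_p = cdf_q = dist = 0.0
    for pi, qi in zip(p[:-1], q[:-1]):
        cdf_p += pi
        cdf_q += qi
        dist += abs(cdf_p - cdf_q)
    return dist


def wass_to_uniform(attention: List[List[float]]) -> float:
    """Distance between target-averaged cross-attention and a uniform distribution.

    attention[t][s] is the attention weight of target token t on source token s.
    """
    n_src = len(attention[0])
    avg = [sum(row[s] for row in attention) / len(attention) for s in range(n_src)]
    total = sum(avg)
    p = [a / total for a in avg]  # renormalize in case rows do not sum to 1
    return wasserstein_1d(p, [1.0 / n_src] * n_src)


spread = [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]]
collapsed = [[0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 1.0]]
print(wass_to_uniform(spread), wass_to_uniform(collapsed))
```

A translation whose attention collapses onto one source position gets a larger score than one that spreads its attention across the source sentence; the detector then ranks or thresholds translations by this score without needing any labeled hallucinations.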