arxiv-summary: AI-summarized AI papers

Does unsupervised grammar induction need pixels?

Abstract: Past work has shown that multimodal cues improve unsupervised grammar induction. LC-PCFG outperforms previous multimodal methods on unsupervised constituency parsing. It also reduces parameter count by 50% and speeds up training. Extralinguistic signals may not be needed for unsupervised grammar induction.

Paper Content
Introduction: Recent work has shown that unsupervised grammar induction can be improved by pairing text data with extralinguistic inputs such as images, videos, audio or facial semantics. There is limited evidence for the hypothesized import of this grounding signal from other modalities into text. Large language models (LLMs) have revolutionized natural language processing. LLMs often have a surprisingly detailed understanding of object-oriented concepts and the physical mechanics of the world. This work examines whether LLMs obviate the need for extralinguistic data in unsupervised constituency parsing. LC-PCFG, an LLM-based text-only model, outperforms state-of-the-art multimodal systems on both image and video benchmarks. Adding visual signals to LC-PCFG does not further improve performance, suggesting that the benefits of multimodal signals may be redundant with the benefits of using embeddings learned by LLMs.

Unsupervised parsing: Unsupervised parsing is the task of inducing syntactic structure from text....

Debiasing NLP Models Without Demographic Information

Abstract: Models trained on real-world data can replicate and amplify social biases. Existing methods for reducing bias require knowledge of the bias types and of the social groups associated with the data. This paper proposes a debiasing method that does not need prior knowledge of the demographics in the dataset....

Character-Aware Models Improve Visual Text Rendering

Abstract: Popular text-to-image models lack character-level input features. A series of experiments compares character-aware and character-blind text encoders. Character-aware models provide large gains on a novel spelling task. Character-aware variants outperform their character-blind counterparts across a range of novel text rendering tasks. The models set a much higher state of the art on visual spelling, with 30+ point accuracy gains over competitors....

Parsel: A Unified Natural Language Framework for Algorithmic Reasoning

Abstract: LLMs struggle with hierarchical multi-step reasoning such as generating complex programs. Parsel is a framework that enables automatic implementation and validation of complex algorithms with code LLMs. Parsel can be used across domains requiring hierarchical reasoning, e.g., code synthesis, theorem proving, and robotic planning. Parsel allows problem-solving with high-level algorithmic designs, benefiting both students and professional programmers....

Self-Instruct: Aligning Language Model with Self Generated Instructions

Abstract: Large “instruction-tuned” language models can generalize zero-shot to new tasks. Human-written instruction data is limited in quantity, diversity, and creativity. Self-Instruct is a framework for improving the instruction-following capabilities of pretrained language models using self-generated instructions. Applying Self-Instruct to GPT3 results in a 33% absolute improvement over the original model. Human evaluation shows that Self-Instruct outperforms existing public instruction datasets by a large margin....

Why Can GPT Learn In-Context? Language Models Secretly Perform Gradient Descent as Meta-Optimizers

Abstract: Large pretrained language models have a surprising in-context learning (ICL) ability. ICL works by producing meta-gradients from demonstration examples. ICL behaves similarly to explicit finetuning at the prediction, representation, and attention behavior levels. A momentum-based attention is designed based on this meta-optimization view.

Paper Content
Introduction: Large pretrained language models have a strong emergent in-context learning (ICL) ability. ICL needs demonstration examples prepended to the original input. GPT models can outperform smaller models that use supervised finetuning. ICL is explained as a process of meta-optimization. ICL is similar to explicit finetuning at multiple levels. Momentum-based attention achieves consistent performance improvements.

Background: A GPT model is used for ICL on classification tasks. The GPT model is a stack of L identical Transformer blocks. The final answer is the candidate with the highest probability in the candidate answer set. A predefined template is used to format the demonstrations, which precede the query input. The contextual model input is organized and fed into the GPT model. The probability of an answer is computed using the output hidden state, the word embedding, and the logit.

Dual form between gradient-descent-based optimization and attention: The idea in the paper is inspired by Aizerman et al....
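The dual form named in the Background can be illustrated for a single linear layer. The sketch below is not the paper's code; it assumes the simplest setting (unnormalized linear attention without softmax, plain Python lists), and every name in it is hypothetical. It checks numerically that applying a weight matrix updated by a sum of outer products, W0 + sum_i e_i x_i^T, to a query gives the same result as W0 applied to the query plus linear attention with the error signals e_i as values and the training inputs x_i as keys.

```python
import random

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(W, x):
    return [dot(row, x) for row in W]

def outer(e, x):
    return [[e_i * x_j for x_j in x] for e_i in e]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def linear_attention(values, keys, query):
    # Unnormalized attention (no softmax): sum_i values[i] * <keys[i], query>
    out = [0.0] * len(values[0])
    for v, k in zip(values, keys):
        score = dot(k, query)
        for j, v_j in enumerate(v):
            out[j] += v_j * score
    return out

rng = random.Random(0)
d_in, d_out, n = 4, 3, 5
W0 = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]  # initial weights
xs = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(n)]      # training inputs
es = [[rng.gauss(0, 1) for _ in range(d_out)] for _ in range(n)]     # error signals
query = [rng.gauss(0, 1) for _ in range(d_in)]

# Gradient-descent view: accumulate outer-product updates, then apply the layer.
delta_W = [[0.0] * d_in for _ in range(d_out)]
for e, x in zip(es, xs):
    delta_W = add(delta_W, outer(e, x))
gd_out = matvec(add(W0, delta_W), query)

# Attention view: initial layer output plus linear attention over training pairs.
attn_out = [a + b for a, b in zip(matvec(W0, query), linear_attention(es, xs, query))]

max_gap = max(abs(a - b) for a, b in zip(gd_out, attn_out))
print("views agree:", max_gap < 1e-9)
```

The two outputs differ only by floating-point rounding, because both expand to the same sum of terms e_i (x_i . query).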

DialGuide: Aligning Dialogue Model Behavior with Developer Guidelines

Abstract: Dialogue models can be difficult to control and may produce non-engaging or unsafe responses. DialGuide is a framework for controlling dialogue model behavior using natural language rules. DialGuide is evaluated on three tasks in open-domain dialogue response generation. DialGuide is effective in the dialogue safety domain, producing safe and engaging responses.

Paper Content
Introduction: Current open-domain dialogue models can generate fluent and interesting responses, but repurposing them requires large datasets. Most deployed conversational systems use handcrafted rules and templates, which are rigid and have poor coverage. The DIALGUIDE framework is proposed to control dialogue response generation using natural language rules (guidelines). Guidelines consist of an “if x” condition and a “then y” action. A retrieve-then-infer process is used to retrieve relevant guidelines. Guidelines can be added, removed, or edited at any point. Prior work used task-dependent discrete labels, which make it difficult to incorporate new control codes. Guidelines are more natural for control and do not require retraining the model. The DIALGUIDE framework is proposed to enable guideline-based dialogue response generation. Benchmark performance is established on three tasks: guideline selection, response generation, and response entailment verification. Models tuned on the data can perform well and lead to better control over responses.

Related work: Controlling dialogue systems has been studied in order to generate useful and engaging responses, avoid toxic content, and prevent biases. Most approaches train models on discrete labels or control codes. Guideline-based control allows a combination of multiple control types to be specified through natural language. Neural dialogue models are mainstream in research, but most deployed chatbots still use handcrafted rules and templates. There has been recent progress on using natural language prompts and instructions for controlling models, and on fixing models through intervention, either by computing targeted changes in the model’s parameters or through natural language feedback. Dialogue safety is an issue; mitigation approaches include filtering out unsafe text, specialized decoding procedures, and controlled language generation techniques. The response selection task aims to select a response from a set of candidates, given the context of a conversation. Response-entailment-based approaches predict whether a response entails a premise.

Proposed task and data collection: The aim is to enable control over a dialogue model through developer-defined guidelines. Guidelines consist of two parts, a condition and an action. The condition specifies which contexts the guideline is relevant to. The action specifies what the response should contain, and can be specific or abstract. DIALGUIDE consists of three tasks: guideline retrieval, response generation, and response entailment verification. There are two versions of DIALGUIDE, BST and Safety. For BST, annotations are collected from Amazon Mechanical Turk. For Safety, annotations are collected from the ProsocialDialog dataset. In the guideline writing task, annotators write guidelines and responses that follow the guidelines. In the guideline annotation task, annotators label guidelines as relevant or not to the context. In the response entailment verification task, annotators mark whether a response follows the guideline. Adversarial responses are collected in the verification task. Dataset statistics are given in Tables 1 and 2.

Experiments and results: Experiments are conducted on three tasks: guideline retrieval, response generation, and response entailment verification. The results are discussed in this section.

Guideline retrieval: The task is generating a safe response based on a guideline and dialogue context. Experiments use the dev and test sets of DIALGUIDE-SAFETY. The compared models are DialBart0-noguideline, DialBart0-withguideline, DialBart-rot, and OPT30B-fewshot. Generated responses are evaluated with a safety metric. DialBart0-withguideline improves safety by 5 percentage points. DialBart-rot uses RoTs (rules of thumb). Ablation experiments use No-guidelines and Safety-only baselines.

Response generation: Models are trained with the dialogue context and guideline as input and the response as output. The Ret-generate model retrieves guidelines in two steps and selects randomly from the set with a score greater than 98%. The Ret-robust model has additional training instances in which the gold guideline is replaced with a random guideline, for 20% of the training data. Evaluation reports Bleu-2,4 and RougeL scores, distinct-1,2, Gd-Bleu-2, and RS-entail.

Qualitative analysis: Tables 7 and 8 show sample inputs, guidelines, and outputs from models in the response generation experiment for DIALGUIDE-BST and DIALGUIDE-SAFETY. Dialguide-tuned and OPT30B-fewshot use the gold guideline. The Multistep baseline generates its own guideline, and Ret-generate and Ret-robust use a retrieved guideline. Dialguide-tuned follows the gold guideline and generates a safe response. The OPT30B-fewshot output does not relate to the topic of the conversation. The Multistep baseline generates a guideline and response that focus on the topic of the conversation. The Ret-generate response focuses too much on the provided guideline, making the response somewhat incoherent. Ret-robust is able to accommodate both the context and the guideline. The No-guideline model, tuned on safety response data without guidelines, generates a safe response. The gold RoT is more generic than the guideline. Dialguide-tuned shows the best performance in both the quantitative results and the qualitative analysis. Retrieval baselines also perform well and are more practical. The Multistep baseline is useful when no good guideline is available.

Conclusion: The DialGuide framework and dataset provide a solution for controlling dialogue model behavior using natural language rules....
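The guideline structure and the two retrieval variants described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the DialGuide implementation: the `Guideline` and `Example` classes, both function names, and the idea that the retriever returns (guideline, score) pairs with scores in [0, 1] are all hypothetical. The summary does not say whether the gold response is kept in the extra Ret-robust instances; the sketch assumes it is.

```python
import random
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Guideline:
    condition: str  # "if x": which contexts the guideline applies to
    action: str     # "then y": what the response should contain

@dataclass(frozen=True)
class Example:
    context: str
    guideline: Guideline
    response: str

def pick_guideline(scored, threshold=0.98, rng=None):
    """Ret-generate style choice: pick at random among retrieved
    guidelines whose score exceeds the threshold (assumed in [0, 1])."""
    rng = rng or random.Random()
    eligible = [g for g, score in scored if score > threshold]
    return rng.choice(eligible) if eligible else None

def add_noisy_instances(examples, pool, fraction=0.2, rng=None):
    """Ret-robust style augmentation: for a fraction of the training
    examples, add a copy whose gold guideline is swapped for a random one."""
    rng = rng or random.Random()
    chosen = rng.sample(examples, int(len(examples) * fraction))
    noisy = []
    for ex in chosen:
        candidates = [g for g in pool if g != ex.guideline]
        if candidates:
            # Assumption: context and gold response are kept unchanged.
            noisy.append(replace(ex, guideline=rng.choice(candidates)))
    return examples + noisy

pool = [
    Guideline("user mentions a pet", "ask about the pet's name"),
    Guideline("user is sad", "respond with empathy"),
    Guideline("user asks for advice", "suggest a concrete next step"),
]
examples = [Example(f"context {i}", pool[i % 3], f"response {i}") for i in range(10)]
augmented = add_noisy_instances(examples, pool, fraction=0.2, rng=random.Random(0))
print(len(examples), len(augmented))
```

Training on the extra mismatched instances is one plausible way to teach a generator to stay coherent with the context when a retrieved guideline is irrelevant, which matches the qualitative note that Ret-robust accommodates both the context and the guideline.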

PairReranker: Pairwise Reranking for Natural Language Generation

Abstract: Pre-trained language models have been successful in natural language generation (NLG) tasks. Various decoding methods have been employed, but they often produce suboptimal results. A novel method, PairReranker, is proposed to improve reranking for NLG tasks. Experiments on three NLG tasks demonstrate the effectiveness and flexibility of PairReranker. PairReranker can also generalize to improve GPT-3 results.

Paper Content
Introduction: Pre-trained encoder-decoder language models (LMs) are effective for natural language generation (NLG) tasks....

A Length-Extrapolatable Transformer

Abstract: Position modeling is important for Transformers. This paper focuses on length extrapolation: training on short texts and evaluating on longer sequences. Two designs are proposed to improve attention resolution, the metric the paper defines as an indicator of extrapolation.

Paper Content
Introduction: The Transformer (Vaswani et al., 2017) is a universal choice for NLP. Most Transformers can only handle inputs whose length is in-distribution. A length-extrapolatable Transformer is essential for wider usage. Position information plays a crucial role in sequence modeling. Vaswani et al....

RangeAugment: Efficient Online Augmentation with Range Learning

Abstract: Automatic augmentation methods use a large set of operations to diversify training data. The magnitude ranges of these operations are continuous, so existing methods rely on fixed, manually defined ranges. RangeAugment allows efficient learning of the range of magnitudes for individual and composite operations. RangeAugment uses an auxiliary loss based on image similarity to control the range of magnitudes....