arxiv-summary: AI-summarized AI papers

Detoxifying Text with MaRCo: Controllable Revision with Experts and Anti-Experts

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract: Text detoxification can reduce the harms of toxicity by changing the text to remove offensive meaning. MaRCo is an algorithm that combines controllable generation and text rewriting methods to mask and replace words. MaRCo outperforms baselines on automatic metrics and is preferred 2.1 times more in human evaluation. MaRCo is especially useful for addressing subtle toxicity and online hate....

A Survey of Deep Learning for Mathematical Reasoning

Abstract: Mathematical reasoning is important in many fields. AI systems can solve math problems and prove theorems. Mathematics is a testbed for challenging aspects of reasoning. Advances in neural language models have opened up new opportunities for deep learning. This paper reviews tasks, datasets, and methods at the intersection of mathematical reasoning and deep learning.
Paper Content
Tasks and datasets: The paper examines tasks and datasets for mathematical reasoning using deep learning methods. A summary of commonly used datasets in this field is found in Table 2.
Math word problem solving: Math word problems (MWPs) have been studied by NLP researchers for decades....

Trustworthy Social Bias Measurement

Abstract: Designing measures of social bias that can be trusted. Prior work has introduced several measures, but none have gained widespread trust. Cross-disciplinary theory of measurement modeling used to design bias measures. Explicitly define social bias, grounded in principles from social science research. Proposed a general bias measurement framework DivDist, with 5 concrete bias measures....

Is GPT-3 a Psychopath? Evaluating Large Language Models from a Psychological Perspective

Abstract: LLMs like GPT-3 have been evaluated from a psychological perspective. Tests of personality traits show that LLMs have higher scores on SD-3 than the human average. Fine-tuning with safety metrics does not necessarily lead to more positive personalities. Well-being tests show an increase in scores from GPT-3 to InstructGPT. Instruction-finetuning FLAN-T5 with positive answers can improve the model from a psychological perspective....

Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions

Abstract: Recent work has shown that large language models can generate natural language reasoning steps to answer multi-step questions. When the necessary knowledge is not available or up-to-date, an external knowledge source can be used to retrieve text and prepend it as context to the model’s input. A new approach, IRCoT, interleaves retrieval with CoT for multi-step QA, guiding the retrieval with CoT and using retrieved results to improve CoT....
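
The interleaving described in this abstract can be pictured as a loop that alternates one reasoning step with one retrieval step. The following is a minimal sketch under stated assumptions, not the paper's implementation: the toy corpus, the word-overlap `retrieve` function, the scripted `next_reasoning_step` (standing in for a language model), and the "answer is:" stopping rule are all hypothetical.

```python
from typing import List, Tuple

# Hypothetical toy corpus; a real system would query an external knowledge source.
CORPUS = [
    "Lost Gravity was manufactured by Mack Rides.",
    "Mack Rides is a company from Germany.",
    "The Eiffel Tower is located in Paris.",
]


def _words(text: str) -> set:
    """Lowercase the text and split it into a set of words without punctuation."""
    return set(text.lower().replace("?", "").replace(".", "").split())


def retrieve(query: str, candidates: List[str], k: int = 1) -> List[str]:
    """Stand-in retriever: rank candidates by word overlap with the query."""
    ranked = sorted(candidates, key=lambda p: -len(_words(query) & _words(p)))
    return ranked[:k]


def next_reasoning_step(question: str, context: List[str]) -> str:
    """Scripted stand-in for a language model emitting one reasoning sentence."""
    joined = " ".join(context)
    if "Mack Rides is a company from Germany" in joined:
        return "Mack Rides is from Germany. So the answer is: Germany."
    if "manufactured by Mack Rides" in joined:
        return "Lost Gravity was manufactured by Mack Rides."
    return "I need more information."


def interleaved_qa(question: str, corpus: List[str], max_steps: int = 4) -> Tuple[List[str], List[str]]:
    """Alternate reasoning and retrieval until an answer sentence appears."""
    context = retrieve(question, corpus)  # the first retrieval uses the question
    steps: List[str] = []
    for _ in range(max_steps):
        step = next_reasoning_step(question, context)
        steps.append(step)
        if "answer is:" in step:
            break
        # The newest reasoning sentence becomes the next retrieval query;
        # paragraphs already in the context are not retrieved again.
        unseen = [p for p in corpus if p not in context]
        context.extend(retrieve(step, unseen))
    return steps, context


steps, context = interleaved_qa("In what country was Lost Gravity manufactured?", CORPUS)
```

In this toy run the question alone retrieves only the first paragraph; the second paragraph is found only after the first reasoning sentence mentions Mack Rides, which is the kind of dependency that interleaving is meant to handle.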

Mini-Model Adaptation: Efficiently Extending Pretrained Models to New Languages via Aligned Shallow Training

Abstract: It is possible to expand pretrained Masked Language Models (MLMs) to new languages by learning a new set of embeddings. Training the new embeddings requires a full forward and backward pass over the entire model. Mini-model adaptation is a compute-efficient alternative that builds a shallow mini-model from a fraction of a large model’s parameters....

A Measure-Theoretic Characterization of Tight Language Models

Abstract: Language modeling is a task in natural language processing. Probability mass can “leak” onto infinite sequences in some cases. This paper offers a measure-theoretic treatment of language modeling. Popular language model families are tight and will not leak.
Paper Content
Introduction: Language modeling is a core task in natural language processing. It involves estimating a distribution over the set of strings over a given alphabet. It has been used to estimate statistical properties of language and is essential for computational linguistics research. It is also central to a wide range of natural language processing applications. Language models are typically described as a distribution over the countably infinite set of all (finite) strings. Some classes of autoregressive language models have parameter settings in which the generative process terminates with probability < 1. Transformer-based language models are always tight, and recurrent neural language models are always tight when they employ a bounded activation function.
Motivating examples: An alphabet is a finite set of symbols, including an end-of-sequence symbol. A string is a finite sequence of symbols from the alphabet. A language model is a distribution over all strings. An autoregressive sequence model (ASM) is a conditional probability distribution. An ASM can match the conditional probabilities of a known language model. The probability of all strings is ≈ 0....
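
To make the idea of "leaking" probability mass concrete, here is a small numerical sketch. It assumes a deliberately simplified autoregressive model whose only relevant quantity is the end-of-sequence probability at step t; the two schedules `constant_eos` and `decaying_eos` are hypothetical illustrations and are not taken from the paper. With a constant end-of-sequence probability the model terminates with probability 1 (tight). With probability 1/(t+1)^2 the chance of never stopping is the telescoping product of (1 - 1/n^2) over n ≥ 2, which equals 1/2, so half of the mass sits on infinite sequences.

```python
from typing import Callable


def termination_mass(eos_prob: Callable[[int], float], steps: int) -> float:
    """Probability that the model has emitted end-of-sequence within `steps` steps."""
    survive = 1.0  # probability that no end-of-sequence symbol has been emitted yet
    for t in range(1, steps + 1):
        survive *= 1.0 - eos_prob(t)
    return 1.0 - survive


def constant_eos(t: int) -> float:
    """Hypothetical tight schedule: the same stopping probability at every step."""
    return 0.1


def decaying_eos(t: int) -> float:
    """Hypothetical non-tight schedule: stopping becomes rapidly less likely."""
    return 1.0 / (t + 1) ** 2


tight_mass = termination_mass(constant_eos, 10_000)  # approaches 1
leaky_mass = termination_mass(decaying_eos, 10_000)  # approaches 1/2
```

The finite strings of the second model together receive total probability close to 0.5 rather than 1, which is the failure of tightness that the summary describes.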

Precise Zero-Shot Dense Retrieval without Relevance Labels

Abstract: Dense retrieval is effective and efficient across tasks and languages. It is difficult to create effective fully zero-shot dense retrieval systems when no relevance label is available. HyDE is proposed to pivot through Hypothetical Document Embeddings. HyDE significantly outperforms the state-of-the-art unsupervised dense retriever Contriever. HyDE shows strong performance comparable to fine-tuned retrievers across various tasks and languages....
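
The excerpt does not spell out what "pivoting" involves, so the following is only a rough sketch of the general idea: write a hypothetical answer document for the query, embed that document instead of the query, and return the real document nearest to it. Everything here is an assumption made for illustration: the bag-of-words `embed` function stands in for a dense encoder such as Contriever, `write_hypothetical_document` is a scripted stand-in for a language model, and the two documents are invented.

```python
import math
from collections import Counter
from typing import List

# Invented toy documents; a real system would search a large corpus.
DOCS = [
    "Removal of a wisdom tooth takes about twenty minutes to one hour.",
    "How long is a marathon? It is about forty two kilometres long.",
]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector (stand-in for a dense encoder)."""
    return Counter(text.lower().replace(".", "").replace("?", "").split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def write_hypothetical_document(query: str) -> str:
    """Scripted stand-in for a language model asked to answer the query."""
    return "Wisdom tooth removal usually takes between twenty minutes and one hour."


def hypothetical_document_search(query: str, docs: List[str]) -> str:
    """Embed a generated answer instead of the query, then return the nearest real document."""
    pivot = embed(write_hypothetical_document(query))
    return max(docs, key=lambda d: cosine(pivot, embed(d)))


best = hypothetical_document_search("how long does it take to remove a wisdom tooth", DOCS)
```

The generated text may contain factual errors; in this sketch it only needs to resemble the relevant real document closely enough to land near it in the embedding space, and no relevance labels are used at any point.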

LAMBADA: Backward Chaining for Automated Reasoning in Natural Language

Abstract: Automated reasoning with unstructured natural text has made progress with the help of large language models. Forward direction reasoning suffers from a combinatorial explosion of the search space. Backward direction reasoning is more efficient for proof-finding problems. LAMBADA is a backward chaining algorithm that decomposes reasoning into four sub-modules. LAMBADA achieves accuracy boosts over forward reasoning methods on two challenging logical reasoning datasets....
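
For readers unfamiliar with backward chaining, the sketch below shows the classical symbolic version on propositional rules: start from the goal, find rules whose conclusion matches it, and recursively try to prove their premises. It is not LAMBADA itself, which according to the summary works on natural text through four sub-modules that this excerpt does not enumerate; the facts, rules, and the `prove` function are hypothetical.

```python
from typing import FrozenSet, List, Optional, Set, Tuple

Rule = Tuple[List[str], str]  # (premises, conclusion)

# Hypothetical toy theory.
FACTS: Set[str] = {"Fiona is round", "Fiona is red"}
RULES: List[Rule] = [
    (["Fiona is round", "Fiona is red"], "Fiona is kind"),
    (["Fiona is kind"], "Fiona is nice"),
    (["Fiona is cold"], "Fiona is rough"),
]


def prove(goal: str, facts: Set[str], rules: List[Rule],
          visiting: Optional[FrozenSet[str]] = None) -> bool:
    """Return True if `goal` follows from the facts by chaining rules backward."""
    visiting = visiting or frozenset()
    if goal in facts:
        return True
    if goal in visiting:  # the goal is already being proved higher up: avoid cycles
        return False
    for premises, conclusion in rules:
        # Only rules that conclude the current goal are explored, which keeps
        # the search focused compared with deriving every possible consequence.
        if conclusion == goal and all(
            prove(p, facts, rules, visiting | {goal}) for p in premises
        ):
            return True
    return False


nice = prove("Fiona is nice", FACTS, RULES)    # provable through "Fiona is kind"
rough = prove("Fiona is rough", FACTS, RULES)  # not provable: "Fiona is cold" is unknown
```

Because the search starts at the goal, the rule about "Fiona is cold" is never touched when proving "Fiona is nice", whereas a forward procedure would consider every rule whose premises happen to hold.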

Controllable Text Generation with Language Constraints

Abstract: This paper studies text generation with constraints specified in natural language. The authors create a benchmark, Cognac, with knowledge-intensive constraints drawn from databases. State-of-the-art language models fail on this task. The proposed solution, CognacGen, leverages the language model’s internal knowledge, using three forms of guidance and prefix-tuning approaches to distill that guidance. Empirical evaluations demonstrate that CognacGen can generalize to unseen instructions and outperform baselines.
Paper Content
Introduction: Language models are becoming increasingly good at generating text. A key question is how to control them to produce what is required while preventing unwanted generations. This is especially important for reducing issues of toxicity, bias, and misinformation. Prior work has used special control codes or classifiers to modify the LM’s probability distribution. This paper considers the problem of controlling generation in LMs with constraints specified in natural language. A new benchmark called Cognac is created with two datasets based on WordNet and Wikidata. State-of-the-art LMs fail to follow simple language constraints. A new language model generation method called CognacGen is developed to follow linguistic guidance without retraining. CognacGen outperforms prior methods and other strong baselines by a significant margin. It is able to improve generation even with imperfect guidance and can generalize to unseen instructions.
Task setup: The problem is conditional text generation with topics and constraints provided in natural language. The input includes a topic, example generations, and a constraint. The goal is to train LMs to generate fluent on-topic content while respecting the constraint. The model has to understand the topic and constraint specified in natural language. Topics and constraints are knowledge-intensive. The model has to respect both the topic and the constraint simultaneously.
Dataset collection: Two new datasets based on WordNet and Wikidata were created for the Cognac benchmark. The WordNet dataset is constructed using five root nodes and their leaf nodes as topics and constraints. The Wikidata dataset is constructed using five properties and their values as topics and constraints. 35 unique templates were created to reflect the diverse nature of instructions. 107,555 and 18,900 unique instructions were created for WordNet and Wikidata respectively.
Evaluation metrics: Different generation methods of LMs are evaluated using metrics that test for correctness and fluency. The main metric is whether the generation conforms to the constraint while staying on topic. The paper reports on-topic score, constraint violation score, Copy-BLEU score, repetition, and perplexity.
Overview: The Cognac model will benefit from querying itself. The conditional probability is explicitly factorized: a probability conditioned on demonstrations, a probability evaluated if the task is performed, and a probability evaluated if the constraint is conformed to. The generation model can be modeled with a pre-trained LM.
Guided generation: CognacGen updates the next-token prediction probability from the generation model by modifying the logits using guidance....
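
The summary breaks off before saying how the logits are modified, so the sketch below only illustrates the general shape of logit-level guidance: adjust the scores of tokens flagged by the guidance, then renormalize with a softmax. The additive `penalty`, the `guide_logits` helper, the three-token vocabulary, and the example constraint (topic "dogs", constraint "do not mention poodle") are assumptions for illustration and may differ from what the paper does.

```python
import math
from typing import Dict, Set


def softmax(logits: Dict[str, float]) -> Dict[str, float]:
    """Convert logits to a probability distribution over tokens."""
    peak = max(logits.values())  # subtract the maximum for numerical stability
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}


def guide_logits(logits: Dict[str, float], discouraged: Set[str],
                 penalty: float = 5.0) -> Dict[str, float]:
    """Hypothetical guidance: lower the logit of every token the guidance flags."""
    return {tok: score - penalty if tok in discouraged else score
            for tok, score in logits.items()}


# Invented next-token logits from a generation model.
logits = {"poodle": 2.0, "beagle": 1.5, "table": 0.1}

before = softmax(logits)
after = softmax(guide_logits(logits, discouraged={"poodle"}))
```

Working on logits rather than on the final probabilities leaves the underlying model untouched, which is consistent with the summary's statement that the method follows guidance without retraining.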