arxiv-summary: AI-summarized AI papers

Mu$^{2}$SLAM: Multitask, Multilingual Speech and Language Models

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract: Mu$^{2}$SLAM is a multilingual sequence-to-sequence model pre-trained on speech, text and supervised data. Mu$^{2}$SLAM uses a sequence-to-sequence masked denoising objective and a masked language modeling objective. Mu$^{2}$SLAM establishes a new state-of-the-art for models trained on public datasets. Mu$^{2}$SLAM matches the performance of an mSLAM model on Voxpopuli ASR. Mu$^{2}$SLAM improves by more than 6% over mSLAM on XNLI....

BLOOM+1: Adding Language Support to BLOOM for Zero-Shot Prompting

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract: BLOOM is a multilingual language model capable of zero-shot learning. Previous works have only explored adapting small language models. Language adaptation can improve zero-shot performance in new languages. Adapter-based finetuning is more effective than continued pretraining for large models. Prompting performance is determined by the size of the language adaptation data. Including a new language in the multitask fine-tuning mixture is the most effective method to teach BLOOMZ a new language. Language adaptation can generalize well to diverse languages with sufficient training data.
Paper Content
Introduction: Current multilingual language models have limited coverage of languages. BLOOM (Scao et al....

Multi-View Knowledge Distillation from Crowd Annotations for Out-of-Domain Generalization

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract: It is difficult to select an effective training signal for natural language processing tasks. Expert annotations are expensive, and crowd-sourced annotations may not be reliable. Recent work in machine learning shows that learning from soft labels acquired from crowd annotations can be effective. The paper proposes new methods for acquiring soft labels from crowd annotations by aggregating distributions. These methods result in best or near-best performance and uncertainty estimation across tasks.
Paper Content
Introduction: Supervised machine learning requires labels as training data. Tradeoffs exist when deciding how to collect and use labels. Recent work has focused on soft-labeling schemes to improve accuracy and uncertainty estimation. Little work has compared how different soft-labeling schemes affect out-of-distribution performance and uncertainty estimation in NLP. This paper seeks to fill this gap by providing an in-depth study of soft-labeling techniques. It proposes four multi-view aggregation methods to generate aggregated soft labels. It also looks at knowledge distillation to build compact but robust models. It uses multi-view structure to improve test set performance of the final distilled model.
Methods: Learning from crowd annotations is a topic with a rich literature....
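The idea of building a soft label by aggregating distributions can be illustrated with a minimal sketch. The two "views" used here (normalized vote counts and a softmax over vote counts) and the plain averaging step are assumptions chosen for illustration; the summary does not say which four aggregation methods the paper actually proposes, and all function names are hypothetical.

```python
from collections import Counter
import math


def vote_distribution(votes, classes):
    """View 1 (assumed): fraction of annotators who chose each class."""
    counts = Counter(votes)
    return [counts[c] / len(votes) for c in classes]


def softmax_distribution(votes, classes):
    """View 2 (assumed): softmax over the raw vote counts."""
    counts = Counter(votes)
    exps = [math.exp(counts[c]) for c in classes]
    total = sum(exps)
    return [e / total for e in exps]


def average_views(views):
    """Aggregate several distributions by averaging them class by class."""
    return [sum(column) / len(views) for column in zip(*views)]


classes = ["neg", "neu", "pos"]
votes = ["pos", "pos", "neu", "pos", "neg"]  # five hypothetical crowd labels
views = [
    vote_distribution(votes, classes),
    softmax_distribution(votes, classes),
]
soft_label = average_views(views)
```

Because each view is a probability distribution, their average is one too, so `soft_label` can be used directly as a training target in place of a one-hot majority label.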

StyleTRF: Stylizing Tensorial Radiance Fields

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract: Recently, there has been much attention on stylized view generation of scenes captured casually using a camera. Previous work captures the geometry and appearance of the scene as neural point sets or neural radiance fields. StyleTRF is a compact, quick-to-optimize strategy for stylized view generation using TensoRF. StyleTRF decouples style-adaption from view capture and is much faster than previous methods....

Transferring General Multimodal Pretrained Models to Text Recognition

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract: The paper proposes a new method, OFA-OCR, to transfer multimodal pretrained models to text recognition. It recasts text recognition as image captioning and directly transfers a unified vision-language pretrained model. OFA-OCR outperforms baselines and achieves state-of-the-art performance on a Chinese text recognition benchmark. The authors constructed an OCR pipeline with OFA-OCR and achieved performance competitive with a product-level API.
Paper Content
Introduction: Optical Character Recognition (OCR) is used to extract text from images. Text recognition is a key challenge in OCR. Deep learning methods are used to improve accuracy. The Transformer encoder-decoder framework has been applied to text recognition. Complex model and objective designs can hinder reproduction. Multimodal pretraining can boost performance in text recognition. Finetuning a unified multimodal pretrained model on text recognition datasets can achieve high accuracy. Ablation studies demonstrate the effectiveness of the proposed method.
Finetuning with image captioning: Text recognition can be recast as image captioning. The model is finetuned with maximum likelihood estimation. Input images are made square by resizing and padding. Interpolation is used to adapt to larger resolution images.
Multitask finetuning: Experiments were conducted on a Chinese text recognition benchmark with 4 subtasks....
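The step of making input images square by resizing and padding can be sketched as a small geometry helper. This is an illustration only: the summary does not say whether the aspect ratio is preserved, where the padding goes, or what the target size is, so the aspect-preserving resize, the centered padding, and the name `square_fit` are all assumptions.

```python
def square_fit(width, height, target):
    """Compute how to fit a width x height image into a target x target square.

    Assumed scheme: scale so the longer side equals `target`, then pad the
    shorter side evenly on both ends.
    Returns (new_w, new_h, pad_left, pad_top, pad_right, pad_bottom).
    """
    if width <= 0 or height <= 0 or target <= 0:
        raise ValueError("width, height and target must be positive")

    # Scale factor that maps the longer side onto the target size.
    scale = target / max(width, height)
    new_w = max(1, round(width * scale))
    new_h = max(1, round(height * scale))

    # Whatever is left over on each axis is filled with padding.
    pad_w = target - new_w
    pad_h = target - new_h
    pad_left = pad_w // 2
    pad_top = pad_h // 2
    return new_w, new_h, pad_left, pad_top, pad_w - pad_left, pad_h - pad_top


# A wide text-line image (hypothetical sizes) placed into a 480 x 480 square.
geometry = square_fit(960, 240, 480)
```

For a wide text line the resize shrinks the image to the target width, and the remaining vertical space is split between top and bottom padding.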

APOLLO: A Simple Approach for Adaptive Pretraining of Language Models for Logical Reasoning

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract: Logical reasoning of text is an important ability that requires understanding of the information present in the text and their interconnections. Prior works on improving logical reasoning ability of language models require complex processing of training data. We propose APOLLO, an adaptively pretrained language model that has improved logical reasoning abilities. We select a subset of Wikipedia for continued pretraining of a language model....

Multi hash embeddings in spaCy

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract: Distributed representation of symbols is important in machine learning systems. Traditional word embeddings associate a separate vector with each word. Hash embeddings reduce the memory footprint by representing each word as a summary of its normalized word form, subword information and word shape. This technical report introduces the embedding methods in spaCy and evaluates the hash embedding architecture with multi-embeddings on Named Entity Recognition datasets.
Paper Content
Introduction: spaCy is a popular suite of Natural Language Processing software. It provides algorithms and models for common NLP tasks. It pays attention to stability, usability and documentation. It offers a fine-grained API for customizing and controlling training. It prioritizes run-time efficiency, long-document efficiency, robustness to domain shift, and the ability to fine-tune the model. It uses the hashing trick to reduce the search space in a lookup table. Word embeddings associate words with continuous vectors. They encode useful syntactic and semantic information. Collobert and Weston popularized the idea of using neural networks with pretrained word embeddings. Mikolov et al....
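The hashing trick mentioned above can be illustrated with a minimal pure-Python sketch. It is not spaCy's implementation: the hash function (SHA-256), the number of seeds, the table sizes, the random initialization, and the concatenation of per-feature vectors are all assumptions made for illustration, and every name below is hypothetical.

```python
import hashlib
import random


def make_table(n_rows, width, seed=0):
    """A fixed-size embedding table; its size does not grow with the vocabulary."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(width)] for _ in range(n_rows)]


def row_index(key, seed, n_rows):
    """Hash a string key with a given seed into a row of the table."""
    digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_rows


def hash_embed(key, table, n_seeds=4):
    """Sum the rows selected by several seeded hashes of the same key."""
    vector = [0.0] * len(table[0])
    for seed in range(n_seeds):
        row = table[row_index(key, seed, len(table))]
        vector = [v + r for v, r in zip(vector, row)]
    return vector


def multi_hash_embed(features, tables):
    """Embed each lexical feature with its own table and concatenate the results."""
    output = []
    for name, value in features.items():
        output.extend(hash_embed(value, tables[name]))
    return output


# Hypothetical features for the token "Apple": normalized form, subword
# pieces, and word shape, as named in the abstract.
features = {"norm": "apple", "prefix": "a", "suffix": "ple", "shape": "Xxxxx"}
tables = {name: make_table(50, 8, seed=i) for i, name in enumerate(features)}
vector = multi_hash_embed(features, tables)
```

Two different words may collide on one hashed row, but they are unlikely to collide on every seed at once, which is why summing several rows per key lets a small table still separate many words. Unseen words also get a vector, since any string hashes to some row.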

Discovering Language Model Behaviors with Model-Written Evaluations

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract: Language models (LMs) need to be evaluated to understand their behaviors. Prior work used crowdwork or existing data sources to evaluate LMs. This paper proposes automatically generating evaluations with LMs. Crowdworkers rate the examples as highly relevant and agree with labels. Larger LMs show inverse scaling, expressing concerning goals and stronger political views....

Natural Language to Code Generation in Interactive Data Science Notebooks

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract: Computational notebooks are used by data scientists to perform data wrangling and analytic tasks. ARCADE is a benchmark of 1082 code generation problems using the pandas data analysis framework. PaChiNCo is a 62B code language model for Python computational notebooks. Few-shot prompting strategies are explored to improve the diversity and explainability of model predictions....

Emergent Analogical Reasoning in Large Language Models

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.
Abstract: Large language models have reignited debate about whether human cognitive capacities can emerge from them. Of particular interest is the ability of these models to reason about novel problems without training. A comparison was made between human reasoners and a large language model on a range of analogical tasks. The language model displayed a strong capacity for abstract pattern induction, matching or surpassing human capabilities....