arxiv-summary: AI-summarized AI papers

Rethinking the Role of Scale for In-Context Learning: An Interpretability-based Case Study at 66 Billion Scale

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: Language models perform better with increased scale. The in-context learning paradigm is used. The paper investigates the hypothesis that the ability of a large language model to learn in context is not uniformly spread. A 66-billion-parameter language model is studied across 14 tasks. 70% of attention heads and 20% of feed-forward networks can be removed with minimal decline in performance. The sets of attention heads important for in-context learning overlap across tasks and numbers of examples.

Paper Content

Introduction: LLMs based on the Transformer architecture have revolutionized NLP. The zero/few-shot in-context learning paradigm is used. Question: are all LLM components needed to perform in-context learning?...

Language model acceptability judgements are not always robust to context

Abstract: Targeted syntactic evaluations ask models to make judgements with a single context-free sentence as input. This paper investigates the stability of language models’ performance on targeted syntactic evaluations when varying properties of the input context. Results show that model judgements are generally robust when placed in randomly sampled linguistic contexts, but unstable for contexts containing syntactic structures matching those in the critical test content....

Beyond the C: Retargetable Decompilation using Neural Machine Translation

Abstract: Decompilation is an important tool in reverse engineering computer software. Researchers have proposed using techniques from neural machine translation to automate the process. Existing neural decompilers require language-specific domain knowledge to build an abstract syntax tree. This paper explores a different tradeoff that treats the assembly and source languages as plain text....

Are We Ready for Vision-Centric Driving Streaming Perception? The ASAP Benchmark

Abstract: Vision-centric perception has been used in autonomous driving tasks. Traditional benchmarks do not consider inference time delay. ASAP is the first benchmark to evaluate the online performance of vision-centric perception. An annotation-extending pipeline is used to generate high-frame-rate labels. The SPUR evaluation protocol is constructed to evaluate performance under different computational resources. Model rankings change under different constraints....

Improving Unsupervised Video Object Segmentation with Motion-Appearance Synergy

Abstract: IMAS is a method for segmenting primary objects in videos without manual annotation. IMAS uses motion-appearance synergy to deal with motion-appearance conflicts. IMAS has two training stages: motion-supervised object discovery and refinement. IMAS proposes motion-semantic alignment as a model-agnostic, annotation-free hyperparameter tuning method. IMAS improves segmentation quality on several UVOS benchmarks.

Paper Content

Introduction: Video object segmentation is a widely researched topic in computer vision....

Improving Cross-task Generalization of Unified Table-to-text Models with Compositional Task Configurations

Abstract: Progress has been made in unifying table-to-text tasks using a single encoder-decoder model. Existing methods use a simple dataset name as a prefix to the encoder, limiting effectiveness and hindering generalization. We propose compositional task configurations to improve cross-task generalization. Task configurations explicitly specify task type, input and output types. Our method outperforms the UnifiedSKG baseline in both in-domain and zero-shot settings....

Point-E: A System for Generating 3D Point Clouds from Complex Prompts

Abstract: Recent work on text-conditional 3D object generation has shown promising results, but requires multiple GPU-hours to produce a single sample. Generative image models produce samples in seconds or minutes. This paper explores an alternative method for 3D object generation which produces 3D models in 1-2 minutes on a single GPU. The method generates a single synthetic view using a text-to-image diffusion model, and then produces a 3D point cloud using a second diffusion model....

Neural Story Planning

Abstract: Automated plot generation is the challenge of creating a sequence of events that make a coherent story. Traditional symbolic planners create stories from a goal state, but rely on hand-crafted actions. Neural language models can generate stories with great diversity, but have trouble ending stories and maintaining coherence. This paper presents an approach to story plot generation that combines causal planning with neural language models....

'Rarely' a problem? Language models exhibit inverse scaling in their predictions following 'few'-type quantifiers

Abstract: Language models perform poorly on quantification, and ‘few’-type quantifiers pose a particular challenge for them. 960 sentences were presented to 22 autoregressive transformer models of differing sizes. The performance of larger models decreased, suggesting that they reflect online rather than offline human processing.

Paper Content

Introduction: Quantifiers can change the meaning of an utterance: sentences with the same content words can have opposite meanings. Language models struggle to predict which quantifier is used in a given context, and perform poorly at generating appropriate continuations following logical quantifiers. Large language models are being used as general systems for multiple tasks, so it is important that they can distinguish between sentences with different meanings. This study evaluates how well language models take the meaning of a quantifier into account when generating text, and investigates whether there is an inverse scaling relationship with model size. Negation is challenging for language models. This study focuses on quantifiers indicating typicality, such as "most" and "few". It uses stimuli from a previously published N400 study and tests whether language models show the same pattern of insensitivity towards the quantifiers.

Language models: The study analyzed GPT-2, GPT-3, GPT-Neo, OPT, and InstructGPT language models, comparing different training data and numbers of parameters.

Evaluation: The surprisal of the critical word in each stimulus sentence was calculated given its preceding context, with the probability of the target word converted to surprisal using Equation 1. Both single-token and multi-token words were used. For each stimulus pair, the two possible critical words were compared to see which had the lower surprisal. Accuracy was calculated as the fraction of stimulus pairs for which the model predicted the appropriate critical word. Model sensitivity to the quantifiers was also analyzed. All code and data will be published online on acceptance.

Results: The accuracy of models increases with size for most-type quantifiers, but decreases for few-type quantifiers. Small exceptions to this pattern exist. The sensitivity of models varies, but is generally low, with no clear pattern.

Discussion: Inverse scaling with quantifiers: as models increase in size, they tend to improve at predicting words following most-type quantifiers and get worse at predicting words following few-type quantifiers. Larger models make predictions increasingly in accordance with typicality, overwhelming any sensitivity to quantifier type. The sensitivity analysis shows that all models have a poor and largely invariant sensitivity overall.

Further implications: Models tend to perform better as they get larger and are trained on more data, and evidence supports this idea. The predictions of larger models and those trained on more data correlate with human incremental online predictions. It is easier for humans to process well-formed sentences with plausible semantics. The predictions of larger models can align less with explicit human judgements, so language models may struggle to make predictions in line with human offline judgements. Tailoring training may be necessary to avoid specific known issues.
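
The evaluation described in this summary (surprisal of a critical word, then accuracy over stimulus pairs) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' code: the summary does not reproduce Equation 1, so the standard definition of surprisal as the negative log probability is assumed, with base 2 chosen arbitrarily. Summing token surprisals for multi-token words is likewise an assumption, and the function names and the toy probabilities are hypothetical.

```python
import math


def surprisal(prob):
    """Surprisal of one token in bits: -log2(p). The log base is an assumption."""
    return -math.log2(prob)


def word_surprisal(token_probs):
    """Surprisal of a word made of one or more tokens.

    Assumes each entry is the probability of a token given all preceding
    context, so the word's surprisal is the sum of its token surprisals.
    """
    return sum(surprisal(p) for p in token_probs)


def pair_accuracy(pairs):
    """Fraction of stimulus pairs where the appropriate word has lower surprisal.

    Each pair is (appropriate_token_probs, inappropriate_token_probs).
    """
    correct = sum(
        1 for good, bad in pairs if word_surprisal(good) < word_surprisal(bad)
    )
    return correct / len(pairs)


# Toy probabilities, invented for illustration only (not data from the paper).
toy_pairs = [
    ([0.5], [0.25]),  # appropriate word is more probable: counted as correct
    ([0.1], [0.2]),   # appropriate word is less probable: counted as incorrect
]
accuracy = pair_accuracy(toy_pairs)
```

In practice the per-token probabilities would come from a language model's output distribution at the position of the critical word; that step is omitted here because the summary does not say how it was implemented.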

Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models

Abstract: Generative models have been studied in computer vision. Diffusion models have been used to generate high quality images. GANs have been found to have the ability to disentangle different attributes. This work explores whether diffusion models have the same capability. It was found that diffusion models can modify images towards a style without changing the semantic content....