arxiv-summary: AI-summarized AI papers

SODA: Million-scale Dialogue Distillation with Social Commonsense Contextualization

Abstract: Present SODA, a million-scale, high-quality social dialogue dataset. Train COSMO, a generalizable conversation agent. Dialogues in SODA are more consistent, specific, and natural than prior datasets. COSMO is more natural and consistent than best-performing dialogue models. Data, models, and code are made public.

Paper Content:

Introduction: Progress on open-domain social dialogue agents has been hindered by lack of diversity, scale, and quality of training corpora. Most dialogue agents are trained on large amounts of unfiltered conversations or highly curated/specialized crowdsourced dialogues. Issues of unnaturalness, toxicity, incoherence, blandness, and lack of commonsense remain. Introduce SODA, a million-scale dialogue dataset covering a wide variety of social interactions. SODA is the largest publicly available open-domain conversation dataset. Human evaluation shows SODA surpasses existing human-authored dialogue corpora. Proposed CO3 framework for distilling conversations from large pre-trained language models. CO3 adds context information to social commonsense knowledge step-by-step. COSMO conversation model trained on SODA outperforms existing dialogue models.

Background: Conversation is a form of social interaction. Narratives and scripts are abstracted from social experiences. Social experiences form our knowledge for explaining everyday events and inferring the mental states of others. Attribution in social psychology has been studied in NLP as social commonsense.

Commonsense knowledge graph: Start with a commonsense knowledge graph, represented by symbolic triples. Use Atomic 10x as knowledge graph. Retrieve triples with social commonsense relations. Prompt PLM to rewrite commonsense into narrative. PLMs are known for writing capabilities, especially in narratives.

From narrative to conversation: Inferring who is speaking in the dialogue is easier when the narrative contains person variables....
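The commonsense-to-narrative step summarized above (symbolic triple, then a sentence, then a prompt for a pre-trained language model) can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `Triple` class, the wording in `RELATION_TEMPLATES`, the `narrative_prompt` function, and the example triple are all hypothetical, and no language model is actually called.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """A symbolic commonsense triple: (head event, relation, tail inference)."""
    head: str
    relation: str
    tail: str


# Hypothetical templates that verbalize a relation; the paper's wording may differ.
RELATION_TEMPLATES = {
    "xNeed": "{head}. Before that, PersonX needed {tail}.",
    "xWant": "{head}. After that, PersonX wants {tail}.",
}


def triple_to_sentence(triple: Triple) -> str:
    """Turn a triple into a plain sentence, keeping the person variable."""
    template = RELATION_TEMPLATES[triple.relation]
    return template.format(head=triple.head, tail=triple.tail)


def narrative_prompt(triple: Triple) -> str:
    """Build a text prompt asking a language model to expand the sentence."""
    sentence = triple_to_sentence(triple)
    return f"{sentence}\nRewrite this story with more specific details:"


# Illustrative triple (an assumption, not taken from the summary).
example = Triple(
    head="PersonX moves a step closer to the goal",
    relation="xNeed",
    tail="to take the first step",
)
prompt = narrative_prompt(example)
```

The person variable (`PersonX`) is left in place here because the summary notes that person variables make it easier to infer who is speaking in the later conversation step.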

MULTI3NLU++: A Multilingual, Multi-Intent, Multi-Domain Dataset for Natural Language Understanding in Task-Oriented Dialogue

Abstract: Task-oriented dialogue systems have been used to help people achieve goals. Systems are usually built for one language and don’t work well beyond that. MULTI3NLU++ is a dataset that extends NLU++ to multiple languages and domains. MULTI3NLU++ includes Spanish, Marathi, Turkish and Amharic. MULTI3NLU++ is used to benchmark language models and machine translation systems.

Paper Content:

Introduction: Task-oriented dialogue (TOD) systems are used to automate customer service tasks in travel, finance, and hotel booking. The Natural Language Understanding module performs intent detection and slot labelling. The intent detection task is to identify the goal of the user’s utterance from pre-defined classes. The slot labelling task is to label each token in an utterance with a label that describes the type of semantic information. Existing datasets are limited to single intent, single domain, and a small set of slot types. Existing datasets are evaluated on a small set of higher-resource languages. Inability to handle multiple intents is a serious limitation. MULTI3NLU++ is a multilingual, multi-intent, multi-domain dataset for training and evaluating TOD systems. MULTI3NLU++ extends NLU++, a multi-intent, multi-domain dataset for the BANKING and HOTELS domains. MULTI3NLU++ includes manual translations of 3,080 utterances in NLU++ to four languages. MULTI3NLU++ captures language diversity and allows for cross-domain and cross-lingual training and experimentation. MULTI3NLU++ enables systematic comparisons of dialogue NLU systems in few-shot setups for cross-lingual and cross-domain transfer for low-, medium-, and high-resource languages.

Dataset collection: Selected Spanish, Marathi, Turkish, and Amharic as languages. Recruited professional translators to perform manual translation. Instructed translators to treat the task as creative writing and maintain the colloquial nature of utterances. Conducted a pilot task with 50 sentences per domain. In-house evaluation by native speakers to verify translations. Automatic checker to ensure slot values are present in translations. Data collection process took 5 months and cost £7,611.

Baseline experiments: MULTI3NLU++ is a multilingual dialogue NLU dataset. It covers intent detection and slot labelling tasks. It is used to provide reference points and demonstrate aspects of multilingual dialogue NLU systems. It is tested using N-fold cross-validation. It contains data for two domains: BANKING and HOTELS. It allows for comparison of multilingual dialogue NLU systems on languages with different amounts of resources. It is tested using in-language and cross-lingual setups.

Classification-based methods: Evaluate two standard classification approaches to intent detection: MLP-based with a fixed encoder, and full-model fine-tuning. Use a fixed efficient sentence encoder to encode sentences and train only the MLP classifier. Use a sigmoid layer on top of the classifier. Threshold of 0....
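The multi-intent decision rule described above (a sigmoid over the classifier outputs, followed by a threshold) can be sketched in a few lines. This is a minimal sketch under stated assumptions: the function names, the intent labels, and the logit values are hypothetical, and the threshold is left as a parameter because its value is cut off in the summary. The `0.5` used in the example is an arbitrary placeholder, not the paper's setting.

```python
import math
from typing import Dict, List


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_intents(logits: Dict[str, float], threshold: float) -> List[str]:
    """Return every intent whose sigmoid score reaches the threshold.

    Each intent is scored independently, so an utterance can receive
    zero, one, or several intents (multi-label classification).
    """
    return sorted(name for name, z in logits.items() if sigmoid(z) >= threshold)


# Hypothetical classifier outputs for a single utterance.
example_logits = {"change": 2.0, "booking": 0.5, "cancel": -3.0}
predicted = predict_intents(example_logits, threshold=0.5)  # placeholder threshold
```

Scoring each intent independently, instead of applying a softmax over all intents, is what allows several intents to be active for the same utterance.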

Recycling diverse models for out-of-distribution generalization

Abstract: Foundation models are changing how AI systems are built. Practitioners use a standard procedure to build machine learning solutions. The Internet is full of foundation models fine-tuned on many tasks. These individual fine-tunings lack strong generalization and exist in isolation. Model recycling leverages multiple fine-tunings of the same foundation model on diverse tasks. Model recycling maximizes model diversity and achieves a new state of the art on the DomainBed benchmark.

Paper Content:

Introduction: The framework of foundation models is fueling adoption of machine learning for real-world applications. Pre-trained models are easy to adapt to downstream tasks. A two-step transfer learning strategy is followed: the practitioner downloads a copy of the foundation model from authorities, then fine-tunes the weights on the target task with limited in-house data. Risk of latching onto specific patterns from training data. Short-sighted models fail to generalize with out-of-distribution test examples. Negative impact on human lives. DomainBed OOD accuracy benchmark. Different training strategies discussed in paper. Model recycling proposed. Compute parallelism throughout training. Maximizes diversity in predictions. State-of-the-art performance in DomainBed. No inference or training overhead. Increased OOD generalization enables responsible use of machine learning. Averaging neural networks’ weights inspires modern fine-tuning approaches.

Fine-tuning for out-of-distribution generalization: The learning setup involves a deep learning model with two parts: featurizer and classifier. The model is parametrized by weights θ = (w, φ). The aim is to maximize test accuracy acc_te(θ) for out-of-distribution (OOD) generalization.

Vanilla fine-tuning: Fine-tuning is a simple recipe for transferring knowledge from pre-trained models to target tasks....
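The summary mentions averaging neural networks' weights as the idea behind modern fine-tuning approaches. A minimal sketch of uniform weight averaging is shown below, assuming each fine-tuned model is represented as a dictionary from parameter name to a flat list of floats. The `average_weights` function and the toy parameter names are hypothetical, and this is not claimed to be the paper's exact recycling procedure.

```python
from typing import Dict, List

Weights = Dict[str, List[float]]


def average_weights(models: List[Weights]) -> Weights:
    """Uniformly average parameters across models with identical shapes."""
    if not models:
        raise ValueError("need at least one model")
    reference = models[0]
    for model in models:
        if model.keys() != reference.keys():
            raise ValueError("all models must share the same parameter names")
        for name, values in model.items():
            if len(values) != len(reference[name]):
                raise ValueError(f"shape mismatch for parameter {name!r}")
    averaged: Weights = {}
    for name in reference:
        # zip(*...) groups the i-th entry of this parameter from every model.
        columns = zip(*(model[name] for model in models))
        averaged[name] = [sum(column) / len(models) for column in columns]
    return averaged


# Two toy "fine-tunings" of the same architecture (values are illustrative).
model_a = {"featurizer": [1.0, 3.0], "classifier": [0.0]}
model_b = {"featurizer": [3.0, 5.0], "classifier": [2.0]}
merged = average_weights([model_a, model_b])
```

The result is a single set of weights with the same shape as each input model, which is consistent with the summary's note that there is no inference overhead.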

HouseCat6D -- A Large-Scale Multi-Modal Category Level 6D Object Pose Dataset with Household Objects in Realistic Scenarios

Abstract: Estimating the 6D pose of objects is a major field in 3D computer vision. Research trends are heading towards category-level pose estimation. The new dataset HouseCat6D features multi-modality, diverse objects, high-quality pose annotation, large-scale scenes, and a checkerboard-free environment. Benchmark results of state-of-the-art category-level pose estimation networks are provided.

Paper Content:

Introduction: 6D pose estimation is important for computer vision tasks. Many methods have been proposed to solve this task. Most methods focus on instance-level estimation, but generalization is limited. Recent methods focus on category-level estimation, but there is a lack of datasets. HouseCat6D is a new category-level dataset with 194 objects from 10 categories. Includes RGB, depth, and polarimetric images with 23....

Settling the Reward Hypothesis

Abstract: The reward hypothesis suggests that goals and purposes can be thought of as maximizing the expected value of a reward. The paper aims to fully settle the hypothesis by specifying the requirements for it to hold.

Paper Content:

Introduction: The reward hypothesis posited by Sutton states that goals and purposes can be thought of as maximizing the expected value of cumulative reward. McCarthy’s claim is that intelligence is the computational part of the ability to achieve goals. Sutton’s hypothesis implies that to build AI, it is sufficient to solve RL. The reward-is-enough hypothesis posits that intelligence can be understood as subserving the maximization of reward. Abel et al....

Quantifying Local Extrinsic Curvature in Neural Manifolds

Abstract: The neural manifold hypothesis suggests that the activity of a neural population forms a low-dimensional manifold. Dimensionality reduction techniques don’t provide an explicit parameterization of the manifold or capture global structure. Topological data analysis methods reveal shared topological structure between neural manifolds and task variables. Leverage tools from Riemannian geometry and topologically-aware deep generative models to study the geometry of neural manifolds. Computes an explicit parameterization and estimates local extrinsic curvature.

Paper Content:

Introduction: Machine learning uses the manifold hypothesis to explain real-world data. Neural population activity is hypothesized to form low-dimensional manifolds. Dimensionality reduction techniques can reveal lower-dimensional structure. Topological data analysis can reveal shared topological structure. Few methods exist to explicitly quantify and parameterize the geometric structure of neural manifolds. This paper introduces a novel approach to study the geometry of neural manifolds.

A Riemannian approach to neural population geometry: Topological methods are robust to continuous deformations of the neural manifold....

Reinforced Clarification Question Generation with Defeasibility Rewards for Disambiguating Social and Moral Situations

Abstract: Context is important for moral reasoning. Lying to a friend can be wrong or okay depending on the context. ClarifyDelphi is an interactive system that generates clarification questions to elicit missing contexts of a moral situation. Reinforcement Learning is used to generate questions that lead to diverging moral judgments. Human evaluation shows ClarifyDelphi generates more relevant, informative and defeasible questions....

Towards Reasoning in Large Language Models: A Survey

Abstract: LLMs have made progress in natural language processing and may exhibit reasoning abilities. This paper provides an overview of the current state of knowledge on reasoning in LLMs. Techniques, methods, benchmarks, findings, implications, and future directions are discussed.

Paper Content:

Introduction: Reasoning is a cognitive process. It involves using evidence, arguments, and logic. It is important in fields like psychology, philosophy, and computer science. Large language models have made advancements in natural language processing. These models exhibit emergent behaviors, including the ability to “reason”. LLMs can answer questions with explicit reasoning steps. Reasoning ability is a hallmark of human intelligence. It is unclear whether LLMs are actually reasoning. Different forms of reasoning may be used depending on the task. Focus on “informal deductive reasoning” in large language models.

Towards reasoning in large language models: Reasoning is seen as a weakness in language models and other NLP models....

Extrinsic Evaluation of Machine Translation Metrics

Abstract: Automatic machine translation metrics are used to measure translation quality across large test sets. It is unclear if these metrics are reliable at distinguishing good and bad translations at the sentence level. This paper investigates how useful MT metrics are at detecting the success of a machine translation component in a larger platform....

High-resolution canopy height map in the Landes forest (France) based on GEDI, Sentinel-1, and Sentinel-2 data with a deep learning approach

Abstract: Developed a deep learning model to create a high-resolution canopy height map of the “Landes de Gascogne” forest in France. Model uses multi-band images from Sentinel-1 and Sentinel-2 with composite time averages as input. Evaluation performed with external validation data from forest inventory plots and a stereo 3D reconstruction model.

Paper Content:

Introduction: Forest biomass is important for the global carbon cycle. High spatial resolution is needed to capture differences in biomass. Top canopy height can be a proxy for forest biomass estimation. Forest inventories provide reliable information but not high-resolution maps. Remote sensing data can provide accurate height estimations. Airborne LiDAR and space-borne measurements have potential to map forest properties. The GEDI LiDAR mission provides accurate point-wise observations. Sentinel-1 and Sentinel-2 provide measurements at 10 m resolution. Global and local maps of forest biomass and height have been developed. Machine learning and deep learning have been used for forest parameter estimations. CNNs have increased accuracy for image interpretation tasks. CNNs have been used for tree height mapping.

Materials & methods: Used the GEDI sensor to measure canopy height. Used Sentinel-1 and Sentinel-2 images as predictors for a deep learning U-Net framework. Produced a gridded map of the Landes de Gascogne area at 10 m resolution.

Study area: The Landes forest is located in South-West France. Comprised of 90% maritime pine and 10% broadleaved forests. Intensively managed with thinning every 5-10 years and clear-cuts every 35-50 years. Understory vegetation includes woody shrubs and perennial herbaceous plants. GEDI data used to model a continuous height map at 10 m resolution. GEDI data acquired from the ISS between 51....