Empirical Investigation of Neural Symbolic Reasoning Strategies

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here. Abstract: Neural reasoning accuracy improves when generating intermediate steps. The source of the improvement is unclear. The paper investigates the benefit of generating intermediate steps for symbolic reasoning. It decomposes the reasoning strategy in terms of step granularity and chaining strategy. The choice of reasoning strategy affects performance. Certain configurations lead to nearly perfect performance. The results indicate the importance of exploring effective strategies for neural reasoning models. Paper Content: Introduction: Artificial intelligence researchers have been attempting neural-symbolic integration for a long time....

February 16, 2023 · 527 words · Yoichi Aoki, Keito Kudo, Tatsuki Kuribayashi, Ana Brassard, Masashi Yoshikawa and 2 others
À-la-carte Prompt Tuning (APT): Combining Distinct Data Via Composable Prompting

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here. Abstract: The paper introduces À-la-carte Prompt Tuning (APT), a transformer-based scheme to tune prompts. Prompts can be trained in isolation, on different devices, at different times, and on different distributions or domains. During inference, models can be assembled based on arbitrary selections of data sources. À-la-carte learning enables constructing bespoke models specific to each user’s individual access rights and preferences. Models can be added or removed without retraining from scratch. Models achieve accuracy within 5% of models trained on the union of the respective sources. The method reaches state-of-the-art performance on the Split CIFAR-100 and CORe50 benchmarks. Paper Content: Introduction. Related work: Prompting originated from natural language processing. We compare different methods of combining prompts. We optimize “soft” prompts in the embedding space. Prompt tuning is applied to the continual learning problem. Forgetting in deep networks is challenging. We run a procedure to benchmark our APT approach. Preliminaries: Vision Transformer is used as the backbone architecture due to its accuracy and ease of prompting....

February 15, 2023 · 673 words · Benjamin Bowman, Alessandro Achille, Luca Zancato, Matthew Trager, Pramuditha Perera and 2 others
Topological Neural Discrete Representation Learning à la Kohonen

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here. Abstract: Unsupervised learning of discrete representations from continuous ones in neural networks is used in several applications. Vector Quantisation (VQ) is a popular method to achieve such representations. EMA-VQ is often used, but here we study an alternative VQ algorithm based on the learning rule of Kohonen Self-Organising Maps. KSOM is known to offer two potential benefits over EMA-VQ: faster VQ and discrete representations that form a topological structure....
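
For readers unfamiliar with the Kohonen learning rule mentioned in this summary, here is a minimal sketch of one textbook Self-Organising Map update. It is not taken from the paper: the function name `ksom_update`, the 1-D grid, the Gaussian neighbourhood, and the values of `lr` and `sigma` are all illustrative assumptions. The point it shows is that the winning code vector and its grid neighbours all move toward the input, which is what gives the codebook a topological structure.

```python
import math


def ksom_update(codebook, x, lr=0.5, sigma=1.0):
    """One textbook Kohonen SOM step on a 1-D grid (illustrative sketch).

    codebook: list of code vectors; the list index is the grid position.
    x: input vector with the same dimension as each code vector.
    Returns (winner_index, updated_codebook).
    """
    def sqdist(a, b):
        # Squared Euclidean distance between two vectors.
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # Quantisation step: pick the code vector nearest to the input.
    winner = min(range(len(codebook)), key=lambda j: sqdist(codebook[j], x))

    updated = []
    for j, w in enumerate(codebook):
        # Assumed Gaussian neighbourhood over grid distance to the winner.
        h = math.exp(-((j - winner) ** 2) / (2 * sigma ** 2))
        # Move each code vector toward x, scaled by its neighbourhood weight.
        updated.append([wi + lr * h * (xi - wi) for wi, xi in zip(w, x)])
    return winner, updated


codebook = [[0.0], [1.0], [2.0]]
winner, updated = ksom_update(codebook, [1.0])
```

In this toy call the middle code vector wins and stays where it is, while both neighbours are pulled part of the way toward the input. Shrinking `sigma` toward zero reduces the rule to updating the winner alone.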

February 15, 2023 · 629 words · Kazuki Irie, Róbert Csordás, Jürgen Schmidhuber
The Expressive Power of Tuning Only the Norm Layers

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here. Abstract: Feature normalization transforms are essential for deep neural networks. Tuning the parameters of these transforms can achieve high accuracy. This work investigates the expressive power of tuning normalization layers of frozen networks. Tuning normalization layers of random ReLU networks can reconstruct any target network that is $O(\sqrt{\text{width}})$ times smaller. This holds even for randomly sparsified networks, under sufficient overparameterization....

February 15, 2023 · 1196 words · Angeliki Giannou, Shashank Rajput, Dimitris Papailiopoulos
Denoising Diffusion Probabilistic Models for Robust Image Super-Resolution in the Wild

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here. Abstract: Diffusion models have been successful on single-image super-resolution and other image-to-image translation tasks. Diffusion models have not outperformed GAN models on blind super-resolution tasks. This paper introduces SR3+, a diffusion-based model for blind super-resolution, establishing a new state-of-the-art. SR3+ uses self-supervised training, noise-conditioning augmentation, a large-scale convolutional architecture, and large-scale datasets. SR3+ outperforms SR3 and Real-ESRGAN, with a DRealSR FID score of 32....

February 15, 2023 · 774 words · Hshmat Sahak, Daniel Watson, Chitwan Saharia, David Fleet
Augmented Language Models: a Survey

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here. Abstract: Language models (LMs) can be augmented with reasoning skills and the ability to use tools. Augmentations can be used separately or in combination. Augmented LMs (ALMs) can use external modules to expand their context processing ability. ALMs can learn to reason, use tools, and act while still performing standard natural language tasks....

February 15, 2023 · 1298 words · Grégoire Mialon, Roberto Dessì, Maria Lomeli, Christoforos Nalmpantis, Ram Pasunuru and 8 others
Tri-Perspective View for Vision-Based 3D Semantic Occupancy Prediction

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here. Abstract: Modern methods for autonomous driving perception use a bird’s-eye-view (BEV) representation to describe a 3D scene. A tri-perspective view (TPV) representation is proposed to better describe the 3D structure of a scene. A transformer-based TPV encoder (TPVFormer) is used to lift image features to the 3D TPV space. Experiments show that the model can effectively predict the semantic occupancy for all voxels....

February 15, 2023 · 979 words · Yuanhui Huang, Wenzhao Zheng, Yunpeng Zhang, Jie Zhou, Jiwen Lu
The Capacity for Moral Self-Correction in Large Language Models

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: Language models trained with reinforcement learning from human feedback (RLHF) have the capability to “morally self-correct”. Experiments provide evidence of moral self-correction. The capability emerges at 22B model parameters and improves with increasing model size and RLHF training. Language models can follow instructions and learn complex normative concepts of harm like stereotyping, bias, and discrimination. The results are cause for cautious optimism regarding the ability to train language models to abide by ethical principles.

Paper Content

Introduction: Large language models can exhibit harmful social biases. Scaling model size can increase model performance. Hypothesis: larger models may have the capability to morally self-correct. Experiments measure the propensity for large language models to use negative stereotypes or discriminate based on protected demographic attributes. The capacity for moral self-correction emerges at 22B model parameters. Models can be steered to avoid harmful outputs by instructing them to do so. The Bias Benchmark for QA and the Winogender benchmark are used to measure stereotype bias and occupational gender bias, respectively. A new benchmark is developed to test for racial discrimination. Three simple prompt-based interventions are used. Results show bias can be reduced with increasing model size. Models can be steered to use gender pronouns that are uncorrelated or correlated with real-world statistics. Models can discriminate against or in favor of Black students depending on instruction. The capacity for moral self-correction exists in models with more than 22B parameters and sufficient RLHF training.

Related work: GPT-2 and T5 language models can self-diagnose stereotype bias and toxicity. Self-diagnosis accuracy increases with model size. An algorithm for self-debiasing has been proposed. Natural language can be used to reduce bias. RoBERTa-large does not produce less biased outputs when instructed to do so with natural language interventions. Larger models trained with RLHF can produce less biased outputs. Prompting GPT-3 can decrease bias on the BBQ benchmark. Complex reasoning tasks emerge with model size.

Models: RLHF is a popular technique for reducing harmful behaviors in large language models. The amount of RLHF training can significantly change metrics on a wide range of personality, political preference, and harm evaluations.

Experiments: The paper tests the effect of natural language instructions on two moral phenomena: stereotyping and discrimination. It uses two well-known stereotyping benchmarks to measure stereotyping. It constructs a new benchmark to measure discrimination based on race in a law school course admission question.

Bias Benchmark for QA: BBQ is a set of 58,492 questions designed to test for societal biases against people belonging to protected classes....

February 15, 2023 · 921 words · Deep Ganguli, Amanda Askell, Nicholas Schiefer, Thomas Liao, Kamilė Lukošiūtė and 43 others
How to Train Your DRAGON: Diverse Augmentation Towards Generalizable Dense Retrieval

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here. Abstract: Various techniques have been developed to improve dense retrieval (DR). Existing DRs often suffer from effectiveness tradeoffs between supervised and zero-shot retrieval. A generalizable DR can be trained to achieve high accuracy in both supervised and zero-shot retrieval without increasing model size. Common data augmentation practices are often inefficient and sub-optimal. DRAGON is the first BERT-base-sized DR to achieve state-of-the-art effectiveness in both supervised and zero-shot evaluations....

February 15, 2023 · 1078 words · Sheng-Chieh Lin, Akari Asai, Minghan Li, Barlas Oguz, Jimmy Lin and 3 others
Energy Transformer

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here. Abstract: Transformers are the most popular models in machine learning and have achieved impressive performance. The theoretical understanding of transformer building blocks is limited. Dense Associative Memory models have a well-established theoretical foundation but have not achieved impressive practical results. Energy Transformer (ET) replaces the sequence of feedforward transformer blocks with a single large Associative Memory model....

February 14, 2023 · 1282 words · Benjamin Hoover, Yuchen Liang, Bao Pham, Rameswar Panda, Hendrik Strobelt and 3 others