arxiv-summary: AI-summarized AI papers

A System-Level View on Out-of-Distribution Data in Robotics

Abstract: Testing conditions can affect the reliability of black-box learned components in robot autonomy. Coping with out-of-distribution (OOD) data is an important challenge for trustworthy learning-enabled open-world autonomy. This paper aims to demystify OOD data and its associated challenges in the context of data-driven robotic systems. Reasoning about the overall system-level competence of a robot in OOD conditions is important....

Feature learning in neural networks and kernel machines that recursively learn features

Abstract: Neural networks have achieved impressive results on many tasks. This paper connects neural feature learning to the average gradient outer product. The paper introduces Recursive Feature Machines (RFMs), which are kernel machines that learn features. RFMs accurately capture features learned by deep fully connected neural networks. RFMs close the gap between kernel machines and fully connected networks....
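The average gradient outer product mentioned in this summary can be illustrated with a small sketch. It averages the outer product of a predictor's input gradient over a set of points. The finite-difference gradient, the toy predictor, and all function names here are assumptions made for illustration; they are not taken from the paper.

```python
def numerical_gradient(f, x, eps=1e-6):
    """Central-difference estimate of the gradient of f at point x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[i] += eps
        x_minus[i] -= eps
        grad.append((f(x_plus) - f(x_minus)) / (2 * eps))
    return grad


def average_gradient_outer_product(f, points):
    """Average of grad f(x) grad f(x)^T over the given points."""
    d = len(points[0])
    agop = [[0.0] * d for _ in range(d)]
    for x in points:
        g = numerical_gradient(f, x)
        for i in range(d):
            for j in range(d):
                agop[i][j] += g[i] * g[j] / len(points)
    return agop


def toy_predictor(x):
    # Hypothetical predictor that only depends on the first coordinate.
    return 3.0 * x[0]


points = [[0.0, 1.0], [1.0, -2.0], [2.0, 0.5]]
agop = average_gradient_outer_product(toy_predictor, points)
```

In this toy case the matrix has essentially all of its mass in the entry for the first coordinate, which is the only input the predictor uses. That is the sense in which such a matrix can indicate which input directions matter.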

Sparse Coding in a Dual Memory System for Lifelong Learning

Abstract: Humans have neurophysiological mechanisms that enable efficient continual learning. The brain encodes information in non-overlapping sparse codes to facilitate learning of new associations. DNNs use activation sparsity and dropout to mimic sparse coding. A multiple-memory replay mechanism maintains a long-term semantic memory.
Paper Content
Introduction: Standard DNNs are not designed for lifelong learning and exhibit catastrophic forgetting of previously learned knowledge....

NeRN -- Learning Neural Representations for Neural Networks

Abstract: Neural representations can be used to reconstruct a wide range of signals. NeRN is a neural representation for neural networks. Coordinates are assigned to each convolutional kernel in the network. A smoothness constraint is used to aid NeRN. Knowledge distillation is used to stabilize the learning process. NeRN is demonstrated on CIFAR-10, CIFAR-100, and ImageNet....

Building a Culture of Reproducibility in Academic Research

Abstract: Reproducibility is an ideal that all researchers agree with. Reproducibility is difficult to achieve in practice. The author’s research group has had success building a “culture of reproducibility”. Reproducibility efforts should yield easy-to-use, well-packaged, and self-contained software artifacts. The primary beneficiaries of reproducibility efforts are those making the investments. Social processes and standardized tools are important for achieving reproducibility. The dogfood principle ties these ideas together.
Paper Content
Introduction: Appeal to self-interest instead of altruism. Engineer social processes to promote virtuous cycles. Build standardized tools to reduce technical barriers. Reproducibility is an ideal that no researcher would dispute. Today, it’s expected that each paper is accompanied by a code repository. Many researchers make model checkpoints publicly available. Voorhees et al....

A Generalization of ViT/MLP-Mixer to Graphs

Abstract: GNNs have potential in graph representation learning. Standard GNNs have two major limitations. ViT/MLP-Mixer architectures can solve these limitations but increase computational cost. Graph MLP-Mixer captures long-range dependency and mitigates over-squashing. Graph MLP-Mixer is faster and more memory efficient than related models. Graph MLP-Mixer is highly expressive and can distinguish non-isomorphic graphs.
Paper Content
Generalizing ViT/MLP-Mixer to graphs overcomes MP-GNN limitations. The MLP-Mixer architecture is designed to capture long-range interaction while keeping low computational cost. Generalizing MLP-Mixer from grids and sequences to arbitrary graph topology is challenging. The main contribution is to design a novel GNN architecture that captures long-range interaction, keeps low computational complexity, and is isomorphically expressive. GNNs have linear learning/inference complexities but low representation power and poor long-range dependency. Graph MLP-Mixer overcomes the computational bottleneck of Graph Transformers and solves the issue of long-distance dependency. Competitive results on multiple benchmarks. Capacity to capture long-range dependency with SOTA performance while keeping low complexity. Forms a bridge between CV, NLP and graphs under a unified architecture.
Generalization challenges: MLP-Mixer is adapted from images to graphs. Table 1 summarizes the differences between standard MLP-Mixer and Graph MLP-Mixer. Graphs cannot be uniformly divided into similar patches across all examples in the dataset. Graph patches need to be transformed into a fixed-length vectorial representation. Graph patches are unordered and nodes in graph tokens are naturally unordered. MLP-Mixer architectures are known to be strong overfitters.
Overview: Graph MLP-Mixer is a computer science architecture. Graph MLP-Mixer is composed of a patch extraction module, patch embedding module, mixer layers, global average pooling layer, and a fully-connected layer. Graphs are represented by a set of nodes (V) and edges (E). Graphs have a pre-defined number of patches (P). Graph-level vectorial representation (h_G) and graph-level target (y_G) are used for prediction.
Patch extraction: MLP-Mixer can be generalized to graphs by extracting patches....

The Forward-Forward Algorithm: Some Preliminary Investigations

Abstract: The aim of the paper is to introduce a new learning procedure for neural networks. The procedure replaces the forward and backward passes of backpropagation with two forward passes. Each layer has its own objective function: to have high goodness for positive data and low goodness for negative data. Negative passes could be done offline, making learning simpler and allowing video to be pipelined.
Paper Content
The forward-forward algorithm: The Forward-Forward algorithm is a greedy multi-layer learning procedure. It replaces the forward and backward passes of backpropagation with two forward passes. The positive pass operates on real data and adjusts weights to increase goodness in hidden layers. The negative pass operates on “negative data” and adjusts weights to decrease goodness in hidden layers. The goodness function is the sum of squared neural activities. The aim is to correctly classify input vectors as positive or negative data. Layer normalization removes the length of each hidden layer's activity vector before it is passed to the next layer, so the second hidden layer cannot simply reuse the first layer's goodness. The FF algorithm works in small neural networks with a few million connections.
The backpropagation baseline: Experiments use the MNIST dataset of handwritten digits. 50,000 images are used for training and 10,000 for validation. 10,000 images are used to compute the test error rate. Simple neural networks trained with backpropagation get 0....
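The goodness measure described in this summary (the sum of squared activities in a layer) can be sketched in a few lines. The logistic link to a probability, the `threshold` parameter, and the length normalization by the vector norm are assumptions added for illustration; the summary itself only states that goodness should be high for positive data and low for negative data.

```python
import math


def goodness(activities):
    """Sum of squared neural activities in one layer."""
    return sum(a * a for a in activities)


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def probability_positive(activities, threshold):
    # Assumption: goodness above the threshold means "positive data".
    return sigmoid(goodness(activities) - threshold)


def normalize_length(activities, eps=1e-8):
    # Assumption: normalization divides by the vector length, so only the
    # direction of the activity vector reaches the next layer.
    norm = math.sqrt(goodness(activities))
    return [a / (norm + eps) for a in activities]


layer_output = [3.0, 4.0]
print(goodness(layer_output))                    # 25.0
print(probability_positive(layer_output, 25.0))  # 0.5
print(goodness(normalize_length(layer_output)))  # close to 1.0
```

After normalization the goodness is close to 1 for any nonzero activity vector, which illustrates why a later layer cannot read off the previous layer's goodness from its input.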

Structure-based drug discovery with deep learning

Abstract: AI in the form of deep learning has potential for drug discovery and chemical biology. AI can be used to predict protein structure and molecular bioactivity, plan organic synthesis, and design molecules. Deep learning efforts in drug discovery have focused on ligand-based approaches. Structure-based drug discovery (SBDD) has potential to tackle unsolved challenges. Advances in deep learning methodologies and the availability of accurate predictions for protein tertiary structure advocate for a renaissance in structure-based approaches for drug discovery guided by AI.
Paper Content
Introduction: Deep learning is a subfield of AI. It uses multi-layer neural networks. It has been used to advance mathematics, investigate galaxies, and generate realistic images. Chemistry and biology have seen AI breakthroughs in protein structure prediction, chemical synthesis planning, and atomistic simulations. Drug discovery has benefited from deep learning. Deep learning can accelerate navigation of the chemical space of drug-like molecules. Most deep learning studies have focused on ligand-based approaches. Structure-based deep learning approaches have not found parallel interest yet. Deep learning can help address existing drug discovery challenges. Deep learning does not require explicit feature engineering. Accurate protein structure prediction efforts can accelerate computer-assisted SBDD. Deep learning for SBDD is still in its infancy but is moving forward quickly.
Representing proteins for deep learning: Deep learning approaches for SBDD are more intricate than ligand-based approaches....

Fully Differentiable RANSAC

Abstract: Proposes a fully differentiable ∇-RANSAC. Predicts inlier probabilities of input data points. Exploits predictions in a guided sampler. Estimates model parameters and quality while propagating gradients. The random sampler is based on a Gumbel Softmax sampler. The model quality function marginalizes over scores from all models. Unlocks end-to-end training of geometric estimation pipelines. Trained with LoFTR to find reliable correspondences. Tested on real-world datasets for fundamental and essential matrix estimation. Superior to the state of the art in terms of accuracy and speed.
Paper Content
Introduction: Direct optimization on the test-time evaluation metric has been beneficial for deep learning in vision tasks. Training a model directly on the evaluation metric is infeasible when the metric is nondifferentiable, so training with a surrogate of the metric is used. Examples of surrogate losses include average precision and recall@k for image retrieval, perceptual loss for image compression, intersection-over-union loss for object detection, and edit distance loss for scene text recognition. RANSAC is widely used for robust estimation in vision pipelines. RANSAC variants have been proposed to improve components of the original algorithm. ∇-RANSAC is proposed to make RANSAC end-to-end differentiable. ∇-RANSAC allows robust estimators to use test-time evaluation metrics to optimize end-to-end training. ∇-RANSAC is trained with a detector-free feature matcher, LoFTR, to improve accuracy.
Fully differentiable RANSAC: The input is a set of tentative point correspondences with extra information from the detector and matcher. Consensus learning via pruning block from recent [89]. ∇-RANSAC is an iterative random sampling of m data points. The sampler is a Gumbel Softmax sampler using input probabilities as guidance. A differentiable minimal solver estimates model parameters from the drawn sample. Model quality is computed in a supervised way using ground truth.
Gumbel softmax sampler: ∇-RANSAC requires sampling m data points from a set of n total samples....
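A rough sketch of guided sampling with Gumbel noise, as described in this summary, follows. It perturbs the log of each point's predicted inlier probability with Gumbel noise, forms softmax weights at a given temperature, and picks the m highest perturbed scores as the minimal sample. Taking the top m perturbed scores is a Gumbel top-k selection swapped in here for illustration; it is an assumption, not necessarily the paper's sampler. The function names, the temperature value, and the example probabilities are also hypothetical. Plain Python computes only the forward pass, so no gradients are propagated.

```python
import math
import random


def gumbel_noise(rng):
    """Draw one sample from the standard Gumbel distribution."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # avoid log(0)
    return -math.log(-math.log(u))


def sample_minimal_set(probs, m, temperature=0.5, rng=None):
    """Return m distinct indices and the relaxed (softmax) weights."""
    rng = rng or random.Random()
    # Perturb log-probabilities with Gumbel noise.
    perturbed = [math.log(max(p, 1e-12)) + gumbel_noise(rng) for p in probs]
    # Temperature-scaled softmax over the perturbed scores.
    scaled = [v / temperature for v in perturbed]
    top = max(scaled)
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    soft_weights = [e / total for e in exps]
    # Hard selection: indices of the m largest perturbed scores.
    order = sorted(range(len(probs)), key=lambda i: perturbed[i], reverse=True)
    return order[:m], soft_weights


# Hypothetical inlier probabilities for n = 5 correspondences.
inlier_probs = [0.05, 0.40, 0.05, 0.30, 0.20]
indices, weights = sample_minimal_set(inlier_probs, m=2, rng=random.Random(0))
print(indices, weights)
```

Points with higher predicted inlier probability are selected more often, while the noise keeps the draw random. In a differentiable framework the softmax weights would be the quantity that carries gradients back to the predicted probabilities.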

Large Language Models Encode Clinical Knowledge

Abstract: LLMs have been used for natural language understanding and generation. There is no standard to evaluate model predictions and reasoning across tasks. MultiMedQA is a benchmark combining existing open question answering datasets. Human evaluation is proposed to assess model answers. PaLM and Flan-PaLM are evaluated on MultiMedQA and achieve state-of-the-art accuracy. Instruction prompt tuning is introduced to align LLMs to new domains....