arxiv-summary: AI-summarized AI papers

Regularised neural networks mimic human insight

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: Humans can show sudden improvements in task performance that are linked to insight. Artificial neural networks can also show insight-like behaviour. In neural networks, insight-like behaviour is caused by noise, attentional gating and regularisation.

Paper Content

Introduction: The ability to learn from experience is common to animals and some artificial agents. Neural networks trained with stochastic gradient descent (SGD) are a current theory of human learning, yet humans may sometimes learn in an abrupt manner. Insights occur when an agent finds a novel solution to a problem by restructuring an existing task representation. Insights involve unconscious processes becoming conscious, can be accompanied by a feeling of relief or pleasure, and are related to brain regions distinct from those associated with gradual learning. Insight-like behaviour can nevertheless emerge from gradual learning algorithms. Insights trigger abrupt behavioural changes, occur selectively in some subjects, and occur "spontaneously" without external cues. Regularisation and gating can cause discontinuities in learning in neural networks.

Results: 99 participants and 99 neural networks performed a decision task that required a binary choice about circular arrays of moving dots. The dots were characterised by two features (motion direction and colour) with different degrees of noise. The task provided a hidden opportunity to improve one's decision strategy: in the initial training phase only motion direction predicted the correct choice, while in a later phase both features could be used to determine the choice. A post-experimental questionnaire asked participants whether they had noticed a rule, how long it took them to notice it, and whether they had paid attention to colour.

Human behaviour: Participants learned the response mapping for the four motion directions well. Noise was added to the motion, while the colour remained uncorrelated with the correct choice. Performance was heavily diminished in the conditions with the largest amounts of motion noise, so performance improvements well beyond these low baseline levels can only be attributed to colour use. Noise level continued to influence performance in the motion-and-colour phase. The onset of the colour correlation triggered performance improvements across all coherence levels. 57....
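The statement that regularisation can make learning discontinuous can be illustrated with a toy example that is not the paper's model. Below, a single hypothetical "gate" weight is trained by proximal gradient descent with an L1 penalty (soft-thresholding). The pull towards opening the gate grows slowly, but the weight stays exactly at zero until that pull exceeds the penalty, and only then starts to grow. The learning rate, penalty strength and the linearly growing pull are made-up values chosen for illustration.

```python
from typing import List


def soft_threshold(value: float, threshold: float) -> float:
    """Proximal operator of threshold * |w|: shrink towards zero and snap small values to 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def train_gate(steps: int = 60, lr: float = 0.1, l1: float = 0.45) -> List[float]:
    """Proximal gradient descent on one L1-regularised gate weight.

    Hypothetical setup: the negative loss gradient on the gate grows linearly with
    the step index, a stand-in for an input feature gradually becoming useful.
    """
    w = 0.0
    history = []
    for step in range(steps):
        pull = 0.02 * step                          # hypothetical -dLoss/dw at this step
        w = soft_threshold(w + lr * pull, lr * l1)  # gradient step, then L1 prox step
        history.append(w)
    return history


history = train_gate()
first_open = next(i for i, w in enumerate(history) if w > 0.0)
print("gate weight is exactly 0 until step", first_open)
```

With these values the gate weight is exactly zero for steps 0 to 22, even though the pull is non-zero from step 1, and first leaves zero at step 23.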

On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: ChatGPT is a chatbot service released by OpenAI, but its robustness is unclear. This paper evaluates it from the adversarial and out-of-distribution (OOD) perspectives. Results show that ChatGPT has consistent advantages on most adversarial and OOD classification tasks, although its performance is far from perfect. ChatGPT performs well on translation tasks and provides informal suggestions for medical tasks.

Paper Content

Introduction: Large language models (LLMs) have achieved significant performance on NLP tasks and have in-context learning capability. ChatGPT is a chatbot service released by OpenAI that has attracted over 100 million users, so evaluating the potential risks behind it is important. Robustness refers to the ability to withstand disturbances or external factors; robustness threats include OOD samples, adversarial inputs, long-tailed samples and noisy inputs. This paper evaluates ChatGPT's adversarial and OOD robustness using zero-shot evaluation. Results show that ChatGPT has a consistent advantage on adversarial and OOD classification tasks, but its performance is far from perfect, indicating room for improvement.

Background: Foundation models are used for natural language processing tasks. ChatGPT is a generative foundation model in the GPT-3....

$PC^2$: Projection-Conditioned Point Cloud Diffusion for Single-Image 3D Reconstruction

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: Reconstructing a 3D shape from a single RGB image is a challenging problem in computer vision. The proposed method generates a sparse point cloud via a conditional denoising diffusion process, taking a single RGB image and its camera pose as input. A projection conditioning process enables high-resolution sparse geometries that are well aligned with the input image. The method can generate multiple different shapes consistent with a single input image....
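The projection conditioning idea can be pictured as follows: each 3D point is projected into the input image using the known camera, and the image feature at that pixel is attached to the point. The sketch below assumes a simple pinhole camera with hypothetical intrinsics (fx, fy, cx, cy) and an identity pose, and uses nearest-pixel lookup into a toy scalar feature map; the paper's actual camera model, image features and sampling scheme may differ.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def project_point(p: Point, fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float]:
    """Pinhole projection of a camera-frame point (assumes z > 0) to pixel coordinates (u, v)."""
    x, y, z = p
    return fx * x / z + cx, fy * y / z + cy


def gather_projected_features(
    points: Sequence[Point],
    feature_map: List[List[float]],  # feature_map[row][col], one scalar per pixel here
    fx: float, fy: float, cx: float, cy: float,
) -> List[float]:
    """Attach to each point the feature of the pixel it projects onto (0.0 if outside)."""
    height, width = len(feature_map), len(feature_map[0])
    features = []
    for p in points:
        u, v = project_point(p, fx, fy, cx, cy)
        col, row = round(u), round(v)  # nearest-pixel lookup (no bilinear interpolation)
        if 0 <= row < height and 0 <= col < width:
            features.append(feature_map[row][col])
        else:
            features.append(0.0)
    return features


# Toy 4x4 "feature map" whose value encodes the pixel index (row * 4 + col).
fmap = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
pts = [(0.0, 0.0, 1.0), (1.0, 1.0, 2.0), (10.0, 0.0, 1.0)]
print(gather_projected_features(pts, fmap, fx=2.0, fy=2.0, cx=1.0, cy=1.0))
```

In a diffusion model these per-point features would be fed to the denoiser at every step, so the generated geometry stays aligned with the image; that part is not modelled here.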

RealFusion: 360° Reconstruction of Any Object from a Single Image

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: The paper addresses the problem of reconstructing a full 360° photographic model of an object from a single image. Fitting a neural radiance field to the image is severely ill-posed. Using an approach inspired by DreamFields and DreamFusion, the method fuses the given input view, the conditional prior, and other regularizers into a final, consistent reconstruction. The paper demonstrates state-of-the-art reconstruction results on benchmark images....
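Fitting a neural radiance field means rendering pixels from the field and comparing them with images. A minimal sketch of the standard NeRF volume-rendering quadrature, which composites per-sample densities and colours along one ray, follows. The sample densities, colours and spacings are made-up values, and RealFusion's full objective (the input-view reconstruction plus the diffusion-based prior and other regularizers) is not shown.

```python
import math
from typing import List, Sequence, Tuple


def composite_ray(
    densities: Sequence[float],
    colors: Sequence[Tuple[float, float, float]],
    deltas: Sequence[float],
) -> Tuple[Tuple[float, float, float], List[float]]:
    """NeRF-style alpha compositing along one ray.

    alpha_i = 1 - exp(-sigma_i * delta_i), T_i = prod_{j<i} (1 - alpha_j),
    pixel colour = sum_i T_i * alpha_i * c_i.
    """
    transmittance = 1.0
    weights = []
    rgb = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weight = transmittance * alpha
        weights.append(weight)
        for k in range(3):
            rgb[k] += weight * color[k]
        transmittance *= 1.0 - alpha  # fraction of light surviving past this sample
    return (rgb[0], rgb[1], rgb[2]), weights


# Hypothetical ray: empty space, then a dense red surface, then a blue sample behind it.
color, w = composite_ray(
    densities=[0.0, 50.0, 50.0],
    colors=[(0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)],
    deltas=[0.1, 0.1, 0.1],
)
print(color, sum(w))
```

The dense red sample absorbs almost all of the ray, so the pixel is nearly pure red and the blue sample behind it contributes very little.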

Optical Transformers

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: There is renewed and growing interest in alternatives to digital computers that reduce the energy cost of running neural networks. Optical matrix-vector multipliers are best suited to performing computations with large operands. Small-scale optical experiments with a prototype accelerator demonstrate that Transformer operations can run on optical hardware, and simulations and experiments explored the energy efficiency of optical implementations of Transformers. The optical energy per multiply-accumulate (MAC) scales as $\frac{1}{d}$, where $d$ is the Transformer width. With well-engineered, large-scale optical hardware, it may be possible to achieve a $100 \times$ energy-efficiency advantage for running some of the largest current Transformer models. Assumptions about future improvements to electronics and Transformer quantization techniques could grow optical computers' advantage to $>100,000\times$.

Paper Content

Introduction: Deep learning models are becoming increasingly large, leading to concerns about energy usage, speed, and practicality....
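The $\frac{1}{d}$ scaling can be made concrete with a back-of-envelope model. Assume, as a simplification not taken from the paper, that each of the $d$ outputs of a $d \times d$ optical matrix-vector product must receive a fixed amount of detected optical energy $E_{\text{out}}$ (for example, to reach a target signal-to-noise ratio), independent of $d$. The product performs $d^2$ MACs, so the optical energy per MAC is $d E_{\text{out}} / d^2 = E_{\text{out}} / d$. The 1 fJ per output used below is a made-up number.

```python
def optical_energy_per_mac(width: int, energy_per_output_j: float) -> float:
    """Optical energy per MAC for a width x width matrix-vector multiply.

    Hypothetical model: each of the `width` outputs needs a fixed detected
    optical energy `energy_per_output_j`, and the product performs width**2 MACs.
    """
    total_energy = width * energy_per_output_j
    macs = width * width
    return total_energy / macs


for d in (512, 1024, 2048):
    # Doubling the width halves the optical energy spent per MAC.
    print(d, optical_energy_per_mac(d, 1e-15))
```

Under this model, wider Transformers amortise the fixed per-output energy over more MACs, which is why large operands favour optical hardware.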

Meta-World Conditional Neural Processes

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: The paper proposes a model called Meta-World Conditional Neural Processes (MW-CNP), which enables an agent to sample from its own "hallucination". The aim is to reduce the agent's interaction with the target environment at test time. A latent representation of the transition dynamics is obtained from a single rollout in the test environment, and few-shot learning is performed by interacting with the "hallucination" generated by the meta-world model. The agent can adapt to an unseen target environment with fewer samples than baselines, and it does not have access to the task parameters during training or testing. MW-CNP is trained on offline interaction data logged during meta-training.

Paper Content

Introduction: Fast adaptation under uncertainty has various real-world applications, and robustness in these settings requires a diverse set of skills. World model generation is an integral part of open-ended learning, and meta-reinforcement learning is a promising direction for increasing robustness. The authors propose generating world models from the agent's experience to improve sample efficiency.

Related work: Meta-learning provides a framework for learning new tasks more efficiently from limited experience. The meta-RL framework assumes that tasks share a common structure but differ in their transition dynamics. Context-based meta-RL methods assume that each task can be represented by a low-dimensional context variable; memory-augmented models and expert demonstrations are used to learn such a contextual representation. MAML is a gradient-based meta-learning algorithm, and NORML is an extension of MAML-RL for settings where the environment dynamics change. The CNP model is used to represent a family of functions using Bayesian inference.

Preliminaries, conditional neural processes: The CNP architecture consists of parameter-sharing encoders and a decoder (the query network). Random input and true-output pairs are sampled from a function, and the encoder networks turn each pair into a latent representation. The representations are averaged so that the result is invariant to the order of the pairs. The target input query is concatenated with the average representation and fed to the query network, which outputs the predicted mean and standard deviation for the queried input.

Preliminaries, no-reward meta learning: NORML learns a pseudo-advantage function to guide meta-policy adaptation. The pseudo-advantage function is used to compute task-specific parameters from a set of state transitions without a reward signal.

Proposed method, meta-world conditional neural process (MW-CNP): Problem setup: few-shot learning, where the goal is to quickly adapt to an unseen target task using few labelled data from the target environment. MW-CNP: reduce the number of samples required from the target environment by generating world models from fewer samples and using them to obtain inexpensive rollouts for finetuning at test time. Offline meta-learning: the transitions for each task are stored as a set of observations without the task parameter. MW-CNP training: unlabelled batches of Markov decision process (MDP) tuples collected during online meta-learning are used. MW-CNP inference: a latent representation of the hidden environment transition function is obtained and concatenated with the query $[s_q, a_q]$ to predict the distribution parameters of the next state, which are then used to generate rollouts. Finetuning: experiences from the generated world model are fed to the learned advantage function, and the meta-policy is finetuned for fast adaptation to the target task using the estimated advantage values and the combined set of MW-CNP-generated rollouts and the single target-environment rollout.

Experiments: MW-CNP requires less interaction with the environment than NORML and can adapt quickly to unseen tasks.

2D point agent with an unknown artificial force field: The goal of the point agent is to move to position [x=1, y=0] on a 2D plane. A meta-RL setting is used with the same reward function across multiple tasks, and different tasks are created by generating different artificial force fields. 5000 tasks are defined over the [-π, π] interval; the agent is initially trained across this distribution of 5000 tasks and then tested in an unseen target task. The Oracle agent uses 25 actual rollouts from the target environment, while NORML and MW-CNP use 1 rollout for finetuning the meta-policy. MW-CNP outperforms NORML when using the same number of actual rollouts. MW-CNP uses 25 mixed rollouts (generated rollouts plus the single real one), the same number of rollouts as the Oracle NORML agent. Samples generated by MW-CNP can be used for finetuning the meta-policy. Results are not symmetric across meta-tasks due to gradient bias.

Walker-2D with randomized agent dynamics parameters: MW-CNP was evaluated in the Walker-2D-Rand-Params environment, with parameters sampled from a uniform distribution over a range. 40 tasks were sampled for meta-training and 100 for meta-testing. Figure 9 shows the post-update reward in meta-testing, and Table 2 shows increased sample efficiency and meta-test adaptation performance.

Conclusion: The MW-CNP framework can be used to collect meaningful hallucinated rollouts. MW-CNP's performance matched the Oracle and outperformed it in complex tasks. The MW-CNP model generated samples from fewer MDP tuples, increasing sample efficiency, and using generated data for meta-updates increases sample efficiency and can yield superior performance. Extending the work to high-dimensional sensorimotor spaces is an interesting research direction.
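The CNP forward pass described under Preliminaries (a shared encoder applied to each context pair, averaging, concatenation with the query, and a query network that outputs a mean and a standard deviation) can be sketched in plain Python. The one-layer "networks" with hand-set weights are hypothetical stand-ins for trained MLPs. In MW-CNP the context pairs would be MDP transitions and the query would be the state-action pair $[s_q, a_q]$; this sketch uses scalar inputs and outputs instead.

```python
import math
from typing import List, Sequence, Tuple


def relu(x: float) -> float:
    return max(0.0, x)


def encoder(x: float, y: float, weights: Sequence[Tuple[float, float, float]]) -> List[float]:
    """Shared encoder: maps one (input, output) context pair to a latent vector."""
    return [relu(a * x + b * y + c) for a, b, c in weights]


def cnp_predict(
    context: Sequence[Tuple[float, float]],
    x_query: float,
    enc_weights: Sequence[Tuple[float, float, float]],
    dec_mean_w: Sequence[float],
    dec_std_w: Sequence[float],
) -> Tuple[float, float]:
    """Return (mean, std) of the predicted output at x_query given the context pairs."""
    latents = [encoder(x, y, enc_weights) for x, y in context]
    # Averaging makes the representation invariant to the order of the context pairs.
    r = [sum(col) / len(latents) for col in zip(*latents)]
    features = [x_query] + r + [1.0]  # query concatenated with representation, plus bias
    mean = sum(w * f for w, f in zip(dec_mean_w, features))
    raw = sum(w * f for w, f in zip(dec_std_w, features))
    std = 0.01 + math.log1p(math.exp(raw))  # softplus (plus a floor) keeps the std positive
    return mean, std


# Hypothetical hand-set weights (a trained CNP would learn these).
ENC = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]  # 2-dimensional latent
MEAN_W = [0.5, 0.0, 1.0, 0.0]             # weights over [x_query, r1, r2, bias]
STD_W = [0.0, 0.0, 0.0, -1.0]

ctx = [(1.0, 2.0), (2.0, 4.0)]
print(cnp_predict(ctx, 3.0, ENC, MEAN_W, STD_W))
print(cnp_predict(list(reversed(ctx)), 3.0, ENC, MEAN_W, STD_W))  # same result
```

Because the context is summarised by a mean, shuffling the context pairs leaves the prediction unchanged, which is the invariance the summary refers to.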

Over-Parameterization Exponentially Slows Down Gradient Descent for Learning a Single Neuron

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: The paper revisits the problem of learning a single neuron with ReLU activation under Gaussian input with square loss, focusing on the over-parameterization setting, where the student network has $n\ge 2$ neurons. It proves global convergence of randomly initialized gradient descent with an $O\left(T^{-3}\right)$ rate and presents an $\Omega\left(T^{-3}\right)$ lower bound for randomly initialized gradient flow in the over-parameterization setting....
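The setting can be simulated at a small scale. The sketch assumes a teacher neuron $y = \mathrm{ReLU}(v^\top x)$ with standard Gaussian inputs and an over-parameterized student $f(x) = \sum_{i=1}^{n} \mathrm{ReLU}(w_i^\top x)$ with $n = 2$, trained by full-batch gradient descent on the square loss. The dimension, sample size, step size, initialization scale and teacher direction are arbitrary choices; it uses a finite sample rather than the population loss analysed in the paper, and it only shows the loss decreasing, not the $T^{-3}$ rate.

```python
import random
from typing import List

random.seed(0)
DIM, N_STUDENT, N_SAMPLES = 3, 2, 200


def relu(z: float) -> float:
    return max(0.0, z)


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


teacher = [1.0, 0.0, 0.0]  # hypothetical teacher direction v
xs = [[random.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(N_SAMPLES)]
ys = [relu(dot(teacher, x)) for x in xs]

# Small random initialization of the n >= 2 student neurons.
students = [[random.gauss(0.0, 0.1) for _ in range(DIM)] for _ in range(N_STUDENT)]


def loss(ws: List[List[float]]) -> float:
    """Empirical square loss 1/(2m) * sum_j (f(x_j) - y_j)^2 with f(x) = sum_i ReLU(w_i . x)."""
    total = 0.0
    for x, y in zip(xs, ys):
        pred = sum(relu(dot(w, x)) for w in ws)
        total += (pred - y) ** 2
    return total / (2 * len(xs))


def gd_step(ws: List[List[float]], lr: float) -> List[List[float]]:
    """One full-batch gradient descent step; ReLU'(z) is taken as 1[z > 0]."""
    grads = [[0.0] * DIM for _ in ws]
    for x, y in zip(xs, ys):
        residual = sum(relu(dot(w, x)) for w in ws) - y
        for g, w in zip(grads, ws):
            if dot(w, x) > 0:
                for k in range(DIM):
                    g[k] += residual * x[k] / len(xs)
    return [[wk - lr * gk for wk, gk in zip(w, g)] for w, g in zip(ws, grads)]


initial = loss(students)
for _ in range(200):
    students = gd_step(students, lr=0.5)
print(f"loss: {initial:.4f} -> {loss(students):.4f}")
```

Tracking `loss(students)` at every step and plotting it on log-log axes would be one way to probe the slow polynomial convergence empirically.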

A Large Scale Homography Benchmark

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: The Pi3D dataset consists of roughly 1000 planes observed in 10,000 images from the 1DSfM dataset. The HEB dataset consists of 226,260 homographies and 4M correspondences. Applications of Pi3D include training and evaluating monocular depth, surface normal estimation and image matching algorithms. HEB is used to evaluate robust estimators and deep learning-based correspondence filtering methods.

Paper Content

Introduction: A planar homography is a projective mapping between images of co-planar 3D points. It encodes the intrinsic and extrinsic camera parameters and the parameters of the underlying 3D plane. Homographies play an important role in multiple view geometry, calibration, metric rectification, augmented reality, optical flow, video stabilization, and Structure-from-Motion. The traditional approach to finding homographies in image pairs consists of two stages: feature point detection and matching, followed by robust estimation. Existing datasets for evaluating homography estimators are Homogr, ExtremeView, and HPatches. The proposed datasets are Pi3D, consisting of 1046 large planes in 3D, and HEB, containing 226,260 homographies. The data can also be used to evaluate the uncertainty of partially or fully affine covariant feature detectors.

Planes in 3D dataset: a dataset of 3D planes in scenes consisting of thousands of real-world photos.

Homography evaluation benchmark: Tentative correspondences are obtained from mutually nearest RootSIFT matches, and the input information is a set of N correspondences. The dataset is split into two disjoint parts: a training set (2 scenes) and a test set (9 scenes). 80% of homographies have at most 0....
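The benchmark's tentative correspondences come from mutually nearest RootSIFT matches. A minimal sketch of both ingredients follows. It assumes the common RootSIFT recipe (L1-normalise a non-negative SIFT descriptor, then take the element-wise square root) and brute-force Euclidean matching; the toy 4-dimensional "descriptors" are made up, whereas real SIFT descriptors have 128 dimensions.

```python
import math
from typing import List, Sequence, Tuple


def root_sift(desc: Sequence[float], eps: float = 1e-12) -> List[float]:
    """RootSIFT: L1-normalise a non-negative descriptor, then take element-wise square roots.

    The result has unit L2 norm; comparing RootSIFT vectors with Euclidean distance
    corresponds to using the Hellinger kernel on the original descriptors.
    """
    total = sum(abs(v) for v in desc) + eps
    return [math.sqrt(v / total) for v in desc]


def nearest(query: Sequence[float], pool: Sequence[Sequence[float]]) -> int:
    """Index of the pool vector closest to query in Euclidean distance."""
    return min(range(len(pool)), key=lambda j: math.dist(query, pool[j]))


def mutual_nn_matches(
    a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]
) -> List[Tuple[int, int]]:
    """Keep (i, j) only if b[j] is the nearest neighbour of a[i] and vice versa."""
    matches = []
    for i, da in enumerate(a):
        j = nearest(da, b)
        if nearest(b[j], a) == i:
            matches.append((i, j))
    return matches


img1 = [root_sift(d) for d in ([4, 0, 0, 0], [0, 3, 1, 0], [1, 1, 1, 1])]
img2 = [root_sift(d) for d in ([0, 6, 2, 0], [9, 1, 0, 0])]
print(mutual_nn_matches(img1, img2))
```

Here the third descriptor in the first image has a nearest neighbour in the second image, but that neighbour prefers a different descriptor, so the mutual check discards it.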

TBPos: Dataset for Large-Scale Precision Visual Localization

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: Image-based localization is a computer vision challenge. Datasets consist of a 3D database and query images, which are usually acquired with different cameras, and ground-truth poses between the query images and the 3D database are hard to acquire. This paper proposes a dataset with accurate ground-truth poses and evaluates it with an image-based localization pipeline.

Paper Content

Introduction: Image-based localization is used for autonomous vehicles, augmented reality, and robotics, and advances in deep learning have improved its precision. There are two main directions of development for increasing the level of challenge for state-of-the-art visual localization algorithms: introducing more challenging datasets and reducing the pose correctness threshold. Accurate ground truth is necessary for assessing pose correctness. The proposed dataset uses 3D laser scanner data to generate query images and exact ground-truth poses, and it is benchmarked using a visual localization pipeline. Results show that view synthesis enables generating challenging query images from viewpoints where traditional query image acquisition could not provide reliable ground truth.

Datasets: Several datasets have been published for large-scale visual localization, and the ground-truth poses for their query images have been acquired in different ways. Aachen Day-Night dataset v1....
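Pose correctness in visual localization is commonly judged by thresholding the position error and the rotation error between an estimated and a ground-truth pose. The sketch below computes both, using the standard relative-rotation angle $\theta = \arccos\big((\operatorname{tr}(R_{\text{est}}^\top R_{\text{gt}}) - 1)/2\big)$. The thresholds (0.25 m and 2°) are illustrative assumptions, not the TBPos evaluation protocol.

```python
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def rotation_error_deg(r_est: Matrix, r_gt: Matrix) -> float:
    """Angle in degrees of the relative rotation R_est^T R_gt."""
    # trace(R_est^T R_gt) equals the sum of element-wise products of the two matrices.
    trace = sum(r_est[i][j] * r_gt[i][j] for i in range(3) for j in range(3))
    cos_angle = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp for numerical safety
    return math.degrees(math.acos(cos_angle))


def position_error_m(t_est: Sequence[float], t_gt: Sequence[float]) -> float:
    """Euclidean distance between estimated and ground-truth camera positions."""
    return math.dist(t_est, t_gt)


def pose_is_correct(r_est: Matrix, t_est: Sequence[float], r_gt: Matrix, t_gt: Sequence[float],
                    max_pos_m: float = 0.25, max_rot_deg: float = 2.0) -> bool:
    """A pose counts as correct if both errors are within the (hypothetical) thresholds."""
    return (position_error_m(t_est, t_gt) <= max_pos_m
            and rotation_error_deg(r_est, r_gt) <= max_rot_deg)


def rot_z(deg: float) -> List[List[float]]:
    """Rotation matrix about the z axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


print(rotation_error_deg(rot_z(10.0), rot_z(11.5)))
print(pose_is_correct(rot_z(10.0), [0.0, 0.0, 0.0], rot_z(11.5), [0.1, 0.1, 0.0]))
```

Tightening `max_pos_m` and `max_rot_deg` is exactly the "reducing the pose correctness threshold" direction mentioned above, and it only makes sense when the ground truth itself is more accurate than the thresholds.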

Composer: Creative and Controllable Image Synthesis with Composable Conditions

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: Generative models can create incredible images but lack control. This work offers a new paradigm that allows flexible control of the output image. Compositionality is the core idea: an image is decomposed into factors, and a diffusion model is trained conditioned on these factors. At inference, the representations work as composable elements, leading to a huge design space....
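At inference, diffusion models conditioned on several factors are commonly steered with classifier-free guidance, which extrapolates from a noise prediction made with one set of conditions towards a prediction made with another: $\hat\epsilon = \epsilon(x_t, c_1) + \omega\,(\epsilon(x_t, c_2) - \epsilon(x_t, c_1))$. The sketch shows only that formula on plain lists. The "noise predictions" are placeholders for a denoiser's outputs, and whether Composer combines its conditions in exactly this way is an assumption here.

```python
from typing import List, Sequence


def guided_noise(eps_c1: Sequence[float], eps_c2: Sequence[float], omega: float) -> List[float]:
    """Classifier-free guidance between two condition sets.

    eps_c1: noise predicted with condition set c1 (e.g. all conditions dropped).
    eps_c2: noise predicted with condition set c2 (e.g. the conditions to emphasise).
    Returns eps_c1 + omega * (eps_c2 - eps_c1) = omega * eps_c2 + (1 - omega) * eps_c1.
    omega = 1 recovers eps_c2; omega > 1 pushes further in the direction of c2.
    """
    return [e1 + omega * (e2 - e1) for e1, e2 in zip(eps_c1, eps_c2)]


# Placeholder denoiser outputs for a 3-element latent.
eps_uncond = [0.0, 0.5, -0.25]
eps_cond = [1.0, 0.5, 0.25]
print(guided_noise(eps_uncond, eps_cond, omega=1.0))  # equals eps_cond
print(guided_noise(eps_uncond, eps_cond, omega=3.0))
```

Choosing different subsets of conditions for $c_1$ and $c_2$ is what makes the conditions composable: the guidance direction emphasises whatever factors $c_2$ contains beyond $c_1$.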