arxiv-summary: AI-summarized AI papers

Structural Multiplane Image: Bridging Neural View Synthesis and 3D Reconstruction

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: Multiplane Image (MPI) is an effective and efficient representation for view synthesis from sparse inputs. Structural MPI (S-MPI) approximates 3D scenes concisely and bridges view synthesis and 3D reconstruction. Challenges include high-fidelity approximation, multi-view consistency, modeling of non-planar regions, and efficient rendering. A transformer-based network is proposed to predict compact and expressive S-MPI layers. Experiments show that the method outperforms previous state-of-the-art MPI-based view synthesis methods and planar reconstruction methods....

Product Jacobi-Theta Boltzmann machines with score matching

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: Estimating probability density functions is a difficult task that machine learning techniques have been used to tackle. The Boltzmann machine (BM) architecture has been applied successfully to it. The Product Jacobi-Theta Boltzmann machine (pJTBM) is a restricted version of the Riemann-Theta Boltzmann machine (RTBM). Score matching, based on the Fisher divergence, can be used to fit probability densities with the pJTBM more efficiently than with the original RTBM....
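To make the idea of score matching concrete, here is a minimal sketch that evaluates the standard (Hyvärinen) score-matching objective, the mean of 0.5 * score^2 plus the derivative of the score, for a one-dimensional Gaussian model. The Gaussian is a stand-in chosen for illustration and is not the pJTBM; the function name and the synthetic data are assumptions.

```python
import random

def score_matching_loss(samples, mu, sigma):
    """Empirical score-matching objective for a 1D Gaussian model q(x; mu, sigma).

    Illustrative stand-in only: the paper fits a pJTBM, not a Gaussian.
    """
    total = 0.0
    for x in samples:
        score = -(x - mu) / sigma ** 2   # d/dx log q(x)
        dscore = -1.0 / sigma ** 2       # d^2/dx^2 log q(x)
        total += 0.5 * score ** 2 + dscore
    return total / len(samples)

# Synthetic data (assumed for the example).
random.seed(0)
data = [random.gauss(2.0, 1.5) for _ in range(5000)]

# For this model the objective is minimized at the sample mean and variance.
mean = sum(data) / len(data)
var = sum((x - mean) ** 2 for x in data) / len(data)
best = score_matching_loss(data, mean, var ** 0.5)
```

The objective needs only derivatives of the log-density with respect to the data, so the normalizing constant of the model never has to be computed. This is the usual reason score matching can be cheaper than likelihood-based fitting.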

Self-NeRF: A Self-Training Pipeline for Few-Shot Neural Radiance Fields

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: NeRF is a method for synthesizing novel views from a dense set of images. NeRF is limited by the need for numerous calibrated views, and its accuracy decreases in a few-shot setting. Self-NeRF is proposed to address this challenge; it iteratively refines the radiance fields with few input views. An uncertainty-aware NeRF is constructed with specialized embeddings and cone entropy regularization to leverage the pseudo-views. Self-NeRF is robust to input with uncertainty and outperforms existing methods when trained on limited data.

Paper Content

Introduction: Synthesizing novel camera views is an important task in computer vision....

An Overview on Language Models: Recent Developments and Outlook

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: Language modeling studies probability distributions over strings of text. It is used in text generation, speech recognition, and machine translation. Conventional language models (CLMs) predict the probability of linguistic sequences in a causal manner. Pre-trained language models (PLMs) cover broader concepts and can be used for both causal sequential modeling and fine-tuning. PLMs have their own training paradigms and serve as foundation models in modern NLP systems. This overview paper introduces CLMs and PLMs from five aspects. It also discusses the relationship between CLMs and PLMs and the future directions of language modeling in the pre-trained era.

Paper Content

Introduction: Language modeling studies probability distributions over sequences of words and is used in many computational linguistic problems. There are two major approaches: statistical and data-driven. Conventional language models (CLMs) predict the probability of linguistic sequences in a causal manner. The data-driven approach uses neural-network models, leading to pre-trained language models (PLMs). The overview takes five perspectives: linguistic units, structures, training methods, evaluation methods, and applications. CLMs attempt to predict the next linguistic unit in a text sequence given its preceding context. Units are represented by characters, words, phrases, etc....
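The causal prediction described above can be illustrated with a toy bigram model, which scores a sentence as a product of next-word probabilities given the previous word. This is a minimal sketch under simplifying assumptions: the tiny corpus, the `<s>` and `</s>` boundary tokens, and add-one smoothing are all choices made for the example, not details from the paper.

```python
import math
from collections import Counter, defaultdict

def train_bigram(corpus):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def sequence_log_prob(counts, sentence, vocab_size, alpha=1.0):
    """Sum of log P(next | previous) over the sentence, with add-alpha smoothing."""
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    logp = 0.0
    for prev, nxt in zip(tokens, tokens[1:]):
        total = sum(counts[prev].values())
        logp += math.log((counts[prev][nxt] + alpha) / (total + alpha * vocab_size))
    return logp

# Hypothetical toy corpus.
corpus = ["the cat sat", "the dog sat", "the cat ran"]
counts = train_bigram(corpus)
vocab = {w for s in corpus for w in s.split()} | {"</s>"}

seen = sequence_log_prob(counts, "the cat sat", len(vocab))
scrambled = sequence_log_prob(counts, "sat cat the", len(vocab))
```

A sentence whose word order matches the training data receives a higher log-probability than a scrambled version of the same words, which is the behavior a causal language model is meant to capture.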

Scaling up GANs for Text-to-Image Synthesis

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: Text-to-image synthesis has been successful and has captured the public imagination. GANs used to be the favored architecture for generative image models, but auto-regressive and diffusion models have become the new standard. Can GANs be scaled up to benefit from large datasets? GigaGAN is a new GAN architecture that is faster and can synthesize high-resolution images. GigaGAN also supports latent space editing applications.

Paper Content

Introduction: Recently released models have achieved high levels of image quality and model flexibility....

Users are the North Star for AI Transparency

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: Widespread calls for transparent AI systems lack precision. Stakeholders often talk past each other. A clear ideal of AI transparency is needed. A literature survey identifies clusters of similar conceptions of transparency. Common threads across all clusters provide a clearer common language.

Paper Content

Introduction: There are calls for greater transparency in AI systems. The term “transparency” is overloaded with distinct meanings, making it one of AI’s suitcase words. Concrete aims and advances must be expressed in more precise language. Transparency is invoked in connection with data collection, data processing, interpretable systems, and fairness issues. EU regulations (GDPR, Ethics Guidelines for Trustworthy AI) make vague demands for “meaningful information” and “comprehensible language”. Ideal AI transparency gives users and stakeholders the tools to decide whether an AI system and its decisions are trustworthy. Transparency is invoked with three overarching factors: data, systems, and outputs.

Data-related transparency factors: These focus on the inputs required to produce an AI system. They explore ways to balance transparency with user concerns about data privacy and security. The survey distinguishes between works focused on information about training data and works focused on the active use of user data.

Transparency on model training data: Machine learning systems are influenced by their training data. Policymakers are mandating disclosures about training data. Record transparency is achieved by describing datasets. Use transparency means communicating the specific purposes for which a dataset is appropriate. Disambiguation of terminology, visualization, and logging systems are useful for disclosure and data-provisioning transparency. Rules and norms are needed to ensure AI transparency for users.

Transparency on the handling of user data: AI systems need user data to function, so demand for transparency around the use of user data is natural. US data policy is largely unregulated, while the EU has taken a more active approach with GDPR. Classic security research findings are applicable. Companies address consumer privacy and security to win favor. There is a tension between user desires and business realities, and strong societal norms and regulation are needed to resolve it. Transparency tools are needed to enable users to exercise their rights.

System-centered transparency factors: Practitioners need to debug models or reproduce results. Users need a basic overview of a system’s function. ML systems are black boxes, which hinders explainability. Neural networks are dominant in AI research. Automated rationale generation is popular.

System function disclosure: System function disclosure includes communications about system capabilities and limitations. Target audiences include external developers, lay users, and regulatory bodies. The ACM considers disclosure required in its Code of Ethics. Frameworks for concise communication of model strengths and limitations are important. Disclosure sufficiency can be evaluated qualitatively with rubrics, or layperson comprehensibility can be assessed automatically. Experts don’t always know how black-box systems work. Clarity from data providers is needed to communicate the problems of systems. Direct user feedback can be solicited to contextualize failure cases. Different groups require different explanations and have different levels of expertise.

Explainable AI and causality: Transparency is connected to explainable AI techniques. Simple ML models are explainable but lack flexibility. Neural networks can be made more interpretable by converting their internal weight matrices or through distillation. Attention mechanisms are often presented as interpretable. Input influence methods are often used as explainability techniques. Counterfactual reasoning can be used to explain decisions without altering black-box models. Explanations should establish a causal interpretation to be honest and user-centric.

Generated rationales: Automated rationale generation is introduced as an alternative form of explainability. Rationales provide insight into language models’ reasoning abilities. Rationales can help users fact-check outputs to mitigate the potential for misinformation. Multiple failure modes exist for these rationales, including hallucination. Evaluating NLG explanations is a challenge, and users must be educated on the degree of trust to place in them.

Output-oriented transparency factors: Output-oriented transparency is focused on providing good system performance for stakeholders....

Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: Offline reinforcement learning can be used to obtain a policy initialization from existing datasets. Existing offline RL methods tend to have poor online fine-tuning performance, while online RL methods struggle to incorporate offline data. Calibrated Q-learning (Cal-QL) learns a value function that is a lower bound on the true value function of the learned policy and an upper bound on the value of some other (suboptimal) reference policy....
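The summary does not say how the two bounds are enforced. One plausible construction, assumed here for illustration rather than taken from the summary, starts from a conservative penalty that pushes Q-values down on policy actions and up on dataset actions, then floors the policy-action Q-values at the reference policy's value so that nothing is gained by pushing them below it. All function names and numbers below are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)

def conservative_penalty(q_policy, q_data):
    """Push Q down on policy actions and up on dataset actions."""
    return mean(q_policy) - mean(q_data)

def calibrated_penalty(q_policy, q_data, v_ref):
    """Same penalty, but policy-action Q-values are floored at the reference value.

    Assumed form: once Q(s, a) drops below v_ref(s), lowering it further no
    longer reduces the penalty, so the learned Q stays above the reference.
    """
    floored = [max(q, v) for q, v in zip(q_policy, v_ref)]
    return mean(floored) - mean(q_data)

# Hypothetical per-state values for a batch of three states.
q_policy = [1.0, -3.0, 0.5]   # Q at actions sampled from the learned policy
q_data = [2.0, 1.0, 1.5]      # Q at actions stored in the offline dataset
v_ref = [0.0, 0.0, 1.0]       # value estimates for the reference policy

plain = conservative_penalty(q_policy, q_data)
calibrated = calibrated_penalty(q_policy, q_data, v_ref)
```

In this sketch the second state already has a Q-value below its reference value, so driving it lower leaves the calibrated penalty unchanged, whereas the plain conservative penalty would keep rewarding further underestimation.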

Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: LLMs are used to generate content for a range of tasks. There is a need to ensure that models are aligned with human preferences and do not produce unsafe, inaccurate, or toxic outputs. Alignment techniques can mitigate safety concerns and improve model capabilities. Personalising LLMs through micro-level preference learning processes may result in models that are better aligned with each user. There are normative challenges in defining the bounds of a societally acceptable and safe degree of personalisation.

Paper Content

Introduction: LLMs have improved in recent years and are being used in a wide range of applications. ChatGPT was released in 2022 and reached over 100 million users in two months. LLMs are being aligned with human preferences using reward learning. More reward learning can lead to LLMs having polarised views on certain topics. Current implementations of reward learning are limited and rely on a small group of people. This paper presents a taxonomy and policy framework for the explicit personalisation of LLMs. Personalised LLMs could provide tailored assistance and adapt to diverse groups. Personalised LLMs could also reinforce biases and narrow information diets. A policy framework is needed to govern personalised LLMs safely and ethically.

Background: AI systems are being applied to complex tasks. Alignment is desirable to avoid undesirable behaviours. Alignment is a technical challenge. Recent works have tried to align LLMs with human preferences. There are three axes of alignment: what, what, who. Implicit personalisation is occurring to meet the expectations of non-representative crowdworkers.

From implicit to explicit personalisation: Given the diversity of human values and preferences, and the importance of pragmatics for contextual understanding, the aim to fully align models across human populations may be a futile one....

Kernel Regression with Infinite-Width Neural Networks on Millions of Examples

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: Neural kernels have improved performance on different data modalities. Neural kernels require more compute, limiting their application to smaller datasets. This work massively parallelizes neural kernel computation across many GPUs. This approach enables kernel regression on large datasets (up to 5 million examples). Results on protein and small molecule prediction tasks are competitive with SotA methods....
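For readers unfamiliar with kernel regression, the following minimal sketch fits a regularized kernel regressor on four points. It is illustrative only: an RBF kernel stands in for the neural kernels used in the paper, the linear system is solved by plain Gaussian elimination on a single CPU, and all names and data are assumptions.

```python
import math

def rbf_kernel(a, b, length_scale=1.0):
    """RBF kernel, used here as a stand-in for a neural kernel."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2 * length_scale ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit(X, y, reg=1e-6):
    """Solve (K + reg * I) alpha = y for the dual coefficients alpha."""
    K = [[rbf_kernel(a, b) + (reg if i == j else 0.0)
          for j, b in enumerate(X)] for i, a in enumerate(X)]
    return solve(K, y)

def predict(X_train, alpha, x):
    """Prediction is a kernel-weighted sum over the training points."""
    return sum(a * rbf_kernel(xi, x) for a, xi in zip(alpha, X_train))

# Hypothetical toy data.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0.0, 1.0, 4.0, 9.0]
alpha = fit(X, y)
pred = predict(X, alpha, [2.0])
```

The kernel matrix has one entry per pair of training examples, so its size grows quadratically with the dataset. Each entry can be computed independently of the others, which is what makes the computation amenable to the kind of parallelization across devices that the abstract describes.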

Brain-Diffuser: Natural scene reconstruction from fMRI signals using generative latent diffusion

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: The paper investigates how to use generative AI for neural decoding. It presents a two-stage scene reconstruction framework called “Brain-Diffuser”. The first stage reconstructs images with low-level properties and overall layout. The second stage uses the image-to-image framework of a latent diffusion model. The method outperforms previous models on the Natural Scenes Dataset benchmark. It also creates “ROI-optimal” scenes consistent with neuroscientific knowledge.

Paper Content

Introduction: Establishing neural encoding and decoding techniques is one way to discover how the brain and cognition work. Recent developments in modeling and computation have opened up new ways of decoding information from brain signals. Studies have used statistical techniques and machine learning to decode information from fMRI. Deep generative models have been used to reconstruct entire images, including Variational Autoencoders (VAE), Generative Adversarial Networks (GAN), and Latent Diffusion Models (LDM). Images with different levels of complexity have been reconstructed, such as faces, single-object-centered images, and complex scenes. Two datasets have been used for natural scene reconstruction: Generic Object Decoding and Deep Image Reconstruction. The Brain-Diffuser model is proposed to generate scene images with high fidelity. It uses two stages and is conditioned on both vision and language representations. It demonstrates superior performance compared to earlier models.

Materials and methods

Dataset: The publicly available Natural Scenes Dataset (NSD) was used. It was collected from 8 subjects viewing images from the COCO dataset. The 4 subjects who completed all trials were used. The training set contained 8859 images and 24980 fMRI trials. The test set contained 982 images and 2770 fMRI trials. fMRI trials were averaged for images with multiple repetitions. The corresponding captions from the COCO dataset were used. Generalized linear models were used to estimate preprocessed single-trial beta weights. The preprocessed fMRI signals were masked using the NSDGeneral ROI mask.

Low-level reconstruction of images using VDVAE (first stage): A VAE is a generative model used to capture an input distribution. VDVAE is a hierarchical VAE model with several layers of conditionally dependent latent variables. Equations 1 and 2 show the hierarchical dependence of the latent variables. VDVAE is trained on a 64x64 resolution ImageNet dataset with 75 layers. Latent variables from the first 31 layers are used for regression. A ridge regression model is trained between the fMRI training patterns and the concatenated latent variables. Test fMRI patterns are provided to the trained regression model to predict latent values. The latent values are fed to the decoder part of the VDVAE to obtain reconstructed images.

Final reconstruction of images using Versatile Diffusion (second stage): VDVAE is used to reconstruct the image layout. The Versatile Diffusion model is used in the second stage of the reconstruction framework. Versatile Diffusion is a latent diffusion model. An autoencoder is trained on a large-scale image dataset to learn a compressed representation of images. A forward diffusion process is applied to the latent variables by adding Gaussian noise. A reverse diffusion process is learned via a neural network to predict and remove the noise. The Versatile Diffusion model allows for conditioning on text captions, images, and semantic maps. It is trained on the Laion2B-en dataset. The CLIP network is based on the transformer architecture. Two regression models are trained between fMRI patterns and CLIP-Vision/Text features. The image-to-image pipeline of the latent diffusion model is used at testing time. CLIP-Vision and CLIP-Text are used jointly in a double-guided diffusion pipeline.

Code availability: The code for the project is publicly available. Examples of reconstructions are shown in Figure 3, with results from individual subjects and the average of all subjects. The reconstructed images capture the layout and semantics of the ground-truth images. Differences in pixel-level details remain. The reconstructed images are naturalistic and plausible alternate renditions of the ground truth.

Results and analyses

Comparison with state of the art: Lin et al....
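As a companion to the Brain-Diffuser methods summary, the forward diffusion step it mentions (adding Gaussian noise to latent variables) can be sketched in its commonly used closed form, z_t = sqrt(a) * z_0 + sqrt(1 - a) * noise, where a is the cumulative signal-retention factor at step t. This is the generic formulation and is assumed here; the summary does not give Versatile Diffusion's noise schedule, and the function name and toy latent are hypothetical.

```python
import math
import random

def forward_diffuse(z0, alpha_bar_t, rng):
    """Noise a latent vector in one step.

    alpha_bar_t in [0, 1] is the assumed cumulative signal-retention factor:
    1.0 leaves the latent untouched, 0.0 replaces it with pure Gaussian noise.
    """
    noise = [rng.gauss(0.0, 1.0) for _ in z0]
    signal_scale = math.sqrt(alpha_bar_t)
    noise_scale = math.sqrt(1.0 - alpha_bar_t)
    zt = [signal_scale * z + noise_scale * e for z, e in zip(z0, noise)]
    return zt, noise

rng = random.Random(0)
z0 = [0.5, -1.0, 2.0]          # hypothetical latent vector

clean, _ = forward_diffuse(z0, 1.0, rng)        # no noise added
pure_noise, eps = forward_diffuse(z0, 0.0, rng) # latent fully replaced by noise
partial, _ = forward_diffuse(z0, 0.5, rng)      # intermediate noise level
```

The reverse process described in the summary trains a network to predict the `noise` term from the noised latent, so that it can be removed step by step. An image-to-image pipeline typically starts from a partially noised latent, as in the intermediate case, rather than from pure noise.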