arxiv-summary: AI-summarized AI papers

We are Going to the Space -- Part 1: Which device to deploy in a satellite?

Link to paper: The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract: Components used in satellites have become smaller, making them more widely available. Smaller organizations can now deploy satellites with data-intensive applications. Image analysis is a popular application on satellites. The resource-constrained nature of devices on satellites creates challenges. This paper investigates the performance of edge devices for deep-learning-based image processing in space. Hardware accelerators are necessary to meet latency requirements....

SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images

Abstract: A new dataset, SlideVQA, has been proposed for developing document VQA systems. SlideVQA contains 2.6k+ slide decks composed of 52k+ slide images and 14.5k questions. SlideVQA requires complex reasoning, including single-hop, multi-hop, and numerical reasoning. Annotated arithmetic expressions of numerical answers are provided to enhance numerical reasoning. A new end-to-end document VQA model has been developed....

Causal Abstraction for Faithful Model Interpretation

Abstract: Explanations of AI models must be both human-intelligible and consistent with the model’s internal structure. The theory of causal abstraction provides the mathematical foundations for these explanations. Contributions include generalizing causal abstraction to cyclic structures, using multi-source interventions, defining approximate causal abstraction, and formalizing XAI methods.

Paper Content

Introduction: XAI seeks to explain why deep learning models make the predictions they do. Causal analysis is the gold standard for explaining model behavior and internal reasoning. Low-level causal explanations of behavior and internal reasoning can be easily provided, but are not interpretable to humans. High-level explanations are easier to interpret, but difficult to trust. Causal abstraction provides a framework for analyzing a system at multiple levels of detail simultaneously. Causal abstraction has been applied to deep learning AI models, weather patterns, and human brains. This paper develops the theory of causal abstraction as a mathematical framework for XAI. Low-level variables are partitioned into clusters, each associated with a high-level variable. Approximate causal abstraction is explored, connecting interchange intervention analysis with existing definitions.

Faithful and interpretable causal explanations of AI: Causal explanations are privileged when explaining how an artifact works. Causal explanations allow for manipulation and control of the system. An appropriate level of abstraction is important for causal explanations. Intervention is a fundamental operation of causal explanations. Causal abstraction supports interpretable explanations of AI. Faithfulness is defined as the degree to which an explanation accurately represents the ‘true reasoning process behind a model’s behavior’.

Methods for explaining AI behavior: AI model behavior is a function from inputs to outputs. Behavior can be represented by a two-variable causal model. XAI methods learn interpretable models to approximate uninterpretable models. Such methods are model-agnostic and provide the same explanations for models with the same behavior. Notions of faithfulness need to be grounded in causality in order to compare XAI methods.

Methods for explaining the internal structure of AI: AI models have internal reasoning that can be represented as a program or algorithm. Recent research aims to understand the causal mechanisms inside black box models. Causal abstraction provides a mathematical foundation for understanding the high-level semantics of neural representations. Interchange interventions are used to show that neural representations represent propositional content. Iterative nullspace projection is used to evaluate whether neural representations encode concepts with ‘mental’ causes and effects. Causal mediation analysis is used to analyze gender bias in pretrained language models. Circuit-based explanations reverse engineer the mechanisms of a network at the level of individual neurons. Probing is used to determine whether a concept is present in a neural representation. Feature attribution methods ascribe scores to neural representations to capture their ‘impact’ on model behavior.

Causal models: Notation: V denotes a set of variables, X denotes a variable, x denotes a value, and Val(X) denotes the range of possible values for X. No two variables can take on the same value. Capital letters denote variables, lower-case letters denote values, and bold letters denote sets of variables or values. Domain(f), Uniform(X), and 𝟙[ϕ] are useful constructs. Projection: given a partial setting u for a set of variables U, Proj(u, X) is the restriction of u to the variables in X. Definition 4: a causal model is a pair (V, F), where V is a set of variables and F is a set of structural functions. Remark 5: there is no explicit reference to a graphical structure defining a causal ordering on the variables. Remark 6: notation for acyclic models. Definition 7: the set of solutions is the set of all v ∈ Val(V) such that all equations v = f_V(v) are satisfied. Definition 8: an intervention is a partial setting i ∈ Val(I) for I ⊆ V; M_i is just like M except that f_X is replaced with the constant function v ↦ Proj(i, X) for each X ∈ I.

Example of causal models, a symbolic algorithm and a neural network: Two causal models are defined to demonstrate the potential to model a variety of computational processes. The first model is a tree-structured algorithm. The second model is a fully-connected feed-forward neural network. Both models solve the same task.

Hierarchical equality task: The hierarchical equality task is to determine whether two pairs of objects have identical relations. The input is two pairs of objects; the output is True if both pairs are equal or both pairs are unequal, and False otherwise. The domain of objects consists of a triangle, a square, and a pentagon. An obvious tree-structured symbolic algorithm solves the task. Equality reasoning is ubiquitous and has been studied for broader questions about relational reasoning. Hierarchical equality serves as a case study for explaining how abstract tree-structured composition can be implemented by a fully-connected neural network. A neural network is trained to implement the hierarchical equality task.

A tree-structured algorithm for hierarchical equality: Algorithm A consists of four input variables and one output variable. The acyclic causal graph is depicted in Figure 1a. Each f_{X_i} is a constant function. The default total setting is [ , , , , True, True, True]. The counterfactual result is [ , , △, , True, True, True].

A fully connected neural network for hierarchical equality: Neural network N consists of 8 input neurons. Values for each variable are real numbers R. There are 4 sets of variables for the first 4 layers. f_{R_k} is a constant function for 1 ≤ k ≤ 8. The output neurons are determined by the network weights. The network outputs True or False based on the output logit values.

Causal abstraction and interchange intervention analysis: Structural conditions must be in place for H to be a high-level abstraction of the low-level model L. The models N and A from the previous section are carried over.

Alignments between causal models: Abstraction involves associating high-level variables with clusters of low-level variables. An alignment between low-level and high-level causal models is introduced. An alignment consists of a partition and a family of maps. An alignment induces a unique translation. The translation is a partial function from low-level interventions to high-level interventions. Low-level interventions that correspond to high-level interventions are defined by cell-wise maps.

Causal consistency and constructive abstraction: Definition 10: An alignment between two models is consistent if the high-level intervention corresponding to a low-level intervention results in the same high-level total settings....
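
The tree-structured algorithm and the idea of an interchange intervention can be made concrete with a small sketch. This is an illustration under stated assumptions rather than the paper's code: the variable names X1 to X4, V1, V2 and O, and the helper names `run_algorithm` and `interchange_intervention`, are hypothetical. The sketch assumes two intermediate variables that hold the equality of each input pair, which is consistent with the seven-entry total settings quoted in the summary.

```python
SHAPES = ("triangle", "square", "pentagon")


def run_algorithm(inputs, intervention=None):
    """Return the total setting of the tree-structured algorithm.

    inputs: tuple of four shapes (X1, X2, X3, X4).
    intervention: optional dict that fixes any variable to a given value,
    overriding the structural function that would normally compute it.
    """
    fixed = dict(intervention or {})
    setting = {}
    for name, value in zip(("X1", "X2", "X3", "X4"), inputs):
        setting[name] = fixed.get(name, value)
    # Assumed intermediate variables: does each pair contain equal objects?
    setting["V1"] = fixed.get("V1", setting["X1"] == setting["X2"])
    setting["V2"] = fixed.get("V2", setting["X3"] == setting["X4"])
    # Output: True when both pairs are equal or both pairs are unequal.
    setting["O"] = fixed.get("O", setting["V1"] == setting["V2"])
    return setting


def interchange_intervention(base, source, variables):
    """Run on `base` with `variables` fixed to the values they take on `source`."""
    source_setting = run_algorithm(source)
    patch = {name: source_setting[name] for name in variables}
    return run_algorithm(base, patch)


base = ("square", "square", "triangle", "triangle")      # both pairs equal
source = ("square", "pentagon", "triangle", "triangle")  # first pair unequal
default = run_algorithm(base)
result = interchange_intervention(base, source, ["V1"])
print(default["O"], result["O"])  # True False
```

Swapping in the source value of V1 flips the output because the base input's second pair is still equal. An analysis of a neural network would apply the same kind of swap to the aligned hidden units and check that the network output changes in the same way.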

Does progress on ImageNet transfer to real-world datasets?

Abstract: This paper investigates whether progress on ImageNet transfers to real-world datasets. ImageNet pre-trained models with varying accuracy are evaluated on six practical image classification datasets. The datasets were collected with the goal of solving real-world tasks. Higher ImageNet accuracy does not consistently yield performance improvements. Data augmentation can improve performance even when architectures do not.

Paper Content

Introduction: ImageNet is a widely used dataset in machine learning. AlexNet’s success in 2012 re-popularized neural networks. ImageNet is still a main benchmark for computer vision models. The machine learning community has invested effort into increasing performance on ImageNet. This raises the question of whether the community is over-optimizing for this dataset. Early methodological innovations transferred more broadly to other tasks. The goal of the paper is to investigate the transfer of neural network architectures to real-world data. The paper considers classification tasks derived from image data collected with the goal of classification in mind.

Related work: Previous work has investigated the effect of architecture on transferability of ImageNet-pretrained models to different datasets....

Dynamics of a data-driven low-dimensional model of turbulent minimal Couette flow

Abstract: The Navier-Stokes equations are dissipative, meaning the long-time dynamics of a flow can be described in a low-dimensional coordinate system. A data-driven manifold dynamics modeling method can be used to describe the dynamics of turbulent Couette flow. An autoencoder is used to find a low-dimensional manifold coordinate system, and a neural network is used to define ordinary differential equations....

Continual Few-Shot Learning Using HyperTransformers

Abstract: The paper addresses learning from multiple tasks sequentially. HyperTransformer (HT) generates specialized task-specific CNN weights from the support set. The generated weights are re-used as input to the HT for the next task. The Continual HyperTransformer method with prototypical loss can learn and retain knowledge about past tasks.

Paper Content

Introduction: Continual few-shot learning involves learning from a continuous stream of tasks with a small number of examples. It retains previously learned information. It is useful for classifying a large number of classes with limited observations. It can enable robots to adapt to changing environments. It can allow for privacy-preserving learning. The method uses HyperTransformer (HT) to meta-learn from episodes. HT decouples the domain knowledge model from the learner. The proposed modification is Continual HyperTransformer (CHT). CHT updates CNN weights with new task information while retaining knowledge of previous tasks. It uses a prototypical loss to maintain and update a set of prototypes. It is evaluated in three realistic scenarios. It does not suffer from catastrophic forgetting. The performance of a given weight improves with more tasks. It can be used to learn a larger number of tasks than it was originally trained for.

Related work: Few-shot learning can be divided into two categories: metric-based learning and optimization-based learning. Metric-based methods treat every task independently and have no memory of the past tasks. Optimization-based methods learn an initial fixed embedding, which is later adapted to a specific task. Continual learning methods can be grouped into three categories: rehearsal, regularization and architectural. Rehearsal methods inject replay data from past tasks or distill a part of a network. Regularization methods introduce an explicit regularization function when learning new tasks. Architectural methods modify the network architecture with additional task-specific modules. Our approach reuses the same principle that made HT work: decoupling the specialized representation model from the domain-aware Transformer model. Incremental few-shot learning adapts a few-shot task to an existing base classifier without forgetting the original data.

Continual few-shot learning: This section states the problem of continual few-shot learning. Given a series of tasks, each task has K classes and N labeled examples for each class. Classes for different tasks can be chosen in different ways. The learner needs to produce CNN weights based on the support set and previously generated weights. Task-incremental learning: identify the class attribute given the sample and the task attribute. Class-incremental learning: identify both the class and task attributes of samples. The trained model is tested on episodes from a holdout set of classes. HyperTransformer (HT) uses a self-attention mechanism to generate CNN weights. Continual HyperTransformer (CHT) extends HT to handle a continual stream of tasks. CHT uses generated weights from past tasks as input weight embeddings. CHT generates weights suited for all tasks up to the current task. CHT can be trained on T tasks and run for any number of tasks τ < T. CHT can be recurrent and generate weights for additional tasks beyond the ones it was trained on.

Prototypical loss: Algorithm 1 performs class-incremental learning using HyperTransformer with prototypical loss. Cross-entropy loss is not suitable for continual learning. Domain-incremental learning is an option, but it causes collisions. Accuracy decreases dramatically with an increasing number of tasks. The loss function needs to decouple class predictions while keeping the embedding space fixed. The solution is to learn the location of class prototypes from the support set. Both task-incremental and class-incremental learning scenarios are covered. The loss to minimize is the negative log probability of the chosen softmax. Prototypes are frozen after being computed. A privacy-preserving scenario is possible with weights and prototypes.

Connection between prototypical loss and MAML: The core idea behind the prototypical loss is a special case of a 1-step MAML-like learning algorithm....
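
To make the prototypical loss described in this summary more concrete, here is a minimal sketch, not the authors' implementation. It assumes that a prototype is the mean embedding of a class in the support set and that the softmax is taken over negative squared Euclidean distances, as in standard prototypical networks; the summary does not state either choice. All function names and the toy two-dimensional embeddings are hypothetical.

```python
import math


def class_prototypes(support):
    """Average the embeddings of each class; `support` maps label -> list of vectors."""
    protos = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        protos[label] = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    return protos


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prototypical_loss(query, label, prototypes):
    """Negative log-probability of `label` under a softmax over negative distances."""
    logits = {c: -squared_distance(query, p) for c, p in prototypes.items()}
    top = max(logits.values())  # subtract the max for numerical stability
    log_norm = top + math.log(sum(math.exp(v - top) for v in logits.values()))
    return log_norm - logits[label]


# Task 1: prototypes are computed once from the support set, then kept frozen.
prototypes = class_prototypes({
    "cat": [[0.0, 0.0], [0.0, 2.0]],
    "dog": [[4.0, 0.0], [4.0, 2.0]],
})
# Task 2: new classes add prototypes without touching the earlier ones.
prototypes.update(class_prototypes({"bird": [[0.0, 8.0], [2.0, 8.0]]}))

loss_near = prototypical_loss([0.5, 1.0], "cat", prototypes)
loss_far = prototypical_loss([3.5, 1.0], "cat", prototypes)
print(loss_near < loss_far)  # True
```

Because earlier prototypes are never recomputed, adding the classes of a later task only extends the table. This is one way to read the summary's point that class predictions are decoupled while the embedding space is kept fixed.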

Quantifying the Technological Foundations of Economic Complexity

Abstract: The problem of inferring the technological structure of production processes is unsolved. The empirical literature focuses on outputs instead of transformative processes. A method is developed to quantify technological sophistication. The method measures the degree of synergistic interaction between inputs. Synergistic technologies are more sophisticated. Synergy scores predict export-based measures of economic complexity.

Paper Content

Introduction: The transformative process of turning inputs into outputs is present in various systems. Different disciplines analyze it through distinct frameworks and tools. Technological sophistication is a building block in the study of economic complexity. Quantifying the degree and structure of technological sophistication is important to understand production systems. The authors developed a method to quantify the technological sophistication of production processes. The method empirically infers the nature of input-input and input-output relationships. The method facilitates quantification of technological sophistication and estimation of interaction networks. The method is general-purpose and can be adapted to other contexts.

Knowledge gap: Technological change can be described as a recombinant process. Evidence supports this process. Technological sophistication is related to this process. Empirical evidence is domain-specific. Generating reliable estimates requires big data. Production models assume input-input and input-output interactions. Limitations of this approach have been discussed. Supply-chain surveys have been conducted to determine dependence on inputs. Systems engineering has developed the concept of Design Structure Matrices. A data-driven approach is needed to infer interaction structures. The economic complexity literature has emerged. The Economic Fitness Index and the Economic Complexity Index are the gold standard. These frameworks provide a proxy for technological sophistication. Production processes remain obscure. Quantification of technological sophistication is a problem for multiple disciplines. There is a need for generalizable quantitative methods to address the production process.

Framework and data: Estimate mutual information between pairs of inputs in a given industry. Decompose the contribution to output into different modes of information sharing. Focus on synergistic information, i....
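
As a rough illustration of the mutual-information step, the sketch below uses a plug-in estimator on discrete samples. It is not the paper's estimator or its decomposition, neither of which is specified in this summary. The XOR toy data and the function name `mutual_information` are assumptions chosen to show what a purely synergistic relationship looks like: each input alone carries no information about the output, while the two inputs together determine it.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (count / n) * math.log2((count / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), count in joint.items()
    )


# Toy "production process": the output is the XOR of two binary inputs.
a = [0, 0, 1, 1]
b = [0, 1, 0, 1]
out = [x ^ y for x, y in zip(a, b)]

mi_a = mutual_information(a, out)                    # input a alone
mi_b = mutual_information(b, out)                    # input b alone
mi_joint = mutual_information(list(zip(a, b)), out)  # both inputs together
print(mi_a, mi_b, mi_joint)  # 0.0 0.0 1.0
```

The gap between the joint information and the individual contributions is the kind of quantity a synergy score is meant to capture. A real analysis would need far more samples and a principled decomposition.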

Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing

Abstract: Self-supervised learning in vision-language processing uses semantic alignment between imaging and text. Prior work in biomedical VLP mostly used single image and report pairs. This work uses a CNN-Transformer hybrid multi-image encoder. The model excels on downstream tasks and achieves state-of-the-art performance. A novel multi-modal temporal benchmark dataset was released to quantify the quality of vision-language representations....

ChatGPT is not all you need. A State of the Art Review of large Generative AI models

Abstract: Generative models are revolutionizing several sectors. Generative AI can transform text into images, 3D images, audio, code, and scientific texts, and can create algorithms.

Paper Content

Introduction: Generative AI is a type of artificial intelligence that can generate novel content. Expert systems used an if-else rule database to generate content. Generative AI models use a discriminator or transformer model trained on a corpus or dataset....

An Analysis of Quantile Temporal-Difference Learning

Abstract: QTD is a distributional reinforcement learning algorithm. QTD has been successful in large-scale applications. QTD updates are non-linear and may have multiple fixed points. This paper provides a proof of convergence to the fixed points of a related family of dynamic programming procedures with probability 1.

Paper Content

Introduction: Distributional reinforcement learning predicts the full probability distribution of future returns. A widely-used family of algorithms for this is based on learning quantiles of the return distribution. This approach has been successful in combination with deep reinforcement learning. Little is known about its behaviour from a theoretical viewpoint. QTD updates rely on asymmetric L1 losses. This paper proves the convergence of QTD. The proof uses stochastic approximation theory with differential inclusions. The paper also analyses the limit points of QTD and bounds the approximation error.

Background: This section introduces background concepts and notation.

Markov decision processes: The setting has finite state and action spaces, a transition kernel and reward distribution, and a discount factor. The policy determines the trajectory distribution.

Predicting expected returns and the return distribution: The return is a random variable with sources of randomness from actions, state transitions, and rewards. A single scalar summary of performance is given by the expectation of the return. Distributional reinforcement learning is concerned with learning the probability distribution over returns. Learning the probability distribution provides more signal for an agent to learn from. Distributional reinforcement learning algorithms typically work with a subset of distributions that are amenable to parametrisation on a computer.

Monte Carlo and temporal-difference learning: TD learning is an approximation of Monte Carlo learning. TD learning uses a bootstrapped approximation of the return. TD learning has lower variance than Monte Carlo learning. TD learning shares information across states.

Quantile temporal-difference learning and quantile dynamic programming: This section presents the main algorithms of study.

Quantile regression: A Monte Carlo algorithm can be adapted to learn about the return distribution. A probability distribution representation is used to approximate the return distribution. Quantile temporal-difference learning uses an equally-weighted mixture of Dirac deltas. The quantile-based approach aims to have particle locations approximate certain quantiles of η^π(x). The generalised inverse CDF of ν is used to uniquely specify a quantile for each level τ. The quantile regression loss is used to estimate τ-quantiles of the return distribution. The negative gradient of the loss is used to motivate an update rule. The update rule is essentially the application of the stochastic gradient descent method for quantile regression.

Quantile temporal-difference learning: The quantile temporal-difference learning algorithm (QTD) is a modification of the Monte Carlo algorithm. QTD uses an approximate sample from the return distribution derived from an observed transition. QTD updates its parameters on the basis of sample transitions. QTD updates include a distinct term τ_i. Theoretical guarantees are given as to what the algorithm will converge to.

Motivating examples: QTD can exhibit a variety of behaviours in small environments. QTD can be used to estimate Gaussian random variables. QTD can be used to estimate the median of a distribution. QTD can converge to a point or a set depending on the environment. QTD’s behaviour can be affected by the ‘smoothness’ of the reward distributions. QTD can perform a random walk over a set in certain environments.

Quantile dynamic programming: The QTD update given in Equation (10) moves θ(x, i) in the direction of the τ_i-quantiles of the distribution of the random variable R + γθ(X′, J)....
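
The QTD update summarized above can be sketched in a few lines of tabular code. This is a hedged illustration, not the paper's reference implementation. It assumes the commonly used quantile levels τ_i = (2i − 1)/(2m), averages the indicator over all m bootstrap targets rather than sampling a single index J, and represents a terminal state by all-zero estimates. The names `quantile_levels` and `qtd_update` are hypothetical.

```python
def quantile_levels(m):
    """Assumed quantile midpoints tau_i = (2i - 1) / (2m) for i = 1..m."""
    return [(2 * i + 1) / (2 * m) for i in range(m)]


def qtd_update(theta, x, r, x_next, alpha, gamma):
    """Apply one tabular QTD update to theta[x] for the transition (x, r, x_next).

    theta maps each state to a list of m quantile estimates. A terminal state
    is represented here by a list of zeros that is never updated (assumption).
    """
    m = len(theta[x])
    taus = quantile_levels(m)
    # Bootstrapped samples of the return, one per quantile estimate at x_next.
    targets = [r + gamma * theta[x_next][j] for j in range(m)]
    new_values = []
    for i in range(m):
        # Fraction of targets that fall strictly below the current estimate.
        below = sum(1 for t in targets if t < theta[x][i]) / m
        # Move up with weight tau_i and down with weight (1 - tau_i).
        new_values.append(theta[x][i] + alpha * (taus[i] - below))
    theta[x] = new_values


theta = {"s": [0.0, 0.0], "end": [0.0, 0.0]}
qtd_update(theta, "s", r=1.0, x_next="end", alpha=0.1, gamma=0.9)
print(theta["s"])  # approximately [0.025, 0.075]
```

Each estimate moves by a bounded step whose sign depends only on whether the targets lie above or below it. This reflects the asymmetric L1 loss mentioned in the introduction and is the source of the non-linearity of the updates.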