Abstract

- The Bayesian Learning Rule (BLR) provides a framework for generic algorithm design
- It can be difficult to use because of its parameterization, its gradients, and its updates
- An extension based on Lie groups simplifies these difficulties
- The extension yields a new algorithm for deep learning with desirable attributes
- It exploits Lie-group structures for new algorithm design

Paper Content

Introduction

- The Bayesian Learning Rule (BLR) provides a general framework to derive algorithms from optimization, deep learning, and graphical models
- The BLR uses natural-gradient descent to find approximations of the generalized posterior distribution
- The BLR has been used to design new algorithms for uncertainty estimation in deep learning
- The BLR can be difficult to use for three reasons
- An extension of the BLR based on Lie groups is proposed to address these difficulties
- The Lie-group BLR uses the group's exponential map to update candidate distributions
- Gradient computations are simplified by the reparameterization trick
- The update naturally stays within the manifold
- Use cases cover algorithm design in deep learning with additive, multiplicative, and affine groups
- The new algorithm based on the multiplicative group gives rise to networks whose nodes are forced to be either excitatory or inhibitory

The Bayesian learning rule

- The BLR aims to find a posterior candidate in a space of candidate distributions
- Balancing the two terms of the objective (the expected loss and the entropy of the candidate) requires an exploration-exploitation tradeoff
- The problem can be rewritten as an inference problem
- When the loss corresponds to the log-joint distribution of a Bayesian model, the solution coincides with the posterior distribution
- The BLR is a natural-gradient descent algorithm
- The BLR can recover many existing algorithms from a variety of fields
- The design of new algorithms is possible
- The BLR can be difficult to use in many cases
- Computing the gradient with respect to the expectation parameter µ is not always straightforward
- The λ obtained by the BLR may not always be a valid natural parameter

The Lie-group Bayesian learning rule

- Proposing a Lie-group based extension of the BLR
- Describing Lie groups and their actions
- Parameterization and exponential map
- Deriving the new learning rule

Lie groups and their actions

- A Lie group is a group (a set with a binary operation that satisfies associativity and has an identity element and inverses) that is also a smooth manifold
- A smooth manifold is locally diffeomorphic to Euclidean space
- Examples of Lie groups include (R, +) and (R_{>0}, ×)
- The Cartesian product of two Lie groups is also a Lie group
- The action of a Lie group on the parameter space is a smooth map
- An example of an action is (A, b) • θ = Aθ + b

Lie group parametrization

- G acts on a space of measures
- Pushforwards are used to define another action on the space of measures
- A base distribution q0 with positive density is given
- The space of candidate distributions Q is the orbit of q0 under the action of G
- Every q in Q can be parametrized by a group element g
- Examples of exponential families (EFs) that can be parameterized this way include Gaussian and Bernoulli distributions
- This parameterization also makes it possible to use non-EF distributions such as the Laplace distribution

The exponential map and Lie group updates

- The goal is to find a group element g* that minimizes a given energy function
- The exponential map is used to move in the direction of fastest descent
- The exponential map is a smooth function that maps the tangent space at the identity into the group
- For diagonal matrices, the exponential map is given by the Taylor series of the exponential
- An update of the form g ← g exp(−αX) moves in the direction of X with a step-size of α

Simplifying gradients through reparametrization

- We will use the group's exponential map to derive a new learning rule....
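
The affine action (A, b) • θ = Aθ + b, the pushforward of a base distribution q0, and the update g ← g exp(−αX) summarized above can be illustrated with a short NumPy sketch. This is not the paper's algorithm. All function names, the toy energy, and the direction X are hypothetical choices made for illustration; the multiplicative group is taken elementwise over positive reals, and the base distribution is assumed to be a standard Laplace.

```python
import numpy as np


def affine_action(g, theta):
    """Action of g = (A, b) on a parameter vector: (A, b) • theta = A theta + b."""
    A, b = g
    return A @ theta + b


def affine_compose(g1, g2):
    """Group product, defined so that (g1 g2) • theta = g1 • (g2 • theta)."""
    A1, b1 = g1
    A2, b2 = g2
    return A1 @ A2, A1 @ b2 + b1


def pushforward_samples(g, base_samples):
    """Samples from the pushforward of q0 under g: apply the action to each row."""
    A, b = g
    return base_samples @ A.T + b


def multiplicative_step(g, X, alpha):
    """Update g <- g exp(-alpha X) on (R_{>0}, x), applied elementwise."""
    return g * np.exp(-alpha * X)


def toy_energy(g, target):
    """Hypothetical energy used only for illustration (an assumption)."""
    return 0.5 * np.sum((g - target) ** 2)


def toy_direction(g, target):
    """Tangent direction at the identity for the toy energy (an assumption).

    d/dt E(g exp(tX)) at t = 0 equals sum(X * g * dE/dg), so
    X = g * dE/dg is an ascent direction and the step uses -alpha X.
    """
    return g * (g - target)


def run_multiplicative_descent(g0, target, alpha=0.1, steps=200):
    """Iterate the exponential-map update and record every iterate."""
    g = np.asarray(g0, dtype=float)
    path = [g]
    for _ in range(steps):
        g = multiplicative_step(g, toy_direction(g, target), alpha)
        path.append(g)
    return np.stack(path)


if __name__ == "__main__":
    target = np.array([1.0, 3.0, 0.5])
    path = run_multiplicative_descent(np.array([0.5, 2.0, 4.0]), target)
    print("final iterate:", path[-1])
    print("smallest entry along the path:", path.min())
```

Because each step multiplies g by an exponential, every iterate stays strictly positive regardless of the step-size, which is the sense in which the update stays on the manifold. Sampling through `pushforward_samples` is also the reparameterization view: a draw from the candidate distribution is a fixed transformation of a draw from q0.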