EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine

Abstract: Significant progress has been made in developing reinforcement learning (RL) training systems, yet parallel environment execution, often the slowest part of such a system, receives little attention. EnvPool improves RL environment simulation speed across different hardware setups. It is compatible with existing RL training libraries and lets researchers iterate on their ideas quickly....

June 21, 2022 · 918 words · Jiayi Weng, Min Lin, Shengyi Huang, Bo Liu, Denys Makoviichuk and 7 others
Global Context Vision Transformers

Abstract: The paper proposes a novel architecture that improves parameter and compute utilization for computer vision tasks. The model uses global and local self-attention modules to capture long- and short-range spatial interactions. It addresses the lack of inductive bias in vision transformers and improves the modeling of inter-channel dependencies. It achieves new state-of-the-art performance on image classification, object detection, and semantic segmentation.

Introduction: Transformers have achieved SOTA performance on NLP benchmarks....

June 20, 2022 · 768 words · Ali Hatamizadeh, Hongxu Yin, Jan Kautz, Pavlo Molchanov
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge

Abstract: Autonomous agents have made progress in specialist domains, whereas humans learn and adapt in the open world. Building generalist agents requires three ingredients: an environment, a knowledge base, and an agent architecture. MineDojo is a new framework built on Minecraft, with thousands of tasks and an internet-scale knowledge base. A novel agent learning algorithm uses pre-trained video-language models as a reward function. With it, the agent can solve open-ended tasks without a manually designed reward.

Introduction: Developing autonomous embodied agents that attain human-level performance is a long-standing goal of AI research. Progress has been made in games and robotics. However, agents are typically trained tabula rasa in isolated worlds with limited complexity and diversity. Humans, in contrast, inhabit an infinitely rich reality and can leverage large amounts of prior knowledge. MineDojo is a framework for developing open-ended, generally capable agents. It features a benchmarking suite with thousands of diverse open-ended tasks and an internet-scale, multimodal knowledge base. Its simulator provides unified observation and action spaces. Tasks are divided into programmatic and creative tasks. Programmatic tasks can be assessed automatically. Creative tasks have no well-defined success criteria, so MineDojo evaluates them with a learned model.

Task suite I, programmatic tasks: Each programmatic task is formalized as a 5-tuple with these components:
- a natural language goal;
- detailed guidance generated with OpenAI's GPT-3-davinci API;
- the initial conditions of the agent and the world;
- a success criterion given by a deterministic function;
- an optional dense reward function.
There are 4 categories of programmatic tasks, with 1,581 template-generated natural language goals.

Task suite II, creative tasks: Creative tasks are defined as a 3-tuple. A novel task evaluation metric is designed around a pre-trained contrastive video-language model, and human evaluations show high agreement with this learned metric. 216 creative tasks are manually authored. 1,560 creative tasks are generated through two systematic approaches. Approach 1 mines tasks from YouTube tutorial videos, and approach 2 uses GPT-3 to generate new task ideas.

Internet-scale knowledge base: Embodied agents are usually trained either with RL on reward functions or from human demonstrations. Crafting reward functions, however, is challenging for this task suite. MineDojo therefore turns to the open web as an ever-growing source of learning material and harvests domain knowledge by web scraping and filtering. It collects 33 years of YouTube videos, 6K+ Wiki pages, and millions of Reddit comment threads. Language is a key component of the database, and special measures are taken to filter out low-quality and toxic content. 730K+ narrated Minecraft videos, 2....
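
The two task formats can be pictured as plain records. This is a sketch for illustration only. The class names, field names, and the example goal and guidance strings are hypothetical, not MineDojo's actual API. The success check is a toy deterministic function over an assumed inventory dictionary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

# Hypothetical observation type: item name -> count in the agent's inventory.
Inventory = Dict[str, int]

@dataclass
class ProgrammaticTask:
    """Hypothetical record mirroring the 5-tuple of a programmatic task."""
    goal: str                                  # natural language goal
    guidance: str                              # e.g. text generated with GPT-3
    initial_conditions: Dict[str, object]      # agent and world setup
    success: Callable[[Inventory], bool]       # deterministic success criterion
    reward: Optional[Callable[[Inventory], float]] = None  # optional dense reward

@dataclass
class CreativeTask:
    """Hypothetical 3-tuple record: no built-in success function, so a learned
    model would have to judge completion."""
    goal: str
    guidance: str
    initial_conditions: Dict[str, object]

def harvest_task(item: str, amount: int) -> ProgrammaticTask:
    # Template-style goal generation (hypothetical template).
    return ProgrammaticTask(
        goal=f"obtain {amount} {item}",
        guidance=f"(hypothetical guidance for collecting {item})",
        initial_conditions={"biome": "plains"},
        success=lambda inv: inv.get(item, 0) >= amount,
        reward=lambda inv: min(inv.get(item, 0), amount) / amount,
    )

task = harvest_task("wool", 3)
print(task.goal)                  # obtain 3 wool
print(task.success({"wool": 2}))  # False
print(task.success({"wool": 3}))  # True
print(task.reward({"wool": 2}))   # 0.6666666666666666
```

Keeping the success criterion as a plain function is what makes programmatic tasks automatically assessable. A creative task has no such field.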

June 17, 2022 · 971 words · Linxi Fan, Guanzhi Wang, Yunfan Jiang, Ajay Mandlekar, Yuncong Yang and 5 others
VPIT: Real-time Embedded Single Object 3D Tracking Using Voxel Pseudo Images

Abstract: The paper proposes Voxel Pseudo Image Tracking (VPIT), a novel voxel-based method for 3D single object tracking (3D SOT). VPIT feeds voxel pseudo images as input to a 2D-like Siamese SOT method. It works in Bird's-eye View (BEV) coordinates, so in the new coordinate system only the object's rotation can change. VPIT is the fastest 3D SOT method while maintaining competitive Success and Precision values....
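
Here is a minimal sketch of the BEV idea, assuming a simple occupancy-count grid. Each LiDAR point is dropped into an (x, y) cell, which turns the cloud into a 2D image that a Siamese tracker could consume. VPIT's actual voxel pseudo images come from a learned point encoder. The grid extent, cell size, and counting features below are illustrative assumptions.

```python
import numpy as np

def bev_pseudo_image(points, x_range=(0.0, 4.0), y_range=(0.0, 4.0), cell=1.0):
    """Bin (N, 3) points into a BEV grid of per-cell point counts (simplified)."""
    points = np.asarray(points, dtype=float)
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))
    ix = np.floor((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = np.floor((points[:, 1] - y_range[0]) / cell).astype(int)
    # Keep only points inside the grid; the z (height) coordinate is collapsed.
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    image = np.zeros((nx, ny), dtype=np.int64)
    np.add.at(image, (ix[keep], iy[keep]), 1)  # unbuffered per-cell counting
    return image

pts = [[0.5, 0.5, 1.0], [0.7, 0.2, -0.3], [3.9, 1.1, 0.0], [5.0, 5.0, 0.0]]
img = bev_pseudo_image(pts)
print(img[0, 0], img[3, 1], img.sum())  # 2 1 3
```

Collapsing the height axis is what makes the input 2D-like. A learned encoder would replace the raw counts with per-cell feature vectors.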

June 6, 2022 · 1096 words · Illia Oleksiienko, Paraskevi Nousi, Nikolaos Passalis, Anastasios Tefas, Alexandros Iosifidis
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

Abstract: Transformers are slow and memory-hungry on long sequences. Approximate attention methods have tried to reduce compute complexity, but they often do not achieve a wall-clock speedup. FlashAttention is an IO-aware exact attention algorithm. It uses tiling to reduce the number of memory reads/writes between GPU high bandwidth memory (HBM) and on-chip SRAM. FlashAttention trains Transformers faster than existing baselines and enables longer context, yielding higher-quality models....
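
The core trick can be sketched in NumPy. Keys and values are processed in blocks, and a running row maximum and softmax denominator are kept (an "online softmax"), so the full N×N score matrix is never materialized. This is a single-head, CPU-only illustration of the tiling idea with an arbitrary block size. It is not the paper's fused GPU kernel, which also tiles the queries. It should match standard attention up to floating-point error.

```python
import numpy as np

def naive_attention(q, k, v):
    s = q @ k.T / np.sqrt(q.shape[1])
    p = np.exp(s - s.max(axis=1, keepdims=True))
    return (p / p.sum(axis=1, keepdims=True)) @ v

def tiled_attention(q, k, v, block=16):
    """Exact attention computed over key/value blocks with an online softmax."""
    n, d = q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros((n, v.shape[1]))
    row_max = np.full(n, -np.inf)  # running max of scores per query row
    row_sum = np.zeros(n)          # running softmax denominator per row
    for start in range(0, k.shape[0], block):
        kb, vb = k[start:start + block], v[start:start + block]
        s = q @ kb.T * scale                     # scores for this block only
        new_max = np.maximum(row_max, s.max(axis=1))
        correction = np.exp(row_max - new_max)   # rescale earlier partial results
        p = np.exp(s - new_max[:, None])
        row_sum = row_sum * correction + p.sum(axis=1)
        out = out * correction[:, None] + p @ vb
        row_max = new_max
    return out / row_sum[:, None]

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((64, 8)) for _ in range(3))
print(np.allclose(tiled_attention(q, k, v), naive_attention(q, k, v)))  # True
```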

May 27, 2022 · 816 words · Tri Dao, Daniel Y. Fu, Stefano Ermon, Atri Rudra, Christopher Ré
GIT: A Generative Image-to-text Transformer for Vision and Language

Abstract: The authors design and train a Generative Image-to-text Transformer (GIT) to unify vision-language tasks. The architecture is simplified to one image encoder and one text decoder. Both the pre-training data and the model size are scaled up to boost performance. GIT establishes new state-of-the-art results on 12 challenging benchmarks and surpasses human performance on TextCaps. The paper also presents a new generation-based scheme for image classification and scene text recognition.

Introduction: Existing approaches pre-train on large-scale image-text pairs with tasks such as Masked Language Modeling (MLM) and Image-Text Matching (ITM), and they need task-specific adaptation. Unified generative models for pre-training also exist, but they rely on a multi-modal encoder and a text decoder that require careful design. GIT achieves new state-of-the-art results across numerous challenging benchmarks with a simpler design. Its image encoder is a Swin-like vision transformer and its text decoder is a transformer network, with a language modeling task used for pre-training. A new generation-based scheme for ImageNet classification is also proposed.

Network architecture: The image encoder is based on a contrastive pre-trained model. Its input is the raw image and its output is a 2D feature map. An extra linear layer and a layernorm layer project the image features into D dimensions. The text decoder is a transformer module made of self-attention and feed-forward layers. The text is tokenized and embedded into D dimensions, and the image features are concatenated with the text embeddings as input to the transformer module. The text decoder is randomly initialized. An alternative architecture is a cross-attention-based decoder, but the self-attention-based decoder works better with large-scale pre-training.

Pre-training: The model is trained with a language modeling (LM) loss. The alternative, MLM, predicts only 15% of the input tokens. LM can predict all tokens, which is more efficient for large-scale pre-training data. The number of epochs is limited to 2 because of limited computational resources. Architecturally, the model is similar to GPT-3.

Fine-tuning: The same LM task is used to fine-tune GIT for image captioning. For VQA, the question and answer are concatenated into a new caption during fine-tuning. This is a generative approach, chosen over the discriminative approach of existing work. No OCR engine is used; the model learns to read scene text through pre-training. A simple architecture change extends the model to the video domain. The generation model is also applied to the image classification task.

Experiments: Setting 0....
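
Image features and text embeddings are concatenated into one sequence for a single self-attention decoder, so LM training needs a suitable attention mask. This sketch assumes a seq2seq-style mask, a common choice for this kind of decoder. Image tokens attend to all image tokens. Each text token attends to every image token plus the text tokens up to and including itself. The function name and token counts are illustrative.

```python
import numpy as np

def seq2seq_mask(num_image: int, num_text: int) -> np.ndarray:
    """Boolean mask: mask[i, j] is True if position i may attend to position j.

    Assumed sequence layout: [image tokens ..., text tokens ...].
    """
    n = num_image + num_text
    mask = np.zeros((n, n), dtype=bool)
    # Image tokens see all image tokens but no text tokens.
    mask[:num_image, :num_image] = True
    # Text tokens see every image token ...
    mask[num_image:, :num_image] = True
    # ... and earlier text tokens causally (no peeking at future text).
    mask[num_image:, num_image:] = np.tril(np.ones((num_text, num_text), dtype=bool))
    return mask

m = seq2seq_mask(num_image=2, num_text=3)
print(m.astype(int))
# [[1 1 0 0 0]
#  [1 1 0 0 0]
#  [1 1 1 0 0]
#  [1 1 1 1 0]
#  [1 1 1 1 1]]
```

With a causal mask over the text, every text position can be supervised in a single forward pass. This is why an LM objective uses all tokens, while MLM supervises only the masked fraction.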

May 27, 2022 · 866 words · Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin and 4 others
KERPLE: Kernelized Relative Positional Embedding for Length Extrapolation

Abstract: Relative positional embeddings (RPE) have been studied for their ability to model the relative distance between tokens and to enable length extrapolation. KERPLE is a framework that generalizes RPEs for length extrapolation using conditionally positive definite (CPD) kernels. Adding a constant offset turns a CPD kernel into a positive definite (PD) kernel, and the Softmax normalization in self-attention absorbs this offset....
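
A small NumPy sketch illustrates the two ideas in this summary. First, adding the same constant to every attention logit leaves the softmax unchanged, which is why the CPD-to-PD offset can be absorbed. Second, a CPD kernel of the token distance can serve directly as an additive attention bias. The sketch uses a logarithmic form, -r1 * log(1 + r2 * |m - n|) with r1, r2 > 0. The parameters are fixed here to arbitrary illustrative values rather than learned.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def log_kernel_bias(seq_len, r1=1.0, r2=0.5):
    """Logarithmic kernel -r1 * log(1 + r2 * |m - n|) used as an attention bias.

    r1 and r2 would be learnable; here they are fixed illustrative values.
    """
    pos = np.arange(seq_len)
    dist = np.abs(pos[:, None] - pos[None, :])
    return -r1 * np.log1p(r2 * dist)

rng = np.random.default_rng(0)
logits = rng.standard_normal((4, 4))
bias = log_kernel_bias(4)

# Adding the same constant c to every logit (the PD offset) changes nothing.
c = 7.5
print(np.allclose(softmax(logits + bias), softmax(logits + bias + c)))  # True
```

The bias is zero on the diagonal and decreases as tokens move farther apart. Because it depends only on |m - n|, it is defined for any sequence length.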

May 20, 2022 · 874 words · Ta-Chung Chi, Ting-Han Fan, Peter J. Ramadge, Alexander I. Rudnicky
Vectorized and performance-portable Quicksort

Abstract: Recent work has shown that Quicksort implementations using vector CPU instructions can outperform non-vectorized algorithms. The proposed ‘vqsort’ algorithm integrates into the state-of-the-art parallel sorter ‘ips4o’ with a geometric mean speedup of 1.59. It works on seven instruction sets across four platforms and supports floating-point keys as well as 16- to 128-bit integer keys. It is the fastest sort for non-tuple keys on CPUs, up to 20 times as fast as the sorting algorithms implemented in standard libraries....
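
Partitioning is the step a vectorized Quicksort speeds up. A whole vector of keys is compared against the pivot at once, and the lanes on each side are compressed into separate outputs. The NumPy sketch below imitates this with boolean masks, one fixed-width "vector" at a time. Several details are simplifying assumptions, not vqsort's actual implementation, which works in place on SIMD registers: the vector width, the median-of-three pivot, the three-way split (which guards against duplicate keys), and the recursion.

```python
import numpy as np

def vector_partition(keys, pivot, width=8):
    """Three-way partition, processing one fixed-width chunk ("vector") at a time."""
    lo, eq, hi = [], [], []
    for start in range(0, len(keys), width):
        chunk = keys[start:start + width]
        less = chunk < pivot                  # one vector comparison per side
        greater = chunk > pivot
        lo.append(chunk[less])                # compress lanes below the pivot
        eq.append(chunk[~less & ~greater])    # lanes equal to the pivot
        hi.append(chunk[greater])             # compress lanes above the pivot
    return np.concatenate(lo), np.concatenate(eq), np.concatenate(hi)

def vq_quicksort(keys, width=8):
    keys = np.asarray(keys)
    if len(keys) <= width:
        return np.sort(keys)                  # small inputs: base-case sort
    # Median of three sampled keys; it is always an element of `keys`.
    pivot = sorted((keys[0], keys[len(keys) // 2], keys[-1]))[1]
    lo, eq, hi = vector_partition(keys, pivot, width)
    return np.concatenate([vq_quicksort(lo, width), eq, vq_quicksort(hi, width)])

data = np.array([9, 3, 7, 3, 1, 8, 2, 6, 5, 4, 0, 3])
print(vq_quicksort(data, width=4))  # [0 1 2 3 3 3 4 5 6 7 8 9]
```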

May 12, 2022 · 1132 words · Mark Blacher, Joachim Giesen, Peter Sanders, Jan Wassenberg
Fast Sampling of Diffusion Models with Exponential Integrator

Abstract: Diffusion models (DMs) have been successful in generative modeling tasks. However, their sampling procedure is slow, requiring hundreds to thousands of time discretization steps. The goal is a fast sampling method for DMs that uses fewer steps while retaining high sample quality. The discretization method is the most crucial factor affecting sample quality....
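
A toy semilinear ODE shows why the discretization matters: dx/dt = a*x + g(t), a linear drift plus an extra term, similar in spirit to the structure of diffusion sampling ODEs. An exponential integrator solves the linear part exactly and only approximates g. Plain Euler approximates both. The ODE, the coefficient a, and the step count below are illustrative assumptions, not the paper's sampler. The paper's method approximates the nonlinear term more carefully, whereas this sketch holds it constant over each step.

```python
import math

A_COEF = -10.0  # linear drift coefficient (illustrative)

def g(t):
    return math.sin(t)  # time-dependent forcing term (illustrative)

def exact(t, x0=1.0, a=A_COEF):
    # Closed-form solution of dx/dt = a*x + sin(t) with x(0) = x0.
    b = -1.0 / (1.0 + a * a)   # cos(t) coefficient of the particular solution
    s = -a / (1.0 + a * a)     # sin(t) coefficient of the particular solution
    return (x0 - b) * math.exp(a * t) + s * math.sin(t) + b * math.cos(t)

def euler(x0, t_end, steps, a=A_COEF):
    h, x, t = t_end / steps, x0, 0.0
    for _ in range(steps):
        x = x + h * (a * x + g(t))
        t += h
    return x

def exponential_euler(x0, t_end, steps, a=A_COEF):
    h, x, t = t_end / steps, x0, 0.0
    decay = math.exp(a * h)
    for _ in range(steps):
        # Linear part solved exactly; g(t) held constant over the step.
        x = decay * x + (decay - 1.0) / a * g(t)
        t += h
    return x

steps = 10  # large steps: h = 0.2, so a*h = -2
err_euler = abs(euler(1.0, 2.0, steps) - exact(2.0))
err_exp = abs(exponential_euler(1.0, 2.0, steps) - exact(2.0))
print(f"Euler error: {err_euler:.2e}, exponential integrator error: {err_exp:.2e}")
```

With only 10 steps, Euler's update factor 1 + a*h equals -1, so the homogeneous part oscillates instead of decaying. The exponential integrator stays close to the exact solution.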

April 29, 2022 · 1042 words · Qinsheng Zhang, Yongxin Chen
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

Abstract: The paper introduces Super-NaturalInstructions, a benchmark of 1,616 diverse NLP tasks and their expert-written instructions. The tasks cover 76 distinct task types, including classification, extraction, infilling, sequence tagging, text rewriting, and text composition. It also introduces Tk-Instruct, a transformer model trained to follow a variety of in-context instructions. Tk-Instruct outperforms existing instruction-following models. Finally, the paper analyzes generalization as a function of various scaling parameters.

Introduction: The NLP community has made progress in building models that generalize to unseen tasks. Models like InstructGPT are successful, but the contribution of their design choices is opaque. Large-scale public benchmarks of NLP tasks and instructions are therefore needed to facilitate research. The authors construct a meta-dataset of 1,616 NLP tasks and instructions. Their model, Tk-Instruct, outperforms InstructGPT by 9....

April 16, 2022 · 740 words · Yizhong Wang, Swaroop Mishra, Pegah Alipoormolabashi, Yeganeh Kordi, Amirreza Mirzaei and 35 others