PanGu-Σ: Towards Trillion Parameter Language Model with Sparse Heterogeneous Computing
Link to paper

The full paper is available here. You can also find the paper on PapersWithCode here.

Abstract

Large language models have improved natural language understanding, generation, and reasoning. A system was developed that trained a trillion-parameter language model on a cluster of Ascend 910 AI processors using the MindSpore framework. The language model, named PanGu-Σ, has 1.085T parameters. Random Routed Experts (RRE) was used to extend the dense Transformer model to a sparse one....
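
The abstract excerpt names Random Routed Experts but does not describe the routing rule. As a rough illustration only, the sketch below assumes the simplest reading of the name: each token ID is mapped to an expert by a fixed, randomly generated table instead of a learned gate. The functions `build_routing_table` and `route`, the vocabulary size, and the expert count are all hypothetical and are not taken from the paper.

```python
import random


def build_routing_table(vocab_size, num_experts, seed=0):
    """Assign every token ID to one expert, uniformly at random, once.

    Assumption: the table is fixed after creation, so no gating
    parameters are learned. This is an illustrative guess, not the
    paper's confirmed design.
    """
    rng = random.Random(seed)
    return [rng.randrange(num_experts) for _ in range(vocab_size)]


def route(token_ids, table):
    """Group token positions by the expert their token ID maps to."""
    buckets = {}
    for pos, tok in enumerate(token_ids):
        buckets.setdefault(table[tok], []).append(pos)
    return buckets


# Hypothetical sizes chosen only to keep the example small.
table = build_routing_table(vocab_size=1000, num_experts=4)
buckets = route([5, 17, 5, 999], table)
```

Because the table is fixed, a repeated token (ID 5 above, at positions 0 and 2) always lands in the same expert's bucket, and only that expert's feed-forward block would run for it. That is the sense in which a dense model becomes a sparse one under this assumption.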