Switch Transformers - Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

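Links: https://arxiv.org/abs/2101.03961 “SWITCH TRANSFORMERS: SCALING TO TRILLION PARAMETER MODELS WITH SIMPLE AND EFFICIENT SPARSITY” proposes a network that can scale to a trillion parameters. It makes two main contributions: building on the Transformer MoE architecture, it simplifies the MoE routing mechanism and reduces computation; it further reduces training communication and improves training performance by combining data parallelism, model parallelism, and expert parallelism. Model: Simplifying Sparse Routing. Mixture of Expert Routing takes as input a token representation x and routes it to the best-determined top-k experts. Switch Routing: route to only a single expert; this simplification preserves model quality, reduces routing computation, and performs better. Sparse routing uses the parameters Wr to compute a softmax distribution over the N experts and, for each input token, selects the top k experts with the highest probability; this corresponds to the gating mechanism in MoE. As a result, the compute requirement does not grow substantially as the parameter count increases, which makes the model easier to train. Efficient Sparse Routing. Parallel Switch implementation: tensor shapes are statically determined at compilation time, while computation is dynamic due to the routing decisions at training and inference....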
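The routing step in this excerpt (a softmax over N experts computed from Wr, followed by top-k selection, with k = 1 for Switch Routing) can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the function names `router_probs`, `top_k_route`, and `switch_route`, the plain-list representation of `W_r`, and the toy numbers are all assumptions made for the example.

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def router_probs(x, W_r):
    # Assumption: W_r holds one weight row per expert, so logit_i = W_r[i] . x
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in W_r]
    return softmax(logits)

def top_k_route(x, W_r, k):
    # MoE-style routing: keep the k experts with the highest router probability.
    probs = router_probs(x, W_r)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(i, probs[i]) for i in ranked[:k]]

def switch_route(x, W_r):
    # Switch-style routing: the k = 1 special case, a single expert per token.
    return top_k_route(x, W_r, 1)[0]

# Toy token and router weights for three hypothetical experts.
x = [1.0, 0.0]
W_r = [[0.0, 1.0], [2.0, 0.0], [1.0, 1.0]]
expert, gate = switch_route(x, W_r)  # expert 1 has the largest logit here
```

Because only the selected expert runs for each token, adding experts increases the parameter count without a matching increase in per-token compute.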

2021-07-10 · 4 min · Cong Chan

Survey - Pre-Trained Models - Past, Present and Future

Links: https://arxiv.org/abs/2106.07139 A quick look at the newly released survey on Pre-Trained Models. First, the definitions of some terms used in the survey. Transfer learning: a method for addressing the data-hungry problem in machine learning; it is supervised. Self-Supervised Learning: also addresses the data-hungry problem, especially for completely unlabeled data, by using the data itself as the label in some way (for example, language modeling). It therefore has much in common with unsupervised learning. Unsupervised learning usually refers to pattern-recognition problems such as clustering, community discovery, and anomaly detection, whereas self-supervised learning stays within the supervised paradigm and focuses on problems such as classification and generation. Pre-trained models (PTMs): pre-training is a specific training scheme that can use either transfer learning or Self-Supervised Learning. 2 Background: lineage map. Pre-training falls into two broad categories: 2.1 Transfer Learning and Supervised Pre-Training, which can be further divided into feature transfer and parameter transfer; 2.2 Self-Supervised Learning and Self-Supervised Pre-Training. Transfer learning can be divided into four subcategories: inductive transfer learning (Lawrence and Platt, 2004; Mihalkova et al., 2007; Evgeniou and Pontil, 2007), transductive transfer learning (Shimodaira, 2000; Zadrozny, 2004; Daume III and Marcu, 2006), self-taught learning (Raina et al....

2021-06-19 · 10 min · Cong Chan

Survey - A Survey on Knowledge Graphs - Representation, Acquisition and Applications

Survey: https://arxiv.org/abs/2002.00388v4 A knowledge graph is a structured representation of facts, consisting of entities, relationships, and semantic descriptions. Entities can be real-world objects and abstract concepts; relationships represent the relations between entities; semantic descriptions of entities and their relationships contain types and properties with a well-defined meaning. Notation: G: a knowledge graph; F: a set of facts; (h, r, t): a triple of head, relation, and tail; $(\mathbf{h}, \mathbf{r}, \mathbf{t})$: embedding of head, relation, and tail...
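To make the notation concrete, a set of facts F can be represented as a set of (h, r, t) triples, from which the entity and relation sets follow directly. The sketch below is only an illustration: the `Triple` type, the helper names `entities` and `relations`, and the two sample facts are assumptions for the example and are not taken from the survey excerpt.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    # One fact (h, r, t): head entity, relation, tail entity.
    head: str
    relation: str
    tail: str

def entities(facts):
    # Every head or tail that appears in some fact is an entity.
    return {t.head for t in facts} | {t.tail for t in facts}

def relations(facts):
    # Every relation label that appears in some fact.
    return {t.relation for t in facts}

# F: a small, hand-written set of facts used purely for illustration.
F = {
    Triple("Albert Einstein", "WinnerOf", "Nobel Prize in Physics"),
    Triple("Albert Einstein", "BornIn", "Ulm"),
}
```

The embeddings $(\mathbf{h}, \mathbf{r}, \mathbf{t})$ would then be vectors looked up per entity and relation; they are left out here because the excerpt does not specify an embedding model.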

2020-02-01 · 6 min · Cong Chan