2021 | Cong's Log

Switch Transformers - Scaling to Trillion Parameter Models with Simple and Efficient Sparsity

Links: https://arxiv.org/abs/2101.03961 “SWITCH TRANSFORMERS: SCALING TO TRILLION PARAMETER MODELS WITH SIMPLE AND EFFICIENT SPARSITY”，提出了一种可以扩展到万亿参数的网络，有两个比较大的创新，基于Transformer MoE网络结构，简化了MoE的routing机制，降低了计算量；进一步通过数据并行+模型并行+expert并行的方式降低了训练通信量，提升训练性能。模型 Simplifying Sparse Routing Mixture of Expert Routing which takes as an input a token representation x and then routes this to the best deter- mined top-k experts Switch Routing: route to only a single expert, this simplification preserves model quality, reduces routing computation and performs better. Sparse routing通过参数Wr计算出一个在N个experts上的softmax分布，对每个token输入筛选概率最高的 top k 个 experts，对应的是MOE中的门控机制。这样对算力的需求并没有随着参数量的增加而大幅增长，使得这个模型更加容易训练。 ...

Survey - Pre-Trained Models - Past, Present and Future

Links: https://arxiv.org/abs/2106.07139 最新出炉的 Pre-Trained Models 综述速览。先确定综述中的一些名词的定义 Transfer learning：迁移学习，一种用于应对机器学习中的data hungry问题的方法，是有监督的 Self-Supervised Learning：自监督学习，也用于应对机器学习中的data hungry问题，特别是针对完全没有标注的数据，可以通过某种方式以数据自身为标签进行学习（比如language modeling）。所以和无监督学习有异曲同工之处。一般我们说无监督主要集中于clustering, community discovery, and anomaly detection等模式识别问题而self-supervised learning还是在监督学习的范畴，集中于classification and generation等问题 Pre-trained models (PTMs) ：预训练模型，Pre-training是一种具体的训练方案，可以采用transfer learning或者Self-Supervised Learning方法 2 Background 脉络图谱 Pre-training 可分为两大类： 2.1 Transfer Learning and Supervised Pre-Training 此类可进一步细分为 feature transfer 和 parameter transfer. 2.2 Self-Supervised Learning and Self-Supervised Pre-Training Transfer learning 可细分为四个子类 inductive transfer learning (Lawrence and Platt, 2004; Mihalkova et al., 2007; Evgeniou and Pontil, 2007), transductive transfer learning (Shimodaira, 2000; Zadrozny,2004; Daume III and Marcu, 2006), self-taught learning (Raina et al., 2007; Dai et al., 2008) unsupervised transfer learning (Wang et al., 2008). inductive transfer learning 和 transductive transfer learning 的研究进展主要集中以imageNet为labeled source data资源的图像领域 ...

综述 A Survey on Knowledge Graphs - Representation, Acquisition and Applications

Survey: https://arxiv.org/abs/2002.00388v4 A knowledge graph is a structured representation of facts, consisting of entities, relationships and semantic descriptions. Entities can be real-world objects and abstract concepts, Relationships represent the relation between entities, Semantic descriptions of entities and their relationships contain types and properties with a well-defined meaning G: A knowledge graph F: A set of facts (h, r, t): A triple of head, relation and tail $(\mathbf{h}, \mathbf{r}, \mathbf{t})$: Embedding of head, relation and tail ...