Matching the Blanks - Distributional Similarity for Relation Learning

2019, ACL data: KBP37, SemEval 2010 Task 8, TACRED task: Entity and Relation Extraction Build task agnostic relation representations solely from entity-linked text. 缺陷 文章认为网页中, 相同的的实体对一般指代相同的实体关系, 把实体不同的构建为负样本. 这个在单份文件中可能大概率是对的. 但是实体不完全一直不代表这个两对实体的关系不同. 所以这个作为负样本是本质上映射的是实体识别而不是关系. 比较好的方式是把实体不同但是关系一样的也考虑进来. 方法 Define Relation Statement We define a relation statement to be a block of text containing two marked entities. From this, we create training data that contains relation statements in which the entities have been replaced with a special [BLANK]...

2021-04-21 · 3 min · Cong Chan

A Frustratingly Easy Approach for Joint Entity and Relation Extraction

2020, NAACL data: ACE 04, ACE 05, SciERC links: https://github.com/princeton-nlp/PURE task: Entity and Relation Extraction 提出了一种简单但是有效的pipeline方法:builds on two independent pre-trained encoders and merely uses the entity model to provide input features for the relation model. 实验说明: validate the importance of learning distinct contextual representations for entities and relations, fusing entity information at the input layer of the relation model, and incorporating global context. 从效果上看, 似乎是因为cross sentence的context加成更大 方法 Input: a sentence X consisting of n tokens x1, ....

2021-04-20 · 2 min · Cong Chan

Two are Better than One - Joint Entity and Relation Extraction with Table-Sequence Encoders

2020, EMNLP data: ACE 04, ACE 05, ADE, CoNLL04 links: https://github.com/LorrinWWW/two-are-better-than-one. task: Entity and Relation Extraction In this work, we propose the novel table-sequence encoders where two different encoders – a table encoder and a sequence encoder are designed to help each other in the representation learning process. 这篇ACL 2020文章认为, 之前的Joint learning方法侧重于learning a single encoder (usually learning representation in the form of a table) to capture information required for both tasks within the same space....

2021-03-27 · 2 min · Cong Chan

Improving Event Detection via Open-domain Trigger Knowledge

2020, ACL data: ACE 05 task: Event Detection Propose a novel Enrichment Knowledge Distillation (EKD) model to efficiently distill external open-domain trigger knowledge to reduce the in-built biases to frequent trigger words in annotations. leverage the wealth of the open-domain trigger knowledge to improve ED propose a novel teacher-student model (EKD) that can learn from both labeled and unlabeled data 缺点 只能对付普遍情况, 即一般性的触发词; 但触发词不是在任何语境下都是触发词. 方法 empower the model with external knowledge called Open-Domain Trigger Knowledge, defined as a prior that specifies which words can trigger events without subject to pre-defined event types and the domain of texts....

2021-03-25 · 3 min · Cong Chan

Cross-media Structured Common Space for Multimedia Event Extraction

2020, ACL Task: MultiMedia Event Extraction Introduce a new task, MultiMedia Event Extraction (M2E2), which aims to extract events and their arguments from multimedia documents. Construct the first benchmark and evaluation dataset for this task, which consists of 245 fully annotated news articles Propose a novel method, Weakly Aligned Structured Embedding (WASE), that encodes structured representations of semantic information from textual and visual data into a common embedding space. which takes advantage of annotated unimodal corpora to separately learn visual and textual event extraction, and uses an image-caption dataset to align the modalities...

2021-03-24 · 4 min · Cong Chan

DQN, Double DQN, Dueling DoubleQN, Rainbow DQN

深度强化学习DQN和Natural DQN, Double DQN, Dueling DoubleQN, Rainbow DQN 的演变和必看论文. DQN的Overestimate DQN 基于 Q-learning, Q-Learning 中有 Qmax, Qmax 会导致 Q现实 当中的过估计 (overestimate). 而 Double DQN 就是用来解决过估计的. 在实际问题中, 如果你输出你的 DQN 的 Q 值, 可能就会发现, Q 值都超级大. 这就是出现了 overestimate. DQN 的神经网络部分可以看成一个 最新的神经网络 + 老神经网络, 他们有相同的结构, 但内部的参数更新却有时差. Q现实 部分是这样的: $$Y_t^\text{DQN} \equiv R_{t+1} + \gamma \max_a Q(S_{t+1}, a; \theta_t^-)$$ 过估计 (overestimate) 是指对一系列数先求最大值再求平均,通常比先求平均再求最大值要大(或相等,数学表达为: $$E(\max(X_1, X_2, …)) \ge \max(E(X_1), E(X_2), …)$$ 一般来说Q-learning方法导致overestimation的原因归结于其更新过程,其表达为: $$Q_{t+1} (s_t, a_t) = Q_t (s_t, a_t) + a_t(s_t, a_t)(r_t + \gamma \max a Q_t(s_{t+1}, a) - Q_t(s_t, a_t))$$...

2021-03-09 · 3 min · Cong Chan

DeepPath - A Reinforcement Learning Method for Knowledge Graph Reasoning

2017, EMNLP data: FB15K-237, FB15K task: Knowledge Graph Reasoning Use a policy-based agent with continuous states based on knowledge graph embeddings, which reasons in a KG vector space by sampling the most promising relation to extend its path. 方法 RL 系统包含两部分, 第一部分是外部环境,指定了 智能体 和知识图谱之间的动态交互。环境被建模为马尔可夫决策过程。 系统的第二部分,RL 智能体,表示为策略网络,将状态向量映射到随机策略中。神经网络参数通过随机梯度下降更新。相比于 DQN,基于策略的 RL 方法更适合该知识图谱场景。一个原因是知识图谱的路径查找过程,行为空间因为关系图的复杂性可能非常大。这可能导致 DQN 的收敛性变差。另外,策略网络能学习梯度策略,防止 智能体 陷入某种中间状态,而避免基于值的方法如 DQN 在学习策略梯度中遇到的问题。 关系推理的强化学习 行为 给定一些实体对和一个关系,我们想让 智能体 找到最有信息量的路径来连接这些实体对。从源实体开始,智能体 使用策略网络找到最有希望的关系并每步扩展它的路径直到到达目标实体。为了保持策略网络的输出维度一致,动作空间被定义为知识图谱中的所有关系。 状态 知识图谱中的实体和关系是自然的离散原子符号。现有的实际应用的知识图谱例如 Freebase 和 NELL 通常有大量三元组,不可能直接将所有原子符号建模为状态。为了捕捉这些符号的语义信息,我们使用基于平移的嵌入方法,例如 TransE 和 TransH 来表示实体和关系。这些嵌入将所有符号映射到低维向量空间。在该框架中,每个状态捕捉 智能体 在知识图谱中的位置。在执行一个行为后,智能体 会从一个实体移动到另一个实体。两个状态通过刚执行的行为(关系)由 智能体 连接。第 t 步的状态向量:...

2020-03-11 · 2 min · Cong Chan

Knowledge-Graph-Embedding的Translate族(TransE,TransH,TransR,TransD)

data: WN18, WN11, FB15K, FB13, FB40K task: Knowledge Graph Embedding TransE Translating Embeddings for Modeling Multi-relational Data(2013) https://proceedings.neurips.cc/paper/2013/file/1cecc7a77928ca8133fa24680a88d2f9-Paper.pdf 这是转换模型系列的第一部作品。该模型的基本思想是使head向量和relation向量的和尽可能靠近tail向量。这里我们用L1或L2范数来衡量它们的靠近程度。 损失函数 $\mathrm{L}(h, r, t)=\max \left(0, d_{\text {pos }}-d_{\text {neg }}+\text { margin }\right)$使损失函数值最小化,当这两个分数之间的差距大于margin的时候就可以了(我们会设置这个值,通常是1) 但是这个模型只能处理一对一的关系,不适合一对多/多对一关系,例如,有两个知识,(skytree, location, tokyo)和(gundam, location, tokyo)。经过训练,“sky tree”实体向量将非常接近“gundam”实体向量。但实际上它们没有这样的相似性。 with tf.name_scope("embedding"): self.ent_embeddings = tf.get_variable(name = "ent_embedding", shape = [entity_total, size], initializer = tf.contrib.layers.xavier_initializer(uniform = False)) self.rel_embeddings = tf.get_variable(name = "rel_embedding", shape = [relation_total, size], initializer = tf.contrib.layers.xavier_initializer(uniform = False)) pos_h_e = tf....

2020-03-05 · 5 min · Cong Chan

综述 A Survey on Knowledge Graphs - Representation, Acquisition and Applications

Survey: https://arxiv.org/abs/2002.00388v4 A knowledge graph is a structured representation of facts, consisting of entities, relationships and semantic descriptions. Entities can be real-world objects and abstract concepts, Relationships represent the relation between entities, Semantic descriptions of entities and their relationships contain types and properties with a well-defined meaning G: A knowledge graph F: A set of facts (h, r, t): A triple of head, relation and tail $(\mathbf{h}, \mathbf{r}, \mathbf{t})$: Embedding of head, relation and tail...

2020-02-01 · 6 min · Cong Chan

Open-Domain Targeted Sentiment Analysis via Span-Based Extraction and Classification

2019, ACL data: SemEval 2014, SemEval 2014 ABSA, SemEval 2015, SemEval 2016 task: ABSA propose a span-based extract-then-classify framework, where multiple opinion targets are directly extracted from the sentence under the supervision of target span boundaries, and corresponding polarities are then classified using their span representations. 优点: 用指针网络选取target,避免了序列标注的搜索空间过大问题 用span边界+极性的标注方式,解决多极性的target问题 方法 Input: sentence x =(x1,..., xn) with length n, Target list T = {t1,..., tm}: each target ti is annotated with its start, end position, and its sentiment polarity...

2020-01-24 · 2 min · Cong Chan