Early Rumour Detection

2019, ACL data: TWITTER, WEIBO links: https://www.aclweb.org/anthology/N19-1163, https://github.com/DeepBrainAI/ERD task: Rumour Detection This paper uses a GRU to encode the stream of social media posts as the environment's state representation, and trains a classifier that takes the GRU state as input and makes a binary decision on whether the text is a rumour. A DQN-trained agent decides, based on the current state, whether to trigger the rumour classifier, and is rewarded or penalized according to whether the classification is correct. The goal is to predict whether social media posts are rumours as accurately and as early as possible. Focuses on the task of rumour detection; particularly, we are interested in understanding how early we can detect them. Our model treats social media posts (e.g. tweets) as a data stream and integrates reinforcement learning to learn the minimum number of posts required before we classify an event as a rumour. Let $E$ denote an event, and it consists of a series of relevant posts $x_i$, where $x_0$ denotes the source message and $x_T$ the last relevant message....
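
A minimal sketch of how these pieces could fit together, assuming PyTorch; the module names, dimensions, and reward scheme below are illustrative stand-ins, not the released ERD code:

```python
import torch
import torch.nn as nn

class StreamEncoder(nn.Module):
    """GRU that encodes the stream of post embeddings into a state vector."""
    def __init__(self, post_dim=100, state_dim=64):
        super().__init__()
        self.gru = nn.GRU(post_dim, state_dim, batch_first=True)

    def forward(self, posts):                  # posts: (1, T, post_dim)
        _, h = self.gru(posts)                 # h: (1, 1, state_dim)
        return h.squeeze(0)                    # (1, state_dim)

class RumourClassifier(nn.Module):
    """Binary classifier on top of the GRU state (rumour vs. non-rumour)."""
    def __init__(self, state_dim=64):
        super().__init__()
        self.fc = nn.Linear(state_dim, 2)

    def forward(self, state):
        return self.fc(state)                  # logits over {non-rumour, rumour}

class DQNAgent(nn.Module):
    """Q-network over two actions: 0 = keep waiting, 1 = classify now."""
    def __init__(self, state_dim=64):
        super().__init__()
        self.q = nn.Sequential(nn.Linear(state_dim, 32), nn.ReLU(), nn.Linear(32, 2))

    def forward(self, state):
        return self.q(state)                   # Q-values for (wait, classify)

encoder, clf, agent = StreamEncoder(), RumourClassifier(), DQNAgent()
posts = torch.randn(1, 5, 100)                 # 5 posts seen so far (dummy data)
state = encoder(posts)
action = agent(state).argmax(dim=-1).item()    # greedy action, for illustration
if action == 1:                                # agent decides to classify now
    pred = clf(state).argmax(dim=-1).item()
    # reward would be positive if pred matches the gold label, negative otherwise;
    # earlier correct decisions can be rewarded more to encourage early detection.
```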

2021-05-01 · 3 min · Cong Chan

DQN, Double DQN, Dueling Double DQN, Rainbow DQN

The evolution of deep reinforcement learning from DQN and Natural DQN to Double DQN, Dueling Double DQN, and Rainbow DQN, with must-read papers. Overestimation in DQN: DQN is based on Q-learning, and the max operator in Q-learning causes overestimation in the Q target. Double DQN is designed to fix this overestimation. In practice, if you print out your DQN's Q values, you may find they are all extremely large; that is overestimation at work. The neural network part of DQN can be viewed as a current (online) network plus an old (target) network: they share the same architecture, but their parameters are updated with a time lag. The Q target is: $$Y_t^\text{DQN} \equiv R_{t+1} + \gamma \max_a Q(S_{t+1}, a; \theta_t^-)$$ Overestimation refers to the fact that taking the maximum of a set of random variables and then the expectation is generally greater than (or equal to) taking the expectations first and then the maximum: $$E(\max(X_1, X_2, \dots)) \ge \max(E(X_1), E(X_2), \dots)$$ The overestimation of Q-learning is generally attributed to its update rule: $$Q_{t+1}(s_t, a_t) = Q_t(s_t, a_t) + \alpha_t(s_t, a_t)\left(r_t + \gamma \max_a Q_t(s_{t+1}, a) - Q_t(s_t, a_t)\right)$$...
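
A short sketch, assuming PyTorch, that contrasts the DQN target with the Double DQN target (the action is selected by the online network but evaluated by the target network); `q_online`, `q_target`, gamma, and the shapes are illustrative stand-ins:

```python
import torch
import torch.nn as nn

state_dim, n_actions, gamma = 4, 3, 0.99
q_online = nn.Linear(state_dim, n_actions)    # theta_t   (current network)
q_target = nn.Linear(state_dim, n_actions)    # theta_t^- (old, frozen copy)

s_next = torch.randn(8, state_dim)            # batch of next states
r = torch.randn(8)                            # batch of rewards

with torch.no_grad():
    # DQN: both action selection and evaluation use the target network,
    # so the max over noisy estimates inflates the target (overestimation).
    y_dqn = r + gamma * q_target(s_next).max(dim=1).values

    # Double DQN: select the argmax action with the online network,
    # then evaluate that action with the target network.
    a_star = q_online(s_next).argmax(dim=1)
    y_double = r + gamma * q_target(s_next).gather(1, a_star.unsqueeze(1)).squeeze(1)

print(y_dqn.mean().item(), y_double.mean().item())
```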

2021-03-09 · 3 min · Cong Chan

Deep Q Networks

Combining reinforcement learning and deep neural networks at scale. The algorithm was developed by enhancing a classic RL algorithm called Q-Learning with deep neural networks and a technique called experience replay. Q-Learning: Q-Learning is based on the notion of a Q-function. The Q-function (a.k.a. the state-action value function) of a policy $\pi$, $Q^{\pi}(s, a)$, measures the expected return or discounted sum of rewards obtained from state $s$ by taking action $a$ first and following policy $\pi$ thereafter....
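
A minimal tabular Q-learning sketch, assuming a toy environment with discrete states and actions; the hyperparameters and helper names are illustrative, not from the DQN paper:

```python
import random
from collections import defaultdict

alpha, gamma, epsilon = 0.1, 0.99, 0.1
n_actions = 2
Q = defaultdict(lambda: [0.0] * n_actions)    # Q[s][a], initialized to zero

def choose_action(s):
    """Epsilon-greedy action selection from the current Q estimates."""
    if random.random() < epsilon:
        return random.randrange(n_actions)
    return max(range(n_actions), key=lambda a: Q[s][a])

def update(s, a, r, s_next, done):
    """One Q-learning step: Q(s,a) <- Q(s,a) + alpha * (target - Q(s,a))."""
    target = r if done else r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])

# Example of a single transition being learned (dummy values):
update(s=0, a=choose_action(0), r=1.0, s_next=1, done=False)
```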

2019-03-10 · 3 min · Cong Chan