Programming Language

深入理解word2vec

Word2vec Mikolov et al. How to represent meanings? 如何在数学上表达词义？ Vector space models (VSMs) 表示把单词映射到(嵌入)连续的矢量空间, 而且理论上语义相似的单词会映射到空间中临近的位置。VSMs是一个历史悠久的NLP理论，但所有实现方法都不同程度依赖于Distributional Hypothesis, 即出现在相同（相似）的上下文中的单词具有相同（相似）的语义意义。利用此原则的方法大致可以分为两类: Count-based methods (例如, Latent Semantic Analysis))和Predictive models(例如 neural net language models (NNLM))。具体的区别详见Baroni et al.. 但总的来说，Count-based methods 统计词汇间的共现频率，然后把co-occurs matrix 映射到向量空间中；而Predictive models直接通过上下文预测单词的方式来学习向量空间（也就是模型参数空间）。 Word2vec 是一种计算特别高效的predictive model, 用于从文本中学习word embeddings。它有两种方案, Continuous Bag-of-Words model (CBOW) 和 Skip-Gram model (Section 3.1 and 3.2 in Mikolov et al.). 从算法上讲, 两种方案是相似的, 只不过 CBOW 会从source context-words ('the cat sits on the')预测目标单词(例如"mat"); 而skip-gram则相反, 预测目标单词的source context-words。Skip-gram这种做法可能看起来有点随意. 但从统计上看, CBOW 会平滑大量分布信息(通过将整个上下文视为一个观测值), 在大多数情况下, 这对较小的数据集是很有用的。但是, Skip-gram将每个context-target pair视为新的观测值, 当数据集较大时, 这往往带来更好的效果。 ...

Python Digest

What you will get from this Python digest: 1, Learn advanced python programming. 2, Learn new concepts, patterns, and methods that will expand your programming abilities, helping move you from a novice to an expert programmer. 3, Practice going from a problem description to a solution, using a series of assignments. Operator Emulating numeric types In-place operation: One modifies the data-structure itself object.__iadd__(self, other) object.__isub__(self, other) object.__imul__(self, other) object.__imatmul__(self, other) object.__itruediv__(self, other) object.__ifloordiv__(self, other) object.__imod__(self, other) object.__ipow__(self, other[, modulo]) object.__ilshift__(self, other) object.__irshift__(self, other) object.__iand__(self, other) object.__ixor__(self, other)¶ object.__ior__(self, other) These methods are called to implement the augmented arithmetic assignments. These methods should attempt to do the operation in-place (modifying self) and return the result (which could be, but does not have to be, self). If x is an instance of a class with an __iadd__() method, x += y is equivalent to x = operator.iadd(x, y) ...

Python之奇技淫巧

FBI WARNING 这不是python入门函数 Fundamentally, the qualities of good functions all reinforce the idea that functions are abstractions. 函数作为一种机制, 提供了用于抽象数值运算的模式, 使其独立于所涉及的特定值。文档 code is written only once, but often read many times. docstring def pressure(v, t, n): """Compute the pressure in pascals of an ideal gas. Applies the ideal gas law: http://en.wikipedia.org/wiki/Ideal_gas_law v -- volume of gas, in cubic meters t -- absolute temperature in degrees kelvin n -- particles of gas """ >>> help(pressure) Python docstring guidelines ...