Understanding word2vec in Depth

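Word2vec (Mikolov et al.). How to represent meanings? How can word meaning be expressed mathematically? Vector space models (VSMs) map (embed) words into a continuous vector space in which, in theory, semantically similar words land at nearby positions. VSMs have a long history in NLP, but every implementation relies to some degree on the Distributional Hypothesis: words that appear in the same (or similar) contexts have the same (or similar) meanings. Methods that exploit this principle fall roughly into two classes: count-based methods (for example, Latent Semantic Analysis) and predictive models (for example, neural net language models (NNLM)). See Baroni et al. for the detailed differences. In short, count-based methods compute co-occurrence statistics between words and then map the co-occurrence matrix into a vector space, whereas predictive models learn the vector space (that is, the model's parameter space) directly by predicting words from their context. Word2vec is a particularly computationally efficient predictive model for learning word embeddings from text. It comes in two flavors, the Continuous Bag-of-Words model (CBOW) and the Skip-Gram model (Sections 3.1 and 3.2 in Mikolov et al.). Algorithmically the two are similar, except that CBOW predicts the target word (e.g. "mat") from the source context words ('the cat sits on the'), while skip-gram does the opposite and predicts the source context words from the target word. The skip-gram approach may look somewhat arbitrary....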
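To make the CBOW versus skip-gram distinction concrete, here is a minimal sketch of how training examples could be extracted from the example sentence. The function names `cbow_pairs` and `skipgram_pairs` and the `window` parameter are hypothetical, chosen for illustration; this shows only the pairing of context and target words, not the word2vec training procedure itself.

```python
def cbow_pairs(tokens, window=2):
    """Yield (context_words, target) examples: the context predicts the target."""
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]      # up to `window` words before
        right = tokens[i + 1:i + 1 + window]     # up to `window` words after
        context = left + right
        if context:
            yield context, target


def skipgram_pairs(tokens, window=2):
    """Yield (target, context_word) examples: the target predicts each context word."""
    for context, target in cbow_pairs(tokens, window):
        for word in context:
            yield target, word


tokens = "the cat sits on the mat".split()
cbow = list(cbow_pairs(tokens, window=1))
skip = list(skipgram_pairs(tokens, window=1))
```

With `window=1`, CBOW produces one example per position (for instance, the context `['the', 'sits']` with target `'cat'`), while skip-gram splits the same information into separate (target, context word) pairs such as `('cat', 'the')` and `('cat', 'sits')`.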

2018-06-22 · 6 min · Cong Chan

Python Digest

What you will get from this Python digest: 1. Learn advanced Python programming. 2. Learn new concepts, patterns, and methods that will expand your programming abilities, helping move you from a novice to an expert programmer. 3. Practice going from a problem description to a solution, using a series of assignments. Operators: Emulating numeric types. In-place operation: one that modifies the data structure itself: object.__iadd__(self, other), object.__isub__(self, other), object.__imul__(self, other), object.__imatmul__(self, other), object....
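As a rough illustration of what "modifies the data structure itself" means, the sketch below defines both `__add__` and `__iadd__` on a small class. The class `Vec` and its `items` attribute are hypothetical, made up for this example; only the special method names come from Python's data model.

```python
class Vec:
    """A tiny vector type used only to contrast + with +=."""

    def __init__(self, *items):
        self.items = list(items)

    def __add__(self, other):
        # Regular addition: build and return a new object.
        return Vec(*(a + b for a, b in zip(self.items, other.items)))

    def __iadd__(self, other):
        # In-place addition: mutate self, then return it.
        for i, b in enumerate(other.items):
            self.items[i] += b
        return self


a = Vec(1, 2)
alias_a = a
a += Vec(10, 20)        # calls __iadd__; alias_a sees the change

b = Vec(1, 2)
alias_b = b
b = b + Vec(10, 20)     # calls __add__; alias_b still holds the old values
```

After this runs, `a` and `alias_a` are still the same object, whereas `b` has been rebound to a new `Vec` and `alias_b` is unchanged. If a class does not define `__iadd__`, Python falls back to `__add__` for `+=`.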

2018-05-08 · 14 min · Cong Chan

Machine Learning with Scikit-learn (Sklearn) in Practice

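Scikit-learn provides a practical set of tools for solving real-world machine learning problems, together with suitable methods for formulating solutions. Topics covered: an introduction to data and models, decision trees, the role of error, minimizing error, regression fitting, logistic regression, neural networks, perceptrons, support vector machines, naive Bayes, dimensionality reduction, K-means, simple Gaussian mixture models, hierarchical clustering, and model evaluation. Experiments and code are on GitHub; answers to the practice assignments are also available on GitHub.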

2017-12-01 · 1 min · Cong Chan

Python Tricks and Clever Hacks

FBI WARNING: this is not an introduction to Python. Functions: Fundamentally, the qualities of good functions all reinforce the idea that functions are abstractions. As a mechanism, functions provide a way to abstract patterns of numerical operations so that they are independent of the particular values involved. Documentation: code is written only once, but often read many times. docstring: def pressure(v, t, n): """Compute the pressure in pascals of an ideal gas. Applies the ideal gas law: http://en.wikipedia.org/wiki/Ideal_gas_law v -- volume of gas, in cubic meters t -- absolute temperature in degrees kelvin n -- particles of gas """ >>> help(pressure) Python docstring guidelines...
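The excerpt shows only the signature and docstring of `pressure`. A runnable version might look like the sketch below; the function body and the constant name `BOLTZMANN` are assumptions, filled in from the ideal gas law (P = n k T / V) that the docstring cites, not taken from the post.

```python
BOLTZMANN = 1.380649e-23  # Boltzmann's constant in J/K (assumed; not shown in the excerpt)


def pressure(v, t, n):
    """Compute the pressure in pascals of an ideal gas.

    Applies the ideal gas law: http://en.wikipedia.org/wiki/Ideal_gas_law

    v -- volume of gas, in cubic meters
    t -- absolute temperature in degrees kelvin
    n -- particles of gas
    """
    # Assumed body: P = n * k * T / V
    return n * BOLTZMANN * t / v


# The docstring is stored on the function object; help(pressure) displays it.
doc = pressure.__doc__
p = pressure(1, 273.15, 6.02214076e23)  # one mole of gas in one cubic meter at 273.15 K
```

Calling `help(pressure)` in an interactive session shows the same text that is stored in `pressure.__doc__`.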

2017-02-22 · 2 min · Cong Chan