Percolations problem

Union-find applications: Percolation Problem discriptions Percolation data type. To model a percolation system, create a data type Percolation with the following API: public class Percolation { public Percolation(int n); // create n-by-n grid, with all sites blocked public void open(int row, int col); // open site (row, col) if it is not open already public boolean isOpen(int row, int col); // is site (row, col) open? public boolean isFull(int row, int col); // is site (row, col) full? public int numberOfOpenSites(); // number of open sites public boolean percolates(); // does the system percolate? } Monte Carlo simulation. To estimate the percolation threshold, consider the following computational experiment: ...

2018-07-03 · 2 min · Cong Chan

Inf Course Note - Accelerated Natural Language Processing

爱丁堡大学信息学院课程笔记 Accelerated Natural Language Processing, Informatics, University of Edinburgh References: Accelerated natural language processing ANLP revision guide Lecture Slides from the Stanford Coursera course Natural Language Processing, by Dan Jurafsky and Christopher Manning 概率模型 Probability Model 概率模型是随机现象的数学表示,由样本空间,样本空间内的事件以及与每个事件相关的概率定义。目标是模拟给一个事件发生的概率 估算概率(Probability Estimation)一般使用最大似然估计(MLE,相关频率): $$p(x_i) = \frac{Count(x_i)}{\sum_{i=0}^nCount(x_i)}$$平滑Smoothing 一般用于处理0概率的问题,比如在训练集中看不到, 但出现在测试集中的词。 Language modeling To compute the probability of sentence /sequence of words $P(w_1, w_2, w_3...)$, or to predict upcomming words $P(w|w_1, w_2, w_3...)$… a language model is also a probability model. ...

2018-06-30 · 31 min · Cong Chan

Inf Course Note - Natural Language Understanding

爱丁堡大学信息学院课程笔记 Natural Language Understanding, Informatics, University of Edinburgh References: Natural language understanding CS224n: Natural Language Processing with Deep Learning Lecture Slides from the Stanford Coursera course Natural Language Processing, by Dan Jurafsky and Christopher Manning Meaning representations 意思的表达有很多方法。一种有效的表示单词的含义的方法是 distributional semantic. Semantics (from Ancient Greek: σημαντικός sēmantikos, “significant”) is the linguistic and philosophical study of meaning, in language, programming languages, formal logics, and semiotics. 语义学 Semantics 在语言学中的研究目的在于找出语义表达的规律性、内在解释、不同语言在语义表达方面的个性以及共性;与计算机科学相关的语义学研究在于机器对自然语言的理解。 Tradition solution of usable meaning in a computer: Use e.g. WordNet, a resource containing lists of synonym sets and hypernyms. ...

2018-06-30 · 28 min · Cong Chan

Inf Course Note - Parallel Programming Language and Systems

爱丁堡大学信息学院课程笔记 Parallel Programming Language and Systems, Informatics, University of Edinburgh Reference: http://www.inf.ed.ac.uk/teaching/courses/ppls/ CMU 15213: Introduction to Computer Systems (ICS) Computer Systems: A Programmer’s Perspective A Comprehensive MPI Tutorial Resource A chapter on MPI from Ian Foster’s online Book Designing and Building Parallel Programs Introduction to parallel computer architecture Covering some of the nasty issues presented by the shared memory model, including weak consistency models and false sharing in the cache, and some architectural issues for the multicomputer model. ...

2018-06-30 · 63 min · Cong Chan

Inf Course Note - Software Architecture, Process, and Management

爱丁堡大学信息学院课程笔记 Software Architecture, Process, and Management, Informatics, University of Edinburgh Reference: microsoft IBM Software Architecture in Practice (3rd edition), Bass, Clements, and Kazman What is Software Architecture? Software architecture is often described as the organization or structure of a system, where the system represents a collection of components that accomplish a specific function or set of functions. grouping components into areas of concern (layers): For example, the UI, business processing, and data access. focus on interaction between the components and how different components work together. 在书中的定义: ...

2018-06-30 · 45 min · Cong Chan

Inf Course Note - Software Testing

爱丁堡大学信息学院课程笔记 Software Testing, Informatics, University of Edinburgh Reference: http://www.inf.ed.ac.uk/teaching/courses/st/2017-18/index.html Pezze and Young, Software Testing and Analysis: Process, Principles and Techniques, Wiley, 2007. Why Software Testing? 1, 软件的漏洞, 错误和失效 Software Faults, Errors & Failures The problem start with Faults, Fault(BUG): latent error, mistakes in programming. e.g add(x, y) = x * y. With the Faults in programs, if and only if executing add(x, y) = x * y, the fault being activated, and generate an Errors. ...

2018-06-30 · 49 min · Cong Chan

深入理解word2vec

Word2vec Mikolov et al. How to represent meanings? 如何在数学上表达词义? Vector space models (VSMs) 表示把单词映射到(嵌入)连续的矢量空间, 而且理论上语义相似的单词会映射到空间中临近的位置。VSMs是一个历史悠久的NLP理论,但所有实现方法都不同程度依赖于Distributional Hypothesis, 即出现在相同(相似)的上下文中的单词具有相同(相似)的语义意义。利用此原则的方法大致可以分为两类: Count-based methods (例如, Latent Semantic Analysis))和Predictive models(例如 neural net language models (NNLM))。 具体的区别详见Baroni et al.. 但总的来说,Count-based methods 统计词汇间的共现频率,然后把co-occurs matrix 映射到向量空间中;而Predictive models直接通过上下文预测单词的方式来学习向量空间(也就是模型参数空间)。 Word2vec 是一种计算特别高效的predictive model, 用于从文本中学习word embeddings。它有两种方案, Continuous Bag-of-Words model (CBOW) 和 Skip-Gram model (Section 3.1 and 3.2 in Mikolov et al.). 从算法上讲, 两种方案是相似的, 只不过 CBOW 会从source context-words ('the cat sits on the')预测目标单词(例如"mat"); 而skip-gram则相反, 预测目标单词的source context-words。Skip-gram这种做法可能看起来有点随意. 但从统计上看, CBOW 会平滑大量分布信息(通过将整个上下文视为一个观测值), 在大多数情况下, 这对较小的数据集是很有用的。但是, Skip-gram将每个context-target pair视为新的观测值, 当数据集较大时, 这往往带来更好的效果。 ...

2018-06-22 · 6 min · Cong Chan

循环神经网络

循环神经网络 当人类阅读时,会根据对之前单词的理解和记忆来辅助理解当前看到的每个单词。也就是人能够很好地处理语言的长距离依赖特性(long-term dependency)。在自然语言处理任务中,很多传统的模型无法做到这一点,比如前馈神经网络;而传统的n-gram模型固然可以通过把把n系数增大来捕捉长距离依赖,但带来的非常巨大的内存消耗。 循环神经网络(Recurrent Neural Networks, RNNs)可以看做是多个共享参数的前馈神经网络不断叠加的结果 ![](http://colah.github.io/posts/2015-08-Understanding-LSTMs/img/RNN-unrolled.png “A recurrent neural network and the unfolding in time of the computation involved in its forward computation. “image from: http://colah.github.io”) 这里的核心是想办法解码历史信息, 即通过递归方程$s_i = R(x_i, s_{i−1})$让$s_i$解码序列$x_{1:n}$. 比如把所有历史信息累加就是一种非常简单粗暴的方式, 这样得到的是连续词袋模型(continuous-bag-of-words model)$s_i = R_{CBOW}(x_i, s_{i-1}) = x_i + s_{i−1}$, 虽然简单,但这种RNN其实忽略了数据的时序性质。 一般意义上的RNN是指Elman Network or Simple-RNN (S-RNN)(Elman [1990]), $s_i = R_{SRNN}(x_i, s_{i-1}) = g(x_iW^x + s_{i−1}W^s + b)$, 也就是把历史信息先进行线性变换(乘以矩阵), 再和bias加起来, 再通过一个非线性激活函数(tanh或ReLU). 添加了线性变换再进行非线性激活, 使网络对输入的顺序变得敏感。 在使用时, 给定输入序列(单词序列或语音)得出输出序列的过程如下: ...

2018-05-15 · 3 min · Cong Chan

Python Digest

What you will get from this Python digest: 1, Learn advanced python programming. 2, Learn new concepts, patterns, and methods that will expand your programming abilities, helping move you from a novice to an expert programmer. 3, Practice going from a problem description to a solution, using a series of assignments. Operator Emulating numeric types In-place operation: One modifies the data-structure itself object.__iadd__(self, other) object.__isub__(self, other) object.__imul__(self, other) object.__imatmul__(self, other) object.__itruediv__(self, other) object.__ifloordiv__(self, other) object.__imod__(self, other) object.__ipow__(self, other[, modulo]) object.__ilshift__(self, other) object.__irshift__(self, other) object.__iand__(self, other) object.__ixor__(self, other)¶ object.__ior__(self, other) These methods are called to implement the augmented arithmetic assignments. These methods should attempt to do the operation in-place (modifying self) and return the result (which could be, but does not have to be, self). If x is an instance of a class with an __iadd__() method, x += y is equivalent to x = operator.iadd(x, y) ...

2018-05-08 · 14 min · Cong Chan

Computer Systems - A Programmer's Perspective (CSAPP) - CMU 15213

CSAPP 非常巧妙的把程序设计及优化、数字电路基础、指令集体系、汇编语言、存储器体系结构、链接与装载、进程、虚存等来自不同学科的核心知识点和在一起,并以程序员的视角呈现; 告诉我们作为一个程序员,究竟需要对计算机的硬件了解到什么程度? 本笔记是 CMU CSAPP 的学习笔记, 使用 CMU 15-213, UW CSE351 的课程视频, lab, 作业, project 辅助练习. Computer Systems: A Programmer’s Perspective (csapp), 豆瓣-深入理解计算机系统 卡内基梅隆大学 CMU 15-213 Introduction to Computer Systems (ICS) 华盛顿大学 UW CSE351: The Hardware/Software Interface 信息的表达与操作 Information is Bits + Context. Study systems by tracing the lifetime of the hello program, from the time it is created by a programmer, until it runs on a system, prints its simple message, and terminates. ...

2018-01-29 · 14 min · Cong Chan