Java | Cong's Log

Find All Collinear Points - A Pattern Recognition Problem

The Line Patterns Recognition A basic but important application of pattern recognition is to recognize line patterns in a given set of points. http://coursera.cs.princeton.edu/algs4/assignments/collinear.html. This blog will give a breif introduction to this problem and provide an enfficient solution. Codes available in algs4/collinear/src/ The problem could be described as: Given a set of n distinct points in the plane, find every (maximal) line segment that connects a subset of 4 or more of the points.. ...

Randomized Queue with Reservoir Sampling

This blog explains an apllication of randomized queue algorithms. Permutation client memory challenge A client program Permutation.java that takes an integer k as a command-line argument; reads in a sequence of strings from standard input using StdIn.readString(); and prints exactly k of them, uniformly at random. Print each item from the sequence at most once. More detail could be found at programming assignment specification and checklist, codes available in algs4/queues/src/. Randomized queue For a randomized queue, the item removed is chosen uniformly at random from items in the data structure. ...

Percolations problem

Union-find applications: Percolation Problem discriptions Percolation data type. To model a percolation system, create a data type Percolation with the following API: public class Percolation { public Percolation(int n); // create n-by-n grid, with all sites blocked public void open(int row, int col); // open site (row, col) if it is not open already public boolean isOpen(int row, int col); // is site (row, col) open? public boolean isFull(int row, int col); // is site (row, col) full? public int numberOfOpenSites(); // number of open sites public boolean percolates(); // does the system percolate? } Monte Carlo simulation. To estimate the percolation threshold, consider the following computational experiment: ...

Inf Course Note - Parallel Programming Language and Systems

爱丁堡大学信息学院课程笔记 Parallel Programming Language and Systems, Informatics, University of Edinburgh Reference: http://www.inf.ed.ac.uk/teaching/courses/ppls/ CMU 15213: Introduction to Computer Systems (ICS) Computer Systems: A Programmer’s Perspective A Comprehensive MPI Tutorial Resource A chapter on MPI from Ian Foster’s online Book Designing and Building Parallel Programs Introduction to parallel computer architecture Covering some of the nasty issues presented by the shared memory model, including weak consistency models and false sharing in the cache, and some architectural issues for the multicomputer model. ...

Inf Course Note - Software Testing

爱丁堡大学信息学院课程笔记 Software Testing, Informatics, University of Edinburgh Reference: http://www.inf.ed.ac.uk/teaching/courses/st/2017-18/index.html Pezze and Young, Software Testing and Analysis: Process, Principles and Techniques, Wiley, 2007. Why Software Testing? 1, 软件的漏洞, 错误和失效 Software Faults, Errors & Failures The problem start with Faults, Fault(BUG): latent error, mistakes in programming. e.g add(x, y) = x * y. With the Faults in programs, if and only if executing add(x, y) = x * y, the fault being activated, and generate an Errors. ...

Algorithms - Princeton

Algorithms, Part I, https://online.princeton.edu/course/algorithms-part-i Algorithms, Part II, https://online.princeton.edu/course/algorithms-part-ii Algorithms, 4th Edition by Robert Sedgewick and Kevin Wayne https://algs4.cs.princeton.edu/ Union−Find Considering the dynamic connectivity problem, modeling of multiple objects connected in a space/network. Applications involve manipulating objects of all types. ・Pixels in a digital photo. ・Computers in a network. ・Friends in a social network. ・Transistors in a computer chip. Given a set of N objects. union(a, b): connect two objects. connected(p, q): is two objects connected? find(p): Find component identifier for p (0 to N – 1) Modeling the objects: array. ...

Java BitMap 和 Bloom Filter

Bit Map Bit-map用一个bit位来标记某个元素对应的Value，而Key即是该元素。由于采用了Bit为单位来存储数据，因此在存储空间方面，可以大大节省。假设我们要对0-7内的5个元素4,7,2,5,3排序（假设这些元素没有重复）。那么我们就可以采用Bit-map的方法来达到排序的目的。要表示8个数，我们就只需要8个Bit（1Bytes），首先我们开辟1Byte的空间，将这些空间的所有Bit位都置为0，0 0 0 0 0 0 0 0. 然后遍历这5个元素，首先第一个元素是4，那么就把4对应的位置设为1, p+(i/8)|(0x01<<(i%8)), 这里默认为Big-ending, 0 0 0 0 1 0 0 0. 然后再处理第二个元素7，将第八位置为1,，接着再处理第三个元素，一直到最后处理完所有的元素，将相应的位置为1，这时候的内存的Bit位的状态0 0 1 1 1 1 0 1 遍历一遍Bit区域，把1的索引依次输出（2，3，4，5，7），这样就达到了排序的目的。算法的关键是如何确定十进制的数映射到二进制bit位的map图。算法占用很少内存，比如N=10000000；只需占用内存为N/8=1250000Byte=1.25M。缺点是不能有重复数据。 Map映射表假设需要排序或者查找的总数N=10000000，那么我们需要申请内存空间的大小为int a[1 + N/32]，其中：a[0]在内存中占32位, 可以对应十进制数0-31，依次类推： bitmap表为： a[0]--------->0-31 a[1]--------->32-63 a[2]--------->64-95 a[3]--------->96-127 .......... 十进制数需要转换为对应的bit位位移转换将十进制数转换为对应的bit位, 申请一个int一维数组，作为32列的二维数组， int a[0] |0000000000000000000000000000000000000| int a[1] |0000000000000000000000000000000000000| ……………… int a[N] |0000000000000000000000000000000000000| 例如十进制0，对应在a[0]第一位： 00000000000000000000000000000001 求十进制0-N对应在数组a的索引：十进制0-31，对应a[0]，先由十进制数n转换为与32的余可转化为对应在数组a中的索引0。比如n=24,那么 n/32=0，则24对应a[0]。又比如n=60, 那么n/32=1，则60对应a[1]。求0-N对应0-31中的数：十进制0-31就对应0-31，而32-63则对应也是0-31，即给定一个数n可以通过模32求得对应0-31中的数。利用移位0-31使得对应32bit位为1. 找到对应0-31的数为M, 左移M位：即2 ^ M, 置1. Bloom Filter 为了降低键值冲突的概率，Bloom Filter使用了多个哈希函数：创建一个m位BitSet，先将所有位初始化为0，然后选择k个不同的哈希函数。第i个哈希函数对字符串str哈希的结果记为h(i, str)，且h(i, str)的范围是0到m-1 。 ...

众数问题 - Boyer–Moore majority vote algorithm

数组中有一个数字出现的次数超过数组长度的一半，例如输入一个长度为9的数组1,2,3,2,2,2,5,4,2。由于数字2在数组中出现了5次，超过数组长度的一半，因此输出2。如果不存在则输出0。因为这个数出现次数超过了数组长度一半以上, 那么它就是数组中出现次数最多的数, 故谓之众数. class Solution: def MoreThanHalfNum_Solution(self, numbers): # write code here most = numbers[0] count = 1 for item in numbers: if item == most: count += 1 else: count -= 1 if count < 0: most = item count = 1 return 0 if numbers.count(most) <= len(numbers) / 2 else most 众数问题众数问题可以推广泛化：给定大小为n的整数数组，找到所有出现超过n / m次的元素。这种问题可以使用 Boyer-Moore 算法解决. The Boyer–Moore majority vote algorithm is an algorithm for finding the majority of a sequence of elements using linear time and constant space. It is named after Robert S. Boyer and J Strother Moore, who published it in 1981, and is a prototypical example of a streaming algorithm. ...

Dynamic Programming 06 - Knapsack背包问题

Knapsack背包问题背包问题（Knapsack problem）是一种组合优化的NP完全问题。问题可以描述为：给定一组物品，每种物品都有自己的重量和价格，在限定的总重量内，我们如何选择，才能使得物品的总价格最高。问题的名称来源于如何选择最合适的物品放置于给定背包中。也可以将背包问题描述为决定性问题，即在总重量不超过W的前提下，总价值是否能达到V。 0-1背包最基础的背包问题：有N件物品和一个体积为V的背包, 每种物品均只有一件, 第i件物品的大小/重量是s[i]，价值是v[i]. 求将哪些物品装入背包可使这些物品的体积总和不超过背包体积，且价值总和最大. 对于每一个物品，只有两种结果，放入或者不放入背包，那么kn(i, j)则表示背包容量剩余j时, 前i个物品能够达到的最大值: kn1 = kn(i-1, j-s(i)) + v(i)表示物品i放入背包后的总价值, 为前i-1物品在第i个物品占用了背包容量s(i)后的的最优解加上第i个物品的价值v(i). kn2 = kn(i-1, j)表示物品i并没有放入背包, 等于前i-1个物品在相同背包容量的最优价值. 归纳出来的大小子问题间的关系(转移方程)为: kn(i, j) = max(kn1, kn2) = max(kn(i-1, j-s(i)) + v(i), kn(i-1, j)). 初始状态是对于不同背包剩余容量, 当没有物品可放时, 返回的最大价值一定是0. 所以背包问题, 就是二维的动态规划问题. 需要确定初始状态, 和哪些信息需要记忆. 可以简单地用一个二维数组记忆所有kn(i, j), 但要考虑到当容量非常大, 物品非常多时, 这个二维数组是很大的, 比如当(i, j) = (2000, 2000000), 会抛出java.lang.OutOfMemoryError: Java heap space. 特别是, 当每个物品的价值也比较大时, 二维数组的j维度其实利用率很低. 所以存在很多优化的空间. 优化的关键点在于减少记忆点. 注意到转移方程中: kn(i, *)只需要用到kn(i-1, *)的值, 但我们又清楚地知道，物品在这里是没有顺序的意义的，所以这里的i仅仅是表示迭代的步骤, 只是为了遍历所有物品, 至于具体的顺序是不重要的, 所以不需要记录所有i对应的kn(i, *), 仅仅记录最近一次计算值即可. 所以我们只需要至多两个数组用来记录i-1和i对应的kn值. kn(i, j)要用到kn(i-1, k), k<=j的值, 具体要用到哪些k是取决于i. 所以j维度的值必须都要记录下来, 以防后续需要用到. 结合起来发现只需要一个一维数组kn = new int[size + 1]即可, i对应的值可以直接在数组上更新, 不需要额外的数组记录上一次迭代的值. 在实现中, 因为kn(i, j)要用到kn(i-1, <=j)的值, 也就是kn[<j]的值不能先于kn[j]更新, 所以kn的计算要从右往左(j = size; j--). 每次决定是否加入i物品之前, 如果剩余容量j小于s[i], 那么肯定无法放入, 这个判断可以融合进j的遍历中, 因为j本身代表了剩余容量. static int[] values; static int[] sizes; public static int knapsack(int size) { int n = values.length; int[] vs = new int[size + 1]; for (int i = 0; i < n; i++) { // items for (int j = size; j >= sizes[i]; j--) { vs[j] = Math.max(vs[j - sizes[i]] + values[i], vs[j]); } } return vs[size]; } 优化以后空间复杂度由$\theta(NS)$降到$\theta(S)$。但时间复杂度不变. ...

Dynamic Programming 05 - 跳台阶

跳台阶跳上一个n级的台阶总共有多少种跳法，先后次序不同算不同的结果，限制条件是每次只能跳1级或者2级。抽象出来的模型是：给定正整数N，有多少种累加方案，不同顺序当做不同方案，限制条件可以是给定的整数$n_0, n_1, ..., n_k$作为可选累加元素. 对于限制条件为只有两种跳法, 即1阶或者2阶的, 问题可以分解为: 假定第一次跳的是1阶，那么就剩下n-1个台阶，剩余跳法是f(n-1); 假定第一次跳的是2阶，则剩下n-2个台阶，剩余跳法是f(n-2) 可以归纳出通用的公式: f(n) = f(n-1) + f(n-2), 只有一阶的时候f(1) = 1, 只有两阶的时候可以有f(2) = 2, 刚好就是斐波那契数列. 所以这个简单的跳台阶问题就是计算斐波那契数列的问题。反过来思考, 比如对于8个台阶, 有多少种回滚方案? 只有两种: 回滚1个台阶, 就到了7; 回滚2个台阶, 就到了6. 等于说: 假如有f(7)种方案跳到7, 有f(6)种方案跳到6，那么就有f(7) + f(6)种方案到达8 从树结构来理解: 如果节点代表台阶数n对应的跳法f(n), 节点与节点间的枝代表单次可以跳的阶数, 父节点的值就是其所有子节点的值和. 对于只有两种跳法限制问题, 父节点f(n)就只有两个子节点, 分别为f(n-1)和f(n-2). 斐波那契数列举例：Fibonacci sequence: ${\displaystyle 0,\;1,\;1,\;2,\;3,\;5,\;8,\;13,\;21,\;34,\;55,\;89,\;144,\;\ldots }$ $$F_0 = 0, F_1 = 1, F_2 = 1, F_n = F_{n-1} + F_{n-2} (n>2) $$Fibonacci numbers grow almost as fast as the powers of 2. ...