Java | Cong's Log

Dynamic Programming 04 - 丑数

丑数把只包含质因子2、3和5的数称作丑数（Ugly Number）。例如6、8都是丑数，但14不是，因为它包含质因子7。习惯上我们把1当做是第一个丑数。要判断一个数是不是丑数, 不断地分别除以2, 3, 5，然后检查num是否到达1: public boolean isUgly(int num) { //(除数包括`4`可以让代码更简洁) for (int i=2; i<6 && num>0; i++) while (num % i == 0) num /= i; return num == 1; } 如果要返回第n个丑数(leetcode原题), 情况就稍微复杂点. 从动态规划的角度考虑, 对于一个较大的丑数N, 必定是由某个更小的丑数M乘以2, 3, 5其中一个得来的. 所以可以从小到大不断生成丑数. 为了避免在循环中每一次计算都从头开始检查每一个数k对应的2*k, 3*k, 5*k, 需要用三个变量last2, last3, last5来分别记录最近一次用到的丑数的索引, 下一次计算时就直接从上一次停止的地方开始运行. /** return the nth ugly number */ public static int unglyNumber(int n) { final int INIT = 5; int[] uglys = new int[n + INIT]; for (int i = 0; i < 5;) { uglys[i] = ++i; } int last2, last3, last5, m2, m3, m5; last2 = last3 = last5 = 0; m2 = m3 = m5 = 1; for (int i = INIT; i < n; i++) { for (int j = last2 + 1; j < i; j++) { if (m2 <= uglys[i - 1] && uglys[j] * 2 > uglys[i - 1]) { m2 = uglys[j] * 2; last2 = j; } } for (int j = last3 + 1; j < i; j++) { if (m3 <= uglys[i - 1] && uglys[j] * 3 > uglys[i - 1]) { m3 = uglys[j] * 3; last3 = j; } } for (int j = last5 + 1; j < i; j++) { if (m5 <= uglys[i - 1] && uglys[j] * 5 > uglys[i - 1]) { m5 = uglys[j] * 5; last5 = j; } } uglys[i] = Math.min(Math.min(m2, m3), m5); } return uglys[n - 1]; } 这里提供了另一个理解这个问题的思路，并由此得出了一个更快的的算法(O(n))：根据前面算法的原理，可以知道下一个丑数一定是前面某一个丑数乘以2,3,5中的一个，所以可以把问题转换为从以下三组数据中不断取最小值的问题： ...

Dynamic Programming 03 - 最长公共子序列

最长公共子序列对于一个字符串, 它的子序列，就是将给字符串中任意个元素去掉之后剩余的字符串, 所以子序列不要求是连续的, 但是维持原来的顺序. 在文本相似度比较中，常用到最长公共子序列（longest common sequence）。同时遍历两个字符串, 如果x[i] == y[j], 则x[i]和y[j]参与了最长公共子序列z[k]的构建. 如果用lcs[i, j]表示遍历到x[0-i]和y[0-j]时的LCS长度, 那么现在就需要判断x[i]和y[j]的关系, 分两种情况: 如果二者相等, 那么lcs1 = lcs[i - 1, j - 1] + 1 若不相等, 那么只能在x和y中选择一个进行推进, 选择依据就是取较大值, lcs2 = max(lcs[i - 1, j], lcs[i, j - 1]) 初始状态自然是lcs[0, 0] = 0. static int[][] lcs; public static int longestCS(String x, String y) { char[] xList = x.toCharArray(); char[] yList = y.toCharArray(); for (int i = 1; i <= xList.length; i++) { for (int j = 1; j <= yList.length; j++) { if (xList[i - 1] == yList[j - 1]) { lcs[i][j] = lcs[i - 1][j - 1] + 1; } else { lcs[i][j] = Math.max(lcs[i - 1][j], lcs[i][j - 1]); } } } return lcs[x.length()][y.length()]; } 最长公共子串最长公共子串（longest common substring）, 要求的是任意连续的子字符串。设定LCS(i, j)为包含当前字符a[i]和b[j]的最长lcs. 假如当前满足a[i] == b[j], 那么LCS(i, j) = LCS(i - 1, j - 1) + 1, 否则为0. ...

Dynamic Programming 02 - 最大子序列

最大子序列 Maximum subarray problem: In computer science, the maximum subarray problem is the task of finding the contiguous subarray within a one-dimensional array of numbers which has the largest sum. For example, for the sequence of values −2, 1, −3, 4, −1, 2, 1, −5, 4; the contiguous subarray with the largest sum is 4, −1, 2, 1, with sum 6. The problem was first posed by Ulf Grenander of Brown University in 1977, as a simplified model for maximum likelihood estimation of patterns in digitized images. A linear time algorithm was found soon afterwards by Jay Kadane of Carnegie Mellon University (Bentley 1984). ...

Dynamic Programming 01 - 理解动态规划

我们已经看到了一些优雅的设计原则，例如分而治之，图探索和贪婪的选择，它们为各种重要的计算任务提供了确定的算法。这些工具的缺点是只能用于非常特定类型的问题。现在我们来谈谈算法工艺的两个大锤，即动态编程和线性编程，这两种适用性非常广泛的技术可以在更专门的方法失败时调用。可以预见的是，这种普遍性通常会带来效率上的损失。很多经典的方法，如 divide-and-conquer, graph exploration, and greedy等, 为各种重要的计算任务提供了确定性的解法。但是这些算法只能用于特定类型的问题。这里介绍两个算法大杀器: Dynamic programing 和 linear programming. 这两种适用性非常广的算法可以在黔驴技穷时考虑调用（如 the knapsack problem, sequence alignment, and optimal binary search trees）。当然，普遍性往往会带来效率上的损失。动态规划动态规划作为一种编程范例，可以从一个例子着手理解：求数列的 maximum-weighted independent sets (MIS, 最大非连续非相邻子集)和, 对于a = [1, 4, 5, 4], 其MIS为{a[1], a[3]} = 8. 如果使用贪心法, 每次都在可选范围内取最大值, 那么就会得到{a[2], a[0]} = 6. 如果使用分而治之法, 把数组分为两半a1 = [1, 4], a2 = [5, 4], 则分别得到MIS{a1[1]}, {a2[0]}, 合并后发现是相邻的, 与要求相悖. 要解决这个问题，关键的步骤是找到基于子问题最优解的最优解：想办法把缩小最优解备选方案的数量，在这个较小的空间中可以直接采取暴力搜索寻找最优解。对于a = [1, 4, 5, 4], 假设其MIS为S, 假如从最右边的元素开始考虑, a[3] = 4只有属于S和不属于S两种情况 ...

Algorithms 03 - Memory 内存

Memory Bit. 0 or 1. Byte. 8 bits. Megabyte (MB). 1 million or $2^{20}$ bytes. Gigabyte (GB). 1 billion or $2^{30}$ bytes. 64-bit machine. We assume a 64-bit machine with 8 byte pointers (References). ・Can address more memory. ・Pointers use more space (some JVMs “compress” ordinary object pointers to 4 bytes to avoid this cost). Typical memory usage for primitive types and arrays primitive types (bytes): boolean 1 byte 1 char 2 int 4 float 4 long 8 double 8 ...

Algorithms 02 - Amortized Analysis 平摊分析

假如有两种交税方式：每天付 3 金币每次付的金币呈指数级增长，但通知付款频率呈指数级下降第1天：付 1 第2天：付 2 (累计 3) 第4天：付 4 (累积 7) 第8天：付 8 (累积 15) 哪种付的钱比较少？第二种比较划算，本质上等同于每天付 2，就是amortized constant。 A more rigorous examination of amortized analysis is done here, in three steps: Pick a cost model (like in regular runtime analysis) Compute the average cost of the i’th operation Show that this average (amortized) cost is bounded by a constant. 类似的应用在Array list 扩容中提到的 geometric resizing 方法(实际也是Python list 使用的方法)有体现, 所以使用一个因数来扩容数组, 可以让 ArrayList 的 add操作变为 amortized constant time. ...

Algorithms 01 - Asymptotic Analysis 渐进分析

Resource and Reference: CS61B Berkeley - Josh Hug Algorithms Princeton - ROBERT SEDGEWICK, KEVIN WAYNE 效率来源两个方面: 编程成本: 开发程序需要多长时间？代码是否容易阅读，修改和维护（大部分成本来自维护和可扩展性）？运行成本: 程序需要多长时间运行 (Time complexity)？需要多少内存 (Space complexity)？ Asymptotic Analysis Care about what happens for very large N (asymptotic behavior). We want to consider what types of algorithms would best handle scalability - Algorithms that scale well have better asymptotic runtime behavior. Simplification Summary Only consider the worst case. Pick a representative operation (aka: cost model) Ignore lower order terms Ignore multiplicative constants. Simplified Analysis Process ...

Stanford CS106A/B Programming Intro 斯坦福大学编程入门课

Stanford CS106B Programming Abstractions 和 CS106A 的学习笔记. 课程作业(cs106b spring 2017)实现代码见 https://github.com/ShootingSpace/cs106b-programming-abstraction Topics: A: Intro (by Java) B: Recursion, algorithms analysis (sort/search/hash), dynamic data structures (lists, trees, heaps), data abstraction (stacks, queues, maps), implementation strategies/tradeoffs Purposes become acquainted with the C++ programming language learn more advanced programming techniques explore classic data structures and algorithms and apply these tools to solving complex problems Reference Text Book: Data Structures & Algorithm Analysis in C++, 4th ed, by Mark A. Weiss Text Book: Programming Abstractions in C++ 1st Edition by Eric Roberts Text Book: Algorithms, 4th Edition Blog: Red Blob Games, Amit’s A* Pages Coding style Why writing clean, well-structured code ...

Java Hash Table

Hash Tables Save items in a key-indexed table. Index is a function of the key - Hash function, method for computing array index from key. 要实现哈希表, 需要解决几个问题: 如何定义/计算哈希函数。相等判定：如何检查两个键是否相等。冲突解决：寻找能够处理哈希到同一索引的两个密钥的算法和数据结构。时空权衡设计问题: 如果没有空间限制, 那么可以使用非常简单的哈希函数, 极端情况就是给每一种键分配一个索引。如果没有时间限制, 那么对于键冲突问题可以使用简单的顺序搜索。而现实中, 哈希表就是解决同时存在空间和时间限制的问题。哈希函数最理想的目标, 生成均匀分布的索引, 这样计算高效. 比如电话号码, 使用前三个数字作为索引是一种比较草稿的设计, 因为前三个数字一般代表区号, 且区号都是有限的, 这样同一个区的号码都会挤在同一个索引位置. 较好的方式是使用后三位数字. 身份证号同理. 在实际设计过程中, 不同的数据类型适用不同的方法. 所有Java类都继承了int hashCode()方法. 该方法的基本要求是: If x.equals(y), then (x.hashCode() == y.hashCode()) 最好(但不是必须的)能够满足: If !x.equals(y), then (x.hashCode() != y.hashCode()) 默认的实现方式是利用内存位置. Java integers, booleans, and doubles: ...

Java HashMap

HashMap HashMap 是基于哈希表的 Map 接口的非同步实现。此实现提供所有可选的映射操作，并允许使用 null 值和 null 键。此类不保证映射的顺序，特别是它不保证该顺序不是stable的。 HashMap 底层就是一个数组结构，数组中的每一项又是一个链表。当新建一个 HashMap 的时候，就会初始化一个数组table = new Entry[capacity];, 每个 Entry 是一个 key-value 对，有一个指向下一个元素的引用，这就构成了链表。 public HashMap(int initialCapacity, float loadFactor) { if (initialCapacity < 0) throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity); if (initialCapacity > MAXIMUM_CAPACITY) initialCapacity = MAXIMUM_CAPACITY; if (loadFactor <= 0 || Float.isNaN(loadFactor)) throw new IllegalArgumentException("Illegal load factor: " + loadFactor); // Find a power of 2 >= initialCapacity int capacity = 1; while (capacity < initialCapacity) capacity <<= 1; this.loadFactor = loadFactor; threshold = (int)Math.min(capacity * loadFactor, MAXIMUM_CAPACITY + 1); table = new Entry[capacity]; useAltHashing = sun.misc.VM.isBooted() && (capacity >= Holder.ALTERNATIVE_HASHING_THRESHOLD); init(); } static class Entry<K,V> implements Map.Entry<K,V> { final K key; V value; Entry<K,V> next; final int hash; …… } HashMap 底层数组的长度总是 2 的 n 次方，这是 HashMap 在速度上的优化, 所以有这段代码保证初始化时 HashMap 的容量总是 2 的 n 次方，即底层数组的长度总是为 2 的 n 次方。 ...