Data Compression

避免歧义的编码在构建压缩编码的对应关系时，我们使用不同的数量的位来编码不同的字符. 比如摩斯密码. 如果单纯使用这种对应关系，会出现一些问题，如•••−−−•••会产生歧义: SOS? V7? IAMIE? EEWNI? 所以在实际使用中, 密码使用一些间隔来分隔代码字。那么对于不同的压缩编码, 有什么常用方法来避免歧义？方法是确保没有一个编码是另一个编码的前缀。比如使用固定长度编码。为每个编码添加特殊的stop char。使用一种具备广泛使用性的prefix-free编码。用什么数据结构来设计prefix-free编码? 用Trie构造编码一个二叉(0, 1)Trie: 叶节点是字符, 根节点到叶节点的路径就是编码. 压缩: 方法1：从叶开始; 按照路径到达根; 反向打印bits。方法2：创建键-值对的符号表。解压: 从根节点开始, 根据位值是0还是1在Trie图上游走, 直到走到叶节点，则解压出一个字符返回根节点, 继续第一步, 直到跑完所有编码. private static class Node implements Comparable<Node> { private final char ch; // used only for leaf nodes private final int freq; // used only for compress private final Node left, right; public Node(char ch, int freq, Node left, Node right) { this.ch = ch; this.freq = freq; this.left = left; this.right = right; } public boolean isLeaf() { return left == null && right == null; } // compare Nodes by frequency public int compareTo(Node that) { return this.freq - that.freq; } // Runtime - Linear in input size N public void expand() { Node root = readTrie(); // read in encoding trie int N = BinaryStdIn.readInt(); // read in number of chars for (int i = 0; i < N; i++) { Node x = root; while (!x.isLeaf()) { if (!BinaryStdIn.readBoolean()) x = x.left; else x = x.right; } BinaryStdOut.write(x.ch, 8); } BinaryStdOut.close(); } } 如何读取一个Trie：根据Trie的前序遍历序列重构. ...

数据压缩压缩数据以节省储存空间，节省传输时间。同时很多文件都有很多冗余信息，这为压缩提供了很多可能性。通用文件压缩 ·文件：GZIP，BZIP，7z ·Archivers：PKZIP ·文件系统：NTFS，HFS +，ZFS 多媒体 ·图像：GIF，JPEG ·声音：MP3 ·视频：MPEG，DivX™，HDTV 通讯 ·ITU-T T4 Group 3 Fax ·V.42bis调制解调器 ·Skype 数据库压缩率 Compression ratio = Bits in Compressed B / bits in B. 自然语言的压缩率为50-75％或更高. 读写二进制 public class BinaryStdIn { boolean readBoolean() // read 1 bit of data and return as a boolean value char readChar() // read 8 bits of data and return as a char value char readChar(int r) // read r bits of data and return as a char value // similar methods for byte (8 bits); short (16 bits); int (32 bits); long and double (64 bits) boolean isEmpty() // is the bitstream empty? void close() // close the bitstream } public class BinaryStdOut { void write(boolean b) // write the specified bit void write(char c) // write the specified 8-bit char void write(char c, int r) // write the r least significant bits of the specified char // similar methods for byte (8 bits); short (16 bits); int (32 bits); long and double (64 bits) void close() // close the bitstream } 比如使用三种方法表达12/31/1999 1, A character stream (StdOut), ...

Data Compression

信息处理 - 数据压缩 - 哈夫曼编码

信息处理 - 数据压缩