各种树的变种

为了适应不同的应用场景, 人们使用不同的树结构来实现符号表.

九宫格输入法

对于手机的九宫格输入法, 简单的实现方式是多次敲击: 通过反复按键输入一个字母,直到出现所需的字母。

http://www.t9.com/ 的 T9 texting 支持更高效的输入方法: ・Find all words that correspond to given sequence of numbers. ・Press 0 to see all completion options. Ex. hello ・多次敲击: 4 4 3 3 5 5 5 5 5 5 6 6 6 ・T9: 4 3 5 5 6

可以使用 8-way trie 来实现.

三元搜索Trie

R较大的R-way trie的空间效率不高,读取比较大的文件往往导致内存不足。但弊端是开辟出的数组内存利用率其实不高。现在很多系统都使用Unicode,分支可高达65,536. 所以需要更高效的方法。

Ternary search tries: ・Store characters and values in nodes (not keys). ・Each node has 3 children: smaller (left), equal (middle), larger (right). Search in a TST: Follow links corresponding to each character in the key. ・If less, take left link; if greater, take right link. ・If equal, take the middle link and move to the next key character.

public class TST<Value>
{
    private Node root;
    private class Node
    {
        private Value val;
        private char c;
        private Node left, mid, right;
    }

    public void put(String key, Value val)
    { root = put(root, key, val, 0); }

    private Node put(Node x, String key, Value val, int d)
    {
        char c = key.charAt(d);
        if (x == null) { x = new Node(); x.c = c; }
        if (c < x.c) x.left = put(x.left, key, val, d);
        else if (c > x.c) x.right = put(x.right, key, val, d);
        else if (d < key.length() - 1) x.mid = put(x.mid, key, val, d+1);
        else x.val = val;
        return x;
    }

    public boolean contains(String key)
    { return get(key) != null; }

    public Value get(String key)
    {
        Node x = get(root, key, 0);
        if (x == null) return null;
        return x.val;
    }

    private Node get(Node x, String key, int d)
    {
        if (x == null) return null;
         char c = key.charAt(d);
         if (c < x.c) return get(x.left, key, d);
         else if (c > x.c) return get(x.right, key, d);
         else if (d < key.length() - 1) return get(x.mid, key, d+1);
         else return x;
    }
}

TSTs比hashing更快(特别是对于搜索缺失键的情况)。

基数树

Radix Tree, 也叫 Patricia trie (Practical Algorithm to Retrieve Information Coded in Alphanumeric), crit-bit tree, 压缩前缀树: ・Remove one-way branching. ・Each node represents a sequence of characters. ・Implementation: one step beyond this course. ![](/images/radix_trie.png “put(“shells”, 1); put(“shellfish”, 2); image from: https://algs4.cs.princeton.edu/") 对于基数树的每个节点,如果该节点是唯一的子树的话,就和父节点合并。

Applications. ・Database search. ・P2P network search. ・IP routing tables: find longest prefix match. ・Compressed quad-tree for N-body simulation. ・Efficiently storing and querying XML documents.

后缀树

后缀树(Suffix tree)指字符串后缀的基数树: 一个String S的后缀树是一个边(edge)被标记为字符串的树。因此每一个S的后缀都唯一对应一条从根节点到叶节点的路径。这样就形成了一个S的后缀的基数树。

Applications. ・Linear-time: longest repeated substring, longest common substring, longest palindromic substring, substring search, tandem repeats, …. ・Computational biology databases (BLAST, FASTA).

字符符号表总结

Red-black BST. ・Performance guarantee: log N key compares. ・Supports ordered symbol table API.

Hash tables. ・Performance guarantee: constant number of probes. ・Requires good hash function for key type.

Tries. R-way, TST. ・Performance guarantee: log N characters accessed. ・Supports character-based operations.

You can get at anything by examining 50-100 bits