Tries
A trie, pronounced “try” and also called digital tree or prefix tree, is a special data structure used to store strings that can be visualized like a graph. Comes from the word retrieval. It consists of nodes and edges. Each node consists of at max 26 children and edges connect each parent node to its children.
For a trie data structure, we need to store extra data such as character link pointers and complete node flags. Therefore, the trie requires more memory to store the string data. However, if there are many common prefixes, the memory requirement becomes smaller as we can share the prefix nodes.
Although the hash table has a relatively faster lookup speed, it only supports the exact match of the whole string. The trie solution is more flexible to support more applications, such as auto-complete. Also, we can easily print all the words in the dictionary in alphabetic order with a trie.
Therefore, if we want a full-text lookup application, the hash table is better as it has a faster lookup speed. However, if we want to return all the words that match a prefix, the trie is our solution.
Tries: implementation
class TrieMap(object):
"""Trie implementation of a map. Associating keys (strings or other sequence type) with values. Values can be any type. """
def __init__(self, kvs):
self.root = {}
for (k,v) in kvs:
self.add(k,v)
def add(self,k,v):
""" Add a key-value pair """
cur = self.root
for c in k: # for each character in the string
if c not in cur:
cur[c] = {} # if not there, make new edge on character c
cur = cur[c]
cur['value'] = v # at the end of the path, add the value
def query(self, k):
""" Given key, return associated value or None """
cur = self.root
for c in k:
if c not in cur:
return None
return cur.get('value')
The complexity of creating a trie is O(W*L), where W is the number of words, and L is an average length of the word: you need to perform L lookups on the average for each of the W words in the set.
Applications:
1. Auto Complete (Web Browser, Email,Search Engines( More advanced Levenshtein Algorithm),Source Code Editors,Database Query Tools,Word processors, Command Line interpreters)
2. Spell Checkers
3. Longest Prefix Matching
4. Browser History
5. Bioinformatics
6. Internet routing (Luleå algorithm)
("Compressed variants of tries, such as databases for managing Forwarding Information Base (FIB), are used in storing IP address prefixes within routers and bridges for prefix-based lookup to resolve mask-based operations in IP routing.")
7. Full-text search
Suffix Trie
Suffix trie is a space-efficient data structure to store a string that allows many kinds of queries to be answered quickly.By building Trie of all suffixes, we can find any substring in linear time. Every suffix is ending with string terminating symbol. From each node if there is any path, it moves forward, otherwise returns that pattern is not found.
For this algorithm, the time complexity is O(m+k), where the m is the length of string and k is the frequency of the pattern in the text.
Building a Trie of Suffixes
1) Generate all suffixes of given text.
2) Consider all suffixes as individual words and build a trie.
Suffix Trie: implementation
class SuffixTrie(object):
def __init__(self, t):
""" Make suffix trie from t"""
t += '$' # special terminator symbol
self.root = {}
for i in xrange(len(t)): # for each suffix
cur = self.root
for c in t[i:]: # for each character in i'th suffix
if c not in cur:
cur[c] = {} # add outgoing edge if necessary
cur = cur[c]
def followPath(self, s):
""" Follow path given by characters of s.Return node at end of path, or None if we fall off."""
cur = self.root
for c in s:
if c not in cur:
return None
cur = cur[c]
return cur
def hasSubstring(self, s):
""" Return true iff s appears as a substring of t"""
return self.followPath(s) is not None
def hasSuffix(self, s):
"""Return true iff s is a suffix of t"""
node = self.followPath(s)
return node is not None and '$' in node