Trie – Prefix Tree

1 (Basic), 2 (autocomplete), 9, 11, 12

  1. Trie Youtube Video and Java Code(add, search, delete) Gist(add, search, frequency).

  2. ***** Implement a phone directory using Trie /  Autocomplete  
    Algo:
    Save all the words in Trie. Now while searching we check for the first entered character in the prefix.
    If found, then we start suggesting all the words from that node. Suppose our Trie has – “anil”, “anu”, “anupam”, “anpket” and prefix is an. We check if root has ‘a’. Now we start iterating for each character found in the HashMap children. So we print each word starting with ‘a’.  There will be two methods, one for checking of the current char is present in Trie. If found, we pass the TrieNode to another method while iterates the HashMap.
    Now when the second character is entered. If the first char wasn’t found in our Tire root, then we would have already broken out of the loop. Else we would have saved the last node of the first matched character. So now we check for the second character, in the new HashMap being pointed(in the HashMap fetched from the previous char). The same process repeats.


  3. I’m making a search engine called MillionGazillion from Interview Cake. 
    Algo:
    1. We can store the data using a hash table mapping. This though takes a lot of space.  To prevent that we use a Trie(though it has it’s own memory issues but hey it’s better here).
    2. We can also use Bloom Filter here to check if a url has already been indexed.


  4. Find last unique URL from long list of URLs in single traversal. 
    Algo: This can be done in a number of ways.
    1. The implementation in GFG is with trie with doubly linked list.
    We can solve this problem in one traversal by using Trie with a Doubly Linked List (We can insert and delete in O(1) time). The idea is to insert all URLs into the Trie one by one and check if it is duplicate or not. To know if we have previously encountered the URL, we need to mark the last node of every URL as leaf node. If we encounter a URL for the first time, we insert it into the doubly Linked list and maintain a pointer to that node in linked list in leaf node of the trie. If we encounter a URL that is already in the trie and has pointer to the url in the linked list, we delete the node from the linked list and set its pointer in the trie to null. After all URLs are processed, linked list will only contain the URLs that are distinct and the node at the beginning of the linked list will be last unique URL
    2. We can implement the same with hash as well. Actually since we are not using the prefix properties of URLs  I would go for Hashing. Use the same logic as above just use HashMap instead of Trie. To save space, do a hash like md5 on the url before saving.


  5. Find shortest unique prefix for every word in a given list
    Algo:
    The basic idea is to save the frequency of each node visited. So for each character in the HashMap we also save the frequency.
    1) Construct a Trie of all words. Also maintain frequency of every node (Here frequency is number of times node is visited during insertion). Time complexity of this step is O(N) where N is total number of characters in all words.2) Now, for every word, we find the character nearest to the root with frequency as 1. The prefix of the word is path from root to this character. To do this, we can traverse Trie starting from root. For every node being traversed, we check its frequency. If frequency is one, we print all characters from root to this node and don’t traverse down this node.


  6. This is not a question actually. The links shows how to use a different DS within HashMap in Java. This link will be useful to make the changes in the current HashMap. HashMap: One Key, multiple Values

  7. Find Word Positions in Text –  Given a text file and a word, find the positions that the word occurs in the file. We’ll be asked to find the positions of many words in the same file.
    Algo: The problem is like creating a inverted index for a search
    1. We can use a HashMap here.
    2. Hash table will use more memory and provides for only exact match. Trie will take less memory and provides for partial search as well.   Along with endofword boolean, we also save all the file position where the word starts in the given file.


  8. Word formation using concatenation of two dictionary words – Given a dictionary find out if given word can be made by two words in the dictionary.
    Algo: We will use trie here. We first search for the first half of the word. If we find the partial match we start from the root again for the remaining part of the word.

    1- Store all the words of the dictionary in a Trie.
    2- Start searching for the given word in Trie.
       If it partially matched then split it into two
       parts and then search for the second part in
       the Trie.
    3- If both found, then return true.
    4- Otherwise return false.

  9. *** Given a dictionary and a character array, print all valid words that are possible using characters from the array.  GFG
    Algo: The idea is to use Trie data structure to store dictionary, then search words in Trie using characters of given array.

    1. Create an empty Trie and insert all words of given dictionary into the Trie.
    2. After that, we have pick only those characters in ‘Arr[]’ which are a child of the root of Trie.
    3. To quickly find whether a character is present in array or not, we create a hash of character array

  10. *** Minimum XOR Value Pair  – Code
    Algo:
    1. Sort the numbers.
    2. Calculate bit-wise xor for every adjacent pair in the sorted array. Proof
    3. Output the minimum of these xor you calculated.
    Proof: If answer is a xor b then there does not exist c such that a<=c<=b.


  11. *** Longest prefix matching – Given a dictionary of words and an input string, find the longest prefix of the string which is also a word in dictionary. GFG – Code
    Algo: We build a Trie of all dictionary words. Once the Trie is built, traverse through it using characters of input string. If prefix matches a dictionary word, store current length and look for a longer match. Finally, return the longest match.


  12. *** Longest Common Prefix (with Trie)Code  GFG
    Algo: We are using a trie here. Index all the words in the Trie. The first node where the nodes diverge will be the common prefix.
    Insert all the words one by one in the trie. After inserting we perform a walk on the trie. In this walk, go deeper until we find a node having more than 1 children(branching occurs) or 0 children (one of the string gets exhausted).
    This is because the characters (nodes in trie) which are present in the longest common prefix must be the single child of its parent, i.e- there should not be a branching in any of these nodes.Time Complexity : Inserting all the words in the trie takes O(MN) time and performing a walk on the trie takes O(M) time, where N = Number of strings M = Length of the largest string string
    Auxiliary Space: To store all the strings we need to allocate O(26*M*N) ~ O(MN) space for the Trie.
    Another Approach with same complexity. No extra space. We do a character by character matching here. Find the smallest string as the common prefix can’t be larger then that. Then start checking each character of every string. If all the string at index i has the same char, then proceed to the next char else break.


  13. ** Find Duplicate rows in a binary matrix –
    Algo:
    1. Insert each row in the trie and keep checking if two row end at the same point.
    2. Convert the binary row into decimal and insert it into a Map and check if it already exists.


  14. Print all words matching a pattern in CamelCase Notation Dictionary –
    Algo: So the idea here is to store all the Capital Characters Only. Then how do we retrieve the original word ? We store the original word in the last leaf node!!. Now this tweaking of the original Trie structure can be really helpful. If we didn’t use that we would have had to maintain a new Hash table.


  15. Given a huge set of strings with duplicate strings present, find the maximum occurring word in it. If two words have same count, return any one of them.
    Same as this
    Algo:
    1. The idea is to use Trie to solve this problem. We start by inserting each key into the trie and store its count so far in the leaf nodes. While inserting we maintain a count of the max word inserted.
    2. We can convert the string into a unique hash(using there ASCII and position). Now we insert these number into a max heap. In the node we also store the string. The max-heap will maintain the most frequent word on the top. This way we can keep track of the next frequent word one the first one is popped.
    3. Hash all words one by one in a hash table. If a word is already present, then increment its count. Finally, traverse through the hash table and return the k words with maximum counts.


  16. Given a sequence of words, print all anagrams together.
    For example, if given word sequence is –
    [“car”, “ape”, “meal”, “pea”, “male”, “arc”, “lame”, “dog”]
    Then output of the program should be –
    (car, arc) , (ape, pea), (lame, meal, male) dog
    Order of groups and order of group members between themselves could be different but group members must be same.
    Link
    Algo:
    We will sort each word before inserting into the Trie. So all the anagrams will end at the same last TrieNode. We store these last TrieNode in the HashMap.
    Either the last TrieNode or the Map can contain the original words. 
    The time complexity of this algorithm is O(m.nlogn) where m is total number of words in given sequence and n is the average length of each word in given sequence.


  17. [Pending]***** How to find list of possible words from a letter matrix [Boggle Solver] Geeksforgeek Video

  18. [Pending] [ Suffix Trie] Count of distinct substrings of a string – this can be solved in linear time using a suffix tree. We are not covering suffix tree as of now. But it’s good to know such an algo exists.

  19. Skip – Palindrome pair in an array of words (or strings) .

  20. Skip – Lexicographic sorting of given set of keys in Trie
Advertisements

Leave a Reply

Please log in using one of these methods to post your comment:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s