In this task, you will implement a program that creates an English dictionary. You will implement a hash table to store the dictionary. The hash table will be indexed via the words in the English language. The dictionary will store the definition of the words for doing lookups after the dictionary is built.
Here are some of the key components of your implementation:
- Word class
- holds a word
- holds the word’s definition
- a key() function which returns an integer based upon converting the word to an int
- See Figure 5.4 from the book for this algorithm
- Dictionary class
- Implements a hash table (a vector)
- The hash table will start with size 101
- The Dictionary class will keep track of how many Words it holds (which is size N)
- The hash table uses separate chaining to resolve collisions (Chapter 5.3)
- Each chain is a vector holding Words
- The hash table needs to implement:
- void insert(Word)
- Hashes the Word’s key() and adds it to the hash table if it isn’t already there
- Updates (overwrites) old entry if there’s a already the same Word in the hash table
- bool contains(string word)
- Searches the hash table by a string.
- Will need to convert the string to a key, then hash it and search the vector of Words to see if
- Word delete(string word)
- Finds an entry using the passed in word, erases it (vector.erase()) from the hash table, then returns it
- Returns nullptr if word is not in the table
- void rehash()
- Is called if the table gets above a load factor (N / Table Size) of 1.0 during an insert of a new Word
- At least doubles the table size then finds the next prime number for the new table size (See figure 5.22 for an example)
- Example code for finding the next prime above a given integer can be
Parse input file:
- I have provided a file called: dictionary.json
- This filename needs to be passed on the command line via argv
- Example command line: ./a.out dictionary.json
That will make argv be a char* to the string “dictionary.json”
- Your program should detect if the filename was passed on the command line and print out help if the dictionary database wasn’t specified. For an example of this, you can go to a Linux/OSX command line and just run “ssh” with no options. It will print out a usage message.
- It is in JSON format and includes our English words and definitions, one per line
- You may either write your own parser, or use a JSON parsing library
Once your Dictionary class is full of our Words, it needs to print out:
- How many words are in the dictionary
- The current table size
- The current load factor
After that, it goes into query mode. This will output a prompt for the user such as the one below.
Word to define:
You should be able to enter a word, hit enter and have it either print out the definition or “Word not found” if it’s not in the dictionary. I won’t ask you to do fuzzy matching for misspellings like Google does, though. Note: the dictionary.json file’s words are all uppercase, but users are going to be entering queries in whatever case they feel like. Make sure that Word == word == wOrD.
The program should exit if the user enters End of File (EOF), which is Control-D (^D). An example of detecting EOF can be found at:
Your program will be tested first by running without passing in the dictionary.json filename on the command line – usage/help should be printed out. Then, it will be run with the dictionary.json filename. That should load the dictionary, and go into the query mode. I will be using a test file of words to search for. It should print out the words and their definitions or “Word not found” if it’s an invalid search.
It is very easy to run on the command line with a pipe (or a redirect):
cat testcases.txt | ./a.out dictionary.json
The program should build the hash from the dictionary, then do the queries for the words listed in that testcases.txt file and quit.
You must upload your program through Blackboard no later than midnight on Friday, November 18, 2016.
Your assignment will be judged by the following criteria:
-  Your code compiles on the EECS servers using g++ -std=c++0x *.cpp, passes inspection with a set of test cases (words), and also handles outputs a usage message if it isn’t given the dictionary.json filename on the command line (via argv)
-  Data structure usage. Your program only uses vectors, plus your Word and Dictionary classes.
-  Your code is well documented and generally easy to read.