Please submit an archive of your solution (including code) on Gradescope by 11:59pm on the due date.
Please document your code where necessary.
All les that are necessary to do the assignment are contained in a tarball which you can get from:
You need to unpack this tarball (tar -zxvf cs447 HW3.tar.gz) to get a directory that contains the code and data you will need for this homework.
For this assignment you will need the Natural Language Toolkit (http://nltk.org/).You can use the following commands:
sudo pip3 install nltk==3.4.5
Part 1: Hidden Markov Model Part of Speech Taggers (8 points)
The rst part of the assignment asks you to implement a Hidden Markov Model (HMM) model for part of speech tagging. Speci cally, you need to accomplish the following:
implement the Viterbi algorithm for a bigram HMM tagger
train this tagger on labeled training data (train.txt), and use it to tag unseen test data (test.txt).
write another program that compares the output of your tagger with the gold standard for the test data (gold.txt) to compute the over overall token accuracy of your tagger. In addition, you need to compute the precision and recall for each tag and produce a confusion matrix.
You will use three data les to train your tagger, produce Viterbi tags for unlabeled data, and evaluate your tagger’s Viterbi output.
The le train.txt consists of POS-tagged text:
Kemper_NNP Financial_NNP Services_NNPS Inc._NNP ,_, charging_VBG that_DT program_NN trading_NN is_VBZ ruining_VBG the_DT stock_NN market_NN ,_, cut_VBD off_RP four_CD big_JJ Wall_NNP Street_NNP firms_NNS from_IN doing_VBG any_DT of_IN its_PRP$ stock-trading_NN business_NN ._.
The_DT move_NN is_VBZ the_DT biggest_JJS salvo_NN yet_RB in_IN the_DT renewed_VBN outcry_NN against_IN
program_NN trading_NN ,_, with_IN Kemper_NNP putting_VBG its_PRP$ money_NN –_: the_DT millions_NNS of_IN
dollars_NNS in_IN commissions_NNS it_PRP generates_VBZ each_DT year_NN –_: where_WRB its_PRP$ mouth_NN is_VBZ ._.
Words are separated by spaces. Tags are attached to words, with a single underscore between them. In the actual le, each line is a single sentence. You will use this le to train a bigram HMM tagger (i.e. estimate its transition, emission and initial probabilities). For the purposes of this assignment, you need to use an UNK token to allow unseen words; use a threshold of 5. Additionally, you need to smooth the transition probabilities using Laplace smoothing (Lecture 4, slides 17-22). This allows your model to assign non-zero probability to unseen tag bigrams (which do exist in the test corpus).
The le test.txt consists of raw text. Each line is a single sentence, and words are separated by spaces. Once you nish your implementation, run your tagger over test.txt and create a le out.txt that contains the Viterbi output.
For evaluation purposes, you need to compare the output of your tagger with gold.txt.
1.3 Provided Code/What you need to implement
Your solution will consist of two les, hw3 hmm.py and hw3 eval hmm.py.
1.3.1 HMM tagger
You should implement your HMM tagger in the module hw3 hmm.py. You are free to implement your HMM tagger using whatever data structures you wish, but all of your code should be your own1. We provide a skeleton implementation for the HMM class with training and testing methods:
train(self, corpus): estimates the counts for the bigram HMM on a labeled training corpus, which is a nested list of TaggedWord objects2 (0.5 points)
test(self, corpus): given unlabeled corpus (a nested list of word strings), print each sentence with its Viterbi tag sequence (0.5 points)
The tagging method requires you to implement the Viterbi algorithm for bigram HMMs:
viterbi(self, sen): given sen (a list of words), returns the Viterbi tag sequence for that sentence as a list of strings (2 points)
Implementation Hint: If you have a lexicon that tells you for each word which tags it can have, you don’t have to go through all cells in each column. Similarly, if you rst check whether a word has zero probability given a tag, you don’t have to do all the computations to ll that cell. All your computations should again be in log space.
You may implement these functions however you wish, but please preserve the method signatures for auto-grading.
Note: The gold labels for the test data contain a transition (“sharper JJR $ $”) that is not present in the training data when using the default unknown word threshold of 5.3 In order to prevent your tagger from assigning zero probability to a test sentence, you can use add-one smoothing on the transition distributions to ensure that a sequence with non-zero probability exists. Make sure that you still restrict the space of possible tags for each word token according to your tag lexicon.
Feel free to re-use your distribution code from HW1, if you like.
2A TaggedWord simply stores a word token and a POS tag (e.g. from train.txt or gold.txt) as two separate strings.
Both ‘sharper’ and ‘$’ appear at least 5 times, and JJR and $ are the only tags they appear with, respectively; however, nowhere in the training data is there a bigram tagged as JJR-$.
1.3.2 Evaluation program
You should implement your evaluation program in hw3 eval hmm.py. We ask that you implement the following methods for the Eval class:
getTokenAccuracy(self): returns the percentage of tokens that were correctly labeled by the tagger (1 points)
getSentenceAccuracy(self): returns the percentage of sentences where each word in a sentence was correctly labeled (1 points)
getPrecision(self, tagTi): return the tagger’s precision when predicting tag ti (1 points)
Precision indicates how often ti was the correct tag when the tagger output ti
getRecall(self, tagTj): return the tagger’s recall for predicting gold tag tj (1 points)
Recall indicates how often the tagger output tj when tj was the correct tag.
writeConfusionMatrix(self, outFile): writes a confusion matrix to outFile (1 point)
We’d like you to create a confusion matrix (Lecture 6, slide 21) that indicates how often words that are supposed to have tag ti were assigned tag tj (when ti = tj , a token is correctly labeled). The rst line of your confusion matrix should be a comma- or tab-separated list of POS tags (NN, NNP, etc.); the jth entry in this list indicates that the jth column in the matrix stores the number of times that another tag was tagged as tj . The rst entry in the ith following row should be tag ti, followed by the counts for each tj . Thus, the diagonal of the confusion matrix indicates the number of times that each tag was labeled correctly. The entries in this confusion matrix should be raw frequencies (i.e., you don’t need to normalize the counts). You can use the counts in your confusion matrix to calculate precision and recall for each tag.
1.3.3 Sanity check
The Python script hmm sanity check.py will run your code and evaluate your HMM tagger on a sample sentence. It will check that your tagger correctly predicts the POS tags for that sentence, and that the probabilities of the model are reasonably close to the TA implementation. The script will also test your evaluation program to make sure that the methods you implement there behave correctly. Note that passing the script does not mean that your code is 100% correct (but it’s still a good sign!).
1.4 What to submit for Part 1
For this portion, you should submit your two program les along with your confusion matrix:
hw3 hmm.py: your completed Python module for POS tagging using a bigram HMM(see section 1.3.1)
hw3 eval hmm.py: your completed Python module for evaluating your HMM’s output (see section 1.3.2)
confusion matrix.txt: a text le containing the confusion matrix you produced from your HMM’s output
We will provide a copy of the corpora/input les to test your program during grading.
1.5 What you will be graded on
You will be graded according to the correctness of your HMM tagger4 (3 points), as well as the correctness of your token and sentence accuracy functions (2 point), precision and recall functions (2 point) and confusion matrix (1 point).
Your tagger should not get 100% accuracy; correctness will be judged by comparing your Viterbi predictions against the predictions of the TA’s tagger, allowing for noise due to ties and such.
Part 2: Writing a Context-Free Grammar (2 points)
Your rst task is to write a context-free grammar that can parse a set of short sentences. You will be using parsing code from the NLTK for this portion of the assignment, and your grammar does not need to be in Chomsky Normal Form.
The le sentences.txt contains an evaluation set of 25 short sentences.
2.3 Provided code
We have provided a script (hw3 nltkcfg.py) that reads your grammar from mygrammar.cfg and tries to parse the evaluation set. The script expects the sentences.txt le to be in the same directory, and will write its output to a new le, hw3 cfg out.txt. Call the script from the command line with:
For each sentence, the script will provide a brief message in the console (whether the parse was a SUCCESS or a FAILURE), along with the number of successful parse trees. More importantly, each successful parse tree will be written to the output le in a bracketed, tabbed format. When evaluating your grammar, you should look at this output to nd ways to improve your grammar.
Additionally, you can provide –gui as an optional argument for debugging:
python hw3_nltkcfg.py –gui
With –gui enabled, the script will display a pop-up window with each successful parse as a tree. In order to view the next parse tree, you must close the window. We provide minimal debugging support for looking at partial charts in the event of a parse failure. If you feel that you need to see this information to debug your grammar, you are free to add additional debugging statements to hw3 nltkcfg.py.
2.4 What you need to implement
You don’t need to write (or submit) any actual Python code for this part of the assignment. Instead, your task is to de ne the rules for your CFG in mygrammar.cfg. This le already contains the lexical rules required for these sentences (i.e., words are mapped to their POS tags), along with a start symbol for the grammar that generates the nonterminal symbol S.
Please do not change or delete the existing rules.
Instead, you will need to add rules such as:
You may de ne your own inventory of nonterminals, but please do not change the existing rules. Your grade will depend in part on your grammar’s ability to successfully parse the evaluation sentences; however, we care more about your ability to de ne and explain a linguistically meaningful grammar (see below).
While it’s possible to de ne a very simple grammar that parses every sentence, such as:
LEX -> NN | NNS | COMMA | VB | …
that’s not the point of this part of the assignment, and such a solution would receive zero credit. Similarly, a set of rules like:
S -> NN CC NN VBD FS
S -> PRP VBP NNS CC NNS FS
where each (extremely at) rule generates the POS tag sequence for a particular sentence in the evaluation data is also not allowed. Instead, your grammar should use a set of nonterminals similar to those of the Penn Treebank parse trees you’ve seen in class. You can nd a description of the POS tags used in this assignment at https://www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html.
2.5 What to submit
Along with your completed grammar le (mygrammar.cfg), you should submit your output le (hw3 cfg out.txt).
2.6 What you will be graded on
You will be graded on your grammar’s ability to successfully parse the evaluation sentences (1.0 pts) and on the quality5 of your grammar (1.0 pts).
In conclusion, you need to submit the following les for grading purposes to gradescope and remember not to change any function name or lename we give you:
hw3 eval hmm.py
hw3 cfg out.txt.
5See the Implementation hints in section 1.4