# Homework 2 Solution


## Description

This homework involves conceptual exercises and coding assignments. The instructions below explain how to turn in the conceptual part on Gradescope and the code via Turnin.

1 Coding Assignment: Deep Maximum Entropy Markov Model (40 Points)

1.1 Problem

In this homework, you are required to implement a Deep Maximum Entropy Markov Model (DMEMM) for the targeted sentiment task using the given dataset.

1.2 Dataset Description: Open Domain Targeted Sentiment

Named Entity Recognition is an NLP task to identify important predefined named entities in the text, including people, places, organizations, and other categories.

Targeted sentiment analysis aims to capture sentiment expressed towards an entity in a given sentence. The task consists of two parts, Named Entity Recognition (identifying the entity) and identifying whether there is a sentiment directed towards the entity. For the purpose of this homework, we will focus on doing them in a collapsed way (combining sentiment and named entity recognition into one label sequence, and trying to predict that sequence).

Here is an example of the Targeted Sentiment Task:

Input (words) I love Mark Twain

Output (labels) O O T-POS T-POS

In this example, the two words tagged T-POS form the target, a named entity with a positive sentiment (the speaker, "I", expresses love toward the target, Mark Twain). Here, T- marks a term (or target) and O marks a word outside any predefined term. In this homework, we only predict whether a word is a target with a sentiment (T-POS, T-NEG, T-NEU) or Other (O). As this example shows, targeted sentiment problems in NLP can often be cast as sequence tagging problems.

For more details regarding the Targeted Sentiment Task, you can look at the following paper:

https://www.aclweb.org/anthology/D13-1171/

In this assignment, you have to build a Deep Maximum Entropy Markov Model for the targeted sentiment task.

You are provided with a folder called data that contains the train and test files. You must use the training set to build your model and predict the labels of the test set. You must not use the test set while training the models. You are provided with a file main.py where you will find the starter code for reading the data. Feel free to define any additional functions needed in this file or change this file in any way. Make sure that you maintain the output format (printing results) defined in Section 1.6 for your predictions.


1.3 Model

The Deep Maximum Entropy Markov Model (DMEMM) extends the Maximum Entropy Markov Model (MEMM) seen in class by using a neural network to build the conditional probability.

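The formulation for MEMM is as follows, using the feature function $\phi$ from Lecture 12 (note that the sequence probability is a product, not a sum):

$$P(y \mid x) = P(y_0)\,P(y_1 \mid y_0, x_1) \cdots P(y_n \mid y_{n-1}, x_n) = \prod_{i=1}^{n} P(y_i \mid y_{i-1}, x_i)$$

treating $y_0$ as a fixed start tag, so that $P(y_0) = 1$, and where

$$P(y_i \mid y_{i-1}, x_i) = \frac{\exp\left(w^\top \phi(x_i, y_i, y_{i-1})\right)}{\sum_{y'} \exp\left(w^\top \phi(x_i, y', y_{i-1})\right)}$$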
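To make the product concrete, here is a minimal NumPy sketch that computes $\log P(y \mid x)$ for one sentence as a sum of log local-softmax terms. It assumes the scores $w^\top \phi(x_i, y', y_{i-1})$ have already been computed into a hypothetical array `scores[i, prev, cur]`, and that tag index 0 stands in for the start tag $y_0$; both are illustrative choices, not part of the assignment.

```python
import numpy as np

def log_softmax(v):
    # Numerically stable log-softmax over the last axis.
    m = np.max(v, axis=-1, keepdims=True)
    return v - m - np.log(np.sum(np.exp(v - m), axis=-1, keepdims=True))

def memm_log_prob(scores, tags, start_tag=0):
    """log P(y | x) = sum_i log P(y_i | y_{i-1}, x_i).

    scores: hypothetical array of shape (n, T, T); scores[i, prev, cur] plays the
            role of w^T phi(x_i, cur, prev) at (0-based) position i.
    tags:   gold tag indices for the n words.
    """
    total = 0.0
    prev = start_tag  # y_0 is a fixed start tag, so P(y_0) = 1
    for i, cur in enumerate(tags):
        # Local normalization: softmax only over candidate tags at position i.
        total += log_softmax(scores[i, prev])[cur]
        prev = cur
    return total
```

Because every factor is normalized locally, the probabilities of all possible tag sequences for a sentence sum to 1 without any global partition function.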

We can use a neural network as the probability generator for this function. In this case, the model takes $y_{i-1}$ and $x_i$ and predicts $y_i$. Thus, a deep version of MEMM (DMEMM) requires an embedding for each word and an embedding for each tag.
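One possible shape for such a network is sketched below in PyTorch: the current word's embedding and the previous tag's embedding are concatenated and passed through a small feed-forward layer that outputs log-probabilities over the current tag. The layer sizes, the `Tanh` activation, and the extra START row in the tag table are assumptions for illustration, not requirements of the assignment.

```python
import torch
import torch.nn as nn

class DMEMM(nn.Module):
    """Scores P(y_i | y_{i-1}, x_i) with a small feed-forward network (illustrative sizes)."""

    def __init__(self, vocab_size, num_tags, word_dim=50, tag_dim=10, hidden=64):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        # num_tags + 1 rows: the extra row is a hypothetical START tag used for y_0.
        self.tag_emb = nn.Embedding(num_tags + 1, tag_dim)
        self.start_id = num_tags
        self.mlp = nn.Sequential(
            nn.Linear(word_dim + tag_dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, num_tags),
        )

    def forward(self, word_ids, prev_tag_ids):
        # word_ids, prev_tag_ids: LongTensors of shape (batch,)
        feats = torch.cat([self.word_emb(word_ids), self.tag_emb(prev_tag_ids)], dim=-1)
        # Log-probabilities over the current tag y_i (normalized locally, per position).
        return torch.log_softmax(self.mlp(feats), dim=-1)
```

Each row of the output is a distribution over tags for one (word, previous tag) pair, which is exactly the local factor in the MEMM product.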

In addition to the neural network, for inference you need to implement the Viterbi algorithm (using dynamic programming) from scratch for decoding and finding the best path. For the details of this algorithm, please refer to the course slides.
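The recursion can be sketched as follows for a locally normalized model. The input layout is a hypothetical choice: `first[c]` holds $\log P(y = c \mid \text{START}, x)$ for the first word, and `local[i-1][p, c]` holds $\log P(y_i = c \mid y_{i-1} = p, x_i)$ for later positions (0-based). NumPy is used only for array arithmetic; the dynamic program and the backtracking are written out by hand.

```python
import numpy as np

def viterbi(first, local):
    """Highest-scoring tag sequence under an MEMM.

    first: array (T,), log-probabilities of each tag for the first word.
    local: array (n-1, T, T); local[i-1][p, c] = log P(y_i = c | y_{i-1} = p, x_i).
    Returns a list of n tag indices (0-based positions).
    """
    n = len(local) + 1
    T = first.shape[0]
    best = np.empty((n, T))              # best[i, c]: best log score of a path ending in tag c at i
    back = np.zeros((n, T), dtype=int)   # back[i, c]: previous tag on that best path
    best[0] = first
    for i in range(1, n):
        # cand[p, c] = best[i-1, p] + log P(y_i = c | y_{i-1} = p, x_i)
        cand = best[i - 1][:, None] + local[i - 1]
        back[i] = np.argmax(cand, axis=0)
        best[i] = np.max(cand, axis=0)
    # Start from the best final tag and follow the backpointers.
    path = [int(np.argmax(best[-1]))]
    for i in range(n - 1, 0, -1):
        path.append(int(back[i, path[-1]]))
    return path[::-1]
```

The cost is $O(nT^2)$ per sentence, compared with $O(T^n)$ for enumerating every tag sequence.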

1.4 Features + Word Representations

You are required to use word and tag embeddings of lower dimension as features:

1. You are required to represent your text using ALL of the following models:

   1. Randomly initialized embeddings (you must initialize them and train them in your code).

   2. Pre-trained Word2Vec.

   3. Bi-LSTM (for this, you can combine it with either option (1) or (2) above; your choice).
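A Bi-LSTM encoder for Option 3 might look like the sketch below, here combined with randomly initialized embeddings (Option 1). The output vector at position $i$ summarizes both left and right context and can be used in place of the plain word embedding $x_i$ when predicting $y_i$. The dimensions are assumptions.

```python
import torch
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    """Contextual word features from a Bi-LSTM over randomly initialized embeddings."""

    def __init__(self, vocab_size, word_dim=50, hidden=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, word_dim)  # trained from scratch
        self.lstm = nn.LSTM(word_dim, hidden, batch_first=True, bidirectional=True)

    def forward(self, word_ids):
        # word_ids: (batch, seq_len) -> (batch, seq_len, 2 * hidden)
        out, _ = self.lstm(self.emb(word_ids))
        return out
```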

Options 1 and 2 require feature functions that can capture the relevant context when making the prediction for the i-th word.
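One simple feature function of this kind, shown here as an assumption rather than the required design, concatenates the embeddings of a fixed window of words around position $i$, padding with zero vectors past the sentence boundaries:

```python
import numpy as np

def window_features(emb, i, k=1):
    """Concatenate embeddings of words i-k .. i+k; zero vectors pad past sentence edges.

    emb: array (n, d) holding the embeddings of one sentence's words.
    Returns a vector of length (2k + 1) * d.
    """
    n, d = emb.shape
    parts = [emb[j] if 0 <= j < n else np.zeros(d) for j in range(i - k, i + k + 1)]
    return np.concatenate(parts)
```

The window size `k` is a hypothetical parameter that could itself be tuned.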

For Option 2 (pre-trained Word2Vec), you must not submit the Word2Vec model with your submission. Instead, we have created a directory with a Word2Vec model that is accessible by everyone via ssh to data.cs.purdue.edu. The Word2Vec model is located at the following path on data.cs.purdue.edu:

/homes/cs577/hw2/w2v.bin

Please make sure that your code loads the Word2Vec model from this path and then creates the embeddings. An example of how to do this is also provided in the starter code. DO NOT submit a Word2Vec model with your submission.
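After loading, the vectors still need to be arranged into an embedding matrix indexed by your own vocabulary. The helper below is a hypothetical sketch: `vectors` can be any mapping that supports `word in vectors` and `vectors[word]`. A gensim `KeyedVectors` object behaves this way; loading it with `KeyedVectors.load_word2vec_format(path, binary=True)` assumes the file is in word2vec binary format, so defer to the starter code for the actual loading step. Giving out-of-vocabulary words small random vectors is also an assumption.

```python
import numpy as np

def build_embedding_matrix(vocab, vectors, dim, seed=0):
    """Rows follow `vocab` order; words missing from `vectors` get small random vectors.

    vocab:   list of words, where the list index is the word id
    vectors: mapping supporting `word in vectors` and `vectors[word]`
             (e.g. gensim KeyedVectors loaded from /homes/cs577/hw2/w2v.bin)
    Returns (matrix, number of words found in `vectors`).
    """
    rng = np.random.default_rng(seed)
    matrix = np.empty((len(vocab), dim), dtype=np.float32)
    hits = 0
    for idx, word in enumerate(vocab):
        if word in vectors:
            matrix[idx] = vectors[word]
            hits += 1
        else:
            # Out-of-vocabulary word: small random initialization (an assumption).
            matrix[idx] = rng.normal(scale=0.1, size=dim)
    return matrix, hits
```

The resulting matrix can be wrapped with `torch.nn.Embedding.from_pretrained(torch.from_numpy(matrix), freeze=False)` if you want to fine-tune the vectors, or `freeze=True` to keep them fixed.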

2. You also need to use an embedding for each tag. For tag embeddings, you can either use a bit-map (one-hot) embedding, i.e., a vector whose length equals the total number of different tags, with every position set to "0" except one position set to "1" (like [0, 0, 0, . . . , 1, . . . , 0, 0, 0]), or initialize them as low-dimensional vectors, as with word embeddings, and train them.
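Both tag-embedding options can be sketched in a few lines of PyTorch. The four-tag set (the labels O, T-POS, T-NEG, and T-NEU named in Section 1.2) and the learned dimension of 8 are assumptions; your data may use a different tag inventory.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

NUM_TAGS = 4  # assumed tag set: O, T-POS, T-NEG, T-NEU

def one_hot_tags(tag_ids, num_tags=NUM_TAGS):
    """Bit-map embedding: a fixed 0/1 vector with a single 1, never trained."""
    return F.one_hot(tag_ids, num_classes=num_tags).float()

# Learned alternative: low-dimensional vectors updated by backpropagation.
learned_tags = nn.Embedding(NUM_TAGS, 8)

prev = torch.tensor([0, 2])
print(one_hot_tags(prev))        # one row per tag id, with a single 1.0 per row
print(learned_tags(prev).shape)  # torch.Size([2, 8])
```

The one-hot version has no parameters to tune, while the learned version lets the model place tags that behave similarly close together.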


1.5 Model Tuning

You can improve your model in various ways. Here are some ideas:

- Add a regularizer to the loss function. Here, you can also play with the coefficient of the regularizer.

- Different learning/optimization methods.

- Hyper-parameter tuning:

  - Learning rate

  - Hidden layer size

  - Word embedding dimension

  - Number of iterations (epochs) to train the model

- And so on …
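As an example of the first idea, the training step below adds an explicit L2 penalty, scaled by a coefficient `l2_coef`, to the cross-entropy loss. The function name and the default coefficient are hypothetical starting points to tune. PyTorch optimizers such as `torch.optim.SGD` and `torch.optim.Adam` also accept a `weight_decay` argument, which is an alternative way to get a similar effect.

```python
import torch
import torch.nn as nn

def train_step(model, optimizer, inputs, targets, l2_coef=1e-4):
    """One gradient step on cross-entropy plus an explicit L2 regularizer.

    l2_coef is the regularizer coefficient; its default is only a starting guess.
    """
    model.train()
    optimizer.zero_grad()
    logits = model(inputs)
    loss = nn.functional.cross_entropy(logits, targets)
    # Sum of squared parameters over the whole model.
    l2 = sum((p ** 2).sum() for p in model.parameters())
    total = loss + l2_coef * l2
    total.backward()
    optimizer.step()
    return total.item()
```

Recording the returned value (and validation F1) once per epoch gives the data needed for the learning curves requested in the conceptual questions.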

1.6 Running The Code

Your code should be runnable with the following command:

In this command, your code should train a Deep MEMM on the file data/twitter1_train.txt, test it on data/twitter1_test.txt, and use Randomly Initialized Word Embeddings (Option 1). Option 2 would correspond to using Pre-trained Word2Vec embeddings, and Option 3 would correspond to using a Bi-LSTM.

For example, for Pre-trained Word2Vec, we would run:

And for Bi-LSTM:

Your code should output the Precision, Recall, and F1 Score. You may use a library to calculate this. For example (note that these are just random numbers):

Precision: 51.89

Recall: 45.05

F1: 48.11
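The handout does not say whether the scores are computed per token or per target. The sketch below assumes span-level exact matching: a target is a maximal run of consecutive tokens sharing the same non-O label, and a prediction counts only if its boundaries and label both match. This is one reasonable choice, not the official metric; note that it merges two adjacent targets with the same label into one span. It prints in the format shown above.

```python
def target_spans(tags):
    """Collect (start, end, label) for maximal runs of the same non-'O' tag."""
    spans, start = set(), None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes a final span
        if start is not None and tag != tags[start]:
            spans.add((start, i, tags[start]))
            start = None
        if start is None and tag != "O":
            start = i
    return spans

def prf1(gold_seqs, pred_seqs):
    """Span-level precision, recall, and F1 over all sentences."""
    tp = n_pred = n_gold = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = target_spans(gold), target_spans(pred)
        tp += len(g & p)
        n_pred += len(p)
        n_gold += len(g)
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gold if n_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def report(gold_seqs, pred_seqs):
    # Print percentages with two decimals, matching the required output format.
    p, r, f = prf1(gold_seqs, pred_seqs)
    print(f"Precision: {100 * p:.2f}")
    print(f"Recall: {100 * r:.2f}")
    print(f"F1: {100 * f:.2f}")
```

If token-level scores are preferred instead, a library such as scikit-learn can compute precision, recall, and F1 over the non-O labels.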

1.7 Packages

For this assignment, you will use the PyTorch package to implement the neural network. Make sure that your code does not use a GPU. You must implement the Viterbi dynamic programming algorithm yourself (no packages are allowed for this part).

If you choose, you can run your code in a conda environment. Here is a command to create an environment that has all the packages that will be used for grading:

conda create --prefix deepmemm_env python=3.6 pip anaconda pytorch nltk torchvision gensim cudatoolkit=10.0 numpy=1.16.1 -c pytorch


1.8 Tips

Some sample code can be found here: https://pytorch.org/tutorials/beginner/deep_learning_nlp_tutorial.html. However, there is a difference between a CRF and an MEMM: an MEMM is locally normalized, while a CRF is globally normalized. Thus, you do not need to run the forward algorithm during training to compute the partition function; you only need the Viterbi algorithm at inference time for decoding.
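Local normalization is what makes MEMM training simple: each word, paired with the gold previous tag, becomes an independent classification example that can be trained with ordinary cross-entropy. The sketch below builds those examples from gold-tagged sentences; the `"<START>"` token for $y_0$ and the `(words, tags)` input format are hypothetical choices.

```python
def memm_training_pairs(sentences, start_tag="<START>"):
    """Turn gold-tagged sentences into (word, previous gold tag, gold tag) examples.

    sentences: iterable of (words, tags) pairs of equal length.
    No forward algorithm is needed: each example is scored and normalized on its own.
    """
    pairs = []
    for words, tags in sentences:
        prev = start_tag  # stands in for y_0
        for word, tag in zip(words, tags):
            pairs.append((word, prev, tag))
            prev = tag  # condition on the gold previous tag during training
    return pairs
```

At test time the gold previous tag is unknown, which is why the Viterbi search over all previous tags is needed for decoding.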

1.9 Time Limit

There is no strict time limit for this assignment. However, your code should run in a reasonable amount of time (hours, not days).


2 Conceptual Questions (20 Points)

Now, answer the following questions based on your implementation of the DMEMM.

1. State one hyper-parameter you tuned for the neural network. How did tuning this hyper-parameter change the result? Explain with learning curves. You may have tuned more than one hyper-parameter, but explain only one in this section. For this question, you are not allowed to use the embedding type as a hyper-parameter. (4 Points)

2. Evaluate your best neural network configuration with learning curves. Plot your train, validation, and test set performance on the same plot as the epochs increase and training progresses. What does this tell you about the model? (4 points)

3. Evaluate the difference between the three types of word representations used (Randomly Initialized Embeddings, Word2Vec, and Bi-LSTM).

• How do they change the result? Explain with learning curves. Be sure to evaluate the accuracy of the neural network models with each word representation type. (4 points)

• Why would you want to use one over the other? Be sure to state the advantages and disadvantages of each. (4 points)

4. If the model is only allowed to use the embedding of the current word when making the prediction for it, what would the model's performance be? Explain. (4 points)


2.1 Submission Instructions:

2.1.1 Conceptual Part

For your pdf file, use the naming convention username_hw#.pdf. For example, your TA with username roy98 would name his pdf file for HW2 roy98_hw2.pdf.

To make grading easier, please start a new page in your pdf file for each question. Hint: use a \newpage command in LaTeX after every question ends. For example, for HW2, use a \newpage command after each of the questions.

Follow the above convention and instruction for future homeworks as well.

2.1.2 Coding Part

You need to submit your codes via Turnin. Log into data.cs.purdue.edu (physically go to the lab or use ssh remotely) and follow these steps:

Place all the files in a folder named username_hw#. For example, your TA with username roy98 would name his folder for HW2 roy98_hw2. This naming convention is important: if the folder is not named correctly, there is no way to identify whose submission it is, which may result in it not being graded.

Change directory to outside of the username_hw# folder (run cd .. from inside the username_hw# folder).

Execute the following command to turn in your code: turnin -c cs577 -p hw2 username_hw#

To overwrite an old submission, simply execute this command again.

To verify the contents of your submission, execute this command: turnin -v -c cs577 -p hw2.

Do not forget the -v option, else your submission will be overwritten with an empty submission.
