Inspired by Wang and Ling (2016).
Importance estimation is a component of this approach in which each premise gets a score that reflects its importance in relation to the conclusion.
Given a set of premises, generate for each premise an importance score that reflects how important it is in relation to the conclusion.
1. Data preparation
2. Feature Representation
3. Model Training
4. Evaluation
Install Jupyter via the pip command:
pip install jupyter
Run Jupyter:
jupyter notebook
Access the notebook at localhost:8888.
Any library can be installed using the pip tool:
pip install library-name # library-name = matplotlib, numpy, nltk ....
In this lecture we will be working with the following files:
- notebook.ipynb, to present the results.
- importance_estimation.py, where you will add your code.
- idebate.json, which contains the dataset.
Execute the following cell (shift+enter) to initialize an instance of the ImportanceEstimationModel class (the class resides in importance_estimation.py, where you will add your code).
from importance_estimation import ImportanceEstimationModel
Task: Load data from the json file idebate.json and split it based on the debate_id field into 80% training and 20% testing.
Implement the load_data function (in the importance_estimation.py file); a sketch follows the list:
- Input: the path to the data file.
- Output: train_dataset and test_dataset, where each item is a tuple of a list of premises and a conclusion.
- Hint: using the json library, load the data into a json object; the premises and the conclusion are stored in the _argument_sentences and _claim (conclusion) fields.
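One possible sketch of load_data, written as a method of the ImportanceEstimationModel class: it assumes the JSON file is a list of argument objects whose premises are stored as a list of strings under _argument_sentences, and it splits 80/20 on the level of debate ids. The fixed random seed is only an illustrative choice.

import json
import random

def load_data(self, path):
    # Load the raw arguments; assumed here to be a list of JSON objects
    with open(path, encoding='utf-8') as f:
        arguments = json.load(f)

    # Split on the debate level so premises of one debate never end up
    # in both the training and the test set
    debate_ids = sorted({arg['debate_id'] for arg in arguments})
    random.seed(0)  # illustrative; any reproducible split works
    random.shuffle(debate_ids)
    train_ids = set(debate_ids[:int(0.8 * len(debate_ids))])

    train_dataset, test_dataset = [], []
    for arg in arguments:
        # Each item is a tuple (list of premises, conclusion)
        item = (arg['_argument_sentences'], arg['_claim'])
        if arg['debate_id'] in train_ids:
            train_dataset.append(item)
        else:
            test_dataset.append(item)
    return train_dataset, test_dataset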
Initializing the ImportanceEstimation model and loading the data:
model = ImportanceEstimationModel()
train_data, test_data = model.load_data('./idebate.json')
print('Number of train arguments:', len(train_data))
print('Number of test arguments:', len(test_data))
Number of train arguments: 1805
Number of test arguments: 454
Print a random sample of the train_data:
#we only print the first 5 premises from the instance
model.print_train_sample(train_data)
Conclusion: The Japanese people do not want the bases on their soil.
Premises:
1 . Without reason to be there , and unwanted by the people , the United States should remove its forces from Japan .
2 . This is demonstrated in every opinion poll and is reflected in the fact that current ruling party in the Japanese parliament , the Democratic Party of Japan was elected partly on the basis of its promise to remove the bases .
3 . For all of these reasons , the Japanese people have resoundingly stated their desire for the United States to withdraw its forces and close its bases on their soil .
4 . Most of the soldiers who commit these crimes never see justice since American soldiers stationed in Japan enjoy partial extraterritorial status , granting them a degree of immunity from prosecution by Japanese authorities .
5 . The presence of American military personnel is particularly onerous in light of the multitude of crimes committed by soldiers over the years ; since the 1950s , more than 200,000 accidents and crimes have been committed , and more than 1000 Japanese civilians have been killed , and a number of others have been the victims of assault and rape .
Since we don't have ground-truth scores that reflect how relevant each premise is to the conclusion, we use the token overlap between an argument's premise and its conclusion as a proxy for this relevance.
Task: For each premise, compute the token overlap with the conclusion (after excluding stopwords).
Implement the instance_scores function; a minimal sketch follows.
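The sketch below, again as a method of the class, takes the list of premises and the conclusion and returns one overlap count per premise. NLTK's English stopword list and word_tokenize are one way to do the filtering (nltk.download('punkt') and nltk.download('stopwords') may be needed once); dropping non-alphanumeric tokens is an extra, optional choice.

from nltk import word_tokenize
from nltk.corpus import stopwords

def instance_scores(self, premises, conclusion):
    # Token overlap between each premise and the conclusion, ignoring stopwords
    stop = set(stopwords.words('english'))
    conclusion_tokens = {t.lower() for t in word_tokenize(conclusion)
                         if t.lower() not in stop and t.isalnum()}
    scores = []
    for premise in premises:
        premise_tokens = {t.lower() for t in word_tokenize(premise)
                          if t.lower() not in stop and t.isalnum()}
        scores.append(len(premise_tokens & conclusion_tokens))
    return scores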
Distribution of the ground truth scores (number of tokens shared between a premise and a conclusion):
from matplotlib import pyplot as plt
instances_scores = [model.instance_scores(instance[0], instance[1]) for instance in train_data]
all_scores = [score for x in instances_scores for score in x]
plt.hist(all_scores)
plt.xlabel('Score')
plt.ylabel('Number of premises')
plt.show()
Task: For each premise construct a feature vector containing the following features (a sketch of these helpers follows the list):
- Number of words: implement the _num_of_words_feature function to return the number of words in the claim. You may use nltk.word_tokenize for this.
- Avg./Max. tf-idf scores: for this, implement _build_tfidf_model, which builds a tf-idf model (use scikit-learn for this) over a corpus of texts. Consider each set of claims (you might concatenate the claims as one string) as one document. Then implement _tfidf_features, which uses the tfidf_model to compute for each claim the average tf-idf value of its tokens as well as the maximum tf-idf.
- Number of positive/negative/neutral words: implement the function _sentiment_features, which uses the sentiwordnet lexicon (you might use the implementation by NLTK).
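One possible sketch of the four feature helpers, as methods of the class. The TfidfVectorizer settings and the SentiWordNet handling (taking only the first sense of each token) are illustrative simplifications, not required choices; nltk.download('sentiwordnet') and nltk.download('wordnet') may be needed once.

from nltk import word_tokenize
from nltk.corpus import sentiwordnet as swn
from sklearn.feature_extraction.text import TfidfVectorizer

def _num_of_words_feature(self, text):
    # Number of tokens in the claim/premise
    return len(word_tokenize(text))

def _build_tfidf_model(self, documents):
    # Fit a tf-idf model; each document is e.g. the concatenated claims of one argument
    self.tfidf_model = TfidfVectorizer()
    self.tfidf_model.fit(documents)

def _tfidf_features(self, text):
    # Average and maximum tf-idf value over the tokens of the text
    values = self.tfidf_model.transform([text]).toarray()[0]
    values = values[values > 0]  # keep only tokens that actually occur in the text
    if len(values) == 0:
        return 0.0, 0.0
    return values.mean(), values.max()

def _sentiment_features(self, text):
    # Counts of positive, negative and neutral words according to SentiWordNet
    pos, neg, neutral = 0, 0, 0
    for token in word_tokenize(text):
        synsets = list(swn.senti_synsets(token))
        if not synsets:
            neutral += 1
            continue
        sense = synsets[0]  # first sense as a rough approximation
        if sense.pos_score() > sense.neg_score():
            pos += 1
        elif sense.neg_score() > sense.pos_score():
            neg += 1
        else:
            neutral += 1
    return pos, neg, neutral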
Encoding the train_data into feature vectors and computing the corresponding ground truth scores:
train_X, train_Y = model.feature_representation(train_data)
print('train_X shape :', train_X.shape)
print('train_Y shape :', train_Y.shape)
train_X shape : (13901, 6)
train_Y shape : (13901,)
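For reference, feature_representation might look roughly like the sketch below (again as a method of the class); the ordering of the six features and the use of numpy are assumptions of this sketch.

import numpy as np

def feature_representation(self, dataset):
    # Fit the tf-idf model once, treating each argument's concatenated premises as one document
    self._build_tfidf_model([' '.join(premises) for premises, _ in dataset])

    X, y = [], []
    for premises, conclusion in dataset:
        scores = self.instance_scores(premises, conclusion)
        for premise, score in zip(premises, scores):
            avg_tfidf, max_tfidf = self._tfidf_features(premise)
            pos, neg, neutral = self._sentiment_features(premise)
            X.append([self._num_of_words_feature(premise),
                      avg_tfidf, max_tfidf, pos, neg, neutral])
            y.append(score)  # ground truth = token overlap with the conclusion
    return np.array(X), np.array(y)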
Task: Train a support vector regression (SVR) model using a grid search (over the cost parameter C) and 5-fold cross-validation.
Implement the train_grid_search_svr function (see the sketch below):
- Input: train_X and train_Y.
- Output: best_svr, the best SVR model, and best_score, the mean absolute error of the best SVR model.
- Use mean absolute error for scoring.
- Train on the train_data and save the best_svr as a property in the ImportanceEstimation model.
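A minimal sketch of train_grid_search_svr: GridSearchCV with scoring='neg_mean_absolute_error' and cv=5 handles both the grid search and the cross-validation; the C grid below is illustrative. Note that scikit-learn maximizes the score, so the returned best score is the negated mean absolute error, which explains the negative value printed further down.

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

def train_grid_search_svr(self, train_X, train_Y):
    param_grid = {'C': [0.01, 0.1, 1, 10, 100]}  # illustrative grid over the cost parameter
    grid = GridSearchCV(SVR(), param_grid, cv=5,
                        scoring='neg_mean_absolute_error')
    grid.fit(train_X, train_Y)

    # Save the best model as a property of the ImportanceEstimation model
    self.best_svr = grid.best_estimator_
    return self.best_svr, grid.best_score_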
best_svr, mean_absolute_error = model.train_grid_search_svr(train_X, train_Y)
print('Mean absolute error:', mean_absolute_error)
Mean absolute error: -0.9507702650510839
Task: Evaluate the best_svr model by computing the mean reciprocal rank (MRR) on the test_data.
Implement the mrr_evaluation function (a sketch follows below):
- Input: test_data.
- Output: mrr_value.
- For each instance in the test_data, predict the scores for its premises using the best_svr model and compute the overlap with the conclusion (you may call instance_scores).
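One way to implement mrr_evaluation, assuming the reciprocal rank is taken for the premise with the highest ground-truth overlap when the premises of an argument are ranked by the predicted scores; the feature encoding simply reuses the helpers sketched above.

import numpy as np

def mrr_evaluation(self, test_data):
    reciprocal_ranks = []
    for premises, conclusion in test_data:
        true_scores = self.instance_scores(premises, conclusion)
        features = []
        for premise in premises:
            avg_tfidf, max_tfidf = self._tfidf_features(premise)
            pos, neg, neutral = self._sentiment_features(premise)
            features.append([self._num_of_words_feature(premise),
                             avg_tfidf, max_tfidf, pos, neg, neutral])
        predicted = self.best_svr.predict(np.array(features))

        # Rank premises by predicted score (highest first) and find the position
        # of the premise with the largest true overlap
        ranking = np.argsort(predicted)[::-1]
        best_premise = int(np.argmax(true_scores))
        rank = int(np.where(ranking == best_premise)[0][0]) + 1
        reciprocal_ranks.append(1.0 / rank)
    return float(np.mean(reciprocal_ranks))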
Computing the MRR value on the test dataset:
model.mrr_evaluation(test_data)
0.8242828543214002
This is the resulting MRR score on the test data.