
Sentiment Analysis

This section groups papers that tackle the subject of sentiment analysis.

Recursive deep models for semantic compositionality over a sentiment treebank

Paper link: Recursive deep models for semantic compositionality over a sentiment treebank
Citation: Socher, Richard, et al. "Recursive deep models for semantic compositionality over a sentiment treebank." Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing. 2013.

Paper resources' webpage

This work makes available a new dataset for sentiment analysis called the Stanford Sentiment Treebank (SST). It was built on a corpus of movie reviews from Rotten Tomatoes. Each sentence in the reviews is parsed with the Stanford Parser, and the resulting phrases are then manually labeled by human annotators on Amazon Mechanical Turk.

This work also introduces a new sentiment classification model, the Recursive Neural Tensor Network (RNTN). The model is tested on the new SST dataset and compared against competitive models. The RNTN beats all the others and improves the state-of-the-art results by 9 points in some cases. The model is particularly good at detecting changes in meaning that typically follow negations or contrastive conjunctions such as but.
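As a rough sketch of the RNTN's composition step (the dimensions and random weights below are purely illustrative, not the paper's trained parameters), each pair of child node vectors is combined through a tensor term plus a standard linear term:

```python
import numpy as np

def rntn_compose(a, b, V, W):
    """Compose two child node vectors into a parent vector, following the
    RNTN composition rule from Socher et al. (2013):
        p = tanh(x^T V x + W x),  where x = [a; b]
    V is a (2d x 2d x d) tensor and W a (d x 2d) matrix."""
    x = np.concatenate([a, b])                       # stack children: shape (2d,)
    tensor_term = np.einsum("i,ijk,j->k", x, V, x)   # x^T V^[k] x for each slice k
    return np.tanh(tensor_term + W @ x)

# Toy example with word-vector dimension d = 4 and random weights.
rng = np.random.default_rng(0)
d = 4
a, b = rng.normal(size=d), rng.normal(size=d)
V = rng.normal(size=(2 * d, 2 * d, d))
W = rng.normal(size=(d, 2 * d))
parent = rntn_compose(a, b, V, W)
print(parent.shape)  # (4,)
```

Applied recursively bottom-up over the parse tree, this rule produces a vector (and hence a sentiment prediction) for every node.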

Semantic Vector Spaces and Compositional Semantics

GloVe is an example of what the authors call a semantic embedding; the RNTN, trained on the SST, is an example of a compositional semantics approach.

A neural network that is not a black box

The output of the recursive neural network proposed here is a fully labeled tree covering every subphrase of the sentence. That is very valuable in machine learning applications: the fully labeled tree gives us an idea of the algorithm's reasoning (although it is not a complete explanation).

Baseline comparison

  • Standard RNN
  • Naive Bayes
  • bi-gram NB
  • SVM

A criticism of semantic vector spaces

They do not capture well the differences between antonyms.

When I first read this I thought: but embeddings were made to solve exactly this problem. Indeed, though, a standard VSM may have difficulty putting words such as good and bad far apart from each other. The reason is that these words can be exchanged while keeping a sentence grammatical; that is, both can be used in the same contexts, with theoretically the same co-occurrence counts.

Of course, this problem is attenuated if we use a larger dataset with plenty of examples, so the tendency is that distinct nuances start to appear between the uses of good and bad. Another remedy is to use a very large window when computing the co-occurrence matrix.
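A toy illustration of the problem (the two-sentence corpus and window size are my own made-up example): with a small window, good and bad end up with identical co-occurrence counts because they appear in exactly the same contexts:

```python
from collections import Counter

def cooccurrence(sentences, window=2):
    """Count word co-occurrences within a symmetric window of each token."""
    counts = Counter()
    for tokens in sentences:
        for i, w in enumerate(tokens):
            for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                if i != j:
                    counts[(w, tokens[j])] += 1
    return counts

sents = [
    "the movie was good".split(),
    "the movie was bad".split(),
]
counts = cooccurrence(sents)
# 'good' and 'bad' share exactly the same context counts in this corpus:
print(counts[("good", "was")], counts[("bad", "was")])  # 1 1
```

On this tiny corpus the rows for good and bad are indistinguishable, which is precisely the antonym problem discussed above.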

From a linguistic or cognitive standpoint, ignoring word order in the treatment of a semantic task is not plausible.

The Stanford Parser

This is the tool used to create the trees. It would be nice to discover how it works.

The tree generated by the parser facilitates the subdivision of the main sentence into sub-sentences. For example, in the list below, we may consider as a complete phrase each of the subtrees produced by the parser.

  • least compelling
  • least compelling variations
  • the least compelling variations
  • this theme
  • on this theme
  • the least compelling variations on this theme
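This subphrase extraction can be sketched as a recursive walk over the parse tree (the nested-list tree below is a hand-built illustration, not actual Stanford Parser output):

```python
# A parse tree as nested lists: [label, children...], with strings as leaves.
tree = ["NP",
        ["NP", ["DT", "the"],
               ["ADJP", ["JJS", "least"], ["JJ", "compelling"]],
               ["NNS", "variations"]],
        ["PP", ["IN", "on"],
               ["NP", ["DT", "this"], ["NN", "theme"]]]]

def leaves(node):
    """Return the tokens covered by a (sub)tree node."""
    if isinstance(node, str):
        return [node]
    toks = []
    for child in node[1:]:
        toks.extend(leaves(child))
    return toks

def subphrases(node, out=None):
    """Collect the phrase spanned by every node of the tree."""
    if out is None:
        out = []
    if isinstance(node, str):
        return out
    out.append(" ".join(leaves(node)))
    for child in node[1:]:
        subphrases(child, out)
    return out

for phrase in subphrases(tree):
    print(phrase)
```

Every internal node yields one phrase, which is exactly what allows the SST to carry a sentiment label at every level of the sentence.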

Models compared in the paper

  • RNN (word relations are not well captured)
  • Matrix-Vector RNN (linguistically motivated, but too expensive)
  • RNTN: Recursive Neural Tensor Network (proposed)

Loss functions

  • KL-divergence
  • Cross-entropy
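A minimal sketch of these two losses on a 5-class sentiment distribution (the example distributions are made up); note that for a one-hot target the KL-divergence reduces to the cross-entropy, since the target's entropy is zero:

```python
import math

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_i p_i log q_i  (target p, prediction q)."""
    return -sum(pi * math.log(qi + eps) for pi, qi in zip(p, q))

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = H(p, q) - H(p): zero when the prediction matches the target."""
    entropy = -sum(pi * math.log(pi + eps) for pi in p if pi > 0)
    return cross_entropy(p, q, eps) - entropy

# Target: one-hot "very negative" over 5 sentiment classes; a soft prediction.
target = [1.0, 0.0, 0.0, 0.0, 0.0]
pred = [0.7, 0.1, 0.1, 0.05, 0.05]
print(round(cross_entropy(target, pred), 4))  # 0.3567
print(round(kl_divergence(target, pred), 4))  # 0.3567
```

This equivalence for one-hot targets is why the two losses are often used interchangeably in classification.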

Gaps in knowledge needed to understand section 4

I can't read section 4 without a better comprehension of machine learning concepts.

Comparison with other models

The model proposed here shows its power on sentences whose sentiment shifts across subphrases, such as

There are slow and repetitive parts but it has just enough spice to keep it interesting.

The proposed model correctly classifies this as positive, while the competing models classified it as negative due to the higher number of negative words.

The model also learns that negating a negative word does not necessarily flip the sentiment to positive, but rather makes it less negative. For example,

The movie was terrible
The movie was not terrible

The first is clearly negative, but the second is not positive; it is rather neutral.

Another example

It is just incredibly dull (negative)
It is definitely not dull (neutral)

Most positive and most negative n-grams (some examples)

Which n-grams did the model classify as most positive or most negative?

Positive:

  1. engaging; best
  2. excellent performances; A masterpiece
  3. an amazing performance; wonderful all-ages triumph
  4. nicely acted and beautifully shot
  5. one of the best films of the year

Negative:

  1. bad; dull
  2. worst movie; very bad
  3. a lousy movie; a complete failure
  4. silliest and most incoherent movie
  5. a trashy, exploitative, thoroughly unpleasant experience

Accurate Unlexicalized Parsing

Paper link: Accurate Unlexicalized Parsing
Citation: Klein, Dan, and Christopher D. Manning. "Accurate unlexicalized parsing." Proceedings of the 41st Annual Meeting of the Association for Computational Linguistics. 2003.

AKA: The Stanford Parser.

This paper discusses how an unlexicalized PCFG parser can achieve performance comparable with its lexicalized counterparts, the advantage being that:

an unlexicalized PCFG is much more compact, easier to replicate, and easier to interpret than more complex lexical models, and the parsing algorithms are simpler, more widely understood, of lower asymptotic complexity, and easier to optimize.

Given a natural language sentence, the output of the parser is a tree in which tokens are leaves and higher nodes represent the grammar expansions that generate the sentence.

  • PCFG: Probabilistic Context-Free Grammar
  • Generalized CYK parser.
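To make the CYK idea concrete, here is a minimal probabilistic CYK chart over a toy grammar in Chomsky normal form (the grammar and probabilities are invented for illustration; the actual parser is far more elaborate):

```python
from collections import defaultdict

# A toy PCFG in Chomsky normal form: (lhs, rhs, probability).
binary_rules = [
    ("S",  ("NP", "VP"), 1.0),
    ("NP", ("DT", "NN"), 1.0),
    ("VP", ("VB", "NP"), 1.0),
]
lexical_rules = {
    "the": [("DT", 1.0)],
    "dog": [("NN", 0.5)],
    "cat": [("NN", 0.5)],
    "saw": [("VB", 1.0)],
}

def cyk(tokens):
    """Probabilistic CYK: chart[(i, j)] maps each nonterminal to the best
    probability of deriving tokens[i:j]."""
    n = len(tokens)
    chart = defaultdict(dict)
    for i, tok in enumerate(tokens):               # fill in the words
        for nt, p in lexical_rules.get(tok, []):
            chart[(i, i + 1)][nt] = p
    for span in range(2, n + 1):                   # grow spans bottom-up
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):              # try every split point
                for lhs, (b, c), p in binary_rules:
                    if b in chart[(i, k)] and c in chart[(k, j)]:
                        prob = p * chart[(i, k)][b] * chart[(k, j)][c]
                        if prob > chart[(i, j)].get(lhs, 0.0):
                            chart[(i, j)][lhs] = prob
    return chart[(0, n)]

print(cyk("the dog saw the cat".split()))  # {'S': 0.25}
```

The dynamic program runs in O(n^3 * |grammar|), which is the low asymptotic complexity the quote above refers to.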

Dynasent: A dynamic benchmark for sentiment analysis

Paper link: Dynasent: A dynamic benchmark for sentiment analysis
Citation: Potts, Christopher, et al. "Dynasent: A dynamic benchmark for sentiment analysis." arXiv preprint arXiv:2012.15349 (2020).

This is a new dataset for sentiment analysis (from Potts et al. 2020).

Development pipeline for DynaSent

The main objective of DynaSent is to be an evolving dataset for sentiment analysis that provides increasingly challenging sentences at each of its rounds. The dataset creation is based on an adversarial process in which one tries to build sentences with the objective of fooling the model into predicting the wrong label.

Model 0 is a fine-tuning of RoBERTa on sentiment analysis datasets (among them Yelp, Amazon, IMDB, and SST). Next, Model 0 is used to gather challenging naturally occurring sentences from the Yelp Academic Dataset (over 8M reviews).

The gathering of challenging sentences occurs in the following way. The Yelp dataset, besides the review text, has a number of stars associated with each review. The idea is to get the sentences that Model 0 labeled as positive but whose review has only one star; similarly, to get sentences that Model 0 labeled as negative but whose review has five stars. The assumption is that five (respectively one) stars indicate the original intention of the user who wrote the review.
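A sketch of this harvesting step, assuming a hypothetical `predict` callable standing in for Model 0 (the toy word-counting model below is purely illustrative):

```python
def harvest_challenging(reviews, predict):
    """Keep sentences where the model's label disagrees with the star rating:
    predicted positive but 1 star, or predicted negative but 5 stars.
    `predict` is any callable text -> 'positive' | 'negative' | 'neutral'."""
    challenging = []
    for text, stars in reviews:
        label = predict(text)
        if (label == "positive" and stars == 1) or (label == "negative" and stars == 5):
            challenging.append((text, stars, label))
    return challenging

# Toy stand-in model that just counts sentiment words.
def toy_model(text):
    pos = sum(w in text.lower() for w in ("great", "good", "love"))
    neg = sum(w in text.lower() for w in ("bad", "awful", "hate"))
    return "positive" if pos > neg else "negative" if neg > pos else "neutral"

reviews = [
    ("Great food, right?", 1),   # model says positive, user gave 1 star
    ("Not bad at all!", 5),      # model says negative, user gave 5 stars
    ("Awful service.", 1),       # model and stars agree: not harvested
]
print(harvest_challenging(reviews, toy_model))
```

The mismatched sentences are exactly the candidates sent on to human validation in the next step.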

Next, these challenging sentences pass through human scrutiny: validators give each of them one of four labels: positive, negative, neutral, or mixed. And that is how the Round 1 dataset is built.

The authors didn't stop it here and executed another adversarial round to construct Round 2 Dataset. This time, the fine tuned RoBERTa model is also trained with the challenging phrasings from Round 1 Dataset to produce Model 1. The gathering of challenging phrasings is slightly different in Round 2. Crowdworkers are asked to modify a sentence in order to fool Model 1 in predicting the wrong labeling. These modified phrasings are further validated by human annotators and Round 2 Dataset is done.