Lecture 5

Overview

We start the Supervised Sentiment Analysis unit. In this unit, we will cover the following points:

  • State of the art: Is sentiment analysis solved? Despite claims from many authors in industry that it is, we are still far from having a general, reliable model.
  • Dataset exploration: ternary representation (positive, negative, neutral)
    • Stanford Sentiment Treebank (SST)
    • DynaSent
  • Hyperparameters and classifier comparison
  • Feature representation
  • Recurrent neural network classifiers

Readings

Core Reading

  • Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank: paper by Socher et al.
  • DynaSent: A Dynamic Benchmark for Sentiment Analysis: paper by Potts et al.

Good Reading

  • Opinion mining and sentiment analysis: review and challenges of the field.
  • A primer on neural network models for natural language processing.

Why is it difficult?

Do the sentences below carry sentiment? If so, is it positive, negative, or neutral?

  • There was an earthquake in California.
  • They said it would be great.
  • They said it would be great, and they were right.
  • They said it would be great, and they were wrong.
  • The party fat-cats are sipping their expensive imported wines.
  • You're terrible!

Affective computing

Affective computing extrapolates the ternary view of positive, negative, and neutral to establish a high-dimensional network of sentiments, their relations, and the transitions between them.

Specialized tasks

There is a list of them (with references) in the slides. Here are some of them:

  • Sarcasm (Khodak 2017)
  • Condescension (Wang and Potts 2019)
  • Hate-Speech (Nobata 2016)

General Tips

Plenty of datasets in the wild!

Lexicons

It is a common practice for models to map lexicons to sentiments. (A lexicon here is a group of words related to each other through a shared root; its members are composed of the root plus an inflection.) The slides list several references that build this kind of mapping.

The SocialSent reference is particularly interesting because it considers other dimensions in the mapping, such as word context, category context (sports, politics), time, etc.

Tokenizing

The chosen tokenizer may have an impact on the performance of a sentiment analysis model, and it is usually domain specific. For example, on Twitter you may want your tokenizer to preserve emoticons.

A standard tokenizer is the whitespace tokenizer, which is usually fine for English text.

Here is a list of things you may consider to make your tokenizer sentiment-aware:

  • Isolates emoticons
  • Respects Twitter and other domain-specific markup
  • Uses the underlying markup (e.g., <strong> tags)
  • Captures those #$%ing masked curses!
  • Preserves capitalization where it seems meaningful
  • Regularizes lengthening (e.g., YAAAAAAY⇒YAAAY)
  • Captures significant multiword expressions (e.g., out of this world)

A good start is: nltk.tokenize.casual.TweetTokenizer
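The checklist above can be sketched with a toy regex tokenizer. This is not the actual TweetTokenizer implementation; the emoticon pattern and the lengthening rule are deliberately simplified assumptions:

```python
import re

# Toy sentiment-aware tokenizer (illustrative only; for real work, use
# nltk.tokenize.casual.TweetTokenizer). The patterns below are simplified.
EMOTICON = r"[<>]?[:;=8][\-o\*']?[\)\]\(\[dDpP/\\|@3]"
HASHTAG = r"#\w+"
WORD = r"[A-Za-z]+(?:'[A-Za-z]+)?"

TOKEN_RE = re.compile(f"({EMOTICON}|{HASHTAG}|{WORD})")

def tokenize(text):
    # Regularize lengthening: cap any run of 3+ repeated characters at 3
    # (e.g. "YAAAAAAY" -> "YAAAY"), keeping capitalization intact.
    text = re.sub(r"(\w)\1{2,}", r"\1\1\1", text)
    # Emoticons and hashtags are matched before plain words, so they
    # survive as single tokens instead of being split or dropped.
    return TOKEN_RE.findall(text)

print(tokenize("YAAAAAAY :) #nlproc can't wait"))
# -> ['YAAAY', ':)', '#nlproc', "can't", 'wait']
```

Note how the emoticon and the hashtag are isolated, the lengthening is regularized, and capitalization is preserved, covering several of the bullet points above.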

The dangers of stemming

This technique consists in grouping words by their root form. It is usually not a good idea for sentiment analysis (though it is useful for identifying unique words in a text).

Positive       Negative       Porter stemmed
extravagance   extravagant    extravag
affection      affectation    affect
competence     compete        compet
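The danger can be demonstrated with a toy Porter-style suffix stripper (a gross simplification of the real Porter algorithm, purely for illustration): words with opposite sentiment collapse onto the same stem, destroying exactly the signal a sentiment classifier needs.

```python
# Toy suffix stripper, illustrative only; the real Porter algorithm is
# far more elaborate (measure conditions, multiple passes, etc.).
SUFFIXES = ["ation", "ance", "ence", "ant", "ion", "e"]

def toy_stem(word):
    # Strip the longest matching suffix, keeping at least 4 characters.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word

# Opposite-sentiment pairs collapse onto one stem:
print(toy_stem("extravagance"), toy_stem("extravagant"))  # -> extravag extravag
print(toy_stem("competence"), toy_stem("compete"))        # -> compet compet
```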

Part-of-Speech (POS) tagging

The idea is to tag words with their syntactic function (noun, adjective, verb). This is useful because the same word may take on different meanings depending on its syntactic function.

Word   Syntactic function      Sense
fine   adjective               positive
fine   noun (incur a fine)     negative
hit    noun (it's a hit)       positive
hit    verb                    negative

The Harvard Inquirer (General Inquirer) is the reference for this work.

It has limits, of course. Sometimes the same word with the same syntactic function carries different meaning.

  • He is a mean person. (a bad person)
  • He made a mean apple pie. (an excellent apple pie)
  • He is a serious person. (positive)
  • He made a serious mistake. (negative)

Simple negation mark

This is similar in spirit to the idea above. We tag all the words following a negation within some window (e.g., the three words after the negation, or all the words up to a punctuation mark). The idea is to help the model invert the meaning of words that are usually associated with positive sentiment whenever they are preceded by a negation.

  • He is a good person
  • He is not a good person.

The reference work for this is Pang et al. 2002. They simply append _NEG as a suffix to the words.
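A minimal sketch of this marking scheme, assuming the "until the next punctuation mark" variant of the window (the negation word list here is my own abbreviated one, not Pang et al.'s):

```python
import re

NEGATIONS = {"not", "no", "never", "cannot"}

def mark_negation(tokens):
    """Append _NEG to every token after a negation word, up to the
    next punctuation mark (in the style of Pang et al. 2002)."""
    marked, in_scope = [], False
    for tok in tokens:
        if re.fullmatch(r"[.,:;!?]", tok):
            in_scope = False          # punctuation closes the scope
            marked.append(tok)
        elif tok.lower() in NEGATIONS or tok.lower().endswith("n't"):
            in_scope = True           # negation opens the scope
            marked.append(tok)
        else:
            marked.append(tok + "_NEG" if in_scope else tok)
    return marked

print(mark_negation("he is not a good person .".split()))
# -> ['he', 'is', 'not', 'a_NEG', 'good_NEG', 'person_NEG', '.']
```

The classifier now sees good_NEG as a feature distinct from good, so it can learn an inverted sentiment for it.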

The sentiment-aware tokenizer combined with the simple negation mark scores best in the benchmarks presented in the slides.

Stanford Sentiment Treebank

Fundamental paper

Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank (Socher et al., 2013)

The model was trained on a fully labeled (via crowdsourced labels) movie-review corpus with 10K sentences from Rotten Tomatoes (Pang and Lee 2005).

Each sentence in the training set is represented in a tree like the following:

        2
      /   \
     2     4
     |    / \
   NLU   2   4
         |   |
        is   enlightening

The numbered labels represent a range of sentiments, from 0 (very negative) to 4 (very positive):

  • 0: Very negative
  • 1: Negative
  • 2: Neutral
  • 3: Positive
  • 4: Very positive

The labeled nodes refer to the composition of their subtrees. The word enlightening by itself is classified as very positive, and the very-positive label is retained for the subphrase "is enlightening". Notice how the label changes, however, for the complete sentence "NLU is enlightening", which is classified as neutral.
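The labeled tree above can be encoded as nested (label, children) tuples. The helpers below are hypothetical, not part of the SST tooling; they just enumerate each phrase with its annotated label, bottom-up:

```python
# The example tree: every node of the SST carries its own crowdsourced
# label (here, 2 = neutral, 4 = very positive).
tree = (2, [
    (2, ["NLU"]),
    (4, [
        (2, ["is"]),
        (4, ["enlightening"]),
    ]),
])

def leaves(node):
    """Return the words under a node, left to right."""
    _, children = node
    if isinstance(children[0], str):
        return children
    return [w for child in children for w in leaves(child)]

def phrase_labels(node):
    """Return (phrase, label) pairs for this node and all its subtrees."""
    label, children = node
    pairs = [(" ".join(leaves(node)), label)]
    if not isinstance(children[0], str):
        for child in children:
            pairs += phrase_labels(child)
    return pairs

for phrase, label in phrase_labels(tree):
    print(f"{label}: {phrase}")
# -> 2: NLU is enlightening
#    2: NLU
#    4: is enlightening
#    2: is
#    4: enlightening
```

The point to notice is that every internal node has its own label, so the corpus supervises the composition directly rather than only labeling whole sentences.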

Relation with VSM approaches

It seems that, prior to the publication of Socher's paper, sentiment analysis models were based solely on scoring positive and negative words. The sentiment detected in a sentence would depend on the final score obtained by summing the individual word scores.

As one might expect, this kind of model cannot capture several common language structures and nuances. The clearest one is the change in meaning that may follow a negation or an adversative conjunction such as but.

The film started very boring, but the ending made it a great movie.

This film is definitely not made for smart people.
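The failure mode of the word-score approach is easy to show concretely. The scores below are made up for illustration; the point is only that summing per-word scores is blind to negation:

```python
# Toy word-score model: sentiment = sum of per-word scores.
SCORES = {"good": 1.0, "great": 1.0, "boring": -1.0, "not": 0.0}

def score(sentence):
    # Words absent from the lexicon contribute 0.
    return sum(SCORES.get(w, 0.0) for w in sentence.lower().split())

# The model cannot see negation: both sentences get the same score.
print(score("he is a good person"))      # -> 1.0
print(score("he is not a good person"))  # -> 1.0
```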

In Socher's model, one of the important features is that context is taken into account. In this sense, it is much more closely related to contextual word embeddings such as BERT.

How to solve sentiment analysis with BERT

I believe one could use BERT to construct a sentiment tree. It would work as follows:

Given a sentence, we start by listing all its subphrases. For the single-word subphrases, we compute their BERT representations and then compute their similarity with positive, neutral, and negative words. We could develop any sort of method that would give us a score from that.

Next, we evaluate the BERT vectors of the two words in a two-word subphrase and compute the score for these two new vectors once again. We then compare the resulting labels with the labels assigned to each word in the previous step to decide the label of the composed two-word subphrase, and we proceed like this for all the others.

Clearly this is not efficient at all: it has exponential cost, and it seems we are using the wrong tool for the problem. Anyway, this is much more a thought experiment than a real idea for a model.

It is insightful to note, nonetheless, that in a contextual VSM like BERT, the context is responsible for shifting the vector embedding of a word. A context that contains a negation, for example, could shift the word good much closer to the no-context embedding of the word bad.

\[ d\left( \text{good}(\text{negation context}), \text{bad}(\text{no context}) \right) \ll d\left( \text{good}(\text{no context}), \text{bad}(\text{no context}) \right) \]
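The inequality can be made concrete with toy 2-dimensional "embeddings". The vectors below are invented purely to illustrate the geometry; real BERT vectors live in 768+ dimensions:

```python
import math

# Hypothetical 2-d embeddings: a negated context shifts "good"
# toward the region where "bad" lives.
good_no_context = (1.0, 0.0)
good_negated    = (-0.8, 0.2)   # "not ... good"
bad_no_context  = (-1.0, 0.0)

def dist(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Negated "good" is much closer to "bad" than plain "good" is.
print(dist(good_negated, bad_no_context))    # small
print(dist(good_no_context, bad_no_context)) # large
```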

Material

Notebook: Overview of the Stanford Sentiment Treebank

This notebook presents how to use some methods in the sst.py library. This library was conceived to help students of this class use the Stanford Sentiment Treebank datasets. It contains helpful methods for splitting the dataset into training and test sets. It also comes with variations such as 'includeSubtrees', with which we can train the model on all subphrases and not only on complete sentences.

Paper reading