
Word Embeddings

Quick notes

Here is a tentative plan for the tasks:

  1. Compute TF-IDF on English corpora: The goal is to get comfortable with corpus data and familiar with visualization tools.
  2. Compare word2vec and GloVe on a word classification task: The goal is to get comfortable with the outputs of word2vec and GloVe models, and to get familiar with comparison and visualization tools.
  3. Train GloVe vectors on a corpus: The goal is to get familiar with optimization methods in machine learning.
  4. Implement a machine translation model: The goal is to get familiar with an attentional encoder-decoder implementation.
  5. Compare GloVe with CoVe: The goal is to get a grasp of how contextualized vectors improve results over static representations.
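For task 1, a minimal TF-IDF sketch in plain Python (the toy corpus, whitespace tokenization, and the raw term-frequency / log inverse-document-frequency weighting are illustrative assumptions; libraries such as scikit-learn apply extra smoothing and normalization):

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: tf-idf weight} dict per tokenized document."""
    n = len(docs)
    # document frequency: number of documents containing each term
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return weights

# toy corpus, whitespace tokenization (assumption, for illustration only)
corpus = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]
w = tf_idf(corpus)
```

Rare terms like "cat" (one document) end up weighted higher than frequent ones like "the" (two documents), which is the behavior the visualization step should make visible.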

Distributional vectors

Co-occurrence matrices
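A minimal sketch of collecting the co-occurrence counts behind such a matrix, assuming a symmetric context window of size 2 over a toy token list (both choices are illustrative):

```python
from collections import defaultdict

def cooccurrence(tokens, window=2):
    """Map (word, context_word) -> count within a symmetric window."""
    counts = defaultdict(int)
    for i, word in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                counts[(word, tokens[j])] += 1
    return counts

tokens = "I like deep learning I like NLP".split()
C = cooccurrence(tokens)
```

Because the window is symmetric, the resulting matrix is symmetric as well: the count for ("I", "like") equals the count for ("like", "I").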


n-gram models

References

  1. The mathematics of statistical machine translation
  2. A tutorial on Hidden Markov Models
  3. Speech and Language Processing - Chapter 3
  4. Large language models in machine translation
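A minimal bigram sketch with maximum-likelihood estimates (toy corpus and no smoothing are assumptions; cf. the Speech and Language Processing chapter above, whose "I am Sam" example this follows):

```python
from collections import Counter

def bigram_probs(tokens):
    """Maximum-likelihood bigram model: P(w2 | w1) = count(w1 w2) / count(w1)."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {(w1, w2): c / unigrams[w1] for (w1, w2), c in bigrams.items()}

# two sentences with boundary markers, simply concatenated (toy setup)
tokens = "<s> I am Sam </s> <s> Sam I am </s>".split()
P = bigram_probs(tokens)
```

Unseen bigrams get probability zero here, which is exactly the problem smoothing techniques in the references address.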

word2vec

References

  1. Efficient estimation of word representations in vector space (2013)
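From the paper above: the skip-gram variant trains word vectors by maximizing the average log probability of context words within a window of size $c$, with the conditional probability given by a softmax over input vectors $v$ and output vectors $v'$:

```latex
\frac{1}{T}\sum_{t=1}^{T}\;\sum_{\substack{-c \le j \le c \\ j \neq 0}} \log p(w_{t+j} \mid w_t),
\qquad
p(w_O \mid w_I) = \frac{\exp\!\left({v'_{w_O}}^{\top} v_{w_I}\right)}{\sum_{w=1}^{W} \exp\!\left({v'_{w}}^{\top} v_{w_I}\right)}
```

The full softmax is expensive for large vocabularies $W$, which motivates the hierarchical softmax and negative-sampling approximations used in practice.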

GloVe

References

  1. GloVe: Global Vectors for Word Representation (2014)
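From the paper above: GloVe fits word vectors $w_i$, context vectors $\tilde{w}_j$, and biases $b_i$, $\tilde{b}_j$ to the log of the co-occurrence counts $X_{ij}$ with a weighted least-squares objective:

```latex
J = \sum_{i,j=1}^{V} f(X_{ij})\left(w_i^{\top}\tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij}\right)^2,
\qquad
f(x) = \begin{cases} (x / x_{\max})^{\alpha} & x < x_{\max} \\ 1 & \text{otherwise} \end{cases}
```

The weighting $f$ caps the influence of very frequent pairs and zeroes out unobserved ones, so only nonzero entries of the co-occurrence matrix enter the optimization.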

CoVe

References

  1. Learned in Translation: Contextualized word vectors (2017)

ELMo

References

  1. Deep contextualized word representations (2018)

BERT

References

  1. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding (2019)