Word Embeddings
Quick notes
Here is a tentative plan of tasks:
- Compute TF-IDF on an English corpus: The goal is to get comfortable with corpus data and familiar with visualization tools.
- Compare word2vec and GloVe on a word classification task: The goal is to get comfortable with the outputs of word2vec and GloVe models, and to get familiar with comparison and visualization tools.
- Train GloVe vectors on a corpus: The goal is to get familiar with optimization methods in machine learning.
- Implement a machine translation model: The goal is to get familiar with implementing an attentional encoder-decoder.
- Compare GloVe with CoVe: The goal is to get a grasp of how contextualized vectors improve results over static representations.
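A minimal sketch of the first task, TF-IDF, in plain Python (the helper name and toy corpus are illustrative, not from the notes):

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    tf  = raw count of a term in a document
    idf = log(N / df), where df = number of documents containing the term.
    """
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # count each term once per document
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return weights

docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "cats and dogs".split(),
]
w = tf_idf(docs)
```

Terms that appear in every document get weight zero, which is the usual motivation for the idf factor; a library such as scikit-learn adds smoothing and normalization on top of this basic scheme.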
Distributional vectors
Co-occurrence matrices
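Window-based co-occurrence counts, the raw statistics that methods like GloVe are trained on, can be sketched as follows (function name and window size are assumptions):

```python
from collections import Counter

def cooccurrence(docs, window=2):
    """Symmetric word-word co-occurrence counts within a fixed window.

    Each pair is stored once under a sorted key, so (w1, w2) and
    (w2, w1) accumulate into the same entry.
    """
    counts = Counter()
    for doc in docs:
        for i, w in enumerate(doc):
            # look ahead up to `window` positions to the right
            for j in range(i + 1, min(i + 1 + window, len(doc))):
                counts[tuple(sorted((w, doc[j])))] += 1
    return counts
```

The `Counter` keyed by word pairs is a sparse representation of the matrix; for real corpora one would map words to integer indices and use a sparse matrix instead.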
References
n-gram models
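Following the treatment in Speech and Language Processing, Chapter 3, a maximum-likelihood bigram model can be sketched as (names and the toy sequence are illustrative):

```python
from collections import Counter

def bigram_mle(tokens):
    """Maximum-likelihood bigram probabilities.

    P(w2 | w1) = c(w1 w2) / c(w1), where the denominator counts
    occurrences of w1 as a history (i.e. excluding the final token).
    """
    histories = Counter(tokens[:-1])
    bigrams = Counter(zip(tokens, tokens[1:]))
    return {(w1, w2): c / histories[w1] for (w1, w2), c in bigrams.items()}

probs = bigram_mle("the cat sat the cat ran".split())
```

Unsmoothed MLE assigns zero probability to unseen bigrams, which is why the textbook treatment continues with smoothing methods.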
References
- The mathematics of statistical machine translation
- A tutorial on Hidden Markov Models
- Speech and Language Processing - Chapter 3
- Large language models in machine translation