
Natural Language Processing, oh I lost track - Part 1

Published Jan 19, 2019

Hi All,
Everyone who works with technology knows the joy and pain of the rapid pace of updates and research in the field.

It doesn’t matter whether you’re a developer who needs to learn a new framework every couple of weeks, a data scientist who has to implement the latest deep learning papers, or a financial manager who needs to keep up with new ways to manage assets better: in every case you have to read a lot and learn a lot to stay on top.
The Natural Language Processing field is on fire today: hundreds of research papers, GitHub code releases, and infrastructure updates appear every week.

This post is the first in a series I will be publishing, along with resources to learn each topic. My focus will mostly be on how deep learning is being used to make machines understand language better.

Word Vectors

Word vectors are simply vectors of numbers that represent the meaning of a word. Traditional approaches to NLP, such as one-hot encodings, do not capture syntactic (structure) or semantic (meaning) relationships across collections of words and therefore represent language in a very naive way. Word vectors have revolutionized the way we (I mean machines) look at words.
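
To make the contrast concrete, here is a minimal sketch in Python with numpy (the dense vectors below are hand-picked toy numbers, not real embeddings) showing why one-hot vectors carry no notion of similarity while dense word vectors do:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# One-hot encodings: every word is orthogonal to every other word,
# so "king" is exactly as (dis)similar to "queen" as it is to "apple".
vocab = ["king", "queen", "apple"]
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}
print(cosine(one_hot["king"], one_hot["queen"]))  # 0.0
print(cosine(one_hot["king"], one_hot["apple"]))  # 0.0

# Dense word vectors (toy values for illustration): related words
# end up close together in the vector space.
dense = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.85, 0.75, 0.2]),
    "apple": np.array([0.1, 0.2, 0.9]),
}
print(cosine(dense["king"], dense["queen"]))  # close to 1.0
print(cosine(dense["king"], dense["apple"]))  # much smaller
```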

It's all Numbers, everything else is a mere illusion - A Data Scientist

  • Distributional hypothesis :- Words with similar meanings tend to occur in similar contexts.

  • Word embeddings are pre-trained by optimizing an auxiliary objective in a large unlabeled corpus, such as predicting a word based on its context.

  • Word vectors can capture general syntactic and semantic information.

  • Mikolov et al. proposed the continuous bag-of-words (CBOW) and skip-gram models to efficiently construct high-quality distributed vector representations (see the gensim training sketch after this list).
    Word vectors exhibit compositionality, i.e., adding two word vectors results in a vector that is a semantic composite of the individual words, e.g., ‘man’ + ‘royal’ ≈ ‘king’.

    • https://www.youtube.com/watch?v=UqRCEmrv1gQ
    • https://www.youtube.com/watch?v=uskth3b6H_A [CBOW]
    • https://www.youtube.com/watch?v=Kfn0keI5TwQ [Skip-Gram]
    • https://www.youtube.com/watch?v=ASn7ExxLZws [GloVe]
  • GloVe by Pennington et al. is another famous word embedding method which is essentially a “count-based” model. Here, the word co-occurrence count matrix is pre-processed by normalizing the counts and applying a log-smoothing operation. This matrix is then factorized to get lower-dimensional representations by minimizing a “reconstruction loss” (see the pre-trained GloVe example after this list).

  • What is the difference between word2vec and GloVe?

    • https://www.quora.com/How-is-GloVe-different-from-word2vec
  • Limitations :-

    • Inability to represent phrases :-
      • http://mlexplained.com/2017/12/28/an-overview-of-sentence-embedding-methods/
    • Embeddings based only on a small window of surrounding words :-
      • These embeddings cluster semantically similar words which can have opposing sentiment polarities (like good and bad). This can be problematic in downstream tasks such as sentiment analysis.
      • Sentiment-specific word embeddings (SSWE) :- http://www.aclweb.org/anthology/P14-1146
    • Application/task-dependent training :-
      • Using Labelled Data :- http://anthology.aclweb.org/P/P13/P13-2087.pdf
      • Negative Sampling :-
        • http://mccormickml.com/2016/04/19/word2vec-tutorial-the-skip-gram-model/
        • http://mccormickml.com/2017/01/11/word2vec-tutorial-part-2-negative-sampling/
    • Polysemy :- the coexistence of many possible meanings for a word or phrase.
      • E.g., ‘bank’ can mean an institution where we store money or the bank of a river.
      • https://medium.com/@jegasingamjeyanthasingam/word-embedding-to-polysemy-embedding-17274ab98418
      • In a recent work, Upadhyay et al. provided an innovative way to address this deficit. The authors leveraged multilingual parallel data to learn multi-sense word embeddings. For example, the English word bank, when translated to French provides two different words: banc and banque representing financial and geographical meanings, respectively. Such multilingual distributional information helped them in accounting for polysemy.
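
As referenced in the Mikolov et al. bullet above, here is a rough training sketch using gensim's Word2Vec (assuming gensim >= 4.0; the toy corpus and hyperparameters are made up for illustration and far too small to yield meaningful vectors):

```python
from gensim.models import Word2Vec

# Toy corpus: in practice you would train on a large unlabeled corpus
# (Wikipedia dumps, news crawls, etc.).
sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
    ["the", "man", "walked", "to", "the", "bank"],
    ["the", "woman", "walked", "to", "the", "river", "bank"],
]

# sg=0 -> CBOW: predict a word from its surrounding context.
cbow = Word2Vec(sentences, vector_size=50, window=3, min_count=1,
                sg=0, epochs=50)

# sg=1 -> skip-gram: predict the context from a word.
# negative=5 -> negative sampling with 5 noise words per positive pair.
skip_gram = Word2Vec(sentences, vector_size=50, window=3, min_count=1,
                     sg=1, negative=5, epochs=50)

# Every word in the vocabulary is now a dense vector.
print(skip_gram.wv["king"].shape)  # (50,)

# The classic compositionality demo (only meaningful when trained on a
# large corpus): king - man + woman ~ queen
# print(skip_gram.wv.most_similar(positive=["king", "woman"], negative=["man"]))
```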
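For GloVe you would typically load pre-trained vectors rather than train your own. A small sketch assuming gensim's downloader API and its "glove-wiki-gigaword-100" dataset (the exact dataset name, and the one-off download on first use, are assumptions about your setup):

```python
import gensim.downloader as api

# Downloads (on first use) and loads 100-dimensional GloVe vectors
# trained on Wikipedia + Gigaword; returns a KeyedVectors object.
glove = api.load("glove-wiki-gigaword-100")

print(glove["bank"][:5])                   # first 5 dimensions of "bank"
print(glove.most_similar("bank", topn=5))  # nearest neighbours
print(glove.similarity("good", "bad"))     # often surprisingly high: the
                                           # sentiment-polarity limitation
                                           # noted in the list above
print(glove.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```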

I hope you enjoyed this overview of the different techniques and of the developments that have been happening in word vectors. Word vectors are just a stepping stone, but an important one 😉

Special Thanks to :-
T Young et al.

Anshik
