Natural Language Processing Seminar 2019–2020
The NLP Seminar is organised by the Linguistic Engineering Group at the Institute of Computer Science, Polish Academy of Sciences (ICS PAS). It takes place on (some) Mondays, normally at 10:15 am, in the seminar room of the ICS PAS (ul. Jana Kazimierza 5, Warszawa). All recorded talks are available on YouTube. |
Please see also the talks given in 2000–2015 and 2015–2019. |
2 April 2020
Stan Matwin (Dalhousie University)
Efficient training of word embeddings with a focus on negative examples

This presentation is based on our AAAI 2018 and AAAI 2019 papers on English word embeddings. In particular, we examine the notion of “negative examples”, the unobserved or insignificant word-context co-occurrences, in spectral methods. we provide a new formulation for the word embedding problem by proposing a new intuitive objective function that perfectly justifies the use of negative examples. With the goal of efficient learning of embeddings, we propose a kernel similarity measure for the latent space that can effectively calculate the similarities in high dimensions. Moreover, we propose an approximate alternative to our algorithm using a modified Vantage Point tree and reduce the computational complexity of the algorithm with respect to the number of words in the vocabulary. We have trained various word embedding algorithms on articles of Wikipedia with 2.3 billion tokens and show that our method outperforms the state-of-the-art in most word similarity tasks by a good margin. We will round up our discussion with some general thought s about the use of embeddings in modern NLP.