Building a vocabulary for word embedding lookup
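
One way to approach this can be sketched as follows. This is a minimal illustration, not a definitive implementation: the names `build_vocab`, `encode`, and `min_freq`, the special tokens `<pad>` and `<unk>`, and the toy corpus are all assumptions introduced here. The idea is to count tokens, assign each a stable integer id, and use those ids as row indices into an embedding table.

```python
from collections import Counter

# Hypothetical special tokens: padding gets id 0, unknown words get id 1.
PAD, UNK = "<pad>", "<unk>"

def build_vocab(sentences, min_freq=1):
    """Map each token seen at least `min_freq` times to an integer id."""
    counts = Counter(tok for sent in sentences for tok in sent)
    vocab = {PAD: 0, UNK: 1}
    # Most frequent first; ties broken alphabetically so ids are deterministic.
    for tok, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if freq >= min_freq and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab

def encode(tokens, vocab):
    """Convert tokens to ids, falling back to the <unk> id for unseen words."""
    unk_id = vocab[UNK]
    return [vocab.get(tok, unk_id) for tok in tokens]

# Toy corpus, already tokenized (an assumption for this sketch).
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
vocab = build_vocab(corpus)
ids = encode(["the", "bird", "sat"], vocab)

# Placeholder embedding table: one 4-dimensional row of zeros per vocabulary entry.
# In practice the rows would be trained or loaded from pretrained vectors.
embeddings = [[0.0] * 4 for _ in range(len(vocab))]
vectors = [embeddings[i] for i in ids]  # the lookup is plain row indexing
```

Reserving fixed ids for padding and unknown tokens keeps the lookup total: any input token maps to some row, so out-of-vocabulary words do not raise an error at lookup time.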