From text to tokens – the NLP pipeline