Tokenizing sentences using regular expressions