Training language models to predict multiple tokens at once results in better sample efficiency, says researchers at Meta. Large language models like Llama and ChatGPT are usually trained for the next token prediction, but with this new approach, better performance can be achieved. What is single token prediction technique? The multi-token prediction technique provides a