Tokenization
31 Oct 2024
🚧 Work in progress…
This article will cover the tokenization techniques used in large language models (LLMs).
Topics to cover:
- What is tokenization and why it matters
- Word-level vs subword-level tokenization
- Byte Pair Encoding (BPE)
- WordPiece
- SentencePiece
- Tokenization in modern LLMs (GPT, BERT, etc.)
- Impact on model performance and multilingual capabilities
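
As a small preview of the Byte Pair Encoding topic listed above, here is a minimal sketch of how BPE merges could be learned and applied. It is a toy, character-level version: it assumes no end-of-word marker, no byte-level fallback, and no pre-tokenization, all of which real tokenizers typically add. The function names (`learn_bpe`, `merge_pair`, `encode`) and the tie-breaking rule are illustrative assumptions, not taken from any library.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules from a list of words (toy sketch)."""
    # Each distinct word becomes a tuple of single characters with a frequency.
    vocab = Counter(tuple(w) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        # Most frequent pair wins; ties go to the lexicographically smallest
        # pair (an arbitrary but deterministic choice made for this sketch).
        best = min(pairs, key=lambda p: (-pairs[p], p))
        merges.append(best)
        # Rewrite the vocabulary with the chosen pair merged.
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def encode(word, merges):
    """Tokenize `word` by applying the learned merges in the order they were learned."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


corpus = ["low", "low", "lower", "lowest"]
merges = learn_bpe(corpus, num_merges=2)
print(merges)                   # [('l', 'o'), ('lo', 'w')]
print(encode("lowly", merges))  # ['low', 'l', 'y']
```

Even on this tiny corpus, the frequent stem "low" ends up as a single token while the unseen word "lowly" still decomposes into known pieces, which is the basic appeal of subword tokenization over a fixed word-level vocabulary.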