Tokenization

31 Oct 2024

🚧 Work in progress…

This article will cover the tokenization techniques used in large language models (LLMs).

Topics to cover:

  • What tokenization is and why it matters
  • Word-level vs subword-level tokenization
  • Byte Pair Encoding (BPE)
  • WordPiece
  • SentencePiece
  • Tokenization in modern LLMs (GPT, BERT, etc.)
  • Impact on model performance and multilingual capabilities