Tokenization

Applies to: python, general

Tokenization splits text into units a model can process, such as words, characters, or subword tokens.

tokens = text.split()

See also: embedding, transformer