Skip to main content
Token sampling operations for language model generation. These methods select the next token from a probability distribution, implementing various strategies to balance between diversity and quality in text generation by filtering and sampling from the model’s output probabilities. Variants:
  • Top-k sampling: Keeps only the k highest probability tokens, renormalizes the distribution, then samples. Controls diversity by limiting the vocabulary size to the most likely tokens
  • Top-p sampling: Filters tokens using cumulative probability threshold (nucleus sampling). Dynamically adjusts vocabulary size based on probability mass, maintaining diversity while avoiding low-probability tokens
  • Top-k + Top-p sampling: Combines both filtering methods for fine-grained control over generation quality and diversity
Axes (2 dimensions):
  • batch_size: variable
  • vocab_size: constant
Inputs (1 to 3 tensors):
  • probs: probability distributions after softmax [batch_size, vocab_size]
  • Sampling-specific parameters:
    • top_k: for top-k sampling [batch_size]
    • top_p: for top-p/nucleus sampling [batch_size]
Outputs (1 tensor):
  • samples: sampled token indices [batch_size]
I