Saibo Geng
Posts
Constrained Decoding with Arbitrary Constraints is NP-hard
Constrained decoding is attracting growing attention in the field of large language models (LLMs). It aims to generate token sequences that satisfy given constraints. A typical example is forcing an LLM's generation to conform to a given JSON schema, so that the generated JSON data can be used directly in a downstream application such as tool use.
Aug 25, 2024
5 min read
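To make the setting concrete, here is a minimal sketch of constrained decoding by logit masking, assuming a Hugging Face-style causal LM and a hypothetical `allowed_token_ids(prefix)` oracle that returns the token ids the constraint permits next (the NP-hardness result concerns how hard such sets are to compute for arbitrary constraints):

```python
import torch

def constrained_greedy_decode(model, tokenizer, prompt, allowed_token_ids, max_new_tokens=32):
    """Greedy decoding that masks out tokens violating a constraint.

    `allowed_token_ids(ids)` is a hypothetical oracle returning the token
    ids allowed next, given the token ids generated so far.
    """
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        logits = model(ids).logits[0, -1]              # next-token logits
        mask = torch.full_like(logits, float("-inf"))
        allowed = list(allowed_token_ids(ids[0].tolist()))
        mask[allowed] = 0.0                            # keep only allowed tokens
        next_id = (logits + mask).argmax()             # greedy over the constrained set
        ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(ids[0])
```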
My Reading List
Here are some books and blogs I’ve read or plan to read. I’ll update this list from time to time. Books I’ve read: Chip War: The Fight for the World’s Most Critical Technology, by Chris Miller, 2022 The PhD Grind: A Ph.
Aug 16, 2024
1 min read
Reading Notes on Learning Bayesian Networks (Neapolitan)
Basics A quick review of basic probability theory Chapter 1: Introduction to Bayesian Networks Three definitions of probability Graph Description of Random Variables No One-to-One Correspondence between Graph and Probability Distribution Graph D-separation Bayesian networks look like a game of conditional independence and dependence
Jul 26, 2024
9 min read
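As a one-line example of that d-separation game: in the chain network below, $B$ blocks the only path between $A$ and $C$, so conditioning on $B$ renders them independent.

$$ A \to B \to C: \quad p(a, b, c) = p(a)\,p(b \mid a)\,p(c \mid b) \;\Rightarrow\; A \perp C \mid B $$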
Constrained Decoding is Posterior Inference
If you are familiar with constrained decoding, you might have occasionally heard people mention the term “posterior inference” in that context. In this post, I will explain why constrained decoding can (and arguably should) be seen as a form of posterior inference.
Jul 19, 2024
4 min read
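One way to state the connection in a single line: if $\mathcal{C}$ is the set of sequences satisfying the constraint, constrained decoding targets the posterior obtained by conditioning the LM prior $p(x)$ on the event $x \in \mathcal{C}$, a standard Bayes-rule rewrite:

$$ p(x \mid x \in \mathcal{C}) = \frac{p(x)\,\mathbf{1}[x \in \mathcal{C}]}{\sum_{x' \in \mathcal{C}} p(x')} $$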
Speculative Sampling Explained
Speculative Sampling The idea of speculative sampling is to use samples from a cheap draft distribution to reproduce exactly the same output distribution as the target model. We have a target distribution $p(x)$ and a draft distribution $q(x)$.
Mar 8, 2024
3 min read
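As a quick illustration, here is a minimal NumPy sketch of the standard accept/reject step over a finite vocabulary: sample $x \sim q$, accept with probability $\min(1, p(x)/q(x))$, and otherwise resample from the residual distribution $\mathrm{norm}(\max(0, p - q))$; together these recover exact samples from $p$:

```python
import numpy as np

def speculative_sample(p, q, rng=np.random.default_rng()):
    """One speculative-sampling step over a finite vocabulary.

    p, q: target and draft probability vectors (same length, each sums to 1).
    Returns a token index distributed exactly according to p.
    """
    x = rng.choice(len(q), p=q)                  # draft proposal x ~ q
    if rng.random() < min(1.0, p[x] / q[x]):     # accept w.p. min(1, p(x)/q(x))
        return x
    residual = np.maximum(p - q, 0.0)            # rejected: resample from
    residual /= residual.sum()                   # norm(max(0, p - q))
    return rng.choice(len(p), p=residual)
```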
Understanding Cloud GPU Pricing: A Hotel Analogy
🌟 Understanding Cloud GPU Pricing: A Hotel Analogy 🏨 Hey there, AI and ML enthusiasts! 🤖 Have you ever wondered about the cost of cloud GPU resources for your projects? You’re not alone!
Nov 1, 2023
3 min read
Two Definitions of Perplexity
Perplexity Definition 1 Given a sequence $X = x_1, x_2, \dots, x_T$ of words and a language model $p$, the perplexity of the sequence is defined as follows: $$ \text{PPL}(X) = p(x_1, x_2, \dots, x_T)^{-\frac{1}{T}} $$
Aug 3, 2023
3 min read
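Expanding $p(X)$ with the chain rule shows that this definition is equivalent to the exponential of the average per-token negative log-likelihood, which is the other form perplexity is commonly written in:

$$ \text{PPL}(X) = p(x_1, \dots, x_T)^{-\frac{1}{T}} = \exp\left( -\frac{1}{T} \sum_{t=1}^{T} \log p(x_t \mid x_{<t}) \right) $$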
Probabilistic Interpretation of WordPiece
WordPiece WordPiece tokenization is a subword tokenization algorithm. It is used in BERT, which was introduced in 2018. Example:
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
tokenizer.tokenize("I have a new GPU!")
Apr 3, 2023
6 min read
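On the probabilistic interpretation itself: unlike BPE, which merges the most frequent pair, WordPiece is commonly described as merging the pair that most increases the training-corpus likelihood under a unigram model, which reduces to the score below. This is a minimal sketch with hypothetical count tables, not code from the post:

```python
from collections import Counter

def wordpiece_pair_score(pair_counts: Counter, unit_counts: Counter, a: str, b: str) -> float:
    """WordPiece merge score for a candidate pair (a, b).

    score = count(ab) / (count(a) * count(b)) is proportional to the
    likelihood gain of merging a and b under a unigram language model.
    Assumes both units actually occur in the corpus (nonzero counts).
    """
    return pair_counts[(a, b)] / (unit_counts[a] * unit_counts[b])
```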