Saibo Geng
Posts
Constrained Decoding with Arbitrary Constraints is NP-hard
Constrained decoding is attracting growing attention in the field of large language models (LLMs). It aims to generate token sequences that satisfy given constraints. A typical example is forcing an LLM's generation to conform to a given JSON schema, so that the generated JSON data can be used directly in a downstream application such as tool use.
Aug 25, 2024
5 min read
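To make the setting concrete, here is a minimal sketch of constrained decoding by logit masking, assuming a Hugging Face-style causal LM and a hypothetical `allowed_token_ids(prefix)` oracle that returns the token ids the constraint permits next (the NP-hardness result concerns how hard such sets are to compute for arbitrary constraints):

```python
import torch

def constrained_greedy_decode(model, tokenizer, prompt, allowed_token_ids, max_new_tokens=32):
    """Greedy decoding that masks out tokens violating a constraint.

    `allowed_token_ids(ids)` is a hypothetical oracle returning the token
    ids allowed next, given the token ids generated so far.
    """
    ids = tokenizer(prompt, return_tensors="pt").input_ids
    for _ in range(max_new_tokens):
        logits = model(ids).logits[0, -1]              # next-token logits
        mask = torch.full_like(logits, float("-inf"))
        allowed = list(allowed_token_ids(ids[0].tolist()))
        mask[allowed] = 0.0                            # keep only allowed tokens
        next_id = (logits + mask).argmax()             # greedy over the constrained set
        ids = torch.cat([ids, next_id.view(1, 1)], dim=-1)
        if next_id.item() == tokenizer.eos_token_id:
            break
    return tokenizer.decode(ids[0])
```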
My Reading List
Here are some books and blogs I’ve read or plan to read. I’ll update this list from time to time. Books I’ve read: Chip War: The Fight for the World’s Most Critical Technology, by Chris Miller, 2022 The PhD Grind: A Ph.
Aug 16, 2024
1 min read
Reading Notes on Learning Bayesian Networks (Neapolitan)
Basics A quick review of basic probability theory Chapter 1: Introduction to Bayesian Networks Three definitions of probability Graph Description of Random Variables No One-to-One Correspondence between Graph and Probability Distribution Graph D-separation Bayesian networks look like a game of conditional independence and dependence
Jul 26, 2024
9 min read
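As a one-line example of that d-separation game: in the chain network below, $B$ blocks the only path between $A$ and $C$, so conditioning on $B$ renders them independent.

$$ A \to B \to C: \quad p(a, b, c) = p(a)\,p(b \mid a)\,p(c \mid b) \;\Rightarrow\; A \perp C \mid B $$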
Constrained Decoding is Posterior Inference
If you are familiar with constrained decoding, you might have occasionally heard people mention the term “posterior inference” in that context. In this post, I will explain why constrained decoding can (and arguably should) be seen as a form of posterior inference.
Jul 19, 2024
4 min read
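One way to state the connection in a single line: if $\mathcal{C}$ is the set of sequences satisfying the constraint, constrained decoding targets the posterior obtained by conditioning the LM prior $p(x)$ on the event $x \in \mathcal{C}$, a standard Bayes-rule rewrite:

$$ p(x \mid x \in \mathcal{C}) = \frac{p(x)\,\mathbf{1}[x \in \mathcal{C}]}{\sum_{x' \in \mathcal{C}} p(x')} $$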
Speculative Sampling Explained
Speculative Sampling The idea of speculative sampling is to use samples from a cheap draft distribution to reproduce exactly the same output distribution as the target model. We have a target distribution $p(x)$ and a draft distribution $q(x)$.
Mar 8, 2024
3 min read
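As a quick illustration, here is a minimal NumPy sketch of the standard accept/reject step over a finite vocabulary: sample $x \sim q$, accept with probability $\min(1, p(x)/q(x))$, and otherwise resample from the residual distribution $\mathrm{norm}(\max(0, p - q))$; together these recover exact samples from $p$:

```python
import numpy as np

def speculative_sample(p, q, rng=np.random.default_rng()):
    """One speculative-sampling step over a finite vocabulary.

    p, q: target and draft probability vectors (same length, each sums to 1).
    Returns a token index distributed exactly according to p.
    """
    x = rng.choice(len(q), p=q)                  # draft proposal x ~ q
    if rng.random() < min(1.0, p[x] / q[x]):     # accept w.p. min(1, p(x)/q(x))
        return x
    residual = np.maximum(p - q, 0.0)            # rejected: resample from
    residual /= residual.sum()                   # norm(max(0, p - q))
    return rng.choice(len(p), p=residual)
```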
Understanding Cloud GPU Pricing: A Hotel Analogy
🌟 Understanding Cloud GPU Pricing: A Hotel Analogy 🏨 Hey there, AI and ML enthusiasts! 🤖 Have you ever wondered about the cost of cloud GPU resources for your projects? You’re not alone!
Nov 1, 2023
3 min read
Two Definitions of Perplexity
Perplexity Definition 1 Given a sequence $X = x_1, x_2, \dots, x_T$ of words and a language model $p$, the perplexity of the sequence is defined as follows: $$ \text{PPL}(X) = p(x_1, x_2, \dots, x_T)^{-\frac{1}{T}} $$
Aug 3, 2023
3 min read
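Expanding $p(X)$ with the chain rule shows that this definition is equivalent to the exponential of the average per-token negative log-likelihood, which is the other form perplexity is commonly written in:

$$ \text{PPL}(X) = p(x_1, \dots, x_T)^{-\frac{1}{T}} = \exp\left( -\frac{1}{T} \sum_{t=1}^{T} \log p(x_t \mid x_{<t}) \right) $$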
Probabilistic Interpretation of WordPiece
WordPiece WordPiece tokenization is a subword tokenization algorithm. It is used in BERT, which was introduced in 2018. Example:
from transformers import BertTokenizer
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
tokenizer.tokenize("I have a new GPU!")
Apr 3, 2023
6 min read
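On the probabilistic interpretation itself: unlike BPE, which merges the most frequent pair, WordPiece is commonly described as merging the pair that most increases the training-corpus likelihood under a unigram model, which reduces to the score below. This is a minimal sketch with hypothetical count tables, not code from the post:

```python
from collections import Counter

def wordpiece_pair_score(pair_counts: Counter, unit_counts: Counter, a: str, b: str) -> float:
    """WordPiece merge score for a candidate pair (a, b).

    score = count(ab) / (count(a) * count(b)) is proportional to the
    likelihood gain of merging a and b under a unigram language model.
    Assumes both units actually occur in the corpus (nonzero counts).
    """
    return pair_counts[(a, b)] / (unit_counts[a] * unit_counts[b])
```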