Constrained decoding is attracting increasing attention in the field of large language models (LLMs). Its goal is to generate token sequences that satisfy given constraints.
A typical example is forcing an LLM's output to conform to a given JSON schema, so that the generated JSON can be consumed directly by a downstream application such as tool use.
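To make the idea concrete, here is a minimal sketch of one common approach: masking invalid tokens during greedy decoding. Everything in it is hypothetical and chosen for illustration. The vocabulary `VOCAB` is a toy, the function `toy_logits` stands in for a real LLM forward pass, and the constraint is just a small set of allowed JSON strings, not a full JSON schema. At each step, every token that would stop the output from being a prefix of an allowed string gets a score of negative infinity, so it can never be selected.

```python
import math

# Hypothetical toy vocabulary; real tokenizers have tens of thousands of entries.
VOCAB = ['{', '}', '"ok"', ':', 'true', 'false', 'hello', '<eos>']

# Hypothetical constraint: the output must be exactly one of these JSON strings.
ALLOWED = ['{"ok":true}', '{"ok":false}']


def toy_logits(prefix):
    """Stand-in for an LLM forward pass: fixed scores that prefer free text."""
    scores = {'hello': 5.0, 'true': 2.0, 'false': 1.0, '<eos>': 0.5}
    return [scores.get(tok, 0.0) for tok in VOCAB]


def is_valid_prefix(text):
    """True if `text` can still be extended into some allowed output."""
    return any(allowed.startswith(text) for allowed in ALLOWED)


def constrained_greedy_decode(max_steps=10):
    out = ''
    for _ in range(max_steps):
        logits = toy_logits(out)
        masked = []
        for tok, score in zip(VOCAB, logits):
            if tok == '<eos>':
                # Stopping is only allowed once the output is complete.
                ok = out in ALLOWED
            else:
                ok = is_valid_prefix(out + tok)
            # Disallowed tokens get -inf so greedy selection never picks them.
            masked.append(score if ok else -math.inf)
        best = max(range(len(VOCAB)), key=lambda i: masked[i])
        if masked[best] == -math.inf:
            raise RuntimeError('no valid continuation')
        if VOCAB[best] == '<eos>':
            return out
        out += VOCAB[best]
    return out


print(constrained_greedy_decode())
```

Even though `toy_logits` always scores `hello` highest, the masked decoder produces `{"ok":true}`. Enumerating every valid output only works for tiny constraints like this one; practical systems typically compile the schema into a grammar or finite-state machine and derive the allowed-token mask from its current state.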