Efficient Guided Generation for Large Language Models: LLM Sampling and Guided Generation


Researchers propose a finite-state machine framework for text generation, offering precise control and improved performance.

Author: Brandon T. Willard, Normal Computing; Rémi Louf, Normal Computing.

Table of Links:
• Abstract and Intro
• LLM Sampling and Guided Generation
• Iterative FSM Processing and Indexing
• Extensions to Iterative Parsing
• Discussion, References and Acknowledgments

2. LLM Sampling and Guided Generation

Let S_t = (s_1 … s_t) represent a sequence of t tokens with s_t ∈ V, V a vocabulary, and |V| = N. The vocabularies, V, are composed of strings from a fixed alphabet, and N is often on the order of 10^4 or larger. We define the next token s_{t+1} as the following random variable:

s_{t+1} ∼ Categorical(α), where α = LLM(S_t, θ)

and LLM(S_t, θ) denotes the model's distribution over the next token given the sequence S_t, with parameters θ.

2.1 Sampling sequences

Let F ⊂ P(V), where P is the powerset operator, consist of multi-token strings that end with a special token EOS ∈ V. The text generation task is to draw samples from F.

Several procedures have been considered to generate elements of F. Greedy decoding generates tokens recursively, choosing the highest-probability token at each step. Beam search also generates tokens recursively, using a heuristic to find the mode of the distribution. More recently, SMC sampling has also been used to generate sequences. The sampling procedure is described in generality by Algorithm 1. Often called multinomial sampling, the procedure recursively generates new tokens by sampling from the categorical distribution defined above until the EOS token is found.
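For concreteness, here is a minimal Python sketch of the multinomial sampling loop of Algorithm 1. The next_token_probs callable is a hypothetical stand-in for the trained model (it plays the role of α above); it is not part of the paper.

```python
import random

def multinomial_sample(next_token_probs, vocab, eos="<EOS>", max_tokens=256):
    # Sketch of Algorithm 1: repeatedly draw s_{t+1} ~ Categorical(alpha)
    # until the EOS token appears. `next_token_probs` stands in for the LLM
    # and must return one probability per entry of `vocab`.
    tokens = []
    for _ in range(max_tokens):
        alpha = next_token_probs(tokens)                  # alpha = LLM(S_t, theta)
        s_next = random.choices(vocab, weights=alpha)[0]  # categorical draw
        tokens.append(s_next)
        if s_next == eos:
            break
    return tokens

# Example: a toy "model" with fixed probabilities over a three-token vocabulary.
vocab = ["a", "b", "<EOS>"]
print(multinomial_sample(lambda toks: [0.4, 0.4, 0.2], vocab))
```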
2.2 Guiding generation

In this setting, we want to restrict the sampled sequences to a constrained set; for example, we might want to generate only:

• digit samples,
• strings that match a specified regular expression,
• and strings that parse according to a specified grammar.

The sampling procedure with masking is a simple augmentation of Algorithm 1 and is provided in Algorithm 2; a minimal sketch of the masked loop is given below. The computation of the mask m on line 2.5 is implicitly performed over all the elements of V. Aside from computing α, this step is easily the most expensive.

In the case of regular expression-guided masking, and in cases more sophisticated than that, the support and, thus, m will necessarily depend on the previously sampled tokens. Guided generation of this kind is ultimately an iterative matching or parsing problem and is not directly amenable to standard approaches that require access to a complete string upfront. In some cases, partial matching or parsing can be performed from the start of the sampled sequence on each iteration (also sketched below), but this has a cost that grows at least linearly alongside the O(N) cost of its application across the entire vocabulary.

This leads us to the main question of this work: how can we efficiently match or parse incomplete strings according to a regular expression or CFG and determine the masks m at each iteration of Algorithm 2?
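Algorithm 2 augments the sampling loop by applying a boolean mask to α before drawing a token. The sketch below assumes a hypothetical compute_mask callable (the name is ours, not the paper's) that returns a 0/1 entry per vocabulary token for the current sequence; note that it is evaluated over all N entries of V at every step, which is exactly the expensive operation discussed above.

```python
import random

def masked_multinomial_sample(next_token_probs, compute_mask, vocab,
                              eos="<EOS>", max_tokens=256):
    # Sketch of Algorithm 2: mask the next-token distribution so that only
    # tokens consistent with the constraint can be sampled.
    tokens = []
    for _ in range(max_tokens):
        alpha = next_token_probs(tokens)   # alpha = LLM(S_t, theta)
        m = compute_mask(tokens)           # one 0/1 entry per token in vocab
        masked = [a * b for a, b in zip(alpha, m)]
        if sum(masked) == 0:               # no admissible continuation remains
            break
        # random.choices normalizes the weights, so no explicit
        # renormalization of the masked distribution is needed.
        s_next = random.choices(vocab, weights=masked)[0]
        tokens.append(s_next)
        if s_next == eos:
            break
    return tokens

# Example: a mask that only ever admits "a" or stopping at "<EOS>".
vocab = ["a", "b", "<EOS>"]
uniform = lambda toks: [1 / 3, 1 / 3, 1 / 3]
only_a = lambda toks: [1, 0, 1]
print(masked_multinomial_sample(uniform, only_a, vocab))
```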
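To see why the naive approach to regular-expression-guided masking scales poorly, consider recomputing a partial match from the start of the sequence for every candidate token at every step. The sketch below does this with the third-party regex module, whose partial=True flag accepts strings that are prefixes of a match; the pattern and vocabulary here are illustrative, not from the paper.

```python
import regex  # third-party module (pip install regex); supports partial matching

def regex_mask(pattern, tokens, vocab):
    # Naive mask computation: for each of the N vocabulary entries, re-match
    # the ENTIRE prefix plus the candidate token against the pattern. Each
    # step therefore performs O(N) matches, each at least linear in the
    # prefix length; this is the cost the paper sets out to remove.
    prefix = "".join(tokens)
    mask = []
    for tok in vocab:
        m = regex.fullmatch(pattern, prefix + tok, partial=True)
        mask.append(1 if m is not None else 0)
    return mask

# Example: restrict generation to simple floating-point literals.
# EOS handling is simplified here; a fuller implementation would admit
# "<EOS>" exactly when the prefix is already a complete match.
vocab = ["0", "1", ".", "a", "<EOS>"]
print(regex_mask(r"[0-9]+\.[0-9]+", ["1"], vocab))  # "a" and "<EOS>" are masked out
```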

This paper is available on arXiv under the CC 4.0 license.


Similar News: You can also read news stories similar to this one that we have collected from other news sources.

Preparing Complex Datasets for Amazon's Recommender System Study
Learn about data engineering strategies and efficient computation techniques for large-scale data processing.
Read more »

A large percentage of first-generation students have been impacted by FAFSA challenges
FAFSA issues have caused minority students to delay attending college and have led to severe stress and anxiety for other prospective students.
Read more »

Bentley Continental GT V8 PHEV prototype review
A new-generation Bentley Continental GT gets a new-generation engine with mighty plug-in assistance.
Read more »

What to stream: Embark on guided tour of world cinema with Mubi
Mubi, which is now also a publisher of criticism and commentary (see: Mubi Notebook) and a film distributor, fashions itself as a catch-all destination for film lovers, with an emphasis on curati…
Read more »

Boeing wins massive US contract to turn ‘dumb’ bombs into guided weapons
Boeing will build JDAM tail kits, spares, repairs, technical and Laser Joint Direct Attack Munition sensor kits as part of the contract.
Read more »

Efficient Guided Generation for Large Language Models: Abstract and Intro
Researchers propose a finite-state machine framework for text generation, offering precise control and improved performance.
Read more »


