Abstract

Large Language Models (LLMs) are transformer models that generate text autoregressively in two stages: prefill and decode. The prefill stage processes all input tokens in parallel and stores their key and value projections in the KV cache. The decode stage uses the KV cache, processes a single token at a time, and is generally memory-bandwidth constrained. Each decode step produces a probability distribution over the vocabulary, samples a token from it, appends that token's keys and values to the cache, and repeats. This left-to-right, one-token-at-a-time generation paradigm has proven effective for language modeling across many domains; however, it also has limitations, including token permanence, constant compute and memory per generated token, and the lack of a lookahead mechanism. Methods such as Chain-of-Thought (CoT) prompting and reasoning with reinforcement learning (RL) have improved performance by explicitly generating planning representations; however, they still lack an inherent lookahead mechanism. This thesis proposes the Lookahead Transformer, a novel architecture that introduces an explicit lookahead mechanism, enabling autoregressive models to attend to and iteratively refine multiple future latent token representations during generation. The model uses i lookahead tokens, Ψ, encoded at future positions and refined over N recurrent steps, providing a bidirectional latent planning space that can be efficiently reused across generation steps. Experimental results show that the Lookahead Transformer can outperform a comparable baseline on language modeling tasks and offers a test-time control mechanism that scales the number of active lookahead tokens to improve performance. The Lookahead Transformer represents a step toward more flexible and efficient autoregressive transformer models that can “lookahead” to improve performance and better utilize compute during inference.
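
Since the abstract describes the mechanism only at a high level, the following is a minimal PyTorch-style sketch of how i lookahead tokens Ψ with future positional encodings might be refined over N recurrent steps within a single decode step. The module names, shapes, the single shared encoder block, and the absence of a causal mask are illustrative assumptions, not the thesis's implementation.

# Hypothetical sketch (not the thesis's code): one decode step with i lookahead
# tokens Psi refined over N recurrent passes through a shared transformer block.
import torch
import torch.nn as nn

class LookaheadSketch(nn.Module):
    def __init__(self, vocab_size=1000, d_model=64, num_lookahead=4, num_refine_steps=2):
        super().__init__()
        self.num_lookahead = num_lookahead          # i lookahead tokens Psi
        self.num_refine_steps = num_refine_steps    # N recurrent refinement steps
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(2048, d_model)      # positions, including future ones
        self.psi_init = nn.Parameter(torch.randn(num_lookahead, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.block = nn.TransformerEncoder(layer, num_layers=2)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) context generated so far (stands in for the KV cache)
        b, t = token_ids.shape
        ctx = self.embed(token_ids) + self.pos(torch.arange(t))
        # Lookahead tokens Psi receive *future* positions t, t+1, ..., t+i-1
        psi = self.psi_init.expand(b, -1, -1) + self.pos(torch.arange(t, t + self.num_lookahead))
        for _ in range(self.num_refine_steps):
            # Jointly encode context + Psi; with no causal mask, attention among
            # the Psi tokens is bidirectional, giving a latent planning space
            # ahead of position t
            h = self.block(torch.cat([ctx, psi], dim=1))
            psi = h[:, t:, :]                        # keep the refined lookahead states
        # Next-token prediction reads the last context state, which has attended
        # to the refined Psi tokens
        return self.lm_head(h[:, t - 1, :])          # logits for the next token

# Usage: logits = LookaheadSketch()(torch.randint(0, 1000, (1, 10)))

In a full decoder, the refined Ψ states would presumably be cached and reused across generation steps, as the abstract indicates; this sketch omits that reuse for brevity.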

Degree

MS

College and Department

Computational, Mathematical, and Physical Sciences; Computer Science

Rights

https://lib.byu.edu/about/copyright/

Date Submitted

2026-04-21

Document Type

Thesis

Keywords

computer science, transformer, language modeling, planning

Language

English
