Generative AI Gets Shaken Up By Newly Announced Text-Producing Diffusion LLMs

📰 ForbesTech
Generative AI and LLMs are in a rut. The same keystone approach is used over and over. A novel approach, diffusion LLMs (dLLMs), holds great promise. Here's the scoop.

In today’s column, I explore the exciting news that an alternative method for building generative AI and large language models appears to be gaining interest and potentially provides some distinct advantages over conventional approaches.

Here’s the deal in a nutshell. The usual path to devising generative AI consists of what is known as autoregressive LLMs, while the promising new avenue is referred to as diffusion LLMs (dLLMs). Yes, indeed, dLLMs just might be a winner-winner chicken dinner. I will share with you how prevailing generative AI works and then introduce the diffusion approach. We don’t know yet whether diffusion will overtake autoregression for sure, but there is a darned good chance that diffusion will shake things up. This analysis of an innovative AI breakthrough is part of my ongoing Forbes column coverage on the latest in AI, including identifying and explaining various impactful AI complexities.

Conventional generative AI begins by converting the words of your prompt into numeric tokens. The AI uses the tokens to figure out what other tokens should be generated, doing so on a token-at-a-time basis. The step-by-step generated tokens are ultimately converted back into words when the answer or response is displayed. You can often see this happening in the sense that some of the generative AI apps will display the generated response on a word-at-a-time basis. It is almost as though someone or something is typing out the response, one word at a time. I’m not saying that this is always the case. There are other factors in play, such as the speed of your network access, the speed of the AI, etc. For more about the nitty-gritty of AI processing techniques, tokenization, and other facets that occur within conventional generative AI, see my discussion in my prior columns.

Now that we’ve got autoregression on the table, keep it in the back of your mind, since I want to bring up a different topic that will shortly tie into the autoregressive aspects. Hang in there.

Have you ever used AI to generate an image or video? I’m sure you have, or at least you’ve seen or heard about this capability. The customary approach to having AI generate an image or a video is a technique known as diffusion. I liken diffusion to how a sculptor works. It goes like this.
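To make the token-at-a-time idea concrete, here is a minimal sketch of autoregressive generation. The "model" is a hypothetical toy lookup table of my own invention, standing in for a real trained network; everything else mirrors the loop described above, where each new token is conditioned on all the tokens generated so far.

```python
# Toy autoregressive generation: each new token is chosen by
# conditioning on everything generated so far, one token at a time.
# TOY_MODEL is a hypothetical stand-in for a trained network: it maps
# the context generated so far to a hard-coded next token.

TOY_MODEL = {
    (): "The",
    ("The",): "sky",
    ("The", "sky"): "is",
    ("The", "sky", "is"): "blue",
}

def generate(max_tokens=10):
    tokens = []
    for _ in range(max_tokens):
        next_token = TOY_MODEL.get(tuple(tokens))
        if next_token is None:  # no continuation learned; stop
            break
        tokens.append(next_token)  # step N depends on steps 1..N-1
    return " ".join(tokens)

print(generate())  # serial: token N cannot be produced before token N-1
```

Note the inherently serial shape of the loop: there is no way to start on the fourth token until the third has been committed, which is exactly the property diffusion will later sidestep.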
A sculptor starts with a block of marble and carves away bits and pieces, shaping the marble toward the end goal in mind. They are removing whatever doesn’t belong. If the sculptor is making the shape of a person, they remove marble so that what is left has the figure of a human.

You can starkly contrast the work of a sculptor with that of a painter. A painter starts with a blank canvas. They add paint to the canvas. Step by step, they create on the canvas the image that they want to portray.

Mindfully note how the two types of artisans differ. A sculptor takes away bits and pieces, while a painter adds bits and pieces. Conventional generative AI acts like a painter. Words or tokens are assembled one at a time until the targeted response is fully crafted. You might say that words are being added to a blank canvas.

How Diffusion Deals With Images And Video

I’m betting that you are curious about the mechanics of diffusion. I certainly hope so, since that’s what we will get into next. Suppose I want AI to generate an image of a cat. I first need to data-train the AI on what a cat looks like. Once the AI has been data-trained on what cats look like, you can tell the AI to produce an image or a video showcasing a cat. Get yourself mentally ready for the inside secrets of how this is done. Find a quiet place to read this and grab a glass of fine wine.

Here’s what we can do to data-train AI on what cats look like. First, we find an existing picture or rendering of a cat. Next, we fog up that picture or rendering by putting a bunch of noise into the image. The cat is less recognizable now that we have clouded it with static or noise. The AI is fed the original clean version of the picture along with the second version that has the noise. At this juncture, the AI is supposed to figure out how best to remove the noise so that the clean version can be arrived at. The AI takes away from the clouded image the aspects that don’t belong there.
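The "fogging up" step can be sketched in a few lines. This is a deliberately simplified illustration using plain Python lists rather than any real diffusion library; the linear blend between pixel and random static is an assumption chosen for clarity, not the noise schedule any production system uses.

```python
import random

def add_noise(image, noise_level, rng):
    """Blend each pixel value with random static.
    noise_level=0.0 returns the clean image unchanged;
    noise_level=1.0 returns pure static."""
    return [
        (1 - noise_level) * pixel + noise_level * rng.random()
        for pixel in image
    ]

rng = random.Random(0)
clean = [0.1, 0.9, 0.5, 0.3]   # stand-in for cat-image pixel values

slightly_noisy = add_noise(clean, 0.2, rng)  # cat still recognizable
pure_static = add_noise(clean, 1.0, rng)     # cat fully clouded out

# Training pairs: (noisy version, clean version). The model is trained
# to recover the clean image from the noisy one -- i.e., to denoise.
training_pair = (slightly_noisy, clean)
```

Training repeats this across many images and many noise levels, so the model learns what "doesn’t belong" at every stage of cloudiness.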
It is denoising the static-filled version.

The interesting twist, the part that I said would be a mind-bender, is this. When you enter a prompt and ask diffusion-based AI to produce an image that looks like a cat, the AI first starts with a static-filled frame. It’s all static. The AI then removes as much static as necessary until the frame ends up showcasing a cat.

Most people assume that AI starts with a blank canvas and tries to draw a cat. That would be how a painter works. But the alternative method is to act like a sculptor. Start with a block of marble, or in this case, a frame that’s utterly filled with static. Have the AI remove bits and pieces of static until what remains is the image of a cat.

So far, I’ve mentioned that AI diffusion involves first data-training the AI on how to remove static or noise until a desired image is attained. Once we’ve done that, we can use the AI to conjure up new images by feeding it a static-filled frame, and the AI will carve out or remove the static until the desired image is reached.

Now let’s bring this back to text. Suppose you ask generative AI to tell you about the life of Abraham Lincoln. Conventional generative AI would generate a response by assembling words one at a time. The words being chosen are based on having previously scanned essays, stories, and the like about the life of Abraham Lincoln during the initial data-training of the AI. Patterns from those stories are stored within the AI. The AI taps into those patterns to produce a handy-dandy response about Honest Abe.

To build a diffusion LLM, just like the above, we will data-train the AI on essays, stories, and the like about the life of Abraham Lincoln. There is a twist. We do so by not only scanning that content but also taking the content and adding static or noise to it. The text looks quite garbled to the naked eye. Numerous letters of the alphabet have been shoved in here and there, and the words look jumbled. The diffusion process takes the noisy version and tries to remove the static to get back to the original version.
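For text, the "static" is often realized as masked or scrambled tokens. Here is a simplified sketch of both directions: the forward corruption step and one reverse denoising step. The toy vocabulary, the `[MASK]` placeholder, and the frequency-based fill rule are all my own illustrative assumptions; a real dLLM would use a trained network for the denoising step.

```python
import random

def corrupt(tokens, mask_rate, rng):
    """Forward process for text: replace a fraction of tokens with a
    [MASK] placeholder -- the textual analogue of image static."""
    return [
        "[MASK]" if rng.random() < mask_rate else tok
        for tok in tokens
    ]

def denoise_step(tokens, reference_counts):
    """One reverse step: fill every [MASK] with a plausible word.
    A real dLLM uses a trained network; as a stand-in, we just pick
    the most frequent word observed in the training text."""
    best_guess = max(reference_counts, key=reference_counts.get)
    return [best_guess if tok == "[MASK]" else tok for tok in tokens]

clean = ["Lincoln", "was", "the", "sixteenth", "president"]
rng = random.Random(42)

noisy = corrupt(clean, mask_rate=0.5, rng=rng)        # garbled version
restored = denoise_step(noisy, {"the": 3, "president": 5})
```

Training pairs up `noisy` with `clean` so the model learns the reverse mapping, which is essentially the same recipe as the cat image, just over tokens instead of pixels.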
I trust that this seems familiar – it’s pretty much the same as what we did with the cat image. Subsequently, when someone asks the diffusion LLM to share something about the life of Abraham Lincoln, we feed the AI a bunch of seemingly garbled text. It looks like pure nonsense to the human eye. The diffusion LLM removes the noise and transforms the block of garbled text into a sensible rendition about Abraham Lincoln.

Allow me to provide a quick example that might solidify the two approaches, namely, comparing the conventional autoregressive approach versus the diffusion approach to LLMs and generative AI. I will pose a question for the AI that is one of my all-time favorite questions because my children used to ask it when they were very young. The question is: “Why is the sky blue?” Yep, that’s a classic question that I’m guessing most parents inevitably get from their curious-minded youngsters. It’s a beauty. With any kind of generative AI, regardless of being autoregressive versus diffusion, the prompt and response will look much the same to the user.

I’d like to unveil the internal mechanics of the AI to show you how that answer is generated. I am going to simplify the mechanics for the sake of brevity. Any of you trolls out there who are chagrined at the simplification, consider reading the details that I’ve covered in prior columns, thanks.

Technically, a diffusion approach entails a latent variable model that uses a fixed Markov chain over a considered latent space. The processing can happen in parallel and doesn’t have to proceed on a serial basis. That’s one of the benefits of diffusion versus autoregression. It is a lot harder to parallelize autoregression. You generally are going to have autoregression generating each word, one word at a time.
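The parallelism contrast can be sketched as follows. In this toy illustration (my own construction, not a real implementation), generation starts from an all-noise frame of `[MASK]` tokens, and each denoising pass updates every position at once; the sequential depth is therefore the number of passes, not the length of the answer, as it would be under autoregression. The `fill_half` refiner is a hypothetical stand-in for a trained denoiser.

```python
def diffusion_generate(length, refiner, passes):
    """Start from an all-noise frame and refine every position in
    parallel on each pass -- the sculptor strategy for text."""
    frame = ["[MASK]"] * length  # the block of marble: pure static
    for step in range(passes):
        # Every position is visited in the same pass (parallelizable),
        # unlike autoregression, which commits one position per step.
        frame = [refiner(step, i, tok) for i, tok in enumerate(frame)]
    return frame

def fill_half(step, i, tok):
    """Hypothetical refiner: commit half of the remaining masked
    positions on each pass, leaving the rest for later passes."""
    words = ["the", "sky", "is", "blue"]
    if tok != "[MASK]":
        return tok                # already committed on an earlier pass
    return words[i] if i % 2 == step % 2 else tok

result = diffusion_generate(4, fill_half, passes=2)
print(result)  # a 4-token answer reached in 2 parallel passes, not 4 serial steps
```

A 100-token answer under autoregression needs 100 sequential model steps; here it would need only as many sequential steps as there are denoising passes, with the work inside each pass spread across parallel hardware.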
I’m not saying that autoregression can’t be sped up; I’m only saying that doing so somewhat goes against the normal grain. I already noted that a diffusion-generated response can readily happen in parallel and thus be quite speedy. The response time to you, the user, will likely be faster. It is almost as though your response magically appears all at once, in a flash, rather than on a word-by-word processing basis.

Proponents of diffusion LLMs contend that another benefit is that coherence across large portions of text is more likely than with the autoregressive approach. Here’s the deal on that claim. You might know that autoregression has tended to struggle with handling long-range dependencies in a large body of text. Fortunately, recent advances in generative AI based on autoregression have enabled larger and larger bodies of text to be handled, and ergo, this has gradually become less of a problem. Anyway, diffusion LLMs seem to handle this with ease.

Some also assert that diffusion LLMs will end up being more “creative” than autoregression-based generative AIs. Please know this is speculative. The logic for the claim goes like this. With autoregression, once a generated word is chosen, by and large, the AI stays loyal to that chosen word and won’t readily back up and opt to replace it with something else. In theory, a diffusion LLM could rework a response while it is still being generated. It’s an easy possibility. You see, I noted that the diffusion might proceed via a series of passes. In my example, perhaps the AI landed on the word “atmosphere” and then opts in the next pass to change that to the word “troposphere.” Proponents would argue that you can adjust the diffusion toward a semblance of being more creative by allowing that kind of multi-pass alteration.

A heated debate is underway about whether diffusion LLMs will be less costly, which proponents of diffusion suggest will be the case. This is a mixed bag.
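That multi-pass revision ability can be shown in a tiny sketch. The revision table is a hypothetical example echoing the "atmosphere" to "troposphere" swap above; a real dLLM would decide such replacements via its learned denoiser, not a lookup.

```python
def refine(tokens, revisions):
    """One extra denoising pass that may *replace* already-committed
    tokens -- something autoregression ordinarily never revisits."""
    return [revisions.get(tok, tok) for tok in tokens]

draft = ["sunlight", "scatters", "in", "the", "atmosphere"]

# Hypothetical revision the model makes on a later pass:
revised = refine(draft, {"atmosphere": "troposphere"})
print(revised)
```

Under autoregression, "atmosphere" would already be locked into the output stream by the time later context suggested a better word; the pass-based structure is what makes this kind of backtracking cheap.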
The initial data training is probably going to be higher in cost than a comparable autoregression approach. The cost saving is potentially during run-time, the so-called thinking time when the AI is generating a response. If the underlying hardware allows for parallelism, it seems plausible that the generation process might be faster and less costly. The cost aspects are hard to pin down since there are so many conflicting and confounding variables that come into the picture when determining costs for any kind of generative AI. A recent big-news story was about a conventional autoregression generative AI called R1 by the vendor DeepSeek that claimed dramatically reduced costs in producing their particular generative AI, though not everyone believes the cost claims per se (see my coverage in a prior column).

Let’s consider the other side of the coin. The sky is not always blue, and we ought to recognize the chances of storms or overcast days. In other words, diffusion LLMs are not a silver bullet. Do not let the emerging elation overtake your critical thinking. On the one hand, it is absolutely refreshing to have an alternative to conventional groupthink on generative AI. I welcome it. We need to think outside the box if we want to make substantial added progress on AI.

Concerns about diffusion LLMs include that such models seem to be less interpretable than autoregression. If you want to generate an explanation or reasoning associated with the response, right now, that is harder to do than with conventional generative AI. Research is seeking to enhance that aspect. Another qualm is that diffusion LLMs, which are non-deterministic just as autoregression is, seem to act in an even less deterministic way. That is presumably a plus for creativity. Meanwhile, it seems to be a negative when it comes to controlling the AI and ascertaining its predictability.

Will this approach suffer from fewer AI hallucinations, the same amount, or more?
Do existing architectures that are geared toward autoregressive text-based LLMs need to be overhauled or devised anew to best accommodate diffusion LLMs? We already know that with the use of diffusion for image and video generation, there are issues with potential mode collapse, in which the AI will sometimes generate the same output repeatedly. Will that happen in a text-based generation mode for dLLMs?

Part of the recent spark for caring about diffusion LLMs was the announcement by a company called Inception Labs regarding their product Mercury Coder, which uses a diffusion LLM approach. This got some outsized headlines within the AI field due to the novelty of the underlying approach. Some heavyweights in the AI field quickly remarked that the diffusion LLM approach overall is a welcome entrant into the competitive space concerning how best to devise generative AI and LLMs. I agree, as stated above.

I definitely have my eye keenly focused on the advent of diffusion LLMs. That being said, they need some room to breathe, including being intensely sliced and diced. Let’s give them a hearty test run. AI researchers whom I know are already venturing in this direction. I am anticipating some interesting results soon and will keep you posted.

As Albert Einstein famously said about the pursuit of innovation: “To raise new questions, new possibilities, to regard old problems from a new angle, requires creative imagination and marks real advance in science.”
