Firms can’t benefit from agentic AI without the inference process being able to draw on a fluid, well-governed and permissioned data supply.
AI has a shiny front end.
As everyone who’s used an artificial intelligence service knows, we can now ask for pictures of ourselves standing on top of the Rocky Mountains, we can have ourselves superimposed onto a classic Beatles album cover… and we can ask for a text-based interpretation and summary of a 147-page document to be delivered in 147 words. As impressive as the end results are, rather like any greasy engine room, it’s the work that goes on at the backend that really forms the intelligence quotient experienced by the user. As we start to push agentic AI services into areas beyond summarizing notes and putting us on mountains, their application in mission-critical scenarios means they will need to reason through inference, adapt and act in real time.

“We know that enterprises today want to be able to train AI models for specific use cases, but they can’t truly benefit from agentic AI without the inference process being able to draw on a fluid, well-governed and permissioned data supply. It’s a question of inference engines being able to execute the crucial process of pointing trained models at brand new data to draw conclusions from… and right now, inference is hitting a bottleneck,” said Stuart Abbott, managing director for UK and Ireland at Vast Data.

Vast Data is known for its work in core data management capabilities and deep learning computing infrastructures, and Abbott and his team suggest that “few organizations have the architecture in place” to support the AI inference requirements of tomorrow at speed, at scale and with a commensurate degree of trust. While retrieval augmented generation (RAG) is what most agentic AI systems are built on, it is the “retrieval” part of RAG that represents the problem beneath the surface.
More acutely informed than the generalist world of large language models and their small model counterparts, RAG enables enterprises to draw upon additional external, internal or domain-specific information sources in pursuit of a smarter, more sharp-angled end result. That’s a lot of extra data, so for practicality’s sake, rather than storing all knowledge within the model itself, these systems fetch data from enterprise sources as needed. Each request depends on retrieving relevant context at the time of the prompt, which places increased pressure on the data infrastructure.

“This makes perfect sense from a design point of view. It’s what allows agents to be flexible, current and scalable. But it also introduces a huge dependency on the underlying infrastructure,” argued Abbott, speaking at a data symposium this week. “Imagine asking an AI service to summarize a contract, but the system has to search a document store, index it, apply permissions and return results… all before it can start writing. Now multiply that by every task an agent does across a business.”

Where the “request-response loop” that AI makes upon the various data stores and repositories that serve it takes far longer than expected, the AI fails to deliver. The illusion of real-time AI fades when agents are held up by clunky storage, outdated indexes or disconnected access policies. Abbott says it’s a question of latency in data service ultimately becoming latency in judgment. “The inference layer is often seen as super fast, especially with the rise of graphics processing units and model acceleration techniques. But the real bottleneck is often elsewhere in the total IT stack,” said Abbott.
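Abbott’s contract example maps onto the basic retrieval loop that every RAG request runs. A minimal Python sketch of that loop follows; the document store, permission sets and keyword matching are all hypothetical stand-ins for real enterprise search, indexing and access-control systems:

```python
# Hypothetical in-memory stand-in for an enterprise document store.
DOCUMENT_STORE = {
    "contract-2024.pdf": "The supplier shall deliver goods within 30 days.",
    "hr-handbook.pdf": "Employees may request remote working approval.",
}

def retrieve(query: str, permitted_docs: set) -> list:
    """The 'R' in RAG: search the store, apply permissions, return context."""
    return [
        text
        for doc_id, text in DOCUMENT_STORE.items()
        if doc_id in permitted_docs and query.lower() in text.lower()
    ]

def answer(query: str, permitted_docs: set) -> str:
    """Every prompt triggers retrieval before generation can even begin."""
    context = retrieve(query, permitted_docs)
    if not context:
        return "no permitted context found"
    # A real agent would now hand `context` to the model for generation;
    # any delay above has already pushed back the time to first token.
    return f"answer grounded in {len(context)} document(s)"
```

Each call to `answer` blocks on `retrieve` first, which is exactly the request-response dependency Abbott describes: multiply that wait by every step an agent takes and the latency compounds.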
“Inference isn’t just about how fast any given AI model runs; it’s about how quickly it can get the right, permissioned data to the model and then get an answer back.” According to Vast, even with state-of-the-art models, traditional infrastructure can result in time-to-first-token delays of up to 11 seconds. But by adopting persistent key-value caching through technologies such as vLLM, LMCache and Nvidia GPUDirect Storage, Abbott suggests that enterprises can cut this delay down to 1.5 seconds, sometimes less.

All very technical, yes. Although you might need to be a senior data science engineer to instantiate and operate those functions, anyone can understand the alignment point of this technology i.e. if we are able to be smarter about how data is stored, spliced, sorted, sieved and served at the backend, then we can make it more useful at the front end as it moves into its role in AI.

“This is not simply a performance tweak. For agentic systems that revisit the same source material, caching reduces repeated prefill processes and helps deliver faster, more responsive interactions,” explained Vast’s Abbott. “Pair this with continuous batching, chunked prefill and disaggregated decode and the gap between enterprise data and ‘agentic thought’ starts to close. But only if infrastructure allows for it.”

Today, many organizations still rely on traditional extract, transform and load data pipelines, separate vector databases and batch-based access controls. The Vast team argues that these systems weren’t designed for agentic workloads; they weren’t built for real-time semantic search, for identity-bound access or for multi-turn inference at scale. “The smartest AI model in the world can’t help you if it’s stuck waiting on glued-up code and batch data processing jobs,” Abbott noted. “That’s why enterprises are starting to re-evaluate how and where inference happens. The move is toward AI-native infrastructure,” he said.
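The prefill-caching idea Abbott raises is easier to see in miniature. The toy sketch below simulates it: the “model” and its prefill step are invented for illustration, and real systems such as vLLM or LMCache cache GPU key-value tensors rather than Python objects, but the principle of skipping repeated prompt processing for revisited source material is the same:

```python
import time

def prefill(prompt: str) -> list:
    """Stand-in for prompt processing: the slow, compute-heavy pass."""
    time.sleep(0.05)  # simulate the expensive prefill work
    return prompt.split()  # pretend these tokens are the KV state

kv_cache = {}  # persistent key-value cache, keyed by prompt content

def cached_prefill(prompt: str):
    """Skip repeated prefill when an agent revisits the same source."""
    if prompt in kv_cache:
        return kv_cache[prompt], True  # cache hit: no recompute
    state = prefill(prompt)
    kv_cache[prompt] = state
    return state, False

doc = "Section 4.2: termination clauses apply after 90 days notice."
_, hit_first = cached_prefill(doc)   # cold: pays the full prefill cost
_, hit_second = cached_prefill(doc)  # warm: served straight from cache
```

An agent that re-reads the same contract across a multi-turn task pays the prefill cost once instead of on every turn, which is where the claimed drop from 11 seconds toward 1.5 seconds of time-to-first-token comes from.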
“A combination of storage and software platforms where data, permissions, compute and AI workflows live together, not in silos,” Abbott continued. “The future of enterprise AI isn’t just about smarter agents. If we’re asking agents to help guide decisions, they can’t be working from memory… they need to see what we see, when we see it, securely.”

Vast Data makes much of its unified architecture, which coalesces storage intelligence into a centrally managed space to deliver on deep learning computing infrastructures. But Vast is not alone in this market: Pure Storage is never far behind when enterprise organizations of any size line up a beauty parade of potential vendors. Also eating from the same table is NetApp; with its heritage in storage intelligence, its NetApp ONTAP Data Platform for AI goes very much head-to-head in this market. HPE has an Nvidia partnership hinged around providing data services for AI, Dell Technologies has its PowerScale, which eats up unstructured data for breakfast… and then there are the cloud hyperscalers. That said, DataDirect Networks, IBM Storage Scale, Weka and others are all now positioning themselves as AI-first and AI-friendly i.e. it’s the default message the entire technology industry now has to echo. With cloud-native machine learning platforms at the heart of some of their most progressive deployments, Microsoft Azure, Google Cloud and Amazon Web Services all represent obvious options for businesses looking to sharpen up their AI data backbones. Vast execs would probably prefer to be compared to the cloud hyperscalers rather than the above-noted storage specialists, but that’s an argument to chew over in a well-stocked bar, or at least a technology conference break-out session.
What might matter most in terms of competitive analysis in this field is whether, under the hood, any given data storage specialist essentially remains a file system with performance enhancements, rather than a platform aimed at integrating full AI workflows. Vast aims to set itself apart from that tier of data engineering by saying that its Insight Engine service builds a native vector capability and structured data layer, with additional capabilities designed to enable policy-aware, real-time inference inside the storage layer.

There’s an additional challenge around data sovereignty to bear in mind here. Agentic systems will need to enforce permissions dynamically, explain how and why they reached decisions and prove that data access was compliant every single time. Abbott’s parting words on this subject were that this calls for more than just AI enthusiasm; it calls for AI infrastructure maturity.
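The internals of Vast’s Insight Engine aren’t detailed here, but “policy-aware retrieval” in general means enforcing permissions inside the retrieval step rather than filtering results afterwards: unpermitted chunks never even enter the candidate set, so access is provably bounded on every request. A generic sketch, with all index entries, vectors and group names invented for illustration:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Invented index: each chunk carries an embedding and an access-control list.
INDEX = [
    {"id": "hr-01", "vec": [0.9, 0.1], "acl": {"hr"}},
    {"id": "fin-07", "vec": [0.8, 0.2], "acl": {"finance"}},
]

def policy_aware_search(query_vec, user_groups, top_k=1):
    """Filter by ACL first, then rank by similarity: permissions are
    enforced inside retrieval, not bolted on as an afterthought."""
    permitted = [c for c in INDEX if c["acl"] & set(user_groups)]
    ranked = sorted(permitted, key=lambda c: cosine(query_vec, c["vec"]),
                    reverse=True)
    return [c["id"] for c in ranked[:top_k]]
```

Because the ACL check happens before ranking, a log of `permitted` at each call doubles as the compliance evidence Abbott says agentic systems will need to produce every single time.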