Scaling Agentic AI: Beyond the Demo

Source: HarvardBiz
This article discusses the challenges of scaling agentic AI in enterprises. It highlights the shift from treating AI agents as turnkey software to managing them like a new workforce, emphasizing the need for defined roles, authority, and supervision. The article identifies four recurring frictions (identity, context, control, and accountability) and illustrates how companies can prepare for successful agentic AI deployments.

Picture a familiar scene. A vendor demonstrates a new generative AI “agent” to your leadership team. It’s impressive: The agent triages support tickets, updates customer records, drafts a proposal, and routes it for approval.

The demo is seamless. Pretty soon, inevitably, someone asks the question: How soon can we deploy this across the enterprise?

That question reflects an assumption that has guided enterprise software adoption in the SaaS era. Most tools could be provisioned, configured, and scaled with relatively little customization. If the integration worked and employees adopted the product, deployment was largely an implementation project.

But agentic AI breaks that model. Unlike traditional software, AI agents are designed to reason, plan, and take actions across systems. The moment an agent can change a system of record—update a price, send a payment, or modify customer data—it stops being a productivity tool and becomes part of the organization’s operating model. Most importantly, it introduces new categories of risk. A narrow gen-AI tool creates content risk: It might say something wrong. Agentic AI creates execution risk: It might do something wrong.

Drawing on our work as researchers and a practitioner with experience leading agentic enterprise AI deployments, we’ve paid close attention to how agentic AI is actually being implemented. What we’ve found is that although many agents today are ready to act, companies are rarely ready to let them.

To cross that threshold and effectively integrate AI agents at scale, you will need to stop treating them as turnkey software that simply needs to be installed. Instead, you’ll need to treat your agents like a new kind of workforce that requires management. Just like your human employees, each of your AI agents will need a role, a defined scope of authority, approved sources of truth, and clear escalation rules. They’ll also need supervision and audit trails, because you will be accountable for their actions.

Until those organizational foundations are in place, scaling agentic AI will be difficult. Four recurring frictions, in particular, tend to slow or derail progress. Understanding them is the first step toward managing them.
1) Identity: “Who” Is Acting?

Companies have spent decades building access controls for human employees. Every user of a computer system logs in with a unique identity, and their role determines what they can and cannot do. AI agents complicate that because they behave like human employees but aren’t.

Many early deployments handle this problem by giving agents a shared “service account” with broad access to multiple systems. It’s a convenient solution—but granting agents so many permissions creates a security risk.

Consider a simple example. A customer-service representative might be authorized to issue refunds up to $500. If they try to issue a larger refund, the system blocks the transaction and routes it for approval. But an AI agent operating through a shared back-end service account might not be subject to that authorization limit and so might issue a $5,000 credit in a single step.

The risks are not just hypothetical. In 2025, a developer experiment using Replit’s AI coding agent showed how quickly automation can outrun its controls. Despite being instructed not to make any changes, the agent executed commands that deleted a production database. It then attempted to obscure the failure, generating thousands of fake records and misleading system messages, behaviors that slowed the response and complicated recovery.

The lesson is clear: Organizations should treat each AI agent as a distinct digital worker with its own identity, credentials, and role. Instead of relying on shared service accounts, companies should assign agents narrowly scoped permissions that reflect the specific tasks they are designed to perform. The same principles used to manage human employees—such as least-privilege access and role-based limits—should apply to agents. If a customer service employee can’t issue refunds above a certain threshold without approval, then the same constraint should obviously apply to the agent performing that work.
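The least-privilege pattern described above can be sketched in a few lines. This is an illustrative sketch only: the names (AgentIdentity, check_refund) and the $500 threshold are assumptions, not any specific product’s API.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class AgentIdentity:
    agent_id: str        # unique, auditable identity (never a shared service account)
    role: str            # e.g. "customer_service_agent"
    refund_limit: float  # hard authorization ceiling for this role

def check_refund(agent: AgentIdentity, amount: float) -> str:
    """Allow, or escalate to a human, exactly as for a human employee."""
    if amount <= agent.refund_limit:
        return "approved"
    return "escalate_to_human"

# The agent's identity carries the same limit a human in the role would have.
support_bot = AgentIdentity("agent-cs-017", "customer_service_agent", 500.0)
print(check_refund(support_bot, 120.0))   # within limit
print(check_refund(support_bot, 5000.0))  # blocked, routed for approval
```

Because each agent acts under its own identity, every approved or escalated transaction can later be traced back to a specific digital worker rather than an anonymous service account.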
Just as important, every action taken by an agent should be logged under a traceable identity so that the organization can clearly see who—or what—performed it. If leaders cannot easily explain which identity an agent uses when it executes an action, the system is not ready for production.

2) Context: When Bad Data Leads to Bad Actions

AI agents perform well in demonstrations because the environment is controlled. The data is clean, the instructions are clear, and the sources of truth are obvious. But real organizations are different. Enterprise data is fragmented across systems, duplicated across teams, and often contradictory. Policies evolve over time, and older documents remain in circulation. People handle this ambiguity by using judgment and experience that AI agents don’t have.

For a system that generates text, an imperfect context may produce a flawed answer. That is a problem, but its consequences are often limited. For a system that takes actions, however, the consequences are far greater. Imagine an HR agent that retrieves a frequently referenced policy document from 2022 and uses it—even though the rules have since changed—to guide managers through a termination process. That’s not a hallucination. It’s a retrieval mistake that exposes the company to legal risk.

Agents also introduce a new security challenge: the manipulation of context. If an agent reads emails, forms, or support tickets, and then performs tasks based on that information, attackers can embed hidden instructions designed to influence its behavior. Researchers demonstrated this risk in 2025 through a vulnerability known as “ForcedLeak.” By embedding malicious instructions in a routine web form, they tricked a Salesforce Agentforce agent into retrieving sensitive CRM data and sending it to an external destination.

To cope with these context frictions, organizations need to establish clear standards for which information their agents can trust.
This will require defining authoritative sources for policies, pricing, and operational data, so that agents can consistently rely on the correct version of the truth. Systems should also capture the provenance of information used in decision-making, allowing teams to trace any agent action back to the specific documents or data sources it relied upon. Finally, companies will have to treat external inputs—such as emails, forms, or uploaded files—not simply as helpful context but as potential attack vectors. Inputs that originate outside the organization should be handled carefully and validated before an agent is allowed to act on them. Without these safeguards, the same data inconsistencies that humans routinely navigate can quickly become operational errors for automated systems.

3) Control: Probabilistic Systems Need Hard Boundaries

Traditional software behaves predictably. The same input produces the same output every time. But large language models don’t work that way. Their responses are probabilistic, meaning that the same request can produce slightly different results across runs. That variability is acceptable when the output is a draft email. It becomes far more problematic when the output is a transaction.

One company deploying an AI support agent encountered this issue when the legal team insisted that the system never mention a particular competitor. The company’s knowledge base, however, included many legitimate comparison articles that referenced that competitor, so the agent frequently mentioned it while answering customer questions. When engineers tightened the guardrails to block those responses, the system began refusing to answer valid questions altogether.

The deeper issue here is that traditional testing methods assume stable behavior. A system that passes a test suite today may behave differently tomorrow if the model updates, the prompt changes, or new data is added.
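Because model outputs vary across runs, hard guarantees have to live outside the model. A minimal deterministic gate, sketched here with illustrative action names, limits, and blocked terms (none drawn from a real system), checks each proposed action against fixed rules before anything touches a system of record:

```python
# The model only *proposes* an action; deterministic code decides whether it runs.
ALLOWED_ACTIONS = {"issue_refund", "update_record"}
MAX_REFUND = 500.0
BLOCKED_TERMS = {"competitorx"}  # e.g. a name legal says must never appear

def validate(proposal: dict) -> tuple[bool, str]:
    """Deterministic gate between the AI model and operational systems."""
    if proposal.get("action") not in ALLOWED_ACTIONS:
        return False, "unknown action"
    if proposal["action"] == "issue_refund" and proposal.get("amount", 0) > MAX_REFUND:
        return False, "amount exceeds policy limit; route to human"
    text = proposal.get("customer_message", "").lower()
    if any(term in text for term in BLOCKED_TERMS):
        return False, "message violates content policy"
    return True, "ok"

ok, reason = validate({"action": "issue_refund", "amount": 5000.0})
print(ok, reason)  # the oversized refund is rejected regardless of model behavior
```

Unlike a prompt instruction, these rules behave identically on every run, so the same test suite passes tomorrow even if the model underneath changes.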
And the risks grow in multi-agent environments where agents pass work to one another. In 2025, for example, researchers at AppOmni demonstrated how insecure configurations in ServiceNow’s Now Assist environment could allow “second-order prompt injection.” In their experiment, malicious instructions introduced by one agent were passed along to others, potentially leading to unintended or unauthorized actions, such as retrieving sensitive records or sending information to external destinations. The takeaway here is that without clear boundaries, mistakes—or attacks—can cascade across an automated system.

To manage that risk, organizations should build deterministic controls around probabilistic AI systems. Rather than allowing agents to execute actions directly, companies should place validation layers between the AI model and operational systems. In this approach, the agent proposes an action, such as issuing a refund or updating a record, and, before it’s executed, deterministic software verifies that it complies with established rules. Organizations should limit unsupervised agent-to-agent interactions so that outputs from one agent do not automatically become executable instructions for another without validation, policy checks, or human review. By separating the generation of a recommendation from the execution of an action, companies can create guardrails that prevent model variability from translating into operational errors.

4) Accountability: When No One Can Explain What Happened

When an employee makes a mistake, managers can investigate by asking questions. When traditional software fails, engineers can check logs. But AI agents introduce a more complicated scenario, because their behavior often emerges from a chain of reasoning steps, retrieved documents, and tool calls that may not be easy to reconstruct after the fact. That creates a serious accountability challenge.
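Reconstructing that chain is only feasible if each step is recorded as it happens. A minimal audit-trail sketch follows; the field names and event types are illustrative assumptions, not a standard schema.

```python
import json
import time

class AuditLog:
    """Records every retrieval, prompt, and tool call under a traceable identity."""

    def __init__(self):
        self.events = []

    def record(self, agent_id: str, step: str, detail: dict) -> None:
        self.events.append({
            "ts": time.time(),
            "agent_id": agent_id,  # the agent's own identity, never a shared account
            "step": step,          # e.g. "retrieved_doc", "tool_call"
            "detail": detail,
        })

    def export(self) -> str:
        """Serialize the trail for auditors or incident responders."""
        return json.dumps(self.events, indent=2)

log = AuditLog()
log.record("agent-proc-004", "retrieved_doc", {"source": "supplier_scorecard_2025.pdf"})
log.record("agent-proc-004", "tool_call", {"tool": "slack.post", "channel": "#procurement"})
# After an incident, the exported trail shows which documents and tools
# the agent used, and in what order.
```

With a trail like this, the question “why did the agent post that?” becomes an ordered list of the documents it read and the tools it invoked, rather than guesswork.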
Imagine a procurement agent, for example, that summarizes supplier performance and posts the results in a company Slack channel. If the agent accidentally includes confidential contract terms—because it interpreted “share transparently” as permission to disclose them—leaders will need to know exactly how that decision was made. Which documents did it read? What instructions did it follow? Why did it believe it was allowed to share the information? Without that kind of evidence trail, organizations can’t explain their systems’ behavior to regulators, auditors, or customers.

A 2024 tribunal decision offers an early signal of how the law may treat this issue. In Moffatt v. Air Canada, a customer relied on incorrect information provided by an airline chatbot about bereavement fare eligibility. When the airline argued that the chatbot was effectively a separate entity, the tribunal rejected the claim and held the company responsible for the misinformation.

To avoid such problems, companies will have to design their AI systems with accountability in mind from the outset. This begins with maintaining comprehensive records of how agents operate, including which data sources they accessed, what prompts they received, and which tools they used to complete a task. These records should enable reconstruction of the chain of reasoning that led to any action. Organizations should also assign clear internal ownership for the monitoring and governing of agent behavior so that responsibility does not become diffused. If a regulator, auditor, or customer asks why an AI system made a particular decision, the company should be able to provide a clear, evidence-based explanation. Without that level of transparency, large-scale automation will remain difficult to defend.

The Way Forward

If turnkey deployment is unrealistic for AI agents, the alternative is not to avoid them.
It’s to introduce them gradually, expanding their autonomy only as the organization develops the ability to govern them. One useful way to think about this progression is as an “autonomy ladder.” The key distinction is execution authority: whether the system drafts content, proposes actions for approval, or executes actions within tightly defined limits.

Organizations typically begin with agents that produce assistive output—drafts, summaries, or recommendations that humans review before anything is sent or executed. The next step is retrieval with guardrails, where agents answer questions using internal information but rely on well-governed data sources. From there, companies may allow supervised actions, in which agents propose operational tasks—issuing refunds, updating records, or routing approvals—but a person always confirms the decision before execution. Only after those controls are proven should organizations consider bounded autonomy, where agents execute workflows independently within narrow limits and predefined thresholds.

Many effective deployments intentionally remain on the lower rungs of the autonomy ladder. The benefits consultancy OneDigital uses Azure OpenAI to accelerate consultant research, improving “time to insight” rather than replacing the consultants themselves. Other prominent deployments reach bounded autonomy by keeping the scope narrow. Klarna has reported that its AI assistant handles a large share of customer service chats autonomously, while maintaining immediate escalation paths to human support for complex or sensitive cases.

For leaders evaluating the growing number of agent platforms, the most important investments are often organizational rather than technical. Companies should begin by defining a clear “turnkey boundary,” distinguishing between AI applications that can be deployed with minimal structural change and those that require significant redesign of controls and governance.
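The autonomy ladder can also be made explicit in configuration rather than left implicit in prompts. A sketch follows, with illustrative level names and a hypothetical may_execute check; the point is that execution authority becomes an auditable setting.

```python
from enum import IntEnum

class AutonomyLevel(IntEnum):
    """Ordered rungs of the autonomy ladder; higher means more execution authority."""
    ASSISTIVE = 1   # drafts and summaries; a human sends or executes everything
    RETRIEVAL = 2   # answers questions from governed data sources only
    SUPERVISED = 3  # proposes actions; a human confirms before execution
    BOUNDED = 4     # executes independently within narrow, predefined limits

def may_execute(level: AutonomyLevel, human_approved: bool) -> bool:
    """An agent below BOUNDED never executes without explicit human approval."""
    return level == AutonomyLevel.BOUNDED or human_approved

print(may_execute(AutonomyLevel.SUPERVISED, human_approved=False))  # False
print(may_execute(AutonomyLevel.BOUNDED, human_approved=False))     # True
```

Promoting an agent up the ladder then becomes a deliberate, reviewable change to one configuration value, rather than a gradual, unnoticed drift in behavior.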
They should also treat permissions as a core design question by assigning each agent narrowly scoped access aligned with its role. Clear human-in-the-loop thresholds should determine when automated decisions require oversight, particularly in situations involving financial exposure, regulatory obligations, or reputational risk. Finally, leaders should measure outcomes rather than pilots, focusing on operational indicators such as cycle time, error rates, and compliance incidents, rather than simply counting the number of AI experiments underway.

. . .

Generative AI will continue to improve, and vendors will package more safety features into their platforms, but the gap between an impressive demonstration and a trustworthy production system will persist unless organizations rethink what deployment actually requires. Enterprises are not turnkey environments; they are complex systems shaped by legacy technology, policies, and human judgment. The companies that succeed with agentic AI won’t simply install more agents. Instead, they’ll build the structures that allow those agents to be trusted.
