Microsoft’s new AI helps robots decide what to do and exactly where to act


Source: IntEngineering

Microsoft unveils AI that helps robots decide both what to do and where to act, reducing errors in complex tasks.

Microsoft, along with a consortium of academic researchers, has built a new benchmark called GroundedPlanBench to tackle a persistent problem in robotics: robots still struggle to decide what to do and where to do it at the same time.
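One way to picture what coupling "what to do" with "where to act" means in data terms is a plan step that carries both an action and an image location. The following is a minimal Python sketch; the class and field names are illustrative assumptions, not the benchmark's actual schema:

```python
from dataclasses import dataclass
from typing import List, Tuple

# Illustrative sketch only: a plan step that couples "what" (the action)
# with "where" (a region in the camera image). All names are assumptions.

BBox = Tuple[int, int, int, int]  # (x_min, y_min, x_max, y_max) in pixels

@dataclass
class GroundedStep:
    action: str        # e.g. "grasp", "place", "open", "close"
    target_label: str  # which object, e.g. "paper cup"
    target_bbox: BBox  # where that object sits in the image

@dataclass
class GroundedPlan:
    instruction: str
    steps: List[GroundedStep]

# Two visually identical cups are told apart by coordinates, not by words:
plan = GroundedPlan(
    instruction="discard the paper cups",
    steps=[
        GroundedStep("grasp", "paper cup", (40, 120, 90, 200)),
        GroundedStep("place", "trash bin", (300, 80, 380, 220)),
        GroundedStep("grasp", "paper cup", (150, 118, 198, 202)),
        GroundedStep("place", "trash bin", (300, 80, 380, 220)),
    ],
)
```

The point of the coordinates is that language alone cannot distinguish the two cups, while the bounding boxes can.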

Most current systems split these decisions into two steps. A vision-language model first creates a plan in natural language, and a second model then turns that plan into actions. This split often leads to mistakes.

The issue shows up even in simple tasks. A robot told to discard paper cups may confuse which cup to pick, or invent steps that were never asked for. In cluttered environments, these errors become more frequent. Because planning and spatial reasoning are handled separately, errors in one stage propagate to the next.

Planning meets spatial grounding

To tackle this, the team developed GroundedPlanBench to test whether AI models can plan tasks while also identifying exactly where each action should happen. Instead of relying only on text, each action is tied to a specific location in an image. Basic actions like grasp, place, open, and close are linked to objects or positions, forcing the system to connect its decisions to the physical world.

The benchmark includes more than 1,000 tasks built from real robot interactions. Some instructions are direct, such as placing a spoon on a plate. Others are more open-ended, like tidying a table. This mix matters because robots often fail when instructions are vague: language that humans easily understand can be too ambiguous for machines, especially when multiple objects look similar.

In one example, a system was asked to put four napkins on a couch. It repeatedly chose the same napkin because the description did not clearly distinguish between them. Even more detailed phrases like “top-left napkin” were not precise enough for reliable execution. The researchers note that “ambiguous language leads to non-executable actions,” highlighting a core limitation of current systems.

Learning from real tasks

To improve performance, the team also developed a training method called Video-to-Spatially Grounded Planning, or V2GP. The system learns from videos of robots performing tasks: it detects when a robot interacts with objects, identifies those objects, and tracks their positions. The result is a structured plan that links every action to a specific location.

Using this approach, the team generated more than 40,000 grounded plans, ranging from simple one-step actions to sequences of up to 26 steps. Models trained on this data improved: they were better at choosing correct actions and linking them to the right objects, and they made fewer repeated mistakes, such as acting on the same item multiple times.

Still, challenges remain. Long and complex tasks are difficult, especially when instructions are indirect. As the researchers put it, “Models must reason over longer sequences of actions and maintain consistency across many steps.”

The study also compared this approach with traditional systems that separate planning and grounding. Those systems struggled with ambiguity, often mapping multiple actions to the same object or location. By combining both steps into a single process, the new approach reduces this mismatch and keeps decisions about actions and locations tightly connected.

The team suggests that future work could combine this method with predictive models that estimate the outcome of an action before it happens, which could help robots avoid mistakes in real time. For now, the findings point to a clear direction for robotics: systems that understand actions and locations together are more likely to work in real-world environments.

The study was published on arXiv.
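The video-to-plan extraction described above could be caricatured as turning time-stamped robot–object interaction events into an ordered, grounded plan. This is a hedged sketch under an assumed event format and assumed names, not the authors' implementation; it also illustrates flagging the "acting on the same item twice" failure the article mentions:

```python
from dataclasses import dataclass

# Hypothetical sketch of the video-to-grounded-plan idea: detected
# interaction events are ordered by time and emitted as plan steps.

@dataclass
class InteractionEvent:
    frame: int          # video frame where the interaction was detected
    action: str         # e.g. "grasp", "place"
    object_label: str   # label of the detected object
    bbox: tuple         # tracked object position at interaction time

def events_to_plan(events):
    """Order events by time and emit (action, object, bbox) steps,
    flagging an immediate repeat on the same region as a likely error."""
    steps, warnings = [], []
    prev = None
    for ev in sorted(events, key=lambda e: e.frame):
        if prev and ev.action == prev.action and ev.bbox == prev.bbox:
            warnings.append(
                f"repeated '{ev.action}' on same region at frame {ev.frame}")
            continue
        steps.append((ev.action, ev.object_label, ev.bbox))
        prev = ev
    return steps, warnings

events = [
    InteractionEvent(120, "grasp", "napkin", (10, 10, 40, 40)),   # duplicate region
    InteractionEvent(95, "grasp", "napkin", (10, 10, 40, 40)),
    InteractionEvent(210, "place", "couch", (200, 50, 400, 300)),
]
steps, warnings = events_to_plan(events)
```

Running this yields two plan steps and one warning: the second grasp on the identical napkin region is dropped, which mirrors the kind of repeated-action mistake grounded training is meant to reduce.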



Tags: AI Robots, Artificial Intelligence, Automation, Machine Learning, Microsoft Research, Robot Planning, Robotics, Robotics Research, Spatial Grounding, Vision Language Models, VLM

 



