We have developed a ROS2 abstraction layer for fleet management in open-world robotics: vertical AI agents perform high-level reasoning and decision-making in real time. The concept was born out of challenges we encountered while developing AgroQR, especially around controlling and coordinating our drone fleets as they explore large spans of unseen terrain.
Our approach introduces an AI-powered decision engine that interfaces directly with the robots' ROS topics and services (e.g., position, acceleration, joint kinematics, gyroscopic data). The agents process this data, generate natural-language task directives, and convert them into ROS messages, which are then dispatched to individual robot agents for execution. After completing their tasks, the robots send state feedback back to the engine, ensuring a continuous cycle of adaptation and optimization. The result resembles a multi-threaded system applied to high-level robotic decision-making across fleets, effectively a decentralized control engine sitting on top of ROS2.
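As a minimal sketch of this loop, the node below subscribes to a robot's odometry, calls a placeholder decide() that stands in for the LLM-backed vertical agent, and publishes a serialized directive. The topic names, the JSON directive schema, and the decide() logic are illustrative assumptions, not our exact interfaces.

```python
# Minimal sketch of the engine's per-robot loop (topic names, directive
# schema, and decide() are illustrative placeholders).
import json

import rclpy
from rclpy.node import Node
from std_msgs.msg import String
from nav_msgs.msg import Odometry


class DecisionEngine(Node):
    """Bridges one robot's ROS2 state streams to an AI agent and back."""

    def __init__(self, robot_ns: str = "robot_1"):
        super().__init__(f"{robot_ns}_decision_engine")
        # Directives flow out as serialized natural-language tasks.
        self.directive_pub = self.create_publisher(
            String, f"/{robot_ns}/directive", 10)
        # State flows in from the robot's existing ROS topics.
        self.create_subscription(
            Odometry, f"/{robot_ns}/odom", self.on_state, 10)
        # Robots send feedback after execution, closing the loop.
        self.create_subscription(
            String, f"/{robot_ns}/feedback", self.on_feedback, 10)

    def on_state(self, msg: Odometry) -> None:
        state = {
            "position": [msg.pose.pose.position.x, msg.pose.pose.position.y],
            "velocity": [msg.twist.twist.linear.x, msg.twist.twist.linear.y],
        }
        directive = self.decide(state)  # placeholder for the agent/LLM call
        self.directive_pub.publish(String(data=json.dumps(directive)))

    def on_feedback(self, msg: String) -> None:
        # Execution feedback is folded into the next decision cycle.
        self.get_logger().info(f"feedback: {msg.data}")

    def decide(self, state: dict) -> dict:
        # Stand-in policy; in our system this is a vertical AI agent.
        return {"task": "explore",
                "waypoint": [state["position"][0] + 5.0, 0.0]}


def main():
    rclpy.init()
    rclpy.spin(DecisionEngine())


if __name__ == "__main__":
    main()
```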
A key feature of our framework is the ability to spin up dynamic AI agent instances and terminate them once their tasks are completed. This eliminates unnecessary memory overhead while allowing new robot agents to join an existing fleet pipeline in real time. Such an approach is crucial for large-scale, multi-agent coordination in open-world tasks like exploration, search and rescue, and autonomous delivery.
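The sketch below shows one way such a lifecycle could be managed on top of rclpy, reusing the DecisionEngine node from the previous sketch; FleetManager and its method names are hypothetical, not our actual API.

```python
# Hypothetical agent lifecycle: one short-lived node per robot/task, added
# to a shared executor on spawn and destroyed as soon as the task completes.
import rclpy
from rclpy.executors import MultiThreadedExecutor


class FleetManager:
    def __init__(self):
        self.executor = MultiThreadedExecutor()
        self.agents = {}

    def spawn_agent(self, robot_ns: str) -> None:
        # New robots can join the running fleet pipeline at any time.
        agent = DecisionEngine(robot_ns)  # node from the sketch above
        self.executor.add_node(agent)
        self.agents[robot_ns] = agent

    def terminate_agent(self, robot_ns: str) -> None:
        # Tearing the node down releases its resources immediately.
        agent = self.agents.pop(robot_ns)
        self.executor.remove_node(agent)
        agent.destroy_node()


def main():
    rclpy.init()
    fleet = FleetManager()
    fleet.spawn_agent("robot_1")
    fleet.executor.spin()  # agents are added and removed while this runs
```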
To optimize fleet coordination, we implement a priority queuing system for task execution. Tasks are dynamically assigned based on priority levels, ensuring efficient scheduling and rapid response to high-importance objectives. Additionally, instance behavior summary logs are maintained for special-interest events, enabling learned decision patterns to be inherited across AI agent instances. Subsequent instances thus inherit useful environmental context and decision logic from prior agents, improving long-term autonomy without requiring persistent memory within a single session. These logs serve both as behavior and decision constraints enforced on future agent sessions and, potentially, as training data for entirely new models, refining classical decision-making through continuous self-improvement.
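As an illustration, the sketch below pairs a heap-based priority queue for task dispatch with a toy inherit_context() helper that seeds a new instance from prior summary logs; the priority convention (lower number wins) and the log format are assumptions.

```python
# Illustrative priority queue plus log inheritance (schema is assumed).
import heapq
import itertools


class TaskQueue:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order

    def push(self, priority: int, task: dict) -> None:
        # Lower number = higher priority; the counter avoids comparing dicts.
        heapq.heappush(self._heap, (priority, next(self._counter), task))

    def pop(self) -> dict:
        return heapq.heappop(self._heap)[2]


def inherit_context(prior_logs: list[str], max_entries: int = 20) -> str:
    """Condense a finished instance's special-interest events into a seed
    context for the next agent instance."""
    return "\n".join(prior_logs[-max_entries:])


queue = TaskQueue()
queue.push(2, {"task": "map_sector", "sector": "B3"})
queue.push(0, {"task": "return_home", "reason": "low_battery"})  # urgent
assert queue.pop()["task"] == "return_home"
```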
A major advantage of our system is the integration of diverse sensory inputs, such as vision, LiDAR, and inertial sensors, into a unified decision-making framework. The vertical AI agents synthesize these inputs to generate new probabilistic environmental insights, enabling robots to make decisions beyond their individual perception capabilities. The service is hosted in the cloud, with mechanisms that optimize memory usage by clearing stale data at the end of each session, addressing the limits of an LLM's context window. Because agents are continuously created and terminated, accumulated context never grows unbounded, which keeps these context-size limits manageable.
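A rough sketch of the per-session context handling follows; the string-based merge and field names are simplifications of the actual fusion logic.

```python
# Simplified session context: heterogeneous sensor summaries are merged
# into one view and discarded when the agent instance terminates.
from dataclasses import dataclass, field


@dataclass
class AgentSession:
    context: list[str] = field(default_factory=list)

    def ingest(self, source: str, summary: str) -> None:
        # e.g. ingest("lidar", "open corridor ahead, 12 m clearance")
        self.context.append(f"[{source}] {summary}")

    def unified_view(self) -> str:
        # Cross-sensor synthesis operates over this merged view.
        return "\n".join(self.context)

    def close(self) -> None:
        # Teardown clears accumulated context, so no single agent ever
        # approaches the LLM context-window limit.
        self.context.clear()
```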
We are actively developing and testing this control mechanism and have already obtained promising results. If you are interested in open-world robotics research, we welcome partnerships and contributions. Our prior work on single-robot, natural-language-based high-level decision-making and control was published at ICRA 2025, and we are now expanding to multi-agent systems built on the same principles. We look forward to publishing our findings at top-tier robotics and AI conferences such as ICRA, IROS, ICML, ICLR, and NeurIPS, and we may open-source this research as a standalone project to make it accessible to everyone.
Our long-term mission is to enable mobile robot agents to operate in open-world environments for general-purpose applications. We are dedicated to building scalable frameworks that empower manufacturing companies to deploy robotic fleets for automating complex, unseen, and unstructured tasks in everyday life. If you share our vision and want to push the boundaries of AI-driven fleet management, reach out; we are eager to collaborate!