Validation of Control & Planning

[momala]

Principles and Scope

Planning and control are where intent becomes motion. A planning stack selects a feasible, safety-aware trajectory under evolving constraints; the control stack turns that trajectory into actuation while respecting vehicle dynamics and delays. Validating these layers is therefore about much more than unit tests: it is about demonstrating, with evidence, that the combined decision–execution loop behaves safely and predictably across the intended operational design domain (ODD). In practice, this requires two complementary ideas. First, a digital twin of the vehicle and environment that is accurate enough to make simulation a meaningful predictor of real behavior. Second, a design-of-experiments (DOE)–driven scenario program that stresses the decision and control logic where it matters most, and converts outcomes into monitorable, quantitative metrics. Your V&V suite frames both: scenario descriptions feed a co-running simulator with the under-test algorithms, the digital twin (vehicle and environment) is loaded as an external asset, and the outcome is a structured validation report rather than anecdotal test logs.

Planning/control V&V must also navigate the mix of deterministic dynamics and stochastic perception/prediction. At the component level, your framework treats detection, control, localization, mission planning, and low-level control as distinct abstractions, yet evaluates them in the context of Newtonian physics—explicitly trading fidelity for performance depending on the test intent. This modularity enables validating local properties (e.g., trajectory tracking) while still measuring system-level safety effects (e.g., minimum distance to collision).

A final principle is lifecycle realism. A digital twin is not just a CAD model; it is a live feedback loop receiving data from the physical system and its environment, so the simulator remains predictive as the product evolves. The same infrastructure that generates scenarios can replay field logs, inject updated vehicle parameters, and reflect map changes, enabling continuous V&V of planning and control post-deployment.

Scenario-Based Validation with Digital Twins

The V&V workflow begins with a formal scenario description: functional narratives are encoded in a human-readable DSL (e.g., M-SDL/Scenic), then reduced to logical parameter ranges and finally to concrete instantiations selected by DOE. This ensures tests are reproducible, shareable, and traceable from high-level goals down to the numeric seeds that define a specific run. The simulator co-executes these scenarios with the algorithms under test inside the digital twin, and the V&V interface collects vehicle control signals, virtual sensor streams, and per-run metrics to generate the verdicts required by the safety case.
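The step from logical parameter ranges to concrete, seeded instantiations can be illustrated with a minimal sketch. It is not the suite's actual DOE engine: the parameter names, units, ranges, and the two sampling strategies (full-factorial grid and Latin hypercube) are assumptions chosen for illustration.

```python
import itertools
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ParamRange:
    """Logical-scenario parameter: a closed interval [low, high]."""
    name: str
    low: float
    high: float


def full_factorial(ranges, levels):
    """Concrete cases on an evenly spaced grid (levels >= 2 points per parameter)."""
    axes = []
    for r in ranges:
        step = (r.high - r.low) / (levels - 1)
        axes.append([r.low + i * step for i in range(levels)])
    names = [r.name for r in ranges]
    return [dict(zip(names, combo)) for combo in itertools.product(*axes)]


def latin_hypercube(ranges, n, seed):
    """n concrete cases; each range is split into n strata and each stratum is used once."""
    rng = random.Random(seed)  # the seed makes the sample reproducible
    columns = {}
    for r in ranges:
        strata = list(range(n))
        rng.shuffle(strata)
        width = (r.high - r.low) / n
        columns[r.name] = [r.low + (s + rng.random()) * width for s in strata]
    return [{name: col[i] for name, col in columns.items()} for i in range(n)]


# Hypothetical logical scenario: gap to the lead vehicle and its speed.
logical = [
    ParamRange("initial_gap_m", 10.0, 50.0),
    ParamRange("lead_speed_mps", 0.0, 8.0),
]
grid = full_factorial(logical, levels=5)            # 25 concrete cases
sample = latin_hypercube(logical, n=10, seed=42)    # 10 seeded cases
```

Storing the seed next to each concrete case is what keeps a run traceable: the same seed and ranges regenerate the same instantiations.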

To maintain broad coverage without sacrificing realism, validations can be done using a two-layer approach shown in Figure 1. A low-fidelity (LF) layer (e.g., SUMO) sweeps wide parameter grids quickly to reveal where planning/control begins to stress safety constraints; a high-fidelity (HF) layer (e.g., a game engine simulator like CARLA with the control software in the loop) then replays the most informative cases with photorealistic sensors and closed-loop actuation. Both layers log the same KPIs, so results are comparable and can be promoted to track tests when warranted. This division of labor is central to scaling scenario space while maintaining end-to-end realism for planning and control behaviors like cut-in/out, overtaking, and lane changes.
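One way to decide which low-fidelity cases are "most informative" is to rank them by how close they come to a safety limit. The sketch below assumes that criterion, and the record keys (`case_id`, `min_dtc_m`) and the limit value are hypothetical; the source does not specify the actual selection rule.

```python
def select_for_high_fidelity(lf_results, dtc_limit_m, top_k):
    """Pick the LF cases whose minimum distance-to-collision is nearest the limit.

    lf_results: list of dicts with assumed keys 'case_id' and 'min_dtc_m'.
    Cases near the limit (on either side) sit on the safe/unsafe boundary,
    which is where an HF replay is assumed to add the most information.
    """
    def distance_to_boundary(result):
        return abs(result["min_dtc_m"] - dtc_limit_m)

    ranked = sorted(lf_results, key=distance_to_boundary)
    return [r["case_id"] for r in ranked[:top_k]]


# Hypothetical LF sweep output.
lf_results = [
    {"case_id": "c1", "min_dtc_m": 12.0},
    {"case_id": "c2", "min_dtc_m": 2.5},
    {"case_id": "c3", "min_dtc_m": 1.0},
    {"case_id": "c4", "min_dtc_m": 0.0},
    {"case_id": "c5", "min_dtc_m": 6.0},
]
promoted = select_for_high_fidelity(lf_results, dtc_limit_m=2.0, top_k=3)
```

Because both layers log the same KPIs, the promoted case identifiers can be replayed in the HF layer and compared against their LF values directly.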

Figure 1: Fidelity of AV simulation: (a) low-fidelity SUMO simulator [1]; (b) high-fidelity AWSIM simulator [2]

Formal methods strengthen this flow. In the simulation-to-track pipeline, scenarios and safety properties are specified formally (e.g., via Scenic and Metric Temporal Logic), falsification synthesizes challenging test cases, and a mapping executes those cases on a closed track[3]. In published evidence, a majority of unsafe simulated cases reproduced as unsafe on track, and safe cases mostly remained safe—while time-series comparisons (e.g., DTW, Skorokhod metrics) quantified the sim-to-real differences relevant to planning and control. This is exactly the kind of transferability and measurement discipline a planning/control safety argument needs.
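As an illustration of the time-series comparison mentioned above, the following is a textbook dynamic time warping (DTW) distance in plain Python. It is a generic sketch, not the implementation used in the cited work, and the sample signals are invented.

```python
def dtw_distance(a, b):
    """Classic O(len(a) * len(b)) DTW with absolute-difference cost."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Invented speed traces (m/s): the track run lags the simulated run by one sample.
sim_speed = [0.0, 1.0, 2.0, 3.0]
track_speed = [0.0, 0.0, 1.0, 2.0, 3.0]
gap = dtw_distance(sim_speed, track_speed)
```

DTW tolerates timing offsets between the two runs, so a nonzero distance points to a difference in the shape of the behavior rather than in when it started.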

Finally, environment twins are built from aerial photogrammetry and point-cloud processing (with RTK-supported georeferencing), yielding maps and 3D assets that match the real campus, so trajectory-level decisions (overtake, yield, return-to-lane) are evaluated against faithful road geometries and occlusion patterns[4].

Methods and Metrics for Planning & Control

Mission-level planning validation starts from a start–goal pair and asks whether the vehicle reaches the destination via a safe, policy-compliant trajectory. Your platform publishes three families of evidence: (i) trajectory-following error relative to the global path; (ii) safety outcomes such as collisions or violations of separation; and (iii) mission success (goal reached without violations). This couples path selection quality to execution fidelity.
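The three families of evidence can be computed from a logged trajectory roughly as follows. This is a sketch under assumptions: positions are 2D tuples in metres, the global path is a polyline, and the function names, report keys, and goal tolerance are hypothetical.

```python
import math


def point_to_segment(p, a, b):
    """Euclidean distance from point p to the segment a-b (2D tuples)."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter, clamped so the closest point stays on the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def cross_track_errors(driven, global_path):
    """Per-sample distance from the driven trajectory to the global path polyline."""
    segments = list(zip(global_path, global_path[1:]))
    return [min(point_to_segment(p, a, b) for a, b in segments) for p in driven]


def mission_report(driven, global_path, goal, goal_tol_m, collisions, separation_violations):
    errors = cross_track_errors(driven, global_path)
    last_x, last_y = driven[-1]
    reached = math.hypot(last_x - goal[0], last_y - goal[1]) <= goal_tol_m
    return {
        "max_cte_m": max(errors),                                   # (i) following error
        "rms_cte_m": math.sqrt(sum(e * e for e in errors) / len(errors)),
        "collisions": collisions,                                   # (ii) safety outcomes
        "separation_violations": separation_violations,
        "mission_success": reached and collisions == 0 and separation_violations == 0,  # (iii)
    }


report = mission_report(
    driven=[(0.0, 0.0), (5.0, 0.5), (10.0, 0.0)],
    global_path=[(0.0, 0.0), (10.0, 0.0)],
    goal=(10.0, 0.0),
    goal_tol_m=1.0,
    collisions=0,
    separation_violations=0,
)
```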

At the local planning level, your case study focuses on the planner inside the autonomous software. The planner synthesizes a global and a local path, then evaluates them based on predictions from surrounding actors to select a safe local trajectory for maneuvers such as passing and lane changes. By parameterizing scenarios with variables such as the initial separation to the lead vehicle and the lead vehicle’s speed, you create a grid of concrete cases that stress the evaluator’s thresholds. The outcomes are categorized by meaningful labels—Success, Collision, Distance-to-Collision (DTC) violation, excessive deceleration, long pass without return, and timeout—so that planner tuning correlates directly with safety and comfort.
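A per-run classifier for these labels might look like the sketch below. The field names, threshold values, and the priority order among labels are all assumptions; the source names the categories but not the rules that assign them.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RunSummary:
    collided: bool
    min_dtc_m: float          # minimum distance-to-collision during the run
    peak_decel_mps2: float    # magnitude of the strongest braking
    returned_to_lane: bool
    pass_duration_s: float
    duration_s: float
    goal_reached: bool


@dataclass(frozen=True)
class Limits:
    # Hypothetical thresholds, to be replaced by project-specific values.
    dtc_min_m: float = 2.0
    decel_max_mps2: float = 4.0
    pass_max_s: float = 20.0
    timeout_s: float = 120.0


def classify(run, lim=Limits()):
    """Return the first matching label, checked from most to least severe (assumed order)."""
    if run.collided:
        return "Collision"
    if run.min_dtc_m < lim.dtc_min_m:
        return "DTC violation"
    if run.peak_decel_mps2 > lim.decel_max_mps2:
        return "Excessive deceleration"
    if not run.returned_to_lane or run.pass_duration_s > lim.pass_max_s:
        return "Long pass without return"
    if not run.goal_reached or run.duration_s > lim.timeout_s:
        return "Timeout"
    return "Success"


runs = [
    RunSummary(False, 5.0, 2.0, True, 8.0, 40.0, True),
    RunSummary(True, 0.0, 6.0, False, 3.0, 12.0, False),
    RunSummary(False, 1.2, 3.0, True, 9.0, 45.0, True),
    RunSummary(False, 4.0, 2.5, False, 30.0, 60.0, True),
]
tally = Counter(classify(r) for r in runs)
```

Tallying labels per grid cell is one way to see where in the parameter space the evaluator's thresholds start to fail.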

Figure 2: Trajectory validation example

Control validation links perception-induced delays to braking and steering outcomes. Your framework computes Time-to-Collision (TTC) along with the simulator and AV-stack response times to detected obstacles. Sufficient response time allows a safe return to nominal headway; excessive delay predicts collision, sharp braking, or planner oscillations. By logging ground truth, perception outputs, CAN bus commands, and the resulting dynamics, the analysis separates sensing delays from controller latency, revealing where mitigation belongs (planner margins vs. control gains).
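The relationship between TTC and response time can be made concrete with the common constant-speed definition, TTC = gap / closing speed, which is assumed here because the source does not state its formula. The second function is an equally simplified kinematic check: the lead vehicle holds its speed and the ego vehicle brakes at a constant rate once the combined sensing and control delay has elapsed. All names and numbers are illustrative.

```python
import math


def time_to_collision(gap_m, ego_speed_mps, lead_speed_mps):
    """Constant-speed TTC; infinite when the ego vehicle is not closing in."""
    closing = ego_speed_mps - lead_speed_mps
    if closing <= 0.0:
        return math.inf
    return gap_m / closing


def gap_after_braking(gap_m, ego_speed_mps, lead_speed_mps, response_time_s, decel_mps2):
    """Gap left once the ego vehicle has matched the lead vehicle's speed.

    response_time_s lumps perception delay and controller latency together;
    a negative result predicts a collision under these assumptions.
    """
    closing = ego_speed_mps - lead_speed_mps
    if closing <= 0.0:
        return gap_m
    lost_during_delay = closing * response_time_s
    lost_during_braking = closing ** 2 / (2.0 * decel_mps2)
    return gap_m - lost_during_delay - lost_during_braking


ttc = time_to_collision(gap_m=20.0, ego_speed_mps=10.0, lead_speed_mps=4.0)
remaining = gap_after_braking(20.0, 10.0, 4.0, response_time_s=0.5, decel_mps2=3.0)
```

Splitting `response_time_s` into its sensing and control parts, as measured from the logs, shows which of the two consumes more of the available gap.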

A necessary dependency is localization health. Your tests inject controlled GPS/IMU degradations and dropouts through simulator APIs, then compare expected vs. actual pose per frame to quantify drift. Because planning and control are sensitive to absolute and relative pose, this produces actionable thresholds for safe operation (e.g., maximum tolerated RMS deviation before reducing speed or restricting maneuvers).
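A minimal sketch of the per-frame pose comparison and a threshold-based speed policy follows. The linear derating rule and both thresholds are assumptions used to show how an RMS figure could be turned into an operating limit; they are not taken from the source.

```python
import math


def pose_errors(expected, actual):
    """Per-frame planar distance between expected and actual (x, y) positions."""
    return [math.hypot(ex - ax, ey - ay) for (ex, ey), (ax, ay) in zip(expected, actual)]


def rms(values):
    return math.sqrt(sum(v * v for v in values) / len(values))


def speed_cap(rms_error_m, nominal_mps, degrade_at_m, stop_at_m):
    """Full speed below degrade_at_m, zero at stop_at_m, linear in between (assumed policy)."""
    if rms_error_m <= degrade_at_m:
        return nominal_mps
    if rms_error_m >= stop_at_m:
        return 0.0
    fraction = (stop_at_m - rms_error_m) / (stop_at_m - degrade_at_m)
    return nominal_mps * fraction


# Invented frames: ground-truth pose from the simulator vs. the localization output.
expected = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
actual = [(0.0, 0.0), (1.0, 0.3), (2.0, 0.4)]
drift_rms = rms(pose_errors(expected, actual))
cap = speed_cap(drift_rms, nominal_mps=5.0, degrade_at_m=0.2, stop_at_m=1.0)
```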

Finally, your program extends to low-level control via HIL-style twins. A Simulink-based network of virtual ECUs and data buses sits between Autoware’s navigation outputs and simulator actuation. This lets you simulate bus traffic, counters, and checksums; disable subsystems (e.g., steering module) to provoke graceful degradation; and compare physical ECUs against their twin under identical inputs to detect divergence. It is an efficient route to validating actuator-path integrity without building a full physical rig.
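The counter and checksum checks that a virtual ECU performs on bus traffic can be sketched as below. The frame layout, the 4-bit rolling counter, and the XOR checksum are illustrative stand-ins; real ECUs use protocol-specific schemes, and this is not the Simulink model described above.

```python
def checksum(payload: bytes, counter: int) -> int:
    """Illustrative 8-bit XOR over the counter and payload bytes (not a real ECU scheme)."""
    value = counter & 0x0F
    for byte in payload:
        value ^= byte
    return value


def make_frame(payload: bytes, counter: int) -> dict:
    return {
        "payload": payload,
        "counter": counter & 0x0F,  # 4-bit rolling counter, wraps at 16
        "checksum": checksum(payload, counter),
    }


def check_stream(frames):
    """Return (index, problem) pairs for bad checksums and counter discontinuities."""
    problems = []
    previous = None
    for i, frame in enumerate(frames):
        if checksum(frame["payload"], frame["counter"]) != frame["checksum"]:
            problems.append((i, "checksum"))
        if previous is not None and frame["counter"] != (previous + 1) % 16:
            problems.append((i, "counter"))
        previous = frame["counter"]
    return problems


# Twenty healthy frames, then two fault injections: a dropped frame and a corrupted payload.
healthy = [make_frame(bytes([i, 0x10]), i) for i in range(20)]
dropped = healthy[:5] + healthy[6:]
corrupted = list(healthy)
corrupted[3] = dict(healthy[3], payload=bytes([0xFF, 0x10]))
```

Running `check_stream` on the same input for a physical ECU log and its twin gives a simple first signal of divergence between the two.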

Case Study and Safety Argumentation

On the TalTech iseAuto shuttle, the digital twin (vehicle model, sensor suite, and campus environment) is integrated with LGSVL/Autoware through a ROS bridge so that “photons-to-torque” loops are exercised under realistic scenes before any track test. Scenarios are distributed over the campus OpenDRIVE (.xodr) road network using Scenic/M-SDL; multiple events can be chained within a scenario to probe planner behaviors around parked vehicles, slow movers, or oncoming traffic. Logging is aligned to the KPIs above so outcomes are comparable across LF/HF layers and re-runnable when planner or control parameters change.

In practice, this has yielded a concise, defensible narrative for planning & control safety: (1) what was tested (formalized scenarios across a structured parameter space); (2) how it was tested (two-layer simulation with a calibrated digital twin and, when necessary, track execution); (3) what happened (mission success, DTC minima, TTC profiles, braking/steering transients, localization drift); and (4) why it matters (evidence that tuning or algorithmic changes move the decision–execution loop toward or away from safety). The same framework has been used to analyze adversarial stresses on rule-based local planners, reinforcing that planning validation must include robustness to distribution shifts and targeted perturbations.

As a closing reflection, the approach acknowledges that simulation is not the world—so it measures the gap. By transporting formally generated cases to the track and comparing time-series behaviors, the program both validates planning/control logic and calibrates the digital twin itself, using discrepancies to guide model updates and ODD limits. That is the hallmark of modern control & planning V&V: scenario-driven, digitally twinned, formally grounded, and relentlessly comparative to reality.


[1] Pablo Alvarez Lopez, Michael Behrisch, Laura Bieker-Walz, Jakob Erdmann, Yun-Pang Flötteröd, Robert Hilbrich, Leonhard Lücken, Johannes Rummel, Peter Wagner, and Evamarie Wießner. Microscopic traffic simulation using SUMO. In The 21st IEEE International Conference on Intelligent Transportation Systems. IEEE, 2018.
[2] Autoware Foundation. TIER IV AWSIM. https://github.com/tier4/AWSIM, 2022.
[3] Fremont, Daniel J., et al. “Formal scenario-based testing of autonomous vehicles: From simulation to the real world.” 2020 IEEE 23rd International Conference on Intelligent Transportation Systems (ITSC). IEEE, 2020.
[4] Pikner, Heiko, et al. “Autonomous Driving Validation and Verification Using Digital Twins.” VEHITS (2024): 204-211.