====== Validation Approaches ======
{{:en:iot-open:czapka_m.png?50| Masters (2nd level) classification icon }}
This section presents a practical, simulation-driven approach to validating the perception, mapping (HD maps/digital twins), and localization layers of an autonomous driving stack. The core idea is to anchor tests in the operational design domain (ODD), express them as reproducible scenarios, and report metrics that connect module-level behavior to system-level safety.
====== Scope, ODD, and Assurance Frame ======
We decompose the stack into Perception (object detection/tracking), Mapping (HD map/digital twin creation and consistency), and Localization (GNSS/IMU and vision/LiDAR aiding) and validate each with targeted KPIs and fault injections. The evidence is organized into a safety case that explains how module results compose at system level. Tests are derived from the ODD and instantiated as logical/concrete scenarios (e.g., with a scenario language like Scenic) over the target environment. This gives you systematic coverage and reproducible edge-case generation while keeping hooks for standards-aligned arguments (e.g., ISO 26262/SOTIF) and formal analyses where appropriate.
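As a rough illustration of how a logical scenario can be instantiated into concrete, reproducible test cases, the following Python sketch draws concrete parameter sets from ODD-derived ranges; the parameter names, ranges, and seeding scheme are illustrative assumptions rather than part of any particular toolchain.
<code python>
import random

# A logical scenario: parameter ranges derived from the ODD (values are assumptions).
logical_scenario = {
    "ego_speed_mps":       (8.0, 16.0),   # urban approach speeds
    "pedestrian_offset_m": (1.0, 4.0),    # lateral distance from the lane edge
    "occlusion_level":     (0.0, 0.8),    # fraction of the pedestrian hidden at spawn
}

def sample_concrete_scenario(logical, seed):
    """Draw one reproducible concrete scenario from the logical parameter ranges."""
    rng = random.Random(seed)
    return {name: rng.uniform(lo, hi) for name, (lo, hi) in logical.items()}

# Reproducibility: the same seed always yields the same concrete scenario,
# so a failing case can be replayed and shared across teams or vehicles.
concrete_cases = [sample_concrete_scenario(logical_scenario, seed) for seed in range(100)]
print(concrete_cases[0])
</code>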
====== Perception Validation ======
The objective is to quantify detection performance—and its safety impact—across the ODD. In end-to-end, high-fidelity (HF) simulation, we log both simulator ground truth and the stack’s detections, then compute per-class statistics as a function of distance and occlusion. Near-field errors are emphasized because they dominate braking and collision risk. Scenario sets should include partial occlusions, sudden obstacle appearances, vulnerable road users, and adverse weather/illumination, all realized over the site map so that failures can be replayed and compared.
{{ :en:safeav:maps:perception_val.png?400 | Detection Validation}}
Detection validation example. The ground truth of the detectable vehicles is indicated with green boxes, while the detections are marked with red boxes.
* **KPIs**: precision/recall per class and distance bin; time-to-detect and time-to-react deltas; TTC availability and whether perceived obstacles trigger sufficient braking distance.
* **Search strategy**: use low-fidelity (LF) sweeps for breadth (planner-in-the-loop, simplified sensors) and confirm top-risk cases in HF with full sensor simulation before any track trials.
Figure 1 illustrates the object comparison: green boxes mark objects in the simulator ground truth, while red boxes mark objects detected by the AV stack. Ground-truth and detected objects are matched using threshold-based rules (e.g., minimum overlap), and the comparison yields per-range indicators of which vehicles are detectable within the safety-critical and less critical distance bands.
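A minimal sketch of such a threshold-based comparison in Python, assuming bird's-eye-view boxes, greedy one-to-one matching, and illustrative distance bins and IoU threshold (all assumptions, not a prescribed implementation):
<code python>
import numpy as np

def iou(box_a, box_b):
    """Axis-aligned 2D IoU; boxes are (x_min, y_min, x_max, y_max) in metres (BEV)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def match_and_bin(gt, det, dist_bins=(0, 20, 40, 80), iou_thr=0.5):
    """Greedy one-to-one matching of ground truth vs detections, binned by range.

    gt / det: lists of dicts with keys 'box' (BEV box) and 'range_m' (distance to ego).
    Returns per-bin counts of true positives, false negatives, and false positives.
    """
    stats = {b: {"tp": 0, "fn": 0, "fp": 0} for b in dist_bins[:-1]}
    used = set()
    for g in gt:
        bin_lo = max(b for b in dist_bins[:-1] if g["range_m"] >= b)
        best, best_iou = None, iou_thr
        for j, d in enumerate(det):
            if j not in used and iou(g["box"], d["box"]) >= best_iou:
                best, best_iou = j, iou(g["box"], d["box"])
        if best is None:
            stats[bin_lo]["fn"] += 1       # missed object: safety-relevant in the near field
        else:
            used.add(best)
            stats[bin_lo]["tp"] += 1
    for j, d in enumerate(det):
        if j not in used:
            bin_lo = max(b for b in dist_bins[:-1] if d["range_m"] >= b)
            stats[bin_lo]["fp"] += 1       # ghost detection
    return stats
</code>
Per-bin precision and recall then follow directly as tp/(tp + fp) and tp/(tp + fn), and the near-field bins can be weighted more heavily when judging safety impact.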
====== Mapping / Digital-Twin Validation ======
Validation begins with how the map and digital twin are produced. Aerial imagery or LiDAR is collected with RTK geo-tagging and surveyed control points, then processed into dense point clouds and classified to separate roads, buildings, and vegetation. From there, you export OpenDRIVE (for lanes, traffic rules, and topology) and a 3D environment for HF simulation. The twin should be accurate enough that perception models do not overfit artifacts and localization algorithms can achieve lane-level continuity.
Key checks include lane topology fidelity versus survey, geo-consistency in centimeters, and semantic consistency (e.g., correct placement of occluders, signs, crosswalks). The scenarios used for perception and localization are bound to this twin so that results can be reproduced and shared across teams or vehicles. Over time, you add change-management: detect and quantify drifts when the real world changes (construction, foliage, signage) and re-validate affected scenarios.
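The geo-consistency check can be expressed as a small script that compares surveyed control points with their matched counterparts in the exported twin; the following Python sketch assumes matched ENU coordinates and an illustrative 5 cm tolerance (both assumptions):
<code python>
import numpy as np

def geo_consistency_report(surveyed_pts, twin_pts, tolerance_m=0.05):
    """Compare surveyed control points with their counterparts in the digital twin.

    surveyed_pts, twin_pts: (N, 3) arrays of matched East/North/Up coordinates in metres.
    Returns per-point errors, RMSE, and a pass/fail flag against a centimetre-level tolerance.
    """
    surveyed_pts = np.asarray(surveyed_pts, dtype=float)
    twin_pts = np.asarray(twin_pts, dtype=float)
    errors = np.linalg.norm(twin_pts - surveyed_pts, axis=1)
    rmse = float(np.sqrt(np.mean(errors ** 2)))
    return {
        "per_point_error_m": errors,
        "rmse_m": rmse,
        "max_error_m": float(errors.max()),
        "within_tolerance": bool(errors.max() <= tolerance_m),
    }
</code>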
====== Localization Validation ======
Here, the focus is on the robustness of ego-pose to sensor noise, outages, and map inconsistencies. In simulation, you inject GNSS multipath, IMU bias, packet dropouts, or short GNSS blackouts and watch how quickly the estimator diverges and re-converges. Similar tests perturb the map (e.g., small lane-mark misalignments) to examine estimator sensitivity to mapping error.
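A minimal sketch of such a fault-injection hook in Python, assuming the simulator exposes clean GNSS fixes as (time, x, y) tuples; the interface, bias magnitude, noise level, and blackout window are illustrative assumptions:
<code python>
import numpy as np

def inject_gnss_faults(gnss_fixes, rng, blackout=(40.0, 45.0), bias_m=0.5, noise_std_m=0.3):
    """Perturb a clean GNSS trace to emulate multipath bias, noise, and a short blackout.

    gnss_fixes: list of (t, x, y) tuples from the simulator (clean fixes).
    blackout:   (start_s, end_s) interval during which fixes are dropped entirely.
    Returns the degraded trace that is fed to the localization stack instead of the clean one.
    """
    degraded = []
    for t, x, y in gnss_fixes:
        if blackout[0] <= t <= blackout[1]:
            continue                                  # GNSS outage: no fix published
        x += bias_m + rng.normal(0.0, noise_std_m)    # multipath-like bias + measurement noise
        y += bias_m + rng.normal(0.0, noise_std_m)
        degraded.append((t, x, y))
    return degraded

# Example: rng = np.random.default_rng(0); degraded = inject_gnss_faults(clean_trace, rng)
</code>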
The following is a short KPI list (a sketch for the drift and recovery metrics is given after the list):
* **Pose error & drift**: per-frame position/orientation error, drift rate during GNSS loss.
* **Continuity**: lane-level continuity at junctions and during sharp maneuvers.
* **Recovery**: re-convergence time and heading stability after outages.
* **Safety propagation**: impact on distance-to-collision (DTC), braking sufficiency, and rule-checking (e.g., lane keeping within margins).
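A minimal sketch for the drift and recovery KPIs, assuming per-frame pose error against simulator ground truth has already been logged; the error threshold and data layout are assumptions:
<code python>
import numpy as np

def recovery_metrics(times, pose_err, outage_start_s, outage_end_s, err_threshold_m=0.3):
    """Drift and recovery KPIs from one logged run.

    times:    per-frame timestamps (s), monotonically increasing.
    pose_err: per-frame position error against simulator ground truth (m).
    Returns the peak drift during the outage and the re-convergence time, i.e. how long
    after the outage ends the error first drops back below err_threshold_m.
    """
    times = np.asarray(times, dtype=float)
    pose_err = np.asarray(pose_err, dtype=float)
    in_outage = (times >= outage_start_s) & (times <= outage_end_s)
    during = pose_err[in_outage]
    after_idx = np.where((times > outage_end_s) & (pose_err <= err_threshold_m))[0]
    reconv = float(times[after_idx[0]] - outage_end_s) if after_idx.size else None
    return {"peak_drift_m": float(during.max()) if during.size else 0.0,
            "reconvergence_s": reconv}
</code>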
{{ :en:safeav:maps:localization_val.png?400 | localization validation}}
Localization validation: in some cases, the difference between the expected and the actual location may lead to accidents.
The validation procedure performs a one-to-one comparison between the expected and actual locations. As shown in Fig. 2, the vehicle's position deviation is computed for each frame and written to the validation report; summary statistics such as the minimum, maximum, and mean deviation are then derived from the same report. The simulator can also be modified to inject noise into the localization process, which makes it possible to check the estimator's robustness and validate its performance under degraded conditions.
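A minimal sketch of that per-frame deviation report, assuming the expected and actual positions have already been time-aligned (the array layout is an assumption):
<code python>
import numpy as np

def deviation_report(expected_xy, actual_xy):
    """One-to-one comparison of expected vs actual ego positions, per frame.

    expected_xy, actual_xy: (N, 2) arrays of positions in metres, one row per frame.
    Returns the per-frame deviation plus the min/max/mean summary used in the report.
    """
    expected_xy = np.asarray(expected_xy, dtype=float)
    actual_xy = np.asarray(actual_xy, dtype=float)
    dev = np.linalg.norm(actual_xy - expected_xy, axis=1)
    return {
        "per_frame_deviation_m": dev,
        "min_m": float(dev.min()),
        "max_m": float(dev.max()),
        "mean_m": float(dev.mean()),
    }
</code>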
====== Multi-Fidelity Workflow and Scenario-to-Track Bridge ======
A two-stage workflow balances coverage and realism. First, use LF tools (e.g., planner-in-the-loop with simplified sensors and traffic) to sweep large grids of logical scenarios and identify risky regions in parameter space (relative speed, initial gap, occlusion level). Then, promote the most informative concrete scenarios to HF simulation with photorealistic sensors for end-to-end validation of perception and localization interactions. Where appropriate, a small, curated set of scenarios is carried to closed-track trials. Success criteria are consistent across all stages, and post-run analyses attribute failures to perception, localization, prediction, or planning so fixes are targeted rather than generic.
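A minimal sketch of the LF sweep and promotion step, assuming the LF simulator is wrapped in a callable (here the hypothetical my_lf_runner) that returns a scalar risk score such as the minimum TTC observed in the run; the grid values and promotion count are also assumptions:
<code python>
import itertools

def lf_sweep(run_lf_scenario, grid):
    """Sweep a grid of logical-scenario parameters in the low-fidelity simulator.

    run_lf_scenario: callable taking a parameter dict and returning a risk score
                     (e.g., the minimum time-to-collision observed in the LF run).
    grid:            dict mapping parameter name -> list of values to sweep.
    Returns all runs sorted by risk so the top cases can be promoted to HF simulation.
    """
    names = list(grid)
    runs = []
    for values in itertools.product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        runs.append((run_lf_scenario(params), params))
    runs.sort(key=lambda r: r[0])          # smallest min-TTC first = riskiest first
    return runs

# Example grid (values are illustrative assumptions):
grid = {
    "relative_speed_mps": [2, 5, 8, 11],
    "initial_gap_m":      [10, 20, 40],
    "occlusion_level":    [0.0, 0.4, 0.8],
}
# top_for_hf = [params for _, params in lf_sweep(my_lf_runner, grid)[:10]]
</code>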