In most cases, the generic V&V process must grapple with massive ODD spaces, limited execution capacity, and a high cost of evaluation. Further, all of this must be done quickly enough to bring the product to market in a timely manner. Traditionally, V&V regimes have been bifurcated into two broad categories: Physics-Based and Decision-Based. We discuss the key characteristics of each below.
A. TRADITIONAL PHYSICS-BASED EXECUTION
For MaVV, the critical factors are the efficiency of the MiVV “engine” and the argument for the completeness of the validation. Historically, mechanical/non-digital products (such as cars or airplanes) required sophisticated V&V. These systems are examples of a broader class of products that follow a Physics-Based Execution (PBE) paradigm. In this paradigm, the underlying model execution (including real life) has the characteristics of continuity and monotonicity because the model operates in the world of physics. This key insight has enormous implications for V&V because it greatly constrains the potential state-space to be explored. Examples of this reduction of state-space include: 1) Scenario Generation: one need only worry about the state space constrained by the laws of physics. Thus, objects that defy gravity cannot exist; every actor is explicitly constrained by physical law. 2) Monotonicity: in many interesting dimensions, there are strong properties of monotonicity. As an example, if one is considering stopping distance for braking, there is a critical speed above which there will be an accident. Critically, all the speed bins below this critical speed are safe and do not have to be explored (a sketch of this pruning appears at the end of this subsection).
Mechanically, in traditional PBE fields, the philosophy of safety regulation (ISO 26262 [5], AS9100 [6], etc.) builds the safety framework as a process in which 1) failure mechanisms are identified; 2) a test and safety argument is built to address each failure mechanism; and 3) an active process by a regulator (or documentation for self-regulation) evaluates these two and acts as a judge to approve or decline. Traditionally, the faults considered were primarily mechanical failures. As an example, the flow for validating the braking system in an automobile under ISO 26262 would have the following steps:
1) Define Safety Goals and Requirements (Concept Phase): Perform a Hazard Analysis and Risk Assessment (HARA) to identify potential hazards related to the braking system (e.g., failure to stop the vehicle, uncommanded braking); assess risk levels using parameters such as severity, exposure, and controllability; assign an Automotive Safety Integrity Level (ASIL) to each hazard (ranging from ASIL A to ASIL D, where D is the most stringent); and define safety goals to mitigate the hazards (e.g., ensure sufficient braking under all conditions). A sketch of the ASIL assignment step appears after this list.
2) Develop the Functional Safety Concept: Translate the safety goals into high-level safety requirements for the braking system, and ensure that redundancy, diagnostics, and fail-safe mechanisms are incorporated (e.g., dual-circuit braking or electronic monitoring).
3) System Design and Technical Safety Concept: Break down the functional safety requirements into technical requirements, and design the braking system with safety mechanisms in hardware (e.g., sensors, actuators) and software (e.g., anti-lock braking algorithms). Implement failure detection and mitigation strategies (e.g., failover to mechanical braking if electronic control fails).
4) Hardware and Software Development: Hardware Safety Analysis (HSA) validates that components meet safety standards (e.g., reliable braking sensors); software development and validation follow ISO 26262-compliant processes for coding, verification, and validation, testing braking algorithms under various conditions.
5) Integration and Testing: Verify individual components and subsystems against the technical safety requirements, and conduct integration testing of the complete braking system, focusing on functional tests (e.g., stopping distance), safety tests (e.g., behavior under fault conditions), and stress and environmental tests (e.g., heat, vibration).
6) Validation (Vehicle Level): Validate the braking system against the safety goals defined in the concept phase; perform real-world driving scenarios, edge cases, and fault-injection tests to confirm safe operation; and verify compliance with ASIL-specific requirements.
7) Production, Operation, and Maintenance: Ensure production aligns with the validated design, implement operational safety measures (e.g., periodic diagnostics, maintenance), and monitor and address safety issues during the product's lifecycle (e.g., software updates).
8) Confirmation and Audit: Use independent confirmation measures (e.g., safety audits, assessment reviews) to confirm that the braking system complies with ISO 26262.
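To make the HARA/ASIL step concrete, the following is a minimal sketch in Python of how a severity/exposure/controllability classification maps to an ASIL. The hazard entries are hypothetical, and the mapping uses the sum-of-classes shortcut commonly used to reproduce the ISO 26262 Part 3 risk-graph table; it is an illustration, not a substitute for the standard's tables.
# Illustrative only: ASIL assignment from Severity (S1-S3), Exposure (E1-E4),
# and Controllability (C1-C3) classes. Hazard entries are hypothetical.
def assign_asil(severity: int, exposure: int, controllability: int) -> str:
    """Return 'QM', 'ASIL A', 'ASIL B', 'ASIL C', or 'ASIL D'."""
    assert 1 <= severity <= 3 and 1 <= exposure <= 4 and 1 <= controllability <= 3
    total = severity + exposure + controllability
    if total <= 6:
        return "QM"                      # quality-managed, no ASIL required
    return "ASIL " + "ABCD"[total - 7]   # 7 -> A, 8 -> B, 9 -> C, 10 -> D

# Hypothetical braking-system hazards classified as (S, E, C):
hazards = {
    "loss of braking at highway speed": (3, 4, 3),      # -> ASIL D
    "uncommanded light braking in traffic": (2, 3, 2),   # -> ASIL A
}
for hazard, (s, e, c) in hazards.items():
    print(f"{hazard}: {assign_asil(s, e, c)}")
The mechanical part is only the table lookup; in practice the S/E/C classification itself is a documented engineering judgment reviewed as part of the safety case.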
Finally, the regulations embody a strong notion of safety levels through Automotive Safety Integrity Levels (ASILs). Airborne systems follow a similar trajectory (pun intended) with the concept of Design Assurance Levels (DALs). A key part of the V&V task is to meet the standards required at each ASIL level. Historically, a sophisticated set of V&V techniques has been developed to verify traditional automotive systems. These techniques include well-structured physical tests, often validated by regulators or by sanctioned independent companies (e.g., TUV-Sud [7]). Over the years, the use of virtual physics-based models has increased for design tasks such as body design [8] or tire performance [9]. The general structure of these models is to build a simulation that is predictive of the underlying physics in order to enable broader ODD exploration. This creates a very important flow of characterization, model generation, predictive execution, and correction. Further, because the underlying physics must be simulated faithfully, virtual simulators can have limited performance and often require extensive hardware support for simulation acceleration. In summary, the key underpinnings of the PBE paradigm from a V&V point of view are: 1) a constrained and well-behaved space for scenario test generation (sketched below); 2) expensive physics-based simulations; 3) regulations focused on mechanical failure; and 4) in safety situations, regulations focused on a process for demonstrating safety, with the key idea of design assurance levels.
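To illustrate point 1), the following is a minimal sketch of how monotonicity lets a test plan prune the speed dimension of the braking example. The parameters and the point-mass braking model are hypothetical: because stopping distance grows monotonically with speed, a binary search locates the critical speed, and every speed bin below it needs no further exploration.
# Illustrative only: exploiting monotonicity of stopping distance vs. speed to
# prune the test space. Point-mass model s = v^2 / (2 * mu * g); parameters are
# assumed, not taken from any standard or real vehicle.
MU = 0.7           # assumed tire-road friction coefficient
G = 9.81           # gravitational acceleration, m/s^2
AVAILABLE = 60.0   # assumed available stopping distance in the scenario, meters

def stopping_distance(speed_mps: float) -> float:
    """Idealized braking distance for an initial speed in m/s."""
    return speed_mps ** 2 / (2 * MU * G)

def critical_speed(lo: float = 0.0, hi: float = 100.0, tol: float = 0.01) -> float:
    """Binary search for the highest speed that still stops within AVAILABLE.
    Valid because stopping_distance is monotonically increasing in speed."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if stopping_distance(mid) <= AVAILABLE:
            lo = mid   # safe: every speed below mid is also safe
        else:
            hi = mid   # unsafe: every speed above mid is also unsafe
    return lo

v_crit = critical_speed()
print(f"Critical speed ~ {v_crit:.1f} m/s; speed bins below this need no testing.")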
B. TRADITIONAL DECISION-BASED EXECUTION
As cyber-physical systems evolved, information technology (IT) rapidly transformed the world. Electronics design trends revolutionized industries, starting with centralized computing led by firms such as IBM and DEC. These technologies enhanced productivity for global business operations, significantly impacting finance, HR, and administrative functions and eliminating the need for extensive paperwork.
Fig. 3. Electronics Megatrends.
The next wave of economy-shaping technologies consisted of edge computing devices (red in Figure 3) such as personal computers, cell phones, and tablets. With this capability, companies such as Apple, Amazon, Facebook, Google, and others could add enormous productivity to the advertising and distribution functions of global business. Suddenly, one could directly reach any customer anywhere in the world. This mega-trend has fundamentally disrupted markets such as education (online learning), retail (e-commerce), entertainment (streaming), commercial real estate (virtualization), health (telemedicine), and more. The next wave of electronics is dynamic integration with physical assets, ultimately enabling autonomy.
Fig. 4. Progression of System Specification (HW, SW, AI).
As shown in Figure 4, within electronics there has been a progression in how system function is constructed. The first stage was hardware or pseudo-hardware (FPGA, microcode). The next stage involved the invention of a processor architecture upon which software could imprint system function. Software was a design artifact written by humans in standard languages (C, Python, etc.). The revolutionary aspect of the processor abstraction was that function could shift without the need to shift physical assets. However, one needed legions of programmers to build the software. Today, the big breakthrough with Artificial Intelligence (AI) is the ability to build software from the combination of underlying models, data, and metrics.
In their basic form, IT systems were not safety critical, and similar levels of legal liability have not attached to IT products. However, the size and growth of IT is such that problems in large-volume consumer products can have catastrophic economic consequences [10]. Thus, the V&V function remained very important. IT systems follow the same generic processes for V&V as outlined above, but with two significant differences: the execution paradigm and the source of errors. First, unlike the PBE paradigm, IT follows a Decision-Based Execution (DBE) paradigm. That is, there are no natural constraints on the functional behavior of the underlying model, and no inherent properties of monotonicity. Thus, the whole massive ODD space must be explored, which makes the job of generating tests and demonstrating coverage extremely difficult. To counter this difficulty, a series of processes has been developed to build a more robust V&V structure. These include: 1) Code Coverage: the structural specification of the virtual model is used as a constraint to help drive the test generation process. This is done for software or for hardware (RTL code); a sketch of coverage-driven test generation follows this list. 2) Structured Testing: a process of component, subsystem, and integration testing has been developed to minimize the propagation of errors. 3) Design Reviews: structured design reviews against specifications and code are considered best practice.
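To illustrate item 1), the following is a minimal sketch of coverage-driven test generation. The function under test and its branch labels are hypothetical: pseudo-random inputs are generated until every branch of the implementation has been exercised, so the code structure itself serves as the stopping criterion.
# Illustrative only: branch coverage used to drive pseudo-random test generation.
# The decision logic and branch labels below are hypothetical.
import random

covered = set()  # branch labels hit so far

def select_gear(speed: float) -> int:
    """Hypothetical decision logic under test, instrumented with branch labels."""
    if speed < 0:
        covered.add("reverse"); return -1
    elif speed < 20:
        covered.add("low"); return 1
    elif speed < 60:
        covered.add("mid"); return 2
    else:
        covered.add("high"); return 3

ALL_BRANCHES = {"reverse", "low", "mid", "high"}

random.seed(0)  # reproducible test run
tests = 0
while covered != ALL_BRANCHES and tests < 10_000:
    select_gear(random.uniform(-10, 100))  # pseudo-random stimulus
    tests += 1

print(f"Branch coverage {len(covered)}/{len(ALL_BRANCHES)} after {tests} tests")
In production flows the instrumentation is provided by coverage tooling rather than hand-written labels, but the principle is the same: unexercised structure directs where the next tests must go.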
A good example of this process flow is the CMU Capability Maturity Model Integration (CMMI) [11], which defines a set of processes to deliver quality software. Large parts of the CMMI architecture can be reused for AI when AI replaces existing SW components. Finally, testing in the DBE domain decomposes into the following philosophical categories: “known knowns” (bugs or issues that are identified and understood), “known unknowns” (potential risks or issues that are anticipated but whose exact nature or cause is unclear), and “unknown unknowns” (completely unanticipated issues that emerge without warning, often highlighting gaps in design, understanding, or testing). The last category is the most problematic and the most significant for DBE V&V. Pseudo-random test generation has been a key technique for exposing this category [12]; a sketch appears below. In summary, the key underpinnings of the DBE paradigm from a V&V point of view are: 1) an unconstrained and poorly behaved execution space for scenario test generation; 2) generally less expensive simulation execution (no physical laws to simulate); 3) V&V focused on logical errors rather than mechanical failure; 4) generally no defined regulatory process for safety-critical applications (most software is “best efforts”); and 5) “unknown unknowns” as a key focus of validation.
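The following is a minimal sketch of pseudo-random test generation aimed at unknown unknowns. The BoundedBuffer class and its invariant are hypothetical: a seeded random sequence of operations is applied to the implementation and an invariant is checked after every step, so a violated assumption surfaces even if no one anticipated the triggering sequence.
# Illustrative only: seeded pseudo-random operation sequences checked against an
# invariant. The BoundedBuffer component and its invariant are hypothetical.
import random

class BoundedBuffer:
    """Toy fixed-capacity buffer standing in for a component under test."""
    def __init__(self, capacity: int):
        self.capacity = capacity
        self.items = []

    def push(self, x):
        if len(self.items) < self.capacity:
            self.items.append(x)

    def pop(self):
        return self.items.pop(0) if self.items else None

def invariant_holds(buf: BoundedBuffer) -> bool:
    # The property believed to hold for every possible input sequence.
    return 0 <= len(buf.items) <= buf.capacity

random.seed(42)  # fixed seed so any failure is reproducible
buf = BoundedBuffer(capacity=8)
for step in range(100_000):
    if random.choice(["push", "pop"]) == "push":
        buf.push(random.randint(0, 255))
    else:
        buf.pop()
    assert invariant_holds(buf), f"invariant violated at step {step}"

print("100000 pseudo-random operations completed without an invariant violation")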
A key implication of the DBE space is that the PBE practice of enumerating a list of faults and constructing a safety argument for each is antithetical to the focus of DBE validation, which must instead target the unknown unknowns.