The list of book contributors is presented below.
This content was implemented under the following project:
Consortium Partners
Erasmus+ Disclaimer
This project has been funded with support from the European Commission.
This publication reflects the views only of the author, and the Commission cannot be held responsible for any use which may be made of the information contained therein.
Copyright Notice
This content was created by the SafeAV Consortium 2024–2027.
The content is copyrighted and distributed under the Creative Commons CC BY-NC licence and is free for non-commercial use.
In case of commercial use, please get in touch with a SafeAV Consortium representative.
The book comprises a comprehensive guide for a variety of education levels. A brief classification of the contents with respect to target groups may help in selective reading of the book and make it easier to find the chapters appropriate for the desired education level. To inform the reader about the proposed target group, icons are assigned to the top headers of the chapters. The list of icons and their mapping to the target groups is presented in Table 1.
Follow those subchapters for more content:
In society, products operate within the confines of a legal governance structure. Whatever value products provide to their consumers is weighed against the potential harm caused by the product, and this leads to the concept of legal product liability. While laws diverge across geographies, the fundamental tenets have the key elements of expectation and harm. Expectation, as judged by "reasonable behavior given a totality of the facts," attaches liability. As an example, the clear expectation is that if you stand in front of a train, it cannot stop instantly, while this is not the expectation for most autonomous driving situations. Harm is another key concept: AI recommendation systems for movies are not held to the same standards as autonomous vehicles. The governance framework for liability is mechanically developed through legislative actions and associated regulations. The framework is tested in the court system under the particular circumstances or facts of each case. To provide stability to the system, the database of cases and decisions is viewed as a whole under the concept of precedent. Clarification on legal points is provided by the appellate legal system, where arguments on the application of the law are decided, and this is what sets precedent [1,2].
From a product development perspective, the combination of laws, regulations, and legal precedent forms the overriding governance framework around which the system specification must be constructed [3]. The process of validation ensures that a product design meets the user's needs and requirements, and verification ensures that the product is built correctly according to design specifications.
Fig. 1. V&V and Governance Framework.

The Master V&V (MaVV) process needs to demonstrate that the product has been reasonably tested given the reasonable expectation of causing harm. It does so using three important concepts [4]: 1) Operational Design Domain (ODD): This defines the environmental conditions and operational model under which the product is designed to work. 2) Coverage: This defines the completeness over the ODD to which the product has been validated. 3) Field Response: When failures do occur, the procedures used to correct product design shortcomings to prevent future harm. As figure 1 shows, the Verification & Validation (V&V) process is the key input into the governance structure which attaches liability, and per the governance structure, each of the elements must show "reasonable due diligence." An example of an unreasonable ODD would be for an autonomous vehicle to give up control a millisecond before an accident.
Mechanically, MaVV is implemented with a Minor V&V (MiVV) process consisting of: 1) Test Generation: From the allowed ODD, test scenarios are generated. 2) Execution: The test is "executed" on the product under development; mathematically, this is a functional transformation which produces results. 3) Criteria for Correctness: The results of the execution are evaluated for success or failure against a crisp criterion for correctness. In practice, each of these steps can have quite a bit of complexity and associated cost. Since the ODD can be a very wide state space, intelligently and efficiently generating the stimulus is critical. Typically, in the beginning, stimulus generation is done manually, but this quickly fails the efficiency test in terms of scaling. In virtual execution environments, pseudo-random directed methods are used to accelerate this process. In limited situations, symbolic or formal methods can be used to mathematically carry large state spaces through the whole design execution phase. Symbolic methods have the advantage of completeness but face computational explosion issues, as many of the operations are NP-complete. The execution stage can be done physically, but this process is expensive, slow, has limited controllability and observability, and, in safety-critical situations, is potentially dangerous. In contrast, virtual methods have the advantages of cost, speed, ultimate controllability and observability, and no safety issues. Virtual methods also have the great advantage of performing the V&V task well before the physical product is constructed. This leads to the classic V chart shown in figure 1. However, since virtual methods are a model of reality, they introduce inaccuracy into the testing domain, while physical methods are accurate by definition. Finally, one can intermix virtual and physical methods with concepts such as Software-in-the-Loop or Hardware-in-the-Loop.

The observable results of the stimulus generation are captured to determine correctness. Correctness is typically defined by either a golden model or an anti-model. The golden model, typically virtual, offers an independently verified model whose results can be compared to the product under test. Even in this situation, there is typically a divergence between the abstraction level of the golden model and the product, which must be managed. Golden-model methods are often used in computer architectures (e.g., ARM, RISC-V). The anti-model situation consists of error states which the product cannot enter, and thus the correct behavior is the state space outside of the error states. An example in the autonomous vehicle space might be an error state such as an accident or the violation of any number of other constraints. The MaVV consists of building a database of the various explorations of the ODD state space and, from that, building an argument for completeness. The argument typically takes the form of a probabilistic analysis. After the product is in the field, field returns are diagnosed, and one must always ask the question: why did my original process not catch this issue? Once found, the test methodology is updated with fixes to prevent such issues going forward.
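To make the MiVV loop concrete, below is a minimal sketch in Python, assuming a toy braking scenario as the ODD sample, a placeholder constant-deceleration vehicle model as the execution step, and an anti-model criterion (no collision); all function names, parameter ranges, and values are illustrative, not taken from any standard.

```python
import random

def generate_scenario(rng):
    """Test generation: sample a scenario from a (toy) ODD."""
    return {
        "ego_speed_mps": rng.uniform(5.0, 40.0),        # ODD: 5-40 m/s
        "obstacle_distance_m": rng.uniform(10.0, 150.0),
        "road_friction": rng.uniform(0.3, 1.0),          # wet to dry
    }

def execute(scenario):
    """Execution: a placeholder vehicle model (reaction time + constant-deceleration braking)."""
    reaction_time_s = 1.0
    decel = 7.0 * scenario["road_friction"]              # crude braking model
    v = scenario["ego_speed_mps"]
    stopping_distance = v * reaction_time_s + v ** 2 / (2 * decel)
    return {"stopping_distance_m": stopping_distance}

def is_correct(scenario, result):
    """Criteria for correctness: an anti-model -- never enter the 'collision' error state."""
    return result["stopping_distance_m"] < scenario["obstacle_distance_m"]

rng = random.Random(42)                                   # seeded for reproducibility
failures = [s for s in (generate_scenario(rng) for _ in range(1000))
            if not is_correct(s, execute(s))]
print(f"{len(failures)} failing scenarios out of 1000")
```

In a real flow, the execution step would be a full vehicle or software simulation and the correctness criteria would come from the governance-driven specification, but the three-step structure stays the same.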
In most cases, the generic V&V process must grapple with massive ODD spaces, limited execution capacity, and high cost of evaluation. Further, all of this must be done in a timely manner to make the product available to the marketplace. Traditionally, the V&V regimes have been bifurcated into two broad categories: Physics-Based and Decision-Based. We will discuss the key characteristics of each now.
A. TRADITIONAL PHYSICS-BASED EXECUTION
For MaVV, the critical factors are the efficiency of the MiVV "engine" and the argument for the completeness of the validation. Historically, mechanical/non-digital products (such as cars or airplanes) required sophisticated V&V. These systems were examples of a broader class of products which had a Physics-Based Execution (PBE) paradigm. In this paradigm, the underlying model execution (including real life) has the characteristics of continuity and monotonicity because the model operates in the world of physics. This key insight has enormous implications for V&V because it greatly constrains the potential state space to be explored. Examples of this reduction of state space include: 1) Scenario Generation: One need only worry about the state space constrained by the laws of physics. Thus, objects which defy gravity cannot exist. Every actor is explicitly constrained by the laws of physics. 2) Monotonicity: In many interesting dimensions, there are strong properties of monotonicity. As an example, if one is considering stopping distance for braking, there is a critical speed above which there will be an accident. Critically, all the speed bins below this critical speed are safe and do not have to be explored (see the sketch at the end of this subsection). Mechanically, in traditional PBE fields, the philosophy of safety regulation (ISO 26262 [5], AS9100 [6], etc.) builds the safety framework as a process, where 1) failure mechanisms are identified; 2) a test and safety argument is built to address the failure mechanism; 3) there is an active process by a regulator (or documentation for self-regulation) which evaluates these two and acts as a judge to approve or decline. Traditionally, the faults considered were primarily mechanical failures. As an example, the flow for validating the braking system in an automobile through ISO 26262 would have the following steps: 1) Define Safety Goals and Requirements (Concept Phase): Hazard Analysis and Risk Assessment (HARA): Identify potential hazards related to the braking system (e.g., failure to stop the vehicle, uncommanded braking). Assess risk levels using parameters like severity, exposure, and controllability. Define Automotive Safety Integrity Levels (ASIL) for each hazard (ranging from ASIL A to ASIL D, where D is the most stringent; a simplified sketch of this assignment follows the flow below). Define safety goals to mitigate hazards (e.g., ensure sufficient braking under all conditions). 2) Develop Functional Safety Concept: Translate safety goals into high-level safety requirements for the braking system. Ensure redundancy, diagnostics, and fail-safe mechanisms are incorporated (e.g., dual-circuit braking or electronic monitoring). 3) System Design and Technical Safety Concept: Break down functional safety requirements into technical requirements and design the braking system with safety mechanisms in hardware (e.g., sensors, actuators) and software (e.g., anti-lock braking algorithms). Implement failure detection and mitigation strategies (e.g., failover to mechanical braking if electronic control fails). 4) Hardware and Software Development: Hardware Safety Analysis (HSA): Validate that components meet safety standards (e.g., reliable braking sensors). Software Development and Validation: Use ISO 26262-compliant processes for coding, verification, and validation. Test braking algorithms under various conditions. 5) Integration and Testing: Perform verification of individual components and subsystems to ensure they meet technical safety requirements.
Conduct integration testing of the complete braking system, focusing on: Functional tests (e.g., stopping distance), Safety tests (e.g., behavior under fault conditions), and Stress and environmental tests (e.g., heat, vibration). 6) Validation (Vehicle Level): Validate the braking system against safety goals defined in the concept phase. Perform real-world driving scenarios, edge cases, and fault injection tests to confirm safe operation. Verify compliance with ASIL-specific requirements. 7) Production, Operation, and Maintenance: Ensure production aligns with validated designs, implement operational safety measures (e.g., periodic diagnostics, maintenance), monitor and address safety issues during the product's lifecycle (e.g., software updates). 8) Confirmation and Audit: Use independent confirmation measures (e.g., safety audits, assessment reviews) to ensure the braking system complies with ISO 26262.
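As a rough illustration of the HARA step in the flow above, the following sketch maps severity (S), exposure (E), and controllability (C) classes to an ASIL using the common summation shortcut for the ISO 26262 risk graph; it is a teaching simplification, and the class values in the example are assumptions, not an official classification.

```python
def asil(severity: int, exposure: int, controllability: int) -> str:
    """Simplified ASIL determination for S1-S3, E1-E4, C1-C3.

    Uses the shortcut that the ISO 26262 risk graph is equivalent to summing
    the class indices: 10 -> ASIL D, 9 -> C, 8 -> B, 7 -> A, otherwise QM.
    S0, E0 or C0 hazards (which drop straight to QM) are not modelled here.
    """
    if not (1 <= severity <= 3 and 1 <= exposure <= 4 and 1 <= controllability <= 3):
        raise ValueError("class indices out of range")
    total = severity + exposure + controllability
    return {10: "ASIL D", 9: "ASIL C", 8: "ASIL B", 7: "ASIL A"}.get(total, "QM")

# Example (assumed classification): total loss of braking at highway speed --
# life-threatening (S3), high probability of exposure (E4), hard to control (C3).
print(asil(3, 4, 3))   # -> ASIL D
```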
Finally, the regulations have a strong notion of safety levels with Automotive Safety Integrity Levels (ASIL). Airborne systems follow a similar trajectory (pun intended) with the concept of Design Assurance Levels (DALs). A key part of the V&V task is to meet the standards required at each ASIL level. Historically, a sophisticated set of V&V techniques has been developed to verify traditional automotive systems. These techniques included well-structured physical tests, often validated by regulators or sanctioned independent companies (e.g., TÜV SÜD [7]). Over the years, the use of virtual physics-based models has increased to support design tasks such as body design [8] or tire performance [9]. The general structure of these models is to build a simulation which is predictive of the underlying physics in order to enable broader ODD exploration. This creates a very important characterization, model generation, predictive execution, and correction flow. Finally, because the execution is highly constrained by physics, virtual simulators can have limited performance and often require extensive hardware support for simulation acceleration. In summary, the key underpinnings of the PBE paradigm from a V&V point of view are: 1) a constrained and well-behaved space for scenario test generation, 2) expensive physics-based simulations, 3) regulations focused on mechanical failure, and 4) in safety situations, regulations focused on a process to demonstrate safety with the key idea of design assurance levels.
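To illustrate the monotonicity argument referenced earlier in this subsection, the sketch below bisects over speed for a toy braking model: once the critical speed is found, every speed bin below it is known to be safe and need not be simulated. The braking model and all its parameters are illustrative assumptions.

```python
def stops_in_time(speed_mps: float, gap_m: float = 60.0,
                  reaction_s: float = 1.0, decel_mps2: float = 7.0) -> bool:
    """Toy, monotone safety predicate: stopping distance grows with speed."""
    stopping = speed_mps * reaction_s + speed_mps ** 2 / (2 * decel_mps2)
    return stopping < gap_m

def critical_speed(lo: float = 0.0, hi: float = 60.0, tol: float = 0.01) -> float:
    """Bisection over speed: monotonicity means one boundary characterises the whole axis."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if stops_in_time(mid):
            lo = mid          # still safe -- all lower speeds are safe too
        else:
            hi = mid          # unsafe -- all higher speeds are unsafe too
    return lo

print(f"Critical speed ~ {critical_speed():.1f} m/s; speed bins below it need no further testing")
```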
B. TRADITIONAL DECISION-BASED EXECUTION
As cyber-physical systems evolved, information technology (IT) rapidly transformed the world. Electronics design trends revolutionized industries, starting with centralized computing led by firms like IBM and DEC. These technologies enhanced productivity for global business operations, significantly impacting finance, HR, and administrative functions, eliminating the need for extensive paperwork.
Fig. 3. Electronics Megatrends.
The next wave of economy-shaping technologies consisted of edge computing devices (red in Figure 3) such as personal computers, cell phones, and tablets. With this capability, companies such as Apple, Amazon, Facebook, Google, and others could add enormous productivity to the advertising and distribution functions of global business. Suddenly, one could directly reach any customer anywhere in the world. This mega-trend has fundamentally disrupted markets such as education (online), retail (e-commerce), entertainment (streaming), commercial real estate (virtualization), health (telemedicine), and more. The next wave of electronics is the dynamic integration with physical assets, thus even enabling autonomy.
Fig. 4. Progression of System Specification (HW, SW, AI).
As shown in Figure 4, within electronics there has been a progression of system function construction where the first stage was hardware or pseudo-hardware (FPGA, microcode). The next stage involved the invention of a processor architecture upon which software could imprint system function. Software was a design artifact written by humans in standard languages (C, Python, etc.). The revolutionary aspect of the processor abstraction was that it allowed a shift in function without the need to shift physical assets. However, one needed legions of programmers to build the software. Today, the big breakthrough with Artificial Intelligence (AI) is the ability to build software from the combination of underlying models, data, and metrics. In their basic form, IT systems were not safety critical, and similar levels of legal liability have not attached to IT products. However, the size and growth of IT is such that problems in large-volume consumer products can have catastrophic economic consequences [10]. Thus, the V&V function remains very important. IT systems follow the same generic processes for V&V as outlined above, but with two significant differences around the execution paradigm and the source of errors. First, unlike the PBE paradigm, the execution paradigm of IT follows a Decision-Based Execution (DBE) mode. That is, there are no natural constraints on the functional behavior of the underlying model, and no inherent properties of monotonicity. Thus, the whole massive ODD space must be explored, which makes the job of generating tests and demonstrating coverage extremely difficult. To counter this difficulty, a series of processes has been developed to build a more robust V&V structure. These include: 1) Code Coverage: Here, the structural specification of the virtual model is used as a constraint to help drive the test generation process. This is done for software or hardware (RTL code). 2) Structured Testing: A process of component, subsection, and integration testing has been developed to minimize the propagation of errors. 3) Design Reviews: Structured design reviews with specs and code are considered best practice.
A good example of this process flow is the CMU Capability Maturity Model Integration (CMMI) [11], which defines a set of processes to deliver quality software. Large parts of the CMMI architecture can be used for AI when AI is replacing existing SW components. Finally, testing in the DBE domain decomposes into the following philosophical categories: "known knowns," bugs or issues that are identified and understood; "known unknowns," potential risks or issues that are anticipated but whose exact nature or cause is unclear; and "unknown unknowns," completely unanticipated issues that emerge without warning, often highlighting gaps in design, understanding, or testing. The last category is the most problematic and the most significant for DBE V&V. Pseudo-random test generation has been a key technique used to expose this category [12]. In summary, the key underpinnings of the DBE paradigm from a V&V point of view are: 1) an unconstrained and not well-behaved execution space for scenario test generation, 2) generally less expensive simulation execution (no physical laws to simulate), 3) V&V focused on logical errors rather than mechanical failure, 4) generally no defined regulatory process for safety-critical applications (most software is "best efforts"), and 5) "unknown unknowns" as a key focus of validation.
A key implication of the DBE space is that the idea from the PBE world of building a list of faults and building a safety argument for them is antithetical to the focus of DBE validation.
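As a deliberately tiny illustration of pseudo-random test generation in the DBE world, the sketch below fuzzes a pure decision function against an anti-model property; the function, its planted corner-case bug, and the property are all invented for illustration.

```python
import random

def select_gear(speed_kmh: float, throttle: float) -> int:
    """A small decision-based function with a deliberately planted 'unknown unknown'."""
    if speed_kmh < 0 or not (0.0 <= throttle <= 1.0):
        raise ValueError("input outside ODD")
    gear = min(6, 1 + int(speed_kmh // 30))
    if 89.5 < speed_kmh < 90.5 and throttle > 0.95:     # the planted corner-case bug
        gear = 0
    return gear

def property_holds(speed_kmh: float, throttle: float) -> bool:
    """Anti-model: the gear must always be in 1..6 inside the ODD."""
    return 1 <= select_gear(speed_kmh, throttle) <= 6

rng = random.Random(7)                                   # seed makes failures reproducible
for i in range(200_000):
    speed, throttle = rng.uniform(0, 200), rng.random()
    if not property_holds(speed, throttle):
        print(f"counterexample after {i} tests: speed={speed:.2f}, throttle={throttle:.2f}")
        break
```

No structural knowledge of the bug is needed; sheer pseudo-random volume, replayable from the seed, is what eventually exposes the unknown unknown.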
The fundamental characteristics of DBE systems are problematic in safety-critical systems. However, the IT sector has been a key megatrend which has transformed the world over the last 50 years. In the process, it has developed large ecosystems around semiconductors, operating systems, communications, and application software. At this point, using these ecosystems is critical to nearly every product's success, so mixed-domain safety-critical products are now a reality. Mixed-domain structures can be classified into three broad paradigms, each of which has very different V&V requirements: Mechanical Replacement (big PBE, small DBE), Electronics Adjacent (separate PBE and DBE), and Autonomy (big DBE, small PBE). Drive-by-wire functionality is an example of the mechanical replacement paradigm, where the implementation of the original mechanical functionality is done by electronic components (HW/SW). In their initial configurations, these mixed electronic/mechanical systems were physically separated as independent subsystems. In this configuration, the V&V process looked very similar to the traditional mechanical verification process. Regulations were updated to include the idea of electronics failure with standards such as SOTIF (see Table I).

TABLE I. DIFFERENCES BETWEEN SOTIF AND ISO 26262
Aspect        | ISO 26262                      | SOTIF
Focus         | System faults and malfunctions | Hazards due to functional insufficiencies
Applicability | All safety-critical systems    | Primarily ADAS and autonomous systems
Hazard Source | Hardware and software failure  | Limitations in functionality, unknown scenarios
Methods       | Fault avoidance and control    | Scenario-based testing
The paradigm of separate physical subsystems has the advantages of V&V simplification and safety, but the large disadvantages of component skew and material cost. Thus, a large trend has been to build underlying computational fabrics with networking and virtually separated functionality. From a V&V perspective, this means that the virtual backbone which maintains this separation (e.g., an RTOS) must be verified to a very high standard. Infotainment systems are an example of Electronics Adjacent integration. Generally, there is an independent IT infrastructure working alongside the safety-critical infrastructure, and from a V&V perspective, the two can be validated separately. However, the presence of infotainment systems enables very powerful communication technologies (5G, Bluetooth, etc.) through which the cyber-physical system can be impacted by external third parties. From a safety perspective, the simplest method for maintaining safety would be to physically separate these systems. However, this is not typically done because a connection is required to provide "over-the-air" updates to the device. Thus, the V&V capability must again verify that the virtual safeguards against malicious intent are robust. Finally, the last level of integration is in the context of autonomy. In autonomy, the DBE processes of sensing, perception, location services, and path planning envelop the traditional mechanical PBE functionality. As Figure 5 shows, the execution paradigm consists of four layers of functionality. The inner core, layer 4, is of course the world of physics, which has all the nice PBE properties. Layer 3 consists of the traditional actuation and edge-sensing functionality, which maintains nice PBE properties. As we go to layer 2, there is a combination of software and AI which operates in the DBE-AI world. Finally, the outer design-of-experiments and V&V layer has the unique challenge of testing a system with fundamentally PBE properties while doing so through a layer dominated by DBE-AI functions.
Fig. 5. Conceptual Layers in Cyber-Physical Systems
V. AUTONOMY V&V CURRENT APPROACHES

For safety-critical systems, the evolution of V&V has been closely linked to regulatory standards frameworks such as ISO 26262. Key elements of this framework include: 1) System Design Process: A structured development assurance approach for complex systems, incorporating safety certification within the integrated development process. 2) Formalization: The formal definition of system operating conditions, functionalities, expected behaviors, risks, and hazards that must be mitigated. 3) Lifecycle Management: The management of components, systems, and development processes throughout their lifecycle. The primary objective was to meticulously and formally define the system design, anticipate expected behaviors and potential issues, and comprehend the impact over the product's lifespan. With the advent of conventional software paradigms, safety-critical V&V adapted by preserving the original system design approach while integrating software as system components. These software components maintained the same overall structure of fault analysis, lifecycle management, and hazard analysis within system design. However, certain aspects required extension. For instance, in the airborne domain, standard DO-178C, which addresses "Software Considerations in Airborne Systems and Equipment Certification," updated the concept of hazard from physical failure mechanisms to functional defects, acknowledging that software does not degrade due to physical processes. Also revised were lifecycle management concepts, reflecting traditional software development practices. Design Assurance Levels (DALs) were incorporated, allowing the integration of software components into system design, functional allocation, performance specification, and the V&V process, akin to SOTIF in the automotive industry.

TABLE II. CONTRAST OF CONVENTIONAL AND MACHINE LEARNING ALGORITHMS
Conventional Algorithms | ML Algorithms | Comment
Logical theory | No theory | In conventional algorithms, one needs a theory of operation to implement the solution. ML algorithms can often "work" without a clear understanding of exactly why they work.
Analyzable | Not analyzable | Conventional algorithms are encoded in a way that one can see and analyze the software code, and most validation and verification methodologies rely on this ability to find errors. ML algorithms offer no such ability, and this leaves a large gap in validation.
Causal | Correlation | Conventional algorithms have built-in causality, while ML algorithms discover correlations. The difference is important if one wants to reason at a higher level.
Deterministic | Non-deterministic | Conventional algorithms are deterministic in nature, while ML algorithms are fundamentally probabilistic in nature.
Known computational complexity | Unknown computational complexity | Given the analyzable nature of conventional algorithms, one can build a model for computational complexity, that is, how long the algorithm will take to run. For ML techniques, no generic method exists to evaluate computational complexity.
Moving beyond software, AI has built a "learning" paradigm. In this paradigm, there is a period of training during which the AI machine "learns" from data to build its own rules; in this case, learning is defined on top of traditional optimization algorithms which try to minimize some notion of error. This effectively is data-driven software development. However, as Table II above shows, there are profound differences between AI software and conventional software. These differences have generated three "elephant in the room" issues: AI component validation, AI specification, and intelligent scaling.
A. AI COMPONENT VALIDATION
Both the automotive and airborne spaces have reacted to AI by viewing it as "specialized software" in standards such as ISO 8800 [14] and [13]. This approach has the great utility of leveraging all the past work in generic mechanical safety and the past work in software validation. However, one must now manage the issue of how to handle the fact that we have data-generated "code" versus conventional programming code. In the world of V&V, this difference is manifested in three significant aspects: coverage analysis, code reviews, and version control.

TABLE III. V&V TECHNIQUES: SOFTWARE VERSUS AI/ML
V&V Technique     | Software                                      | AI/ML
Coverage analysis | Code structure provides the basis of coverage | No structure
Code reviews      | Crowd-source expert knowledge                 | No code to review
Version control   | Careful construction/release                  | Very difficult with data
These differences generate an enormous issue for intelligent test generation and for any argument for completeness. This is an area of active research, and two threads have emerged: 1) Training Set Validation: Since the final trained component is very hard to analyze, one approach is to examine the training set and the ODD to find interesting tests which may expose the cracks between them [16]. 2) Robustness to Noise: Either through simulation or using formal methods [17], the approach is to assert various higher-level properties and use these to test the component. An example in object recognition might be to assert the property that an object should be recognized independent of its orientation. Overall, developing robust methods for AI component validation is quite an active and unsolved research topic even for "fixed-function" AI components, that is, AI components whose function changes only through controlled releases under active version control. Of course, many AI applications prefer a model where the AI component is constantly morphing. Validating the morphing situation is a topic of future research.
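A minimal sketch of the robustness-to-noise idea: assert higher-level properties (orientation invariance, tolerance to Gaussian noise) around a perception component. The classify() function below is a hypothetical stand-in for the real ML model, and the perturbation levels and thresholds are illustrative assumptions.

```python
import numpy as np

def classify(image: np.ndarray) -> str:
    """Hypothetical stand-in for an ML perception component; replace with the real model."""
    return "pedestrian" if image.mean() > 0.5 else "background"

def orientation_robust(image: np.ndarray) -> bool:
    """Property: the predicted label should not change under 90-degree rotations."""
    reference = classify(image)
    return all(classify(np.rot90(image, k)) == reference for k in (1, 2, 3))

def noise_robust(image: np.ndarray, sigma: float = 0.05, trials: int = 100,
                 seed: int = 0) -> float:
    """Fraction of Gaussian-noise perturbations that preserve the original label."""
    rng = np.random.default_rng(seed)
    reference = classify(image)
    kept = sum(classify(np.clip(image + rng.normal(0, sigma, image.shape), 0, 1)) == reference
               for _ in range(trials))
    return kept / trials

sample = np.random.default_rng(1).random((32, 32))       # placeholder "image"
print(orientation_robust(sample), noise_robust(sample))
```

The point is not the toy classifier but the shape of the test: the property replaces the missing code structure as the thing coverage can be argued against.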
B. AI SPECIFICATION
For well-defined systems with available system-level abstractions, AI/ML components significantly increase the difficulty of intelligent test generation. With a golden spec, one can follow a structured process to make significant progress in validation and even gate the AI results with conventional safeguards. Unfortunately, one of the most compelling uses of AI is to employ it in situations where the specification of the system is not well defined or not viable using conventional programming. In these Specification-Less/ML (SLML) situations, not only is building interesting tests difficult, but evaluating the correctness of the results creates further difficulty. Further, most of the major systems (perception, location services, path planning, etc.) in autonomous vehicles fall into this category of system function and AI usage. To date, there have been two approaches to attacking the lack-of-specification problem: Anti-Spec and AI-Driver. 1) Anti-Spec: In these situations, the only approach left is to specify correctness through an anti-spec. The simplest anti-spec is to avoid accidents. Based on some initial work by Intel, there is a standard, IEEE 2846, "Assumptions for Models in Safety-Related Automated Vehicle Behavior" [18], which establishes a framework for defining a minimum set of assumptions regarding the reasonably foreseeable behaviors of other road users. For each scenario, it specifies assumptions about the kinematic properties of other road users, including their speed, acceleration, and possible maneuvers (a kinematic sketch of this style of assumption appears at the end of this subsection). Challenges include an argument for completeness, a specification of the machinery for checking against the standard, and the connection to a liability governance framework. 2) AI-Driver: While IEEE 2846 comes from a bottom-up technology perspective, Koopman and Widen [19] have proposed the concept of defining an AI driver which must replicate all the competencies of a human driver in a complex, real-world environment. Key points of Koopman's AI driver concept include:
a) Full Driving Capability: The AI driver must handle the entire driving task, including perception (sensing the environment), decision-making (planning and responding to scenarios), and control (executing physical movements like steering and braking). It must also account for nuances like social driving norms and unexpected events. b) Safety Assurance: Koopman stresses that AVs need rigorous safety standards, similar to those in industries like aviation. This includes identifying potential failures, managing risks, and ensuring safe operation even in the face of unforeseen events. c) Human Equivalence: The AI driver must meet or exceed the performance of a competent human driver. This involves adhering to traffic laws, responding to edge cases (rare or unusual driving scenarios), and maintaining situational awareness at all times. d) Ethical and Legal Responsibility: An AI driver must operate within ethical and legal frameworks, including handling situations that involve moral decisions or liability concerns. e) Testing and Validation: Koopman emphasizes the importance of robust testing, simulation, and on-road trials to validate AI driver systems. This includes covering edge cases and long-tail risks and ensuring that systems generalize across diverse driving conditions. Overall, it is a very ambitious endeavor, and there are significant challenges to building this specification of a reasonable driver. First, the idea of a "reasonable" driver is not even well encoded on the human side. Rather, this definition of "reasonableness" is built over a long history of legal distillation, and of course, the human standard is built on the understanding of humans by other humans. Second, the complexity of such a standard would be very high, and it is not clear whether it is doable. Finally, it may take quite a while of legal distillation to reach some level of closure on a human-like "AI driver." Currently, the state of the art in specification is relatively poor for both ADAS and AVs. ADAS systems, which are widely proliferated, have massive divergences in behavior and completeness. When a customer buys ADAS, it is not entirely clear what they are getting. Tests by industry groups such as AAA, Consumer Reports, and IIHS have shown the significant shortcomings of existing solutions [20]. In 2024, IIHS introduced a ratings program to evaluate the safeguards of partial driving automation systems. Out of 14 systems tested, only one received an acceptable rating, highlighting the need for improved measures to prevent misuse and ensure driver engagement [21]. Today, there is only one non-process-oriented regulation in the marketplace, the NHTSA regulation around AEB [22].
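To give the flavour of the kinematic assumptions used in the anti-spec approach above (IEEE 2846 and the Intel/Mobileye RSS work it builds on), the sketch below computes a minimum safe longitudinal gap from assumed bounds on response time, acceleration, and braking. The formula follows the published RSS idea in spirit; the numeric bounds are illustrative assumptions, not values taken from the standard.

```python
def min_safe_gap(v_rear: float, v_front: float,
                 rho: float = 1.0,          # rear vehicle response time [s] (assumed)
                 a_max_accel: float = 2.0,  # worst-case rear acceleration during rho [m/s^2]
                 b_min_rear: float = 4.0,   # guaranteed rear braking deceleration [m/s^2]
                 b_max_front: float = 8.0   # worst-case front braking deceleration [m/s^2]
                 ) -> float:
    """RSS-style minimum longitudinal gap under assumed kinematic bounds (all values illustrative)."""
    v_rear_after_rho = v_rear + rho * a_max_accel
    gap = (v_rear * rho                                  # distance covered during response time
           + 0.5 * a_max_accel * rho ** 2                # plus worst-case acceleration during it
           + v_rear_after_rho ** 2 / (2 * b_min_rear)    # plus rear braking distance
           - v_front ** 2 / (2 * b_max_front))           # minus front vehicle's braking distance
    return max(gap, 0.0)

# Ego at 25 m/s following a lead vehicle at 20 m/s:
print(f"required gap: {min_safe_gap(25.0, 20.0):.1f} m")
```

Checking every planned maneuver against such a bound is one concrete way an anti-spec ("never be the cause of a rear-end collision under these assumptions") can be made machine-checkable.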
C. INTELLIGENT TEST GENERATION
Recognizing the importance of intelligent scenarios for testing, three major styles of intelligent test generation are currently active: physical testing, real-world seeding, and virtual testing. 1) Physical Testing: Typically, physical testing is the most expensive method to verify functionality. However, Tesla has built a flow where its existing fleet is a large distributed testbed. Using this fleet, Tesla's approach to autonomous driving uses a sophisticated data pipeline and deep learning system designed to process vast amounts of sensor data efficiently [23]. In this flow, the scenario under construction is the one driven by the driver, and the criterion for correctness is the driver's corrective action. Behind the scenes, the MaVV flow can be managed by large databases and supercomputers (Dojo) [24]. By employing this methodology, Tesla knows that its scenarios are always valid. However, there are challenges with this approach. First, the real world moves very slowly in terms of new unique situations. Second, by definition, the scenarios seen are very much tied to the market presence of Tesla, and thus not predictive of new situations. Finally, the process of capturing data, discerning an error, and building corrective action is non-trivial. At the extreme, this process is akin to taking crash logs from broken computers, diagnosing them, and building the fixes. 2) Real-World Seeding: Another line of test generation is to use physical situations as a seed for further virtual testing. PEGASUS, the seminal project initiated in Germany, took such an approach. The project emphasized a scenario-based testing methodology which used observed data from real-world conditions as a base [25]. Another similar effort comes from Warwick University, with a focus on test environments, safety analysis, scenario-based testing, and safe AI. One of the contributions from Warwick is the Safety Pool Scenario Database [26]. Databases and seeding methods, especially of interesting situations, offer some value, but of course, their completeness is not clear. Further, databases of tests are very susceptible to being over-optimized against by AI algorithms. 3) Virtual Testing: Another important contribution is ASAM OpenSCENARIO 2.0 [27], which is a domain-specific language designed to enhance the development, testing, and validation of Advanced Driver-Assistance Systems (ADAS) and Automated Driving Systems (ADS). A high-level language allows for a symbolic, higher-level description of the scenario with an ability to grow in complexity through rules of composition. Underneath the symbolic apparatus is pseudo-random test generation, which can scale the scenario generation process. The randomness also offers a chance to expose "unknown-unknown" errors. Beyond component validation, solutions have been proposed specifically for autonomous systems, such as UL 4600, "Standard for Safety for the Evaluation of Autonomous Products" [28]. Similar to ISO 26262/SOTIF, UL 4600 has a focus on safety risks across the full lifecycle of the product and introduces a structured "safety case" approach. The crux of this methodology is to document and justify how autonomous systems meet safety goals. It also emphasizes the importance of identifying and validating against a wide range of real-world scenarios, including edge cases and rare events. There is also a focus on including human-machine interactions.
UL 4600 is a good step forward, but in the end, it is a process standard and does not offer any advice on how exactly to solve the "elephants in the room" for AI validation. Overall, nearly all the standards and current regulations are process-centric. They focus on the product developer making an argument and, either through self-certification or approval by an explicit regulator, getting it accepted. This methodology has the Achilles heel that the product owner does not have a method to get past the critical issues, nor does the regulator have a way to assess completeness. All of these techniques have moved the state of the art forward, but there remains a very fundamental issue. For both physical and virtual execution, how does one scale sufficiently to reasonably explore the ODD? Further, when performing virtual execution, what level of abstraction is appropriate? Is it better to have abstract models or highly detailed physics-based models? Typically, the answer depends on the nature of the verification. If so, how do these abstraction levels connect to each other? A key missing piece is the ability to split the problem into manageable pieces and then recompose the result. This capability has not been developed for cyber-physical systems but has been developed for semiconductor designs.
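One way to make the "split into manageable pieces" idea concrete is to discretise a few ODD dimensions into bins and measure which cells the scenario set has actually exercised; the dimensions, bin edges, and random scenarios below are illustrative assumptions, not a proposed ODD.

```python
import itertools
import random

# A toy ODD discretised into cells along three dimensions.
ODD_BINS = {
    "speed_mps": [(0, 10), (10, 20), (20, 30), (30, 40)],
    "weather":   ["clear", "rain", "snow", "fog"],
    "lighting":  ["day", "dusk", "night"],
}

def bin_of(scenario):
    """Map a concrete scenario onto its (speed, weather, lighting) ODD cell."""
    speed_bin = next(b for b in ODD_BINS["speed_mps"] if b[0] <= scenario["speed_mps"] < b[1])
    return (speed_bin, scenario["weather"], scenario["lighting"])

def coverage(scenarios):
    """Return the fraction of ODD cells exercised and the set of cells still untested."""
    all_cells = set(itertools.product(ODD_BINS["speed_mps"], ODD_BINS["weather"], ODD_BINS["lighting"]))
    hit = {bin_of(s) for s in scenarios}
    return len(hit) / len(all_cells), all_cells - hit

rng = random.Random(3)
scenarios = [{"speed_mps": rng.uniform(0, 39.9),
              "weather": rng.choice(ODD_BINS["weather"]),
              "lighting": rng.choice(ODD_BINS["lighting"])} for _ in range(100)]
ratio, missing = coverage(scenarios)
print(f"ODD cell coverage: {ratio:.0%}; {len(missing)} cells still untested")
```

Real ODDs have far more dimensions and the cells interact, which is exactly why decomposition and recomposition machinery of the kind used in semiconductor verification is the missing piece.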
Follow those subchapters for more content:
Modern automobiles and transportation infrastructure are introducing active sensor modalities such as radar and lidar at an unprecedented rate. In the visual spectrum, the interference caused by the production, reflection, and sensing of light is understood very well. This understanding informs the current practices for construction (avoiding mirrors or blinding lights which may impact drivers), active interference between cars (low-beam/high-beam), and environmental factors (dusk, white-out snow conditions). Modern automobiles rely on lidar and radar as primary sensory inputs. However, the corresponding constraints on construction, active interference, and environmental factors are not well understood. This chapter presents both theoretical and experimental results examining construction, active interference, and worst-case environmental factors for the lidar and radar sensor modalities.
Logistics is even more important today than it was in the early 1800s. Further, the effectiveness of defense systems is increasingly driven by sophisticated electronics. As the recent Ukraine conflict reveals, weapons such as precision munitions, autonomous drones, and other similar systems generate asymmetrical advantages on the battlefield. However, all these systems also generate a massive and complex electronics logistical tail which must be managed carefully. For the electronics supply ecosystem, defense falls into the broader category of Long Lifecycle (LLC) system products.
The "Two Speed" Challenge

LLC (long lifecycle) products are products which need to be supported for many years – typically five years or more. Over this time, these products need legacy part support and rarely drive very high chip volumes. However, the economics of semiconductor design imply that custom semiconductors only make sense for markets with high volume. Today, this consists largely of the consumer marketplace (cell phones, laptops, tablets, cloud, etc.), which comprises short lifecycle products with a typical lifecycle of less than five years. This hard economic fact has optimized the semiconductor industry towards consumer-driven short lifecycle products. This is at odds with the requirements of long lifecycle products, both in terms of frequent end-of-life obsolescence and long-term product reliability. Reliability is further impacted in defense by the fact that these components must perform in strenuous external environments with challenging thermal, vibration, and even radiation conditions. This is the "Two Speed" challenge, and it results in very frequent failure and obsolescence of electronic components.
The Move towards "Availability" Contracts in Aerospace & Defense

Traditionally in the aerospace and defense industry, an initial contract for development and manufacture is followed by a separate contract for spares and repairs. More recently, there has been a trend towards "availability" contracts, where industry delivers a complete Product Service System (PSS). The key challenge in such contracts is to estimate the "Whole Life Cost" (WLC) of the product, which may span 30 or even 40 years. As one might imagine, this PSS paradigm skyrockets the cost of systems and still is not foolproof because of its need to predict the future. This has led to some embarrassing costs for defense part procurement as compared to the commercial equivalent.
The US Secretary of Defense William Perry memorandum in 1994 resulted in a move towards performance-based specifications, which led to the virtual abandonment of the MIL-STD and MIL-SPEC system that had been the mainstay of all military procurement in the US for several decades. Coupled with "Diminishing Manufacturing Sources" (DMS) for military-grade components, the imperative was to move towards COTS (Commercial Off-The-Shelf) components while innovating at the system level to cater to the stringent operating environment requirements. The initial reasoning for using COTS components to reduce system costs proved effective. However, it did expose defense systems to some key issues.
Key Issues for the Defense Industry Today

Component Obsolescence: Primarily as a result of the "Two Speed" challenge described above, components become harder to source over time and even grow obsolete, and the rate of discontinuance of part availability is increasing steadily. Many programs, such as the F-22 stealth fighter, AWACS, Tornado, and Eurofighter, are suffering from such component obsolescence. As a result, OEMs are forced to design replacements for those obsolete components and face nonrecurring engineering costs. As per McKinsey's recent estimates ["How industrial and aerospace and defense OEMs can win the obsolescence challenge," April 2022, McKinsey Insights], the aggregate obsolescence-related nonrecurring costs for the military aircraft segment alone are in the range of US $50 billion to US $70 billion.
Whole Life Cost (WLC): As mentioned above, with the increasing move towards "availability" contracts in defense and aerospace, one of the huge challenges has been to compute a realistic Whole Life Cost (WLC) of the products through the product lifecycle. This leads to massive held-inventory costs, with the associated waste when held inventory is no longer useful. Moreover, any good estimate of WLC requires an accurate prediction of the future.
Reliability: Semiconductors for the consumer market are optimized for consumer lifetimes – typically five years or so. For LLC markets like defense, the longer product life in non-traditional environmental situations often leads to product reliability and maintenance issues, especially with the increased use of COTS components.
Logistics Chain in Forward-Deployed Areas: One of the unique issues in defense, further accentuated by the increased move towards "availability" contracts, is the logistics nightmare of supporting equipment deployed in remote forward areas. A very desirable characteristic would be to have "in-theatre" maintenance and update capability for electronic systems. The last mile is the hardest mile in logistics.
Future Function: Given the timeframes of interest, upgrades in functionality are virtually guaranteed to happen. Since defense products often have the characteristic of being embedded in the environment, upgrade costs are typically very high. A classic example is a satellite, where the upgrade cost is prohibitively high. Similarly, with weapon systems deployed in forward areas, upgrade costs are prohibitive. Another example is the obsolescence of industry standards and protocols and the need to adhere to newer ones. In fact, field-embedded electronics (more so in defense) require the flexibility to manage derivative design function WITHOUT hardware updates. How does one design for this capability, and how does a program manager understand the band of flexibility in defining new products, derivatives, and upgrades?
Solutions for Defense Electronics Supply Chain Challenges

Figure 1: Design for Supply Chain

What is the solution to these issues? The answer is the need to build a Design for Supply Chain methodology and the associated Electronic Design Automation (EDA) capability.
Just as manufacturing test was optimized by "Design for Test," power by "Design for Power," and performance by "Design for Performance," one should be designing for "Supply Chain and Reliability"! What are the critical aspects of a Design for Supply Chain capability?
Programmable Semiconductor Parts: Programmable parts (CPUs, GPUs, FPGAs, etc.) have the following distinct advantages:
Parts Obsolescence: A smaller number of programmable parts minimizes inventory skews, can be forward deployed, and can be repurposed across a large number of defense electronic systems. Further, the aggregation of function around a small number of programmable parts raises the volume of these parts and thus minimizes the chances of parts obsolescence.
Redundancy for Reliability: Reliability can be greatly enhanced by the use of redundancy within and across multiple programmable devices. Similar to RAID storage, one can leave large parts of an FPGA unprogrammed and dynamically move functionality based on detected failures.
Future Function: Programmability enables the use of "over the air" updates which change functionality dynamically.
Electronic Design Automation (EDA): To facilitate a Design for Supply Chain approach, a critical piece is the EDA support. The critical functionality required is:
Total Cost of Ownership Model: With LLCs, it is very important to consider lifetime costs based on downstream maintenance and function updates. An EDA system should help calculate lifetime cost metrics based on these factors to avoid mistakes which simply optimize near-term costs (a simple illustrative model is sketched below). This model has to be sophisticated enough to understand that derivative programmable devices can often provide performance/power increases which are favorable as compared to older-technology custom devices.
Programming Abstractions: Programmable devices are based on abstractions (computer ISA, Verilog, analog models, etc.) from which function is mapped onto physical devices. EDA functionality is critical to maintain these abstractions and automate the mapping process, which optimizes for power, performance, reliability, and other factors. Can these abstractions and optimizations be further extended to obsolescence?
Static and Dynamic Fabrics: When the hardware configuration does not have to be changed, EDA functionality is only required for programming the electronic system. However, if hardware devices require changes, there is a need for a flexible fabric to accept the updates in a graceful manner. The nature of the flexible fabric may be mechanical (e.g., rack-mountable boards) or chemical (quick respins of a PCB, which may be done in the field). All of these methods have to be managed by a sophisticated EDA system. These methods are the key to the ease of integration of weapons systems.
With the above capability, one can perform proactive logistics management. One of the best practices that can yield rich dividends is to constitute a cross-functional team (with representation from procurement, R&D, manufacturing, and quality functions) which continuously scans for potential issues. This team can be tasked with developing a set of lead indicators to assess components, identify near-term issues, and develop countermeasures. For these cross-functional programs to work, the EDA functionality has to be tied into the Product Lifecycle Management (PLM) systems in the enterprise.
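Below is a minimal sketch of the kind of whole-life-cost comparison such an EDA flow would need to support, contrasting a hypothetical custom ASIC against a programmable platform over a long service life; every number (NRE, unit costs, redesign counts) is invented for illustration and is not industry data.

```python
def whole_life_cost(nre: float, unit_cost: float, units_per_year: int,
                    service_years: int, redesigns: int, redesign_nre: float) -> float:
    """Crude WLC model: initial NRE + production + obsolescence-driven redesigns (inputs illustrative)."""
    production = unit_cost * units_per_year * service_years
    return nre + production + redesigns * redesign_nre

# Hypothetical comparison over a 30-year defense program:
custom_asic   = whole_life_cost(nre=20e6, unit_cost=50,  units_per_year=2_000,
                                service_years=30, redesigns=5, redesign_nre=8e6)
fpga_platform = whole_life_cost(nre=2e6,  unit_cost=400, units_per_year=2_000,
                                service_years=30, redesigns=1, redesign_nre=1e6)
print(f"custom ASIC WLC:   ${custom_asic / 1e6:.0f}M")
print(f"FPGA platform WLC: ${fpga_platform / 1e6:.0f}M")
```

Even this toy model shows why optimizing only the near-term unit cost can be misleading once obsolescence-driven redesigns over the full service life are counted.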
Currently, a great deal of system intent is lost across the enterprise, or over time as the design team moves to other projects. Thus, even the OEMs who use such proactive obsolescence management best practices are significantly hampered by the lack of structured information or sophisticated tools which would allow them to accurately predict these issues and plan mitigating actions, including last-time strategic buys, finding alternate suppliers, and even finding optimal FFF (Fit-Form-Function) replacements. It is imperative that such software (EDA) tool functions become available as soon as possible.
Summary

The semiconductor skew towards short lifecycle products (i.e., consumer electronics) has created a huge opportunity for the defense industry to access low-cost electronics. However, it also generates issues when there is a need to support products in excess of 30 years, as a result of fast obsolescence and shorter component reliability. Two situations unique to defense further worsen the picture: first, the logistics issues in supporting equipment deployed in forward areas, and second, the increasing use of "availability" or full Product Service System contracts, where establishing the Whole Life Cost (WLC) of the equipment becomes critical. The only way this can be solved efficiently is to bring in a paradigm shift towards "Design for Supply Chain." EDA innovation is needed to support these levels of abstraction at the system/PCB level and to be a catalyst for this paradigm shift. McKinsey estimated a 35% reduction in obsolescence-related nonrecurring costs from merely using a structured, albeit still reactive, obsolescence management methodology; a proactive "Design for Supply Chain" approach can be truly transformational.
The following chapters contain more details:
This chapter explores perception, mapping, and localization in the context of autonomous vehicles and the use of different sensor modalities. It examines the determination of the vehicle's position, the position and movement of other traffic participants, understanding of the surrounding scene, applications of AI, and possible sources of uncertainty and instability.
The following chapters contain more details:
It examines approaches to detecting objects and the surrounding environment in different light and weather conditions, making use of different sensors and the fusion of different sensor modalities such as cameras, lidars, and radars. It describes methods of creating maps from sensory data, localizing the vehicle relative to these maps (SLAM, particle filters, visual odometry), and using global navigation satellite systems (GNSS).
Advances in AI, especially convolutional neural networks, allow us to process raw sensory information, recognize objects, and categorize them into classes with higher levels of abstraction (pedestrians, cars, trees, etc.). Taking these categories into account allows autonomous vehicles to understand the scene, reason about the other participants of road traffic, and make assumptions about their interactions. This section describes commonly used methods, their advantages, and their weaknesses.
There are several sources of uncertainty: sensor noise, model uncertainty, environment randomness, occlusions, adversarial attacks, and errors in estimating participants' intentions. This section categorizes and describes them.
The following chapters contain more details:
The following chapters contain more details:
This chapter explores the specificities of Human-Machine Interaction (HMI) in the context of autonomous vehicles. It examines how HMI in autonomous vehicles differs fundamentally from traditional car dashboards. With the human driver no longer actively involved in operating the vehicle, the challenge arises: how should AI-driven systems communicate effectively with passengers, pedestrians, and other road users?
This section addresses the available communication channels and discusses how these channels must be redefined and implemented to accommodate the new paradigm. Additionally, it considers how various environmental factors—including cultural, geographical, seasonal, and spatial elements—can impact communication strategies.
A concept, the Language of Driving (LoD), will be introduced, offering a framework for structuring and standardizing communication in autonomous vehicle contexts.
Understanding how humans perceive the world is crucial for autonomous vehicles to effectively communicate and interact with them. This chapter explores how human perception, driven by sensory input and cognitive processing, can inform the development of autonomous perception systems, emphasizing the parallels between human and animal intelligence in recognizing focus, body positioning, gestures, and movement. By examining innate perceptual capabilities such as basic physics calculations and environmental modeling, AVs can better anticipate human behavior and respond appropriately in complex traffic environments.
This chapter explores how AVs might adopt human-like communication methods, such as facial expressions or humanoid interfaces, to effectively interact in complex social driving environments.
Human communities build languages for cooperative teaming. To participate in the act of cooperative transportation, AVs will have to understand this language. Depending on the level of expectation communicated by the AV, this language may extend into social interaction models.
A key requirement of an effective passenger communication system is to have built-in fail-safe mechanisms based on the environment. AVSC has worked with SAE ITC to build group standards around the safe deployment of SAE Level 4 and Level 5 ADS and has recently released an "AVSC Best Practice for Passenger-Initiated Emergency Trip Interruption." However, passenger communication extends beyond emergency stop and call functions. Warnings and explanations of unexpected maneuvers may need to be communicated to passengers even when there is no immediate danger. This should replicate and replace the function that a human bus driver would typically perform in such situations.
Communication between the car and pedestrians at a crosswalk is a difficult and important problem for automation.
The role of conventional and LLM-based AI in HMI.
The following chapters contain more details: