Testing Infrastructure

[raivo.sell] Given the importance of intelligent scenarios for testing, three major styles of intelligent test generation are currently in use: physical testing, real-world seeding, and virtual testing.

1) Physical Testing

Typically, scaling up physical testing is the most expensive way to verify functionality. However, Tesla has built a flow in which its existing fleet serves as a large distributed testbed. Using this fleet, Tesla's approach to autonomous driving relies on a sophisticated data pipeline and deep learning system designed to process vast amounts of sensor data efficiently [23]. In this flow, the scenario under construction is the one driven by the driver, and the criterion for correctness is the driver's corrective action. Behind the scenes, the MaVV flow can be managed by large databases and supercomputers (Dojo) [24]. Because every scenario is actually driven, Tesla knows that its scenarios are always valid. However, this approach has challenges. First, the real world produces new, unique situations very slowly. Second, by definition the scenarios seen are closely tied to Tesla's market presence, so they are not predictive of new situations. Finally, the process of capturing data, discerning an error, and building a corrective action is non-trivial. At the extreme, this process is akin to taking crash logs from broken computers, diagnosing them, and building the fixes.

2) Real-World Seeding

Another line of test generation uses physical situations as a seed for further virtual testing. Pegasus, the seminal project initiated in Germany, took this approach. The project emphasized a scenario-based testing methodology that used observed data from real-world conditions as a base [25]. A similar effort comes from Warwick University, with a focus on test environments, safety analysis, scenario-based testing, and safe AI. One of Warwick's contributions is the Safety Pool Scenario Database [26].
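The seeding idea can be illustrated with a minimal sketch: one recorded real-world event is used as a seed, and virtual variants are produced by perturbing its parameters. The `CutInScenario` type, its fields, and the 20% spread are hypothetical choices made for illustration; they are not taken from Pegasus or the Safety Pool database.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CutInScenario:
    """Hypothetical parameters of one recorded cut-in event."""
    ego_speed_mps: float   # speed of the vehicle under test
    lead_speed_mps: float  # speed of the vehicle cutting in
    gap_m: float           # longitudinal gap at the moment of cut-in


def perturb(seed: CutInScenario, rng: random.Random, spread: float) -> CutInScenario:
    """Return a variant whose parameters lie within +/- spread of the seed."""
    def jitter(value: float) -> float:
        return value * rng.uniform(1.0 - spread, 1.0 + spread)

    return replace(
        seed,
        ego_speed_mps=jitter(seed.ego_speed_mps),
        lead_speed_mps=jitter(seed.lead_speed_mps),
        gap_m=jitter(seed.gap_m),
    )


def generate_variants(seed: CutInScenario, count: int,
                      rng_seed: int = 0, spread: float = 0.2) -> list:
    """Generate reproducible virtual variants around one real-world seed."""
    rng = random.Random(rng_seed)  # fixed RNG seed so a failing variant can be replayed
    return [perturb(seed, rng, spread) for _ in range(count)]


# Assumed values standing in for a recorded observation.
recorded = CutInScenario(ego_speed_mps=30.0, lead_speed_mps=25.0, gap_m=20.0)
variants = generate_variants(recorded, count=5)
```

Each variant would then be run in simulation. Because the variants stay close to an observed event, they inherit its realism, but they also inherit its limited coverage.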
Databases and seeding methods, especially of interesting situations, offer some value, but their completeness is not clear. Further, databases of tests are very susceptible to over-optimization by AI algorithms, which can learn to pass the stored tests without generalizing beyond them.

3) Virtual Testing

Another important contribution is ASAM OpenSCENARIO 2.0 [27], a domain-specific language designed to enhance the development, testing, and validation of Advanced Driver-Assistance Systems (ADAS) and Automated Driving Systems (ADS). A high-level language allows a symbolic, higher-level description of the scenario, which can grow in complexity through rules of composition. Underneath the symbolic apparatus is pseudo-random test generation, which can scale the scenario generation process. The randomness also offers a chance to expose “unknown-unknown” errors.

Beyond component validation, solutions have been proposed specifically for autonomous systems, such as UL 4600, “Standard for Safety for the Evaluation of Autonomous Products” [28]. Similar to ISO 26262/SOTIF, UL 4600 focuses on safety risks across the full lifecycle of the product and introduces a structured “safety case” approach. The crux of this methodology is to document and justify how autonomous systems meet safety goals. It also emphasizes the importance of identifying and validating against a wide range of real-world scenarios, including edge cases and rare events, and it covers human-machine interactions. UL 4600 is a good step forward, but in the end it is a process standard and does not offer advice on exactly how to solve the “elephants” in the room for AI validation.

Overall, nearly all the standards and current regulations are process-centric. They focus on the product developer making an argument and obtaining approval, either through self-certification or from an explicit regulator.
This methodology has the Achilles heel that the product owner does not have a method to get past the critical issues, nor does the regulator have a way to assess completeness.

All of these techniques have moved the state of the art forward, but a very fundamental issue remains. For both physical and virtual execution, how does one scale sufficiently to reasonably explore the operational design domain (ODD)? Further, when performing virtual execution, what level of abstraction is appropriate? Is it better to have abstract models or highly detailed physics-based models? Typically, the answer depends on the nature of the verification. If so, how do these abstraction levels connect to each other? A key missing piece is the ability to split the problem into manageable pieces and then recompose the results. This capability has not been developed for cyber-physical systems, but it has been developed for semiconductor designs.
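The pseudo-random generation described under virtual testing can be sketched as follows: an abstract scenario is given as parameter ranges plus composable constraints, and concrete scenarios are drawn by rejection sampling. This is plain Python, not OpenSCENARIO 2.0 syntax. The lead-vehicle-braking scenario, its parameter names, its ranges, and the headway limits are all assumptions chosen for illustration.

```python
import random
from typing import Callable, Dict, List, Tuple

Range = Tuple[float, float]
Concrete = Dict[str, float]
Constraint = Callable[[Concrete], bool]


def sample_scenarios(ranges: Dict[str, Range], constraints: List[Constraint],
                     count: int, rng_seed: int = 0,
                     max_tries: int = 10_000) -> List[Concrete]:
    """Draw concrete scenarios that satisfy every constraint (rejection sampling)."""
    rng = random.Random(rng_seed)  # fixed seed keeps the generated tests reproducible
    accepted: List[Concrete] = []
    tries = 0
    while len(accepted) < count and tries < max_tries:
        tries += 1
        candidate = {name: rng.uniform(low, high)
                     for name, (low, high) in ranges.items()}
        if all(check(candidate) for check in constraints):
            accepted.append(candidate)
    return accepted


def headway_s(scenario: Concrete) -> float:
    """Time headway between ego and lead vehicle at scenario start."""
    return scenario["gap_m"] / scenario["ego_speed_mps"]


# Hypothetical abstract scenario: the lead vehicle brakes in front of the ego vehicle.
braking_ranges = {
    "ego_speed_mps": (10.0, 40.0),
    "lead_decel_mps2": (2.0, 8.0),
    "gap_m": (5.0, 80.0),
}

# Constraints compose by being added to the list; all of them must hold.
braking_constraints = [
    lambda s: headway_s(s) >= 0.5,  # start state is not already a collision course
    lambda s: headway_s(s) <= 3.0,  # lead vehicle is close enough to matter
]

scenarios = sample_scenarios(braking_ranges, braking_constraints, count=10)
```

Rejection sampling is the simplest choice and becomes wasteful when the constraints are tight; a production flow would more likely use a constraint solver. The sketch also leaves the scaling question open: uniform sampling gives no guarantee that the ODD has been explored sufficiently.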

en/safeav/avt/infrastruct.1751250725.txt.gz · Last modified: 2025/06/30 02:32 by rahulrazdan