Testing Infrastructure

[raivo.sell]

As discussed earlier, generic V&V process consists of testing the product under test within the ODD. This is generally done with a number of techniques. The central paradigm is to generate a test, execute the test, and have a clear criteria for correctness. Three major styles of intelligent test generation are currently active: physical testing, real-world seeding, and virtual testing.

Physical Testing :Typically, physical scaling is the most expensive method to verify functionality. However, Tesla has built a flow where their existing fleet is a large distributed testbed. Using this fleet, Tesla's approach to autonomous driving uses a sophisticated data pipeline and deep learning system designed to process vast amounts of sensor data efficiently. In this flow, the scenario under construction is the one driven by the driver, and the criterion for correctness is the driver's corrective action. Behind the scenes, the global verification flow can be managed by large databases and supercomputers (DoJo) . By employing this methodology, Tesla knows that its scenarios are always valid. However, there are challenges with this approach. First, the real world moves very slowly in terms of new unique situations. Second, by definition the scenarios seen are very much tied to the market presence of Tesla, so not predictive of new situations. Finally, the process of capturing data, discerning an error, and building corrective action is non-trivial. At the extreme, this process is akin to taking crash logs from broken computers, diagnosing them, and building the fixes.
Real-World Seeding: Another line of test generation is to use physical situations as a seed for further virtual testing. Pegasus, the seminal project initiated in Germany, took such an approach. The project emphasized a scenario-based testing methodology which used observed data from real-world conditions as a base. Another similar effort comes from Warwick University with a focus on test environments, safety analysis, scenario-based testing, and safe AI. One of the contributions from Warwick is Safety Pool Scenario Database. Databases and seeding methods, especially of interesting situations, offer some value, but of course, their completeness is not clear. Further, databases of tests are very susceptible to be over optimized by AI algorithms.
Virtual Testing: Another important contribution was ASAM OpenSCENARIO 2.0 which is a domain-specific language designed to enhance the development, testing, and validation of Advanced Driver-Assistance Systems (ADAS) and Automated Driving Systems (ADS). A high-level language allows for a symbolic higher level description of the scenario with an ability to grow in complexity by rules of composition. Underneath the symbolic apparatus are pseudo-random test generation which can scale the scenario generation process. The randomness also offers a chance to expose “unknown-unknown” errors.

Beyond component validation, there have been proposed solutions specifically for autonomous systems such as UL 4600, “Standard for Safety for the Evaluation of Autonomous Products.” Similar to ISO 26262/SOTIF, UL 4600 has a focus on safety risks across the full lifecycle of the product and introduces a structured “safety case” approach. The crux of this methodology is to document and justify how autonomous systems meet safety goals. It also emphasizes the importance of identifying and validating against a wide range of real-world scenarios, including edge cases and rare events. There is also a focus on including human-machine interactions.