Safety Before Autonomy


State of Autonomy | 2020

There is a growing consensus in the autonomous driving community: urban self-driving, the biggest market for AVs, is harder than previously assumed. We can't annotate-train-repeat our way around it, and any sufficiently complex, rules-based system will eventually run into its long-tail blind spots in the real world. Hitting a blind spot means trip failure, which renders the self-reported disengagement metric all but useless: a single disengagement jeopardizes the safety of an entire trip.
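To see why per-distance disengagement rates understate trip-level risk, here is a back-of-the-envelope sketch. The rate and trip length are illustrative numbers, not real fleet figures, and treating each kilometre as an independent trial is a deliberate simplification:

```python
def trip_failure_probability(disengagements_per_km: float, trip_km: float) -> float:
    """Probability of at least one disengagement over a trip, treating each
    kilometre as an independent Bernoulli trial (a simplifying assumption)."""
    return 1.0 - (1.0 - disengagements_per_km) ** trip_km

# An illustrative rate of 1 disengagement per 1,000 km still jeopardizes
# roughly 2% of 20 km urban trips.
p = trip_failure_probability(0.001, 20.0)
```

Even a rate that looks excellent on a per-kilometre leaderboard compounds into a meaningful fraction of failed trips.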

What do we mean by rules-based?

  1. Driving rules extracted from or embedded into an HD map

  2. Hard coding country, city and culture specific rules

  3. Detecting vehicles, pedestrians and their motion from the sensor suite

  4. The planner ingests 1, 2 and 3, then outputs commands to control the vehicle

Patching Up

Why is this problematic?

To see the shortcomings of rules-based driving, we draw an analogy with language and conversation: we equate the rules of driving to the grammar of a language, and navigation to holding a conversation. More specifically, let's call:

01—The Global Grammar

  • Stop sign, speed limit, lane marking, drivable area etc.

  • Applicable universally and form the basis of legal driving

02—The Local Grammar

  • Country-, city- and culture-specific rules, e.g. turning right on red

  • Varies with locality, unlike the universal Global Grammar

03—The Conversation

  • Vehicles, pedestrians, geography, visibility etc.

  • An ongoing conversation we are trying to navigate

Do we, as a society, flout the rules of grammar when navigating a conversation? Are there long pauses? Interruptions? Mispronunciations? Illegal sentences? Passive assertion? Slang? The answer is: yes, we do.

Rules-based self-driving is tantamount to teaching English grammar to a machine and expecting it to navigate a real conversation, with the self-reported metric being how many times we ask a human to complete or correct a machine-generated sentence (a disengagement). The grammar and vocabulary can be tediously extended, and exceptions to the rules added, to make the conversation more natural, but this doesn't scale.

(1.) A typical crosswalk is an example of Global Grammar. Its meaning is understood universally. (2.) These vehicles are allowed to turn right on red, an example of this locality's Local Grammar. (3.) Pedestrians and vehicles navigate the Local and Global Grammars in order to hold Conversation, i.e. travel safely. | Original photo by Rodolfo Tanno.

Event Horizon

In the last two to four years, machine translation and language modeling have moved from expertly engineered models to data-driven, end-to-end transformers, which have been successfully applied in search to handle unseen queries. We believe autonomous driving has its end-to-end moment on the horizon. However, before we as a community arrive there, we have an important question to resolve.

Language models are trained on high-quality, curated datasets: books, journals, newspapers, Reddit threads and natural conversation transcripts. Malformed or undesired sentences are weeded out by editorial and peer review. What is the equivalent paradigm for end-to-end autonomous driving?

Looking closely at the analogy: how do we learn a language? We learn with guidance, correction, feedback, curriculum design and demonstrative examples from a teacher. We are scored on our writing, reading and comprehension skills, and graduate to a higher grade once we meet the requirements.

Human in the Loop

At Yaak, we use the student-teacher paradigm to learn how to drive with end-to-end models. As with learning a language, actionable feedback on weak spots, demonstrations from experts and a safety metric (grading) help us answer the question: is the autonomous stack ready for safe, urban driving?

Before we can quantify safe autonomous driving, we asked ourselves an obvious question: can we even quantify safe human driving? How do we score driving skills? Are some driving tasks more difficult than others? What are the different grades in a driving school? Are some drivers better than others?

To answer this, we partner with driving schools to help them assess their students' progress. EU driving schools provide their students with a standardized program which includes:

  • 15+ hours of mandatory driving lessons with an instructor (expert)

  • Standard driving curriculum (theoretical and practical)

  • Verbal feedback from instructors to the students during the lesson

  • Simulation rigs to improve their driving (optional)

After finishing the driving school program, the students are assessed with a state-mandated driving test. One in three driving students fail their first driving test, after which they have to undergo additional driving lessons. On average, a student needs 18+ hours of additional driving lessons (in the EU) before passing their first driving test. This highlights a need for better feedback tools, driving curriculum design, and metrics for assessing a student's readiness for their driving test.

Bird’s-eye view of a simplified driving lesson. The instructor (blue) provides an expert demo drive, which the student has to reproduce. The student (red) is inexperienced and cannot fully reproduce the drive. Our technology layer quantifies the student's driving skills by aggregating the errors (green).
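The error aggregation the figure describes can be sketched in a few lines. This is a minimal illustration, not our production pipeline; it assumes both trajectories are sampled at the same timestamps as (x, y) positions:

```python
import math

def driving_error(expert_path, student_path):
    """Aggregate the pointwise deviation between a student's trajectory and
    the instructor's demonstration (the green errors in the figure).
    Both paths are equal-length lists of (x, y) positions, sampled at the
    same times and in the same units (e.g. metres)."""
    deviations = [math.dist(e, s) for e, s in zip(expert_path, student_path)]
    return sum(deviations) / len(deviations), max(deviations)

# Student tracks the expert except for one 1 m lateral error.
mean_err, max_err = driving_error([(0, 0), (1, 0), (2, 0)],
                                  [(0, 0), (1, 1), (2, 0)])
```

In practice one would also weight errors by context (a 1 m deviation matters more next to a cyclist than on an empty road), but the aggregate already gives instructors a comparable score per lesson.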

So You Think You Can Drive?

At Yaak, we build the technology layer and tools for instructors and driving schools that equip them with actionable metrics by answering the following questions before the student takes the driving test:

  • How does the student compare with an expert instructor in driving?

  • What are the weak spots in their driving? Parallel parking, unprotected turns etc.

  • Is the student ready for urban driving on their own?

Our technology layer helps driving schools:

  • Understand the progression or regression of a student's driving

  • Redesign their driving curriculum to focus on students' proven weak points

  • Put students on par with their instructors

Yaak's technology layer, which helps assess and improve a student's driving before their driving test, also provides oversight for our autonomous stack. The stack is assessed on the same metrics as students before its real-world deployment:

  • How does the autonomous stack compare with an expert instructor in driving?

  • What are the weak spots in its driving? Parallel parking, unprotected turns etc.

  • Is the autonomous stack ready for urban driving on its own?

Licensed to Drive

Having built the technology layer for quantifying safe human driving, we can validate any autonomous stack for safety and provide it with actionable feedback in the form of a redesigned curriculum (expert demonstrations/training data). On the journey to building a safe autonomous vehicle, we first focus on quantifying and improving the safety of the next generation of human drivers.

Safety before autonomy.

— The Yaak Team

Our technology layer—validated by quantifying safe human driving—evaluates the candidates (red) from an autonomous stack's path planner, highlighting the path(s) which it deems the safest (green). An autonomous stack which is premature would produce candidates that get rejected by our technology layer, and thus not meet the requirements for urban deployment. | Original photo by Eduardo Zmievski.
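The vetting step in the caption can be sketched as follows. The safety model here is a stand-in callable and the threshold is a hypothetical value, both assumptions for illustration; in the text, the scorer is the layer validated by quantifying safe human driving:

```python
def vet_candidates(candidates, safety_score, threshold=0.9):
    """Score each candidate path from the planner and keep only those the
    safety model deems safe enough for urban deployment, best-first.
    An immature stack would see most or all of its candidates rejected."""
    scored = ((safety_score(path), path) for path in candidates)
    safe = [(score, path) for score, path in scored if score >= threshold]
    return sorted(safe, key=lambda pair: pair[0], reverse=True)

# Toy example: paths labelled by name, scores from a fixed lookup.
ranked = vet_candidates(["swerve", "brake", "yield"],
                        {"swerve": 0.55, "brake": 0.95, "yield": 0.99}.get)
# "swerve" is rejected; "yield" outranks "brake".
```

An empty result is itself the signal: the stack does not yet meet the requirements for urban deployment.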
