Failing to Learn: Autonomously Identifying Perception Failures for Self-driving Cars

ABSTRACT

 

One of the major open challenges in self-driving cars is the ability to detect cars and pedestrians in order to safely navigate the world. Deep learning-based object detectors have enabled great advances in using camera imagery to detect and classify objects. But for a safety-critical application such as autonomous driving, the error rates of the current state of the art are still too high to enable safe operation. Moreover, the characterization of object detector performance is primarily limited to testing on prerecorded datasets; errors that occur on novel data go undetected without additional human labels. In this paper, we propose an automated method to identify mistakes made by object detectors without ground truth labels. We show that inconsistencies in object detector output between a pair of similar images can be used as hypotheses for false negatives (i.e., missed detections), and that, using a novel set of features for each hypothesis, an off-the-shelf binary classifier can be used to find valid errors. In particular, we study two distinct cues, temporal and stereo inconsistencies, using data that is readily available on most autonomous vehicles. Our method can be used with any camera-based object detector, and we illustrate the technique on several sets of real-world data. We show that a state-of-the-art detector, tracker, and our classifier trained only on synthetic data can identify valid errors on the KITTI tracking dataset with an Average Precision of 0.94. We also release a new tracking dataset with 104 sequences totaling 80,655 labeled pairs of stereo images, along with ground truth disparity from a game engine, to facilitate further research.
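To make the pairwise-inconsistency idea concrete, the following is a minimal sketch (not the paper's implementation) of how detections from two similar images could be associated by intersection-over-union, with unmatched detections kept as false-negative hypotheses. The box format and the IoU threshold are illustrative assumptions.

```python
# Minimal sketch: flag detections that appear in one image of a similar pair but
# have no IoU match in the other, as candidate false negatives. The (x1, y1, x2, y2)
# box format and the 0.5 IoU threshold are illustrative assumptions.
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def iou(a: Box, b: Box) -> float:
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def false_negative_hypotheses(dets_img1: List[Box], dets_img2: List[Box],
                              iou_thresh: float = 0.5) -> List[Box]:
    """Detections from image 1 with no counterpart in image 2 are hypotheses
    that the detector missed the same object in image 2."""
    return [d for d in dets_img1
            if all(iou(d, e) < iou_thresh for e in dets_img2)]
```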

 

 

EXISTING SYSTEM :

As robots aspire to long-term autonomous operation in complex dynamic environments, the ability to reliably take mission-critical decisions in ambiguous situations becomes critical. This motivates the need to build systems with the situational awareness to assess how qualified they are, at a given moment, to make a decision. We call this self-evaluating capability introspection. We take a small step in this direction and propose a generic framework for introspective behavior in perception systems. Our goal is to learn a model that reliably predicts failures in a given system, with respect to a task, directly from input sensor data. We present this in the context of vision-based autonomous MAV flight in outdoor natural environments and show that it effectively handles uncertain situations. Deep learning-based object detectors have enabled great advances in using camera imagery to detect and classify objects. But for a safety-critical application such as autonomous driving, the error rates of the current state of the art are still too high to enable safe operation. Moreover, the characterization of object detector performance is primarily limited to testing on prerecorded datasets; errors that occur on novel data go undetected without additional human labels.

 

 

PROPOSED SYSTEM :

We propose that inconsistencies in object detector output between a pair of similar images (either spatially or temporally), if properly filtered, can be used to identify errors as a vehicle traverses the world. The power of this should not be understated: it means that even miles driven by humans for testing purposes can be used to validate object detectors in an unsupervised manner, and furthermore any archive of logged sensor data can be mined for the purpose of evaluating a vehicle's perception system. The key contributions of our paper are as follows:

1) We present the first full system, to the best of our knowledge, that autonomously detects errors made by single-frame object detectors on unlabeled data.

2) We show that inconsistencies in object detector output between pairs of similar images, spatial or temporal, provide a strong cue for identifying missed detections.

3) We pose error detection as a binary classification problem in which, for each inconsistent detection, a novel set of meta-classification features is used to predict the likelihood that the inconsistency is a real error (see the sketch following this list).

4) In conjunction with the additional localization data available in AV systems, we show that our system facilitates the analysis of correlations between errors and geo-locations.

5) We release a tracking dataset of stereo image sequences gathered at 10 Hz with ground truth labels following the KITTI format, comprising 104 sequences totaling 80,655 pairs of images along with ground truth disparity maps from a game engine, making it the largest publicly available dataset of its kind.
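As a rough illustration of the meta-classification step, the sketch below scores each inconsistency hypothesis with an off-the-shelf binary classifier. The feature names and the choice of a random forest are assumptions for illustration; the paper's actual feature set and classifier may differ.

```python
# Minimal sketch of the meta-classification step. Each false-negative hypothesis
# is summarized by a fixed-length feature vector (hypothetical feature names) and
# an off-the-shelf binary classifier predicts whether it is a real missed detection.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def hypothesis_features(hyp) -> np.ndarray:
    """Illustrative meta-features for one false-negative hypothesis."""
    return np.array([
        hyp["box_area"],             # size of the hypothesized box
        hyp["tracklet_length"],      # how long the tracker has supported it
        hyp["mean_det_score"],       # detector confidence of associated detections
        hyp["max_iou_with_dets"],    # best overlap with any detection in the frame
        hyp["disparity_agreement"],  # consistency of the stereo shift
    ])

def train_meta_classifier(synthetic_hyps, labels):
    """Train on synthetic data, where ground truth reveals which hypotheses
    correspond to valid detector errors."""
    X = np.stack([hypothesis_features(h) for h in synthetic_hyps])
    clf = RandomForestClassifier(n_estimators=200, random_state=0)
    clf.fit(X, labels)
    return clf

def score_hypotheses(clf, hyps):
    """Probability that each mined hypothesis from unlabeled logs is a real error."""
    X = np.stack([hypothesis_features(h) for h in hyps])
    return clf.predict_proba(X)[:, 1]
```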

 

 

CONCLUSION :

We presented a system for self-driving cars that checks for inconsistency using two distinct mechanisms: temporal and stereo cues. Our proposed system provides a means of identifying false negatives made by single-frame object detectors on large unlabeled datasets. We propose that finding object detector errors can be posed as a binary classification problem. We use an off-the-shelf multi-object tracker to construct tracklets, and each tracklet without an associated detection is used as a hypothesis. We also use stereo disparity to shift detections from one camera view to the other and treat the unassociated shifted detections as hypotheses for missed objects. Furthermore, we showed that our system (detector, tracker, and classifier), trained only on synthetic data, can find errors made on the KITTI dataset with an AP score of 0.94 for the RRC detector. This offers the promise of bootstrapping dataset labels for new domains through a process of synthetic training and failure mining in real data. Through extensive experiments we have shown that even state-of-the-art object detectors make systematic errors and that we can reliably localize these errors in a global reference frame.
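The stereo cue described above can be sketched as follows, under simplifying assumptions: a box detected in the left image is shifted horizontally by the median disparity of the pixels it covers (rectified stereo assumed) and compared against the right-image detections; shifted boxes with no match become missed-detection hypotheses. Box format, thresholds, and helper names are assumptions, not the paper's exact procedure.

```python
# Minimal sketch of the stereo cue: shift left-image detections into the right
# image using median disparity, then keep the ones that no right-image detection
# explains. Assumes rectified stereo and (x1, y1, x2, y2) pixel boxes.
import numpy as np

def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def shift_box_with_disparity(box, disparity_map):
    """Shift a left-image box into the right image by the median disparity
    of the pixels inside the box (x_right = x_left - disparity)."""
    x1, y1, x2, y2 = [int(round(v)) for v in box]
    d = float(np.median(disparity_map[y1:y2, x1:x2]))
    return (box[0] - d, box[1], box[2] - d, box[3])

def stereo_hypotheses(left_dets, right_dets, disparity_map, iou_thresh=0.5):
    """Shifted left-image detections with no match among right-image detections
    become hypotheses that the detector missed the object in the right view."""
    shifted = [shift_box_with_disparity(b, disparity_map) for b in left_dets]
    return [s for s in shifted
            if all(iou(s, r) < iou_thresh for r in right_dets)]
```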

Naturally, the next step is to make object detectors learn from these identified mistakes. This is a deceptively hard task for CNN-based object detectors. In a supervised learning setting, images are assumed to be exhaustively labeled: any region of the image without a label is treated as a negative sample, while the labels themselves are treated as positive samples with tight bounding boxes. While our method reliably detects false negatives, it does not always detect all mistakes in an image. We plan to address how best to learn from this partial information in subsequent work. Additionally, we plan to incorporate free-space computation from the path of the vehicle and from active sensor returns such as LIDAR to identify false positives, further improving our assessment and understanding of modern object detectors at the fleet level.