Road scene interpretation for autonomous navigation fusing stereo vision and digital maps

Supervised by:
  1. Miguel Angel Sotelo Vázquez Director
  2. David Fernández Llorca Co-director

Defence university: Universidad de Alcalá

Defence date: 23 September 2016

  1. Sebastián Sánchez Prieto Chair
  2. Ignacio Parra Alonso Secretary
  3. Jose Eugenio Naranjo Hernández Committee member
  4. José María Armingol Moreno Committee member
  5. Fawzi Nashashibi Committee member

Type: Thesis

Teseo: 525173


This thesis presents a stereo-vision-based road detection method. Machine learning is widely applied to computer vision problems; the technique applied here is boosting, which internally uses decision trees, to classify every pixel of the image as road or non-road. The feature set combines information from a digital navigation map, 3D stereo vision, and color and grayscale cameras. The features obtained from the grayscale camera are road markings, Local Binary Patterns (LBP) and Histograms of Oriented Gradients (HOG). The color cameras provide an illuminant-invariant image and a shadow detection function. In addition, Hue Saturation Value (HSV) information is used, and a vegetation detector is developed to discard green areas of the scene. The stereo cameras play a very important role because they supply 3D information to the system. Features that take advantage of the 3D points include normal vectors and curvature values. A novel method for the specific task of curb detection is developed. The curb detector is based on the curvature feature because it describes the variation of the road shape even in the presence of small curbs. The detector is able to detect curbs 3 cm high at distances of up to 20 meters, provided the curb is connected in the curvature image. Other obstacles such as vehicles, walls or trees are also detected using stereo vision methods. A new approach is created to merge the features that describe road limits into a new feature that describes the road surface. The new feature uses road markings, curbs, obstacles and vegetation areas to obtain a road model, together with the number of lanes provided by the digital navigation map. The originality of the presented method lies in the point from which the road limits are detected. Other methods cast radial rays from the bottom center of the image until each ray reaches an obstacle.
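As a rough illustration of the conventional bottom-center ray casting described above (not the method proposed in this thesis), the following minimal sketch walks a few rays over a binary obstacle grid. The function name `cast_rays_from_bottom_center`, the grid representation and the evenly spaced angles are assumptions made for this example only.

```python
import math


def cast_rays_from_bottom_center(obstacles, num_rays=5):
    """Cast rays from the bottom-center cell of a binary obstacle grid.

    obstacles: list of rows, where a truthy value marks an obstacle cell.
    Returns one (row, col) per ray: the obstacle cell that stopped the ray,
    or the last in-bounds cell if the ray left the grid without a hit.
    """
    rows, cols = len(obstacles), len(obstacles[0])
    start_row, start_col = rows - 1, cols // 2
    end_points = []
    for k in range(num_rays):
        # Angles spread evenly over the upper half-plane (0..180 degrees),
        # excluding the two horizontal directions.
        angle = math.pi * (k + 1) / (num_rays + 1)
        step_row, step_col = -math.sin(angle), math.cos(angle)
        r, c = float(start_row), float(start_col)
        last_cell = (start_row, start_col)
        while True:
            ri, ci = int(round(r)), int(round(c))
            if not (0 <= ri < rows and 0 <= ci < cols):
                break  # ray left the image without reaching an obstacle
            last_cell = (ri, ci)
            if obstacles[ri][ci]:
                break  # ray reached an obstacle
            r += step_row
            c += step_col
        end_points.append(last_cell)
    return end_points
```

In a real system the obstacle grid would come from the curb, obstacle and vegetation detectors; here it is only a hand-written list of lists.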
Our proposal finds the road limits from a different point of view: its rays start from the vanishing point and extend along the image, and their accumulated values are analyzed to obtain a road model. Another important feature is obtained from a digital navigation map. The aim is to get a prior model of the road based on the GPS position and the information extracted from the digital map. The uncertainty around the correct position is taken into account during the modeling process, and the road width is precisely adapted thanks to the radial ray road model. Several tests have been carried out with different classifiers based on decision trees to choose the best type of classifier. Finally, all the features mentioned above are integrated into a boosting classifier. It generates a road probability for every pixel, which is fed into a Conditional Random Field (CRF) to filter the classifier response and obtain a smoother result. The metric used to evaluate the classifiers is the F1-score, which is the harmonic mean of precision and recall. The system is evaluated in the image plane, which is the most common approach in the literature. However, in a vehicle scenario, the control stage usually operates in a 2D Bird's Eye View (BEV). The KITTI benchmark has a ranking sorted by F1-score calculated on the BEV images. In order to compare our system with other algorithms in an international benchmark, the same evaluation method is adopted.
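The F1-score mentioned above can be illustrated with a minimal sketch that counts pixel-wise true positives, false positives and false negatives on binary road masks. The helper names `confusion_counts` and `f1_score` and the list-of-lists mask format are assumptions for this example; this is not the official KITTI evaluation code.

```python
def confusion_counts(predicted, truth):
    """Count (tp, fp, fn) for two binary masks of the same shape.

    Both masks are lists of rows; a truthy value means "road".
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(predicted, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall; 0.0 when undefined."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)
```

For instance, with 8 true positives, 2 false positives and 2 false negatives, precision and recall are both 0.8, so the F1-score is also 0.8. The same computation applies whether the masks are in the image plane or in the BEV.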