A stereo vision system based on soft computing techniques for human robot interaction

  1. PAUL MIRANDA DE OLIVEIRA, Rui Filipe
Supervised by:
  1. Eugenio Aguirre Molina Director
  2. Rafael Muñoz Salinas Director
  3. Miguel García Silvente Director

Defence university: Universidad de Granada

Defence date: 30 September 2013

Committee:
  1. Antonio González Muñoz Chair
  2. Jesús Chamorro Secretary
  3. Miguel Cazorla Quevedo Committee member
  4. Alexandre Bernardino Costa Committee member
  5. Vicente Matellán Olivera Committee member

Type: Thesis

Abstract

Introduction

The main goal of this thesis is the development of visual techniques that can help establish natural interaction between people and robots. In this context, "natural" interaction means interaction similar to that which occurs between humans. Efforts were therefore directed at making it possible for a robot equipped with a Stereo Vision (SV) system to study and analyse the behavior of the people located in its surroundings. The motivation behind this goal is to give robots the ability to behave and choose between actions as a human would. This involves several tasks, such as detecting and tracking people in the surroundings of the robot and accurately detecting who is potentially interested in the actions executed by the robot and/or is responding to them. By doing so, robots may use their resources more adequately and even improve their decision capabilities and communication methodologies while achieving behavior similar to human behavior.

Methods

To achieve this kind of Human-Robot Interaction (HRI), different techniques are detailed. These techniques help to solve several issues inherent to this field. In particular, Soft Computing (SC) techniques are employed to deal with uncertainty and vagueness as well as to represent variables and rules in a human-oriented way. Image analysis techniques are also employed to extract relevant information from the scene. Together, they enhance the socialisation of robots. The purpose of this work is twofold. First, people located in the surroundings of the robot are detected and tracked. Second, the system computes whether a person is interested in interacting with the robot, requesting its attention or responding to its actions.
This is done by analysing typical interaction cues between humans, such as the distance between interlocutors, head pose, arm shaking, head shaking/nodding and smiling.

To achieve the first goal, two different methods are presented: one based on a probabilistic approach and a second one based on a "possibilistic" approach. The probabilistic method presents a novel approach for person tracking which combines depth, color and gradient information based on stereo vision. The degree of confidence assigned to depth information in the tracking process varies according to the amount of stereo information found in the disparity map. A novel confidence measure is defined for this purpose, and the tracking is carried out using Particle Filter (PF) techniques. The second method, based on a possibilistic approach, is employed to add more information based on expert knowledge when evaluating the particles, without being confined to probabilistic models. This approach also uses Fuzzy Logic (FL) when managing stereo information in order to improve the people detection phase. Thus, in the people detection phase, two fuzzy systems are used to filter out false positives of a face detector. Then, in the tracking phase, a new Fuzzy Logic based Particle Filter (FLPF) is proposed to fuse stereo and color information, assigning different confidence levels to each of these information sources. Information regarding depth and occlusion is used to create these confidence levels. In this way, the system is able to keep track of people in the reference camera image even when either stereo information or color information is confusing or unreliable.

If a robot is considered an intelligent system, the ability to identify some typical interaction situations is worth implementing. Therefore, to achieve the second goal, a method based on several cues, namely the distance and angle towards the robot and the person's head pose, is presented.
The head pose is estimated in real time by a view-based approach using Support Vector Machines (SVM), while a Fuzzy System (FS) is used to compute the final interest value based on the three variables mentioned. Whenever the level of interest reaches a high value, the person is analysed in more detail to detect the position and motion of the arms as well as whether the person is shaking or nodding the head. This information is managed by a fuzzy system in order to detect a possible interest demand or the intention of the person to say yes or no using his/her head.

Some of the above-mentioned sources of information are used together with smile detection, in the last work presented in this thesis, to build a system based on FL which is able to measure certain types of human response. As the reliability of the visual information detected by the system mainly depends on the distance of the person from the camera, different visual cues are prioritised according to the distance of the user from the robot. The human response is computed by means of a hierarchical fuzzy system that is able to deal with the uncertainty and vagueness of the measures depending on the distance of the person. This human response measure is used for detecting the person or people who are responding to the social interactions proposed by the robot, and it might also be used to improve or adjust the interaction skills of the robot in the future.

Conclusions

This thesis has presented contributions in different areas of the Soft Computing (SC), Computer Vision and Human-Robot Interaction (HRI) fields. Efforts have been focused on the problem of people detection and tracking, which could be considered a first step before developing any other HRI techniques. Additionally, we have proposed a novel approach to detect different kinds of human responses when interacting with a robot.
We may then conclude that the four main contributions of this thesis are:

- The development of a fast stereo tracking algorithm using a confidence measure. The confidence measure is employed to modify the probability distribution function used for weighting the particles in the particle filtering algorithm. This proposal is robust and makes it possible to manage the uncertainty associated with the disparity information.
- The development of a fuzzy stereo tracking algorithm. This proposal manages not only the uncertainty associated with disparity information but also the vagueness associated with the remaining sources of information.
- A new fuzzy system that allows the visual detection of interaction demands. A level of interest is computed in real time, with the head pose estimated by a view-based approach using Support Vector Machines.
- The proposal of a hierarchical fuzzy system to measure human response using stereo vision. The hierarchical fuzzy system is able to deal with the uncertainty and vagueness of the measures depending on the distance of the tracked person.
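
As an illustration of the first contribution, the following is a minimal sketch of how a stereo confidence value might modulate particle weights. It is not the thesis's actual measure or implementation. The function names, the "fraction of valid disparities" confidence, the Gaussian depth likelihood and the linear blend are all assumptions made for this example, and the gradient cue is omitted for brevity.

```python
import math


def stereo_confidence(disparities, invalid=0):
    """Assumed confidence: fraction of region pixels with a valid disparity."""
    if not disparities:
        return 0.0
    valid = sum(1 for d in disparities if d != invalid)
    return valid / len(disparities)


def gaussian(x, mean, sigma):
    """Unnormalised Gaussian likelihood, equal to 1.0 when x == mean."""
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2)


def particle_weight(color_sim, depth, target_depth, confidence, sigma=0.3):
    """Weight one particle from a color cue and a depth cue.

    The depth likelihood is blended with a flat value of 1.0 according to
    the confidence, so depth has no influence when confidence is 0.
    This blend is a hypothetical choice, not the one defined in the thesis.
    """
    depth_like = gaussian(depth, target_depth, sigma)
    depth_term = confidence * depth_like + (1.0 - confidence)
    return color_sim * depth_term


def normalise(weights):
    """Scale weights to sum to 1; fall back to uniform if all are zero."""
    total = sum(weights)
    if total == 0:
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical region where half of the disparity values are missing.
    conf = stereo_confidence([0, 0, 5, 7])
    raw = [
        particle_weight(0.8, 2.1, 2.0, conf),  # close to the target depth
        particle_weight(0.8, 3.5, 2.0, conf),  # far from the target depth
    ]
    print(normalise(raw))
```

With a confidence of 0 the weight reduces to the color similarity alone, which mirrors the idea that tracking should continue on the remaining cues when the disparity map carries little stereo information.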