Development of new models for vision-based human activity recognition
- Saleh Ali Alraimi, Adel
- Domènec Puig Valls Director
- Miguel Angel García García Co-director
University of defense: Universitat Rovira i Virgili
Defense date: 12 April 2019
- Andrea Fusiello President
- Hatem Abd El-Latif Fatahallah Ibrahim Mahmoud Rashwan Secretary
- Vicente Matellán Olivera Rapporteur
Type: Thesis
Abstract
Justification
The main goal of the action recognition field is to give computers the ability to recognize human actions in real-life videos. The performance of applications such as surveillance systems and human-computer interaction depends mainly on the accuracy of human activity recognition systems. Human action recognition is still an open and challenging problem in the computer vision community. Several methods have been proposed to improve the performance of human action recognition in uncontrolled videos. Bag-of-words (BOW) models based on a set of low-level features, such as histograms of optical flow (HOF), histograms of oriented gradients (HOG) and motion boundary histograms (MBH), have become very common video representations for action recognition. These models are insensitive to the position and orientation of the objects in the image. In addition, they produce fixed-length vectors regardless of the number of objects and the number of frames in each video. However, they localize the objects and actions in the videos poorly. The use of local information can be very useful to improve the recognition rate.
Constructing robust models to recognize free-form activities of the same class is still an open problem. One source of misclassification is the wide margin of variation inside the same class. Other sources include changes in viewpoint and scale, and background clutter. In addition, there are other high-level factors that cannot be measured directly and can play a role in human action recognition, such as human-object interaction, human poses and scene context.
Main approaches
Most of the development in the action recognition area over the last years has been related to three approaches:
- The first approach is the definition of local features, such as spatio-temporal features and densely sampled local visual features. We can also mention interest points, motion-based gradient descriptors and dense trajectories.
- The second approach concentrates on the exploration of encoding methods, which have a long history of success in the object recognition field. A good example is Fisher vectors.
- The third approach is related to the exploration of deep learning models.
Thesis objectives
The main objectives of this thesis are:
1. To address the analysis of human activities and develop suitable machine learning methods. We consider models usually applied in this research area to analyze streams of video data.
2. To extend the analysis of the existing models to different data sources, exploiting the knowledge resulting from other fields such as semantic segmentation. The lack of good, widely used data sets for activity recognition limits the possibility of comparing different machine learning models in this field. Such a comparison is very important, since it would help researchers to choose the most appropriate models for the different applications of human activity recognition.
3. To use deep learning models to perform the body part segmentation task in order to obtain a completely abstracted recognition system.
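As an illustration of the bag-of-words representation mentioned in the justification, the following is a minimal sketch, not the method used in the thesis. It assumes that local descriptors (for example HOG, HOF or MBH vectors) have already been extracted and that a codebook of visual words is available; here both are replaced by random data, and the names `bow_histogram`, `codebook`, `short_video` and `long_video` are hypothetical. It shows why the encoding has a fixed length regardless of the number of frames, and why it discards the position and order of the descriptors.

```python
import numpy as np

def bow_histogram(descriptors, codebook):
    """Encode a variable number of local descriptors as one fixed-length histogram."""
    # Squared Euclidean distance from every descriptor to every codeword.
    diffs = descriptors[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=2)
    # Hard assignment: each descriptor votes for its nearest codeword.
    nearest = np.argmin(dists, axis=1)
    counts = np.bincount(nearest, minlength=codebook.shape[0]).astype(float)
    # L1-normalise so that videos with different descriptor counts are comparable.
    total = counts.sum()
    return counts / total if total > 0 else counts

# Assumption: random stand-ins for a learned codebook and extracted descriptors.
rng = np.random.default_rng(0)
codebook = rng.normal(size=(8, 16))       # 8 visual words, 16-D descriptors
short_video = rng.normal(size=(50, 16))   # 50 descriptors
long_video = rng.normal(size=(400, 16))   # 400 descriptors

h_short = bow_histogram(short_video, codebook)
h_long = bow_histogram(long_video, codebook)
# Reversing the descriptor order leaves the histogram unchanged.
h_reversed = bow_histogram(short_video[::-1], codebook)
```

Both histograms have one bin per visual word, whatever the number of descriptors in the video. The same property explains the poor localization noted above: the histogram records how often each word occurs, not where or when.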