Discriminative convolutional Fisher vector network for action recognition

Posted on Sat 05 August 2017 in research • Tagged with human action recognition, deep learning

We uploaded our paper entitled Discriminative convolutional Fisher vector network for action recognition to arXiv. You can check out the paper by following this link.

Abstract

Discriminative convolutional Fisher vector network for action recognition

Petar Palasek, Ioannis Patras

In this work we propose a novel neural network architecture for the problem of human action recognition in videos. The proposed architecture expresses the processing steps of classical Fisher vector approaches, that is, dimensionality reduction by principal component analysis (PCA) projection, Gaussian mixture model (GMM) and Fisher vector descriptor extraction, as network layers. In contrast to other methods, where these steps are performed consecutively and the corresponding parameters are learned in an unsupervised manner, having them defined as a single neural network allows us to refine the whole model discriminatively in an end-to-end fashion. Furthermore, we show that the proposed architecture can be used as a replacement for the fully connected layers in popular convolutional networks, achieving comparable classification performance, or even significantly surpassing the performance of similar architectures, while reducing the total number of trainable parameters by a factor of 5. We show that our method achieves significant improvements in comparison to the classical chain.
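
For readers unfamiliar with the classical chain mentioned in the abstract, here is a minimal NumPy sketch of two of its steps: PCA projection and Fisher vector extraction from a diagonal-covariance GMM. It is an illustration under stated assumptions, not the network from the paper. The function names `pca_project` and `fisher_vector` are hypothetical, the GMM parameters are assumed to be already fitted (for example by EM), and the gradient normalisation follows the commonly used formulation for the mean and standard deviation terms.

```python
import numpy as np


def pca_project(X, n_components):
    """Project descriptors X of shape (T, D) onto their top principal components."""
    centred = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ Vt[:n_components].T


def fisher_vector(X, weights, means, sigmas):
    """Fisher vector of descriptors X (T, D) under a diagonal GMM.

    weights: (K,) mixture weights, means: (K, D), sigmas: (K, D) standard
    deviations. The GMM is assumed to be fitted beforehand.
    Returns a vector of length 2 * K * D.
    """
    T, D = X.shape
    # Standardised differences to every component, shape (T, K, D).
    diff = (X[:, None, :] - means[None, :, :]) / sigmas[None, :, :]
    # log(w_k * N(x_t | mu_k, diag(sigma_k^2))), shape (T, K).
    log_prob = (
        np.log(weights)[None, :]
        - 0.5 * np.sum(diff ** 2, axis=2)
        - np.sum(np.log(sigmas), axis=1)[None, :]
        - 0.5 * D * np.log(2.0 * np.pi)
    )
    # Posterior responsibilities via a numerically stable softmax.
    log_prob -= log_prob.max(axis=1, keepdims=True)
    gamma = np.exp(log_prob)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # Normalised gradients with respect to the means and standard deviations.
    g_mu = np.einsum("tk,tkd->kd", gamma, diff)
    g_mu /= T * np.sqrt(weights)[:, None]
    g_sigma = np.einsum("tk,tkd->kd", gamma, diff ** 2 - 1.0)
    g_sigma /= T * np.sqrt(2.0 * weights)[:, None]
    return np.concatenate([g_mu.ravel(), g_sigma.ravel()])
```

In the classical setting each of these steps is fitted separately and without labels. The idea described in the abstract is to express the same computations as network layers so that their parameters can be refined jointly.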


New publication added!

Posted on Sat 05 August 2017 in research • Tagged with human pose estimation, deep learning

Our work on Deep globally constrained MRFs for Human Pose Estimation was accepted at the International Conference on Computer Vision (ICCV 2017), which will be held from October 22nd to 29th, 2017 in Venice, Italy.

Abstract

Deep globally constrained MRFs for Human Pose Estimation

Ioannis Marras, Petar Palasek, Ioannis Patras

This work introduces a novel Convolutional Network architecture (ConvNet) for the task of human pose estimation, that is, the localization of body joints in a single static image. We propose a coarse-to-fine architecture that addresses shortcomings of the baseline architecture in [Tompson2014]. These stem from the fact that large inaccuracies of its coarse ConvNet cannot be corrected by the refinement ConvNet, which refines the estimation only within small windows of the coarse prediction. We overcome this by introducing a Markov Random Field (MRF)-based spatial model network between the coarse and the refinement model that introduces geometric constraints on the relative locations of the body joints. We propose an architecture in which (a) the filters that implement the message passing in the MRF inference are factored in a way that constrains them by a low-dimensional pose manifold, the projection to which is estimated by a separate branch of the proposed ConvNet, and (b) the strengths of the pairwise joint constraints are modeled by weights that are jointly estimated with the other parameters of the network. The proposed network is trained in an end-to-end fashion. Experimental results show that the proposed method improves on the baseline model and provides state-of-the-art results on very challenging benchmarks.


New publication added!

Posted on Tue 06 June 2017 in research • Tagged with background modelling, deep learning

Our work on Background Modelling Based on Generative Unet was accepted at the Analysis of video and audio “in the Wild” Workshop, an international workshop organised in conjunction with IEEE AVSS 2017, which will be held on August 29 in Lecce, Italy.

Abstract

Background Modelling Based on Generative Unet

Ye Tao, Petar Palasek, Ioannis Patras

Background modelling is a crucial step in background/foreground detection, which can be used in video analysis tasks such as surveillance, people counting, face detection and pose estimation. Most methods require hyperparameters to be chosen manually or use ground truth (GT) background masks. In this work, we present an unsupervised deep background (BG) modelling method called BM-Unet, which is based on a generative architecture that, given a frame as input, generates as output the corresponding background image or, to be more precise, a probabilistic heat map of the colour values. Our method learns its parameters automatically, and an augmented version of it that utilises colour, intensity differences and optical flow between a reference and a target frame is robust to rapid illumination changes and camera jitter. In addition, it can be used on a new video sequence without the need for ground truth background/foreground masks for training. Experimental evaluations on challenging sequences in the SBMnet dataset demonstrate promising results compared with state-of-the-art methods.