Publications

[1] Petar Palasek and Ioannis Patras. Discriminative convolutional Fisher vector network for action recognition. arXiv preprint arXiv:1707.06119, 2017.
In this work we propose a novel neural network architecture for the problem of human action recognition in videos. The proposed architecture expresses the processing steps of classical Fisher vector approaches, that is, dimensionality reduction by principal component analysis (PCA) projection, Gaussian mixture model (GMM) and Fisher vector descriptor extraction, as network layers. In contrast to other methods where these steps are performed consecutively and the corresponding parameters are learned in an unsupervised manner, having them defined as a single neural network allows us to refine the whole model discriminatively in an end-to-end fashion. Furthermore, we show that the proposed architecture can be used as a replacement for the fully connected layers in popular convolutional networks, achieving a comparable classification performance, or even significantly surpassing the performance of similar architectures while reducing the total number of trainable parameters by a factor of 5. We show that our method achieves significant improvements in comparison to the classical chain.

[2] Ioannis Marras, Petar Palasek, and Ioannis Patras. Deep refinement convolutional networks for human pose estimation. In Automatic Face & Gesture Recognition (FG 2017), 2017 12th IEEE International Conference on, pages 446-453. IEEE, 2017.
This work introduces a novel Convolutional Network (ConvNet) architecture for the task of human pose estimation, that is, the localization of body joints in a single static image. The proposed coarse-to-fine architecture addresses shortcomings of the baseline architecture that stem from the fact that large inaccuracies of its coarse ConvNet cannot be corrected by the refinement ConvNet, which refines the estimation within small windows of the coarse prediction. This is achieved by a) changes in architectural parameters that both increase the accuracy of the coarse model and make the refinement model more capable of correcting the errors of the coarse model, b) the introduction of a Markov Random Field (MRF)-based spatial model network between the coarse and the refinement model that introduces geometric constraints, and c) a training scheme that adapts the data augmentation and the learning rate according to the difficulty of the data examples. The proposed architecture is trained in an end-to-end fashion. Experimental results show that the proposed method improves on the baseline model and provides state-of-the-art results on the FashionPose and MPII benchmarks.

[3] Petar Palasek and Ioannis Patras. Action recognition using convolutional restricted Boltzmann machines. In Proceedings of the 1st International Workshop on Multimedia Analysis and Retrieval for Multimodal Interaction, MARMI '16, pages 3-8, New York, NY, USA, 2016. ACM.
In this work we study deep learning architectures for the problem of action recognition in image sequences, focusing on generative neural networks, namely the convolutional extension of restricted Boltzmann machines (RBMs). We first use a stack of convolutional restricted Boltzmann machines to learn and extract features from sequences of images in an unsupervised way, and then use them for the task of action classification. We modify the energy function of the convolutional RBM in such a way that the training updates reported in the literature follow directly from the differentiation of the objective function, which we define in terms of the free energy function. This is in contrast to other works on convolutional RBMs in the literature, whose update equations do not follow directly from a well-defined energy function or optimization framework without ad hoc normalizations. We show that the representations derived from unsupervised training of the RBMs have descriptive power similar to or better than that of hand-designed image descriptors and give competitive performance in the problem of action recognition.

Keywords: action recognition, convolutional RBM, unsupervised learning

[4] Petar Palasek, Heng Yang, Zongyi Xu, Navid Hajimirza, Ebroul Izquierdo, and Ioannis Patras. A flexible calibration method of multiple Kinects for 3D human reconstruction. In Multimedia & Expo Workshops (ICMEW), 2015 IEEE International Conference on, pages 1-4. IEEE, 2015.
In this paper, we present a simple yet effective calibration method for multiple Kinects, i.e. a method that finds the relative position of RGB-depth cameras, as opposed to conventional methods that find the relative position of RGB cameras. We first find the mapping function between the RGB camera and the depth camera mounted on one Kinect. With such a mapping function, we propose a scheme that is able to estimate the 3D coordinates of the corners extracted from a standard calibration chessboard. In this way, we are able to build the 3D correspondences between two Kinects directly. This reduces the calibration to a simple least-squares minimization problem with a very stable solution. Furthermore, by using two mirrored chessboard images on a thin board, we are able to calibrate two Kinects facing each other, something that is intractable using traditional calibration methods. We demonstrate our proposed method with real data and show very accurate calibration results, namely less than 7 mm reconstruction error for objects at a distance of 1.5 m, using around 7 frames for calibration.
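
As an illustration only, one common way to write such a least-squares problem over 3D correspondences is shown below. The abstract does not state the exact objective, so the rigid-transform form and the symbols p_i, q_i, R and t are assumptions introduced here, not the authors' notation.

```latex
% Assumed notation: p_i and q_i are the 3D coordinates of the i-th chessboard
% corner as seen by the first and the second Kinect; R is a rotation matrix
% and t a translation vector relating the two camera frames.
\[
  (R^{*}, t^{*}) \;=\; \operatorname*{arg\,min}_{R \in SO(3),\; t \in \mathbb{R}^{3}}
  \;\sum_{i=1}^{N} \bigl\lVert R\,p_i + t - q_i \bigr\rVert_2^{2}
\]
```

A problem of this form has a closed-form solution via the singular value decomposition of the cross-covariance of the centered point sets, which is one reason such formulations tend to be numerically stable.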

[5] Petar Palasek, Petra Bosilj, and Sinisa Segvic. Detecting and recognizing centerlines as parabolic sections of the steerable filter response. In 2011 Proceedings of the 34th International Convention MIPRO, pages 903-908. IEEE, 2011.
This paper is concerned with the detection and recognition of road surface markings in video acquired from the driver's perspective. In particular, we focus on centerlines, which separate the two road lanes with opposite traffic directions, since they are often the only markings on many urban and suburban roads. The proposed technique is based on detecting parabolic sections of the thresholded steerable filter response in inverse perspective images. The technique has been experimentally evaluated on production videos acquired from moving service vehicles. The obtained results are provided and discussed.

[6] Petra Bosilj, Petar Palasek, Bojan Popovic, and Daria Stefic. Simulation of a Texas Hold'Em poker player. In 2011 Proceedings of the 34th International Convention MIPRO, pages 1628-1633. IEEE, 2011.
Imperfect information environments are among the common research subjects in the field of Artificial Intelligence. A game of poker is a good example of such an environment. As the popularity of the game grew, so did the interest in implementing a functioning automated poker player. Approaches to this problem include various Machine Learning techniques such as Bayesian decision networks, various Case-based reasoning (CBR) techniques and reinforcement learning. For a player to play well it is not enough to know just the probability estimates of one's own hand. A player must adjust his strategy according to his estimate of the opponents' strategies and an estimate of the opponents' hand strength. This paper explores the use of the k-Nearest Neighbors technique, an example of CBR techniques, in implementing an automated poker player. As a result, an average player able to cope with most in-game situations was developed. The main difference from a model based on optimal mathematical play is that the developed player seems more human, which makes its actions harder to predict. Numerous simulations on the developed testing model show that a small but stable profit is gained by the implemented automated player.
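
The general idea of choosing an action by k-Nearest Neighbors over stored cases can be sketched as follows. This is a minimal illustration, not the paper's implementation: the two features (hand strength, pot odds), the case values, and the function name `knn_action` are all hypothetical.

```python
import math
from collections import Counter


def knn_action(cases, query, k=3):
    """Return the most common action among the k stored cases closest to `query`.

    `cases` is a list of (feature_vector, action) pairs; distance is Euclidean.
    """
    # Rank stored cases by distance between their features and the query.
    nearest = sorted(cases, key=lambda case: math.dist(case[0], query))[:k]
    # Majority vote over the actions taken in the nearest cases.
    votes = Counter(action for _, action in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical case base: (hand_strength, pot_odds) -> action taken.
# The numbers are made up for illustration only.
CASES = [
    ((0.90, 0.30), "raise"),
    ((0.85, 0.25), "raise"),
    ((0.55, 0.30), "call"),
    ((0.50, 0.20), "call"),
    ((0.15, 0.40), "fold"),
    ((0.10, 0.35), "fold"),
]

if __name__ == "__main__":
    print(knn_action(CASES, (0.88, 0.28), k=3))
```

A realistic case base would likely encode the betting round, position and opponent behavior as well; those choices are outside what the abstract specifies.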

[7] A Bulovic, D Bucar, P Palasek, B Popovic, A Trbojevic, L Zadrija, I Kusalic, K Brkic, Z Kalafatic, and S Segvic. Streamlining collection of training samples for object detection and classification in video. In MIPRO 2010 Proceedings of the 33rd International Convention, pages 728-733. IEEE, 2010.
This paper is concerned with object recognition and detection in computer vision. Many promising approaches in the field exploit the knowledge contained in a collection of manually annotated training samples. In the resulting paradigm, the recognition algorithm is automatically constructed by some machine learning technique. It has been shown that the quantity and quality of positive and negative training samples are critical for good performance of such approaches. However, collecting the samples requires tedious manual effort which is expensive in time and prone to error. In this paper we present the design and implementation of a software system which addresses these problems. The system supports an iterative approach whereby the current state-of-the-art detection and recognition algorithms are used to streamline the collection of additional training samples. The presented experiments have been performed within the framework of a research project aiming at automatic detection and recognition of traffic signs in video.

[8] Petar Palasek. Visual tracking of soft tissue targets in sequences of 3D ultrasound images. Master's thesis, Faculty of Electrical Engineering and Computing, University of Zagreb, Croatia, Jul 2012.
This work considers tracking of deforming soft tissues in sequences of three-dimensional ultrasound images. The chosen approach to the tracking is based on modeling the deformations using the thin-plate spline warp. In practice, the tracking is reduced to the estimation of the changes in control point locations by examining the intensity changes of neighbouring three-dimensional images in the sequence. The problem this approach faces is its low robustness to ultrasound noise, which prevents the correct evolution of the deformation model. In this work, the basic method for tracking deforming soft tissues was implemented, along with two new methods for improving the basic method’s robustness. The two new methods were the regularization of the thin-plate spline warp used as the motion model, and the addition of a mass-spring system that physically constrains the movement of the control points. The new methods are described, and the results of tests on simulated sequences of deforming three-dimensional ultrasound images are discussed.

