Background modelling is a crucial step in background/foreground detection, which is used in video analysis tasks such as surveillance, people counting, face detection and pose estimation. Most existing methods require manually chosen hyperparameters or ground truth background masks (GT). In this work, we present an unsupervised deep background (BG) modelling method called BM-Unet, based on a generative architecture that, given a frame as input, generates the corresponding background image as output - more precisely, a probabilistic heat map of the colour values. Our method learns its parameters automatically, and an augmented version that utilises colour and intensity differences and optical flow between a reference and a target frame is robust to rapid illumination changes and camera jitter. Moreover, it can be applied to a new video sequence without requiring ground truth background/foreground masks for training. Experimental evaluations on challenging sequences of the SBMnet dataset demonstrate promising results compared to state-of-the-art methods.
Deep Refinement Convolutional Networks for Human Pose Estimation
Ioannis Marras, Petar Palasek, Ioannis Patras
This work introduces a novel Convolutional Network architecture (ConvNet) for the task of human pose estimation, that is, the localization of body joints in a single static image. The proposed coarse-to-fine architecture addresses shortcomings of the baseline architecture that stem from the fact that large inaccuracies of its coarse ConvNet cannot be corrected by the refinement ConvNet, which refines the estimation within small windows of the coarse prediction. This is achieved by a) changes in architectural parameters that both increase the accuracy of the coarse model and make the refinement model more capable of correcting the errors of the coarse model, b) the introduction of a Markov Random Field (MRF)-based spatial model network between the coarse and the refinement model that introduces geometric constraints, and c) a training scheme that adapts the data augmentation and the learning rate according to the difficulty of the data examples. The proposed architecture is trained in an end-to-end fashion. Experimental results show that the proposed method improves the baseline model and provides state-of-the-art results on the FashionPose [8] and MPII [1] benchmarks.
A couple of months ago I was asked by Lidija Matulin, an academy-trained artist from Croatia, to help her on a project that she wanted to work on. She had this idea to make paintings that would extend to an additional dimension and include sound. That sounded (tee-hee) cool to me so I decided to help! Yay!
First, here's one of the paintings Lidija made as part of this project, entitled "Vi niste pozvani pa izvolite sjesti" which translates roughly to "You're not invited so please have a seat":
The idea
We brainstormed how a painting could be transformed into sound and concluded that a simple way would be to have a line that scans through the painting and turns whatever it sees into sound. But how to turn colour information into sound? We tried a couple of approaches.
What we tried
The first approach was simply to take the average colour value of all the colours under the scanner line, turn it into greyscale, map this value to a frequency and produce a sine wave. Here I have to say thanks to Carl Bussey from Sheffield, who now lives in Berlin and writes software that makes music, because he showed me how to play sine waves in Python using pyaudio. Thanks Carl!
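If you're curious what that looks like in code, here's a minimal sketch of the idea (not the actual script we used): it assumes a 44.1 kHz sample rate, a linear mapping from the grey value to a 200-2000 Hz range, and a made-up helper called `column_to_sine`.

```python
import numpy as np
import pyaudio

RATE = 44100  # assumed sample rate

def column_to_sine(image, x, duration=0.5):
    """Average the colours under a vertical scanner line at column x,
    turn them into a single grey value and map it to a sine wave."""
    column = image[:, x, :].astype(np.float64)   # all pixels under the line
    grey = column.mean()                         # average colour -> grey value in [0, 255]
    freq = 200.0 + (grey / 255.0) * 1800.0       # assumed mapping: 200-2000 Hz
    t = np.linspace(0.0, duration, int(RATE * duration), endpoint=False)
    return 0.3 * np.sin(2.0 * np.pi * freq * t)  # keep the amplitude modest

p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paFloat32, channels=1, rate=RATE, output=True)

# In practice you'd load the painting with OpenCV or PIL;
# a random H x W x 3 uint8 array stands in for it here.
image = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
for x in range(0, image.shape[1], 10):           # scan across the painting
    stream.write(column_to_sine(image, x).astype(np.float32).tobytes())

stream.stop_stream()
stream.close()
p.terminate()
```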
This first try didn't produce sounds that were too exciting so we had to make some changes. We split the scanner line into parts and had each part of the line make a sound on its own. This sounded a bit better.
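Roughly, that change looks like this: split the column of pixels into a few segments, make one sine wave per segment and mix them together. The segment count and the frequency mapping below are just example values, same as in the sketch above.

```python
import numpy as np

RATE = 44100  # same assumed sample rate as above

def column_to_chord(image, x, n_segments=5, duration=0.5):
    """Split the scanner line at column x into segments and mix one sine per segment."""
    column = image[:, x, :].astype(np.float64)
    segments = np.array_split(column, n_segments, axis=0)
    t = np.linspace(0.0, duration, int(RATE * duration), endpoint=False)
    mix = np.zeros_like(t)
    for segment in segments:
        grey = segment.mean()
        freq = 200.0 + (grey / 255.0) * 1800.0   # same assumed grey-to-frequency mapping
        mix += np.sin(2.0 * np.pi * freq * t)
    return 0.3 * mix / n_segments                # normalise so the mix doesn't clip
```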
The other thing that was annoying was that all of the sounds the program made always had the same length, which was dull. So, we made the length of the sound being played also depend on what was in the painting. Here we also decided that it would be better to have two different sources of sound from the painting: one from the background and one from the foreground. As the painting was now split into two different images, it was easy to make the sound of the foreground parts change their length depending on the area that was under each part of the scanner line as it scanned through the painting.
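A sketch of that length mapping, assuming the foreground is given as a binary mask and using made-up minimum and maximum durations:

```python
import numpy as np

def segment_durations(foreground_mask, x, n_segments=5,
                      min_duration=0.1, max_duration=1.0):
    """Map the amount of foreground under each segment of the scanner line at
    column x to a sound duration (foreground_mask is a binary H x W array)."""
    column = foreground_mask[:, x].astype(np.float64)
    segments = np.array_split(column, n_segments)
    durations = []
    for segment in segments:
        area = segment.mean()  # fraction of foreground pixels in this segment
        durations.append(min_duration + area * (max_duration - min_duration))
    return durations
```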
One more thing we did was to try colour representations other than the usual RGB before mapping the colour to the frequency of the sine wave being played. So we experimented with different combinations of hue, saturation and value from the HSV representation.
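For example, one such mapping could use only the hue of the average colour under the line; the helper name and the frequency range below are again just illustrative.

```python
import colorsys

def hue_to_frequency(image, x):
    """Use the hue of the average colour under the scanner line at column x
    instead of the grey value (one of several HSV-based mappings to try)."""
    r, g, b = image[:, x, :].mean(axis=0) / 255.0  # average colour, assuming RGB order
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return 200.0 + h * 1800.0                      # hue in [0, 1] -> 200-2000 Hz
```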
In the end we had a script written in Python, with a lot of parameters to play with, that produced different sounds for different things appearing in an image. Given a foreground and a background image, running the script would produce a video of a scanner line turning whatever it saw into sound.
So I gave this code to Lidija; she tried all kinds of different combinations of parameters and figured out what to paint to produce what kind of sound, which brings us to...
The final results
The final paintings were shown at an exhibition under the title "Jednosmjernom kartom, umrljanom marmeladom, kroz zečju rupu" / "With a one-way ticket, stained with marmalade, through the rabbit hole" in the Center for Culture in Čakovec, Croatia.