Real-time video object detection, tracking, and recognition are challenging problems
due to the real-time processing requirements of the machine learning algorithms. In recent
years, video processing has been performed by deep learning (DL) based techniques that achieve
higher accuracy but incur a higher computational cost. This paper presents a survey of
the state-of-the-art DL platforms and architectures used for deep vision systems. It highlights
the contributions and challenges from numerous research studies. In particular, this paper
first describes the architectures of various DL models such as autoencoders, deep Boltzmann
machines, convolutional neural networks, recurrent neural networks, and deep residual learning.
Next, deep real-time video object detection, tracking, and recognition studies are highlighted
to illustrate the key trends in terms of computational cost, number of layers, and accuracy
of results. Finally, the paper discusses the challenges of applying DL to real-time video
processing and outlines some directions for the future of DL algorithms.