Inicio  /  Applied Sciences  /  Vol: 10 Par: 22 (2020)  /  Artículo
ARTÍCULO
TITULO

Sequence-to-Sequence Video Prediction by Learning Hierarchical Representations

Kun Fan    
Chungin Joung and Seungjun Baek    

Resumen

Video prediction which maps a sequence of past video frames into realistic future video frames is a challenging task because it is difficult to generate realistic frames and model the coherent relationship between consecutive video frames. In this paper, we propose a hierarchical sequence-to-sequence prediction approach to address this challenge. We present an end-to-end trainable architecture in which the frame generator automatically encodes input frames into different levels of latent Convolutional Neural Network (CNN) features, and then recursively generates future frames conditioned on the estimated hierarchical CNN features and previous prediction. Our design is intended to automatically learn hierarchical representations of video and their temporal dynamics. Convolutional Long Short-Term Memory (ConvLSTM) is used in combination with skip connections so as to separately capture the sequential structures of multiple levels of hierarchy of features. We adopt Scheduled Sampling for training our recurrent network in order to facilitate convergence and to produce high-quality sequence predictions. We evaluate our method on the Bouncing Balls, Moving MNIST, and KTH human action dataset, and report favorable results as compared to existing methods.

 Artículos similares

       
 
Haotian You, Yufang Lu and Haihua Tang    
Video object detection is an important research direction of computer vision. The task of video object detection is to detect and classify moving objects in a sequence of images. Based on the static image object detector, most of the existing video objec... ver más
Revista: Information

 
Obada Issa and Tamer Shanableh    
This paper proposes a novel approach to activity recognition where videos are compressed using video coding to generate feature vectors based on compression variables. We propose to eliminate the temporal domain of feature vectors by computing the mean a... ver más
Revista: Applied Sciences

 
Hayat Ullah and Arslan Munir    
The recognition of human activities using vision-based techniques has become a crucial research field in video analytics. Over the last decade, there have been numerous advancements in deep learning algorithms aimed at accurately detecting complex human ... ver más
Revista: Algorithms

 
Joanna Kulawik, Mariusz Kubanek and Sebastian Garus    
This research aimed to develop a system for classifying horizontal road signs as correct or with poor visibility. In Poland, road markings are applied by using a specialized white, reflective paint and require periodic repainting. Our developed system is... ver más
Revista: Applied Sciences

 
Shukai Li, Xiaofang Wang, Dongri Shan and Peng Zhang    
Temporal modeling is a key problem in action recognition, and it remains difficult to accurately model temporal information of videos. In this paper, we present a local spatiotemporal extraction module (LSTE) and a channel time excitation module (CTE), w... ver más
Revista: Applied Sciences