Abstract
The theory of predictive coding postulates that the central nervous system employs perceived information to generate predictions and imbue perception with meaning. Drawing inspiration from this biological model, artificial intelligence algorithms rooted in predictive theory have been developed to generate predictions in sequences of sound, images and videos. In this context, we present an innovative method that aims to anticipate object movements by predicting optical flow in sequences of images extracted from videos. The objective is to mitigate spatial and temporal redundancy in images, thereby achieving enhanced data compression efficiency and more precise predictions.
Our network architecture entails a convolutional neural network (CNN) that estimates optical flow from sequences of images sourced from the KITTI dataset. The optical flow is then encoded by a series of convolutional residual blocks, and the outputs for consecutive frames are fed into an autoregressive function, yielding a comprehensive representation of the scene history. This representation is then harnessed to predict subsequent motion sequences.
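The pipeline described above could be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the layer widths, the use of a GRU as the autoregressive function, and all class and parameter names are assumptions (PyTorch assumed).

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One convolutional residual block over the encoded flow."""
    def __init__(self, ch):
        super().__init__()
        self.conv1 = nn.Conv2d(ch, ch, 3, padding=1)
        self.conv2 = nn.Conv2d(ch, ch, 3, padding=1)
        self.act = nn.ReLU()

    def forward(self, x):
        # Residual connection: output = x + F(x)
        return self.act(x + self.conv2(self.act(self.conv1(x))))

class FlowPredictor(nn.Module):
    """Encodes per-frame optical flow (2 channels: dx, dy) with residual
    blocks, aggregates consecutive frames autoregressively (here a GRU,
    an assumed stand-in), and decodes the hidden state into a predicted
    next flow field."""
    def __init__(self, ch=16, hidden=64, hw=(8, 8)):
        super().__init__()
        self.hw = hw  # must match the spatial size of the input flow
        self.stem = nn.Conv2d(2, ch, 3, padding=1)
        self.encoder = nn.Sequential(ResidualBlock(ch), ResidualBlock(ch))
        self.to_vec = nn.Linear(ch * hw[0] * hw[1], hidden)
        self.gru = nn.GRU(hidden, hidden, batch_first=True)
        self.decoder = nn.Linear(hidden, 2 * hw[0] * hw[1])

    def forward(self, flows):  # flows: (B, T, 2, H, W)
        b, t, _, h, w = flows.shape
        z = self.encoder(self.stem(flows.reshape(b * t, 2, h, w)))
        z = self.to_vec(z.reshape(b, t, -1))   # per-frame codes
        _, last = self.gru(z)                  # scene-history summary
        return self.decoder(last[-1]).reshape(b, 2, h, w)

seq = torch.randn(1, 4, 2, 8, 8)   # four consecutive flow fields
pred = FlowPredictor()(seq)        # predicted next flow field
print(tuple(pred.shape))
```

The key design point mirrored here is that the autoregressive stage operates on compact encodings of each frame's flow rather than on raw pixels, which is what allows the history of the scene to be summarized cheaply.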
This approach has demonstrated its capability to yield reliable predictions about the future trajectories of objects in scenarios involving multiple moving objects. Performance evaluation was conducted using cosine similarity to compare predicted optical flow with the actual future flow, and the Structural Similarity Index (SSIM) to assess the reconstruction of images from optical flow against real future images.
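The two evaluation metrics mentioned above can be sketched as follows. This is an assumed minimal version, not the authors' evaluation code: the per-pixel cosine similarity over (dx, dy) flow components is one plausible reading of the metric, and SSIM is taken from scikit-image; the data here is synthetic.

```python
import numpy as np
from skimage.metrics import structural_similarity

def flow_cosine_similarity(pred, true, eps=1e-8):
    """Mean cosine similarity between predicted and ground-truth flow
    vectors, computed per pixel over the (dx, dy) components."""
    dot = (pred * true).sum(axis=-1)
    norms = np.linalg.norm(pred, axis=-1) * np.linalg.norm(true, axis=-1)
    return float((dot / (norms + eps)).mean())

rng = np.random.default_rng(0)
h, w = 32, 32

# Cosine similarity: predicted flow vs. a slightly perturbed "truth"
true_flow = rng.standard_normal((h, w, 2))
pred_flow = true_flow + 0.1 * rng.standard_normal((h, w, 2))
cos = flow_cosine_similarity(pred_flow, true_flow)

# SSIM: reconstructed frame vs. the real future frame
real_img = rng.random((h, w))
recon_img = np.clip(real_img + 0.05 * rng.standard_normal((h, w)), 0, 1)
ssim = structural_similarity(real_img, recon_img, data_range=1.0)
print(round(cos, 3), round(ssim, 3))
```

Cosine similarity rewards getting motion *direction* right independently of magnitude, while SSIM checks that warping the current frame with the predicted flow yields a perceptually faithful future image; the two are complementary.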
In conclusion, this research proposes an effective combination of predictive coding, implemented through autoregressive models, with optical flow estimation to anticipate object movement with high precision. The approach contributes to more efficient and accurate predictions by making effective use of data compression. This perspective holds utility in scenarios where object movement is a critical factor, such as video analysis, tracking systems, and the operation of autonomous devices like cars, robots, or drones.
Anticipating object motion in complex scenarios using optical flow
Soraya Mora, Cesar Ravello, Tomás Perez-Acle
| Original language | American English |
|---|---|
| Title of host publication | The 7th International Conference on Video and Image Processing (ICVIP 2023) |
| State | Published - 2023 |
ASJC Scopus subject areas
- Computer Vision and Pattern Recognition
- Signal Processing