Goal-GAN: Multimodal Trajectory Prediction Based on Goal Position Estimation

Patrick Dendorfer, Aljoša Ošep, Laura Leal-Taixé
Technical University Munich

ACCV 2020 (Oral)

[Paper] [Bibtex] [Short Talk] [Long Talk] [Github]



Overview Video

Abstract

In this paper, we present Goal-GAN, an interpretable and end-to-end trainable model for human trajectory prediction. Inspired by human navigation, we model the task of trajectory prediction as an intuitive two-stage process: (i) goal estimation, which predicts the most likely target positions of the agent, followed by a (ii) routing module, which estimates a set of plausible trajectories that route towards the estimated goal. We leverage information about the past trajectory and visual context of the scene to estimate a multi-modal probability distribution over the possible goal positions, which is used to sample a potential goal during inference. The routing is governed by a recurrent neural network that reacts to physical constraints in the nearby surroundings and generates feasible paths that route towards the sampled goal. Our extensive experimental evaluation shows that our method establishes a new state-of-the-art on several benchmarks while generating a realistic and diverse set of trajectories that conform to physical constraints.


Key Ideas

The key idea of our paper is to interpret the task of trajectory prediction as a two-stage process.

Original Scene:
For a given scene, we observe the past trajectory of the pedestrian and a visual image of the scene.

Stage 1: Intermediate Goal Prediction.
We predict a discrete probability distribution over possible intermediate goals and sample goal estimates from this distribution.

Stage 2: Routing.
We combine the estimated intermediate goal position with the dynamic features of the past trajectory and predict the future trajectory with the routing module.


The two-stage prediction process decomposes the task of pedestrian trajectory prediction into two steps:
Stage 1: The model selects an intermediate goal position that the pedestrian wants to reach in the scene.
Stage 2: The decoder of the model is conditioned on the sampled goal position and routes the pedestrian towards that goal, as sketched below.
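
To make the two stages concrete, here is a minimal PyTorch sketch of the process at inference time. The callables motion_encoder, goal_module, and routing_module and their signatures are assumptions for illustration; the official repository defines the actual interfaces.

```python
import torch

def sample_trajectories(motion_encoder, goal_module, routing_module,
                        past_traj, scene_img, num_samples=20):
    """Illustrative two-stage inference: sample a goal, then route towards it."""
    # Stage 1: encode the past motion and estimate a goal probability map.
    motion_feat = motion_encoder(past_traj)           # (B, D)
    goal_probs = goal_module(scene_img, motion_feat)  # (B, H*W), rows sum to 1

    predictions = []
    for _ in range(num_samples):
        # Draw one intermediate goal cell per pedestrian from the discrete map.
        goal_idx = torch.multinomial(goal_probs, num_samples=1)  # (B, 1)
        # Stage 2: the routing module is conditioned on the sampled goal.
        future = routing_module(motion_feat, goal_idx, scene_img)  # (B, T, 2)
        predictions.append(future)
    return torch.stack(predictions)  # (num_samples, B, T, 2)
```

Sampling several goals and decoding each one is what produces the diverse, multimodal set of trajectories described above.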


Paper

Goal-GAN: Multimodal Trajectory Prediction Based on Goal Position Estimation


ACCV 2020 (Oral)

[Paper]     [Bibtex]     [Github]



Goal Module

The key idea of our work is the Goal Module. The Goal Module estimates a discrete probability distribution over the possible intermediate goal positions in the scene. To do so, the network combines the visual features of the scene with the encoded motion of the pedestrian and outputs a probability map. We sample discrete goal positions from the estimated probability distribution, which are then passed to the decoder. We use the Gumbel-Softmax trick to backpropagate the gradients of the final loss through this stochastic sampling process.
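
A minimal sketch of this sampling step, assuming the Goal Module outputs a flat tensor of logits over a spatial grid; PyTorch's built-in F.gumbel_softmax provides the straight-through estimator. Shapes and helper names are illustrative, not the repository's actual code.

```python
import torch
import torch.nn.functional as F

def sample_goal(goal_logits, grid_w, tau=1.0):
    """Differentiable discrete goal sampling via the Gumbel-Softmax trick.

    goal_logits: (B, H*W) unnormalized scores over the spatial goal grid.
    """
    # hard=True yields a one-hot sample in the forward pass, while the
    # straight-through estimator routes gradients to the logits in backward.
    one_hot = F.gumbel_softmax(goal_logits, tau=tau, hard=True)  # (B, H*W)
    idx = one_hot.argmax(dim=-1)                                 # (B,)
    # Recover 2-D grid coordinates from the flat cell index.
    goal_xy = torch.stack((idx % grid_w, idx // grid_w), dim=-1)
    return one_hot, goal_xy
```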


Model Overview


Our proposed Goal-GAN consists of three key components, as shown in Figure 2: a Motion Encoder, a Goal Module, and a Routing Module.
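
To make the composition concrete, a hypothetical wiring of the three components as the generator is sketched below; interfaces and shapes are assumptions, and the actual architecture (trained adversarially against a discriminator) is in the GitHub repository.

```python
import torch.nn as nn

class GoalGANGenerator(nn.Module):
    """Illustrative composition of Motion Encoder, Goal Module, and Routing Module."""

    def __init__(self, motion_encoder, goal_module, routing_module):
        super().__init__()
        self.motion_encoder = motion_encoder  # recurrent encoder of past (x, y) motion
        self.goal_module = goal_module        # CNN over scene image + motion feature
        self.routing_module = routing_module  # recurrent decoder conditioned on the goal

    def forward(self, past_traj, scene_img):
        motion_feat = self.motion_encoder(past_traj)            # (B, D)
        goal_logits = self.goal_module(scene_img, motion_feat)  # (B, H*W)
        future_traj = self.routing_module(motion_feat, goal_logits, scene_img)
        return future_traj, goal_logits
```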


Visualizing Multimodality

To demonstrate the capability of our model to learn a multimodal distribution of trajectories, we create a synthetic dataset based on the hyang 4 scene of the Stanford Drone Dataset. In the GIFs below, we visualize the past trajectories, the estimated goal probability map, and multiple sampled trajectories with different intermediate goals.
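
As a rough idea of how such a visualization can be produced, here is a hedged matplotlib sketch; it assumes the probability map has been resized to the scene image and that trajectories are in pixel coordinates (all names are illustrative).

```python
import matplotlib.pyplot as plt

def plot_goal_map(scene_img, goal_probs, past_traj, sampled_trajs):
    """Overlay the goal probability map and sampled trajectories on the scene.

    scene_img: (H, W, 3) RGB array; goal_probs: (H, W) map in [0, 1];
    past_traj and each element of sampled_trajs: (T, 2) pixel coordinates.
    """
    plt.imshow(scene_img)
    plt.imshow(goal_probs, cmap="hot", alpha=0.4)  # semi-transparent heatmap
    plt.plot(past_traj[:, 0], past_traj[:, 1], "b-", linewidth=2, label="past")
    for traj in sampled_trajs:  # one dashed line per sampled intermediate goal
        plt.plot(traj[:, 0], traj[:, 1], "--", alpha=0.8)
    plt.legend()
    plt.axis("off")
    plt.show()
```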


Experiment on Real Datasets

We evaluate our model on the publicly available ETH, UCY, and Stanford Drone datasets and achieve state-of-the-art results compared with the baselines.
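
These benchmarks are conventionally scored with best-of-K average and final displacement errors (minADE/minFDE). A minimal sketch of the metric, assuming K sampled trajectories per pedestrian:

```python
import torch

def min_ade_fde(pred, gt):
    """Best-of-K displacement errors for one pedestrian.

    pred: (K, T, 2) sampled future trajectories; gt: (T, 2) ground truth.
    """
    dist = torch.linalg.norm(pred - gt.unsqueeze(0), dim=-1)  # (K, T)
    min_ade = dist.mean(dim=1).min()  # best average displacement over K samples
    min_fde = dist[:, -1].min()       # best displacement at the final timestep
    return min_ade.item(), min_fde.item()
```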
Visualizations: We show the probability map for different trajectories in the test set. Hover over the images to see the final trajectory predictions of the model.

ETH and UCY Dataset

Stanford Drone Dataset




Code

 [GitHub]

Acknowledgements

This project was funded by the Humboldt Foundation through the Sofja Kovalevskaja Award.
This webpage was inspired by Colorful Image Colorization.