Background & AIS Data

What is AIS?

  • AIS = Automatic Identification System
  • A transponder-based broadcast system for sharing vessel navigation data in the maritime domain
  • Data is captured in real time during vessel navigation, but historical data is also publicly available
Standard transmitted AIS data includes:
  • Time-Varying Data such as positional data (latitude/longitude) & velocity data (speed over ground, SOG, & course over ground, COG)
  • Static Vessel Data encoded at the start of a journey (vessel name, callsign, origin, destination)
For our MVP, our modeling uses only the time-varying positional data captured by AIS, but our methodology can likely be extended to include other available AIS data.
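
As a rough illustration of what "time-varying positional data" means in practice, the sketch below reduces AIS reports to a time-ordered track of (timestamp, latitude, longitude) per vessel. The class, the function, and the field names (mmsi, timestamp, lat, lon, sog, cog) are assumptions made for this example and may not match the column names in the dataset we used.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AisReport:
    # Hypothetical field names; real AIS exports may label these differently.
    mmsi: int         # vessel identifier
    timestamp: float  # seconds since epoch
    lat: float        # latitude in degrees
    lon: float        # longitude in degrees
    sog: float        # speed over ground, knots
    cog: float        # course over ground, degrees


def positional_tracks(reports):
    """Group reports by vessel and keep only time-ordered (t, lat, lon) tuples."""
    tracks = defaultdict(list)
    for r in reports:
        # Skip reports whose coordinates fall outside valid lat/lon ranges.
        if -90.0 <= r.lat <= 90.0 and -180.0 <= r.lon <= 180.0:
            tracks[r.mmsi].append((r.timestamp, r.lat, r.lon))
    # Sorting the tuples orders each track by timestamp.
    return {mmsi: sorted(points) for mmsi, points in tracks.items()}
```

Velocity fields are carried on the record but deliberately dropped from the track, mirroring the MVP's focus on position only.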

Hosting Historical AIS Data

To develop our model and evaluate its performance, we source and host our historical AIS datasets using the following data-hosting architecture:

  • Data Source: 2018 historical AIS data from SF Bay area (publicly-available at
  • Data stored on AWS using Amazon S3
  • Pre-processing applied with AWS Glue pipelines
  • Data querying done with AWS Athena
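
To give a feel for the querying step, here is a minimal sketch of a helper that builds a SQL string selecting positions inside a bounding box. The function name, the table name passed in, and the column names (mmsi, basedatetime, lat, lon) are hypothetical; the actual schema produced by our pre-processing may differ, and the snippet only builds the query text rather than submitting it.

```python
def build_track_query(table, lat_min, lat_max, lon_min, lon_max, limit=1000):
    """Return a SQL string selecting vessel positions inside a bounding box."""
    # Only allow simple identifiers so the table name cannot inject SQL.
    if not table.replace("_", "").isalnum():
        raise ValueError("table name must contain only letters, digits, underscores")
    return (
        "SELECT mmsi, basedatetime, lat, lon "  # hypothetical column names
        f"FROM {table} "
        f"WHERE lat BETWEEN {float(lat_min)} AND {float(lat_max)} "
        f"AND lon BETWEEN {float(lon_min)} AND {float(lon_max)} "
        "ORDER BY mmsi, basedatetime "
        f"LIMIT {int(limit)}"
    )
```

Ordering by vessel and then time returns each vessel's positions as a contiguous, time-ordered run, which is the shape the modeling step needs.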


Modeling Approach

The historical AIS data available to us defines sequences of vessel positions over time. For our MVP, we were interested in using this data to develop a model that is able to predict the likely forward trajectory of a target vessel, given its most recent known positions (defined by its current position and backward trajectory).

Much of today's research frames this as a sequence-to-sequence problem of predicting a single, discrete forward trajectory (Capobianco et al. 2021; Murray et al. 2020). We instead take a probabilistic approach that captures the likelihood of where a vessel might be. Rather than predicting a time-varying sequence of positions, we predict a time-varying sequence of probability distributions over where the target vessel will likely be.
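
To make the distinction concrete: a discrete-path model emits one position per timestep, whereas the probabilistic framing emits one distribution over candidate locations per timestep. The sketch below assumes the model produces one unnormalized score per candidate location and that a softmax converts those scores to probabilities. That output layer is an assumption for illustration, not a description of our network.

```python
import math


def softmax(scores):
    """Convert unnormalized scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def to_distributions(score_sequence):
    """Map a sequence of per-timestep score lists to per-timestep distributions."""
    return [softmax(scores) for scores in score_sequence]
```

Because each timestep keeps a full distribution, two well-separated high-probability regions at the same timestep can represent a fork in the likely route, which a single predicted point cannot.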

Leveraging standard deep learning techniques for sequence-to-sequence problems and Uber's open-source H3 library to discretize the San Francisco Bay into hexagonal bins, our model takes in only 6 positions for a target vessel and predicts, for each of the next 30 timesteps, a probability distribution over the vessel's likely position. In doing so, we are able to capture important vessel navigation scenarios, such as bifurcation points, terminal points, and standard traffic routes, that cannot be captured with the traditional radar-based technology used in industry today.
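
The 6-in, 30-out setup can be sketched as a sliding window over a vessel track, with future positions mapped to bin identifiers. To keep the example dependency-free, grid_cell below is a stand-in square grid, not H3. With the h3 Python package one would instead pass a function wrapping its cell lookup (latlng_to_cell in version 4, geo_to_h3 in version 3) at a chosen resolution. All function names and the bin size here are assumptions for illustration.

```python
import math


def grid_cell(lat, lon, size_deg=0.01):
    """Stand-in for an H3 lookup: index of a square lat/lon bin (not hexagonal)."""
    return (math.floor(lat / size_deg), math.floor(lon / size_deg))


def make_examples(track, n_in=6, n_out=30, to_cell=grid_cell):
    """Slide over a track of (lat, lon) points.

    Each example pairs n_in consecutive input positions with the bin ids
    of the n_out positions that immediately follow them.
    """
    examples = []
    for start in range(len(track) - n_in - n_out + 1):
        inputs = track[start:start + n_in]
        future = track[start + n_in:start + n_in + n_out]
        targets = [to_cell(lat, lon) for lat, lon in future]
        examples.append((inputs, targets))
    return examples
```

Tracks shorter than 36 points yield no examples, so very short voyages would need to be filtered or padded under this scheme.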

We evaluate our approach quantitatively against others using a custom cost function, based on the Brier score, designed for assessing probabilistic sequence predictions. We evaluate it qualitatively by comparing probabilistic predictions to ground-truth trajectories for key scenarios, and we assess inference times to ensure our approach satisfies the runtime requirements of real-time inference scenarios.
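
For readers unfamiliar with the Brier score, the sketch below shows the standard multi-class form averaged over the timesteps of a predicted sequence. It is not our custom cost function, only the textbook quantity it builds on. Representing each distribution as a mapping from bin id to probability is an assumption for illustration.

```python
def brier_score(dist, true_cell):
    """Multi-class Brier score for one timestep.

    dist maps bin id -> predicted probability; bins absent from dist count as 0.
    Returns 0.0 for a perfect prediction and 2.0 for a confident wrong one.
    """
    cells = set(dist) | {true_cell}
    return sum(
        (dist.get(c, 0.0) - (1.0 if c == true_cell else 0.0)) ** 2
        for c in cells
    )


def sequence_brier(dists, true_cells):
    """Mean per-timestep Brier score over a predicted sequence."""
    if not dists or len(dists) != len(true_cells):
        raise ValueError("need equal-length, non-empty sequences")
    return sum(brier_score(d, c) for d, c in zip(dists, true_cells)) / len(dists)
```

Lower is better. Spreading probability across two plausible branches is penalized less than committing fully to the wrong one, which is why a score of this kind suits probabilistic trajectory predictions.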



Example prediction made by our approach on a vessel entering the San Francisco Bay. Note the bifurcating/splitting path at the 14th timestep of the prediction for this target vessel.

Product Architecture


Short Term (MVP): Online demo solution where our product is hosted via Flask and Heroku, and model predictions are cached locally for demonstration purposes.

Long Term: Containerized solution where our product performs real-time inference on live AIS data coming in from nearby vessel traffic. This long-term solution is designed to run in cloud or edge scenarios using AWS IoT Greengrass, which would allow real-time inference on board vessels.
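
One piece of the real-time path can be sketched independently of the deployment target: keeping a rolling window of each vessel's most recent positions and handing it to the model once enough points have arrived. The class below is a hypothetical illustration, not part of our codebase. The default window of 6 matches the model's input length described above.

```python
from collections import defaultdict, deque


class LiveTrackBuffer:
    """Keep the latest n_in positions per vessel from a live AIS feed."""

    def __init__(self, n_in=6):
        self.n_in = n_in
        # A bounded deque silently drops the oldest position when full.
        self._tracks = defaultdict(lambda: deque(maxlen=n_in))

    def update(self, vessel_id, lat, lon):
        """Record a position; return the current window once it is full, else None."""
        track = self._tracks[vessel_id]
        track.append((lat, lon))
        if len(track) == self.n_in:
            return list(track)
        return None
```

A caller would pass each returned window to the model and publish the resulting per-timestep distributions. Handling stale vessels and irregular reporting intervals is left out of this sketch.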