
Big Ocean, Big Data

Toward an open ocean community platform for automated image classification in the deep sea

Published on Mar 02, 2018


More ocean data has been collected in the last two years than in all previous years combined, and we are on a path to continue to break that record. More than ever we need to establish a solid foundation for processing this ceaseless stream of data, especially for visual data where ocean-going platforms are beginning to integrate multi-camera feeds for observation and navigation. Techniques to efficiently process and utilize visual datasets with machine learning exist and continue to be transformatively effective, but have had limited success in the oceanographic world due to (1) lack of dataset standardization, (2) sparse annotation tools for the wider oceanographic community, and (3) insufficient formatting of existing, expertly curated imagery for use by data scientists.


Our efforts will establish a new baseline dataset, optimized to directly accelerate the development of modern, intelligent, automated analysis of underwater visual data. This will enable scientists, explorers, policymakers, storytellers, and the public to learn, understand, and care more about our oceans than ever before.


Building on the successes of the machine learning community, we propose to build a public platform (TBDNet) that makes use of existing (and future) expertly curated data to know what’s in the ocean and where it is, for effective and responsible marine stewardship (Figure 1). This platform will be modeled after Stanford’s ImageNet, which, along with other datasets, enabled rapid advances in automated visual analysis. Unlike ImageNet, which was created in a field that lacked curated data, TBDNet seeks to organize and index a wealth of existing data. We hope to address items (1) and (2) above, and begin addressing (3), within the duration of this project by utilizing MBARI’s Video Annotation and Reference System (VARS) and MBARI’s annotated deep sea video database, which will serve as the primary image set for TBDNet. As the project progresses, we plan to incorporate other existing datasets from WHOI, URI, and elsewhere using the standards and workflow we develop.

MBARI uses high-resolution video equipment to record hundreds of remotely and autonomously operated vehicle dives each year. This video library contains detailed footage of the biological, chemical, geological, and physical aspects of each deployment. Since 1988, more than 23,000 hours of videotape have been archived, annotated, and maintained as a centralized MBARI resource. This resource is enabled by the Video Annotation and Reference System (VARS), which is a software interface and database system that provides tools for describing, cataloguing, retrieving, and viewing the visual, descriptive, and quantitative data associated with MBARI’s deep-sea video archives. All of MBARI’s video resources are expertly annotated by members of the Video Lab (VL), and there are currently more than 6 million annotations and 4000 terms in the VARS knowledgebase, with over 2000 of those terms belonging to either genera or species.
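The first step in the workflow below is pulling candidate frames out of an annotation database like VARS. As a minimal sketch of that query, assuming a hypothetical, highly simplified schema (real VARS table and column names will differ), the idea is to select frames that match a concept term and carry exactly one annotation:

```python
import sqlite3

# Hypothetical, simplified schema loosely modeled on a VARS-style
# annotation database; the real VARS schema is far richer than this.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE annotations (
    image_id TEXT,
    concept  TEXT,   -- e.g. a genus or species term from the knowledge base
    observer TEXT
);
INSERT INTO annotations VALUES
    ('frame_001', 'Bathochordaeus', 'vl_tech_1'),
    ('frame_002', 'Bathochordaeus', 'vl_tech_1'),
    ('frame_002', 'Nanomia',        'vl_tech_2'),
    ('frame_003', 'Nanomia',        'vl_tech_1');
""")

def single_annotation_images(concept):
    """Frames that match `concept` and carry exactly one annotation overall,
    i.e. the easiest candidates for automated bounding-box localization."""
    rows = conn.execute("""
        SELECT image_id FROM annotations
        WHERE image_id IN (SELECT image_id FROM annotations WHERE concept = ?)
        GROUP BY image_id
        HAVING COUNT(*) = 1
    """, (concept,)).fetchall()
    return [r[0] for r in rows]

print(single_annotation_images('Bathochordaeus'))  # ['frame_001'] — frame_002 has two annotations
```

Frames with multiple annotations (like `frame_002` above) are deferred to the later, iterative stage of the workflow.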

The proposed workflow, which will be completed by November, is described in Figure 2. Using the VARS search, images that match a keyword query (e.g., a genus or species) and carry a single annotation will be selected for automated bounding box identification using an existing computer vision algorithm, then verified by an MBARI VL Technician or through crowdsourcing.
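The bounding box step can be illustrated with a toy stand-in. The sketch below is not the detector the project would actually use (an existing object-proposal algorithm would fill that role); it only shows the box format that would be handed to a human verifier, using simple intensity thresholding on a synthetic frame:

```python
import numpy as np

def propose_bounding_box(frame, threshold=0.5):
    """Toy stand-in for automated localization: return the bounding box
    (x0, y0, x1, y1) of the bright region in a frame. A real pipeline
    would use an existing pretrained detector instead of thresholding."""
    mask = frame > threshold
    if not mask.any():
        return None  # nothing to localize; route the frame back for review
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()) + 1, int(ys.max()) + 1

# Synthetic "ROV frame": dark background with one bright organism-like blob.
frame = np.zeros((64, 64))
frame[20:30, 40:55] = 0.9
print(propose_bounding_box(frame))  # (40, 20, 55, 30)
```

Each proposed box, together with its VARS concept label, becomes one verification task for a VL Technician or a crowd worker.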

In parallel, a state-of-the-art deep learning algorithm will be developed on the expertly verified, labeled, and localized images. This algorithm will be used to augment future data labeling and verification tasks, continuing to improve as more data is added to the system. Once all single-annotation images are verified, multiple-annotation images will be iteratively used until all annotated images have been labeled and verified. As the workflow is finalized, we will pursue incorporating NGS Pristine Seas annotated image data to the training set, and demonstrate our efforts on unannotated video from the NOAA Office of Ocean Exploration and/or the Ocean Exploration Trust.
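The iterative loop described above — train on verified images, propose labels for the remaining pool, verify, retrain — can be sketched as follows. Every name here is illustrative: `train`, `predict`, and `verify` stand in for a real deep learning model and the human verification step, and the stub "model" simply memorizes the majority label.

```python
# Sketch of the iterative labeling loop, with a trivial stub model.
# All function bodies are placeholders for a real detector and for the
# VL Technician / crowdsourcing verification step.

def train(verified):
    # Stub "model": remember the most common label seen so far.
    labels = [label for _, label in verified]
    return max(set(labels), key=labels.count)

def predict(model, image):
    return model  # a real model would actually inspect the image

def verify(image, proposed, truth):
    # Human-in-the-loop step: accept the proposal if correct, else fix it.
    return proposed if proposed == truth[image] else truth[image]

truth = {'f1': 'Nanomia', 'f2': 'Nanomia', 'f3': 'Bathochordaeus'}
verified = [('f0', 'Nanomia')]   # seed: expertly verified images
pool = ['f1', 'f2', 'f3']        # remaining annotated-but-unverified images

while pool:
    model = train(verified)      # model improves as verified data grows
    image = pool.pop(0)
    label = verify(image, predict(model, image), truth)
    verified.append((image, label))

print(verified)
```

The key property is that verification cost drops over time: as the model improves, more proposals are accepted unchanged, and the human step shifts from labeling to spot-checking.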

Figure 1. Schematic describing TBDNet’s inputs and outputs.

Figure 2. Schematic describing the proposed localization algorithm development using MBARI’s curated deep sea video database and VARS.


  • Quantity of usable data. The quality of any modern image processing algorithm is directly proportional to the quantity of accurate, labeled images. It is unknown how large the MBARI dataset is, or in what state.

  • Effectiveness of annotation algorithms. We have identified existing, open-source algorithms to augment the laborious image labeling and localization task, but the efficacy of these algorithms on MBARI’s existing dataset is unknown.

  • Validation. Accurate, crowdsourced validation of image annotations is crucial to the success of any high-quality, large-scale dataset. While there is prior work demonstrating the power of commercial data labeling services (e.g., Mechanical Turk or CrowdFlower), the effectiveness of these same services for a relatively niche dataset like MBARI’s is unknown.

  • Adoption. A large-scale database such as TBDNet is useless if it is not being used to further scientific understanding and ocean stewardship. Adequately addressing the previous unknowns in a public, well-defined manner is crucial to fostering trust between our efforts and the research community.
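The validation unknown above comes down to how crowd judgments are aggregated into an accepted annotation. A common approach, sketched here with illustrative thresholds (not values from this proposal), is to require a minimum number of votes and a minimum agreement fraction before accepting a label:

```python
from collections import Counter

def consensus(votes, min_votes=3, min_agreement=0.6):
    """Accept a crowdsourced label only when enough workers agree.
    Both thresholds are illustrative and would need tuning against
    expert (VL Technician) judgments on a held-out sample."""
    if len(votes) < min_votes:
        return None  # not enough judgments yet; keep the task open
    label, count = Counter(votes).most_common(1)[0]
    return label if count / len(votes) >= min_agreement else None

print(consensus(['Nanomia', 'Nanomia', 'siphonophore']))  # 'Nanomia' (2/3 agree)
print(consensus(['Nanomia', 'siphonophore']))             # None (too few votes)
```

Comparing consensus output against expert labels on a sample would directly measure whether commercial crowdsourcing is viable for a niche dataset like MBARI’s.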



  • MBARI annotated video dataset

  • Project Leader salary support

  • CVision High Performance Compute cluster

  • OpenROV GPU machine

  • CVision in-kind salary support for B. Woodward

  • OpenROV in-kind salary support for G. Montague


  • Travel for on-site (MBARI) meeting

  • Refreshments and meals for on-site meeting

  • Research Assistant (6 months, term position at MBARI)

  • Product manager and CV/ML assist; 2 months @ 100% time and 4 months @ 40% time

  • 2x CV/ML Experts @ 25% time for 6 months


Table 1. Big Ocean, Big Data project timeline (0-6 months)


Grace Young, Oxford University

Gilbert Montague, OpenROV

Ben Woodward, CVision AI

Genevieve Flaspohler, MIT

Joshua Gyllinsky, University of Rhode Island

Adam Soule, Woods Hole Oceanographic Institution

Katy Croff Bell, MIT Media Lab

Kakani Katija, Monterey Bay Aquarium Research Institute

James Neilan:

I am very interested in the expansion from the base training and test set that you are going to develop for this effort. A good dive into some semi-supervised approaches could expand it with unlabeled data from other institutions. I’d love to help if you win funding.

James Neilan:

Might be able to leverage Virginia Aquarium as well. I’m meeting with the director of programs this week or the next and I’ll ask if they have any data sets that might add to the effort.


James Neilan:

Thinking about the architecture: using LSTM RNNs and GANs to build the annotations from Avedac… This is really interesting. Could you build a system that leverages Avedac results, have an LSTM RNN and/or GAN learn more features for unknowns, and develop a hybrid unsupervised learning system to cluster and then label the rest of the data without a need for crowdsourcing? Would that be possible, or even make sense? So sad I couldn’t stay for day 2.