Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence
Computer Sciences, Geosciences, (multidisciplinary)
Self-supervised learning (SSL) has reduced the performance gap between supervised and unsupervised learning, owing to its ability to learn invariant representations. This is a boon to domains such as Earth Observation (EO), where labelled data is scarce but unlabelled data is freely available. While transfer learning from generic RGB pre-trained models is still commonplace in EO, we argue that a good EO domain-specific pre-trained model is essential for downstream tasks with limited labelled data. Hence, we explored the applicability of SSL with multi-modal satellite imagery for downstream tasks. For this, we used two state-of-the-art SSL architectures, BYOL and SimSiam, to train on EO data. To obtain better invariant representations, we treated multi-spectral (MS) images and synthetic aperture radar (SAR) images as separate augmented views of an image and maximised their similarity. Our work shows that, by learning single-channel representations through non-contrastive learning, our approach can significantly outperform ImageNet pre-trained models on a scene classification task. We further explored the usefulness of a momentum encoder by comparing the two architectures, BYOL and SimSiam, but did not identify a significant difference in performance between the models.
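The idea of treating MS and SAR images as two views whose similarity is maximised can be illustrated with a SimSiam-style objective. The following is a minimal sketch only, not the authors' implementation: it assumes the symmetric negative cosine similarity used in the original SimSiam formulation, and the names `negative_cosine`, `symmetric_loss`, `p_ms`, `z_ms`, `p_sar` and `z_sar` are hypothetical. It uses NumPy, so it shows the forward computation only; in an autograd framework the `z_*` projections would be detached (stop-gradient).

```python
import numpy as np


def negative_cosine(p, z):
    """Mean negative cosine similarity between matching rows of p and z."""
    p = p / np.linalg.norm(p, axis=1, keepdims=True)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    return float(-(p * z).sum(axis=1).mean())


def symmetric_loss(p_ms, z_ms, p_sar, z_sar):
    """SimSiam-style symmetric loss across the two modalities.

    p_* are predictor outputs and z_* are projector outputs (hypothetical
    names). Each modality's prediction is compared with the other
    modality's projection, which is treated as a constant target.
    """
    return 0.5 * negative_cosine(p_ms, z_sar) + 0.5 * negative_cosine(p_sar, z_ms)


if __name__ == "__main__":
    # Random stand-ins for a batch of 8 embeddings of dimension 16 per modality.
    rng = np.random.default_rng(0)
    p_ms, z_ms, p_sar, z_sar = (rng.normal(size=(8, 16)) for _ in range(4))
    print(symmetric_loss(p_ms, z_ms, p_sar, z_sar))
```

The loss lies in [-1, 1] and reaches -1 when each prediction is perfectly aligned with the other modality's projection. BYOL uses a closely related normalised mean squared error and obtains its targets from a momentum (exponential moving average) encoder instead of the online encoder.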
P. Jain, B. Schoen-Phelan and R. Ross, "Multi-Modal Self-Supervised Representation Learning for Earth Observation," 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, 2021, pp. 3241-3244, doi: 10.1109/IGARSS47720.2021.9553741.