


Available under a Creative Commons Attribution Non-Commercial Share Alike 4.0 International Licence


Computer Sciences, Geosciences, (multidisciplinary)


Self-supervised learning (SSL) has become the new state of the art in several domain classification and segmentation tasks. One popular category of SSL is distillation networks such as Bootstrap Your Own Latent (BYOL). This work proposes RS-BYOL, which builds on BYOL in the remote sensing (RS) domain, where data are non-trivially different from natural RGB images. Since multi-spectral (MS) and synthetic aperture radar (SAR) sensors provide varied spectral and spatial resolution information, we utilise them as an implicit augmentation to learn invariant feature embeddings. To learn RS-based invariant features with SSL, we trained RS-BYOL in two ways: single-channel feature learning and three-channel feature learning. This work explores the usefulness of single-channel feature learning from random 10 MS bands of 10 m-20 m resolution and the VV-VH SAR bands, compared to the common notion of using three or more bands. In our linear probing evaluation, these single-channel features reached a 0.92 F1 score on the EuroSAT classification task and 59.6 mIoU on the IEEE Data Fusion Contest (DFC) segmentation task for certain single bands. We also compare our results with ImageNet weights and show that the RS-based SSL model outperforms the supervised ImageNet-based model. We further explore the usefulness of multi-modal data compared to single-modality data, and show that utilising MS and SAR data allows better invariant representations to be learnt than utilising only MS data.
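
As an illustration of the idea in the abstract above, the following minimal Python sketch shows how two single-channel views of the same scene might be drawn from different bands, together with the standard BYOL regression loss and the exponential-moving-average update of the target network. It is not the authors' implementation. The band names, the sampling rule in `sample_view_pair`, and the value of `tau` are assumptions made for illustration only; the abstract does not specify them.

```python
import math
import random

# Hypothetical band names: 10 MS bands plus the two SAR polarisations.
MS_BANDS = [f"MS{i}" for i in range(10)]
SAR_BANDS = ["VV", "VH"]


def sample_view_pair(rng, multimodal=True):
    """Pick the band used for each of the two views of one scene.

    Assumption: the first view is a random MS band; the second view is a
    random SAR band when multimodal, otherwise a different random MS band.
    """
    first = rng.choice(MS_BANDS)
    if multimodal:
        second = rng.choice(SAR_BANDS)
    else:
        second = rng.choice([b for b in MS_BANDS if b != first])
    return first, second


def byol_loss(prediction, target):
    """BYOL regression loss: 2 - 2 * cosine similarity, which lies in [0, 4]."""
    dot = sum(p * t for p, t in zip(prediction, target))
    norm_p = math.sqrt(sum(p * p for p in prediction))
    norm_t = math.sqrt(sum(t * t for t in target))
    return 2.0 - 2.0 * dot / (norm_p * norm_t)


def ema_update(target_weights, online_weights, tau=0.99):
    """Move the target weights toward the online weights by a moving average."""
    return [tau * t + (1.0 - tau) * o
            for t, o in zip(target_weights, online_weights)]
```

In this sketch the sensor or band difference between the two views plays the role of the augmentation: the online network's prediction for one view is regressed onto the target network's projection of the other, and the loss is zero when the two embeddings point in the same direction.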



Science Foundation Ireland