Applied Sciences, Vol. 13, Issue 22 (2023)
ARTICLE

Vision-Language Models for Zero-Shot Classification of Remote Sensing Images

Mohamad Mahmoud Al Rahhal, Yakoub Bazi, Hebah Elgibreen and Mansour Zuair

Abstract

Zero-shot classification is challenging because it requires a model to categorize images belonging to classes it has never encountered during training. Previous research in remote sensing (RS) has approached this task by training image-based models on known RS classes and then attempting to predict unfamiliar classes, but the results have proven unsatisfactory. In this paper, we propose an alternative approach that leverages vision-language models (VLMs) pre-trained to capture the associations between image-text pairs in diverse, general computer vision datasets. Specifically, our investigation focuses on thirteen VLMs derived from Contrastive Language-Image Pre-Training (CLIP/Open-CLIP) with varying levels of parameter complexity. In our experiments, we identify the most suitable prompt for querying the language capabilities of the VLM with RS images. Furthermore, we demonstrate that zero-shot classification accuracy on three widely recognized RS scene datasets, particularly with large CLIP models, is superior to that of existing RS solutions.
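To illustrate the zero-shot setup the abstract describes, the following is a minimal sketch using the open_clip library: each candidate class is wrapped in a textual prompt, the prompts and the image are embedded by the text and vision encoders, and the class whose prompt embedding is most similar to the image embedding is predicted. The model tag, the prompt template ("a satellite photo of a {class}"), the class names, and the image path are illustrative assumptions, not necessarily the choices the paper identifies as optimal.

import torch
import open_clip
from PIL import Image

# Load one CLIP variant (ViT-B-32 trained on LAION-2B); the paper evaluates
# thirteen CLIP/Open-CLIP models of varying parameter complexity.
model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-32", pretrained="laion2b_s34b_b79k"
)
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

# Hypothetical RS scene classes and an illustrative prompt template.
classes = ["airport", "beach", "forest", "harbor", "residential area"]
text = tokenizer([f"a satellite photo of a {c}" for c in classes])

# "scene.jpg" is a placeholder for an RS scene image.
image = preprocess(Image.open("scene.jpg")).unsqueeze(0)

with torch.no_grad():
    image_features = model.encode_image(image)
    text_features = model.encode_text(text)
    # Cosine similarity between the image embedding and each class prompt.
    image_features = image_features / image_features.norm(dim=-1, keepdim=True)
    text_features = text_features / text_features.norm(dim=-1, keepdim=True)
    probs = (100.0 * image_features @ text_features.T).softmax(dim=-1)

# Predicted class = prompt with the highest similarity; note that no
# RS-specific training is involved at any point.
print(classes[probs.argmax(dim=-1).item()])

Because the class set exists only as text, swapping in a different list of scene labels requires no retraining, which is what makes prompt wording the main tunable component of the pipeline.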