REVISTA
Applied Sciences

TODAS

Inicio / Applied Sciences / Vol: 13 Par: 18 (2023) / Artículo

ARTÍCULO

TITULO

Patch-Level Consistency Regularization in Self-Supervised Transfer Learning for Fine-Grained Image Recognition

Yejin Lee

Suho Lee and Sangheum Hwang

Resumen

Fine-grained image recognition aims to classify fine subcategories belonging to the same parent category, such as vehicle model or bird species classification. This is an inherently challenging task because a classifier must capture subtle interclass differences under large intraclass variances. Most previous approaches are based on supervised learning, which requires a large-scale labeled dataset. However, such large-scale annotated datasets for fine-grained image recognition are difficult to collect because they generally require domain expertise during the labeling process. In this study, we propose a self-supervised transfer learning method based on Vision Transformer (ViT) to learn finer representations without human annotations. Interestingly, it is observed that existing self-supervised learning methods using ViT (e.g., DINO) show poor patch-level semantic consistency, which may be detrimental to learning finer representations. Motivated by this observation, we propose a consistency loss function that encourages patch embeddings of the overlapping area between two augmented views to be similar to each other during self-supervised learning on fine-grained datasets. In addition, we explore effective transfer learning strategies to fully leverage existing self-supervised models trained on large-scale labeled datasets. Contrary to the previous literature, our findings indicate that training only the last block of ViT is effective for self-supervised transfer learning. We demonstrate the effectiveness of our proposed approach through extensive experiments using six fine-grained image classification benchmark datasets, including FGVC Aircraft, CUB-200-2011, Food-101, Oxford 102 Flowers, Stanford Cars, and Stanford Dogs. Under the linear evaluation protocol, our method achieves an average accuracy of 78.5%" role="presentation" style="position: relative;">78.5%78.5% 78.5 % , outperforming the existing transfer learning method, which yields 77.2%" role="presentation" style="position: relative;">77.2%77.2% 77.2 % .

Palabras claves

self-supervised learning - fine-grained image recognition - transfer learning - Vision Transformer

Acceso

PÁGINAS

pp. 0 - 0

NÚMERO

Volumen: 13 Parte: 18 (2023)

MATERIAS

INGENIERÍA Y CONSTRUCCIÓN CIVIL
TECNOLOGÍA

REVISTAS SIMILARES

AI
Information
Journal of Marine Science and Engineering

DOI

https://doi.org/10.3390/app131810493

Artículos similares

Self-Supervised Pre-Training Joint Framework: Assisting Lightweight Detection Network for Underwater Object Detection

Acceso

Zhuo Wang, Haojie Chen, Hongde Qin and Qin Chen

In the computer vision field, underwater object detection has been a challenging task. Due to the attenuation of light in a medium and the scattering of light by suspended particles in water, underwater optical images often face the problems of color dis... ver más

Revista: Journal of Marine Science and Engineering

A Wrapped Approach Using Unlabeled Data for Diabetic Retinopathy Diagnosis

Acceso

Xuefeng Zhang, Youngsung Kim, Young-Chul Chung, Sangcheol Yoon, Sang-Yong Rhee and Yong Soo Kim

Large-scale datasets, which have sufficient and identical quantities of data in each class, are the main factor in the success of deep-learning-based classification models for vision tasks. A shortage of sufficient data and interclass imbalanced data dis... ver más

Revista: Applied Sciences

Individualized Stress Mobile Sensing Using Self-Supervised Pre-Training

Acceso

Tanvir Islam and Peter Washington

Stress is widely recognized as a major contributor to a variety of health issues. Stress prediction using biosignal data recorded by wearables is a key area of study in mobile sensing research because real-time stress prediction can enable digital interv... ver más

Revista: Applied Sciences

Regularized Contrastive Masked Autoencoder Model for Machinery Anomaly Detection Using Diffusion-Based Data Augmentation

Acceso

Esmaeil Zahedi, Mohamad Saraee, Fatemeh Sadat Masoumi and Mohsen Yazdinejad

Unsupervised anomalous sound detection, especially self-supervised methods, plays a crucial role in differentiating unknown abnormal sounds of machines from normal sounds. Self-supervised learning can be divided into two main categories: Generative and C... ver más

Revista: Algorithms

Voice Deepfake Detection Using the Self-Supervised Pre-Training Model HuBERT

Acceso

Lanting Li, Tianliang Lu, Xingbang Ma, Mengjiao Yuan and Da Wan

In recent years, voice deepfake technology has developed rapidly, but current detection methods have the problems of insufficient detection generalization and insufficient feature extraction for unknown attacks. This paper presents a forged speech detect... ver más

Revista: Applied Sciences

Revistas destacadas

Acceso directo a los números publicados en la revista Infrastructures

Infrastructures

Acceso directo a los números publicados en la revista Informed Infraestructure

Informed Infraestructure

Acceso directo a los números publicados en la revista BiT

Acceso directo a los números publicados en la revista Revista de la Construcción

Revista de la Construcción

Ver todas las revistas