Awesome Fine-Grained Image Analysis – Papers, Codes and Datasets

Table of contents

  1. Introduction

  2. Tutorials

  3. Survey papers

  4. Benchmark datasets

  5. Fine-grained image recognition

    1. Fine-grained recognition by localization-classification subnetworks

      1. Employing detection or segmentation techniques

      2. Utilizing deep filters / activations

      3. Leveraging attention mechanisms

      4. Other methods

    2. Fine-grained recognition by end-to-end feature encoding

      1. High-order feature interactions

      2. Specific loss functions

      3. Other methods

    3. Fine-grained recognition with external information

      1. Fine-grained recognition with web data / auxiliary data

      2. Fine-grained recognition with multi-modality data

      3. Fine-grained recognition with humans in the loop

  6. Fine-grained image retrieval

    1. Content-based fine-grained image retrieval

    2. Sketch-based fine-grained image retrieval

  7. Future directions of FGIA

    1. Fine-grained few/zero shot learning

    2. Fine-grained hashing

    3. Fine-grained recognition/retrieval with coarse labels

    4. Fine-grained domain adaptation

    5. Fine-grained image generation

    6. FGIA within more realistic settings

  8. Toolbox

  9. Recognition leaderboard

Introduction

This homepage lists representative papers, code, and datasets on deep-learning-based fine-grained image analysis (FGIA), including fine-grained image recognition and fine-grained image retrieval. If you have any questions, please feel free to contact Prof. Xiu-Shen Wei.

Tutorials

Survey papers

Benchmark datasets

Summary of popular fine-grained image datasets. “BBox” indicates whether the dataset provides object bounding-box supervision. “Part anno.” indicates whether key part localizations are provided. “HRCHY” corresponds to hierarchical labels. “ATR” denotes attribute labels (e.g., wing color, male, female). “Texts” indicates whether fine-grained text descriptions of images are supplied. Several datasets are listed twice because they are commonly used in both recognition and retrieval tasks.

Fine-grained image recognition

Dataset name Year Meta-class # images # categories BBox Part anno. HRCHY ATR Texts
Oxford flower 2008 Flowers 8,189 102 ✓
CUB200 2011 Birds 11,788 200 ✓ ✓ ✓ ✓
Stanford Dog 2011 Dogs 20,580 120 ✓
Stanford Car 2013 Cars 16,185 196 ✓
FGVC Aircraft 2013 Aircrafts 10,000 100 ✓ ✓
Birdsnap 2014 Birds 49,829 500 ✓ ✓ ✓
Food-101 2014 Food dishes 101,000 101
NABirds 2015 Birds 48,562 555 ✓ ✓
Food-975 2016 Foods 37,885 975 ✓
DeepFashion 2016 Clothes 800,000 1,050 ✓ ✓ ✓
Fru92 2017 Fruits 69,614 92 ✓
Veg200 2017 Vegetable 91,117 200 ✓
iNat2017 2017 Plants & Animals 859,000 5,089 ✓ ✓
Dog-in-the-Wild 2018 Dogs 299,458 362
RPC 2019 Retail products 83,739 200 ✓ ✓
Products-10K 2020 Retail products 150,000 10,000 ✓ ✓
UFG 2021 Leaf 47,114 3,526 ✓
iNat2021 2021 Plants & Animals 3,286,843 10,000 ✓
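
As a rough illustration of how the image and category counts above can be compared, the following sketch computes the average number of images per category for a few rows of the table. The class `FineGrainedDataset` and the helper `rank_by_density` are hypothetical names introduced only for this example and are not part of any released toolbox; the counts themselves are copied from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FineGrainedDataset:
    """One row of the dataset summary (hypothetical record type)."""
    name: str
    year: int
    meta_class: str
    num_images: int
    num_categories: int

    @property
    def images_per_category(self) -> float:
        # Average number of images available for each fine-grained category.
        return self.num_images / self.num_categories


# A few rows copied from the recognition table above.
DATASETS = [
    FineGrainedDataset("Oxford flower", 2008, "Flowers", 8_189, 102),
    FineGrainedDataset("CUB200", 2011, "Birds", 11_788, 200),
    FineGrainedDataset("Stanford Car", 2013, "Cars", 16_185, 196),
    FineGrainedDataset("iNat2021", 2021, "Plants & Animals", 3_286_843, 10_000),
]


def rank_by_density(datasets):
    """Sort datasets from most to fewest images per category."""
    return sorted(datasets, key=lambda d: d.images_per_category, reverse=True)


if __name__ == "__main__":
    for d in rank_by_density(DATASETS):
        print(f"{d.name:15s} {d.images_per_category:8.1f} images/category")
```

The average hides class imbalance, so it is only a first-order indicator of how much data is available per category.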

Fine-grained image retrieval

Dataset name Year Meta-class # images # categories BBox Part anno. HRCHY ATR Texts
Oxford flower 2008 Flowers 8,189 102 ✓
CUB200 2011 Birds 11,788 200 ✓ ✓ ✓ ✓
Stanford Car 2013 Cars 16,185 196 ✓
SBIR2014 2014 Multiple 1,120/7,267 14 ✓ ✓ ✓
DeepFashion 2016 Clothes 800,000 1,050 ✓ ✓ ✓
QMUL-Shoe 2016 Shoes 419/419 1 ✓
QMUL-Chair 2016 Chairs 297/297 1 ✓
Sketchy 2016 Multiple 75,471/12,500 125
QMUL-Handbag 2017 Handbags 568/568 1
SBIR2017 2017 Shoes 912/304 1 ✓ ✓
QMUL-Shoe-V2 2019 Shoes 6,730/2,000 1
FG-Xmedia 2019 Birds 11,788 200 ✓

Fine-grained image recognition

Fine-grained recognition by localization-classification subnetworks

Employing detection or segmentation techniques

Utilizing deep filters / activations

Leveraging attention mechanisms

Other methods

Fine-grained recognition by end-to-end feature encoding

High-order feature interactions

Specific loss functions

Other methods

Fine-grained recognition with external information

Fine-grained recognition with web data / auxiliary data

Fine-grained recognition with multi-modality data

Fine-grained recognition with humans in the loop

Fine-grained image retrieval

Content-based fine-grained image retrieval

Sketch-based fine-grained image retrieval

Future directions of FGIA

Fine-grained few/zero shot learning

Fine-grained hashing

Fine-grained recognition/retrieval with coarse labels

Fine-grained domain adaptation

Fine-grained image generation

FGIA within more realistic settings

Toolbox

Recognition leaderboard

This section is continually updated. Since CUB200-2011 is the most widely used fine-grained dataset, we use it as the test bed for the fine-grained recognition leaderboard.

Method Published BBox? Part? External information? Base model Image resolution Accuracy
PB R-CNN ECCV 2014 ✓ Alex-Net 224x224 73.9%
MaxEnt NeurIPS 2018 GoogLeNet TBD 74.4%
PB R-CNN ECCV 2014 ✓ ✓ Alex-Net 224x224 76.4%
PS-CNN CVPR 2016 ✓ ✓ CaffeNet 454x454 76.6%
MaxEnt NeurIPS 2018 VGG-16 TBD 77.0%
Mask-CNN PR 2018 ✓ Alex-Net 448x448 78.6%
PC ECCV 2018 ResNet-50 TBD 80.2%
DeepLAC CVPR 2015 ✓ ✓ Alex-Net 227x227 80.3%
MaxEnt NeurIPS 2018 ResNet-50 TBD 80.4%
Triplet-A CVPR 2016 ✓ Manual labour GoogLeNet TBD 80.7%
Multi-grained ICCV 2015 WordNet etc. VGG-19 224x224 81.7%
Krause et al. CVPR 2015 ✓ CaffeNet TBD 82.0%
Multi-grained ICCV 2015 ✓ WordNet etc. VGG-19 224x224 83.0%
TS CVPR 2016 VGGD+VGGM 448x448 84.0%
Bilinear CNN ICCV 2015 VGGD+VGGM 448x448 84.1%
STN NeurIPS 2015 GoogLeNet+BN 448x448 84.1%
LRBP CVPR 2017 VGG-16 224x224 84.2%
PDFS CVPR 2016 VGG-16 TBD 84.5%
Xu et al. ICCV 2015 ✓ ✓ Web data CaffeNet 224x224 84.6%
Cai et al. ICCV 2017 VGG-16 448x448 85.3%
RA-CNN CVPR 2017 VGG-19 448x448 85.3%
MaxEnt NeurIPS 2018 Bilinear CNN TBD 85.3%
GZ IEEE TPAMI 2021 ResNet-101 448x448 85.4%
PC ECCV 2018 Bilinear CNN TBD 85.6%
CVL CVPR 2017 Texts VGG TBD 85.6%
Mask-CNN PR 2018 ✓ VGG-16 448x448 85.7%
Peer-learning ICCV 2021 Web data ResNet-32 224x224 85.7%
GP-256 ECCV 2018 VGG-16 448x448 85.8%
KP CVPR 2017 VGG-16 224x224 86.2%
T-CNN IJCAI 2018 ResNet 224x224 86.2%
MA-CNN ICCV 2017 VGG-19 448x448 86.5%
MaxEnt NeurIPS 2018 DenseNet-161 TBD 86.5%
DeepKSPD ECCV 2018 VGG-19 448x448 86.5%
OSME+MAMC ECCV 2018 ResNet-101 448x448 86.5%
StackDRL IJCAI 2018 VGG-19 224x224 86.6%
DFL-CNN CVPR 2018 VGG-16 448x448 86.7%
Bi-Modal PMA IEEE TIP 2020 VGG-16 448x448 86.8%
PC ECCV 2018 DenseNet-161 TBD 86.9%
KERL IJCAI 2018 Attributes VGG-16 224x224 87.0%
HBP ECCV 2018 VGG-16 448x448 87.1%
SAM ECCV 2022 DBTNet-50 224x224 87.26%
Mask-CNN PR 2018 ✓ ResNet-50 448x448 87.3%
P-CNN IEEE TPAMI 2022 VGG-19 448x448 87.3%
DFL-CNN CVPR 2018 ResNet-50 448x448 87.4%
NTS-Net ECCV 2018 ResNet-50 448x448 87.5%
HSnet CVPR 2017 ✓ ✓ GoogLeNet+BN TBD 87.5%
Bi-Modal PMA IEEE TIP 2020 ResNet-50 448x448 87.5%
CIN AAAI 2020 ResNet-50 448x448 87.5%
ProtoTree CVPR 2021 ResNet-32 224x224 87.5%
MetaFGNet ECCV 2018 Auxiliary data ResNet-34 TBD 87.6%
Cross-X CVPR 2020 ResNet-50 448x448 87.7%
GZ IEEE TPAMI 2021 MA-CNN 448x448 87.7%
DCL CVPR 2019 ResNet-50 448x448 87.8%
ACNet CVPR 2020 VGG-16 448x448 87.8%
TASN CVPR 2019 ResNet-50 448x448 87.9%
ACNet CVPR 2020 ResNet-50 448x448 88.1%
CIN AAAI 2020 ResNet-101 448x448 88.1%
DBTNet-101 NeurIPS 2019 ResNet-101 448x448 88.1%
Bi-Modal PMA IEEE TIP 2020 Texts VGG-16 448x448 88.2%
CMN IEEE TIP 2022 ResNet-50 448x448 88.2%
GCL AAAI 2020 ResNet-50 448x448 88.3%
AP-CNN IEEE TIP 2021 ResNet-50 448x448 88.4%
LC3DOR ICCV 2021 ResNet-50 512x512 88.4%
S3N CVPR 2020 ResNet-50 448x448 88.5%
Sun et al. AAAI 2020 ResNet-50 448x448 88.6%
FDL AAAI 2020 ResNet-50 448x448 88.6%
Bi-Modal PMA IEEE TIP 2020 Texts ResNet-50 448x448 88.7%
SPS ICCV 2021 ResNet-50 448x448 88.70%
DF-GMM CVPR 2020 ResNet-50 448x448 88.8%
PMG ECCV 2020 VGG-16 550x550 88.8%
FDL AAAI 2020 DenseNet-161 448x448 89.1%
DP-Net AAAI 2021 ResNet-50 448x448 89.3%
SnapMix AAAI 2021 ResNet-101 448x448 89.32%
PMG ECCV 2020 ResNet-50 550x550 89.6%
GHORD CVPR 2021 ResNet-50 448x448 89.6%
API-Net AAAI 2020 DenseNet-161 512x512 90.0%
PART IEEE TIP 2021 ResNet-101 448x448 90.1%
DTRG IEEE TIP 2022 DenseNet-161 448x448 90.1%
P2P-Net CVPR 2022 ResNet-50 448x448 90.2%
Ge et al. CVPR 2019 GoogLeNet+BN Shorter side is 800 px 90.3%
CAL ICCV 2021 ResNet-101 448x448 90.6%
CP-CNN IEEE TIP 2022 ResNet-50 448x448 91.4%
DCAL CVPR 2022 ViT-Base 448x448 91.4%
TransFG AAAI 2022 ViT-B_16 448x448 91.7%
CAP AAAI 2021 Xception 224x224 91.8%
SR-GNN IEEE TIP 2022 Xception 224x224 91.9%
DCAL CVPR 2022 R50-ViT-Base 448x448 92.0%
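
Reported accuracy can depend on the base model and the input resolution, so entries are easiest to compare within the same setting. The sketch below takes a handful of rows copied from the leaderboard above, groups them by (base model, image resolution), and picks the highest-accuracy entry in each group. `Entry` and `best_per_setting` are illustrative names assumed for this example, not an existing API.

```python
from typing import Dict, List, NamedTuple, Tuple


class Entry(NamedTuple):
    """One leaderboard row (hypothetical record type)."""
    method: str
    published: str
    base_model: str
    resolution: str
    accuracy: float  # top-1 accuracy on CUB200-2011, in percent


# A few rows copied from the leaderboard above.
ENTRIES: List[Entry] = [
    Entry("DCL", "CVPR 2019", "ResNet-50", "448x448", 87.8),
    Entry("TASN", "CVPR 2019", "ResNet-50", "448x448", 87.9),
    Entry("P2P-Net", "CVPR 2022", "ResNet-50", "448x448", 90.2),
    Entry("CP-CNN", "IEEE TIP 2022", "ResNet-50", "448x448", 91.4),
    Entry("PART", "IEEE TIP 2021", "ResNet-101", "448x448", 90.1),
    Entry("CAL", "ICCV 2021", "ResNet-101", "448x448", 90.6),
    Entry("TransFG", "AAAI 2022", "ViT-B_16", "448x448", 91.7),
    Entry("CAP", "AAAI 2021", "Xception", "224x224", 91.8),
    Entry("SR-GNN", "IEEE TIP 2022", "Xception", "224x224", 91.9),
]


def best_per_setting(entries: List[Entry]) -> Dict[Tuple[str, str], Entry]:
    """Return the highest-accuracy entry for each (base model, resolution)."""
    best: Dict[Tuple[str, str], Entry] = {}
    for e in entries:
        key = (e.base_model, e.resolution)
        if key not in best or e.accuracy > best[key].accuracy:
            best[key] = e
    return best


if __name__ == "__main__":
    for (model, res), e in sorted(best_per_setting(ENTRIES).items()):
        print(f"{model:12s} {res:8s} {e.method:8s} {e.accuracy:.1f}%")
```

Rows whose resolution is listed as TBD would fall into their own group under this keying, which keeps them from being compared against entries with a known resolution.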