Awesome Fine-Grained Image Analysis – Papers, Codes and Datasets

Table of contents

  1. Introduction

  2. Tutorials

  3. Survey papers

  4. Benchmark datasets

  5. Fine-grained image recognition

    1. Fine-grained recognition by localization-classification subnetworks

      1. Employing detection or segmentation techniques

      2. Utilizing deep filters / activations

      3. Leveraging attention mechanisms

      4. Other methods

    2. Fine-grained recognition by end-to-end feature encoding

      1. High-order feature interactions

      2. Specific loss functions

      3. Other methods

    3. Fine-grained recognition with external information

      1. Fine-grained recognition with web data / auxiliary data

      2. Fine-grained recognition with multi-modality data

      3. Fine-grained recognition with humans in the loop

  6. Fine-grained image retrieval

    1. Content-based fine-grained image retrieval

    2. Sketch-based fine-grained image retrieval

  7. Future directions of FGIA

    1. Fine-grained few-shot learning

    2. Fine-grained hashing

    3. Fine-grained domain adaptation

    4. Fine-grained image generation

    5. FGIA within more realistic settings

  8. Recognition leaderboard

Introduction

This homepage lists representative papers, code, and datasets on deep learning based fine-grained image analysis (FGIA), including fine-grained image recognition, fine-grained image retrieval, etc. If you have any questions, please feel free to contact Prof. Xiu-Shen Wei.

Tutorials

Survey papers

Benchmark datasets

Summary of popular fine-grained image datasets (a minimal loading sketch for CUB200-2011 follows the table). Note that "BBox" indicates whether the dataset provides object bounding box annotations, "Part anno." whether key part localizations are provided, "HRCHY" whether hierarchical labels are available, "ATR" whether attribute labels (e.g., wing color, male, female) are given, and "Texts" whether fine-grained text descriptions of the images are supplied.

| Dataset name | Year | Meta-class | # images | # categories | BBox | Part anno. | HRCHY | ATR | Texts |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Oxford flower | 2008 | Flowers | 8,189 | 102 | | | | | √ |
| CUB200 | 2011 | Birds | 11,788 | 200 | √ | √ | | √ | √ |
| Stanford Dog | 2011 | Dogs | 20,580 | 120 | √ | | | | |
| Stanford Car | 2013 | Cars | 16,185 | 196 | √ | | | | |
| FGVC Aircraft | 2013 | Aircrafts | 10,000 | 100 | √ | | √ | | |
| Birdsnap | 2014 | Birds | 49,829 | 500 | √ | √ | | √ | |
| NABirds | 2015 | Birds | 48,562 | 555 | √ | √ | | | |
| DeepFashion | 2016 | Clothes | 800,000 | 1,050 | √ | √ | | √ | |
| Fru92 | 2017 | Fruits | 69,614 | 92 | | | √ | | |
| Veg200 | 2017 | Vegetable | 91,117 | 200 | | | √ | | |
| iNat2017 | 2017 | Plants & Animals | 859,000 | 5,089 | √ | | √ | | |
| RPC | 2019 | Retail products | 83,739 | 200 | √ | | √ | | |
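
For reference, below is a minimal sketch of reading the annotation files shipped with CUB200-2011, the most commonly used dataset above. It assumes the standard extracted CUB_200_2011 layout (images.txt, image_class_labels.txt, bounding_boxes.txt, train_test_split.txt); the load_cub_annotations helper and the paths are illustrative only, not part of any official API.

```python
# Minimal sketch of reading CUB200-2011 annotations, assuming the standard
# extracted layout (images.txt, image_class_labels.txt, bounding_boxes.txt,
# train_test_split.txt); adapt paths and columns to your local copy.
from pathlib import Path

def load_cub_annotations(root):
    root = Path(root)

    def read(name):
        # Each file is whitespace-separated, with the image id in column 0.
        with open(root / name) as f:
            return {line.split()[0]: line.split()[1:] for line in f}

    images = read("images.txt")              # id -> [relative image path]
    labels = read("image_class_labels.txt")  # id -> [class id, 1..200]
    boxes = read("bounding_boxes.txt")       # id -> [x, y, width, height]
    splits = read("train_test_split.txt")    # id -> ["1" train / "0" test]

    samples = []
    for img_id, (rel_path,) in images.items():
        samples.append({
            "path": root / "images" / rel_path,
            "label": int(labels[img_id][0]) - 1,        # zero-based class index
            "bbox": [float(v) for v in boxes[img_id]],  # only if BBox supervision is used
            "is_train": splits[img_id][0] == "1",
        })
    return samples

# Example usage (the path is hypothetical):
# samples = load_cub_annotations("CUB_200_2011")
# train = [s for s in samples if s["is_train"]]
```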

Fine-grained image recognition

Fine-grained recognition by localization-classification subnetworks

Employing detection or segmentation techniques

Utilizing deep filters / activations

Leveraging attention mechanisms

Other methods

Fine-grained recognition by end-to-end feature encoding

High-order feature interactions

Specific loss functions

Other methods

Fine-grained recognition with external information

Fine-grained recognition with web data / auxiliary data

Fine-grained recognition with multi-modality data

Fine-grained recognition with humans in the loop

Fine-grained image retrieval

Content-based fine-grained image retrieval

Sketch-based fine-grained image retrieval

Future directions of FGIA

Fine-grained few-shot learning

Fine-grained hashing

Fine-grained domain adaptation

Fine-grained image generation

FGIA within more realistic settings

Recognition leaderboard

This section is continually updated. Since CUB200-2011 is the most widely used fine-grained dataset, we list the fine-grained recognition leaderboard with CUB200-2011 as the test bed (a minimal evaluation sketch follows the table).

| Method | Published | BBox? | Part? | External information? | Base model | Image resolution | Accuracy |
| --- | --- | --- | --- | --- | --- | --- | --- |
| PB R-CNN | ECCV 2014 | √ | | | Alex-Net | 224x224 | 73.9% |
| MaxEnt | NeurIPS 2018 | | | | GoogLeNet | TBD | 74.4% |
| PB R-CNN | ECCV 2014 | √ | √ | | Alex-Net | 224x224 | 76.4% |
| PS-CNN | CVPR 2016 | √ | √ | | CaffeNet | 454x454 | 76.6% |
| MaxEnt | NeurIPS 2018 | | | | VGG-16 | TBD | 77.0% |
| Mask-CNN | PR 2018 | | √ | | Alex-Net | 448x448 | 78.6% |
| PC | ECCV 2018 | | | | ResNet-50 | TBD | 80.2% |
| DeepLAC | CVPR 2015 | √ | √ | | Alex-Net | 227x227 | 80.3% |
| MaxEnt | NeurIPS 2018 | | | | ResNet-50 | TBD | 80.4% |
| Triplet-A | CVPR 2016 | √ | | Manual labour | GoogLeNet | TBD | 80.7% |
| Multi-grained | ICCV 2015 | | | WordNet etc. | VGG-19 | 224x224 | 81.7% |
| Krause et al. | CVPR 2015 | √ | | | CaffeNet | TBD | 82.0% |
| Multi-grained | ICCV 2015 | √ | | WordNet etc. | VGG-19 | 224x224 | 83.0% |
| TS | CVPR 2016 | | | | VGGD+VGGM | 448x448 | 84.0% |
| Bilinear CNN | ICCV 2015 | | | | VGGD+VGGM | 448x448 | 84.1% |
| STN | NeurIPS 2015 | | | | GoogLeNet+BN | 448x448 | 84.1% |
| LRBP | CVPR 2017 | | | | VGG-16 | 224x224 | 84.2% |
| PDFS | CVPR 2016 | | | | VGG-16 | TBD | 84.5% |
| Xu et al. | ICCV 2015 | √ | √ | Web data | CaffeNet | 224x224 | 84.6% |
| Cai et al. | ICCV 2017 | | | | VGG-16 | 448x448 | 85.3% |
| RA-CNN | CVPR 2017 | | | | VGG-19 | 448x448 | 85.3% |
| MaxEnt | NeurIPS 2018 | | | | Bilinear CNN | TBD | 85.3% |
| PC | ECCV 2018 | | | | Bilinear CNN | TBD | 85.6% |
| CVL | CVPR 2017 | | | Texts | VGG | TBD | 85.6% |
| Mask-CNN | PR 2018 | | √ | | VGG-16 | 448x448 | 85.7% |
| GP-256 | ECCV 2018 | | | | VGG-16 | 448x448 | 85.8% |
| KP | CVPR 2017 | | | | VGG-16 | 224x224 | 86.2% |
| T-CNN | IJCAI 2018 | | | | ResNet | 224x224 | 86.2% |
| MA-CNN | ICCV 2017 | | | | VGG-19 | 448x448 | 86.5% |
| MaxEnt | NeurIPS 2018 | | | | DenseNet-161 | TBD | 86.5% |
| DeepKSPD | ECCV 2018 | | | | VGG-19 | 448x448 | 86.5% |
| OSME+MAMC | ECCV 2018 | | | | ResNet-101 | 448x448 | 86.5% |
| StackDRL | IJCAI 2018 | | | | VGG-19 | 224x224 | 86.6% |
| DFL-CNN | CVPR 2018 | | | | VGG-16 | 448x448 | 86.7% |
| Bi-Modal PMA | IEEE TIP 2020 | | | | VGG-16 | 448x448 | 86.8% |
| PC | ECCV 2018 | | | | DenseNet-161 | TBD | 86.9% |
| KERL | IJCAI 2018 | | | Attributes | VGG-16 | 224x224 | 87.0% |
| HBP | ECCV 2018 | | | | VGG-16 | 448x448 | 87.1% |
| Mask-CNN | PR 2018 | | √ | | ResNet-50 | 448x448 | 87.3% |
| DFL-CNN | CVPR 2018 | | | | ResNet-50 | 448x448 | 87.4% |
| NTS-Net | ECCV 2018 | | | | ResNet-50 | 448x448 | 87.5% |
| HSnet | CVPR 2017 | √ | √ | | GoogLeNet+BN | TBD | 87.5% |
| Bi-Modal PMA | IEEE TIP 2020 | | | | ResNet-50 | 448x448 | 87.5% |
| CIN | AAAI 2020 | | | | ResNet-50 | 448x448 | 87.5% |
| MetaFGNet | ECCV 2018 | | | Auxiliary data | ResNet-34 | TBD | 87.6% |
| Cross-X | CVPR 2020 | | | | ResNet-50 | 448x448 | 87.7% |
| DCL | CVPR 2019 | | | | ResNet-50 | 448x448 | 87.8% |
| ACNet | CVPR 2020 | | | | VGG-16 | 448x448 | 87.8% |
| TASN | CVPR 2019 | | | | ResNet-50 | 448x448 | 87.9% |
| ACNet | CVPR 2020 | | | | ResNet-50 | 448x448 | 88.1% |
| CIN | AAAI 2020 | | | | ResNet-101 | 448x448 | 88.1% |
| DBTNet-101 | NeurIPS 2019 | | | | ResNet-101 | 448x448 | 88.1% |
| Bi-Modal PMA | IEEE TIP 2020 | | | Texts | VGG-16 | 448x448 | 88.2% |
| GCL | AAAI 2020 | | | | ResNet-50 | 448x448 | 88.3% |
| S3N | CVPR 2020 | | | | ResNet-50 | 448x448 | 88.5% |
| Sun et al. | AAAI 2020 | | | | ResNet-50 | 448x448 | 88.6% |
| FDL | AAAI 2020 | | | | ResNet-50 | 448x448 | 88.6% |
| Bi-Modal PMA | IEEE TIP 2020 | | | Texts | ResNet-50 | 448x448 | 88.7% |
| DF-GMM | CVPR 2020 | | | | ResNet-50 | 448x448 | 88.8% |
| PMG | ECCV 2020 | | | | VGG-16 | 550x550 | 88.8% |
| FDL | AAAI 2020 | | | | DenseNet-161 | 448x448 | 89.1% |
| PMG | ECCV 2020 | | | | ResNet-50 | 550x550 | 89.6% |
| API-Net | AAAI 2020 | | | | DenseNet-161 | 512x512 | 90.0% |
| Ge et al. | CVPR 2019 | | | | GoogLeNet+BN | Shorter side is 800 px | 90.3% |
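
The accuracies above are top-1 classification accuracies on the CUB200-2011 test set at the listed input resolutions. The hedged PyTorch sketch below shows a generic way such a number is computed (single centre crop, one forward pass); model and test_loader are placeholders, and this is not the exact evaluation protocol of any specific method in the table, since individual papers differ in resize sizes, crops, and normalisation.

```python
# Generic top-1 accuracy evaluation sketch for CUB200-2011-style test sets.
# `model` and `test_loader` are placeholders supplied by the user.
import torch
from torchvision import transforms

def top1_accuracy(model, test_loader, device="cuda"):
    model.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            preds = model(images).argmax(dim=1)   # predicted class per image
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct / total  # fraction of correctly classified test images

# A common 448x448 test-time preprocessing (resize then centre crop);
# individual papers differ in the exact resize size and normalisation stats.
test_transform = transforms.Compose([
    transforms.Resize(512),
    transforms.CenterCrop(448),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])
```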