08.08.2019       Issue 294 (05.08.2019 - 11.08.2019)       Interesting projects, tools, libraries

rwightman / pytorch-image-models PyTorch image models



PyTorch Image Models, etc


For each competition, personal, or freelance project involving images + Convolutional Neural Networks, I build on top of an evolving collection of code and models. This repo contains a (somewhat) cleaned up and pared down iteration of that code. Hopefully it'll be of use to others.

The work of many others is present here. I've tried to make sure all source material is acknowledged.


I've included a few of my favourite models, but this is not an exhaustive collection. You can't do better than Cadene's collection in that regard. Most models do have pretrained weights from their respective sources or original authors.

  • ResNet/ResNeXt (from torchvision with mods by myself)
    • ResNet-18, ResNet-34, ResNet-50, ResNet-101, ResNet-152, ResNeXt50 (32x4d), ResNeXt101 (32x4d and 64x4d)
    • 'Bag of Tricks' / Gluon C, D, E, S variations (https://arxiv.org/abs/1812.01187)
    • Instagram trained / ImageNet tuned ResNeXt101-32x8d to 32x48d from facebookresearch
  • DenseNet (from torchvision)
    • DenseNet-121, DenseNet-169, DenseNet-201, DenseNet-161
  • Squeeze-and-Excitation ResNet/ResNeXt (from Cadene with some pretrained weight additions by myself)
    • SENet-154, SE-ResNet-18, SE-ResNet-34, SE-ResNet-50, SE-ResNet-101, SE-ResNet-152, SE-ResNeXt-26 (32x4d), SE-ResNeXt50 (32x4d), SE-ResNeXt101 (32x4d)
  • Inception-ResNet-V2 and Inception-V4 (from Cadene)
  • Xception (from Cadene)
  • PNasNet & NASNet-A (from Cadene)
  • DPN (from me, weights hosted by Cadene)
  • Generic EfficientNet (from my standalone GenMobileNet) - A generic model that implements many of the efficient models that utilize similar DepthwiseSeparable and InvertedResidual blocks

Use the --model argument to select a model for the train, validation, and inference scripts. The value must match the model's all-lowercase creation function name.
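
To illustrate the naming convention, here is a minimal sketch of how a --model value could be matched to an all-lowercase creation function. The registry, the register_model decorator, and the placeholder creation functions are hypothetical; they stand in for the repo's real model factory, which may be organised differently.

```python
import argparse

# Hypothetical registry mapping all-lowercase names to creation functions.
_MODEL_REGISTRY = {}


def register_model(fn):
    """Register a creation function under its own (all-lowercase) name."""
    _MODEL_REGISTRY[fn.__name__] = fn
    return fn


@register_model
def seresnet34(pretrained=False):
    # Placeholder: a real creation function would build and return a network.
    return {"arch": "seresnet34", "pretrained": pretrained}


@register_model
def mobilenetv3_100(pretrained=False):
    # Placeholder, as above.
    return {"arch": "mobilenetv3_100", "pretrained": pretrained}


def create_model(name, **kwargs):
    """Look up a creation function by its exact lowercase name and call it."""
    if name not in _MODEL_REGISTRY:
        known = ", ".join(sorted(_MODEL_REGISTRY))
        raise ValueError(f"Unknown model '{name}'. Known models: {known}")
    return _MODEL_REGISTRY[name](**kwargs)


def parse_args(argv):
    """Parse the subset of script arguments relevant to model selection."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--model", default="seresnet34")
    parser.add_argument("--pretrained", action="store_true")
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = parse_args(["--model", "mobilenetv3_100", "--pretrained"])
    print(create_model(args.model, pretrained=args.pretrained))
```

Because the lookup is an exact string match in this sketch, a mixed-case value such as SEResNet34 is rejected rather than silently mapped to a different model.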


Several (less common) features that I often utilize in my projects are included. Many of their additions are the reason why I maintain my own set of models, instead of using others' via PIP:

  • All models have a common default configuration interface and API for
    • accessing/changing the classifier - get_classifier and reset_classifier
    • doing a forward pass on just the features - forward_features
    • these make it easy to write consistent network wrappers that work with any of the models
  • All models have a consistent pretrained weight loader that adapts last linear if necessary, and from 3 to 1 channel input if desired
  • The train script works in several process/GPU modes:
    • NVIDIA DDP w/ a single GPU per process, multiple processes with APEX present (AMP mixed-precision optional)
    • PyTorch DistributedDataParallel w/ multi-gpu, single process (AMP disabled as it crashes when enabled)
    • PyTorch w/ single GPU single process (AMP optional)
  • A dynamic global pool implementation that allows selecting from average pooling, max pooling, average + max, or concat([average, max]) at model creation. All global pooling is adaptive average by default and compatible with pretrained weights.
  • A 'Test Time Pool' wrapper that can wrap any of the included models and usually provide improved performance doing inference with input images larger than the training size. Idea adapted from original DPN implementation when I ported (https://github.com/cypw/DPNs)
  • Training schedules and techniques that provide competitive results (Cosine LR, Random Erasing, Label Smoothing, etc)
  • Mixup (as in https://arxiv.org/abs/1710.09412) - currently implementing/testing
  • An inference script that dumps output to CSV is provided as an example
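
The selectable global pooling described in the list above can be sketched in plain Python. This is an illustration only, not the repo's implementation: the mode names ("avg", "max", "avgmax", "catavgmax") and the choice to combine average and max as their mean are assumptions, and the feature map is a nested list rather than a tensor.

```python
def global_pool(feature_map, pool_type="avg"):
    """Reduce a C x H x W feature map (nested lists) to one vector.

    pool_type names are assumptions made for this sketch.
    """
    # Per-channel mean over all spatial positions.
    avg = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature_map]
    # Per-channel maximum over all spatial positions.
    mx = [max(max(row) for row in ch) for ch in feature_map]

    if pool_type == "avg":
        return avg
    if pool_type == "max":
        return mx
    if pool_type == "avgmax":
        # Assumption: combine as the mean of the two so the scale stays comparable.
        return [0.5 * (a + m) for a, m in zip(avg, mx)]
    if pool_type == "catavgmax":
        # Concatenation doubles the number of features passed to the classifier.
        return avg + mx
    raise ValueError(f"Unknown pool_type: {pool_type}")


if __name__ == "__main__":
    fmap = [
        [[1.0, 2.0], [3.0, 4.0]],  # channel 0
        [[0.0, 0.0], [0.0, 8.0]],  # channel 1
    ]
    for mode in ("avg", "max", "avgmax", "catavgmax"):
        print(mode, global_pool(fmap, mode))
```

In this sketch only the concatenated mode changes the output length, which suggests why a classifier trained on average-pooled features would need resizing before it could consume concatenated features.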


A CSV file containing an ImageNet-1K validation results summary for all included models with pretrained weights and default configurations is located here

Self-trained Weights

I've leveraged the training scripts in this repository to train a few of the models with missing weights to good levels of performance. These numbers are all for 224x224 training and validation image sizing with the usual 87.5% validation crop.

| Model | Prec@1 (Err) | Prec@5 (Err) | Param # | Image Scaling | Image Size |
|---|---|---|---|---|---|
| efficientnet_b2 | 79.760 (20.240) | 94.714 (5.286) | 9.11M | bicubic | 260 |
| resnext50d_32x4d | 79.674 (20.326) | 94.868 (5.132) | 25.1M | bicubic | 224 |
| mixnet_l | 78.976 (21.024) | 94.184 (5.816) | 7.33M | bicubic | 224 |
| efficientnet_b1 | 78.692 (21.308) | 94.086 (5.914) | 7.79M | bicubic | 240 |
| resnext50_32x4d | 78.512 (21.488) | 94.042 (5.958) | 25M | bicubic | 224 |
| resnet50 | 78.470 (21.530) | 94.266 (5.734) | 25.6M | bicubic | 224 |
| mixnet_m | 77.256 (22.744) | 93.418 (6.582) | 5.01M | bicubic | 224 |
| seresnext26_32x4d | 77.104 (22.896) | 93.316 (6.684) | 16.8M | bicubic | 224 |
| efficientnet_b0 | 76.912 (23.088) | 93.210 (6.790) | 5.29M | bicubic | 224 |
| resnet26d | 76.68 (23.32) | 93.166 (6.834) | 16M | bicubic | 224 |
| mixnet_s | 75.988 (24.012) | 92.794 (7.206) | 4.13M | bicubic | 224 |
| mobilenetv3_100 | 75.634 (24.366) | 92.708 (7.292) | 5.5M | bicubic | 224 |
| mnasnet_a1 | 75.448 (24.552) | 92.604 (7.396) | 3.89M | bicubic | 224 |
| resnet26 | 75.292 (24.708) | 92.57 (7.43) | 16M | bicubic | 224 |
| fbnetc_100 | 75.124 (24.876) | 92.386 (7.614) | 5.6M | bilinear | 224 |
| resnet34 | 75.110 (24.890) | 92.284 (7.716) | 22M | bilinear | 224 |
| seresnet34 | 74.808 (25.192) | 92.124 (7.876) | 22M | bilinear | 224 |
| mnasnet_b1 | 74.658 (25.342) | 92.114 (7.886) | 4.38M | bicubic | 224 |
| spnasnet_100 | 74.084 (25.916) | 91.818 (8.182) | 4.42M | bilinear | 224 |
| seresnet18 | 71.742 (28.258) | 90.334 (9.666) | 11.8M | bicubic | 224 |
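
The "87.5% validation crop" mentioned before the table is simple arithmetic: the image is resized so that the final centre crop covers 87.5% of the shorter side. The sketch below works through that arithmetic. The function names and the use of floor rounding are assumptions for illustration, not taken from the repo.

```python
import math


def eval_resize_size(crop_size, crop_pct=0.875):
    """Shorter-side resize target such that crop_size is crop_pct of it.

    Floor rounding is an assumption made for this sketch.
    """
    return int(math.floor(crop_size / crop_pct))


def center_crop_box(width, height, crop_size):
    """Return (left, top, right, bottom) of a centred square crop."""
    left = (width - crop_size) // 2
    top = (height - crop_size) // 2
    return left, top, left + crop_size, top + crop_size


if __name__ == "__main__":
    for size in (224, 240, 260):
        resized = eval_resize_size(size)
        print(size, resized, center_crop_box(resized, resized, size))
```

For a 224 crop this gives a 256 resize, i.e. the familiar "resize to 256, centre crop 224" evaluation pipeline.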

Ported Weights

| Model | Prec@1 (Err) | Prec@5 (Err) | Param # | Image Scaling | Image Size | Source |
|---|---|---|---|---|---|---|
| tf_efficientnet_b7 *tfp | 84.480 (15.520) | 96.870 (3.130) | 66.35M | bicubic | 600 | Google |
| tf_efficientnet_b7 | 84.420 (15.580) | 96.906 (3.094) | 66.35M | bicubic | 600 | Google |
| tf_efficientnet_b6 *tfp | 84.140 (15.860) | 96.852 (3.148) | 43.04M | bicubic | 528 | Google |
| tf_efficientnet_b6 | 84.110 (15.890) | 96.886 (3.114) | 43.04M | bicubic | 528 | Google |
| tf_efficientnet_b5 *tfp | 83.694 (16.306) | 96.696 (3.304) | 30.39M | bicubic | 456 | Google |
| tf_efficientnet_b5 | 83.688 (16.312) | 96.714 (3.286) | 30.39M | bicubic | 456 | Google |
| tf_efficientnet_b4 | 83.022 (16.978) | 96.300 (3.700) | 19.34M | bicubic | 380 | Google |
| tf_efficientnet_b4 *tfp | 82.948 (17.052) | 96.308 (3.692) | 19.34M | bicubic | 380 | Google |
| tf_efficientnet_b3 *tfp | 81.576 (18.424) | 95.662 (4.338) | 12.23M | bicubic | 300 | Google |
| tf_efficientnet_b3 | 81.636 (18.364) | 95.718 (4.282) | 12.23M | bicubic | 300 | Google |
| gluon_senet154 | 81.224 (18.776) | 95.356 (4.644) | 115.09M | bicubic | 224 | |
| gluon_resnet152_v1s | 81.012 (18.988) | 95.416 (4.584) | 60.32M | bicubic | 224 | |
| gluon_seresnext101_32x4d | 80.902 (19.098) | 95.294 (4.706) | 48.96M | bicubic | 224 | |
| gluon_seresnext101_64x4d | 80.890 (19.110) | 95.304 (4.696) | 88.23M | bicubic | 224 | |
| gluon_resnext101_64x4d | 80.602 (19.398) | 94.994 (5.006) | 83.46M | bicubic | 224 | |
| gluon_resnet152_v1d | 80.470 (19.530) | 95.206 (4.794) | 60.21M | bicubic | 224 | |
| gluon_resnet101_v1d | 80.424 (19.576) | 95.020 (4.980) | 44.57M | bicubic | 224 | |
| gluon_resnext101_32x4d | 80.334 (19.666) | 94.926 (5.074) | 44.18M | bicubic | 224 | |
| gluon_resnet101_v1s | 80.300 (19.700) | 95.150 (4.850) | 44.67M | bicubic | 224 | |
| tf_efficientnet_b2 *tfp | 80.188 (19.812) | 94.974 (5.026) | 9.11M | bicubic | 260 | Google |
| tf_efficientnet_b2 | 80.086 (19.914) | 94.908 (5.092) | 9.11M | bicubic | 260 | Google |
| gluon_resnet152_v1c | 79.916 (20.084) | 94.842 (5.158) | 60.21M | bicubic | 224 | |
| gluon_seresnext50_32x4d | 79.912 (20.088) | 94.818 (5.182) | 27.56M | bicubic | 224 | |
| gluon_resnet152_v1b | 79.692 (20.308) | 94.738 (5.262) | 60.19M | bicubic | 224 | |
| gluon_resnet101_v1c | 79.544 (20.456) | 94.586 (5.414) | 44.57M | bicubic | 224 | |
| gluon_resnext50_32x4d | 79.356 (20.644) | 94.424 (5.576) | 25.03M | bicubic | 224 | |
| gluon_resnet101_v1b | 79.304 (20.696) | 94.524 (5.476) | 44.55M | bicubic | 224 | |
| tf_efficientnet_b1 *tfp | 79.172 (20.828) | 94.450 (5.550) | 7.79M | bicubic | 240 | Google |
| gluon_resnet50_v1d | 79.074 (20.926) | 94.476 (5.524) | 25.58M | bicubic | 224 | |
| tf_mixnet_l *tfp | 78.846 (21.154) | 94.212 (5.788) | 7.33M | bilinear | 224 | Google |
| tf_efficientnet_b1 | 78.826 (21.174) | 94.198 (5.802) | 7.79M | bicubic | 240 | Google |
| gluon_inception_v3 | 78.804 (21.196) | 94.380 (5.620) | 27.16M | bicubic | 299 | MxNet Gluon |
| tf_mixnet_l | 78.770 (21.230) | 94.004 (5.996) | 7.33M | bicubic | 224 | Google |
| gluon_resnet50_v1s | 78.712 (21.288) | 94.242 (5.758) | 25.68M | bicubic | 224 | |
| gluon_resnet50_v1c | 78.010 (21.990) | 93.988 (6.012) | 25.58M | bicubic | 224 | |
| tf_inception_v3 | 77.856 (22.144) | 93.644 (6.356) | 27.16M | bicubic | 299 | Tensorflow Slim |
| gluon_resnet50_v1b | 77.578 (22.422) | 93.718 (6.282) | 25.56M | bicubic | 224 | |
| adv_inception_v3 | 77.576 (22.424) | 93.724 (6.276) | 27.16M | bicubic | 299 | Tensorflow Adv models |
| tf_efficientnet_b0 *tfp | 77.258 (22.742) | 93.478 (6.522) | 5.29M | bicubic | 224 | Google |
| tf_mixnet_m *tfp | 77.072 (22.928) | 93.368 (6.632) | 5.01M | bilinear | 224 | Google |
| tf_mixnet_m | 76.950 (23.050) | 93.156 (6.844) | 5.01M | bicubic | 224 | Google |
| tf_efficientnet_b0 | 76.848 (23.152) | 93.228 (6.772) | 5.29M | bicubic | 224 | Google |
| tf_mixnet_s *tfp | 75.800 (24.200) | 92.788 (7.212) | 4.13M | bilinear | 224 | Google |
| tf_mixnet_s | 75.648 (24.352) | 92.636 (7.364) | 4.13M | bicubic | 224 | Google |
| gluon_resnet34_v1b | 74.580 (25.420) | 91.988 (8.012) | 21.80M | bicubic | 224 | |
| gluon_resnet18_v1b | 70.830 (29.170) | 89.756 (10.244) | 11.69M | bicubic | 224 | |

Models with *tfp next to them were scored with the --tf-preprocessing flag.

The tf_efficientnet and tf_mixnet models require an equivalent of TensorFlow's 'SAME' padding, as their architecture results in asymmetric padding. I've added this in the model creation wrapper, but it does come with a performance penalty.
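
To see where the asymmetry comes from, the sketch below computes TensorFlow-style 'SAME' padding along one dimension. The function name is hypothetical and this is not the wrapper used in the repo; it only reproduces the standard 'SAME' rule, in which the output size is ceil(input / stride) and any odd leftover padding goes on the trailing side.

```python
import math


def same_padding_1d(in_size, kernel, stride=1, dilation=1):
    """Return (pad_before, pad_after) for TensorFlow-style 'SAME' padding."""
    out_size = math.ceil(in_size / stride)
    # Total padding needed so the last window still fits inside the input.
    total = max((out_size - 1) * stride + (kernel - 1) * dilation + 1 - in_size, 0)
    before = total // 2
    after = total - before  # the extra element, if any, goes at the end
    return before, after


if __name__ == "__main__":
    print(same_padding_1d(224, kernel=3, stride=1))  # symmetric case
    print(same_padding_1d(224, kernel=3, stride=2))  # asymmetric case
    print(same_padding_1d(112, kernel=5, stride=2))  # asymmetric case
```

With stride 1 the padding is the same on both sides, but with stride 2 on an even input the two sides differ. A convolution whose padding argument applies the same amount to both sides cannot express this, so an explicit padding step is needed before the convolution, which is presumably where the extra cost comes from.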



All development and testing has been done in Conda Python 3 environments on Linux x86-64 systems, specifically Python 3.6.x and 3.7.x. Little to no care has been taken to be Python 2.x friendly and I don't plan to support it. If you run into any challenges running on Windows, or other OS, I'm definitely open to looking into those issues so long as it's in a reproducible (read Conda) environment.

PyTorch versions 1.0 and 1.1 have been tested with this code.

I've tried to keep the dependencies minimal; the setup is as per the PyTorch default install instructions for Conda:

conda create -n torch-env
conda activate torch-env
conda install -c pytorch pytorch torchvision cudatoolkit=10.0


This package can be installed via pip. Currently, the model factory (timm.create_model) is the most useful component to use via a pip install.

Install (after conda env/install):

pip install timm


>>> import timm
>>> m = timm.create_model('mobilenetv3_100', pretrained=True)
>>> m.eval()


Train, validation, inference, and checkpoint cleaning scripts are included in the GitHub root folder. The scripts are not currently packaged in the pip release.


The variety of training args is large and not all combinations of options (or even options) have been fully tested. For the training dataset folder, specify the base folder that contains the train and validation folders.

To train an SE-ResNet34 on ImageNet, locally distributed, 4 GPUs, one process per GPU w/ cosine schedule, random-erasing prob of 50% and per-pixel random value:

./distributed_train.sh 4 /data/imagenet --model seresnet34 --sched cosine --epochs 150 --warmup-epochs 5 --lr 0.4 --reprob 0.5 --remode pixel --batch-size 256 -j 4

NOTE: NVIDIA APEX should be installed to run in per-process distributed mode via DDP or to enable AMP mixed precision with the --amp flag.
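
For intuition about what --sched cosine with --warmup-epochs does to the learning rate, here is a minimal, self-contained sketch of a cosine schedule with linear warmup, using the values from the command above (lr 0.4, 150 epochs, 5 warmup epochs). The function, the warmup start value, and the choice to run the cosine over the post-warmup epochs only are assumptions; the repo's scheduler may differ in these details.

```python
import math


def cosine_lr(epoch, base_lr=0.4, epochs=150, warmup_epochs=5,
              warmup_lr=1e-4, min_lr=0.0):
    """Learning rate at a given epoch: linear warmup, then cosine decay.

    warmup_lr and min_lr defaults are assumptions made for this sketch.
    """
    if epoch < warmup_epochs:
        # Linear ramp from warmup_lr up to base_lr.
        return warmup_lr + (base_lr - warmup_lr) * epoch / warmup_epochs
    # Fraction of the post-warmup schedule that has elapsed, in [0, 1].
    progress = (epoch - warmup_epochs) / (epochs - warmup_epochs)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))


if __name__ == "__main__":
    for e in (0, 1, 5, 50, 100, 149, 150):
        print(e, round(cosine_lr(e), 5))
```

The rate rises linearly to 0.4 by the end of warmup, then falls smoothly and reaches the minimum at the final epoch.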

Validation / Inference

Validation and inference scripts are similar in usage. One outputs metrics on a validation set and the other outputs top-k class ids in a CSV file. Specify the folder containing the validation images, not the base folder as in the training script.

To validate with the model's pretrained weights (if they exist):

python validate.py /imagenet/validation/ --model seresnext26_32x4d --pretrained

To run inference from a checkpoint:

python inference.py /imagenet/validation/ --model mobilenetv3_100 --checkpoint ./output/model_best.pth.tar
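
As a rough picture of what "top-k class ids in a CSV" means, the sketch below picks the k highest-scoring class indices per image and writes one row per file. The helper names and the row layout (filename followed by class ids) are assumptions for illustration, not the format produced by inference.py.

```python
import csv
import io


def topk_indices(scores, k=5):
    """Indices of the k largest scores, highest first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def write_topk_csv(filenames, batch_scores, out_file, k=5):
    """Write one row per image: filename followed by its top-k class ids."""
    writer = csv.writer(out_file)
    for name, scores in zip(filenames, batch_scores):
        writer.writerow([name] + topk_indices(scores, k))


if __name__ == "__main__":
    buf = io.StringIO()  # stands in for an open file
    write_topk_csv(
        ["a.jpg", "b.jpg"],
        [[0.1, 0.7, 0.2], [0.5, 0.2, 0.3]],
        buf,
        k=2,
    )
    print(buf.getvalue())
```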


A number of additions are planned for the future for various projects, including:

  • Do a model performance (speed + accuracy) benchmarking across all models (make runnable as script)
  • Add usage examples to comments, good hyper params for training
  • Comments, cleanup and the usual things that get pushed back
