NNI Tuner Comparison Using MNIST

This project gives an overview of the various tuners provided by NNI (https://github.com/Microsoft/nni/) and compares them on the MNIST dataset from the official examples.

Built-in Tuners provided by NNI

NNI provides state-of-the-art tuning algorithms as built-in tuners and makes them easy to use. Here is the list:

  • TPE (suggested tuner for most cases)
  • Random Search
  • Anneal
  • Naive Evolution
  • SMAC (Optimized for discrete hyperparameters)
  • Batch tuner
  • Grid Search
  • Hyperband
  • Network Morphism (PyTorch only)
  • Metis Tuner

Configuring the Tuner in config.yml

The basic tuner configuration usually looks like this:

# config.yml
tuner:
  builtinTunerName: TPE   # or another built-in tuner name
  classArgs:
    optimize_mode: maximize
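
The metric that optimize_mode refers to is whatever the trial code reports back to NNI, and the experiment is launched with nnictl create --config config.yml. Below is a minimal, hypothetical sketch of the trial side using the NNI Python API (nni.get_next_parameter, nni.report_intermediate_result, nni.report_final_result); the real MNIST trial in the official examples trains an actual model, whereas this sketch only uses a placeholder score to illustrate the reporting flow.

# trial.py -- hypothetical minimal trial script (a stand-in for the real MNIST training code)
import nni

def main():
    # Ask the tuner configured in config.yml for the next set of hyperparameters.
    # Outside a running NNI experiment this may return nothing, so fall back to defaults.
    params = {'learning_rate': 0.001, 'batch_size': 64}
    params.update(nni.get_next_parameter() or {})

    accuracy = 0.0
    for epoch in range(5):
        # A real trial would train on MNIST here; this placeholder score only
        # illustrates how metrics flow back to the tuner.
        accuracy = min(0.99, 0.5 + 0.1 * epoch)
        nni.report_intermediate_result(accuracy)

    # The final metric is the value that optimize_mode: maximize refers to.
    nni.report_final_result(accuracy)

if __name__ == '__main__':
    main()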

MNIST Examples Utilizing Different Tuners

Analysis

The NNI WebUI provides multiple ways to monitor the training process, and even lets us kill bad trials from the webpage while they are still running. With this all-round training information, we can analyze the various tuners from two different perspectives.

Learning to Learn

There are two types of tuners: one kind is brute-force, while the other incorporates some intelligence. Let's classify our tuners accordingly:

Brute-force: Grid Search Tuner, Batch Tuner, Random Tuner

Intelligent: TPE Tuner, Evolutionary Tuner, Anneal Tuner, SMAC Tuner

Now, let's check the performance of each tuner based on Figure 1 above. It is easy to notice that as training proceeds, the overall performance of the intelligent tuners keeps improving, since they learn how to schedule the hyperparameters. Here, the best tuner under this circumstance is the Anneal Tuner. I was quite astonished until I read the official documentation on NNI's GitHub, which says:

This simple annealing algorithm begins by sampling from the prior, but tends over time to sample from points closer and closer to the best ones observed.

So it is not so surprising to see that Anneal achieves the best average score over its top 10 trials. However, it may fall into a local optimum because of this short-sightedness; TPE addresses this by structuring the search with its tree-structured Parzen estimator approach. In contrast, the brute-force methods perform quite poorly: you can see they still produce many trials far below the average score. But sometimes they may reach the global optimum if you have enough time :)
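
To make the quoted behaviour concrete, here is a toy sketch of that annealing idea; it is not NNI's actual Anneal Tuner implementation, but it shows why sampling ever closer to the best observed point converges quickly yet can get stuck near a local optimum.

# toy_anneal.py -- illustrative sketch of the annealing idea quoted above,
# NOT the implementation of NNI's Anneal Tuner.
import random

def objective(x):
    # A simple 1-D objective to maximize (stand-in for trial accuracy).
    return -(x - 0.3) ** 2

best_x = random.uniform(0.0, 1.0)        # the first sample comes from the prior
best_score = objective(best_x)
for step in range(1, 51):
    radius = 1.0 / step                  # sampling radius shrinks over time
    x = min(1.0, max(0.0, best_x + random.uniform(-radius, radius)))
    score = objective(x)
    if score > best_score:               # later samples cluster around the best point seen
        best_x, best_score = x, score

print('best x:', best_x, 'best score:', best_score)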

Closeness to the Visible Optimum

Besides the scatter plot, the NNI web UI also shows the best 10 trials of each experiment. From Figure 2, you can identify which tuner is most efficient at finding a possible optimum. Although Grid Search and the Random Tuner achieve competitive scores within the 50 trials, their best 10 trials have a large variance. By comparison, TPE, the Anneal Tuner, and SMAC achieve much higher scores in their best 10 trials; in other words, these tuners are more likely to approach the global optimum.