Hyperparameter Optimization
Trains is now ClearML
This documentation applies to the legacy Trains versions. For the latest documentation, see ClearML.
This page describes the Trains automation hyperparameter optimization example script, hyper_parameter_optimizer.py, which is located in the examples/optimization/hyper-parameter-optimization directory of the trains repository.
Set the search strategy for optimization
The optimization requires a search strategy, and an optimizer class that implements that strategy.
We can use one of the following search strategies:
- Optuna hyperparameter optimization - automation.optuna.optuna.OptimizerOptuna. For more information about Optuna, see the Optuna documentation.
- BOHB - automation.hpbandster.bandster.OptimizerBOHB. BOHB performs robust and efficient hyperparameter optimization at scale by combining the speed of Hyperband searches with the guidance and guarantees of convergence of Bayesian Optimization. Trains implements BOHB for automation with HpBandSter's bohb.py. For more information about HpBandSter BOHB, see the HpBandSter documentation.
- Random uniform sampling of hyperparameters - automation.optimization.RandomSearch.
- Full grid sampling of every hyperparameter combination - automation.optimization.GridSearch.
- Custom - Use a custom class that inherits from the Trains automation base strategy class, automation.optimization.SearchStrategy.
The search strategy class we choose will be passed to the automation.optimization.HyperParameterOptimizer object later.
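If you already know which strategy you want, you can pin it explicitly instead of relying on the import-fallback logic shown below. This is a minimal illustrative sketch using the GridSearch class listed above; the fallback code that follows is what the example actually ships with.

# Minimal sketch: explicitly pin the full grid sampling strategy.
# Any of the strategy classes listed above can be assigned the same way.
from trains.automation.optimization import GridSearch

aSearchStrategy = GridSearch  # later passed as optimizer_class=aSearchStrategy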
Our example code does not hard-code a strategy. It attempts to import OptimizerOptuna for the search strategy; if the optuna package is not installed, it attempts to import OptimizerBOHB; and if the hpbandster package is not installed either, it falls back to RandomSearch.
import logging

# imports used throughout this example
from trains import Task
from trains.automation import (
    DiscreteParameterRange, HyperParameterOptimizer, RandomSearch,
    UniformIntegerParameterRange)

aSearchStrategy = None

if not aSearchStrategy:
    try:
        from trains.automation.optuna import OptimizerOptuna
        aSearchStrategy = OptimizerOptuna
    except ImportError as ex:
        pass

if not aSearchStrategy:
    try:
        from trains.automation.hpbandster import OptimizerBOHB
        aSearchStrategy = OptimizerBOHB
    except ImportError as ex:
        pass

if not aSearchStrategy:
    logging.getLogger().warning(
        'Apologies, it seems you do not have \'optuna\' or \'hpbandster\' installed, '
        'we will be using RandomSearch strategy instead')
    aSearchStrategy = RandomSearch
Define a callback
When the optimization starts, we provide a callback that is invoked every time an experiment completes. Here, we define that callback, job_complete_callback, which receives, among its arguments, top_performance_job_id, the ID of the best performing job so far.
def job_complete_callback(
    job_id,                 # type: str
    objective_value,        # type: float
    objective_iteration,    # type: int
    job_parameters,         # type: dict
    top_performance_job_id  # type: str
):
    print('Job completed!', job_id, objective_value, objective_iteration, job_parameters)
    if job_id == top_performance_job_id:
        print('WOOT WOOT we broke the record! Objective reached {}'.format(objective_value))
Initialize the optimization Task
Initialize the Trains Task, which will be stored in Trains Server when the code runs. After the code runs at least once, you can rerun and reproduce the optimization; see Executing Experiments Remotely.
We set the Task type to optimizer, and create a new experiment (and Task object) each time the optimizer runs (reuse_last_task_id=False).
When the code runs, you will see an experiment named Automatic Hyper-Parameter Optimization associated with the project Hyper-Parameter Optimization in the Trains Web-App (UI).
# Connecting TRAINS
task = Task.init(project_name='Hyper-Parameter Optimization',
                 task_name='Automatic Hyper-Parameter Optimization',
                 task_type=Task.TaskTypes.optimizer,
                 reuse_last_task_id=False)
Set up the arguments
We create an arguments dictionary that contains the ID of the Task to optimize and a Boolean indicating whether the optimizer will run as a service (see Running as a service below).
In this example, we optimize the experiment named Keras HP optimization base, which must have run at least once so that it is stored in Trains Server.
However, since the arguments dictionary is connected to the Task, after the code runs once you can change template_task_id and optimize a different experiment; see Tuning experiments and Tuning hyperparameters.
# experiment template to optimize in the hyper-parameter optimization
args = {
    'template_task_id': None,
    'run_as_service': False,
}
args = task.connect(args)
# Get the template task experiment that we want to optimize
if not args['template_task_id']:
    args['template_task_id'] = Task.get_task(
        project_name='examples', task_name='Keras HP optimization base').id
Instantiate the optimizer object
Instantiate an automation.optimization.HyperParameterOptimizer object, setting the optimization parameters, beginning with the ID of the experiment to optimize.
an_optimizer = HyperParameterOptimizer(
    # This is the experiment we want to optimize
    base_task_id=args['template_task_id'],
Set the hyperparameter ranges to sample, instantiating them as Trains automation objects using automation.parameters.UniformIntegerParameterRange and automation.parameters.DiscreteParameterRange.
    hyper_parameters=[
        UniformIntegerParameterRange('layer_1', min_value=128, max_value=512, step_size=128),
        UniformIntegerParameterRange('layer_2', min_value=128, max_value=512, step_size=128),
        DiscreteParameterRange('batch_size', values=[96, 128, 160]),
        DiscreteParameterRange('epochs', values=[30]),
    ],
Set the metric to optimize and the optimization objective.
    objective_metric_title='val_acc',
    objective_metric_series='val_acc',
    objective_metric_sign='max',
Set the number of concurrent Tasks.
    max_number_of_concurrent_tasks=2,
Set the optimization strategy; see Set the search strategy for optimization above.
    optimizer_class=aSearchStrategy,
Specify the queue to use for remote execution. This is overridden if the optimizer runs as a service.
    execution_queue='1xGPU',
Specify the remaining parameters, including the time limit per Task (in minutes), the period for checking the optimization (in minutes), the maximum number of jobs to launch, and the minimum and maximum number of iterations per Task.
    # Optional: Limit the execution time of a single experiment, in minutes.
    # (this is optional, and if using OptimizerBOHB, it is ignored)
    time_limit_per_job=10.,
    # Check the experiments every 6 seconds, which is way too often; we should
    # probably set it to 5 min, assuming a single experiment usually takes hours
    pool_period_min=0.1,
    # set the maximum number of jobs to launch for the optimization, default (None) unlimited
    # If OptimizerBOHB is used, it defines the maximum budget in terms of full jobs,
    # basically the cumulative number of iterations will not exceed total_max_jobs * max_iteration_per_job
    total_max_jobs=10,
    # This is only applicable for OptimizerBOHB and ignored by the rest
    # set the minimum number of iterations for an experiment, before early stopping
    min_iteration_per_job=10,
    # Set the maximum number of iterations for an experiment to execute
    # (This is optional, unless using OptimizerBOHB where this is a must)
    max_iteration_per_job=30,
)
Running as a service
The optimization can run as a service if you set the run_as_service argument to True. For more information about running as a service, see Trains Agent services container on the "Concepts and Architecture" page.
# if we are running as a service, just enqueue ourselves into the services queue and let it run the optimization
if args['run_as_service']:
    # if this code is executed by `trains-agent` the function call does nothing.
    # if executed locally, the local process will be terminated, and a remote copy will be executed instead
    task.execute_remotely(queue_name='services', exit_process=True)
Optimize
The optimizer is ready. Set the report period and start it, providing the callback method to report the best performance.
# report every 12 seconds, this is way too often, but we are testing here :)
an_optimizer.set_report_period(0.2)
# start the optimization process, callback function to be called every time an experiment is completed
# this function returns immediately
an_optimizer.start(job_complete_callback=job_complete_callback)
Now that it is running, set a time limit for optimization, wait, get the best performance, print it, and finally stop the optimizer.
# set the time limit for the optimization process (1.5 hours)
an_optimizer.set_time_limit(in_minutes=90.0)
# wait until process is done (notice we are controlling the optimization process in the background)
an_optimizer.wait()
# optimization is completed, print the IDs of the top performing experiments
top_exp = an_optimizer.get_top_experiments(top_k=3)
print([t.id for t in top_exp])
# make sure background optimization stopped
an_optimizer.stop()
print('We are done, good bye')
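As an optional follow-up (not part of the original script), you could inspect the winning experiment's hyperparameters after the optimizer stops. A minimal sketch, assuming at least one experiment completed:

# Minimal sketch: inspect the best experiment returned by get_top_experiments().
if top_exp:
    best_task = top_exp[0]
    # get_parameters() returns the hyperparameter dict recorded for that Task
    print('Best experiment {}:'.format(best_task.id), best_task.get_parameters())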