When training machine learning models, it is important to understand how to adjust the learning rate as training progresses. Doing so well acts as a form of regularization and helps prevent overfitting.

Learning rate decay is one such regularization technique: it dynamically adjusts the learning rate during training, reducing it over epochs or steps.
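For intuition, here is a tiny standalone sketch of one common decay rule (exponential decay by a fixed factor every epoch); the initial rate and decay factor are made-up values for illustration, not tied to any TF API:

# Hypothetical exponential decay: shrink the learning rate by a fixed factor each epoch
initial_lr = 0.1
decay_factor = 0.9

for epoch in range(5):
    lr = initial_lr * (decay_factor ** epoch)
    print(f"epoch {epoch}: learning rate = {lr:.4f}")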

There are two main approaches to using learning rate schedulers in TF 2.0:

  • Using the LearningRateScheduler callback and supplying your own scheduling function

  • Creating a custom subclass of tf.keras.optimizers.schedules.LearningRateSchedule

What is the difference? The main difference is that the first approach is passed via the callbacks kwarg of the model.fit call, whereas the second is passed directly to the optimizer's learning_rate kwarg.

1. Using the LearningRateScheduler callback

The callback class requires a function of the form:

def my_lr_scheduler(epoch, lr):
    # custom code to adjust the learning rate, e.g. decay it by 5% every epoch
    new_lr = lr * 0.95

    # return the new learning rate for this epoch
    return new_lr
The custom function takes two parameters: epoch and lr (the optimizer's current learning rate). The callback is invoked at the beginning of every epoch with the current epoch index and learning rate, and the function must return the new learning rate, which the callback then sets on the optimizer.

To use this function, wrap it in a LearningRateScheduler callback and pass it to model.fit:

from tensorflow.keras.callbacks import LearningRateScheduler
...

mymodel.fit(
	...
	callbacks=[LearningRateScheduler(my_lr_scheduler)]
)
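
Putting it all together, a minimal end-to-end sketch might look like this (the toy model, random data, and the 5% decay rule are illustrative assumptions, not requirements of the API):

import numpy as np
import tensorflow as tf
from tensorflow.keras.callbacks import LearningRateScheduler

def my_lr_scheduler(epoch, lr):
    # Same illustrative rule as above: decay the learning rate by 5% every epoch
    return lr * 0.95

# Toy regression model and random data, purely for demonstration
mymodel = tf.keras.Sequential([
    tf.keras.Input(shape=(4,)),
    tf.keras.layers.Dense(1),
])
mymodel.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-2), loss="mse")

x = np.random.rand(64, 4).astype("float32")
y = np.random.rand(64, 1).astype("float32")

# verbose=1 makes the callback print the learning rate it sets at each epoch
mymodel.fit(
    x, y,
    epochs=5,
    callbacks=[LearningRateScheduler(my_lr_scheduler, verbose=1)],
)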

2. Subclass the LearningRateSchedule base class

A LearningRateSchedule subclass adjusts the learning rate per training step (batch), rather than per epoch. This is useful when you train your model in explicit steps rather than epochs, as in GAN training.

Example of creating a custom LR scheduler class:

from tensorflow.keras.optimizers.schedules import LearningRateSchedule

class LinearLRSchedule(LearningRateSchedule):
    """Linearly decays the learning rate from its initial value towards 0 over max_iters steps."""

    def __init__(self, initial_learning_rate, max_iters, **kwargs):
        super(LinearLRSchedule, self).__init__(**kwargs)
        self.initial_learning_rate = initial_learning_rate
        self.max_iters = max_iters

    def __call__(self, step):
        # Called by the optimizer once per training step with the current step count
        new_lr = self.initial_learning_rate * (1 - (step / float(self.max_iters)))
        return new_lr

    def get_config(self):
        # Needed so the schedule can be serialized along with the optimizer/model
        return {
            "initial_learning_rate": self.initial_learning_rate,
            "max_iters": self.max_iters
        }

During training, an instance of the subclass is passed directly to the learning_rate kwarg of an optimizer object:

import tensorflow as tf

optimizer = tf.keras.optimizers.SGD(learning_rate=LinearLRSchedule(1e-1, 100))
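
Since the optimizer evaluates the schedule at its current step count, you can also drive it from a custom training loop. Below is a sketch that uses a toy variable and a quadratic loss (both assumptions for illustration) to take a few optimizer steps and print how the schedule decays:

import tensorflow as tf

schedule = LinearLRSchedule(initial_learning_rate=1e-1, max_iters=100)
optimizer = tf.keras.optimizers.SGD(learning_rate=schedule)

# Toy variable and quadratic loss, purely to drive a few optimizer steps
w = tf.Variable(5.0)

for step in range(5):
    with tf.GradientTape() as tape:
        loss = tf.square(w)
    grads = tape.gradient(loss, [w])
    optimizer.apply_gradients(zip(grads, [w]))
    # Linear decay: from 1e-1 at step 0 towards 0 at step max_iters
    print(f"step {step}: lr = {float(schedule(step)):.4f}")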

Resources

LearningRateScheduler Callback

LearningRateSchedule Class