Setting up custom learning rate schedulers in TF 2.0
In ML training, it is essential to understand and utilize an approach to adjusting the learning rate of a model. It helps with applying regularization to the model to prevent overfitting.
Learning rate decay is an example of a regularization technique which dynamically adjusts the learning rate of a model during its training process. It reduces the learning rate of the model over epochs or steps.
There are 2 main approaches to using learning rate schedulers in TF 2.0:
- 
    Using the callback LearningRateSchdulerand applying your own function
- 
    Creating a custom subclass of tf.keras.optimizers.schedules.LearningRateSchedule
What is the difference ? The main difference is that approach 1 is meant to be called from the callbacks kwargs in the model.fit call whereas the second approach allows you to pass it as an input to the optimizer learning_rate kwarg.
1. Using the LearningRateScheduler callback
The callback class requires a function of the form:
def my_lr_scheduler(epoch, lr):
  # custom code to adjust learning rate
  # return new learning rateThe custom function needs to handle 2 parameters: epoch and lr (learning rate). This callback will be invoked at the beginning of every epoch, passing in the current epoch and optimizer learning rate. The custom function will need to return the new learning rate value, which the callback uses to update the learning rate of the optimizer
To invoke the example callback above:
from tensorflow.keras.callbacks import LearningRateScheduler
...
mymodel.fit(
	...
	callbacks=[LearningRateScheduler(my_lr_scheduler)]
)2. Subclass the LearningRateSchedule base class
The LearningRateSchedule base class adjusts the learning rate per step / batch of training, rather than over an entire epoch. This is useful if you are training your model in steps rather than epochs. For example, in GAN training
Example of creating a custom LR scheduler class:
from tensorflow.keras.optimizers.schedules import LearningRateSchedule
class LinearLRSchedule(LearningRateSchedule):
    def __init__(self, initial_learning_rate, max_iters, **kwargs):
        super(LinearLRSchedule, self).__init__(**kwargs)
        self.initial_learning_rate = initial_learning_rate
        self.max_iters = max_iters
    def __call__(self, step):
        new_lr = self.initial_learning_rate * (1 - (step / float(self.max_iters)))
        return new_lr
    def get_config(self):
        return {
            "initial_learning_rate": self.initial_learning_rate,
            "max_iters": self.max_iters
        }During training, the subclass would be passed directly into the learning_rate kwargs of an optimizer object:
import tensorflow as tf
optimizer = tf.keras.optimizers.SGD(learning_rate=LinearLRSchedule(1e-1, 100))