AdamWOptimizer

AdamWOptimizer - 1


This version of the operator has been available since version 1 of its domain.



  • adam_mode: Mode for applying bias correction and weight decay (default is 0).
      ◦ 0: Weight decay is applied before the weight is updated. Computation is aligned with Torch's AdamW; in this mode, correct_bias should be 1 to stay aligned with PyTorch.
      ◦ 1: Weight decay is applied after the weight is updated. Computation is aligned with Hugging Face's AdamW.

  • alpha: Coefficient of previously accumulated gradient in running average. Default value is ?.

  • beta: Coefficient of the previously accumulated squared gradient in the running average. Default value is ?.

  • correct_bias: Whether or not to correct bias, enabled by default. Default value is ?.

  • epsilon: Small scalar to avoid dividing by zero. Default value is ?.

  • weight_decay: Weight decay coefficient. Default value is ?.
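The attribute semantics can be sketched as a single-tensor update in NumPy. This is an assumption-laden illustration, not the operator's implementation: the formulas follow the standard AdamW recurrences, the ordering of weight decay is taken from the adam_mode descriptions above, and the default values for alpha, beta, and weight_decay here are purely illustrative (the spec leaves them as "?").

```python
import numpy as np

def adamw_step(w, g, m, v, lr, step, alpha=0.9, beta=0.999,
               epsilon=1e-8, weight_decay=0.01, adam_mode=0, correct_bias=1):
    """Single-tensor sketch of one AdamW update (assumed semantics).

    alpha/beta are the running-average coefficients for the gradient and
    squared gradient; step is the 1-based update count.
    """
    # Running averages of the gradient and the squared gradient.
    m = alpha * m + (1.0 - alpha) * g
    v = beta * v + (1.0 - beta) * g * g

    m_hat, v_hat = m, v
    if correct_bias:
        # Bias correction, as in torch.optim.AdamW.
        m_hat = m / (1.0 - alpha ** step)
        v_hat = v / (1.0 - beta ** step)

    if adam_mode == 0:
        # Mode 0: decoupled weight decay is applied *before* the update.
        w = w - lr * weight_decay * w
        w = w - lr * m_hat / (np.sqrt(v_hat) + epsilon)
    else:
        # Mode 1: weight update first, then weight decay (Hugging Face style).
        w = w - lr * m_hat / (np.sqrt(v_hat) + epsilon)
        w = w - lr * weight_decay * w
    return w, m, v
```

Because the decay term is proportional to the pre-update weight in mode 0 and the post-update weight in mode 1, the two modes produce slightly different trajectories even with identical hyperparameters.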


Between 6 and 7 inputs.

  • lr (heterogeneous) - T1: The learning rate.

  • step (heterogeneous) - T2: The update count of weights. It should be a scalar.

  • weights (heterogeneous) - S_WEIGHT: Sequence of weights to optimize.

  • gradients (heterogeneous) - S_GRAD: Sequence of gradients computed in this iteration.

  • momentums_1 (heterogeneous) - S_MOMENT: Sequence of exponentially averaged historical gradients.

  • momentums_2 (heterogeneous) - S_MOMENT: Sequence of exponentially averaged historical squared gradients.

  • update_signal (optional, heterogeneous) - T_BOOL: This signal indicates if weight updates are skipped, applicable to gradient infinity check in mixed precision training.


Between 1 and 4 outputs.

  • updated_flag (heterogeneous) - T2: Whether the gradient update was applied or skipped.

  • updated_weights (optional, heterogeneous) - S_WEIGHT: Sequence of weights after optimization.

  • updated_momentums_1 (optional, heterogeneous) - S_MOMENT: Sequence of momentums_1 after optimization.

  • updated_momentums_2 (optional, heterogeneous) - S_MOMENT: Sequence of momentums_2 after optimization.
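The sequence-level contract above, including the update_signal skip path, can be sketched as follows. This is an assumed reading of the spec, not the runtime's code: it fixes adam_mode 0 with bias correction, uses illustrative hyperparameter defaults, and assumes a truthy update_signal means "skip" with updated_flag reporting 1 when the update was applied and 0 when it was skipped.

```python
import numpy as np

def adamw_optimizer(lr, step, weights, gradients, momentums_1, momentums_2,
                    update_signal=None, alpha=0.9, beta=0.999,
                    epsilon=1e-8, weight_decay=0.01):
    """Sketch of the operator over sequences of tensors (assumed semantics)."""
    if update_signal is not None and update_signal:
        # Updates are skipped, e.g. when a gradient-infinity check failed in
        # mixed-precision training; all state passes through unchanged.
        return 0, weights, momentums_1, momentums_2

    new_w, new_m1, new_m2 = [], [], []
    for w, g, m, v in zip(weights, gradients, momentums_1, momentums_2):
        # Running averages of the gradient and the squared gradient.
        m = alpha * m + (1.0 - alpha) * g
        v = beta * v + (1.0 - beta) * g * g
        # Bias correction (correct_bias = 1).
        m_hat = m / (1.0 - alpha ** step)
        v_hat = v / (1.0 - beta ** step)
        # adam_mode 0: decay before the weight update.
        w = w - lr * weight_decay * w
        w = w - lr * m_hat / (np.sqrt(v_hat) + epsilon)
        new_w.append(w)
        new_m1.append(m)
        new_m2.append(v)
    # updated_flag plus the three updated sequences.
    return 1, new_w, new_m1, new_m2
```

Note that the weight, gradient, and momentum sequences are updated in lockstep: the i-th entry of each sequence corresponds to the same parameter tensor.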