onnxruntime was initially designed to speed up inference and deployment, but it can also be used to train a model. It builds a graph equivalent to the gradient function, also based on onnx operators plus specific gradient operators. Initializers are the weights that can be trained, and the gradient graph has as many outputs as initializers. The first example compares a linear regression trained with scikit-learn and another one trained with onnxruntime-training.
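The idea can be sketched without onnxruntime at all: training means computing one gradient per trainable weight (initializer) and letting an optimizer apply it. The snippet below is a minimal pure-Python illustration of that loop for a linear regression; it is not the onnxruntime-training API, and the function and variable names are made up for this sketch.

```python
# Plain-Python illustration of gradient-based training: one gradient
# per trainable weight ("initializer"), applied each epoch.
# This is NOT the onnxruntime-training API.

def train_linear_regression(xs, ys, lr=0.1, epochs=200):
    """Fit y ~ w * x + b with full-batch gradient descent."""
    w, b = 0.0, 0.0  # the two "initializers" being trained
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = (w * x + b) - y     # forward pass + residual
            grad_w += 2 * err * x / n  # d(loss)/dw
            grad_b += 2 * err / n      # d(loss)/db
        # one gradient output per initializer, consumed by the optimizer
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]  # generated from y = 2x + 1
w, b = train_linear_regression(xs, ys)
print(w, b)  # w close to 2, b close to 1
```

onnxruntime-training builds the equivalent of this backward computation as an onnx graph, so the gradients are produced by the same runtime that serves inference.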
The fourth example replicates what was done with the linear regression, but with a neural network built by scikit-learn. It trains the network on CPU, or on GPU if one is available. The last example benchmarks the different approaches.