When training with DeepSpeed ZeRO Stage 2 and optimizer offload to CPU, calling engine.backward(loss_) results in empty IPG buckets during gradient reduction (e.g., bucket.buffer: []). This leads to ...
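For reference, the setup described above (ZeRO Stage 2 with the optimizer offloaded to CPU) would usually be expressed in a DeepSpeed config along the following lines. This is a minimal sketch, not the reporter's actual configuration: the batch size and bucket size are placeholder values, and the helper uses_zero2_cpu_offload is a hypothetical name introduced only for illustration.

```python
import json

# Sketch of a DeepSpeed config matching the described setup.
# Assumption: train_batch_size and reduce_bucket_size are placeholder
# values, not taken from the original report.
ds_config = {
    "train_batch_size": 8,
    "zero_optimization": {
        "stage": 2,                              # ZeRO Stage 2
        "offload_optimizer": {"device": "cpu"},  # optimizer offload to CPU
        "reduce_bucket_size": 500_000_000,       # gradient reduction bucket size, in elements
    },
}


def uses_zero2_cpu_offload(config: dict) -> bool:
    """Hypothetical helper: True if the config enables ZeRO-2 with CPU optimizer offload."""
    zero = config.get("zero_optimization", {})
    offload = zero.get("offload_optimizer", {})
    return zero.get("stage") == 2 and offload.get("device") == "cpu"


if __name__ == "__main__":
    print(json.dumps(ds_config, indent=2))
    print(uses_zero2_cpu_offload(ds_config))
```

A dict like this is typically handed to deepspeed.initialize through its config argument. Checking the config first helps confirm that a run really uses the Stage 2 plus CPU offload combination before investigating the empty buckets.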