The Janus effects of SGD vs GD: high noise and low rank
(2023-12-21)
It has always been clear that SGD has higher fluctuations at convergence than GD. It has also often been reported that SGD in deep ReLU networks has a low-rank bias in the weight matrices. A recent theoretical analysis linked ...