Neural gradients are near lognormal: Improves quantized and sparse training