Towards Further Compression of Low-Bitwidth DNNs with Permuted Diagonal Matrices