In Keras or TensorFlow, clipnorm rescales large "gradients" to have a specific norm, and clipvalue bounds all the values of the "gradient". But what happens if you combine one of them with momentum or something like Adam? A) Is clipnorm applied to the actual, pure mathematical gradient of the loss with respect to the parameters ..
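To make the two operations concrete, here is a minimal sketch in plain Python of what the question describes: norm clipping rescales the whole vector, while value clipping bounds each entry separately. The function names `clip_by_norm` and `clip_by_value` and the example numbers are my own assumptions for illustration, not calls into Keras, and the sketch says nothing about where in the optimizer update the clipping takes place.

```python
import math

def clip_by_norm(grad, clipnorm):
    """Rescale grad so that its L2 norm is at most clipnorm (hypothetical helper)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= clipnorm:
        return list(grad)  # small enough already, leave unchanged
    scale = clipnorm / norm
    return [g * scale for g in grad]  # same direction, shorter length

def clip_by_value(grad, clipvalue):
    """Bound every entry of grad to [-clipvalue, clipvalue] (hypothetical helper)."""
    return [max(-clipvalue, min(clipvalue, g)) for g in grad]

grad = [3.0, -4.0]                    # made-up gradient with L2 norm 5
by_norm = clip_by_norm(grad, 1.0)     # roughly [0.6, -0.8]: direction kept
by_value = clip_by_value(grad, 1.0)   # [1.0, -1.0]: direction changed
```

Note that the norm-clipped vector still points the same way as the original, while the value-clipped one does not, since each coordinate is cut off independently.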

