Pytorch RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation

  actor, optimization, python, pytorch, runtime-error

I am quite new to PyTorch. I am trying to solve an optimisation problem using actor-critic deep reinforcement learning. Unfortunately, the following error appears when I run the training:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [64, 2]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

I suspect that the error occurs in the following code snippet. In the post (RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation?) I have already read that the error is related to the fact that .step() changes parameters that are still needed to calculate the gradients of another loss function. However, I could not fix the error.
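To make sure I understood the pattern described in that post, I put together the following toy example (a made-up two-layer network, nothing to do with my actual actor and critic). As far as I can tell, it raises the same kind of RuntimeError, because the optimizer step modifies the weights in place before a second backward pass that still needs their old values:

import torch
import torch.nn as nn

# Toy network and optimizer, only to illustrate the pattern from the linked post
net = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
opt = torch.optim.SGD(net.parameters(), lr=0.1)

x = torch.randn(16, 4)
out = net(x)                        # one forward pass, one graph

loss1 = out.pow(2).mean()
loss1.backward(retain_graph=True)   # first backward pass works
opt.step()                          # updates the weights in place

loss2 = out.mean()
loss2.backward()                    # RuntimeError: ... modified by an inplace operation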

My code snippet (partly similar to https://www.youtube.com/watch?v=G0L8SN02clA):

def optimize_model(self):
    if self.memory.mem_counter < self.batch_size:
        return

    state, prob, reward, next_state, done = self.memory.sample_buffer(self.batch_size)

    states = torch.tensor(state).to(self.actor.device)
    probs = torch.tensor(prob).to(self.actor.device)
    rewards = torch.tensor(reward).to(self.actor.device)
    next_states = torch.tensor(next_state).to(self.actor.device)
    dones = torch.tensor(done).to(self.actor.device)

    critic_value = self.critic.forward(states)
    critic_value_next = self.critic.forward(next_states)

    critic_value_next[done] = 0.0

    delta = rewards + self.gamma*critic_value_next  # maximum value of the next state

    actor_loss = -torch.mean(probs*(delta - critic_value))

    self.actor.optimizer.zero_grad()
    actor_loss.backward()
    self.actor.optimizer.step()

    critic_value = self.critic.forward(states).detach()
    delta = (rewards + self.gamma*critic_value_next).detach()

    critic_loss = F.mse_loss(delta, critic_value)

    self.critic.optimizer.zero_grad()
    critic_loss.backward(retain_graph=True)
    self.critic.optimizer.step()


rewards, critic_value and critic_value_next are PyTorch tensors of shape (64, 2).
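In case the shape in the message matters: the error mentions a [64, 2] tensor that is output 0 of ReluBackward0 and is at version 1. The following standalone snippet (again, not my real code) produces, as far as I understand, the same kind of message by changing a ReLU output in place before calling backward, but I am not sure whether that is really what happens in my function:

import torch

x = torch.randn(64, 2, requires_grad=True)
y = torch.relu(x)   # ReLU saves its output for the backward pass
y[0] = 0.0          # in-place assignment bumps the tensor's version counter
y.sum().backward()  # RuntimeError: ... output 0 of ReluBackward0, is at version 1 ...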

