I am quite new to PyTorch. I am trying to solve an optimisation problem using actor-critic deep reinforcement learning. Unfortunately, the following error message appears when I run the training:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [64, 2]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!

I suspect that the error occurs in the following code snippet. In the post (RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation?) I have already read that the error is related to the fact that `.step()` changes parameters that are needed to compute the gradients of another loss function. However, I could not fix the error.
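To make sure I understand the mechanism, here is a minimal standalone sketch of the in-place pattern named in the error message (a hypothetical toy layer, not my actual network):

```
import torch

# ReluBackward0 saves the ReLU output for the backward pass. Writing
# into that output in place bumps its version counter, so a later
# backward() finds version 1 where it expected version 0.
lin = torch.nn.Linear(2, 2)
x = torch.randn(4, 2)

out = torch.relu(lin(x))   # saved by autograd at version 0
out[0] = 0.0               # in-place write -> `out` is now at version 1

try:
    out.sum().backward()   # needs `out` at version 0 -> RuntimeError
    failed = False
except RuntimeError:
    failed = True

print(failed)
```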

The code snippet (partly similar to https://www.youtube.com/watch?v=G0L8SN02clA):

```
def optimize_model(self):
    if self.memory.mem_counter < self.batch_size:
        return
    state, prob, reward, next_state, done = self.memory.sample_buffer(self.batch_size)
    states = torch.tensor(state).to(self.actor.device)
    probs = torch.tensor(prob).to(self.actor.device)
    rewards = torch.tensor(reward).to(self.actor.device)
    next_states = torch.tensor(next_state).to(self.actor.device)
    dones = torch.tensor(done).to(self.actor.device)
    critic_value = self.critic.forward(states)
    critic_value_next = self.critic.forward(next_states)
    critic_value_next[done] = 0.0
    delta = rewards + self.gamma*critic_value_next  # maximum value of the next state
    actor_loss = -torch.mean(probs*(delta - critic_value))
    self.actor.optimizer.zero_grad()
    actor_loss.backward()
    self.actor.optimizer.step()
    critic_value = self.critic.forward(states).detach()
    delta = (rewards + self.gamma*critic_value_next).detach()
    critic_loss = F.mse_loss(delta, critic_value)
    self.critic.optimizer.zero_grad()
    critic_loss.backward(retain_graph=True)
    self.critic.optimizer.step()
```
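In case it clarifies what I mean by the zeroing step: the line `critic_value_next[done] = 0.0` writes into the network output in place. The same masking can be expressed out of place; a sketch with dummy tensors of shape (64, 2), assuming `dones` holds one boolean flag per sample:

```
import torch

# torch.where builds a new tensor instead of mutating the network
# output, so autograd's saved version counters stay valid.
critic_value_next = torch.randn(64, 2, requires_grad=True).relu()  # stand-in for the critic output
dones = torch.zeros(64, dtype=torch.bool)
dones[::4] = True  # mark some samples as terminal

masked = torch.where(dones.unsqueeze(1),
                     torch.zeros_like(critic_value_next),
                     critic_value_next)
masked.sum().backward()  # backward succeeds; no version-counter error
```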

Rewards, critic_value and critic_value_next are PyTorch tensors with shape (64, 2).