How to implement fitted Q iteration?

I'm trying to implement this algorithm:
Fitted Q Iteration

I have two questions:

  1. Collect D samples: do I let chance decide which action A_i the agent takes, i.e. pick the actions at random?

  2. How do I implement the argmin over theta? Do I need to use gradient descent to get it? TensorFlow offers tf.argmin(), but that just returns the index of the smallest value in a tensor, which doesn't make sense here. So I assume I have to work out theta' myself?
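
To make both questions concrete, here is a minimal sketch of how I currently understand the loop. Everything in it is my own assumption and not taken from the algorithm description: the toy chain environment (`step`), the one-hot `features`, the uniformly random action choice in `collect`, and the use of `np.linalg.lstsq` as the argmin, which only works because Q is linear in theta here. With a neural network I would presumably run some gradient descent steps on the same squared error instead.

```python
import numpy as np

N_STATES, N_ACTIONS, GAMMA = 5, 2, 0.9  # hypothetical toy problem

def step(s, a):
    """Toy chain: action 1 moves right, action 0 moves left; reward 1 on reaching the last state."""
    s_next = min(s + 1, N_STATES - 1) if a == 1 else max(s - 1, 0)
    done = s_next == N_STATES - 1
    return s_next, float(done), done

def features(s, a):
    """One-hot feature vector for a (state, action) pair, so Q(s, a) = features(s, a) @ theta."""
    phi = np.zeros(N_STATES * N_ACTIONS)
    phi[s * N_ACTIONS + a] = 1.0
    return phi

def collect(n, rng):
    """Question 1: gather n transitions, choosing every action uniformly at random."""
    data, s = [], 0
    for _ in range(n):
        a = int(rng.integers(N_ACTIONS))
        s_next, r, done = step(s, a)
        data.append((s, a, r, s_next, done))
        s = 0 if done else s_next  # restart the episode at state 0
    return data

def fitted_q_iteration(data, n_iters=50):
    X = np.stack([features(s, a) for s, a, _, _, _ in data])
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        # Targets y_i = r_i + gamma * max_a' Q_theta(s'_i, a'), held fixed during the fit.
        y = np.array([
            r if done else r + GAMMA * max(features(s2, b) @ theta for b in range(N_ACTIONS))
            for _, _, r, s2, done in data
        ])
        # Question 2: argmin over theta of sum_i (X_i @ theta - y_i)^2, solved as least squares.
        theta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return theta

rng = np.random.default_rng(0)
theta = fitted_q_iteration(collect(2000, rng))
q = theta.reshape(N_STATES, N_ACTIONS)
greedy = q[:-1].argmax(axis=1)  # greedy action in each non-terminal state
```

If this reading is right, the argmin is just a regression fit on fixed targets, repeated once per iteration, and has nothing to do with tf.argmin().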

