I'm trying to implement this algorithm:
Fitted Q Iteration
I have two questions:
Collect D samples: do I let chance decide which action A_i the agent takes, i.e. pick the actions at random?
How do I implement the argmin over theta? Do I need to use gradient descent to find it? TensorFlow offers tf.argmin(), but that just returns the index of the smallest value in a tensor, which doesn't make sense here. So I assume I have to work out theta' myself?
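To make both questions concrete, here is a minimal sketch of how I currently read the algorithm. Everything in it is an assumption on my part: the toy two-state environment (`step`), the uniformly random data collection (`collect`), the linear Q-function with one parameter per (state, action) pair, and plain gradient descent on the squared error standing in for the argmin (`fit`). It uses only the standard library, not TensorFlow.

```python
import random

GAMMA = 0.9
STATES, ACTIONS = [0, 1], [0, 1]

def step(s, a):
    # Hypothetical toy environment: action 1 pays reward 1 and leads to state 1,
    # action 0 pays nothing and leads to state 0.
    return (1.0 if a == 1 else 0.0), a  # (reward, next_state)

def q_value(theta, s, a):
    # Assumed Q-function: linear in one-hot features, one parameter per (s, a).
    return theta[s * len(ACTIONS) + a]

def collect(n, rng):
    # Question 1, as I understand it: actions chosen uniformly at random.
    data = []
    for _ in range(n):
        s, a = rng.choice(STATES), rng.choice(ACTIONS)
        r, s_next = step(s, a)
        data.append((s, a, r, s_next))
    return data

def fit(theta, data, targets, lr=0.5, steps=100):
    # Question 2, as I understand it: approximate
    #   argmin_theta 1/2 * mean_i (Q_theta(s_i, a_i) - y_i)^2
    # by gradient descent, with the targets y_i held fixed.
    theta = list(theta)
    for _ in range(steps):
        grad = [0.0] * len(theta)
        for (s, a, _, _), y in zip(data, targets):
            grad[s * len(ACTIONS) + a] += (q_value(theta, s, a) - y) / len(data)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta

def fitted_q_iteration(n_samples=200, iterations=50, seed=0):
    rng = random.Random(seed)
    data = collect(n_samples, rng)
    theta = [0.0] * (len(STATES) * len(ACTIONS))
    for _ in range(iterations):
        # Targets y_i = r_i + gamma * max_a' Q_theta(s'_i, a') use the current theta.
        targets = [r + GAMMA * max(q_value(theta, s2, b) for b in ACTIONS)
                   for (_, _, r, s2) in data]
        theta = fit(theta, data, targets)
    return theta

theta = fitted_q_iteration()
greedy = [max(ACTIONS, key=lambda a: q_value(theta, s, a)) for s in STATES]
print(greedy)
```

In this sketch the argmin is a search over the parameters theta, so it is a different thing from tf.argmin(), which only looks up a position inside an existing tensor. With a neural network I would expect `fit` to become a few optimizer steps on the same squared-error loss, but that is a guess.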
Source: Python Questions