Where are the training step and the experience-replay step in the Deep Q-learning algorithm?
I have trouble understanding the training section of Deep Q-learning in DeepMind's paper: https://www.nature.com/nature/journal/v518/n7540/pdf/nature14236.pdf
In its algorithm, where is the training section? How can we train and how can we test this algorithm? Also, which part is the experience-replay section?
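In the paper's Algorithm 1, experience replay is the pair of lines "Store transition (φt, at, rt, φt+1) in D" and "Sample random minibatch of transitions from D"; training is the gradient-descent step on (yj − Q(φj, aj; θ))². A minimal sketch of that store-then-sample-then-update loop (using a tabular Q array instead of the paper's convolutional network, on a made-up 4-state chain environment, so the example stays self-contained):

```python
import random
from collections import deque
import numpy as np

random.seed(0)
n_states, n_actions = 4, 2
gamma, alpha = 0.9, 0.1
Q = np.zeros((n_states, n_actions))      # stand-in for the Q-network
replay = deque(maxlen=1000)              # replay memory D

def store(s, a, r, s2, done):
    replay.append((s, a, r, s2, done))   # "store transition in D"

def train_step(batch_size=8):
    # "sample random minibatch from D", then move Q toward the target y
    batch = random.sample(list(replay), min(batch_size, len(replay)))
    for s, a, r, s2, done in batch:
        y = r if done else r + gamma * np.max(Q[s2])
        Q[s, a] += alpha * (y - Q[s, a])

# Toy chain: action 1 moves right, reaching state 3 ends the episode with reward 1
for episode in range(200):
    s = 0
    for _ in range(3):
        a = random.randrange(n_actions)  # random behavior policy for the sketch
        s2 = s + 1 if a == 1 else s
        done = (s2 == n_states - 1)
        r = 1.0 if done else 0.0
        store(s, a, r, s2, done)         # experience-replay step
        train_step()                     # training step
        s = s2
        if done:
            break
```

Testing then amounts to acting greedily, np.argmax(Q[s]) (in the paper, acting with the trained network and a small ε), and measuring episode reward, with no further updates.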
See also questions close to this topic

Storage and retrieval process of a Hopfield network
Can anyone explain how a Hopfield network stores data while learning and how it retrieves data when tested with a given input, with a detailed example with respect to the Hopfield net architecture? Is the architecture meant for the learning process or for the retrieval process? I have read that learning in a Hopfield net is local and incremental; how is that possible with a single recurrent network layer? I want to know how it works behind the scenes. Please let me know if you can explain this. Thanks in advance!
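For what it's worth, the standard picture is: storage is the Hebbian learning rule (local, because each weight depends only on its two endpoint units, and incremental, because each new pattern just adds another outer product to the same weight matrix), and retrieval is running that single recurrent layer to a fixed point. A minimal sketch with two made-up 8-unit patterns:

```python
import numpy as np

patterns = np.array([[1, -1, 1, -1, 1, -1, 1, -1],
                     [1, 1, 1, 1, -1, -1, -1, -1]])
n = patterns.shape[1]

# Storage (learning): Hebbian rule — each stored pattern adds its outer
# product to W; no global error signal is needed (local and incremental)
W = np.zeros((n, n))
for p in patterns:
    W += np.outer(p, p)
np.fill_diagonal(W, 0)          # no self-connections

# Retrieval (testing): start from a corrupted probe and repeatedly apply
# the thresholded update of the same recurrent layer until it settles
def recall(probe, steps=10):
    s = probe.copy()
    for _ in range(steps):
        s = np.where(W @ s >= 0, 1, -1)
    return s

noisy = patterns[0].copy()
noisy[0] *= -1                  # flip one bit of the first stored pattern
recovered = recall(noisy)       # converges back to patterns[0]
```

So the same architecture does both jobs: the weight matrix is the learned storage, and the recurrent dynamics are the retrieval.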

Clarification on the Detector/Decoder task of MultiNet?
I am totally confused about how the Detector/Decoder "task" of MultiNet (arXiv:1612.07695v1) works.
- What are the 2 labels/classes? According to the paper, the 1st and 2nd channels of the prediction output give the confidence that an object of interest is present at a particular location.
- The last 4 channels are the bounding box coordinates (x0, y0, h, w). Are those coordinates at the scale of the input image dimensions, or at the scale of the (39x12) feature map?
- What is the "delta prediction" (the residual)? Is it the correction to be applied to the coarse estimate of the bounding box (from the prediction)?
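I can't settle the paper's exact conventions from the question alone, but "delta" (residual) box regression is typically an additive correction applied to a coarse per-cell estimate. A toy sketch under that assumption, with every shape and number hypothetical:

```python
import numpy as np

# Hypothetical layout: a 39x12 grid of cells, 6 channels per cell:
# channels 0-1 = object-present confidence, channels 2-5 = box (x0, y0, h, w)
pred = np.zeros((39, 12, 6))
pred[10, 5] = [0.1, 0.9, 50.0, 30.0, 20.0, 40.0]   # one confident cell

coarse_box = pred[10, 5, 2:]                 # coarse (x0, y0, h, w) estimate
delta = np.array([1.5, -2.0, 0.5, 1.0])     # hypothetical residual prediction
refined_box = coarse_box + delta            # residual = additive correction
```

Whether the coordinates live in input-image pixels or feature-map cells would change only a constant scale factor on coarse_box and delta; the residual-correction structure is the same either way.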

ValueError: Shapes (?,) and (?, 1) are incompatible
I am attempting to use some of the tf.metrics functions on my eval data for a binary classification problem. I have been using tf.metrics.accuracy() with no errors, but when I use tf.metrics.mean_per_class_accuracy() I get the following: ValueError: Shapes (?,) and (?, 1) are incompatible.
I have seen some related posts, but they are quite old and seem to have been fixed by updating TensorFlow to a (now out-of-date) version. I am running TensorFlow 1.5.0.
eval_metric_ops = {
    "accuracy": tf.metrics.accuracy(
        labels=labels,
        predictions=predictions["classes"],
        name="accuracy"),
    "per_class_accuracy": tf.metrics.mean_per_class_accuracy(
        labels=labels,
        predictions=predictions["classes"],
        num_classes=2,
        name="per_class_accuracy")
}
Anyone know what might be happening? Thanks in advance!
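The message says the two tensors differ in rank: one is (?,) and the other (?, 1), which typically means labels arrive as a column vector while predictions["classes"] is flat (or vice versa). The usual fix is to squeeze the trailing length-1 axis so both have the same shape; here is the shape logic with NumPy stand-ins (in the graph itself the analogous call would be tf.squeeze(labels, axis=[-1]), which I'd verify against your actual tensors):

```python
import numpy as np

labels = np.array([[0], [1], [1], [0]])    # shape (4, 1) — the "(?, 1)" tensor
predictions = np.array([0, 1, 0, 0])       # shape (4,)   — the "(?,)" tensor

assert labels.shape != predictions.shape   # this rank mismatch is what TF rejects

labels_flat = np.squeeze(labels, axis=-1)  # drop the trailing length-1 axis
accuracy = np.mean(labels_flat == predictions)  # now the comparison is valid
```

tf.metrics.accuracy happens to tolerate the mismatch via broadcasting-style handling, while mean_per_class_accuracy checks shapes more strictly, which is why only the second call fails.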

Why do I need a decision tree to predict my test data when I'm giving large training data to train the computer?
I'm a BSCS student learning data mining. One question has me totally confused. If I have already written the code for my decision tree, why do I need a large set of training data to predict the answers for my test data? How is the training data helping when I check each test line against some lines of code, passing it step by step through my conditions, writing "no" as the answer if it fails a condition and "yes" if it passes them all? How does the training data help? And if it helps the computer predict, why do I need my decision tree or model to answer my data at all?
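The short answer is that the conditions in the tree are not hand-written: they are chosen because they fit the training data, and the code only applies whatever conditions the training step chose. A toy sketch of "learning" a single decision-tree split (one threshold) from made-up labeled data:

```python
# Made-up training data: one numeric feature and a 0/1 label per example
train_x = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
train_y = [0, 0, 0, 1, 1, 1]

def best_threshold(xs, ys):
    """Pick the split point that makes the fewest mistakes on the training data."""
    best_t, best_err = None, len(ys) + 1
    for t in xs:
        # rule: predict 1 when x >= t; count training mistakes for this rule
        err = sum((x >= t) != (y == 1) for x, y in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

threshold = best_threshold(train_x, train_y)   # learned FROM the data

def predict(x):
    # this is the "lines of code" part — but the condition came from training
    return 1 if x >= threshold else 0
```

With different training data, best_threshold would return a different condition, and predict would behave differently; that is the sense in which the training data, not the code, determines the answers.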

TensorFlow model restored and saved multiple times: ValueError: At least two variables have the same name: beta1_power
I defined a model that I need to save and retrain multiple times, because I need to train it on data that is uploaded each time a function runs. Basically, my function step receives the variables each time it is run and should train the model with that data. I define the graph, save the model, restore it to train on the new data, save it again, and so on. My problem is that I get errors when I try to save the model a second time.
class Neural(object):
    def __init__(self, number_of_features, number_of_hidden, number_of_actions,
                 initial_state, step_size=0.01):
        self._number_of_features = number_of_features
        self._number_of_hidden = number_of_hidden
        self._number_of_actions = number_of_actions
        self._last_state = initial_state
        self._last_action = 0
        self._step_size = step_size
        self._index = True
        self._dim1 = initial_state.shape[0]
        self._dim2 = initial_state.shape[0]

    def q(self, obs):
        # This function should give the vector of action values for observation obs
        with tf.Session() as sess:
            saver_q = tf.train.import_meta_graph('/content/model.chkp.meta')
            path = "/content/model.chkp"
            saver_q.restore(sess, path)
            complete_states = np.stack((obs, obs), axis=0)
            q_value = sess.run(new_action_values, feed_dict={states: complete_states})
        return q_value

    def step(self, r, g, s):
        # This function should return an action
        dim1 = self._dim1
        dim2 = self._dim2
        flat_size = self._number_of_features
        input_states = np.float32(np.stack((s, self._last_state), axis=0))
        lr = self._step_size
        if self._index:
            # We only run the definition of the graph once; afterwards we set the
            # index to False and only run the training part in a new session
            tf.reset_default_graph()
            # Definition of the graph
            states = tf.placeholder(tf.float32, [2, dim1, dim2])  # stores the states
            states_flat = tf.reshape(states, [2, flat_size])
            h1 = tf.layers.dense(inputs=states_flat, units=self._number_of_hidden,
                                 activation=tf.nn.relu)
            # Action values for S' and S; shape [2, 4]
            action_values = tf.layers.dense(inputs=h1, units=self._number_of_actions)
            new_action_values = action_values[0, :]       # Q(S'), shape (4,)
            previous_action_values = action_values[1, :]  # Q(S), shape (4,)
            aux = epsilon_greedy(np.array(new_action_values), 0.1)
            new_action = tf.convert_to_tensor(aux, np.int32)       # the new action
            new_q_s_a = new_action_values[new_action]              # Q(S', A') for the loss
            old_q_s_a = previous_action_values[self._last_action]  # Q(S, A) for the loss
            loss = 0.5 * (r + g * new_q_s_a - old_q_s_a) ** 2      # loss function
            train_step = tf.train.AdamOptimizer(learning_rate=lr).minimize(loss)
            init = tf.global_variables_initializer()
            with tf.Session() as sess:
                sess.run(init)
                saver = tf.train.Saver()
                path = "/content/model.chkp"
                saver.save(sess, path)
        with tf.Session() as sess:
            new_saver = tf.train.import_meta_graph('/content/model.chkp.meta')
            path = "/content/model.chkp"
            new_saver.restore(sess, path)
            sess.run(train_step, feed_dict={states: input_states})
            action = sess.run(new_action, feed_dict={states: input_states})
            saver2 = tf.train.Saver()
            saver2.save(sess, path)
        self._index = False
        self._last_state = s
        self._last_action = action
        return action
When I run my code, which basically instantiates the class above and calls the step function multiple times to train the model, I get an error pointing to:
saver2 = tf.train.Saver()
...
raise RuntimeError("Use save/restore instead of build in eager mode.")
...
raise ValueError("Graph mode needs to build save and restore together.")
ValueError: At least two variables have the same name: beta1_power
I would like to know the correct way to save and restore a model multiple times so it can be trained at different moments in the code, because I haven't been able to fix this error.
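beta1_power is one of the variables the Adam optimizer creates, and this error typically means the graph ends up containing it twice, which happens when import_meta_graph re-adds the saved graph (or a fresh tf.train.Saver() rebuilds save ops) on top of ops that already exist in the default graph. The usual cure is structural: build the graph and a single Saver exactly once, then reuse both for every restore-train-save cycle. Since TF 1.x can't be exercised here, the sketch below illustrates only that build-once/reuse structure, with pickle standing in for checkpoints and all names made up:

```python
import os
import pickle
import tempfile

class Model:
    def __init__(self):
        # Build the parameters (and, in TF, the graph plus ONE Saver) exactly once
        self.weights = [0.0, 0.0]

    def save(self, path):
        with open(path, "wb") as f:
            pickle.dump(self.weights, f)        # stands in for saver.save(sess, path)

    def restore(self, path):
        with open(path, "rb") as f:
            self.weights = pickle.load(f)       # stands in for saver.restore(sess, path)

    def step(self, grad, lr=0.1):
        # restore -> train -> save, reusing the same objects on every call
        self.weights = [w - lr * g for w, g in zip(self.weights, grad)]

path = os.path.join(tempfile.gettempdir(), "model.pkl")
model = Model()              # constructed once, NOT once per training call
model.save(path)
for _ in range(3):
    model.restore(path)      # pick up where the previous call left off
    model.step([1.0, -1.0])
    model.save(path)
model.restore(path)
```

The TF 1.x translation of this shape: build the graph and tf.train.Saver() in __init__, keep them on self, and have step() only restore, run train_step, and save with that same saver; never call import_meta_graph or construct a second Saver inside step().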

A supervised learning algorithm with unbalanced training data
I want to develop a method which recognizes a specific insect among others based on its thumbnail, for example:
There is a first filter based on static metrics such as color, area, length, etc. Then I want to binary-classify the remaining thumbnails, that is: is my picture the specific insect or not?
I have a database with 30 thumbnails of the insect (maybe that's not enough) and a lot of thumbnails (more than 200) of "others".
- Is it necessary to have a balanced database for the method I chose?
- I was thinking about two methods to achieve this goal, KNN and SVM; do you think they are appropriate?
Thank you. :)
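Neither KNN nor SVM strictly requires a balanced database, but both are sensitive to imbalance (KNN through neighbor counts, SVM through where the margin settles; scikit-learn's SVC has a class_weight='balanced' option for exactly this situation). One simple alternative is to oversample the minority class before training; a sketch on made-up label lists matching the 30-vs-200 counts:

```python
import random

random.seed(0)
insects = ['insect'] * 30     # the 30 positive thumbnails
others = ['other'] * 200      # the 200+ negative thumbnails

# Resample the minority class with replacement until the class counts match;
# with real data these would be (thumbnail, label) pairs, not bare labels
needed = len(others) - len(insects)
oversampled = insects + [random.choice(insects) for _ in range(needed)]
balanced = oversampled + others
```

With only 30 positives, class weighting (or oversampling combined with augmentation of the thumbnails) is usually safer than undersampling the 200 negatives, since you can't afford to discard positive examples.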

Deep Q Network is not learning
I tried to code a Deep Q Network to play Atari games using Tensorflow and OpenAI's Gym. Here's my code:
import tensorflow as tf
import gym
import numpy as np
import os

env_name = 'Breakout-v0'
env = gym.make(env_name)
num_episodes = 100

input_data = tf.placeholder(tf.float32, (None,) + env.observation_space.shape)
output_labels = tf.placeholder(tf.float32, (None, env.action_space.n))

def convnet(data):
    layer1 = tf.layers.conv2d(data, 32, 5, activation=tf.nn.relu)
    layer1_dropout = tf.nn.dropout(layer1, 0.8)
    layer2 = tf.layers.conv2d(layer1_dropout, 64, 5, activation=tf.nn.relu)
    layer2_dropout = tf.nn.dropout(layer2, 0.8)
    layer3 = tf.layers.conv2d(layer2_dropout, 128, 5, activation=tf.nn.relu)
    layer3_dropout = tf.nn.dropout(layer3, 0.8)
    layer4 = tf.layers.dense(layer3_dropout, units=128, activation=tf.nn.softmax,
                             kernel_initializer=tf.zeros_initializer)
    layer5 = tf.layers.flatten(layer4)
    layer5_dropout = tf.nn.dropout(layer5, 0.8)
    layer6 = tf.layers.dense(layer5_dropout, units=env.action_space.n,
                             activation=tf.nn.softmax,
                             kernel_initializer=tf.zeros_initializer)
    return layer6

logits = convnet(input_data)
loss = tf.losses.sigmoid_cross_entropy(output_labels, logits)
train = tf.train.GradientDescentOptimizer(0.001).minimize(loss)
saver = tf.train.Saver()
init = tf.global_variables_initializer()
discount_factor = 0.5

with tf.Session() as sess:
    sess.run(init)
    for episode in range(num_episodes):
        x = []
        y = []
        state = env.reset()
        feed = {input_data: np.array([state])}
        print('episode:', episode + 1)
        while True:
            x.append(state)
            if (episode + 1) / num_episodes > np.random.uniform():
                Q = sess.run(logits, feed_dict=feed)[0]
                action = np.argmax(Q)
            else:
                action = env.action_space.sample()
            state, reward, done, info = env.step(action)
            Q = sess.run(logits, feed_dict=feed)[0]
            new_Q = np.zeros(Q.shape)
            new_Q[action] = reward + np.amax(Q) * discount_factor
            y.append(new_Q)
            if done:
                break
        for sample in range(len(x)):
            _, l = sess.run([train, loss],
                            feed_dict={input_data: [x[sample]],
                                       output_labels: [y[sample]]})
            print('training loss on sample ' + str(sample + 1) + ': ' + str(l))
        saver.save(sess, os.getcwd() + '/' + env_name + 'DQN.ckpt')
The Problem is that:
- The loss isn't decreasing during training and always stays somewhere around 0.7 or 0.8.
- When I test the network on the Breakout environment, even after training it for 1000 episodes, the actions still seem kind of random and it rarely hits the ball.
I already tried different loss functions (softmax cross-entropy and mean squared error), another optimizer (Adam), and increasing the learning rate, but nothing changed.
Can someone tell me how to fix this?
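A few things in the posted code would plausibly prevent learning regardless of the loss function: Q-values are unbounded regression targets, so a softmax output activation (paired with sigmoid cross-entropy) is the wrong shape for them; kernel_initializer=tf.zeros_initializer gives every unit identical gradients; and feed is built once from the initial state and never updated, so every target bootstraps from stale Q-values. Also, the Nature-paper target uses the next state's Q-values and changes only the taken action's entry. A small NumPy sketch of that target construction (all numbers made up):

```python
import numpy as np

gamma = 0.99

def q_target(q_current, q_next, action, reward, done):
    """Build the training target vector: copy Q(s, .) so untaken actions
    contribute zero error, and set the taken action's entry to
    r + gamma * max_a Q(s', a) (just r at episode end)."""
    target = q_current.copy()
    target[action] = reward if done else reward + gamma * np.max(q_next)
    return target

q_current = np.array([0.5, 1.0, 0.2, 0.0])   # Q(s, .), made-up values
q_next = np.array([0.1, 0.8, 0.3, 0.0])      # Q(s', .) — from the NEXT state
target = q_target(q_current, q_next, action=2, reward=1.0, done=False)
```

Concretely, that suggests: drop the softmax activations (linear output), use mean squared error against targets built this way, use a standard initializer, and recompute the feed dict from the current state every step.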

Why is there no n-step Q-learning algorithm in Sutton's RL book?
I think I am messing something up.
I always thought that:
- 1-step TD on-policy = Sarsa
- 1-step TD off-policy = Q-learning
Thus I conclude:
- n-step TD on-policy = n-step Sarsa
- n-step TD off-policy = n-step Q-learning
In Sutton's book, however, he never introduces n-step Q-learning, but he does introduce n-step off-policy Sarsa. Now I feel confused.
Can someone help me with the naming?
Link to Sutton's book (off-policy n-step Sarsa on page 149)
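For reference, the target people usually mean by "n-step Q-learning" bootstraps with a max after n rewards. The subtlety is that for n > 1 the intermediate actions come from the behavior policy, so this target is not off-policy-correct without extra machinery, which (as I understand the book, worth double-checking against chapter 7) is why Sutton presents off-policy n-step Sarsa with importance sampling, and the tree-backup algorithm as the importance-sampling-free relative of Q-learning, rather than a plain "n-step Q-learning". The return itself, computed on made-up numbers:

```python
import numpy as np

gamma = 0.9
rewards = [1.0, 0.0, 2.0]           # R_{t+1}..R_{t+n}, made up, so n = 3
q_final = np.array([0.5, 1.5])      # Q(S_{t+n}, a) for each action, made up

# n discounted rewards, then a greedy (max) bootstrap at the final state:
# G = R_{t+1} + gamma R_{t+2} + ... + gamma^{n-1} R_{t+n}
#       + gamma^n * max_a Q(S_{t+n}, a)
G = (sum(gamma ** i * r for i, r in enumerate(rewards))
     + gamma ** len(rewards) * np.max(q_final))
```

For n = 1 the intermediate-action problem vanishes and this reduces to the ordinary Q-learning target, matching the 1-step correspondence in the question.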

Neural network SARSA implementation
I am trying to implement an RL agent that uses a neural network to emulate SARSA on a grid world, rather than the traditional table-based approach. The architecture of the network I am trying to build is a multilayer perceptron with one hidden layer. My understanding is that the output should be the action values for a given state. My issue is that I am struggling to visualise how SARSA ports over to the NN space. Say I have a step function that is given a new state, the reward (from the transition to that state), and a discount factor: how can I use these values to train my network?
Thanks.
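One way to see the port: the table update Q(s,a) ← Q(s,a) + α[r + γ Q(s',a') − Q(s,a)] becomes "regress the network's Q(s,a) output toward the target r + γ Q(s',a')", where a' is the action your policy actually picks in s'. A sketch with a linear approximator standing in for the MLP (sizes and features are made up; with a real network you would backpropagate the same TD error through the chosen action's output):

```python
import numpy as np

n_features, n_actions = 5, 4
alpha, gamma = 0.1, 0.9
W = np.zeros((n_actions, n_features))   # linear stand-in for the network's weights

def q_values(state):
    return W @ state                    # one action value per action, like the MLP output

def sarsa_step(s, a, r, s2, a2):
    # Semi-gradient SARSA: move Q(s, a) toward r + gamma * Q(s', a')
    target = r + gamma * q_values(s2)[a2]
    td_error = target - q_values(s)[a]
    W[a] += alpha * td_error * s        # dQ(s, a)/dW[a] = s for a linear model

s = np.eye(n_features)[0]               # one-hot features for a toy grid cell
s2 = np.eye(n_features)[1]              # features of the next cell
sarsa_step(s, a=2, r=1.0, s2=s2, a2=3)  # one on-policy transition (S, A, R, S', A')
```

So the step function's job is just to remember (s, a) from the previous call, pick a' in the new state with the current network (e.g. ε-greedily), form the target from r, γ, and Q(s', a'), and take one gradient step on the squared TD error.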