Implementing a loss function (MSVE) in Reinforcement learning
I am trying to build a temporal difference learning agent for Othello. While the rest of my implementation seems to run as intended, I am wondering about the loss function used to train my network. In Sutton's book "Reinforcement Learning: An Introduction", the Mean Squared Value Error (MSVE) is presented as the standard loss function. It is basically a mean squared error weighted by the on-policy distribution: Sum over all states s ( onPolicyDistribution(s) * [V(s) - V'(s,w)]² )
My question is now: how do I obtain this on-policy distribution when my policy is an ε-greedy function of a learned value function? Is it even necessary, and what's the issue if I just use an MSELoss instead?
I'm implementing all of this in PyTorch, so bonus points for an easy implementation there :)
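For what it's worth, the usual resolution is that the on-policy distribution never has to be computed explicitly: states encountered while following your ε-greedy policy are already distributed according to it, so averaging a plain squared error over sampled visits is a Monte Carlo estimate of the MSVE. A toy sketch of that equivalence (all numbers are hypothetical):

```python
import random
random.seed(0)

# Toy setting: 3 states, true values V, approximate values Vhat,
# and an on-policy visit distribution mu (all made-up numbers).
mu   = {"s0": 0.5, "s1": 0.3, "s2": 0.2}
V    = {"s0": 1.0, "s1": 0.0, "s2": -1.0}
Vhat = {"s0": 0.8, "s1": 0.1, "s2": -0.5}

# Explicit MSVE: sum_s mu(s) * (V(s) - Vhat(s))**2
msve = sum(mu[s] * (V[s] - Vhat[s]) ** 2 for s in mu)

# Sampled MSE: draw states in proportion to how often the policy
# visits them; the plain average then approximates the mu-weighted sum.
states = random.choices(list(mu), weights=list(mu.values()), k=200_000)
sampled_mse = sum((V[s] - Vhat[s]) ** 2 for s in states) / len(states)

print(round(msve, 3), round(sampled_mse, 3))  # the two values should be close
```

So using `nn.MSELoss` on minibatches of states drawn from the agent's own games implicitly applies the μ(s) weighting; an explicit weight would only be needed if you sampled states from some other distribution.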
See also questions close to this topic

How to implement Actor-Critic with policy gradient in Python: problems with function approximation weight updates
I am implementing policy gradient methods using Python. The following is the pseudocode:
Consider mu(s) to be phi (the features of the state) multiplied by theta (the actor). We approximate the state-action value function Q using w.
- Take an action, observe the reward and next state, then find the next action
- delta = r + gamma*Q(sdash, adash) - Q(s, a)   (sdash and adash refer to the next state and action)
- w = w + beta * delta * phi
- theta = theta + alpha * ((a - mu(s)) * phi(s)) / sigma^2
I have used the framework from http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/pg.pdf
This is an actor critic algorithm with w being the parameter to update for the Critic, and theta for the actor.
I am unsure if I have understood the update method for the weights correctly.
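For reference, here is a plain-Python sketch of what one step of those updates could look like, using the TD error delta as the learning signal for both the critic and the Gaussian actor (the feature vectors and constants are toy choices of mine, not taken from the slides):

```python
# Linear actor-critic, one update step.
# phi_s: state features; x_sa: state-action features for Q;
# theta: actor weights (Gaussian mean); w: critic weights.
alpha, beta, gamma, sigma = 0.01, 0.05, 0.9, 1.0

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def ac_step(phi_s, x_sa, x_sa_next, a, r, theta, w):
    mu_s = dot(theta, phi_s)                  # actor mean: mu(s) = theta . phi(s)
    # TD error: delta = r + gamma * Q(s', a') - Q(s, a)
    delta = r + gamma * dot(w, x_sa_next) - dot(w, x_sa)
    # Critic: w <- w + beta * delta * x(s, a)
    w_new = [wi + beta * delta * xi for wi, xi in zip(w, x_sa)]
    # Actor (Gaussian policy score function): (a - mu(s)) * phi(s) / sigma^2
    theta_new = [ti + alpha * delta * (a - mu_s) * pi / sigma ** 2
                 for ti, pi in zip(theta, phi_s)]
    return theta_new, w_new, delta

theta, w, delta = ac_step(phi_s=[1.0, 0.0], x_sa=[1.0, 1.0],
                          x_sa_next=[0.0, 1.0], a=0.5, r=1.0,
                          theta=[0.0, 0.0], w=[0.5, 0.5])
print(round(delta, 3))  # 1 + 0.9*0.5 - 1.0 = 0.45
```

Note the critic update multiplies the features by delta (not by phi alone), and this variant scales the actor step by delta as an advantage signal; the slides' Q-actor-critic uses Q(s, a) in that slot instead.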

How to set the input for LSTM in Keras
I'm new to Keras, and I find it hard to understand the shape of the input data for the LSTM layer. The Keras documentation says that the input data should be a 3D tensor with shape (nb_samples, timesteps, input_dim). I'm having trouble understanding this format. Does the timesteps variable represent the number of timesteps the network remembers?
In my data, a few time steps affect the output of the network, but I do not know how many in advance, i.e. I can't say that the previous 10 samples affect the output. For example, the input can be words that form sentences. There is an important correlation between the words in each sentence. I don't know the length of a sentence in advance, and this length varies from one sentence to another. I do know when a sentence ends (i.e. I have a period that indicates the ending). Two different sentences have no effect on one another; there is no need to remember the previous sentence.
I'm using the LSTM network for learning a policy in reinforcement learning, so I don't have a fixed data set. The agent's policy will change the length of the sentence.
How should I shape my data? How should it be fed into the Keras LSTM layer?
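A common approach for variable-length sentences is to pad every sequence up to a shared `timesteps` length and let a `Masking` layer (or `mask_zero=True` on an `Embedding`) tell the LSTM to ignore the padding; a stateful LSTM fed step by step is the alternative when lengths are unbounded. A minimal sketch of the padding itself, with hypothetical 2-dimensional word vectors:

```python
# Two sentences of different lengths, each word a 2-dim feature vector
# (values are made up for illustration).
sentences = [
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]],   # 3 words
    [[0.7, 0.8]],                            # 1 word
]
timesteps = max(len(s) for s in sentences)   # pad to the longest sentence
input_dim = 2

# Zero-pad each sentence up to `timesteps` so the batch becomes a
# rectangular (nb_samples, timesteps, input_dim) array.
batch = [s + [[0.0] * input_dim] * (timesteps - len(s)) for s in sentences]

print(len(batch), len(batch[0]), len(batch[0][0]))  # 2 3 2
```

Timesteps is therefore just the (padded) sequence length per sample, not a memory horizon; the LSTM's cell state carries whatever it learns to remember across those steps.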

Python Tensorflow : __init__() got an unexpected keyword argument 'partition_info'
h1 = tf.get_variable("h1", shape = [n_input, n_hidden_1], dtype = tf.float32, initializer = tf.random_normal_initializer)
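This error typically appears when the initializer *class* is passed where `tf.get_variable` expects an initializer *instance*, i.e. `initializer=tf.random_normal_initializer` (no parentheses) instead of `initializer=tf.random_normal_initializer()`. A plain-Python sketch of the mechanism (the class and function below are simplified stand-ins of my own, not TensorFlow's real code):

```python
class RandomNormalInitializer:
    """Stand-in for tf.random_normal_initializer."""
    def __init__(self, mean=0.0, stddev=1.0):
        self.mean, self.stddev = mean, stddev
    def __call__(self, shape, dtype=None, partition_info=None):
        # A real initializer would draw random values here.
        return [[self.mean] * shape[1] for _ in range(shape[0])]

def get_variable(name, shape, initializer):
    # TF internally calls initializer(shape, dtype, partition_info=...).
    return initializer(shape, None, partition_info=None)

# Passing an *instance* works: __call__ accepts partition_info.
ok = get_variable("h1", [2, 3], RandomNormalInitializer())

# Passing the *class* makes Python route those arguments to __init__,
# which reproduces "unexpected keyword argument 'partition_info'".
try:
    get_variable("h1", [2, 3], RandomNormalInitializer)
    msg = "no error"
except TypeError as e:
    msg = str(e)
print(msg)
```

So the fix for the quoted line is simply `initializer=tf.random_normal_initializer()`.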

Load CNN Tensorflow Model
I have loaded a TensorFlow model (CNN), but now I do not understand how to actually make a prediction after restoring the model.
def GaussianFilterModel(data):
    number_of_classes = data.shape[2]
    # conv1
    with tf.variable_scope('conv1') as scope:
        kernel = tf.get_variable('w2', [5, 5, 3, 64], initializer=tf.truncated_normal_initializer(5e-2), dtype=tf.float32)
        conv = tf.nn.conv2d(data, kernel, [1, 1, 1, 1], padding='SAME')
        biases = tf.get_variable('b2', [64], initializer=tf.constant_initializer(0.1), dtype=tf.float32)
        pre_activation = tf.nn.bias_add(conv, biases)
        pre_activation = tf.nn.dropout(pre_activation, 0.6)
        act = tf.nn.relu(pre_activation, name=scope.name)
        # norm
        norm = tf.nn.lrn(act, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75, name='norm')
        # pool
        pool = tf.nn.max_pool(norm, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME', name='pool')
    with tf.variable_scope('conv4') as scope:
        kernel4 = tf.get_variable('w3', [1, 1, 64, 3], initializer=tf.truncated_normal_initializer(5e-2), dtype=tf.float32)
        conv4 = tf.nn.conv2d(pool, kernel4, [1, 1, 1, 1], padding='SAME')
        biases4 = tf.get_variable('b3', [3], initializer=tf.constant_initializer(0.1), dtype=tf.float32)
        pre_activation4 = tf.nn.bias_add(conv4, biases4)
        pre_activation4 = tf.nn.dropout(pre_activation4, 0.6)
    with tf.variable_scope('local1') as scope:
        output = tf.image.resize_images(pre_activation4, tf.Variable([50, 50], tf.int32))
    return output
I have saved the model using the following code snippet:
saver = tf.train.Saver()
saver.save(sess, 'my_test_model')

sess = tf.Session()
root_dir = '/Users/navneetmkumar/Desktop/TUM/'
saver = tf.train.import_meta_graph(root_dir + 'my_test_model.meta')
saver.restore(sess, tf.train.latest_checkpoint('./'))
graph = tf.get_default_graph()
x = graph.get_tensor_by_name('x:0')
y = graph.get_tensor_by_name('y:0')
I don't know what to do beyond this.

L1 norm as regularizer in Pytorch
I need to add an L1 norm as a regularizer to create a sparsity condition in my neural network. I would like to train my network for classification; I'm starting with PyTorch and I don't have any idea how to do this. I tried to construct an L1 norm by myself, like here, but it didn't work either.
Can someone help me? I need to put this regularizer after a ConvTranspose2d. I would like to do something like this in Keras:
model.add(Dense(64, input_dim=64, kernel_regularizer=regularizers.l2(0.01), activity_regularizer=regularizers.l1(0.01)))
But my network was created in PyTorch according to the code below
upconv = nn.ConvTranspose2d(inner_nc, outer_nc, kernel_size=4, stride=2, padding=1, bias=use_bias)
down = [downrelu, downconv]
up = [uprelu, upconv, upnorm]
model = down + up
Thanks
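PyTorch has no `kernel_regularizer` argument; the usual pattern is to compute the penalty yourself, e.g. `lam * p.abs().sum()` over the chosen parameters (or over the ConvTranspose2d *output*, for an activity regularizer), and add it to the data loss before calling `backward()`. A framework-free sketch of the arithmetic (all numbers are hypothetical):

```python
# L1 penalty: lam * sum(|w|). Its subgradient pushes small weights
# toward exactly 0, which is what creates sparsity.
def l1_penalty(weights, lam):
    return lam * sum(abs(w) for w in weights)

data_loss = 0.25                     # hypothetical classification loss
weights = [0.5, -0.2, 0.0, 1.3]      # hypothetical layer weights
total = data_loss + l1_penalty(weights, lam=0.01)
print(round(total, 4))  # 0.25 + 0.01 * 2.0 = 0.27
```

In PyTorch the same idea is one extra line in the training loop: sum the penalty over `upconv.parameters()` (or over the layer's output tensor) and add it to the loss; autograd differentiates it like any other term.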

Is it possible to train pytorch and tensorflow model together on one GPU?
I have a PyTorch model and a TensorFlow model, and I want to train them together on one GPU, following the process below:
input -> pytorch model -> output_pytorch -> tensorflow model -> output_tensorflow -> pytorch model
Is it possible to do this? If the answer is yes, are there any problems I will encounter?
Thanks in advance.

Why is the loss on the CV set always lower than on the training set in my MLP model?
Below is my loss curve: [loss curve plot]
my loss function is defined as:
losses = tf.square(self.predictions - self.input_y)
self.loss = tf.reduce_mean(losses) + tf.reduce_mean(slim.losses.get_regularization_losses())
Model hyperparameters:
- dropout keep rate: 1.0
- l2 lambda: 1e-15

L_0 regularizaion in TensorFlow
I would like to use L_0 regularization for a NN in TensorFlow, meaning regularization on the number of input features that will actually be considered. What's the best way of doing so? Thanks.
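One caveat worth stating up front: the L_0 "norm" counts non-zero weights, so it is piecewise constant and has zero gradient almost everywhere; common workarounds are an L1 surrogate or a stochastic relaxation such as the hard-concrete gates of Louizos et al. (2018). A tiny sketch of why plain gradient descent cannot use it directly:

```python
# Counting the L0 "norm" is trivial; the problem is that an
# infinitesimal change to any weight leaves the count unchanged,
# so the penalty contributes no gradient to backpropagate.
def l0_norm(weights, tol=1e-8):
    return sum(1 for w in weights if abs(w) > tol)

print(l0_norm([0.0, 0.3, -0.1, 0.0]))           # 2
print(l0_norm([0.0, 0.3 + 1e-4, -0.1, 0.0]))    # still 2: flat in every direction
```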

How to choose optimal loss function and scale in ceres solver?
Ceres solver is an open source C++ library for modeling and solving large, complicated optimization problems. It can be used to solve Nonlinear Least Squares problems with bounds constraints and general unconstrained optimization problems.
It has a few loss functions to reduce the influence of outliers. But how do I properly choose one, and its scale factor?

Analysis over time comparing 2 dataframes row by row
This is a small portion of the dataframe I am working with, for reference. I am working with a data frame (MG53_HanLab) in R that has a column for Time, several columns with "MG53" in their names, several with "F2", and several with "Iono". I would like to compare the means of each group for each time point. I understand that I have to subset the data and have tried doing
control <- MG53_HanLab[c(2:11)]
F2 <- MG53_HanLab[c(12:23)]
iono <- MG53_HanLab[c(24:33)]
which has created 3 new dataframes.
My question is: How do I compare two dataframes row by row to see if there is a difference in the means for each table?

TD(lambda) component return weight sum up to 1?
In TD(lambda), the weights on the component returns must sum to 1.
Why is this the case?
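The weights form a geometric series chosen precisely so that they make a probability distribution over the n-step returns:

```latex
G_t^{\lambda} = (1-\lambda) \sum_{n=1}^{\infty} \lambda^{n-1} G_t^{(n)},
\qquad
(1-\lambda) \sum_{n=1}^{\infty} \lambda^{n-1}
 = (1-\lambda)\,\frac{1}{1-\lambda} = 1
\quad (0 \le \lambda < 1).
```

Summing to 1 keeps the λ-return an unbiased convex combination of the component returns: if every n-step return happened to equal some value G, the mixture would also equal G rather than being scaled up or down.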

Library or codes for using temporal difference learning?
I am working on an individual machine learning project (for the first time in my life) that involves implementing a temporal difference learning program in Python. I wonder whether there are any Python libraries that provide a complete temporal difference learning implementation, or whether someone could share their own implementation that can be used in various situations.
Thanks in advance!
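There is no single standard "TD library"; the algorithm is short enough that most people write it directly. A minimal tabular TD(0) sketch on a hypothetical two-state chain (s0 -> s1 -> terminal, reward 1 on the final step):

```python
# Tabular TD(0): V(s) <- V(s) + alpha * (r + gamma * V(s') - V(s)).
alpha, gamma = 0.1, 0.9
V = {"s0": 0.0, "s1": 0.0, "end": 0.0}   # terminal value stays 0

for _ in range(1000):
    # One episode: s0 -(r=0)-> s1 -(r=1)-> end
    for s, r, s_next in [("s0", 0.0, "s1"), ("s1", 1.0, "end")]:
        V[s] += alpha * (r + gamma * V[s_next] - V[s])

print(round(V["s1"], 2), round(V["s0"], 2))  # 1.0 0.9
```

V(s1) converges to the reward 1, and V(s0) to gamma times that, which matches the exact values for this chain; swapping the dictionary for a function approximator gives semi-gradient TD.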

Primitive AI for othello game?
So, down to the point: I want to create a loop that searches every part of the array, analyzing every white and black piece on the "board" [array]. Then the "AI" for Othello will choose the immediate best place to put down a white chip to get the maximum number of chips flipped for white. My issue is that I cannot think of how the conditions for the loops should be set up, or how to flip the chips in between from "B" to "W".
int bestMove(int &row, int &col, char color)  // returns the max resultOfMove(row, col)
{
    int whiteCount = 0;
    int bestSum = 0;
    for (int i = 0; i < row; i++)
    {
        for (int j = 0; j < col; j++)
        {
            if (board[i][j] == color)
                whiteCount++;
            if (board[i][j] == color)
                bestSum = bestSum + whiteCount;
        }  // when a disk of color is placed at (row, col)
    }
}
I know it's incorrect, but it gives you an idea of what I am trying to do. Any help would be greatly appreciated. Thanks :)!
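One way to structure it: a helper walks a single direction vector counting opponent pieces, and a scorer sums the eight directions; the greedy move is then the empty square with the highest score. A sketch along those lines (the `'W'`/`'B'`/`'.'` board representation is a hypothetical choice, and the language is Python rather than the asker's C++ to keep it short):

```python
def flips_in_direction(board, row, col, dr, dc, me, opp):
    """Count opponent pieces that would flip from (row, col) along (dr, dc)."""
    r, c, count = row + dr, col + dc, 0
    while 0 <= r < 8 and 0 <= c < 8 and board[r][c] == opp:
        count += 1
        r, c = r + dr, c + dc
    # Flips only count if the run of opponent pieces is capped by my piece.
    if 0 <= r < 8 and 0 <= c < 8 and board[r][c] == me:
        return count
    return 0

def score_move(board, row, col, me='W', opp='B'):
    dirs = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
            (0, 1), (1, -1), (1, 0), (1, 1)]
    return sum(flips_in_direction(board, row, col, dr, dc, me, opp)
               for dr, dc in dirs)

board = [['.'] * 8 for _ in range(8)]
board[3][4], board[3][5] = 'B', 'W'
print(score_move(board, 3, 3))  # 1: placing at (3,3) flips the B at (3,4)
```

Actually flipping the chips reuses the same walk: when the count is positive, walk the direction again and overwrite each opponent square with your color.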

What are patterns? (Othello/Chess...)
I tried to implement the evaluation functions of IAGO (1981) and BILL (1986), but failed to understand how to implement the patterns/tables.
Despite extensive research I don't understand how patterns work nor how to create them and not even what they really are.
As far as I understand, they help us rate constellations of rows, columns, edges, 3x3 corners, etc.
For the sake of making this question more specific:
I'd like you to explain to me what a pattern would look like for an edge configuration, and how I'd use it to assign it a numerical value.
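One common scheme (in the spirit of BILL's precomputed tables, though the exact details there differ) is to read a line of squares as a base-3 number and use it as an index into a table of scores, one entry per possible configuration. A sketch for an 8-square edge, with made-up values:

```python
# Each edge square is 0 (empty), 1 (own piece), or 2 (opponent piece),
# so an 8-square edge has 3**8 = 6561 configurations.
def edge_index(edge):
    """Read the 8 edge squares as a base-3 number."""
    idx = 0
    for sq in edge:
        idx = idx * 3 + sq
    return idx

edge_table = [0.0] * 3 ** 8   # scores tuned/learned offline in IAGO/BILL
edge_table[edge_index([1, 1, 1, 1, 1, 1, 1, 1])] = 100.0   # own full edge
edge_table[edge_index([2, 0, 0, 0, 0, 0, 0, 0])] = -30.0   # opponent corner

def edge_value(edge):
    return edge_table[edge_index(edge)]

print(edge_value([1, 1, 1, 1, 1, 1, 1, 1]))  # 100.0
```

The evaluation function then sums such table lookups over all the lines (edges, diagonals, corner regions) it tracks; the "pattern" is just the configuration, and the table stores its learned worth.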

Events stacking in C#
I have to make a Reversi game for school and I have a problem. I use a while loop inside a for loop to check every possible move, in every direction, for every piece on the board, and sometimes, on some squares, the click event I created gets attached twice, so it runs twice when I click to take the other color's pieces. Sorry for my bad English. Here's my code for checking whether you can put down a piece, and the code for taking the pieces of the other color (I tried to remove the event with "Controls.Contains" but couldn't fix it):
The code for checking where you can put your piece (I didn't put all the code, it's just other ifs for the 7 other directions)
private void CalculePiecePosable(int iAllierRecu)
{
    int x = 1;
    for (int k = 0; k < 8; k++)
    {
        for (int l = 0; l < 8; l++)
        {
            if (tblPieces[k, l] == iAllierRecu)
            {
                /*NbPointNoir++;
                lblPointNoir.Text = Convert.ToString(NbPointNoir);*/
                if (k - x > 0 && l - x > 0)
                {
                    if (tblPieces[k - x, l - x] == iEnnemie1)
                    {
                        while (tblPieces[k - x, l - x] == iEnnemie1)
                        {
                            x++;
                        }
                        if (tblPieces[k - x, l - x] == 0)
                        {
                            if (tblPlateau[k - x, l - x].Controls.Contains(PosePiece))
                            {
                                tblPlateau[k - x, l - x].Click -= new EventHandler(PosePiece_Click);
                                tblPlateau[k - x, l - x].Click += new EventHandler(PosePiece_Click);
                                tblPlateau[k - x, l - x].BackgroundImage = System.Drawing.Image.FromFile("Pieces\\04.png");
                                x = 1;
                            }
                            else
                            {
                                tblPlateau[k - x, l - x].Click += new EventHandler(PosePiece_Click);
                                tblPlateau[k - x, l - x].BackgroundImage = System.Drawing.Image.FromFile("Pieces\\04.png");
                                x = 1;
                            }
                        }
                        else
                        {
                            x = 1;
                        }
                    }
                }
And the code for placing and taking the other pieces (same here for the if):
private void PosePiece_Click(object sender, EventArgs e)
{
    int iPosX, iPosY, x = 1;
    iPosX = (sender as CasePlateau).posX;
    iPosY = (sender as CasePlateau).posY;
    if (iPosX - x > 0 && iPosY - x > 0)
    {
        if (tblPieces[iPosX - x, iPosY - x] == iEnnemie1)
        {
            while (tblPieces[iPosX - x, iPosY - x] == iEnnemie1)
            {
                x++;
                if (tblPieces[iPosX - x, iPosY - x] == iAllier)
                {
                    x = 1;
                    while (tblPieces[iPosX - x, iPosY - x] == iEnnemie1)
                    {
                        if (iAllier == 1)
                        {
                            tblPieces[iPosX, iPosY] = iAllier;
                            tblPieces[iPosX - x, iPosY - x] = iAllier;
                            tblPlateau[iPosX, iPosY].BackgroundImage = System.Drawing.Image.FromFile("Pieces\\01.png");
                            tblPlateau[iPosX - x, iPosY - x].BackgroundImage = System.Drawing.Image.FromFile("Pieces\\01.png");
                        }
                        if (iAllier == 2)
                        {
                            tblPieces[iPosX, iPosY] = iAllier;
                            tblPieces[iPosX - x, iPosY - x] = iAllier;
                            tblPlateau[iPosX, iPosY].BackgroundImage = System.Drawing.Image.FromFile("Pieces\\02.png");
                            tblPlateau[iPosX - x, iPosY - x].BackgroundImage = System.Drawing.Image.FromFile("Pieces\\02.png");
                        }
                        x++;
                    }
                }
                if (tblPieces[iPosX - x, iPosY - x] == 0)
                {
                    break;
                }
            }
            x = 1;
        }
    }
Edit: The problem with this code running twice is that it switches the turn; if it runs twice, it changes the turn twice, so the same player can play two times.
So here's my question: I think that when I check whether a move can be made, I attach the click event to the board square; but the piece next to the last piece I checked can share a "place where you can click", so a second event gets attached to the same square. When I click, the handler runs, does the exchange of the black/white pieces, and changes the turn, but then it runs a second time because there are two click events on the same square. Here are images to illustrate: it's black's turn.
I know my code is not optimized, but can you help me with what I have? :(
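The standard C# fix is the `-=` before `+=` idiom in the checking code: unsubscribing first (a no-op if the handler is not attached) guarantees at most one subscription per square, no matter how many times the board scan visits it. A plain-Python model of why that works (the `Event` class below just mimics C# multicast events for illustration):

```python
class Event:
    """Toy stand-in for a C# event: an ordered list of handlers."""
    def __init__(self):
        self.handlers = []
    def add(self, h):            # like C#  ev += h
        self.handlers.append(h)
    def remove(self, h):         # like C#  ev -= h (no-op if absent)
        if h in self.handlers:
            self.handlers.remove(h)
    def fire(self):
        for h in list(self.handlers):
            h()

clicks = []
handler = lambda: clicks.append(1)
ev = Event()
for _ in range(2):               # the board scan may reach a square twice
    ev.remove(handler)           # unsubscribe first...
    ev.add(handler)              # ...so the second pass does not stack
ev.fire()
print(len(clicks))  # 1, not 2
```

Without the `remove` call, the loop would attach the handler twice and one click would fire it twice, which is exactly the double turn-switch described above.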