How to use Embedding to one-hot encode before passing input
I am able to train my seq2seq model when one-hot encoded input is passed to the fit function. How would I achieve the same thing if the input is not one-hot encoded?
The following code works:
from keras.models import Model
from keras.layers import Input, LSTM, Dense

def seqModel():
    latent_dim = 256  # Latent dimensionality of the encoding space.
    encoder_inputs = Input(shape=(None, num_encoder_tokens))
    decoder_inputs = Input(shape=(None, num_decoder_tokens))
    encoder = LSTM(latent_dim, return_state=True)
    encoder_outputs, state_h, state_c = encoder(encoder_inputs)
    encoder_states = [state_h, state_c]
    decoder_lstm = LSTM(latent_dim, return_sequences=True, return_state=True)
    decoder_outputs, _, _ = decoder_lstm(decoder_inputs,
                                         initial_state=encoder_states)
    decoder_dense = Dense(num_decoder_tokens, activation='softmax')
    decoder_outputs = decoder_dense(decoder_outputs)
    model = Model([encoder_inputs, decoder_inputs], decoder_outputs)
    return model
def train(data):
    model = seqModel()
    # compile and get data
    model.fit([to_one_hot(input_texts, num_encoder_tokens),
               to_one_hot(target_text, num_decoder_tokens)],
              outputs, batch_size=3, epochs=5)
I am asked not to one-hot encode in the train method. How would I do it in the seqModel method? Is an Embedding layer the right way to one-hot encode?
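One way to see the relationship (a sketch with a hypothetical vocabulary size, not the asker's code): an Embedding layer is a row lookup into a weight matrix, so an identity weight matrix reproduces one-hot encoding exactly, while a trainable weight matrix learns a dense representation instead.

```python
import numpy as np

vocab_size = 5                    # hypothetical vocabulary size
identity = np.eye(vocab_size)     # embedding weights = identity matrix

token_ids = np.array([0, 3, 1])   # an integer-encoded sequence

# An embedding lookup is just row selection: weights[token_ids]
embedded = identity[token_ids]

# Compare with explicit one-hot encoding of the same ids
one_hot = np.zeros((len(token_ids), vocab_size))
one_hot[np.arange(len(token_ids)), token_ids] = 1.0

print(np.array_equal(embedded, one_hot))  # → True
```

In Keras terms, this would correspond to feeding integer sequences into something like `Embedding(num_encoder_tokens, num_encoder_tokens, weights=[np.eye(num_encoder_tokens)], trainable=False)` inside seqModel, so that fit no longer needs to_one_hot.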
See also questions close to this topic

What is the best Pythonic way to convert streaming speech to text
I have successfully implemented a small project, with the help of the Google API, to convert recorded speech to text; it also uses the SpeechRecognition Python module. Now I want to start a project in which I have to convert live streaming speech to text, but I am not sure how to go about it. Which Python module should I use? Is there a sample example? This time I do not wish to use the Google API.
Can somebody guide me, please?

Periodic Behaviour of Loss function in CNN for Object Detection
I am trying to train a CNN in TensorFlow to perform text localization in images by using regression to output a bounding box around the text. I have created a custom dataset by pasting images from the IIIT 5K-Word Dataset onto various images containing no text, and created labels for the bounding box in each image in the form [pos_x, pos_y, width, height]. Each image contains only one piece of text, and the network thus only needs to predict one bounding box per image.
From the l2_loss documentation and a blog post, I got the impression that TensorFlow's l2_loss might be well suited to this task. However, my loss behaves very strangely, oscillating in a periodic pattern. It looks too strange to be just a poor choice of hyperparameters, and I suspect there is something wrong with how I calculate the loss (example below). I haven't been able to find much information about object-detection implementations other than the more complex models such as YOLO and R-CNN.
Here is my model:
# create_batches returns an iterator containing (images[i:i+batch_size], labels[i:i+batch_size])
# Each image has size (128, 128, 3) and each label (1, 4)
dataset = tf.data.Dataset.from_generator(create_batches, (tf.float32, tf.int64),
                                         ([None, 128, 128, 3], [None, 4])).repeat()
iterator = dataset.make_one_shot_iterator()
XX, y_ = iterator.get_next()

# Define Model
# 1. Define Variables and Placeholders
# Convolutional Layers
W_conv1 = tf.Variable(tf.truncated_normal([5, 5, 3, 4], stddev=0.1))
b_conv1 = tf.Variable(tf.constant(0.1, shape=[4]))
W_conv2 = tf.Variable(tf.truncated_normal([5, 5, 4, 8], stddev=0.1))
b_conv2 = tf.Variable(tf.constant(0.1, shape=[8]))
W_conv3 = tf.Variable(tf.truncated_normal([5, 5, 8, 16], stddev=0.1))
b_conv3 = tf.Variable(tf.constant(0.1, shape=[16]))
# Densely Connected Layer
W1 = tf.Variable(tf.truncated_normal([409600, 200], stddev=0.1))
b1 = tf.Variable(tf.zeros([200]))
# Output Layer
W5 = tf.Variable(tf.zeros([200, 4]))  # truncated_normal([30,10], stddev=0.1))
b5 = tf.Variable(tf.zeros([4]))

# Conv. layers
y1conv = tf.nn.relu(tf.nn.conv2d(XX, W_conv1, strides=[1, 2, 2, 1], padding='SAME') + b_conv1)
y2conv = tf.nn.relu(tf.nn.conv2d(y1conv, W_conv2, strides=[1, 2, 2, 1], padding='SAME') + b_conv2)
y3conv = tf.nn.relu(tf.nn.conv2d(y2conv, W_conv3, strides=[1, 2, 2, 1], padding='SAME') + b_conv3)

# Reshape output from conv layers
conved_input = tf.reshape(y3conv, [1, 409600])
y4 = tf.nn.relu(tf.matmul(conved_input, W1) + b1)
y = tf.nn.softmax(tf.matmul(y4, W5) + b5)

# 3. Define the loss function
y_fl = tf.to_float(y_)
diff = tf.subtract(y_fl, y)
loss = tf.nn.l2_loss(diff)

# 4. Define an optimizer
train_step = tf.train.AdamOptimizer(0.005).minimize(loss)

# initialize
init = tf.global_variables_initializer()
sess = tf.Session()
sess.run(init)
After training for 20 epochs the loss looks like this:
I tried changing the loss to tf.reduce_mean(tf.nn.l2_loss(diff)) to get the average error over the batch, but it produced the same kind of plot. Is there an obvious mistake in how I compute the loss for the batch?
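As background on the loss arithmetic (a numpy sketch with made-up residuals, not the model above): tf.nn.l2_loss returns a single scalar, sum(x**2)/2, so wrapping it in tf.reduce_mean is a no-op; averaging over the batch requires dividing by the batch size explicitly.

```python
import numpy as np

# Hypothetical batch of 4 bounding-box residuals (pred - label), shape (4, 4)
diff = np.array([[1.0, 2.0, 0.0, 1.0],
                 [0.5, 0.5, 0.5, 0.5],
                 [2.0, 0.0, 0.0, 0.0],
                 [1.0, 1.0, 1.0, 1.0]])

# tf.nn.l2_loss(diff) computes sum(diff**2) / 2 over the WHOLE batch -> one scalar
l2_loss = np.sum(diff ** 2) / 2.0

# The mean of a scalar is that same scalar, so reduce_mean changes nothing
assert np.mean(l2_loss) == l2_loss

# A per-example average requires dividing by the batch size explicitly
per_example_mean = l2_loss / diff.shape[0]
print(l2_loss, per_example_mean)  # → 7.5 1.875
```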

Is there a way to nullify a specific feature in test set while evaluating a tensorflow model?
The idea behind nullifying/ignoring a feature in the test set is to understand how important the model considers it for predicting the target variable (by comparing the evaluation metric's value). For numerical variables, I thought of setting them to 0, assuming the multiplication (with weights) would be 0 and the feature would thus be eliminated from the set. Is this approach right, or what else should be done? I am using TensorFlow's DNNRegressor for modelling.
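For a single linear layer the zeroing intuition is exact, as the hypothetical numpy sketch below shows; for a deep network with biases and nonlinearities, zeroing only removes the first-layer contribution, so this is a rough ablation rather than true feature removal.

```python
import numpy as np

# Hypothetical linear model: y = X @ w + b
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))      # 5 test rows, 3 numerical features
w = np.array([2.0, -1.0, 0.5])
b = 0.1

baseline = X @ w + b

# "Nullify" feature 1 by zeroing its column in the test set
X_null = X.copy()
X_null[:, 1] = 0.0
ablated = X_null @ w + b

# The change in prediction is exactly feature 1's weighted contribution
assert np.allclose(baseline - ablated, X[:, 1] * w[1])
```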

Cannot run load_model (keras) on raspberry pi 3
I'm facing an error when trying to load the model 'small_model.h5'. The code is as follows:
small_model = load_model('small_model.h5')
The error message is as follows:
AttributeError: module 'tensorflow' has no attribute 'global_variables'
I've tried upgrading TensorFlow and Keras on the board, but it didn't help. I've also tried another way of reading the file:
small_model = 'small_model.h5'
small_modl = h5py.File(small_model, 'r')
but that didn't help either.
Any idea how to solve this? Thank you in advance.

Second derivative in Keras
For a custom loss function for a NN I need to use an equation involving the second derivative of u. The function u, given a pair (t, x), is the output of my NN. The problem is that I'm stuck at how to compute the second derivative using K.gradients (K being the TensorFlow backend):
def custom_loss(input_tensor, output_tensor):
    def loss(y_true, y_pred):
        # so far, I can only get this right, naturally:
        gradient = K.gradients(output_tensor, input_tensor)
        # here I'm failing badly:
        # d_t = K.gradients(output_tensor, input_tensor)[0]
        # dd_x = K.gradients(K.gradients(output_tensor, input_tensor),
        #                    input_tensor[1])
        return gradient  # obviously not useful, just for it to work
    return loss
All my attempts, based on Input(shape=(2,)), were variations of the commented lines in the snippet above, mainly trying to find the right indexing of the resulting tensor. Sure enough, I lack knowledge of how exactly tensors work. By the way, I know that in TensorFlow itself I could simply use tf.hessians, but I noticed it's just not present when using TF as a backend.
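Whatever the nested-gradients call ends up looking like, it can be sanity-checked numerically. A finite-difference sketch (plain Python, with a hypothetical u chosen so the true derivatives are known) that mirrors "differentiate the first derivative again":

```python
def u(t, x):
    # Hypothetical network output, chosen so derivatives are known:
    # u = t * x**2, so du/dx = 2*t*x and d2u/dx2 = 2*t
    return t * x ** 2

def d_dx(f, t, x, h=1e-5):
    # Central finite difference in x
    return (f(t, x + h) - f(t, x - h)) / (2 * h)

def d2_dx2(f, t, x, h=1e-4):
    # Differentiate the first derivative again, mirroring nested gradient calls
    return (d_dx(f, t, x + h) - d_dx(f, t, x - h)) / (2 * h)

t0, x0 = 3.0, 1.5
print(d_dx(u, t0, x0))    # ≈ 2*t*x = 9.0
print(d2_dx2(u, t0, x0))  # ≈ 2*t   = 6.0
```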
Load Alexnet weights with keras=1.1.0 using theano backend
I encountered a problem, ''Exception: You are trying to load a weight file containing 11 layers into a model with 8 layers.'', when I load AlexNet weights with Keras 1.1.0 using the Theano backend. The code is:
from keras.models import Model
from keras.layers import Flatten, Dense, Dropout, Input
from keras.layers.convolutional import Conv2D, MaxPooling2D
from keras import backend as K
from keras.utils.layer_utils import convert_all_kernels_in_model

def alexnet_model(weights_path=None):
    K.set_image_dim_ordering('th')
    inputs = Input(shape=(3, 227, 227))
    x = Conv2D(96, 11, 11, subsample=(4, 4), activation='relu', border_mode='valid', name='conv1')(inputs)
    x = MaxPooling2D((3, 3), strides=(2, 2), name='pool1')(x)
    x = Conv2D(256, 5, 5, subsample=(1, 1), activation='relu', border_mode='same', name='conv2')(x)
    x = MaxPooling2D((3, 3), strides=(2, 2), name='pool2')(x)
    x = Conv2D(384, 3, 3, subsample=(1, 1), activation='relu', border_mode='same', name='conv3')(x)
    x = Conv2D(384, 3, 3, subsample=(1, 1), activation='relu', border_mode='same', name='conv4')(x)
    x = Conv2D(256, 3, 3, subsample=(1, 1), activation='relu', border_mode='same', name='conv5')(x)
    x = MaxPooling2D((3, 3), strides=(2, 2), name='pool5')(x)
    x = Flatten(name='flatten')(x)
    x = Dense(4096, activation='relu', name='fc1')(x)
    x = Dropout(0.5)(x)
    x = Dense(4096, activation='relu', name='fc2')(x)
    x = Dropout(0.5)(x)
    x = Dense(1000, activation='softmax', name='predictions')(x)
    model = Model(inputs, x)
    weights_path = 'alexnet_weights.h5'
    model.load_weights(weights_path)
    convert_all_kernels_in_model(model)
    return model

if "__main__" == __name__:
    model = alexnet_model(weights_path='alexnet_weights.h5')
The file 'alexnet_weights.h5' was downloaded from http://files.heuritech.com/weights/alexnet_weights.h5, and in my keras.json file, 'backend' is 'theano' and 'image_dim_ordering' is 'th'. Doesn't my AlexNet model have 11 layers (5 convolutional layers, 3 pooling layers and 3 fully connected layers)? How can I solve this error? Thanks a lot in advance.

Prediction with LSTM using Keras in Python
We are predicting Y based on past values of X. Our formatted CSV dataset has three columns (time_stamp, X and Y, where Y is the actual value). Before training the prediction model, here is how the plots of X and Y respectively look.
Here is how I approached the problem with LSTM Recurrent Neural Networks in Python with Keras.
import numpy as np
from keras.models import Sequential
from keras.layers import LSTM, Dense
from sklearn.preprocessing import MinMaxScaler
import pandas as pd

np.random.seed(7)

# Load data
df = pd.read_csv('test32_C_data.csv')
n_features = 100

def create_sequences(data, window=14, step=1, prediction_distance=14):
    x = []
    y = []
    for i in range(0, len(data) - window - prediction_distance, step):
        x.append(data[i:i + window])
        y.append(data[i + window + prediction_distance][0])
    x, y = np.asarray(x), np.asarray(y)
    return x, y

# Scaling prior to splitting
scaler = MinMaxScaler(feature_range=(0.40, 0.60))
scaled_data = scaler.fit_transform(df.loc[:, ["X", "Y"]].values)

# Build sequences
x_sequence, y_sequence = create_sequences(scaled_data)

# Create test/train split
test_len = int(len(x_sequence) * 0.90)
valid_len = int(len(x_sequence) * 0.90)
train_end = len(x_sequence) - (test_len + valid_len)
x_train, y_train = x_sequence[:train_end], y_sequence[:train_end]
x_valid, y_valid = x_sequence[train_end:train_end + valid_len], y_sequence[train_end:train_end + valid_len]
x_test, y_test = x_sequence[train_end + valid_len:], y_sequence[train_end + valid_len:]

# Initialising the RNN
model = Sequential()
# Adding the input layer and the LSTM layer
model.add(LSTM(4, input_shape=(14, 2)))
# Adding the output layer
model.add(Dense(1))
# Compiling the RNN
model.compile(loss='mse', optimizer='rmsprop')
# Fitting the RNN to the Training set
model.fit(x_train, y_train, epochs=5)
# Getting the predicted values
y_pred = model.predict(x_test)
# y_pred = scaler.inverse_transform(y_pred)
# Plot the results
pd.DataFrame({"Actual": y_test, "Predicted": np.squeeze(y_pred)}).plot()
Finally, when we run the code, the neural model seems to capture the pattern of the signal well, as shown below.
However, the prediction is not smooth, as shown in the plot, and besides, the actual plot (Y) should look like the one presented above. How can we smooth the prediction so that we get something similar to the following plot? I would appreciate any tips. The main goal is to verify the predicted value against Y (the actual value) and obtain a plot close to the following.
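One common smoothing option (an assumption about what is wanted here, not part of the code above) is a centered moving average applied to the predicted series:

```python
import numpy as np

def moving_average(series, window=5):
    # Edge-pad so the smoothed series keeps the original length
    kernel = np.ones(window) / window
    padded = np.pad(series, (window // 2, window - 1 - window // 2), mode='edge')
    return np.convolve(padded, kernel, mode='valid')

y_pred = np.array([1.0, 1.2, 0.8, 1.1, 0.9, 1.3, 1.0])  # hypothetical predictions
smoothed = moving_average(y_pred, window=3)
print(smoothed.shape)  # same length as y_pred: (7,)
```

Larger windows give smoother curves but lag the signal more; an exponentially weighted average (e.g. pandas `Series.ewm`) is a common alternative when lag matters.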

Finetune lstm in Keras
I would like to compare the performance of a baseline LSTM and a fine-tuned model. I found many examples online, but all of them use pre-trained networks such as VGG16 or ResNet50, and I was wondering if anybody can help with a language model. My network is the following:
model = Sequential()
model.add(Embedding(vocab_size, embedding_size, input_length=5, weights=[pretrained_weights]))
model.add(Bidirectional(LSTM(units=embedding_size, return_sequences=True)))
model.add(Bidirectional(LSTM(units=embedding_size, return_sequences=True)))
model.add(Bidirectional(LSTM(units=embedding_size, return_sequences=True)))
model.add(Bidirectional(LSTM(units=embedding_size, return_sequences=True)))
model.add(Bidirectional(LSTM(units=embedding_size, return_sequences=True)))
model.add(Bidirectional(LSTM(units=embedding_size, return_sequences=True)))
model.add(Bidirectional(LSTM(units=embedding_size, return_sequences=True)))
model.add(Bidirectional(LSTM(units=embedding_size, return_sequences=True)))
model.add(Bidirectional(LSTM(units=embedding_size, return_sequences=True)))
model.add(Bidirectional(LSTM(units=embedding_size)))
model.add(Dense(2000, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=RMSprop(lr=0.001), metrics=['accuracy'])

# fit network
model.fit(X_train, y_train, epochs=100, verbose=2, batch_size=16)
I tried to finetune it by freezing the first layers, by adding:
for layer in model.layers[:3]:
    layer.trainable = False
before compiling, but then the accuracy drops. With the baseline I get around 10% accuracy (I want to test on a corpus that is totally different from the training corpus and evaluate the performance), while if I freeze the layers I get something like 0.002. Can somebody help me understand what I am doing wrong, or suggest a different fine-tuning method? Thank you in advance!

What is the architecture behind the Keras LSTM Layer implementation?
How do the input dimensions get converted to the output dimensions for the LSTM layer in Keras? From reading Colah's blog post, it seems as though the number of "timesteps" (AKA the input_dim, or the first value in the input_shape) should equal the number of neurons, which should equal the number of outputs from this LSTM layer (delineated by the units argument of the LSTM layer). From reading this post, I understand the input shapes. What I am baffled by is how Keras plugs the inputs into each of the LSTM "smart neurons".
Example code that baffles me:
model = Sequential()
model.add(LSTM(32, input_shape=(10, 64)))
model.add(Dense(2))
From this, I would think that the LSTM layer has 10 neurons, each fed a vector of length 64. However, it seems it has 32 neurons, and I have no idea what is being fed into each. I understand that for the LSTM to connect to the Dense layer, we can just plug all 32 outputs into each of the 2 neurons. What confuses me is the InputLayer to the LSTM.
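One way to check the wiring numerically (a sketch, not from the question itself): an LSTM layer's parameter count depends only on units and the per-timestep feature size, never on the number of timesteps, because the same cell is reused at every step. For LSTM(32, input_shape=(10, 64)):

```python
def lstm_param_count(units, input_dim):
    # 4 gates, each with an input kernel (input_dim x units),
    # a recurrent kernel (units x units), and a bias (units).
    # The timestep count (10 here) never appears.
    return 4 * (units * input_dim + units * units + units)

print(lstm_param_count(32, 64))  # → 12416
```

This matches the 12,416 parameters model.summary() reports for that layer, which is evidence that the "32 neurons" read a 64-dimensional vector at each of the 10 timesteps rather than one neuron per timestep.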

What's the best way to encode multiple categorical inputs to a neural network?
I want to get a classification result from a neural network, and my input vector includes only categorical data, so I have been reading about one-hot encoding. All the examples I found show only one category, and it looks very nice. But I have several categories, which gives me a big, sparse input matrix (say 10 categories with 15-20 items each, but there will eventually be more of them).
Let's suppose I have two categories for classifying living beings:
Number of legs: {Zero, One-two, Three-four, Five-six, More than six}
Color: {Black, Brown, Yellow, Red, Green, White, Blue, Gray}
And I'm trying to classify it in four possible outputs:
{Mammal, Insect, Fish, Bird}
When I do a one-hot encoding of both categories, I get something like this for a two-legged yellow being (a bird, or maybe a large mammal from Springfield; we would need more data):
0 1 0 0 0 0 0 1 0 0 0 0 0
And a network that is too big for only two active values per input:
(The last node is just a maximum of the previous four, but I use the values on those four to calculate the error).
I think that if I merge both categories like this, the network will learn the combination of those two features and give a correct value for that combination, because the weights for the rest of the features won't be useful and won't update (they will return zero). It should work the same way every time the same combination appears, and likewise for every other combination.
But I can't help wondering if this is the best option. The fact that there is no gap between the categories to tell the network that the first five features express a very different thing than the last eight makes me think the network is not "understanding" it.
The point is that I'm getting bad results with this network (barely 53% accuracy) and I don't know if it can be because of this kind of input encoding.
Am I on the right track and just need to check my inputs, or is there a better way to set up this network (and networks with categorical data in general)? I'm thinking of splitting the input across several networks and then putting the results together, but since they all share the same output nodes, I don't know how to do it.
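For reference, the encoding described above (one one-hot block per category, concatenated) can be sketched in numpy; the category orderings are taken from the lists above:

```python
import numpy as np

legs = ['Zero', 'One-two', 'Three-four', 'Five-six', 'More than six']
colors = ['Black', 'Brown', 'Yellow', 'Red', 'Green', 'White', 'Blue', 'Gray']

def one_hot(value, categories):
    vec = np.zeros(len(categories))
    vec[categories.index(value)] = 1.0
    return vec

def encode(n_legs, color):
    # One block per category, concatenated: 5 + 8 = 13 inputs,
    # with exactly one active value per block
    return np.concatenate([one_hot(n_legs, legs), one_hot(color, colors)])

x = encode('One-two', 'Yellow')
print(x.astype(int))  # → [0 1 0 0 0 0 0 1 0 0 0 0 0]
```

This reproduces the 13-value vector shown earlier; the network does not need an explicit "gap" between blocks, since each block always has exactly one active unit.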

Creating a one hot table from a list of frequency counts in R
I want to create a table of values derived from two lists of information, but I want to only take elements from the first list that meet a condition defined in the second list.
I have two lists as data frames in R. DF_A is a list of document terms with total frequency counts. DF_B is a list of documents with frequencies for each word in the document.
Here is my R code so far, which gets me a long way towards my goal.
library(dplyr)
library(tidytext)

docs = read.csv("~/text.csv")
# > docs
#   DOCID                                              TEXT
# 1  Doc1 blue blue blue blue rose rose hats hats hats hats
# 2  Doc2                                         rose hats
# 3  Doc3 tall tall tall tall tall tall tall tall tall tall tall
# 4  Doc4                                         cups cups
# 5  Doc5                                              tall

my_unigrams <- unnest_tokens(docs, unigram, TEXT, token = "ngrams", n = 1)

DF_A <- my_unigrams %>% count(unigram, sort = TRUE)
DF_A
# > DF_A
# # A tibble: 5 x 2
#   unigram     n
#   <chr>   <int>
# 1 tall       12
# 2 hats        5
# 3 blue        4
# 4 rose        3
# 5 cups        2

DF_B <- my_unigrams %>% count(DOCID, unigram, sort = TRUE)
# > DF_B
# # A tibble: 8 x 3
#   DOCID  unigram     n
#   <fctr> <chr>   <int>
# 1 Doc3   tall       11
# 2 Doc1   blue        4
# 3 Doc1   hats        4
# 4 Doc1   rose        2
# 5 Doc4   cups        2
# 6 Doc2   hats        1
# 7 Doc2   rose        1
# 8 Doc5   tall        1

# My goal is a "one hot" table where each document ID is a row name, and the
# top three most frequent terms are columns (each cell should contain either
# 1 or 0; basically yes/no that term occurs in that document).
# I want a table like this:
one_hot_table <- table(DF_B$DOCID, DF_B$unigram)
one_hot_table
# > one_hot_table
#        blue cups hats rose tall
#   Doc1    1    0    1    1    0
#   Doc2    0    0    1    1    0
#   Doc3    0    0    0    0    1
#   Doc4    0    1    0    0    0
#   Doc5    0    0    0    0    1
"one_hot_table" above is close to what I want, except I want a subset: just the words that are the most frequent ("tall", "blue", "hats").
What I'm hoping is that I can remove the columns I don't want automatically. In my real table, there are thousands of columns and the methods I've found to delete columns ask me to name the column. I don't want to do that for thousands of columns. Ideally I would like a method that takes the one_hot_table as input, looks up each column name in DF_A, and produces a new data frame with just the top three most frequent. Something like this:
new_one_hot_table <- function(one_hot_table, DF_A, 3)
Any help will be much appreciated.

While using onehot library in R, I get an error in the model.matrix command
For label encoding I am using model.matrix from the onehot library in R. The data set is available here. I have renamed the file as train.csv. The feature to be encoded is Education. It has two levels, Graduate and Not Graduate. However, on executing the code,
library(onehot)
data <- read_csv("train.csv")
set.seed(1234)
datashuffled <- data[sample(1:nrow(data)), ]
datashuffled_Loan_StatusRemoved <- datashuffled %>% select(-starts_with("Loan_Status"))
features <- datashuffled_Loan_StatusRemoved
sum(is.na(features$Education))
features$Education[features$Education == "Not Graduate"] <- "NotGraduate"
E <- model.matrix(~Education - 1, head(features))
I get the error:
Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) :
  contrasts can be applied only to factors with 2 or more levels