How to set multiple values of Pandas dataframe where condition
I want to set multiple column values of a pandas DataFrame where a condition holds:

```python
df[df['store_id'] == 'UK00023', ['sale','startdate','enddate']] = [100, str(datetime.now()), str(datetime.now())]
```
But I got this error:

```
raise ValueError('Length of values does not match length of '
ValueError: Length of values does not match length of index
```
1 answer

Use `.loc` for the mask-and-columns assignment (`.ix` is deprecated in modern pandas):

```python
df.loc[df['store_id'] == 'UK00023', ['sale','startdate','enddate']] = [100, str(datetime.now()), str(datetime.now())]
```
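As a minimal sketch, with a toy DataFrame standing in for the asker's data: the `.loc` assignment broadcasts the three values across every row matching the mask.

```python
import pandas as pd
from datetime import datetime

# Toy frame; the real data presumably has more rows and columns.
df = pd.DataFrame({
    'store_id': ['UK00023', 'US00001', 'UK00023'],
    'sale': [0, 0, 0],
    'startdate': [''] * 3,
    'enddate': [''] * 3,
})

now = str(datetime.now())
# One value per selected column, broadcast to every matching row.
df.loc[df['store_id'] == 'UK00023', ['sale', 'startdate', 'enddate']] = [100, now, now]
print(df)
```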
See also questions close to this topic

How to groupby column headers using a regex?
I have a dataframe like this:

```
   S1,0  S1,0.1  S1,0.2  S1,1  S1,1.1  S1,1.2  S2,0  S2,0.1  S2,1  S2,1.1
0     4       0       3     3       3       1     3       2     4       0
1     0       4       2     1       0       1     1       0     1       4
2     3       0       3     0       2       3     0       1     3       3
```
Now I want to `groupby` its column headers, whereby `S1,0` should be in one group, `S1,1` in another one, and the same for `S2`, and apply certain operations on those groups. My expected outcome looks like this (in case I calculate the mean, called `m`, and the standard deviation, called `s`):

```
       S1,0      S1,1      S2,0      S2,1
m 0  2.333333  2.333333  2.500000  2.000000
  1  2.000000  0.666667  0.500000  2.500000
  2  2.000000  1.666667  0.500000  3.000000
s 0  2.081666  1.154701  0.707107  2.828427
  1  2.000000  0.577350  0.707107  2.121320
  2  1.732051  1.527525  0.707107  0.000000
```
I can get this output doing:

```python
import pandas as pd
import numpy as np

np.random.seed(0)
data = np.random.randint(0, 5, 30).reshape(3, 10)
df = pd.DataFrame(data, columns=['S1,0', 'S1,0.1', 'S1,0.2', 'S1,1', 'S1,1.1',
                                 'S1,1.2', 'S2,0', 'S2,0.1', 'S2,1', 'S2,1.1'])
df = df.T
gdf = df.groupby(lambda x: x.split('.', 1)[0])[df.columns].agg({'m': np.mean, 's': np.std}).T.sort_index()
```
My question is whether there is a way which avoids this `split` operation on the column names, where one can pass an actual regex instead? So something along the lines of:

```python
import re
reg = re.compile(r'^S\d,\d')
gdf2 = df.groupby(reg)[df.columns].agg({'m': np.mean, 's': np.std}).T.sort_index()
```

This does not work, but is anything comparable possible?
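One hedged possibility (skipping the `agg` dict, which newer pandas rejects for renaming): apply the regex to the column labels yourself to build an array-valued grouping key, then group the transposed frame by that key.

```python
import re
import numpy as np
import pandas as pd

np.random.seed(0)
data = np.random.randint(0, 5, 30).reshape(3, 10)
df = pd.DataFrame(data, columns=['S1,0', 'S1,0.1', 'S1,0.2', 'S1,1', 'S1,1.1',
                                 'S1,1.2', 'S2,0', 'S2,0.1', 'S2,1', 'S2,1.1'])

# Map each column label to its regex match; the resulting array of
# group names is a valid groupby key for the transposed frame.
pat = re.compile(r'^S\d,\d')
key = np.array([pat.match(c).group(0) for c in df.columns])

m = df.T.groupby(key).mean().T  # one column per S-group
s = df.T.groupby(key).std().T
print(m)
```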

OpenCV Homography matrix calculation
I have the following image, I1. I did not capture it; I downloaded it from Google.
I want to apply a homography h to I1 to obtain the following image I2.
I have manually established some suitable correspondence points between I1 and I2, and from these correspondences, I calculate the homography h to get the image I2 when I apply h to I1.
However, I am getting some very small values in the homography matrix h. I have tested this with a wide range of correspondences but I am still getting extremely small values in the h matrix.
My code:

```python
import cv2
import numpy as np
import random
import math
from scipy import linalg

no_of_images = 0
t1_y = 10
t2_y = 10
for t1_x in range(30, 315, 1000):  # 30
    for top_size in range(200, 355, 1000):
        t2_x = t1_x + top_size
        b1_x = t1_x
        b2_x = t2_x
        for b_y in range(205, 385, 1000):  # 205,305
            b1_y = b_y
            b2_y = b_y
            no_of_images += 1
            image_name = "road"
            location1 = "/path/to/image/" + image_name + ".jpg"
            # Read source image.
            im_src = cv2.imread(location1)
            # Four corners in source image
            pts_src = np.array([[float(t2_x), float(t2_y)],
                                [float(b2_x), float(b2_y)],
                                [float(b1_x), float(b1_y)],
                                [float(t1_x), float(t1_y)]])
            location2 = "/path/to/dest/image/after homography.jpg"
            im_dst = cv2.imread(location2)
            # Four corners in destination image.
            for x in range(0, 500, 1):
                pts_dst = np.array([[500.0, 168.0], [640.0, 358.0],
                                    [0.0, 358.0], [500.0 - x, 168.0]])  # 150
                # Calculate Homography
                h, status = cv2.findHomography(pts_src, pts_dst)
                print("h =", h)
                # Warp source image to destination based on homography
                im_out = cv2.warpPerspective(im_src, h,
                                             (im_dst.shape[1], im_dst.shape[0]))
```
An example of the h matrix I obtain:
```
[[ 2.46712622e+00  5.07091356e-03  7.29742493e+01]
 [ 3.54718152e-16  5.63521116e-01  1.60487917e+02]
 [ 1.34873822e-18  1.11718564e-03  1.00000000e+00]]
```
Why am I getting such tiny values in the h matrix? What am I doing wrong? Is it correct to get such small values in the h matrix? I have been told that this may be due to zero perspective distortion. If this is the case, how do I fix it?
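For what it's worth, near-zero entries in the bottom row are not necessarily wrong: `h[2][0]` and `h[2][1]` carry the perspective component, so values around 1e-16 just mean the correspondences are numerically related by an affine map. A numpy-only sanity check (independent of OpenCV, using a made-up affine-like homography, not the asker's) is to push source points through h and confirm where they land:

```python
import numpy as np

def apply_homography(h, pts):
    """Map Nx2 points through a 3x3 homography with the homogeneous divide."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = pts_h @ h.T
    return mapped[:, :2] / mapped[:, 2:3]

# Hypothetical homography with bottom row ~ [0, 0, 1]:
# essentially zero perspective distortion, i.e. an affine map.
h = np.array([[2.0,   0.0, 10.0],
              [0.0,   0.5, 160.0],
              [1e-16, 0.0, 1.0]])
src = np.array([[0.0, 0.0], [100.0, 0.0], [100.0, 100.0]])
print(apply_homography(h, src))
```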

Python pandas rolling winsorize
I have a timeseries pandas dataframe and I have calculated a new column
```python
df['std_series'] = (df['series1'] - df['series1'].rolling(252).mean()) / df['series1'].rolling(252).std()
```
however I want to winsorize at the 5% level before I standardize, and on a rolling basis: for any datapoint, look back 252 days, and if it is outside the 5%/95% quantiles, clip it to the quantile, then standardize. I couldn't figure out how to make it work with `rolling.apply`. All the examples I found winsorize either the entire dataframe or an entire column.
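A hedged sketch (on synthetic data, assuming a plain float series and pandas' default quantile interpolation): compute trailing 5%/95% rolling quantiles, clip the series to them, then standardize the clipped series — no `rolling.apply` needed.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
series1 = pd.Series(rng.normal(size=600))  # stand-in for df['series1']
window = 252

# Trailing winsorization: clip each point to the quantiles of the
# 252 observations ending at that point.
lo = series1.rolling(window).quantile(0.05)
hi = series1.rolling(window).quantile(0.95)
wins = series1.clip(lower=lo, upper=hi)

# Standardize the winsorized series with its own trailing mean/std.
std_series = (wins - wins.rolling(window).mean()) / wins.rolling(window).std()
print(std_series.dropna().head())
```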
Populating an object from dataframe
I am currently trying to implement a genetic algorithm. I have built a Python class `Gene`, and I am trying to load `Gene` objects from a dataframe `df`:
```python
class Gene:
    def __init__(self, id, nb_trax, nb_days):
        self.id = id
        self.nb_trax = nb_trax
        self.nb_days = nb_days
```
and then create another object, `Chrom`: a second class `Chromosome` with 20 `Gene` objects as its property:

```python
class Chromosome(object):
    def __init__(self):
        self.port = [Gene() for id in range(20)]
```

This is the dataframe:
```
       nb_obj    nb_days
ID
ECGYE   10259  62.965318
NLRTM    8007  46.550562
```
I successfully loaded the genes using:

```python
tester = df.apply(lambda row: Gene(row['Injection Port'], row['Avg Daily Injection'], random.randint(1, 10)), axis=1)
```
But I cannot load the `Chromosome` class using:

```python
f = Chromosome(tester)
```

I get this error:

```
Traceback (most recent call last):
  File "chrom.py", line 27, in <module>
    f = Chromosome(tester)
TypeError: __init__() takes 1 positional argument but 2 were given
```
Any help please?
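The traceback is literal: `Chromosome.__init__` accepts nothing besides `self`, yet `Chromosome(tester)` passes one argument. One hedged fix, sketched with a stand-in dataframe since the real one isn't shown in full, is to let the constructor accept the already-built genes:

```python
import random
import pandas as pd

class Gene:
    def __init__(self, id, nb_trax, nb_days):
        self.id = id
        self.nb_trax = nb_trax
        self.nb_days = nb_days

class Chromosome:
    def __init__(self, genes):
        # Accept pre-built genes instead of calling Gene() with no
        # arguments, which would itself raise a TypeError.
        self.port = list(genes)

df = pd.DataFrame({'Injection Port': ['ECGYE', 'NLRTM'],
                   'Avg Daily Injection': [10259, 8007]})
tester = df.apply(lambda row: Gene(row['Injection Port'],
                                   row['Avg Daily Injection'],
                                   random.randint(1, 10)), axis=1)
f = Chromosome(tester)
print(len(f.port))
```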

Python - DataFrame: Multiply multiple columns by another column and save in new columns
I couldn't find an efficient way of doing this. I have the below DataFrame in Python with columns from A to Z:
```
     A    B    C  ...    Z
0  2.0  8.0  1.0  ...  5.0
1  3.0  9.0  0.0  ...  4.0
2  4.0  9.0  0.0  ...  3.0
3  5.0  8.0  1.0  ...  2.0
4  6.0  8.0  0.0  ...  1.0
5  7.0  9.0  1.0  ...  0.0
```
I need to multiply each of the columns from B to Z by A, (B x A, C x A, ..., Z x A), and save the results on new columns (R1, R2 ..., R25). I would have something like this:
```
     A    B    C  ...    Z    R1   R2  ...   R25
0  2.0  8.0  1.0  ...  5.0  16.0  2.0  ...  10.0
1  3.0  9.0  0.0  ...  4.0  27.0  0.0  ...  12.0
2  4.0  9.0  0.0  ...  3.0  36.0  0.0  ...  12.0
3  5.0  8.0  1.0  ...  2.0  40.0  5.0  ...  10.0
4  6.0  8.0  0.0  ...  1.0  48.0  0.0  ...   6.0
5  7.0  9.0  1.0  ...  0.0  63.0  7.0  ...   0.0
```
I was able to calculate the results using the code below, but from there I would need to merge them back into the original df, which doesn't sound efficient. There must be a simpler/cleaner way of doing this.

```python
df.loc[:, 'B':'D'].multiply(df['A'], axis="index")
```
That's an example, my real DataFrame has 160 columns x 16k rows.
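One possibility, sketched on a small frame with only a few of the real 160 columns: multiply all non-A columns by A in one vectorized `mul`, rename the result R1..Rn, and `concat` it back, with no row-wise merge involved.

```python
import pandas as pd

df = pd.DataFrame({'A': [2.0, 3.0, 4.0],
                   'B': [8.0, 9.0, 9.0],
                   'C': [1.0, 0.0, 0.0],
                   'Z': [5.0, 4.0, 3.0]})

# Multiply every column except A by A, aligned on the row index.
others = df.columns.drop('A')
res = df[others].mul(df['A'], axis=0)
res.columns = ['R{}'.format(i + 1) for i in range(len(others))]

# concat on axis=1 attaches the new columns without a merge.
out = pd.concat([df, res], axis=1)
print(out)
```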

Converting NaN to Oracle Nulls within Python Pandas
I get this error when executing the function below, because NaN is being inserted into a numeric column. Replacing NaN with 0 throughout does work, but replacing it with None does not. Any ideas would be appreciated.
```python
def insertData(table_name, schema_name, df):
    if not df.empty:
        # input_data[file].fillna(0, inplace=True)  # This works but replacing with 0's is not ideal
        df.where(pd.notnull(df), None)  # This does not work
        values = df.to_dict(orient='records')
        table = sa.Table(table_name, conndict['meta'], autoload=True, schema=schema_name)
        result_proxy = conndict['conn'].execute(table.insert(), values)
        return result_proxy
    else:
        return None
```

The error:

```
(cx_Oracle.DatabaseError) DPI-1055: value is not a number (NaN) and cannot be used in Oracle numbers
```
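One likely cause, sketched below on a toy frame: `DataFrame.where` returns a new frame rather than modifying in place, and on float columns `None` is coerced straight back to NaN. Assigning the result back after casting to `object` keeps real `None` values for the driver:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan], 'b': ['x', None]})

# where() is not in-place: the result must be assigned to a variable.
# Casting to object first stops pandas from coercing None back to NaN
# in the float column.
clean = df.astype(object).where(df.notna(), None)
values = clean.to_dict(orient='records')
print(values)
```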