How to set multiple values of a Pandas DataFrame where a condition holds
I want to set values in multiple columns of a pandas DataFrame where a condition holds, but I get an error message:
    df[df['store_id'] == 'UK00023', ['sale','startdate','enddate']] = [100, str(datetime.now()), str(datetime.now())]

But I got this error:

    raise ValueError('Length of values does not match length of '
    ValueError: Length of values does not match length of index
Use .loc for the boolean-mask assignment (the older .ix indexer also worked but is deprecated and was removed in pandas 1.0):

    df.loc[df['store_id'] == 'UK00023', ['sale','startdate','enddate']] = [100, str(datetime.now()), str(datetime.now())]
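As a self-contained sketch of this pattern (the DataFrame contents here are hypothetical, made up just for illustration): .loc takes a (row mask, column list) pair, and a right-hand list whose length matches the number of columns is broadcast across every matching row.

```python
import pandas as pd
from datetime import datetime

# Hypothetical example data
df = pd.DataFrame({
    'store_id': ['UK00023', 'US00001'],
    'sale': [0, 0],
    'startdate': ['', ''],
    'enddate': ['', ''],
})

now = str(datetime.now())

# Row selector is a boolean mask, column selector is a list;
# the 3-element list on the right is broadcast across all matching rows.
df.loc[df['store_id'] == 'UK00023', ['sale', 'startdate', 'enddate']] = [100, now, now]
```

After this, only the UK00023 row carries the new sale value and dates; the other row is untouched.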
See also questions close to this topic
Draw line between two given points (OpenCV, Python)
I have been struggling with this problem for an hour now...
I have an image with a rectangle inside:
This is the code I wrote to find the points for the corners:
    import cv2
    import numpy as np

    img = cv2.imread('rect.png')
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    gray = np.float32(gray)

    points = cv2.goodFeaturesToTrack(gray, 100, 0.01, 10)
    points = np.int0(points)

    for point in points:
        x, y = point.ravel()
        cv2.circle(img, (x, y), 3, (0, 255, 0), -1)

    print(points)
    cv2.imshow('img', img)
    cv2.waitKey(0)
    cv2.imwrite('rect.png', img)
This is the result:
As you can see, it works perfectly. What I want is to draw one line along the two upper points and one along the two lower points.
What I have produced so far is this...
    cv2.line(img, (points), (points), (0, 255, 0), thickness=3, lineType=8)
    cv2.imshow('img', img)
    cv2.waitKey(0)
But it doesn't work.
Any idea?
The result should be like this:
The two lines must pass along the coordinates of the points.
print(points) above gives, for example, the following output:

    [[561 168]]
    [[155 168]]
    [[561  53]]
    [[155  53]]
Performance difference between python libraries networkx, networkit and c++ boost
I am currently using networkx, which I guess is the slowest of the three. I am considering rewriting some parts to use networkit, or rewriting the whole thing in C++ to use the Boost Graph Library. Does anybody have experience or even benchmarks comparing the three?
Accessing a remote Oracle database via Squish (Python)
I need to access a remote Oracle database during an automation process I am running with Squish (Python). A standalone Python script would use cx_Oracle to access an Oracle database.
Comparing a column's value with an array (or a series) of decreasing size
I have the following dataframe. (This isn't necessarily a dataframe problem; a solution on the numpy array df.values would also be sufficient.)

    import numpy as np
    import pandas as pd

    np.random.seed(42)
    df = pd.DataFrame(np.random.random((10, 2)), columns=['a', 'b'])

    df
              a         b
    0  0.374540  0.950714
    1  0.731994  0.598658
    2  0.156019  0.155995
    3  0.058084  0.866176
    4  0.601115  0.708073
    5  0.020584  0.969910
    6  0.832443  0.212339
    7  0.181825  0.183405
    8  0.304242  0.524756
    9  0.431945  0.291229
I want to include a new column whose value follows the logic below:

- True: if any of the b values after that particular a value is greater than the a value
- False: otherwise
The expected output would be: (See the explanation for some of the rows below)
           a         b          c
    0  0.374540  0.950714       True
    1  0.731994  0.598658       True
    2  0.156019  0.155995       True
    3  0.058084  0.866176       True   <- np.any(0.058084 < np.array([0.708073, 0.969910, 0.212339, 0.183405, 0.524756, 0.291229]))
    4  0.601115  0.708073       True   <- np.any(0.601115 < np.array([0.969910, 0.212339, 0.183405, 0.524756, 0.291229]))
    5  0.020584  0.969910       True   <- np.any(0.020584 < np.array([0.212339, 0.183405, 0.524756, 0.291229]))
    6  0.832443  0.212339      False   <- np.any(0.832443 < np.array([0.183405, 0.524756, 0.291229]))
    7  0.181825  0.183405       True   <- np.any(0.181825 < np.array([0.524756, 0.291229]))
    8  0.304242  0.524756      False   <- np.any(0.304242 < np.array([0.291229]))
    9  0.431945  0.291229  UNDEFINED   <- Ignore this
The above should be possible with a for loop but what is the pandas/numpy way to do that?
I was trying an approach where I apply a lambda function to a, but I couldn't find a way to get the index of the respective a value to do the np.any comparison shown above. (I have since discovered that apply is just syntactic sugar for a for loop, though.)

    df['c'] = df['a'].apply(lambda x: np.any(x < df['b'].values[<i>:]))
    # Where <i> is the respective index value of x; which I didn't know how to find
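One vectorized approach (my own sketch, not taken from the question) is to precompute, for each row, the maximum of b over all later rows using a reversed cummax, and then compare a against that running maximum; a[i] < max(b[i+1:]) is equivalent to np.any(a[i] < b[i+1:]).

```python
import numpy as np
import pandas as pd

np.random.seed(42)
df = pd.DataFrame(np.random.random((10, 2)), columns=['a', 'b'])

# Running maximum of b from each position to the end of the frame
rev_cummax = df['b'][::-1].cummax()[::-1]

# shift(-1) so row i sees max(b[i+1:]); comparing a[i] against that
# maximum is the same as np.any(a[i] < b[i+1:])
df['c'] = df['a'] < rev_cummax.shift(-1)
```

The last row compares against NaN and therefore comes out False, which is fine since it is undefined anyway.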
Pandas DataFrame to Hive Query for Insert
My question is whether there is a way to generate a HiveQL INSERT statement with X columns and Y rows from a pandas DataFrame object, something like:

    query = "INSERT INTO %s SELECT %s, %s, %s, %s, %s, %s, %s from " % (
        table_name, column_names, column_names, column_names,
        column_names, column_names, column_names, column_names)
I have the following dataframe as an example:
      metric predict_date         value      y_date      x_date  ... (many columns)
    0  sales   2017-10-01  7.539010e+06  2016-06-01  2017-09-01
    1  sales   2017-11-01  8.364379e+06  2016-07-01  2017-09-01
    2  sales   2017-12-01  9.533355e+06  2016-08-01  2017-09-01
    ... (many rows)
The use case here is just getting that query constructed.
- Pandas' to_sql() actually writes to the SQL database, which I do not want for my use case.
- I believe Spark is an option, but is there a quick-and-easy way to generate this? Setting up a Spark cluster takes time. :-)
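A quick-and-dirty way to construct the query text (my own sketch: the helper name is made up, the string quoting is naive and not safe against SQL injection, and it assumes Hive 0.14+ where INSERT INTO ... VALUES is supported):

```python
import pandas as pd

# Small stand-in frame mirroring the example above
df = pd.DataFrame({
    "metric": ["sales", "sales"],
    "predict_date": ["2017-10-01", "2017-11-01"],
    "value": [7539010.0, 8364379.0],
})

def df_to_hive_insert(df, table_name):
    """Build a single multi-row HiveQL INSERT from a DataFrame.

    Naively quotes strings; numeric values are emitted as-is.
    """
    def fmt(v):
        return "'%s'" % v if isinstance(v, str) else str(v)
    rows = ", ".join(
        "(%s)" % ", ".join(fmt(v) for v in row)
        for row in df.itertuples(index=False, name=None)
    )
    return "INSERT INTO %s (%s) VALUES %s" % (
        table_name, ", ".join(df.columns), rows)

query = df_to_hive_insert(df, "my_table")
```

For anything beyond ad-hoc use, a parameterized execution through a Hive client (e.g. pyhive) would be the safer route.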
Bokeh vbar x axis orientation issue
I have just created a vbar plot using code similar to the example found here: https://bokeh.pydata.org/en/latest/docs/user_guide/categorical.html
Using the below code:
    import math
    from bokeh.io import show
    from bokeh.models import ColumnDataSource, FactorRange
    from bokeh.transform import factor_cmap
    from bokeh.plotting import figure
    from bokeh.palettes import viridis

    # data is a DataFrame containing categorical columns 'A' and 'B'
    # and numerical column 'C'
    group = data.groupby(('A', 'B'))['C'].mean().reset_index()
    x = list(zip(group.A, group.B))  # list, so it can be iterated more than once
    y_variable = 'C'
    y = group[y_variable]
    source = ColumnDataSource(data=dict(x=x, y=y))

    # convert dataframe to dict
    data_dict = group.to_dict(orient='list')
    data_index = group['B'].tolist()

    # colors
    index_cmap = factor_cmap('x', palette=viridis(20),
                             factors=sorted(group.A.unique()), end=1)

    # get max possible value of plotted columns with some offset
    p = figure(x_range=FactorRange(*x),
               y_range=(0, group[[y_variable]].values.max() + 100),
               plot_height=600, plot_width=1500,
               title="Something: %s" % y_variable,
               toolbar_location=None, tools="")
    p.vbar(x='x', top='y', width=0.9, source=source,
           line_color="white", fill_color=index_cmap)

    p.y_range.start = 0
    p.x_range.range_padding = 0.1
    p.xaxis.major_label_orientation = math.pi/2
    p.xgrid.grid_line_color = None

    show(p)
I get the result as
Now as you can see, the "inner" x-axis labels have a 90 degree orientation, as specified with p.xaxis.major_label_orientation = math.pi/2. The "outer" x-axis labels, however, are horizontal and get mixed up.
I can't figure out how to configure this one accordingly...
Any thoughts on how to go about this?
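Bokeh's categorical axis exposes a separate property for the outer group labels of a nested FactorRange; to the best of my knowledge it is group_label_orientation, which accepts the same values as major_label_orientation. A minimal sketch with hypothetical factors:

```python
import math
from bokeh.models import FactorRange
from bokeh.plotting import figure

# Hypothetical two-level categorical factors, like the (A, B) pairs above
factors = [("A1", "x"), ("A1", "y"), ("A2", "x"), ("A2", "y")]
p = figure(x_range=FactorRange(*factors))

p.xaxis.major_label_orientation = math.pi / 2  # inner labels
p.xaxis.group_label_orientation = math.pi / 2  # outer (group) labels
```

With both set, the inner and outer tick labels are rendered vertically.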