How to set multiple values of Pandas dataframe where condition
I want to set the values of several pandas DataFrame columns in the rows where a condition holds, but I get an error:
df[df['store_id'] == 'UK00023', ['sale','startdate','enddate']] = [100, str(datetime.now()), str(datetime.now())]
But I get this error:
raise ValueError('Length of values does not match length of '
ValueError: Length of values does not match length of index
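Using .loc, which takes a boolean row mask and a list of column labels, works (older answers use .ix, which has since been removed from pandas):
df.loc[df['store_id'] == 'UK00023', ['sale','startdate','enddate']] = [100, str(datetime.now()), str(datetime.now())]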
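A self-contained sketch of that .loc assignment. The DataFrame contents are hypothetical; only the column names and the 'UK00023' store id come from the question. It calls str(datetime.now()) once, so startdate and enddate receive the same timestamp (two separate calls can differ by a few microseconds).

```python
from datetime import datetime

import pandas as pd

# Hypothetical sample data: only the column names and 'UK00023' come from the question.
df = pd.DataFrame({
    "store_id": ["UK00023", "UK00042", "UK00023"],
    "sale": [10, 20, 30],
    "startdate": [None, None, None],
    "enddate": [None, None, None],
})

now = str(datetime.now())  # take one timestamp so both date columns match
mask = df["store_id"] == "UK00023"

# .loc[row_mask, column_list] selects a rows-by-columns block; the 3-element
# list supplies one value per column, repeated for every selected row.
df.loc[mask, ["sale", "startdate", "enddate"]] = [100, now, now]
print(df)
```

Rows that do not match the mask keep their original values.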
See also questions close to this topic
Spark to MongoDB via Kerberos
We have a MongoDB database that is Kerberized. The non-Spark connection works fine: you specify the URI, create a MongoClient, and authenticate yourself in the $external database and for each database. How does this work from Spark when the database is behind Kerberos? I couldn't find any documentation.
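MongoDB's Kerberos support uses the GSSAPI authentication mechanism with $external as the authentication source, and both can be expressed in the connection string itself, which is what the Spark connector is configured with. The sketch below only builds such a URI; the helper name build_gssapi_uri, the principal and the hosts are hypothetical.

```python
from urllib.parse import quote


def build_gssapi_uri(principal, hosts, namespace=""):
    """Build a MongoDB connection string that authenticates with Kerberos (GSSAPI).

    principal -- Kerberos principal such as "user@EXAMPLE.COM"; its "@" must be
                 percent-encoded so it is not mistaken for the host separator
    hosts     -- list of "host:port" strings
    namespace -- optional "database.collection" path (hypothetical example values)
    """
    user = quote(principal, safe="")
    return "mongodb://{}@{}/{}?authMechanism=GSSAPI&authSource=$external".format(
        user, ",".join(hosts), namespace)


print(build_gssapi_uri("user@EXAMPLE.COM", ["db1.example.com:27017"], "mydb.mycoll"))
```

With the 2.x/3.x connector this string would typically be passed as spark.mongodb.input.uri (newer connector versions use different option names, so check the docs for your version). This assumes every executor can obtain a valid Kerberos ticket, for example through a keytab distributed with the job; that part of the setup is outside the URI.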
Python || Shape-Based Image Comparison Between External Image Input & Image Library?
As a brand-new beginner to both Stack Overflow and Python, I hope I have done at least a satisfactory job of researching and writing this question.
As described in the title, I am trying to create a program written in Python (specifically Python 3.5) that is capable of:
- Taking in an image (png format with transparency) and comparing it to a library of other images of the same type (stored locally in a folder)
- Displaying the top 3 most similar images to the input alongside the original
- Waiting for the user's approval (yes if no other images are duplicates or too similar, no if some images are too similar)
- Adding the input to the image library if it is approved by the user.
What complicates the issue is that this program is being designed to compare emotes, all of which are the same colour scheme. This means (at least to me) that comparison by colour would be ineffective, as every image would return as very similar and cloud up results.
Hours of searching have yielded nothing I can confidently work with. During my search I read about the possible usefulness of OpenCV, and about how to write a Python program that compares images using a 3D RGB histogram. I also searched extensively for Python packages and ready-made image comparison libraries that would fit my requirements, but found nothing I could confidently use.
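Because every emote shares the same colours but has a transparent background, one option (a swapped-in technique, not something from the question) is to compare only the silhouettes given by the alpha channel, scoring pairs by intersection-over-union. This sketch assumes all images have already been loaded as equally sized RGBA NumPy arrays, for instance with Pillow via np.asarray(Image.open(path).convert("RGBA")); the function names are hypothetical.

```python
import numpy as np


def alpha_mask(rgba, threshold=0):
    """Binary silhouette: True wherever the pixel is not fully transparent."""
    return rgba[..., 3] > threshold


def silhouette_iou(a, b):
    """Intersection-over-union of two equally sized boolean masks (1.0 = identical shape)."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return 1.0 if union == 0 else inter / union


def top_matches(query_rgba, library, k=3):
    """Return the k (name, score) pairs from `library` whose silhouettes best match the query.

    library -- dict mapping an image name to its RGBA array (same size as the query)
    """
    query = alpha_mask(query_rgba)
    scores = [(name, silhouette_iou(query, alpha_mask(img)))
              for name, img in library.items()]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]
```

Images of different sizes would need resizing to a common size first. For a comparison that is more tolerant of scaling and small shifts, OpenCV's cv2.matchShapes on the outer contours is another option worth trying.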
I am not sure if this affects the problem in any way, but after this program is developed as a standalone tool, it must be somewhat redesigned to become part of a Discord bot as well (using the Python API for Discord).
I apologise in advance for the countless novice blunders I have most likely made in creating/structuring this question. Regardless, looking forward to some experienced guidance.
joblib Parallel always gets stuck at the end
My Parallel call:
sdf = Parallel(n_jobs=3, verbose=1, pre_dispatch='1.5*n_jobs')(delayed(char_etl)(x,k1,k2,k3) for x in X)
X is a list, and char_etl is a string-matching function. Versions: Python 2.7.11, CentOS 6.7, joblib 0.11. Every time it gets stuck, like this:
[Parallel(n_jobs=3)]: Done 89 tasks | elapsed: 10.9s
[Parallel(n_jobs=3)]: Done 258 tasks | elapsed: 46.8s
[Parallel(n_jobs=3)]: Done 552 tasks | elapsed: 1.7min
[Parallel(n_jobs=3)]: Done 902 tasks | elapsed: 2.7min
[Parallel(n_jobs=3)]: Done 997 out of 1000 | elapsed: 2.9min remaining: 0.5s
Thanks.
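The log stalls on the last 3 of 1000 tasks, which is consistent with (though not proof of) a few inputs taking far longer than the rest, or never finishing, inside char_etl itself. A hypothetical diagnostic is to time char_etl on each input sequentially, outside joblib, and look at the slowest items. It uses time.time() so it also runs on Python 2.7; the helper name time_each is made up for this sketch.

```python
import time


def time_each(func, items, *args):
    """Call func(item, *args) for each item in order.

    Returns a list of (seconds, index) pairs sorted slowest first, so the
    inputs that dominate the run time (or hang) are easy to spot.
    """
    timings = []
    for i, item in enumerate(items):
        start = time.time()
        func(item, *args)
        timings.append((time.time() - start, i))
    timings.sort(reverse=True)
    return timings
```

Running it on a slice first, for example time_each(char_etl, X[:100], k1, k2, k3)[:5] with the arguments from the question, keeps the sequential run short. If one index is dramatically slower, that input can be inspected directly.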
What causes the wrong count for the groupby and transform.count() operation in python pandas
I am grouping and counting on my dataframe.
This is what I get from the .describe() method:
                              count  mean  std  min  25%  50%  75%  max
invoice_number barcode
OFF1540673     4054673005837    5.0   4.0  0.0  4.0  4.0  4.0  4.0  4.0
               4054673034394    5.0   4.0  0.0  4.0  4.0  4.0  4.0  4.0
               4054673238488    5.0   4.0  0.0  4.0  4.0  4.0  4.0  4.0
               4054673238822    5.0   4.0  0.0  4.0  4.0  4.0  4.0  4.0
The count is 5, while all the other metrics are 4. In fact, there are only 4 barcodes in this group, so the count should be 4. How can the count be 5?
        invoice_number  barcode
327378  OFF1540673      4054673238488
327379  OFF1540673      4054673034394
327380  OFF1540673      4054673238822
327381  OFF1540673      4054673005837
327382  OFF1540673      4054673238488
327383  OFF1540673      4054673034394
327384  OFF1540673      4054673238822
327385  OFF1540673      4054673005837
327386  OFF1540673      4054673238488
327387  OFF1540673      4054673034394
327388  OFF1540673      4054673238822
327389  OFF1540673      4054673005837
327390  OFF1540673      4054673238488
327391  OFF1540673      4054673034394
327392  OFF1540673      4054673238822
327393  OFF1540673      4054673005837
327394  OFF1540673      4054673238488
327395  OFF1540673      4054673034394
327396  OFF1540673      4054673238822
327397  OFF1540673      4054673005837
The dtype of both columns is object.
This is the command to group...
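In describe() output, "count" is the number of non-null rows in each group, not a value taken from the column. Each barcode occurs in 5 of the 20 rows shown, so every group reports count 5, while mean/min/max describe the stored value 4. The grouping command itself is not shown, so this sketch assumes the 4 is a per-invoice distinct-barcode count stored in a hypothetical column n_barcodes; the rows are the ones from the question.

```python
import pandas as pd

# The 20 rows shown in the question: one invoice, four barcodes, each repeated 5 times.
barcodes = ["4054673238488", "4054673034394", "4054673238822", "4054673005837"]
df = pd.DataFrame(
    {"invoice_number": ["OFF1540673"] * 20, "barcode": barcodes * 5},
    index=range(327378, 327398),
)

# Hypothetical column: number of distinct barcodes per invoice, broadcast to every row.
df["n_barcodes"] = df.groupby("invoice_number")["barcode"].transform("nunique")

# describe() per (invoice, barcode): "count" = rows in the group, mean/min/max = the value 4.
desc = df.groupby(["invoice_number", "barcode"])["n_barcodes"].describe()
print(desc)

# The two different quantities, computed directly.
rows_per_barcode = df.groupby(["invoice_number", "barcode"]).size()
distinct_barcodes = df.groupby("invoice_number")["barcode"].nunique()
```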
Pandas - Round date to 30 minutes
I have constructed this dataframe:
import pandas as pd
from io import StringIO  # pandas.compat.StringIO no longer exists in recent pandas

temp = '''A,B
A,23:59:32.897000
B,17:36:09.182000
C,21:56:57.325000
D,06:16:24.482000'''
df = pd.read_csv(StringIO(temp))
df['B'] = pd.to_datetime(df['B']).dt.time
Is it possible to round the times down to 30-minute intervals, so that the output becomes:
A,B
A,23:30:00.000000
B,17:30:00.000000
C,21:30:00.000000
D,06:00:00.000000
Any help is appreciated.
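One way to do this is to floor the values while they are still datetime64, and only convert to time objects afterwards, because a column of datetime.time objects has no .dt accessor. This sketch uses the question's data; the explicit format string is an addition that makes parsing unambiguous (the missing date defaults to 1900-01-01, which is discarded by .dt.time).

```python
import pandas as pd
from io import StringIO

temp = '''A,B
A,23:59:32.897000
B,17:36:09.182000
C,21:56:57.325000
D,06:16:24.482000'''
df = pd.read_csv(StringIO(temp))

# Parse, round each timestamp down to the start of its 30-minute slot, then keep the time part.
df['B'] = pd.to_datetime(df['B'], format='%H:%M:%S.%f').dt.floor('30min').dt.time
print(df)
```

To print the times with the trailing .000000 shown in the desired output, format them with t.strftime('%H:%M:%S.%f').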
How to select a n value in a column in python
I have a pandas DataFrame that looks like this:
timestamp            S
2017-04-17 00:00:05  4300
2017-04-17 00:00:10  4297
2017-04-17 00:00:15  4321
2017-04-17 00:00:25  4335
...
2017-04-17 23:59:55  4287
If each value of df['S'] is read as four digits abcd, I want to do the following calculation for every row:
df['x'] = (df['S'][bcd]/1000)*(10**df['S'][a])
so that I get:
timestamp            S     x
2017-04-17 00:00:05  4300  3000
2017-04-17 00:00:10  4297  2970
2017-04-17 00:00:15  4321  3210
2017-04-17 00:00:25  4335  3350
...
2017-04-17 23:59:55  4287  2870
How can I do that?
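Assuming every value in S is a non-negative four-digit integer, the leading digit a is S // 1000 and the remaining digits bcd are S % 1000, so the formula can be applied to the whole column at once without a loop. Multiplying before dividing keeps results such as 3000 exact in floating point. The rows are the ones shown in the question (the elided rows are left out).

```python
import pandas as pd

# Rows from the question; the "..." rows are omitted.
df = pd.DataFrame(
    {"S": [4300, 4297, 4321, 4335, 4287]},
    index=pd.to_datetime([
        "2017-04-17 00:00:05", "2017-04-17 00:00:10", "2017-04-17 00:00:15",
        "2017-04-17 00:00:25", "2017-04-17 23:59:55",
    ]),
)
df.index.name = "timestamp"

a = df["S"] // 1000    # leading digit "a"
bcd = df["S"] % 1000   # last three digits "bcd"

# (bcd / 1000) * 10**a, computed as bcd * 10**a / 1000 to avoid rounding error.
df["x"] = bcd * 10 ** a / 1000
print(df)
```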