Comparing two data sources with pandas and Python

I have a small script that loads the same CSV file twice.

It then iterates through one copy and compares each row against all the entries of the other. Since both are read from the exact same source, I should get a 100% match ratio, but I don't!

Any ideas as to why this might be?

import pandas as pd

_new = pd.read_csv('02 dump/reputation.csv', sep=';', float_precision='round_trip')
_data = pd.read_csv('00 data/reputation.csv', sep=';', float_precision='round_trip')


def confupdate():
    print("MATCHED")

def confnew():
    print("NOT MATCHED")



for a,b in zip(_new['LAT'].values, _new['LON'].values): 
    print(a, b)

    if a in _data['LAT'].values and b in _data.columns.values:

        confupdate()

    if a not in _data['LAT'].values or b not in _data.columns.values:

        confnew()

1 answer

  • answered 2018-03-20 16:14 cwallenpoole

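    The columns property of a DataFrame is effectively a list of column names. Your code checks whether each longitude value appears in _data.columns.values, which holds the column names ('LAT', 'LON', ...), instead of in the contents of the frame itself, so that half of the condition is never true. Compare against _data['LON'].values, the same way you already do for the latitude.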