How to find the nearest distance between points in two different data frames using the haversine formula
I am trying to find, for each geocode in one data set, the nearest facility in another data set. The first data frame contains the longitude and latitude of each geocode. The second contains the longitude and latitude of toxic facilities. The two data sets have different numbers of rows. I would like to match each geocode to its nearest facility and get that distance in km. I've looked into using the haversine function, but I'm unsure what I need to do next.
So far I have the following R code:
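For reference, this is the haversine formula that the dlatlong() function in the code implements. Latitudes φ and longitudes λ are in radians, and R ≈ 6371 km is the mean Earth radius. Matching each geocode i to a facility then means evaluating d(i, j) for every facility j and keeping the smallest value:

```latex
a = \sin^2\!\left(\frac{\varphi_2 - \varphi_1}{2}\right)
  + \cos\varphi_1 \cos\varphi_2 \, \sin^2\!\left(\frac{\lambda_2 - \lambda_1}{2}\right),
\qquad
d = 2R \,\operatorname{atan2}\!\left(\sqrt{a}, \sqrt{1 - a}\right),
\qquad
j^{*}(i) = \arg\min_{j} \, d(i, j)
```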
# upload data
facility <- read.csv('~/Desktop/maxyearstoxicity.csv', header = TRUE)
census <- read.csv('~/Desktop/newNHCST.csv', header = TRUE)

# Distance calculation function (haversine, result in km)
dlatlong = function(lat1, long1, lat2, long2) {
  R = 6371                 # mean Earth radius in km
  dlon = long2 - long1
  dlat = lat2 - lat1
  dtr = pi/180             # degrees to radians
  a = (sin(dlat/2*dtr))^2 + cos(lat1*dtr) * cos(lat2*dtr) * (sin(dlon/2*dtr))^2
  c = 2 * atan2(sqrt(a), sqrt(1 - a))
  d = R * c
  return(d)
}

# merge census data with closest facility?
for (i in 1:nrow(census)) {
  # ??? not sure what goes here
}
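One way to finish the loop in the question, sketched in base R: for each census row, compute the distance to every facility at once (dlatlong() is vectorised over its arguments), then keep the closest one with which.min(). The toy data frames are hypothetical. The column names Geo_Code, NPRI.ID, Latitude and Longitude are assumptions borrowed from the answer below, so replace them with your own.

```r
# Haversine distance in km (the question's dlatlong() with minus signs restored)
dlatlong <- function(lat1, long1, lat2, long2) {
  R <- 6371                       # mean Earth radius in km
  dtr <- pi / 180                 # degrees -> radians
  dlat <- (lat2 - lat1) * dtr
  dlon <- (long2 - long1) * dtr
  a <- sin(dlat / 2)^2 + cos(lat1 * dtr) * cos(lat2 * dtr) * sin(dlon / 2)^2
  2 * R * atan2(sqrt(a), sqrt(1 - a))
}

# Hypothetical toy data; column names are assumed, not taken from the real files
census <- data.frame(Geo_Code = c("A", "B"),
                     Latitude = c(45.0, 44.0),
                     Longitude = c(-123.0, -121.0))
facility <- data.frame(NPRI.ID = c(101, 102, 103),
                       Latitude = c(45.1, 44.1, 40.0),
                       Longitude = c(-123.1, -121.0, -120.0))

nearest_idx <- integer(nrow(census))  # row index into facility
nearest_km <- numeric(nrow(census))   # distance to that facility in km
for (i in seq_len(nrow(census))) {
  # distances from census row i to every facility (one value per facility)
  d <- dlatlong(census$Latitude[i], census$Longitude[i],
                facility$Latitude, facility$Longitude)
  nearest_idx[i] <- which.min(d)
  nearest_km[i] <- d[nearest_idx[i]]
}

# attach the closest facility ID and its distance to each census row
census$NPRI.ID <- facility$NPRI.ID[nearest_idx]
census$dist_km <- nearest_km
print(census)
```

This keeps only one row of distances in memory at a time, at the cost of an explicit R loop.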
[Image: sample of the census data frame]
[Image: sample of the facility data frame]

Since you have not provided a sample of your data, I am going to use the oregon.tract data set from the UScensus2000tract library as a reproducible example. Here is a solution based on the fast data.table package that I got from this other answer.

# load libraries
library(data.table)
library(geosphere)
library(UScensus2000tract)
library(rgeos)
Now let's create a new data.table with all possible pair combinations of origins (census centroids) and destinations (facilities):

# get all combinations of origin and destination pairs
# Note that I'm assuming here that the distance from A to B equals the distance from B to A.
odmatrix <- CJ(census$Geo_Code, facility$NPRI.ID)
names(odmatrix) <- c('Geo_Code', 'NPRI.ID') # update names of columns

# add coordinates of census centroids (origin)
odmatrix[census, c('lat_orig', 'long_orig') := list(i.Latitude, i.Longitude), on = "Geo_Code"]

# add coordinates of facilities (destination)
odmatrix[facility, c('lat_dest', 'long_dest') := list(i.Latitude, i.Longitude), on = "NPRI.ID"]
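For comparison, here is a rough base-R equivalent of the pair table, assuming the same column names and hypothetical toy data. expand.grid() plays the role of CJ(), and match() does the coordinate lookups that the on = joins perform. It also makes the cost visible: the table has nrow(census) * nrow(facility) rows.

```r
# Hypothetical toy inputs; column names follow the answer (assumed)
census <- data.frame(Geo_Code = c("A", "B"),
                     Latitude = c(45.0, 44.0),
                     Longitude = c(-123.0, -121.0))
facility <- data.frame(NPRI.ID = c(101, 102, 103),
                       Latitude = c(45.1, 44.1, 40.0),
                       Longitude = c(-123.1, -121.0, -120.0))

# every origin/destination pair, one row per pair (like data.table::CJ())
od <- expand.grid(Geo_Code = census$Geo_Code, NPRI.ID = facility$NPRI.ID,
                  stringsAsFactors = FALSE)

# look up coordinates by ID (like the update joins with on = ...)
io <- match(od$Geo_Code, census$Geo_Code)
id <- match(od$NPRI.ID, facility$NPRI.ID)
od$lat_orig <- census$Latitude[io]
od$long_orig <- census$Longitude[io]
od$lat_dest <- facility$Latitude[id]
od$long_dest <- facility$Longitude[id]

print(nrow(od))  # equals nrow(census) * nrow(facility)
```

For large inputs that product can become very large, so it is worth checking the row count before computing distances on the full table.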
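Now you just need to calculate the distances and keep the nearest destination for each origin. Note that distHaversine() returns metres by default, so divide by 1000 to get km:

# calculate distances in km (distHaversine() returns metres)
odmatrix[, dist := distHaversine(matrix(c(long_orig, lat_orig), ncol = 2),
                                 matrix(c(long_dest, lat_dest), ncol = 2)) / 1000]

# and get the nearest destination for each origin
odmatrix[, .(NPRI.ID = NPRI.ID[which.min(dist)], dist = min(dist)), by = Geo_Code]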
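If you would rather not build the long pair table at all, here is a base-R sketch under the same assumptions (hypothetical toy data and a hypothetical helper hav_km() using R = 6371 km as in the question). It computes an origin-by-facility distance matrix with outer() and picks the column with the smallest value in each row, which is the same "which.min per origin" step as the grouped data.table call.

```r
# Hypothetical haversine helper in km (same formula as dlatlong() in the question)
hav_km <- function(lat1, lon1, lat2, lon2, R = 6371) {
  rad <- pi / 180
  a <- sin((lat2 - lat1) * rad / 2)^2 +
    cos(lat1 * rad) * cos(lat2 * rad) * sin((lon2 - lon1) * rad / 2)^2
  2 * R * atan2(sqrt(a), sqrt(1 - a))
}

# Hypothetical toy inputs with the column names used above (assumed)
census <- data.frame(Geo_Code = c("A", "B"),
                     Latitude = c(45.0, 44.0),
                     Longitude = c(-123.0, -121.0))
facility <- data.frame(NPRI.ID = c(101, 102, 103),
                       Latitude = c(45.1, 44.1, 40.0),
                       Longitude = c(-123.1, -121.0, -120.0))

# d[i, j] = distance in km from census row i to facility row j
d <- outer(seq_len(nrow(census)), seq_len(nrow(facility)),
           function(i, j) hav_km(census$Latitude[i], census$Longitude[i],
                                 facility$Latitude[j], facility$Longitude[j]))

# for each census row, the column index of the closest facility
nearest <- apply(d, 1, which.min)

result <- data.frame(Geo_Code = census$Geo_Code,
                     NPRI.ID = facility$NPRI.ID[nearest],
                     dist_km = d[cbind(seq_len(nrow(d)), nearest)])
print(result)
```

Because this uses R = 6371 km rather than geosphere's default radius of 6378137 m, its distances come out about 0.1% smaller than distHaversine(...) / 1000. Like the pair table, the matrix still has nrow(census) * nrow(facility) entries.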
Prepare data for this reproducible example
# load data
data("oregon.tract")

# get centroids as a data.frame
centroids <- as.data.frame(gCentroid(oregon.tract, byid = TRUE))

# convert row names into first column
setDT(centroids, keep.rownames = TRUE)[]

# get two data.frames equivalent to your census and facility data frames
census <- copy(centroids)
facility <- copy(centroids)
names(census) <- c('Geo_Code', 'Longitude', 'Latitude')
names(facility) <- c('NPRI.ID', 'Longitude', 'Latitude')