# How to find the nearest distance between two different data frames using haversine

I am trying to find the nearest facility to each geocode in another data set. The first data frame contains the longitude and latitude of each geocode. The second contains the longitude and latitude of toxic facilities. I want to match each geocode to its nearest facility and get the distance in km. The two data sets are different sizes. I've looked into using the haversine formula, but I'm unsure what to do after defining the function.

So far I have the following R coding:

``````
#upload data
#Distance calculation function
dlatlong = function(lat1, long1, lat2, long2) {
  R = 6371
  dlon = long2 - long1
  dlat = lat2 - lat1
  dtr = pi/180
  a = (sin(dlat/2*dtr))^2 + cos(lat1*dtr) * cos(lat2*dtr) * (sin(dlon/2*dtr))^2
  c = 2 * atan2( sqrt(a), sqrt(1-a) )
  d = R * c
  return(d)
}
#merge Census data with closest facility?
for (i in 1:nrow(Census))
``````

Since you have not provided a sample of your data, I will use the `oregon.tract` data set from the `UScensus2000tract` library as a reproducible example. The code that prepares it is at the end of this answer and needs to be run first.

Here is a fast `data.table`-based solution, adapted from another answer.

``````
# load libraries
library(data.table)
library(geosphere)
library(UScensus2000tract)
library(rgeos)
``````

Now let's create a new `data.table` holding every combination of origin (census centroid) and destination (facility). Note that this table has `nrow(census) * nrow(facility)` rows.

``````
# get all combinations of origin and destination pairs
# Note: this assumes the distance from A -> B equals the distance from B -> A.
odmatrix <- CJ(census$Geo_Code, facility$NPRI.ID)
names(odmatrix) <- c('Geo_Code', 'NPRI.ID') # update names of columns

# add coordinates of census centroids (origin)
odmatrix[census, c('lat_orig', 'long_orig') := list(i.Latitude, i.Longitude), on= "Geo_Code" ]

# add coordinates of facilities (destination)
odmatrix[facility, c('lat_dest', 'long_dest') := list(i.Latitude, i.Longitude), on= "NPRI.ID" ]
``````

Now you just need to:

``````
# calculate distances (distHaversine returns metres, so divide by 1000 to get km)
odmatrix[ , dist := distHaversine(matrix(c(long_orig, lat_orig), ncol = 2),
                                  matrix(c(long_dest, lat_dest), ncol = 2)) / 1000]

# and get the nearest destination for each origin
odmatrix[, .(NPRI.ID = NPRI.ID[which.min(dist)],
             dist = min(dist)),
         by = Geo_Code]
``````
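If you would rather finish the `for` loop from the question and avoid building the full table of pairs, here is a minimal base R sketch. It reuses the `dlatlong` formula from the question, so distances are already in km. The toy `census` and `facility` data frames are hypothetical; only the column names (`Geo_Code`, `NPRI.ID`, `Latitude`, `Longitude`) follow the ones assumed in this answer.

```r
# Haversine distance in km; same formula as dlatlong() in the question
dlatlong <- function(lat1, long1, lat2, long2) {
  R   <- 6371        # mean Earth radius in km
  dtr <- pi / 180    # degrees to radians
  dlat <- (lat2 - lat1) * dtr
  dlon <- (long2 - long1) * dtr
  a <- sin(dlat / 2)^2 + cos(lat1 * dtr) * cos(lat2 * dtr) * sin(dlon / 2)^2
  2 * R * atan2(sqrt(a), sqrt(1 - a))
}

# Hypothetical toy data, made up for illustration only
census <- data.frame(Geo_Code  = c("A", "B"),
                     Latitude  = c(45.52, 44.05),
                     Longitude = c(-122.68, -123.09))
facility <- data.frame(NPRI.ID   = c(1, 2, 3),
                       Latitude  = c(44.94, 45.50, 42.33),
                       Longitude = c(-123.03, -122.60, -122.87))

# For each census row, compute the distance to every facility at once
# (dlatlong is vectorised over its facility arguments) and keep the closest
nearest <- integer(nrow(census))
dist_km <- numeric(nrow(census))
for (i in seq_len(nrow(census))) {
  d <- dlatlong(census$Latitude[i], census$Longitude[i],
                facility$Latitude, facility$Longitude)
  nearest[i] <- which.min(d)
  dist_km[i] <- d[nearest[i]]
}

result <- data.frame(Geo_Code = census$Geo_Code,
                     NPRI.ID  = facility$NPRI.ID[nearest],
                     dist_km  = dist_km)
print(result)
```

This only ever holds one vector of `nrow(facility)` distances in memory, at the cost of an R-level loop over the census rows, so it will likely be slower than the `data.table` version on large inputs.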

### Prepare data for this reproducible example

``````
# load data
data("oregon.tract")

# get centroids as a data.frame
centroids <- as.data.frame(gCentroid(oregon.tract,byid=TRUE))

# Convert row names into first column
setDT(centroids, keep.rownames = TRUE)[]

# get two data.frames equivalent to your census and facility data frames
census <- copy(centroids)
facility <- copy(centroids)

names(census) <- c('Geo_Code', 'Longitude', 'Latitude')
names(facility) <- c('NPRI.ID', 'Longitude', 'Latitude')
``````