How to find the nearest distance between two different data frames using haversine

I am trying to find the nearest facility to each geocode in another data set. The first data frame contains the longitude and latitude of each geocode. The second contains the longitude and latitude of toxic facilities. I want to match each geocode to its nearest facility. The two data sets have different sizes. I would like the distance in km. I've looked into using the Haversine formula, but I'm unsure what to do next.

So far I have the following R code:

# load data
facility <- read.csv('~/Desktop/maxyearstoxicity.csv', header = TRUE)
census <- read.csv('~/Desktop/newNHCST.csv', header = TRUE)

# distance calculation function (Haversine; returns km because R is in km)
dlatlong <- function(lat1, long1, lat2, long2) {
  R <- 6371        # Earth radius in km
  dtr <- pi / 180  # degrees to radians
  dlon <- long2 - long1
  dlat <- lat2 - lat1
  a <- (sin(dlat / 2 * dtr))^2 + cos(lat1 * dtr) * cos(lat2 * dtr) * (sin(dlon / 2 * dtr))^2
  c <- 2 * atan2(sqrt(a), sqrt(1 - a))
  d <- R * c
  return(d)
}

# merge census data with closest facility?
for (i in 1:nrow(census))

census: census data

facility: facility data

1 answer

  • answered 2017-06-17 19:37 rafa.pereira

    Since you have not provided a sample of your data, I am going to use the oregon.tract data set from the UScensus2000tract library as a reproducible example.

    Here is a solution based on the fast data.table package, adapted from another answer.

    # load libraries
      library(data.table)
      library(geosphere)
      library(UScensus2000tract)
      library(rgeos)
    

    Now let's create a new data.table with all possible pair combinations of origins (census centroids) and destinations (facilities):

    # get all combinations of origin and destination pairs
    # Note: this assumes the distance from A -> B equals the distance from B -> A.
    # Naming the CJ() arguments sets the column names directly.
      odmatrix <- CJ(Geo_Code = census$Geo_Code, NPRI.ID = facility$NPRI.ID)
    
    # add coordinates of census centroids (origin)
      odmatrix[census, c('lat_orig', 'long_orig') := list(i.Latitude, i.Longitude), on= "Geo_Code" ]
    
    # add coordinates of facilities (destination)
      odmatrix[facility, c('lat_dest', 'long_dest') := list(i.Latitude, i.Longitude), on= "NPRI.ID" ]
    

    Now you just need to:

    # calculate distances (distHaversine returns metres by default; divide by 1000 for km)
      odmatrix[ , dist := distHaversine(matrix(c(long_orig, lat_orig), ncol = 2), 
                                        matrix(c(long_dest, lat_dest), ncol = 2))]
    
    # and get the nearest destination for each origin
      odmatrix[, .(NPRI.ID = NPRI.ID[which.min(dist)],
                   dist    = min(dist)),
               by = Geo_Code]
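
The question asks for distances in km, while distHaversine() returns metres with its default Earth radius. A minimal end-to-end sketch on toy data, assuming the same column names as above (the coordinates and the names dist_km, nearest and census_nearest are made up for illustration), converts the units and joins the nearest facility back onto the census table:

```r
library(data.table)
library(geosphere)

# Hypothetical toy data standing in for the real census and facility tables
census <- data.table(Geo_Code  = c("A", "B"),
                     Longitude = c(-123.10, -122.60),
                     Latitude  = c(44.05, 45.52))
facility <- data.table(NPRI.ID   = c(1L, 2L, 3L),
                       Longitude = c(-123.00, -122.70, -121.30),
                       Latitude  = c(44.00, 45.50, 44.06))

# all origin-destination pairs, then attach the coordinates by update join
odmatrix <- CJ(Geo_Code = census$Geo_Code, NPRI.ID = facility$NPRI.ID)
odmatrix[census, c("lat_orig", "long_orig") := list(i.Latitude, i.Longitude),
         on = "Geo_Code"]
odmatrix[facility, c("lat_dest", "long_dest") := list(i.Latitude, i.Longitude),
         on = "NPRI.ID"]

# distHaversine() expects (longitude, latitude) columns and returns metres
# with its default radius, so divide by 1000 to get km
odmatrix[, dist_km := distHaversine(cbind(long_orig, lat_orig),
                                    cbind(long_dest, lat_dest)) / 1000]

# keep the closest facility per geocode
nearest <- odmatrix[, .(NPRI.ID = NPRI.ID[which.min(dist_km)],
                        dist_km = min(dist_km)),
                    by = Geo_Code]

# join the result back onto the census table
census_nearest <- merge(census, nearest, by = "Geo_Code")
```

Note that the cross join holds nrow(census) * nrow(facility) rows, so for very large inputs it may be worth processing the census table in chunks.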
    

    Data preparation for this reproducible example (run this before the steps above):

    # load data
      data("oregon.tract")
    
    # get centroids as a data.frame
      centroids <- as.data.frame(gCentroid(oregon.tract,byid=TRUE))
    
    # Convert row names into first column
      setDT(centroids, keep.rownames = TRUE)[]
    
    # get two data.frames equivalent to your census and facility data frames
      census <- copy(centroids)
      facility <- copy(centroids)
    
      names(census) <- c('Geo_Code', 'Longitude', 'Latitude')
      names(facility) <- c('NPRI.ID', 'Longitude', 'Latitude')
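
If the data sets are small enough for a dense distance matrix to fit in memory, the same matching can also be sketched in base R with the Haversine function from the question, which already returns km because it uses R = 6371. The toy coordinates and the columns nearest_facility and dist_km below are made up for illustration; this is a minimal sketch under those assumptions, not a tuned implementation.

```r
# Haversine distance in km (same formula as in the question; R = 6371 km)
dlatlong <- function(lat1, long1, lat2, long2) {
  R   <- 6371
  dtr <- pi / 180                      # degrees to radians
  dlat <- (lat2 - lat1) * dtr
  dlon <- (long2 - long1) * dtr
  a <- sin(dlat / 2)^2 + cos(lat1 * dtr) * cos(lat2 * dtr) * sin(dlon / 2)^2
  R * 2 * atan2(sqrt(a), sqrt(1 - a))
}

# Hypothetical toy data standing in for the real census and facility tables
census <- data.frame(Geo_Code  = c("A", "B"),
                     Latitude  = c(44.05, 45.52),
                     Longitude = c(-123.10, -122.60))
facility <- data.frame(NPRI.ID   = c(1L, 2L, 3L),
                       Latitude  = c(44.00, 45.50, 44.06),
                       Longitude = c(-123.00, -122.70, -121.30))

# distance matrix in km: one row per geocode, one column per facility
d <- outer(seq_len(nrow(census)), seq_len(nrow(facility)),
           function(i, j) dlatlong(census$Latitude[i], census$Longitude[i],
                                   facility$Latitude[j], facility$Longitude[j]))

# column index of the closest facility for each geocode
nearest_idx <- apply(d, 1, which.min)

census$nearest_facility <- facility$NPRI.ID[nearest_idx]
census$dist_km <- d[cbind(seq_len(nrow(census)), nearest_idx)]
```

Because dlatlong() is vectorised, outer() evaluates all pairs in one call, which replaces the explicit for loop over census rows.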