Ruby - Find all records with the same attributes and group them

How can I find all records with the same attributes and cluster them together?

Example: a table of animal species and their types. I want to cluster records of the same species only if they are also the same type (or share some other attribute, like color). If the required attributes do not all match, the records should not be clustered together.

The best way to show what I want to do is with the following example:

ID  Species  Type 

0    Bear    Grizzly
1    Wolf    Gray
2    Bear    Grizzly
3    Bear    Polar
4    Wolf    Artic
5    Wolf    Artic
6    Wolf    Gray 

Needed result:

ID  Species  Type      Cluster_id  

0    Bear    Grizzly    1
1    Bear    Grizzly    1
2    Bear    Polar      2
3    Wolf    Artic      3
4    Wolf    Artic      3
5    Wolf    Gray       4
6    Wolf    Gray       4  

Any ideas on how I can write this in Ruby?

Thank you for your time.

2 answers

  • answered 2018-03-11 12:38 assefamaru

    What you are looking for is group_by, which groups all records based on a set of attributes. For example, assuming you have an Animal model with species and type attributes, you can group all records based on similarity in species AND type as follows:

    Animal.all.group_by { |x| [x.species, x.type] }

    Or you can first pluck only the two attributes needed and group the resulting array:

    Animal.pluck(:species, :type).group_by { |x| [x[0], x[1]] }

    The result will be a hash of the form:

        { ["Bear", "Grizzly"] => [["Bear", "Grizzly"], ["Bear", "Grizzly"]],
          ["Wolf", "Gray"]    => [["Wolf", "Gray"], ["Wolf", "Gray"]],
          ["Bear", "Polar"]   => [["Bear", "Polar"]],
          ["Wolf", "Artic"]   => [["Wolf", "Artic"], ["Wolf", "Artic"]] }
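    To try the grouping without a Rails app, here is a minimal plain-Ruby sketch. AnimalRow, SPECIES_ROWS and group_by_species_and_type are made-up names for illustration; AnimalRow is only a Struct standing in for the Animal model, not an ActiveRecord class.

```ruby
# AnimalRow is a hypothetical plain-Ruby stand-in for the Animal model.
AnimalRow = Struct.new(:id, :species, :type)

SPECIES_ROWS = [
  AnimalRow.new(0, "Bear", "Grizzly"),
  AnimalRow.new(1, "Wolf", "Gray"),
  AnimalRow.new(2, "Bear", "Grizzly"),
  AnimalRow.new(3, "Bear", "Polar"),
  AnimalRow.new(4, "Wolf", "Artic"),
  AnimalRow.new(5, "Wolf", "Artic"),
  AnimalRow.new(6, "Wolf", "Gray")
].freeze

# Rows land in the same group only when BOTH species and type match.
def group_by_species_and_type(rows)
  rows.group_by { |row| [row.species, row.type] }
end

# Print each composite key with the ids of the rows grouped under it.
group_by_species_and_type(SPECIES_ROWS).each do |key, members|
  puts "#{key.inspect} => ids #{members.map(&:id).inspect}"
end
```

    The hash keys come out in the order each [species, type] pair is first seen, and each value holds the full row objects rather than just the two attributes.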

    To add the cluster info to each record, consider the following.

    If you don't already have the cluster_id field created, you can create it using a migration as follows:

    rails g migration add_cluster_id_to_animals cluster_id:integer
    rake db:migrate

    Once you have the cluster_id field, you can populate it by iterating over the hash created by group_by:

    hash = Animal.all.group_by { |x| [x.species, x.type] }
    hash.each_with_index do |(_key, records), index|
      records.each do |record|
        # Cluster ids are 1-based, in order of first appearance.
        record.update_attribute(:cluster_id, index + 1)
      end
    end

    or, in one line:

    Animal.all.group_by { |x| [x.species, x.type] }.each_with_index {|(key,value),index| value.each {|v| v.update_attribute(:cluster_id, index+1)}}

    For example, if initially your records looked as follows:

    [["Bear", "Grizzly"], 
     ["Wolf", "Gray"], 
     ["Bear", "Grizzly"], 
     ["Bear", "Polar"], 
     ["Wolf", "Artic"], 
     ["Wolf", "Artic"], 
     ["Wolf", "Gray"]]

    then with cluster_id populated, your records will look as follows:

    [["Bear", "Grizzly", 1], 
     ["Wolf", "Gray",    2], 
     ["Bear", "Grizzly", 1], 
     ["Bear", "Polar",   3], 
     ["Wolf", "Artic",   4], 
     ["Wolf", "Artic",   4], 
     ["Wolf", "Gray",    2]]
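    Note that the ids above follow the order in which each group first appears, while the "Needed result" in the question numbers the clusters in sorted order (Bear/Grizzly = 1, Bear/Polar = 2, and so on). A rough plain-Ruby sketch of that variant follows; ANIMALS and with_sorted_cluster_ids are names I made up, and it works on arrays rather than on the model.

```ruby
ANIMALS = [
  ["Bear", "Grizzly"],
  ["Wolf", "Gray"],
  ["Bear", "Grizzly"],
  ["Bear", "Polar"],
  ["Wolf", "Artic"],
  ["Wolf", "Artic"],
  ["Wolf", "Gray"]
].freeze

# Number the distinct [species, type] pairs in sorted order, then
# return new rows with the id appended, sorted like the question's table.
def with_sorted_cluster_ids(rows)
  cluster_ids = rows.uniq.sort.each_with_index.map { |key, index| [key, index + 1] }.to_h
  rows.map { |row| row + [cluster_ids.fetch(row)] }.sort
end

with_sorted_cluster_ids(ANIMALS).each { |row| p row }
```

    With ActiveRecord, ordering the records before grouping (for example Animal.order(:species, :type)) should give the same numbering, though I have not run that against a real table.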

  • answered 2018-03-11 12:38 Ashik Salman

    If you don't want to store the cluster_id in the database as described in the other answer, you can just populate it on the array being grouped, as follows:

    Original Array:

    array = [
      ["Bear", "Grizzly"],
      ["Wolf", "Gray"],
      ["Bear", "Grizzly"],
      ["Bear", "Polar"],
      ["Wolf", "Artic"],
      ["Wolf", "Artic"],
      ["Wolf", "Gray"]
    ]

    This can be grouped by species and type, with a generated cluster_id, as follows:

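    array.group_by { |x| [x[0], x[1]] }.each.with_index { |(key, values), i|
      # Append the 1-based cluster id to every row in this group (mutates the rows in place).
      values.each { |x| x << i + 1 }
    }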

    The cluster_id values are appended to each inner array, so array now looks like:

    [
      ["Bear", "Grizzly", 1],
      ["Wolf", "Gray", 2],
      ["Bear", "Grizzly", 1],
      ["Bear", "Polar", 3],
      ["Wolf", "Artic", 4],
      ["Wolf", "Artic", 4],
      ["Wolf", "Gray", 2]
    ]
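    If you would rather leave the original array untouched, one possible alternative is a single pass that hands out ids as new [species, type] pairs show up. This is only a sketch; with_cluster_ids and PAIRS are names I chose for the example.

```ruby
PAIRS = [
  ["Bear", "Grizzly"],
  ["Wolf", "Gray"],
  ["Bear", "Grizzly"],
  ["Bear", "Polar"],
  ["Wolf", "Artic"],
  ["Wolf", "Artic"],
  ["Wolf", "Gray"]
].freeze

# Build new rows instead of appending to the existing ones.
def with_cluster_ids(rows)
  ids = {} # maps [species, type] => cluster id, in order of first appearance
  rows.map do |species, type|
    key = [species, type]
    ids[key] ||= ids.size + 1 # next free id the first time a pair is seen
    [species, type, ids[key]]
  end
end

with_cluster_ids(PAIRS).each { |row| p row }
```

    This yields the same first-appearance numbering as the group_by version, but PAIRS itself keeps its two-element rows.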

    Hope it helps!