即日起在codingBlog上分享您的技术经验即可获得积分,积分可兑换现金哦。

露比:如何查找数组中出现最多的项?

栈溢出 Matthew Schinckel 34℃ 0评论
本文目录
[隐藏]

1.原始问题:Ruby: How to find item in array which has the most occurrences?

[1, 1, 1, 2, 3].mode
=> 1

['cat', 'dog', 'snake', 'dog'].mode
=> dog

2.被采纳答案

First build a hash mapping each value in the array to its frequency…

arr = [1, 1, 1, 2, 3]

freq = arr.inject(Hash.new(0)) { |h,v| h[v] += 1; h }
#=> {1=>3, 2=>1, 3=>1}

… then use the frequency table to find the element with the highest frequency:

arr.max_by { |v| freq[v] }
#=> 1

3.其他高票答案

3.1.第1个答案

While I adore the grep solution for its elegance and for reminding (or teaching) me about a method in Enumerable that I’d forgotten (or overlooked completely), it’s slow, slow, slow. I agree 100% that creating the Array#mode method is a good idea, however – this is Ruby, we don’t need a library of functions that act on arrays, we can create a mixin that adds the necessary functions into the Array class itself.

But the inject(Hash) alternative uses a sort, which we also don’t really need: we just want the value with the highest occurrence.

Neither of the solutions address the possibility that more than one value may be the mode. Maybe that’s not an issue in the problem as stated (can’t tell). I think I’d want to know if there was a tie, though, and anyway, I think we can improve a little on the performance.

require 'benchmark'

class Array
  def mode1
    sort_by {|i| grep(i).length }.last
  end
  def mode2
    freq = inject(Hash.new(0)) { |h,v| h[v] += 1; h }
    sort_by { |v| freq[v] }.last    
  end
  def mode3
    freq = inject(Hash.new(0)) { |h,v| h[v] += 1; h }
    max = freq.values.max                   # we're only interested in the key(s) with the highest frequency
    freq.select { |k, f| f == max }         # extract the keys that have the max frequency
  end
end

arr = Array.new(1_000) { |i| rand(100) }    # something to test with

Benchmark.bm(30) do |r|
  res = {}
  (1..3).each do |i|
    m = "mode#{i}"
    r.report(m) do
      100.times do
        res[m] = arr.send(m).inspect
      end
    end
  end
  res.each { |k, v| puts "%10s = %s" % [k, v] }
end

And here’s output from a sample run.

                                user     system      total        real
mode1                          34.375000   0.000000  34.375000 ( 34.393000)
mode2                           0.359000   0.000000   0.359000 (  0.359000)
mode3                           0.219000   0.000000   0.219000 (  0.219000)
     mode1 = 41
     mode2 = 41
     mode3 = [[41, 17], [80, 17], [72, 17]]

The “optimised” mode3 took 60% of the time of the previous record-holder. Note also the multiple highest-frequency entries.

EDIT

A few months down the line, I noticed Nilesh’s answer, which offered this:

def mode4
  group_by{|i| i}.max{|x,y| x[1].length <=> y[1].length}[0]
end

It doesn’t work with 1.8.6 out of the box, because that version doesn’t have Array#group_by. ActiveSupport has it, for the Rails developers, although it seems about 2-3% slower than mode3 above. Using the (excellent) backports gem, though, produces a 10-12% gain, as well as delivering a whole pile of 1.8.7 and 1.9 extras.

The above applies to 1.8.6 only – and mainly only if installed on Windows. Since I have it installed, here’s what you get from IronRuby 1.0 (on .NET 4.0):

==========================   IronRuby   =====================================
(iterations bumped to **1000**)    user     system      total        real
mode1 (I didn't bother :-))
mode2                           4.265625   0.046875   4.312500 (  4.203151)
mode3                           0.828125   0.000000   0.828125 (  0.781255)
mode4                           1.203125   0.000000   1.203125 (  1.062507)

So in the event that performance is super-critical, benchmark the options on your Ruby version & OS. YMMV.

3.2.第2个答案

Mike: I found a faster method. Try this:

  class Array
    def mode4
      group_by{|i| i}.max{|x,y| x[1].length <=> y[1].length}[0]
    end
  end

The Benchmark output:

                                    user     system      total        real
mode1                          24.340000   0.070000  24.410000 ( 24.526991)
mode2                           0.200000   0.000000   0.200000 (  0.195348)
mode3                           0.120000   0.000000   0.120000 (  0.118200)
mode4                           0.050000   0.010000   0.060000 (  0.056315)
     mode1 = 76
     mode2 = 76
     mode3 = [[76, 18]]
     mode4 = 76

3.3.第3个答案

array.max_by { |i| array.count(i) }

3.4.第4个答案

arr = [ 1, 3, 44, 3 ]
most_frequent_item = arr.uniq.max_by{ |i| arr.count( i ) }
puts most_frequent_item
#=> 3

No need to even think about frequency mappings.

转载请注明:CodingBlog » 露比:如何查找数组中出现最多的项?

喜欢 (0)or分享 (0)
发表我的评论
取消评论

*

表情