IdolHands.com :: Days in the Life of an Alpha Geek
There are more ways to compare arrays in Ruby than you might think, but they're definitely not all created equal. Choosing the wrong method can negatively impact the performance of your application.
The naive approach to array comparison is to simply use the == operator. However, this kind of comparison is rather useless in most cases, because it compares elements on a position-by-position basis rather than performing a set operation. When comparing two arrays, you probably care more about the contents themselves than the order in which they appear.
For example:
So what are some possible array comparison techniques, and how fast is each one? Let's write a short program to find out.
require 'benchmark' require 'set' array_1 = (0..10000).to_a array_2 = (1..10001).to_a tests = [ 'array_1.to_set == array_2.to_set', 'array_1 | array_2 == array_1', 'array_1 - array_2 == []', '(array_1 - array_2).size == 0', '(array_1 - array_2).empty?' ] Benchmark.bm { |x| tests.each { |test| x.report(test) { 500.times{eval(test)} } } }
This little script takes two arrays, each with over 10,000 elements, and compares them using various techniques. Each comparison is run 500 times consecutively and is timed using Ruby's benchmark utility.
Here are the results of the test:
| Comparison Technique | user | system | total | real |
|---|---|---|---|---|
array_1.to_set == array_2.to_set |
11.500000 | 0.420000 | 11.920000 | 12.023735 |
array_1 | array_2 == array_1 |
1.610000 | 0.010000 | 1.620000 | 1.632412 |
array_1 - array_2 == [] |
1.310000 | 0.100000 | 1.410000 | 1.424605 |
(array_1 - array_2).size == 0 |
1.360000 | 0.080000 | 1.440000 | 1.454920 |
(array_1 - array_2).empty? |
1.380000 | 0.090000 | 1.470000 | 1.527316 |
Unfortunately, the most semantic approach— using the union operator (|)— is not the fastest method. Similarly, the clever technique of converting each array to a set costs a ridiculous amount of time.
The fastest comparison is made by subtracting one array from the other and checking to see if the result is equal to an empty array (array_1 - array_2 == []). Invoking a method like size or empty costs more time than the simpler == [] approach.
The difference of a few fractions of a second may not seem like much, but in some applications small inefficiencies can be multiplied and result in huge performance bottlenecks. Think about a reporting situation, for example, in which an instance method may get called on tens of thousands of records. In a case like this, saving a tenth of a second per method call saves minutes of processing time.