There are more ways to compare arrays in Ruby than you might think, but they're definitely not all created equal. Choosing the wrong method can negatively impact the performance of your application.

The naive approach to array comparison is to simply use the == operator. However, this kind of comparison is rather useless in most cases, because it compares elements on a position-by-position basis rather than performing a set operation. When comparing two arrays, you probably care more about the contents themselves than the order in which they appear.

For example:

> a = [1,2,3]
> b = [3,2,1]
> a == b
=> false

So what are some possible array comparison techniques, and how fast is each one? Let's write a short program to find out.

require 'benchmark'
require 'set'

array_1 = (0..10000).to_a
array_2 = (1..10001).to_a

tests = [
	'array_1.to_set == array_2.to_set',
	'array_1 | array_2 == array_1',
	'array_1 - array_2 == []',
	'(array_1 - array_2).size == 0',
	'(array_1 - array_2).empty?'
]

Benchmark.bm { |x| tests.each { |test| x.report(test) { 500.times{eval(test)} } } }

This little script takes two arrays, each with over 10,000 elements, and compares them using various techniques. Each comparison is run 500 times consecutively and is timed using Ruby's benchmark utility.

Here are the results of the test:

Comparison Technique user system total real
array_1.to_set == array_2.to_set 11.500000 0.420000 11.920000 12.023735
array_1 | array_2 == array_1 1.610000 0.010000 1.620000 1.632412
array_1 - array_2 == [] 1.310000 0.100000 1.410000 1.424605
(array_1 - array_2).size == 0 1.360000 0.080000 1.440000 1.454920
(array_1 - array_2).empty? 1.380000 0.090000 1.470000 1.527316

Unfortunately, the most semantic approach— using the union operator (|)— is not the fastest method. Similarly, the clever technique of converting each array to a set costs a ridiculous amount of time.

The fastest comparison is made by subtracting one array from the other and checking to see if the result is equal to an empty array (array_1 - array_2 == []). Invoking a method like size or empty costs more time than the simpler == [] approach.

The difference of a few fractions of a second may not seem like much, but in some applications small inefficiencies can be multiplied and result in huge performance bottlenecks. Think about a reporting situation, for example, in which an instance method may get called on tens of thousands of records. In a case like this, saving a tenth of a second per method call saves minutes of processing time.

Comments

Leave a Comment


IdolHands.com Spam-o-MeterTM
Bot
Spammer
Moron
Human






* Required fields.