Sorting Arrays in PHP Using a Key Function


nathanieltroutman - Posted on 16 September 2010

I've been flipping through the second edition of the "PHP Cookbook" by O'Reilly to see if there are any new tricks I can learn. I've worked with PHP quite a bit from small projects hacking Durpal to much larger ones such as a library management system, but there are always new things to learn. Especially when a language has a massive included API like PHP (and my favorite Python). I'd rather not invent the wheel some I'm actually one of those strange people who browse through API documentations just to see what the language can already do for me. After all, rarely is one the first person to want to do something in code. Enough with that, onto the meat of this post.

In the cookbook I found a recipe for sorting an array using a key function. The benifit of a key function is that it gets evaluated once per item in the array you are sorting, instead of everytime the sorting algorithm needs to compare two items. In Python this is trivial: my_list.sort(key=mykeyfunc) and you are done. However, PHP doesn't seem to have that particular functionality built in. If you are willing to take the performance hit of using a comparison function, then you can use usort or uasort. However, if you wish to speed things up you'll need a different method. 

The cookbook provides a 10 line function called pc_array_sort which will sort an array using a key function and its faster than usort. I thought cool, that might be handy. Right after that section though was one about array_multisort and after glancing over that I thought there must be a another way than how the cookbook did it. You see, array_multisort sorts multiple arrays like they were columns in a spread sheet, so it will sort the elements of the second array by the order of the sorted elements of the first array. So I opened up TextWrangler and wrote 3 lines that did the same thing . . . but better. 

Here is the code for pc_array_sort:

 

function pc_array_sort($array, $map_func, $sort_func = '') { 
    $mapped = array_map($map_func, $array);    // cache $map_func() values 
    if ('' == $sort_func) { 
        asort($mapped);                        // asort() is faster then usort() 
    }  else { 
        uasort($mapped, $sort_func);           // need to preserve keys 
    } 
    while (list($key) = each($mapped)) { 
        $sorted[] = $array[$key];              // use sorted keys 
    } 
    return $sorted; 
}

 

For comparison, here is my sort_with_keyfunc:

function sort_with_keyfunc($array, $keyfunc) {
    $keys = array_map($keyfunc, $array); // get the keys for each item
    array_multisort($keys, $array); // sort $array according to the sorted keys
    return $array;
}

 

My test array was array('azlpha', 'byeta', 'cxharlie', 'dwelta', 'evcho', 'fuoxtrot', 'gtolf', 'hsotel') and the keyfunction sorted by the second letter of the string.

 

function second_letter($word) {
    return substr($word, 1, 1);
}

To test it out we do:

 
$arr = array('azlpha', 'byeta', 'cxharlie', 
    'dwelta', 'evcho', 'fuoxtrot', 
    'gtolf', 'hsotel');
 
function second_letter($word) {
    return substr($word, 1, 1);
}
 
print_r(sort_with_keyfunc($arr, second_letter));

The sorted array looks like this:

Array
(
    [0] => hsotel
    [1] => gtolf
    [2] => fuoxtrot
    [3] => evcho
    [4] => dwelta
    [5] => cxharlie
    [6] => byeta
    [7] => azlpha
)

To compare to usort I wrote a similar comparison function:

 

function second_letter_cmp($a, $b) {
    return strcmp(substr($a, 1, 1), substr($b,1,1));
}

 

Lastly, here is the timing code I used:

 

function make_array() {
    return array('azlpha', 'byeta', 'cxharlie', 'dwelta', 'evcho', 'fuoxtrot', 'gtolf', 'hsotel');
}
function timeit($array_func, $sort_func, $cmp_func) {
    $num_loops = 10;
    $num_execs = 1000;
    $times = array();
    $total = 0;
    for ($j=0; $j<$num_loops; $j+=1) {
        $start = microtime(true);
        for($i=0; $i<$num_execs; $i+=1){
            $arr = $array_func();
            $sort_func($arr, $cmp_func);
        }
        $end = microtime(true);
        $times[] = ($end-$start);
        $total += ($end-$start);
    }
    $numfmt = "%0.4f";
    echo sprintf("%20s: Total=$numfmt Min=$numfmt Max=$numfmt Mean=$numfmt\n", 
        $sort_func, $total, min($times), max($times), ($total / $num_loops));
}
timeit('make_array', 'sort_with_keyfunc', 'second_letter');
timeit('make_array', 'pc_array_sort', 'second_letter');
timeit('make_array', 'usort', 'second_letter_cmp');

Here are the times:

 
sort_with_keyfunc: Total=0.2096 Min=0.0207 Max=0.0213 Mean=0.0210  
    pc_array_sort: Total=0.3201 Min=0.0311 Max=0.0342 Mean=0.0320  
            usort: Total=0.4460 Min=0.0441 Max=0.0453 Mean=0.0446  

The results show that sort_with_keyfunc is about twice as fast as usort and about 50% faster than pc_array_sort.

The next thing to check was sorting with an associative array.

'azlpha', 'B'=>'byeta', 'C'=>'cxharlie', 'D'=>'dwelta', 
        'E'=>'evcho', 'F'=>'fuoxtrot', 'G'=>'gtolf', 'H'=>'hsotel');
}

As you will see pc_array_sort didn't quite do what one would expect. The line $sorted[] = $array[$key]; is the cause as the keys are not copied over. However, sort_with_keyfunc does do what at least I would expect.

 

pc_array_sort:
Array
(
    [0] => hsotel
    [1] => gtolf
    [2] => fuoxtrot
    [3] => evcho
    [4] => dwelta
    [5] => cxharlie
    [6] => byeta
    [7] => azlpha
)
 
sort_with_keyfunc:
Array
(
    [H] => hsotel
    [G] => gtolf
    [F] => fuoxtrot
    [E] => evcho
    [D] => dwelta
    [C] => cxharlie
    [B] => byeta
    [A] => azlpha
)

Again, here are the times, but for sorting associative arrays:

sort_with_keyfunc: Total=0.2255 Min=0.0220 Max=0.0244 Mean=0.0225  
    pc_array_sort: Total=0.3295 Min=0.0327 Max=0.0335 Mean=0.0330  
            usort: Total=0.4490 Min=0.0443 Max=0.0454 Mean=0.0449  

We see similar trends to sorting packed numerical arrays.

So there you go, a faster, more flexible, function for sorting an array, either associative or numerical, using a key function. You can download all the code I used for testing here.

I'm looking at the function you wrote and I'm a little confused.  What are you passing in for $keyfunc?  I searched for it and can't find it anywhere else on this page.  

Okay, so its probably note the clearest, and I will modify the post to provide a much clearer example. But the magic is happening during the call to time it:

 

timeit('make_array', 'sort_with_keyfunc', 'second_letter');

 

The keyfunction that eventually gets passed into sort_with_keyfunc is the third argument, 'second_letter'. I'm letting PHP evaluate that into the name of the function. So timite is defined:

 

function timeit($array_func, $sort_func, $cmp_func)

 

part way through it the function calls the first argument to create the array to sort:

 

$arr = $array_func();

 

And then right after that it calls the sort function passing in the compare function to use:

 

$sort_func($arr, $cmp_func);

 

Hopefully that helps explain things. If you check the post over again, you'll see I've added a clearer example of it working together.

Ah.  That makes it a lot clearer.  So for instance, if I had an array which contained an ID, I would simply return that id from the sort function to array_map to draw out the keys and then multisort using those keys...?  correct?

This is great, except for arrays that contain alphanumeric values and need to be sorted using a "natural order" algorithm (a limitation of array_multisort). I submitted a request to add a SORT_NATURAL option to array_multisort here: bugs.php.net/bug.php?id=55158

In the meantime, you can do something like this if you need a natural order sort with this function:

function sortWithKeyFunction(array $array, $keyfunction)
{
	$keys 	= array_map($keyfunction, $array); // get the keys for each item
	natsort($keys);
	$r		= array();
	foreach($keys as $key => $value)
	{
		$r[]	= $array[$key];
	}
	return $r;
}

Post new comment

The content of this field is kept private and will not be shown publicly.
By submitting this form, you accept the Mollom privacy policy.