*This document has a standard, validated CSS2 stylesheet, which your
browser does not seem to display properly. In a browser supporting web
standards, this table of contents would be fixed at the side of the page for
easy reference.*

**Version 0.1 6, which was downloadable for a few hours on
21 September 2005, had a bug in Select. If you have that version please
update.**

This resource is intended to provide reasonable implementations of several algorithms for searching, sorting and order statistics, generally implemented out of Cormen, Leiserson, and Rivest. The procedures in this resource are all allocation-free, using only the stacks and arrays or strings passed by the caller, and can be applied interchangeably to arrays and strings, treating a string as a compact array of small integers.

Procedures are provided for efficiently finding minimum and maximum, searching sorted data, sorting small data sets by insertion, sorting larger sets or extracting order statistics using quicksort and techniques derived from it, and sorting in worst-case optimal time (though usually more slowly than quicksort) using heaps and their related operations, which are also useful for constructing other interesting structures like priority queues.

`Order`

is a `ProcSet`

resource.
To make it available to your own code, include in the setup section of
your file:

/net.anastigmatix.Order /ProcSet findresource

The `findresource`

will succeed, leaving a dictionary on the
operand stack, if you have made
the
`Order`

resource file [download]
available in any of these ways:

- You have
included the resource file in the prolog of
your own file, before the
`findresource`

(which belongs in the setup section) - You downloaded the resource file to your printer's persistent memory, since it was last powered on
- You saved the resource file in your printer's (if it has a disk or flash filesystem), or ghostscript's, resource directory under the right name
- You use
document manager software that recognizes
the
`%%DocumentNeededResources`

and`%%IncludeResource`

DSC comments, you include these comments at the right position in your file to specify that it needs net.anastigmatix.Order, your document manager software is configured to automatically insert needed resources in files being printed, and you have put the Order resource file where your document manager can find it.

The resource files are in a compact form. That is for efficiency, not to keep you from viewing them; there is a script for that on the resource packaging page.

The `Order`

dictionary may be placed on the lookup stack (with
`begin`

) for convenient access to the definitions in it, without
the bother of `get`

and `exec`

. The dictionary is
read-only, so before creating any
definitions, you will want either `userdict begin`

or *your
own dict* `begin`

so that you have a writable dictionary on top
of the dictionary stack.

This section describes the contents of the read-only dictionary that is
returned by `/net.anastigmatix.Order /ProcSet findresource`

.

A *comparison function* for most procedures in this resource is
a procedure that consumes two objects from the stack and returns only
-1 if the first object should precede the second (topmost) object, zero
if they should be considered equal, and 1 if the first should follow the
second. A comparison function should be efficient and well tested; it may
be called many times, and the stack is not protected: beneath the two objects
it is expected to consume, values internal to one of these procedures may
be visible, and problems resulting from a malfunctioning comparison that
has the wrong stack effect may be difficult to trace.

In some other packages that may be familiar, comparison functions are expected to return any negative value, zero, or any positive value, respectively. That definition must not be used with procedures in this resource. Comparison results are restricted to -1, 0, or 1 because they may be used directly to index into a choice of procedures for speed.

The deprecated procedure **InsSortStack**
is the only one in this resource taking a comparison function that can safely
use any negative or positive value, only because **InsSortStack** was
included before the restriction to -1, 0, 1 was documented. Values other than
-1, 0, 1 may work with other procedures in the current version, but are not
supported and the behavior must not be relied on.

One predefined comparison function is provided that applies the standard PostScript comparisons for integers, reals, or strings:

- PolyCmp
*num1|str1 num2|str2***PolyCmp***-1|0|1*a simple comparison procedure polymorphic to numbers or strings. Returns -1 if

*num1|str1*is less than*num2|str2*, zero if they are equal, otherwise 1.

When data can be ordered, binary searching becomes possible.

- BinarySearch
*arr|str key cmp***BinarySearch***index bool*searches for the value

*key*in the array or string*arr|str*, assumed already ordered according to the comparison function*cmp*. If a matching entry is found, return its index and*true*, otherwise return the index at which*key*could be inserted—that is, the index of the least value greater than*key*or the length of*arr|str*if none such—and*false*.

**FindMin** and **FindMax** determine both the value and the
location of the minimum or maximum entry, respectively, in an array
or string. **MinMaxCmp** and **MinMax** determine the minimum
and maximum values simultaneously, but not their locations, in about
three fourths the comparisons that would be needed to determine the
minimum and maximum separately.

- FindMin
*arr|str cmp***FindMin***index value*given array or string

*arr|str*of length at least 1, and*cmp*a comparison function, leave*index*and*value*on the stack such that*value*is the value at index*index*in*arr|str*, and for no value*x*in*arr|str*can*x value cmp*return*-1*.- FindMax
*arr|str cmp***FindMax***index value*given array or string

*arr|str*of length at least 1, and*cmp*a comparison function, leave*index*and*value*on the stack such that*value*is the value at index*index*in*arr|str*, and for no value*x*in*arr|str*can*x value cmp*return*1*.- MinMaxCmp
*array cmp***MinMaxCmp***min max*given

*array*of length at least 1, and*cmp*a comparison function, leave*min*and*max*on the stack such that each is a value from*array*, and for no value*x*in array can*x min cmp*or*max x cmp*return*-1*.This procedure differs from

**MinMaxCmp**only in taking a comparison procedure with the usual behavior (returning -1,0,1), where**MinMax**takes anything that acts like**lt**.- MinMax
*array ltproc***MinMax***min max*given

*array*of length at least 1, and*ltproc*a procedure with the semantics of**lt**, leave*min*and*max*on the stack such that each is a value from*array*, and for no value*x*in array can*x min ltproc*or*max x ltproc*return*true*.This procedure differs from

**MinMaxCmp**only in taking a comparison procedure with the behavior of**lt**instead of the general -1,0,1 comparison procedure used with most procedures in this resource.

The procedures in this family put elements in order by inserting each
in turn at the proper position among those already ordered. The simplicity
of the procedure makes it faster than the more sophisticated methods when
sorting a few items, but the time required grows as the square of the number
of items, making the technique a poor choice for larger jobs.
Internally, **QuickSort** and **Select** hand off to **InsSort**
any job below a threshold size.

- InsSort
*arr|str cmp***InsSort***arr|str*rearranges the elements of the array or string

*arr|str*in an order satisfying the comparison function*cmp*.- InsSortStk
*any*_{n-1}...any_{0}n cmp**InsSortStk***any*_{n-1'}...any_{0'}nrearranges the

*n*items*any*on the stack according to the comparison function_{n-1}...any_{0}*cmp*so that the result of*any*cannot be 1 for any_{i}any_{j}cmp*i*and*j*unless*i < j*.- InsStk
*any*_{n-1}...any_{0}n cmp anew**InsStk***any*_{n}...any_{i}anew any_{i-1}...any_{0}n+1 cmpplaces the single item

*anew*in position among the*n*items*any*already on the stack ordered according to the comparison function_{n-1}...any_{0}*cmp*as for**InsSortStk**. The count*n*, incremented, and the comparison function*cmp*remain on the stack. This procedure is intended for growing an ordered stack of items one by one.- InsSortStack
*any*_{n-1}...any_{0}cmp n**InsSortStack***any*_{n-1'}...any_{0'}n**Deprecated.**This procedure differed from**InsSortStk**only in the order of arguments (*cmp*and*n*were interchanged), and in that it accepted a*cmp*function that could return arbitrary negative or positive integers and not only -1, 0, or 1—such a function is*not supported*for other procedures in this resource. For compatibility,**InsSortStack**is retained as a wrapper that calls**InsSortStk**with transformed arguments.

A number of interesting algorithms can be built around C. A. R. Hoare's
*quicksort* or its handy *partition* operation. The sort
has an *expected* complexity of *n* log *n*, the best
possible for a comparison sort, and is also just plain faster than many
algorithms with the same theoretical bound. However, the rare input can
be in just the right order to make *quicksort* take
*n ^{2}* time. This implementation uses a (non-randomized)
median-of-3 technique to make that event less likely; in particular,
input that is already sorted will

If consistency is more important than high speed,
**HeapSort** is an alternative that
will take about 2.5 times as long as **QuickSort** on typical
inputs, but guarantees its *n* log *n*
time bound for all cases.

Useful procedures that can be built on *partition* include
**Select**, for obtaining the *i*th order statistic of
*n* elements—what would be the *i*th element if
they were in order—in expected time proportional to *n*,
which beats the obvious approach of comparison-sorting the elements
first. Like **QuickSort** itself, **Select** can be tricked by
certain unlucky inputs into exceeding the expected time bound, but is
careful to make that unlikely.

A conservative version, **SelectWC**, guarantees linear time even in the
worst case; it is
packaged separately.
As with **HeapSort** versus
**QuickSort**, you pay for that guarantee in generally longer running
times overall, about five times longer than **Select** on typical inputs.
That makes just sorting the array look much more attractive—though if
you are concerned about worst-case predictability, you will naturally be
comparing **SelectWC** to **HeapSort** rather than **QuickSort**,
and in that comparison there can still be an advantage.

Based on **Select**, **OrderStatistics** can gather any
*k* distinct order statistics from the same array in one call,
in expected time proportional to *n* log *k* (it would cost you
*nk* to just call **Select** *k* times); it exploits the
side effect that **Select** leaves the array partitioned around the
selected element. The benefit increases with array size; to get more than
a few order statistics from a small array (under 100, say), it is probably
faster to just sort the array and pick out the right elements. From the
largest possible PostScript arrays, you can gather over 100 order statistics
using **OrderStatistics** in less than the time to **QuickSort** the
array.

**OrderStatistics** takes the selection procedure to be used as a
parameter, so you can have it use **SelectWC** instead of **Select**
if you want worst-case predictability. It is also possible that you have an
array that you know is already sorted, or that is so small you are better off
sorting it, but **OrderStatistics** is still a convenient API for getting
the elements you want. In that case you can have it use **SelectOrd**, a
dummy selection that simply assumes the array is already ordered. If it
isn't, of course, you get bogus results.

- Partition
*arr|str pivot cmp***Partition***post mid pre*partitions the array or string

*arr|str*around the value*pivot*, which must be a value that occurs in*arr|str*. On completion, the initial subrange*pre*contains only values <=*pivot*, a possibly empty subrange*mid*contains only values equal to*pivot*that have been moved to their correctly ordered positions,and the final subrange*post*contains only values >=*pivot*, according to the comparison function*cmp*.Because values equal to

*pivot*are allowed anywhere in*pre*and*post*, it is never necessary for*mid*to be nonempty. The partitioning process does not necessarily move elements that equal*pivot*to their final positions, and it does not necessarily recognize all cases when it has. A nonempty*mid*is produced only when the algorithm can determine without extra work that at least one value equal to*pivot*has been put in its final place. What is guaranteed is that**Partition**will never return the entire original*arr|str*as*pre*or*post*; progress is always made. Unlike some implementations,**Partition**does not require*pivot*to be the first element, or any particular element, as long as there is at least one equal value present somewhere in*arr|str*.- QuickSort
*arr|str cmp***QuickSort***arr|str*rearranges the elements of the array or string

*arr|str*in an order satisfying the comparison function*cmp*. Below a threshold size,**QuickSort**hands off to**InsSort**.- Select
*arr|str i cmp***Select***post mid pre*locates the

*i*th order statistic from the array or string*arr|str*, that is, the value that would be at index*i-1*if*arr|str*were in order by the comparison function*cmp*. (Order statistics are 1-based by convention, while PostScript array and string indices are 0-based.) It is returned in the singleton subrange*mid*. As a side effect, all elements in the initial subrange*pre*and final subrange*post*are known to be <= and >= that element, respectively, just as if**Partition**had been called with that element as*pivot*.**FindMin**and**FindMax**are used to optimize the cases where*i = 1*or the length of*arr|str*.- SelectOrd
*arr|str i cmp***SelectOrd***post mid pre*a trivial version of

**Select**for use when*arr|str*is known to be already in order by the comparison function*cmp*. Its most likely use is as the*select*parameter to**OrderStatistics**when the input is known to be sorted.- OrderStatistics
*arr|str wanted cmp sel***OrderStatistics***wanted*uses the selection procedure

*sel*to gather from the array or string*arr|str*all of the order statistics requested in*wanted*.*wanted*is an array or string in which each element is an integer*i*between 1 and the length of*arr|str*. On completion,*wanted*is left on the stack with each element replaced by the corresponding order statistic from*arr|str*, that is, the value that would be at index*i-1*in*arr|str*if ordered by the comparison function*cmp*.*wanted*must be in ascending order and free from duplicates (requesting the same order statistic more than once is not only inefficient, but likely to cause a**rangecheck**). If*wanted*is a string, the selected elements of*arr|str*must be integers between 0 and 255, or a**rangecheck**or**typecheck**will result.**OrderStatistics**recognizes a special case when*wanted*of length two requests the first and last order statistic, and uses**MinMaxCmp**instead of*sel*. In that case, the values are found without being placed in their proper locations as a side effect, as*sel*would do.**OrderStatistics**may use this optimization during recursion as well as on the initial call, so it is in general simply not safe to assume that the values requested have arrived at their proper locations in*arr|str*when**OrderStatistics**completes.**Example.**Gather the “five-number summary” (minimum, quartiles, and maximum) from an array of 15 observations:PS>/obs [7 49 73 58 30 72 44 78 23 9 40 65 92 42 87] def PS>obs [1 4 8 12 15] //PolyCmp //Select OrderStatistics == [7 30 49 73 92]

Note that, if the number of observations is even, the definition of the five-number summary is trickier, and involves interpolation. For example, if there were 14 observations, you would request the

*six*order statistics [1 4 7 8 12 14] and average the seventh and eighth. So, a procedure to correctly compute arbitrary quantiles or five-number summaries would still be not completely trivial to write; it would have to use the array length to determine which order statistics were needed, use**OrderStatistics**to gather them, and then interpolate between adjacent ones as needed.

Not only **HeapSort** itself, but the building block **Heapify**
and **BuildHeap** operations are exported, because they can be used
in other useful data structures such as priority queues, and interesting
algorithms built on them.

**HeapSort**, though it runs about 2.5 times as long as **QuickSort**
in most cases, has the advantage that its worst-case time bound is the same
as the average case. If worst-case predictability matters more than raw
speed, **HeapSort** can be an attractive option. **HeapSort** hands
off to **QuickSort** below a certain size, where even **QuickSort**'s
worst-case time cannot be much worse than what **HeapSort** would take
anyway.

Because PostScript indexes arrays and strings starting with zero, this resource defines the following heap structure:

Root: | index 0 |

Left child from index i: |
i `1 bitshift 1 add` |

Right child from index i: |
i `1 add 1 bitshift` |

Parent from index i: |
i `1 sub -1 bitshift` |

Those operations are not exported as named procedures, because they are short and any serious application will inline them anyway.

- Heapify
*arr|str cmp heapsize i***Heapify***arr|str cmp heapsize*given array or string

*arr|str*, comparison function*cmp*, and*heapsize*such that*heapsize-1*is the greatest index at which*arr|str*contains a heap node, and*i*such that the left and right children from index*i*are already the roots of heaps, but the value at*i*may not satisfy the heap property (that is, it may compare less than its left or right child), moves elements so that the heap property is satisfied at index*i*. The time bound is given by log*heapsize*.- BuildHeap
*arr|str cmp***BuildHeap***arr|str cmp heapsize*given array or string

*arr|str*and comparison function*cmp*, arrange elements so the entire array forms a heap rooted at index zero. The length of*arr|str*is left on the stack as*heapsize*. The time bound is given by*heapsize*.- HeapSort
*arr|str cmp***HeapSort***arr|str*rearranges the elements of the array or string

*arr|str*in an order satisfying the comparison function*cmp*. Below a threshold size,**HeapSort**hands off to**QuickSort**.

- Hist
*data cutoffs bins cmp***Hist***bins*creates a histogram of the array or string

*data*in the array or string*bins*, which must have length at least*k*+1 where*k*is the length of the array or string*cutoffs*. Each element of*bins*is incremented by the number of elements of*data*less than or equal (by the comparison function*cmp*) to the corresponding element of*cutoffs*but not less than or equal to any lower element of*cutoffs*. Values greater than*cutoffs*are counted in_{k-1}*bins*. The_{k}*cutoffs*must be in ascending order.If

*bins*is an array, any of its elements 0..*k*that are not integers will be initialized to 0. No other initialization is done, so*bins*can be used to accumulate counts from multiple data sets. The largest index of any element of*bins*ever touched is*k*.As PostScript always initializes a freshly allocated string to zeros and a freshly allocated array to nulls (which

**Hist**will replace with zeros), either may be used reliably to begin a new count.- Uniq
*arr|str cmp***Uniq***arr|str'*

*arr|str cmp arr|str2***Uniq***arr|str' arr|str2'*collapses adjacent entries in

*arr|str*, when they compare equal by the comparison function*cmp*, to single entries, retaining the*first*of each group of matching entries, and returns the possibly shortened subarray/substring*arr|str'*.**Uniq**will always collapse adjacent matching entries, but of course this is guaranteed to eliminate all duplicates only if*arr|str*is initially ordered. If a literal array or string*arr|str2*is present, it should be at least as long as*arr|str*, and a possibly shortened subarray*arr|str2'*the same length as*arr|str'*will be returned, each element of which is the number of occurrences in the original input of the corresponding value in*arr|str'*.- Trichotomize
*arr|str i j***Trichotomize***post mid pre*slices the array or string

*arr|str*into three intervals, returning the interval*0..i-1*as*pre*,*i..j-1*as*mid*, and*j..length-1*as*post*, where*length*is the length of*arr|str*; the form of result returned by**Partition**and**Select**.- sload
*arr|str***sload***arr|str*_{0}... arr|str_{n-1}arr|strdoes exactly what

**aload**does, but for strings. Will work for arrays too, but**aload**is faster.- sstore
*int*_{0}... int_{n-1}arr|str**sstore***arr|str*does exactly what

**astore**does, but for strings, so the items on the stack must be integers between 0 and 255. Will work for arrays too, and then the items can be anything. But**astore**is faster.- swap
*arr|str i j***swap**–swaps the values at indices

*i*and*j*in the array or string*arr|str*.