
Sorting Algorithms
by Leif Svalgaard
Sorting is one of the fundamental devices in information
processing. It is also one of the most subtle and least understood.
The difference between a good and a mediocre sorting algorithm is
hard to see when dealing with small amounts of data. However, with
large amounts, the difference is so massive that it can totally
destroy the effectiveness of a program.
To see why this difference is so important, look at the way that
sorting works. We compare the data items with each other, and
rearrange them in order. Simply comparing two values takes a
certain amount of time. Moving data around takes significantly
longer. The best sorting algorithm does as few compares and moves as
possible.
In the worst algorithm (simple bubble sort), each data item is
compared to the remaining unsorted items, so that the maximum number
of comparisons needed for n items is n(n1). For ten items, this
means around 100 comparisons. For 1000 items, 1,000,000 comparisons.
At ten thousand items, the system is no longer responding. The
fastest sorting methods approach kn(log_{2}
n) comparisons, where k is a small constant
depending on the method. If k is 2, this means 46
comparisons for 10 items, 14 thousand comparisons for 1000 items,
and 28 million comparisons for 1 million items.
As well as reducing the number of comparisons, good sorting
algorithms try to reduce the number of moves. Typically, the data
part of an item is much larger than the key part, so moving data
around is significantly slower then comparing. A useful way to
reduce the costs of moving data is to use indirect tables.
Note that the code required to implement a really efficient
quicksort is large and complex compared to bubble sort. The sort
algorithm you use depends on the amount and type of data you may
need to sort.
In all examples of code below, we assume that the data being
sorted is defined as follows:
01 TABLETOSORT.
02 TABLESIZE PIC S9(5) COMP.
02 TABLEMAX PIC S9(5) COMP VALUE +1600.
02 TABLEITEM OCCURS 1600 TIMES
03 TABLEKEY PIC X(9).
03 TABLEDATA PIC X(11).
The size of the key and data simply reflect the data used in the
test programs. The table size, 1600 items, makes the table 32000
bytes large. COBOL does not allow tables greater than 32767 bytes
(except as a nonportable extension on some systems). A useful
device for working with larger tables (memory permitting) is to
split the table:
01 LARGETABLEITEM1.
02 TABLEKEY PIC X(9) OCCURS 2900 TIMES.
01 LARGETABLEITEM2.
02 TABLEDATA PIC X(11) OCCURS 2900 TIMES.
1. Bubble Sort
As the name suggests, bubble sort moves items up a table like
bubbles in a tube. The algorithm can be explained as follows: pass
over the data, comparing and exchanging items so that the largest
item ends up at the end of the table. Repeat for the remaining items
until the table is sorted, that is, no exchanges were made during
the latest pass.
The following code does a bubble sort:
01 VARIOUSINDICES.
02 ITEMNBR PIC S9(5) COMP.
02 SWAPNBR PIC S9(5) COMP.
02 JUMPSIZE PIC S9(5) COMP.
02 UPPERLIMIT PIC S9(5) COMP.
01 VARIOUSVALUES.
02 SWAPITEM PIC X(20).
02 SWAPINDICATOR PIC X(1).
88 NOMORESWAPS VALUE IS SPACE.
BUBBLESORTTHEARRAY.
MOVE "SWAP" TO SWAPINDICATOR
MOVE 1 TO JUMPSIZE
PERFORM BUBBLESORT
UNTIL NOMORESWAPS
.
BUBBLESORT.
MOVE SPACE TO SWAPINDICATOR
COMPUTE UPPERLIMIT = TABLESIZE  JUMPSIZE
PERFORM COMPAREANDSWAPKEYS
VARYING ITEMNBR FROM 1 BY 1
UNTIL ITEMNBR > UPPERLIMIT
.
COMPAREANDSWAPKEYS.
COMPUTE SWAPNBR = ITEMNBR + JUMPSIZE
IF TABLEKEY (ITEMNBR) > TABLEKEY (SWAPNBR)
MOVE TABLEITEM (ITEMNBR) TO SWAPITEM
MOVE TABLEITEM (SWAPNBR) TO TABLEITEM (ITEMNBR)
MOVE SWAPITEM TO TABLEITEM (SWAPNBR)
MOVE "SWAP" TO SWAPINDICATOR
.
2. Heapsort
Heapsort uses an intermediate structure, a heap, which is rather
like a binary tree, with each son less than or equal to its father.
Once the data is partially ordered into a heap, it can be sorted
very quickly. The heap is built up within the actual table being
sorted, so does not need any extra storage. The detailed workings of
heapsort are quite complex, though very efficient.
The advantage of heapsort over bubble sort is simply that it is a
lot faster for large amounts of data; 2n(log2n) comparisons are
needed on average. Additionally, heapsort is quite compact compared
to the quicksort algorithm presented later:
01 VARIOUSINDICES.
02 ITEMNBR PIC S9(5) COMP.
02 COPYPTR PIC S9(5) COMP.
02 LEFTPTR PIC S9(5) COMP.
02 RIGHTPTR PIC S9(5) COMP.
02 TOPTR PIC S9(5) COMP.
02 FROMPTR PIC S9(5) COMP.
01 VARIOUSVALUES.
02 EXCHANGEITEM.
03 EXCHANGEKEY PIC X(9).
03 EXCHANGEDATA PIC X(11).
HEAPSORTTHETABLE.
COMPUTE LEFTPTR = TABLESIZE / 2 + 1
COMPUTE RIGHTPTR = TABLESIZE
PERFORM SIFTTHEHEAPBYLEFT
UNTIL LEFTPTR < 2
PERFORM SIFTTHEHEAPBYRIGHT
UNTIL RIGHTPTR < 2
.
SIFTTHEHEAPBYLEFT.
SUBTRACT 1 FROM LEFTPTR
PERFORM SIFTTHEHEAP
.
SIFTTHEHEAP.
COMPUTE TOPTR = LEFTPTR
COMPUTE FROMPTR = LEFTPTR + LEFTPTR
MOVE TABLEITEM (TOPTR) TO EXCHANGEITEM
PERFORM BUILDITEMSINTOHEAP
UNTIL FROMPTR > RIGHTPTR
MOVE EXCHANGEITEM TO TABLEITEM (TOPTR)
.
BUILDITEMSINTOHEAP.
IF FROMPTR < RIGHTPTR
COMPUTE COPYPTR = FROMPTR + 1
IF TABLEKEY (FROMPTR) < TABLEKEY (COPYPTR)
MOVE COPYPTR TO FROMPTR
.
IF EXCHANGEKEY NOT < TABLEKEY (FROMPTR)
COMPUTE FROMPTR = RIGHTPTR + 1
ELSE
MOVE TABLEITEM (FROMPTR) TO TABLEITEM (TOPTR)
MOVE FROMPTR TO TOPTR
COMPUTE FROMPTR = TOPTR + TOPTR
.
SIFTTHEHEAPBYRIGHT.
MOVE TABLEITEM (1) TO EXCHANGEITEM
MOVE TABLEITEM (RIGHTPTR) TO TABLEITEM (1)
MOVE EXCHANGEITEM TO TABLEITEM (RIGHTPTR)
SUBTRACT 1 FROM RIGHTPTR
PERFORM SIFTTHEHEAP
.
3. Quicksort
Sorting is a problem which has been solved. The fastest sort
algorithm is quicksort. If you ever need to sort significant amounts
of data, use this method. There are some notable aspects of
quicksort that you should be aware of, even if the detailed
mechanism is not important:
 Quicksort is a recursive procedure. This is easiest to
implement in a language such as Pascal or C, but quite possible
in COBOL, as we shall see.
 Quicksort is unstable. This means that two records with the
same key may not end up in the same order after sorting.
Heapsort and Combsort are also unstable. Bubble sort is, in
contrast, stable (its one good feature). You can add stability
to the sort by extending the key to include a sequence number,
so that there are in fact no duplicate keys.
 Quicksort performs badly once the amounts of data become small
due to the overhead of recursion. This can be avoided by using a
secondary sort when the partition size is less than some magic
figure.
 Quicksort performs badly with certain types of data; this can
be improved by judicious choice of a pivot point at each sort,
f. ex. the middle point.
The following highlyoptimised implementation is a good example
of pseudorecursive programming in COBOL.
It uses an insertion sort to handle small partitions:
01 SORTHANDLING.
02 CURLOWERLIMIT PIC S9(5) COMP.
02 CURUPPERLIMIT PIC S9(5) COMP.
02 PARTITIONSIZE PIC S9(5) COMP.
02 MINIMUMPARTITION PIC S9(5) COMP VALUE +13.
02 LOWERPTR PIC S9(5) COMP.
02 MIDDLEPTR PIC S9(5) COMP.
02 UPPERPTR PIC S9(5) COMP.
02 PREVPTR PIC S9(5) COMP.
02 STACKTOP PIC S9(5) COMP VALUE +20.
02 STACKPTR PIC S9(5) COMP.
02 STACKNXT PIC S9(5) COMP.
02 QUICKSORTSTACK OCCURS 20.
03 LOWERLIMIT PIC S9(5) COMP.
03 UPPERLIMIT PIC S9(5) COMP.
02 PIVITEM.
03 PIVKEY PIC X(9).
03 PIVDATA PIC X(11).
02 COMPITEM.
03 COMPKEY PIC X(9).
03 COMPDATA PIC X(11).
02 EXCHGITEM PIC X(20).
02 COMPARERESULT PIC X.
88 COMPAREGREATERTHAN VALUE "G".
88 COMPARELESSTHAN VALUE "L".
QUICKSORTTHETABLE.
MOVE 1 TO STACKPTR, LOWERLIMIT (STACKPTR)
MOVE TABLESIZE TO UPPERLIMIT (STACKPTR)
PERFORM QUICKSORTTHEENTRIES
UNTIL STACKPTR = ZERO
PERFORM INSERTIONSORT
VARYING UPPERPTR FROM 2 BY 1
UNTIL UPPERPTR > TABLESIZE
.
QUICKSORTTHEENTRIES.
MOVE LOWERLIMIT (STACKPTR) TO CURLOWERLIMIT
MOVE UPPERLIMIT (STACKPTR) TO CURUPPERLIMIT
IF CURLOWERLIMIT < CURUPPERLIMIT
MOVE CURLOWERLIMIT TO LOWERPTR
MOVE CURUPPERLIMIT TO UPPERPTR
COMPUTE PARTITIONSIZE = UPPERPTR  LOWERPTR + 1
IF PARTITIONSIZE > MINIMUMPARTITION
PERFORM FINDAPIVOTENTRY
PERFORM PARTITIONENTRIES
UNTIL UPPERPTR < LOWERPTR
PERFORM QUICKSORTTHEPARTITIONS
PERFORM INCREASEANDCHECKSTACKPTR
ELSE
SUBTRACT 1 FROM STACKPTR
ELSE
SUBTRACT 1 FROM STACKPTR
.
INCREASEANDCHECKSTACKPTR.
IF STACKPTR < STACKTOP
ADD 1 TO STACKPTR
ELSE
MOVE ZERO TO STACKPTR
.
FINDAPIVOTENTRY.
COMPUTE MIDDLEPTR = (LOWERPTR + UPPERPTR) / 2
MOVE TABLEITEM (MIDDLEPTR) TO PIVITEM
.
PARTITIONENTRIES.
MOVE "LT" TO COMPARERESULT
PERFORM INCREASELOWER
UNTIL COMPAREGREATERTHAN
MOVE "GT" TO COMPARERESULT
PERFORM DECREASEUPPER
UNTIL COMPARELESSTHAN
IF LOWERPTR NOT > UPPERPTR
PERFORM EXCHANGEUPPERANDLOWER
ADD 1 TO LOWERPTR
SUBTRACT 1 FROM UPPERPTR
.
INCREASELOWER.
MOVE TABLEITEM (LOWERPTR) TO COMPITEM
IF COMPKEY < PIVKEY
ADD 1 TO LOWERPTR
ELSE
MOVE "GT" TO COMPARERESULT
.
DECREASEUPPER.
MOVE TABLEITEM (UPPERPTR) TO COMPITEM
IF COMPKEY > PIVKEY
SUBTRACT 1 FROM UPPERPTR
ELSE
MOVE "LT" TO COMPARERESULT
.
EXCHANGEUPPERANDLOWER.
MOVE TABLEITEM (LOWERPTR) TO EXCHGITEM
MOVE TABLEITEM (UPPERPTR) TO TABLEITEM (LOWERPTR)
MOVE EXCHGITEM TO TABLEITEM (UPPERPTR)
.
QUICKSORTTHEPARTITIONS.
COMPUTE STACKNXT = STACKPTR + 1
IF UPPERPTR  CURLOWERLIMIT < CURUPPERLIMIT  LOWERPTR
MOVE CURLOWERLIMIT TO LOWERLIMIT (STACKNXT)
MOVE UPPERPTR TO UPPERLIMIT (STACKNXT)
MOVE LOWERPTR TO LOWERLIMIT (STACKPTR)
MOVE CURUPPERLIMIT TO UPPERLIMIT (STACKPTR)
ELSE
MOVE LOWERPTR TO LOWERLIMIT (STACKNXT)
MOVE CURUPPERLIMIT TO UPPERLIMIT (STACKNXT)
MOVE CURLOWERLIMIT TO LOWERLIMIT (STACKPTR)
MOVE UPPERPTR TO UPPERLIMIT (STACKPTR)
.
INSERTIONSORT.
MOVE "GT" TO COMPARERESULT
PERFORM INSERTINPLACE
VARYING LOWERPTR FROM UPPERPTR BY 1
UNTIL NOT COMPAREGREATERTHAN
.
INSERTINPLACE.
COMPUTE PREVPTR = LOWERPTR  1
IF PREVPTR < 1
MOVE "LT" TO COMPARERESULT
ELSE
MOVE TABLEITEM (PREVPTR) TO COMPITEM
MOVE TABLEITEM (LOWERPTR) TO PIVITEM
IF COMPKEY > PIVKEY
MOVE PIVITEM TO TABLEITEM (PREVPTR)
MOVE COMPITEM TO TABLEITEM (LOWERPTR)
ELSE
MOVE "LT" TO COMPARERESULT
.
4. Combsort
Just as we thought that the last word had been said about
sorting, a breakthrough comes along and spoils everything. In the
April 1991 issue of BYTE magazine, Stephen Lacey and Richard Box
show that a simple modification to bubble sort makes it a fast and
efficient sort method on par with heapsort and quicksort.
In a bubble sort, each item is compared to the next; if the two
are out of order, they are swapped. This method is slow because it
is susceptible to the appearance of what Box and Lacey call turtles.
A turtle is a relatively low value located near the end of the
table. During a bubble sort, this element moves only one position
for each pass, so a single turtle can cause maximal slowing. Almost
every long table of items contains a turtle.
Their simple modification of bubble sort which they call `combsort'
eliminates turtles quickly by allowing the distance between compared
items to be greater than one. This distance  the JUMPSIZE  is
initially set to the TABLESIZE. Before each pass, the JUMPSIZE is
divided by 1.3 (the shrink factor). If this causes it to become less
than 1, it is simply set to 1, collapsing combsort into bubble sort.
An exchange of items moves items by JUMPSIZE positions rather than
only one position, causing turtles to jump rather than crawl. As
with any sort method where the displacement of an element can be
larger than one position, combsort is not stable  like elements do
not keep their relative positions. This is rarely a problem in
practice.
Successively shrinking the JUMPSIZE is analogous to combing
long, tangled hair  stroking first with your fingers alone, then
with a pick comb that has widely spaced teeth, followed by finer
combs with progressively closer teeth  hence the name comb sort.
Lacey and Box came up with a shrink factor of 1.3 empirically by
testing combsort on over 200,000 random tables. There is at present
no theoretical justification for this particular value; it just
works...
Here is then the magic code (reusing declarations
and a code paragraph from bubble sort). It is
clearly correct, as it (unless the table is empty) ends with
JUMPSIZE = 1 (ensured by the `+3') and therefore degenerates into
bubble sort:
COMBSORTTHEARRAY.
MOVE TABLESIZE TO JUMPSIZE
PERFORM COMBSORT
UNTIL NOMORESWAPS
AND JUMPSIZE NOT > 1
.
COMBSORT.
COMPUTE JUMPSIZE = (10 * JUMPSIZE + 3) / 13
COMPUTE UPPERLIMIT = TABLESIZE  JUMPSIZE
MOVE SPACE TO SWAPINDICATOR
PERFORM COMPAREANDSWAPKEYS
VARYING ITEMNBR FROM 1 BY 1
UNTIL ITEMNBR > UPPERLIMIT
.
The careful termination test (JUMPSIZE NOT > 1) also caters for
the case where the table is empty.
5. Comparison of Table Sorting Methods
The above implementations of bubble sort, heapsort, quicksort,
and combsort were tested on from 200 to 1600 items, each consisting
of a 9character key and 11 characters of data. The times below are
measured in 1/1000ths of a second and are the averages of sorting
four different random sequences. The test program was run on a PC
(16 MHz 386SX) and on an AS/400 (B35). After the tables had been
sorted, they were sorted again to measure the time for sorting
already sorted data.
a. Results for Bubble Sort
Items: Order: Time PC: Time AS/400:
200 random 1072 5490
sorted 9 36
400 random 4372 24126
sorted 16 62
800 random 17875 100725
sorted 28 113
1600 random 71390 401785
sorted 55 213
Bubble sort is very fast on already sorted data, but very slow on
random data. As the number of data elements grows, bubble sort slows
to a crawl and becomes useless.
b. Results for Heapsort
Items: Order: Time PC: Time AS/400:
200 random 82 342
sorted 68 362
400 random 151 770
sorted 165 813
800 random 330 1734
sorted 357 1829
1600 random 783 3857
sorted 811 4046
Heapsort is of medium complexity and does a very good job. On
already sorted data it slows down a trifle.
c. Results for Quicksort
Items: Order: Time PC: Time AS/400:
200 random 68 275
sorted 20 127
400 random 165 650
sorted 68 273
800 random 330 1410
sorted 151 613
1600 random 715 2974
sorted 316 1374
Quicksort is of high complexity and owes its performance on sorted
data to the builtin insertionsort.
d. Results for Combsort
Items: Order: Time PC: Time AS/400:
200 random 110 453
sorted 55 326
400 random 192 1103
sorted 151 814
800 random 467 2621
sorted 316 1826
1600 random 1076 6299
sorted 715 4254
Combsort is just a single, easy line more than bubble sort, but
performs spectacularly; in fact, almost as well as the grotesquely
more complex quicksort. Because of its magic simplicity, this is the
method we recommend  except for the most demanding applications
where the 50 percent improvement quicksort offers may be important.
6. Pseudosorting
Sometimes it is not necessary to do a full sort of the array.
The classical case is that of finding the median, ie. that value for
which half of the array items are greater and the other half
smaller.
The following very fast algorithm is due to C.A.R. Hoare (CACM,
Jan. 1971). The program finds the element of the array whose value
is fth in order or magnitude; and rearranges the array such
that this element is placed at index f; and furthermore
that all elements with subscripts lower than f have lesser
values, and all elements with subscripts greater than f
have greater values.
To find the median, set f = (1 + ARRAYSIZE)/2 ;
to find the first quartile, set f = (1 +
ARRAYSIZE)/4 , etc.:
FINDTHEMEDIAN.
MOVE 1 TO LOWERBOUND
MOVE ARRAYSIZE TO UPPERBOUND
COMPUTE MEDIANINDEX = (1 + ARRAYSIZE) / 2
PERFORM REDUCEMIDDLEPART
UNTIL LOWERBOUND NOT < UPPERBOUND
.
REDUCEMIDDLEPART.
MOVE ARRAYKEY (MEDIANINDEX) TO MEDIANKEY
MOVE LOWERBOUND TO LOWERINDEX
MOVE UPPERBOUND TO UPPERINDEX
PERFORM NARROWTHERANGE
UNTIL LOWERINDEX > UPPERINDEX
IF MEDIANINDEX NOT > UPPERINDEX
MOVE UPPERINDEX TO UPPERBOUND
ELSE
IF MEDIANINDEX NOT < LOWERINDEX
MOVE LOWERINDEX TO LOWERBOUND
ELSE
MOVE LOWERBOUND TO UPPERBOUND
.
NARROWTHERANGE.
PERFORM INCREASELOWERINDEX
UNTIL ARRAYKEY (LOWERINDEX) NOT < MEDIANKEY
PERFORM DECREASEUPPERINDEX
UNTIL ARRAYKEY (UPPERINDEX) NOT > MEDIANKEY
IF LOWERINDEX NOT > UPPERINDEX
MOVE ARRAYITEM (LOWERINDEX) TO SWAPITEM
MOVE ARRAYITEM (UPPERINDEX) TO ARRAYITEM (LOWERINDEX)
MOVE SWAPITEM TO ARRAYITEM (UPPERINDEX)
PERFORM INCREASELOWERINDEX
PERFORM DECREASEUPPERINDEX
.
INCREASELOWERINDEX.
ADD 1 TO LOWERINDEX
.
DECREASEUPPERINDEX.
SUBTRACT 1 FROM UPPERINDEX
.
The algorithm uses the standard array
declarations of this paper with the following additional data
declarations:
01 FINDHANDLING.
02 LOWERBOUND PIC S9(5) COMP.
02 UPPERBOUND PIC S9(5) COMP.
02 MEDIANINDEX PIC S9(5) COMP.
02 LOWERINDEX PIC S9(5) COMP.
02 UPPERINDEX PIC S9(5) COMP.
The Author
Leif Svalgaard has been a software developer for the past 35 years. Leif's software development projects include extensive work with system level APIs (application programming interfaces) for a great many systems including AS/400, Unix, and PC; implementation of large networking systems used by several telephone companies; and as codeveloper of the historically important RC4000 multiprogramming Operating System.
From 1999  2000 he served as Senior Developer at PentaSafe Security Technologies, Inc., (Houston, TX) developing securityrelated software for the IBM AS/400.
Leif served as the Director of Development from 1994  1998 at T.O.S.C. International, Inc., (Houston, TX) with responsibility for developing and maintaining the ETK (Easy ToolKit) product. ETK is a large (1,000,000 lines) application written in Cobol (writing over half the code himself) that generates Cobol code that will run on any platform that has a Cobol compiler.
At Stanford University, from 1972 through 1978 Leif worked both as a computer software specialist and as a space physics scientist, participating in the design and implementation of the operating system and realtime control software for the Wilcox Solar Observatory (then the Stanford Solar Observatory) telescope.
Other areas in which Leif has extensive experience include PC Assembler, Unix Cprogramming and operating system interfaces, AS/400 COBOL and MI (assembler), IBM/MVS/CICS Assembler and CICS command/macro interface, VAX/VMS System Internals and assembler language, Data General System interfacing and assembler language, HP3000 System interfacing, and Bull GCOS7/8 System interfacing and integration. Leif recently developed a GUI interface for legacy screens using the AWT for Java.

