COBOL Performance Anecdote - HP-UX Scripting
Arnold J. Trembley
Wednesday I had lunch with Jim, a long-time friend. Jim is Manager of Technical Services for a chain of retail stores. They have a small IBM mainframe (MP3000) and lots of Hewlett-Packard Unix boxes and NT Servers. He told me a story related to COBOL performance.
They have a report file that is transmitted from the IBM mainframe to a Hewlett-Packard Unix box. This particular file was about six megabytes in size, and they needed to split it into smaller reports after it was received on the Unix box. Basically, they needed a program to read each line of the large report file and scan for the jobname. When it found one, it needed to create a new report file with the mainframe jobname as part of the Unix file name, and write all the report lines to that file until it encountered another jobname control break in the input data.
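The control-break logic described above can be sketched in a few lines of Python. This is only an illustration, not the script or the COBOL program from the story. The report layout is not described here, so the "JOBNAME=" header marker, the "report_<jobname>.txt" naming scheme, and the function name split_report are all assumptions made for the example.

```python
import re
from pathlib import Path

# ASSUMPTION: a job header line contains "JOBNAME=" followed by the job name.
# The real report layout is not given in the story.
JOB_HEADER = re.compile(r"JOBNAME=(\w+)")


def split_report(report_path, out_dir):
    """Split one report into per-job files; return the output paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []          # output paths, in order of first appearance
    current_job = None    # jobname of the section being copied
    current_file = None   # open output file for that section
    try:
        # latin-1 maps every byte to a character, so no line can fail to decode.
        with open(report_path, encoding="latin-1", newline="") as report:
            for line in report:
                match = JOB_HEADER.search(line)
                if match and match.group(1) != current_job:
                    # Control break: the jobname changed, so switch output files.
                    if current_file is not None:
                        current_file.close()
                    current_job = match.group(1)
                    # Hypothetical naming scheme with the jobname in the file name.
                    target = out_dir / f"report_{current_job}.txt"
                    # Append if this jobname already appeared earlier in this run.
                    mode = "a" if target in written else "w"
                    current_file = open(target, mode, encoding="latin-1", newline="")
                    if mode == "w":
                        written.append(target)
                # Lines before the first job header have no output file and are skipped.
                if current_file is not None:
                    current_file.write(line)
    finally:
        if current_file is not None:
            current_file.close()
    return written
```

Each input line is read once and written once, and only one output file is open at a time, so the work grows linearly with the size of the report.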
They gave this assignment to their HP-UX guru, who writes scripts. Since I am not a Unix expert, I am not sure which scripting tools would normally be used for this kind of task. I've heard of awk, sed, grep, and perl, but I don't know what was actually used. The HP-UX expert wrote a script that read in the six-megabyte file and created 25 output reports. It's simply a matter of splitting one large file into a set of smaller files, and it's a job that needs to run multiple times every day, whenever they need to send mainframe reports to the HP Unix box.
The Unix script ran for 51 minutes.
A six-megabyte file is not a particularly large file on an IBM mainframe (or even on an HP-9000), and Jim suggested that 51 minutes seemed like an excessively long time to split one file into 25 smaller files. Could it be sped up somehow?
The Unix guy optimized the script, changed this and that, ran it again, and got it down to 41 minutes. Can you do better? After some more tweaking and testing, the script ran in 29 minutes. That still seems like a long time; can you improve it some more? Sorry, that's absolutely the best that can be done.
Now Jim likes to poke around and look into problems. Twenty-one years ago, when I was an entry-level COBOL programmer, Jim was my project leader. Jim hadn't coded a COBOL program in over 15 years, had never seen ANSI-85 COBOL, and had never before written COBOL on a non-EBCDIC computer. But they have PeopleSoft applications on the HP-9000 and they have an HP-UX variant of Micro Focus COBOL. So Jim wrote a COBOL program of about 200 to 300 lines to split the file.
Since he knows I still code in COBOL he asked me some questions about new COBOL features he had never seen before, like EVALUATE and explicit scope terminators. One feature of HP-UX COBOL he was unfamiliar with was "SELECT file-name ASSIGN TO variable-name". This allowed him to build the name of the output file at run time, inserting the mainframe jobname into the Unix file name. He asked if COBOL for OS/390 & VM has that. Not yet, I said, but it's a common extension in non-IBM Mainframe COBOL compilers. In any event, it allowed him to finish the job in less time than the three script attempts. He said he spent about four hours total to develop the COBOL program on the Hewlett-Packard computer.
The COBOL solution runs in 0.7 SECONDS (best time), 1.5 SECONDS (worst time). The Unix expert had not even believed the job could be done in COBOL.
Comparing the best script time (29 minutes = 1740 seconds) to the COBOL program's worst time (1.5 seconds) gives approximately a 99.91% reduction in wall-clock run time, or a speedup of about 1,160 times, which is a phenomenal performance improvement.
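For anyone who wants to check that comparison, the arithmetic can be reproduced with a short Python snippet. The two timings are the ones reported above; the variable names are only illustrative.

```python
best_script_seconds = 29 * 60   # fastest script run: 29 minutes
worst_cobol_seconds = 1.5       # slowest COBOL run

# Share of the script's wall-clock time that the COBOL program eliminates.
reduction_pct = (1 - worst_cobol_seconds / best_script_seconds) * 100

# How many times faster the COBOL program is than the script.
speedup = best_script_seconds / worst_cobol_seconds

print(f"reduction: {reduction_pct:.2f}%  speedup: {speedup:.0f}x")
```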
Of course, this result can be attributed entirely to the difference between interpreted code and compiled code. I have no idea how the COBOL program would have compared to a compiled C program. But I have heard other anecdotes that suggest I/O in Unix COBOL programs is faster than I/O in Unix C programs.