by Shawn M. Gordon
INTRODUCTION
As hard as it is for the young programmer to believe, at least 65% of the world’s code is still written in COBOL, with 5 billion new lines being written every year. COBOL applications process 200 times more data per day than Google does. They don’t even teach it in school anymore, and that is a shame, because COBOL is a fantastic language for business, much better than anything else that has existed, with the possible exception of Python. This is because of its robust record structures, clean string variables, and conditional logic. While this article is primarily meant to help COBOL programmers understand C, you can easily reverse the examples. At some point that old COBOL code is going to have to be rewritten or maintained; one reason it is still there, and needs so little hand-holding, is how stable and clean COBOL is.
Remember that C and COBOL come from completely different roots, which is what makes comparisons so difficult.
PREPROCESSOR
COBOL gives us a couple of options for sharing previously defined record layouts and variable declarations. The easiest and most straightforward is $INCLUDE ‘filename’. This takes a file and copies it into the source file at compile time. It works just fine if you aren’t worried about modifying any of the statements, and it is analogous to “import” in Python or “#include” in C and C++.
The more versatile option is to use COBEDIT and maintain a COPYLIB. The advantage of using the COPY command to retrieve members from a COPYLIB is that you can do dynamic text replacement. So if you want to define your customer master data set twice, once for the data set and once for a file, you could set up a prefix in the COPYLIB member that you change at compile time by using the command:
COPY textname REPLACING ==pseudo-text-1== BY ==pseudo-text-2==.
This eliminates the need to maintain two different definitions in your COPYLIB.
While C does have an ‘#include’ statement, it doesn’t have an equivalent to the COPY statement. However, by using the ‘#if’, ‘#ifdef’, ‘#ifndef’ and ‘#elif’ commands you can control which files get included at compile time. Say, for example, you are coding some sort of portable application and you have defined different file I/O functions in different files to accommodate your different platforms. You could control it very easily with the following construct:
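/* the preprocessor compares integer values, not strings */
#define WINDOWS 1
#define MACOS 2
#define LINUX 3
#define SYS WINDOWS
#if SYS == WINDOWS
#include "windows.h"
#elif SYS == MACOS
#include "macos.h"
#elif SYS == LINUX
#include "linux.h"
#endif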
There are several important things to learn here. First off, the #define command is almost the same as $DEFINE in COBOL. They both allow you to define and name a macro; the macro is expanded “in-line”, so it will generate more efficient, if slightly larger, code. However, the COBOL define statement can be as long as you want, whereas a C macro ends at the end of the line that contains the #define, unless you continue it onto the next line with a trailing backslash. Passing parameters to a C macro is more confusing as well; just make sure to sprinkle it liberally with parentheses to ensure the order of precedence.
The #define can also work rather like an 88 level variable in COBOL, for example;
#define FALSE 0
#define TRUE (!FALSE)
I could then use FALSE in a test, as in ‘if (done == FALSE)’, much like using an 88 level in COBOL. The advantage here is that I can define TRUE to be ‘not’ FALSE by use of the ! (which means not) in the second #define statement; the parentheses keep the macro intact when it is used inside a larger expression. This doesn’t really have an equivalent in COBOL.
Now that I have explained just enough about the #define statement to get us through our first example, why don’t I finish explaining the example.
Since we have defined SYS to be Windows, the first #if test is true and the file windows.h will be copied into the program at compile time. The #elif is a really stupid way of saying ‘else if’; only the assembler gods know why they chose to abbreviate it the way they did. So if we want to compile our code to run on Linux we only need to change the #define line, and then our code will run on Linux if we have done our initial setup correctly.
Believe it or not you could accomplish almost the same thing using the little known and rarely used compiler switches available in COBOL. By using the following commands you could accomplish a similar effect.
$SET X0=OFF
$IF X0=OFF
COPY WINDOWS
$IF X0=ON
COPY LINUX
$IF
It is important to note that while COBOL is not case sensitive, C is. So myvar, Myvar, and MYVAR would actually be three different things. An informal standard that I have seen is that all your ‘regular’ C code is in lower case, and any macros (and sometimes functions) that you define are in upper case, to make it quick and easy to distinguish between what is part of C and what is yours. This isn’t a bad idea at all.
One last note while we are talking about getting started is how to put comments in your code. In COBOL you put a * in column 7 and then type whatever you want, or you can use the identification area in columns 73 through 80 of each line to put something meaningful. In C, comments are enclosed in a matching /* … */ pair; a comment can share a line with commands, sit on a line by itself, or spread across several lines. (C99 and later also accept // for a comment that runs to the end of the line.) Here are a few examples:
/* here is my comment */
/*
here is another comment
that spans multiple lines
*/
a=b; /* here is a comment on a line with a command */
VARIABLE DECLARATION
The most obvious next step is how to define variables in C. This is where C really has it over COBOL in some respects, as you can define global AND local variables. This would be the equivalent of having something like a state validation paragraph in a COBOL program that declared all the variables it needed at the top of the paragraph and then they went away when the paragraph returned to the calling process. Imagine, no more global variables getting declared that only got used once.
C differs significantly from COBOL in the way that variables are defined.
COBOL              C
---------------------------
PIC S9(4) COMP     short
PIC S9(9) COMP     long
PIC 9              char
An important note here on declaring simple integers in C: a standard declaration would be ‘int my_counter’. The length of my_counter depends on the native architecture of the machine it is compiled for. So on a 16 bit system, saying ‘int’ will declare a 16 bit integer, or PIC S9(4) COMP. The problem is that if you move that code to a 32 bit system, that declaration will suddenly be 32 bits, or PIC S9(9) COMP. We get around that by declaring them to be either a ‘short’ int, which is at least 16 bits, or a ‘long’ int, which is at least 32 bits (and is 64 bits on many 64 bit systems). When you need an exact size, C99 provides types such as int16_t and int32_t in the <stdint.h> header.
Our next variable type is the string or character array. C deals with strings in an extremely different way. Because everything is a single character you have to define an array of characters, and this is also how you have to reference it. Here is the C declaration, followed by how you would do it in COBOL and how you would have to do it in COBOL if it worked like C:
char name[8]; /* a variable called name that is eight characters */
01 NAME PIC X(08).
or if it was defined like C would have you do
01 NAME-ARRAY.
   05 NAME PIC X OCCURS 8 TIMES.
Actually, strings in C are null terminated, so if you needed to hold an eight character string you would make the array nine characters long to account for the null character at the end.
The most basic way to initialize a character array to spaces in C is to move a single space to each element of the array (the library function memset can do the same job in one call). Here would be your choices in the two languages:
01 NAME PIC X(08) VALUE SPACES.
or
MOVE SPACES TO NAME.
or
INITIALIZE NAME.
int i;
char name[9];
for (i = 0; i < 8; i++) /* C arrays start at element 0 */
    name[i] = ' ';
name[8] = '\0'; /* the ninth element holds the null */
Pretty nasty, huh? I will explain all the things in the C statement that didn’t make sense later. C also differentiates between a character and a string. Since the ‘char’ type only really declares a single character, it doesn’t need to be null terminated, so these two declarations are different:
char flag;
char flag2[1];
You would actually need to make ‘flag2’ an array of two characters for it to function the way you would expect, because of the null terminator on strings. A null is written in C as '\0', and all of the string manipulation functions rely on the proper placement of the null character so that you will get the expected output.
The distinction between a character and a string in C takes a little getting used to. For example, if you want to initialize our character variable flag to Y you would enclose it in single quotes, i.e., 'Y'. Single quotes denote a single character, whereas for flag2 you would use double quotes to enclose the string, i.e., "Y". It is VITAL that you keep this straight; some compilers won’t complain if you use this incorrectly and you could get some really unpredictable results.
Now that we know how to declare simple variables in C, how would we declare a record structure analogous to the 01 variable declaration in COBOL? There is what is known as the ‘struct’ in C that is used for this exact purpose, although the implementation is a little bit odd. First let’s declare a simple layout in COBOL, then I’ll do the same in C;
01 CUST-MAST.
   03 CM-NUMBER   PIC X(06).
   03 CM-NAME     PIC X(30).
   03 CM-AMT-OWE  PIC S9(9) COMP.
   03 CM-YTD-BAL  PIC S9(9) COMP OCCURS 12.
   03 CM-PHONE    PIC X(10).
struct customer_master {
    char cm_number[7];
    char cm_name[31];
    long cm_amt_owe;
    long cm_ytd_bal[12];
    char cm_phone[11]; /* ten characters plus the null */
};
struct customer_master cust_mast;
The ‘struct’ verb declares a template of the record type that you are concerned with, once the template is declared you can then declare a variable that is a type of that structure. So the line:
struct customer_master cust_mast
declares a variable ‘cust_mast’ to be of type ‘customer_master’. You would then reference the members of the structure by specifying the variable name, a dot, and the member, i.e., cust_mast.cm_name. This can be especially handy if you are going to reuse a structure for a different purpose. The drawback is that C has nothing quite like the very handy COBOL INITIALIZE verb. An initializer such as ‘= {0}’ or a call to memset will set every member to zero, but setting the character members to spaces still means addressing each member individually; you could write a general purpose initialization function in C that would serve the same purpose. You can also declare the variable at the same time as you declare the structure if you don’t want to reuse it: after the } and before the ; just put any variable name that you want it to have.
The last common verb used in the Working Storage section is REDEFINES. The C version is the ‘union’ keyword. REDEFINES is mostly handy for working with a variable as either alpha or numeric. Since byte referencing (reference modification) was introduced in COBOL-85, you hardly ever see a REDEFINES used to get at various substrings within a variable anymore. Now let’s look at how you would declare a REDEFINES and a ‘union’.
01 CUST-MASTER.
   03 CM-DAYS-LATE PIC XXX.
   03 CM-DL-REDEFINE REDEFINES CM-DAYS-LATE.
      05 CM-DL-NUM PIC 999.
union redef {
    char days_late[4];
    int dl_num;
};
union redef days_late_test;
The setup and use of unions is very similar to structs; you can even put a union inside a struct, which is where you would want to use it most of the time anyway. We made ‘days_late’ a character array of 4 because we have to remember to account for the null character. You can do all sorts of strange things with unions if you care to, but that is really all I am going to touch on.
One other type that I want to touch on is the enumerated type. By using the ‘enum’ keyword, we can create a new “type” and specify values it may have. (Actually, ‘enum’ is type ‘int’, so we really create a new name for an existing type.) The purpose of the enumerated type is to help make a program more readable, like the COBOL 88 level. The syntax is similar to that used for structures;
enum ranges {min = 10, max = 100, mid = 55};
enum ranges tester;
tester = mid;
if (tester > min)
The if statement would be true because tester would have a value of 55. I suggest that if you want to use enumerated types that you read up on them a heck of a lot more than what I just touched on here.
The last point I want to make about variable declaration is that C has very little facility for applying edit masks compared to COBOL. This makes it a less than convenient language for writing reports and such where date and dollar edit masks are used extensively.
OPERATORS
Let’s first go through the difference between = and == in C. In COBOL there are several ways to get data into a variable. A common way is the MOVE verb; the equivalent in C is =. This is confusing because if you set up a logical test and use =, no comparison is made: the value on the right hand side of the equal sign is assigned to the variable on the left side, and the test is true whenever that value is non-zero. You need to use == if you want to compare values. Here is a simple mistake to make that can cause all sorts of problems. If in C I were to say
if(my_int = 5)
what would happen is that the variable ‘my_int’ would be assigned the value 5, and the value of an assignment is the value that was assigned. Since 5 is non-zero, which indicates true, this ‘if’ statement would always be true. Simple mistake, major repercussions, and the compiler may not complain about it, because it’s a valid statement (many compilers will flag it only if you turn warnings on). Make sure you learn the distinction between = and == early and never forget it.
Our next boolean operator is ‘AND’. COBOL makes it very easy to compare multiple values in an IF statement;
IF (VAR1 = VAL1) AND (VAR2 = VAL2 OR VAL3) THEN
The C representation for ‘AND’ is ‘&&’, ‘OR’ is ‘||’, and ‘NOT EQUAL’ is identified by ‘!=’.
None of these are particularly difficult to learn, but they are less than intuitive when first learning the language.
Being creative, you could make use of #define to make the operators anything you want, for example:
#define AND &&
#define OR ||
#define NOT !=
#define EQUALS ==
Now this may tick off the other C programmers, but if it helps you to make your code more readable and easier to maintain, who’s to say it’s wrong?
ASSIGNMENT OPERATORS
Since I already talked about how the = sign in C is equivalent to the COBOL MOVE verb, I won’t talk about it again. However, = cannot copy a string into a character array; for that you use the string function ‘strcpy’, which performs a ‘string copy’ into a variable. This is a good way to initialize a character array and make sure that a null is properly placed (strcpy appends the null to the end of the string copied in), especially if you want to append data to the string later. The function in C to append strings is ‘strcat’, for ‘string concatenation’. In COBOL we have the ‘STRING’ verb, which gives you much finer control over how variables are concatenated. All ‘strcat’ does is copy characters from one array, until it hits a null, into another character array, starting at that array’s terminating null. You can see now why having a null in the right place is so important.
There are a couple of really cool increment operator shortcuts in C that I absolutely love. In COBOL if you want to increment a variable in a loop for instance you can do it a couple of ways;
ADD 1 TO KOUNT
COMPUTE KOUNT = KOUNT + 1
ADD 1 TO KOUNT GIVING KOUNT
PERFORM VARYING KOUNT FROM 1 BY 1 UNTIL KOUNT >= MAX
In C you could say the following;
kount = kount + 1;
kount += 1;
kount++;
for (kount = 1; kount < max; kount++)
The first example should be obvious. In the second example the += is a shortcut for the first; it means to include the variable on the left side of the = sign in the computation on the right side. The third example means increment the value by 1. Our first two examples could have added any value, but the third one simply increments by one. The last example is equivalent to the last COBOL example; note that the test is reversed, because a C ‘for’ loop runs while its condition is true, whereas PERFORM ... UNTIL runs until its condition becomes true. What is interesting is the last parameter, ‘kount++’. By putting the ++ on the right side of the variable we are saying to use the current value in the surrounding expression and then increment it. If the ++ is put on the left side, as in ‘++kount’, it means to increment BEFORE the value is used. In the ‘for’ statement above the result of the expression is not used, so either form gives the same loop. The closer match for the PERFORM directives ‘WITH TEST BEFORE’ and ‘WITH TEST AFTER’ is the choice between the ‘while’ and ‘do..while’ loops described in the next section. The - sign can be used to decrement in the exact same fashion, i.e., -= or --.
LOGICAL CONSTRUCTS
Hopefully everyone here is familiar with the COBOL PERFORM statement. It has more variations than I am willing to get into, but it is the only looping construct COBOL has, unless you count using GO TO. C offers several different looping controls; there are:
while
do..while
for
Now COBOL nicely lumps the functionality of both the ‘while’ and ‘for’ loops into the PERFORM. The ‘do..while’ loop is not quite as direct, although PERFORM ... WITH TEST AFTER gives the same effect. In essence, the difference between ‘while’ and ‘do..while’ is that the ‘do..while’ loop will ‘do’ the loop at least once, since the test isn’t made until AFTER the loop has been executed once. In the ‘while’ loop your test may already fail the first time it is made, so you may never actually go through the loop.
int i = 21;
while (i < 20) {
    printf("%d\n", i);
    i++;
}
/* the above loop will display nothing since i is already greater than 20 */
do
{
    printf("%d\n", i);
    i++;
} while (i < 20);
/* the above loop will display 21 */
Let’s talk about the ‘if’ statement. Here is an example of how confusing a string comparison operation can be. In COBOL the following statement is very straightforward:
IF STRING1 IS EQUAL TO STRING2
or
IF STRING1 = STRING2
You cannot compare strings that way in C. There is a function declared in the ‘string.h’ header file that will compare two strings; however, to get the same result it would have to be worded as follows:
if (!strcmp(string1, string2)) {
Let me explain. First off, ‘strcmp’ is a string compare function. If string1 is less than string2 then a value less than zero is returned (sort of like using Condition Code in COBOL). If string1 is equal to string2 then a value of zero is returned. If string1 is greater than string2 then a value greater than zero is returned. The problem is that if you use strcmp in an ‘if’ statement and the strings are equal then zero is returned, and zero indicates that the ‘if’ statement is false; that is why we prefaced ‘strcmp’ with ‘!’, which means not. This has the net result of producing a non-zero value, which is TRUE for the ‘if’ statement, when string1 and string2 are equal. (The related function ‘strncmp’ takes a third parameter that limits how many characters are compared.) This also further illustrates how logical expressions can be embedded in the ‘if’ statement.
I/O
C is blessed with having no I/O facilities built into the language at all. So how do you do any sort of terminal or file I/O? Fortunately somebody back in the dark ages of C programming wrote the Standard I/O header file. So if you want to do any I/O you must include <stdio.h>. I will talk about some of the more basic features and functions included in stdio.
The most commonly used functions from stdio are ‘printf’ and ‘scanf’. ‘printf’ is used to display information to standard output (stdout) and ‘scanf’ is used to read information from standard input (stdin). Both of these functions have extensive formatting capabilities included in their usage. COBOL is nice because you can use DISPLAY and ACCEPT to read or write virtually anything you want. You can’t, however, do type conversion or variable formatting in the statements themselves; you would have to declare a formatting variable in working storage first. While C gives very little to no capability for declaring formatting variables, it does give you extensive control over formatting your output. A simple example would be displaying a number that has two decimal places.
01 EDIT-INT PIC ZZ9.99.
01 MY-INT PIC S9(3)V99.
MOVE MY-INT TO EDIT-INT.
DISPLAY EDIT-INT.
float my_int;
printf("%6.2f\n", my_int);
As you can see, it’s simpler to format in C, if somewhat less intuitive. The way ‘printf’ works is that it takes a literal and/or formatting string as the first parameter, which is the part inside the quotes, and then the variables to substitute as the remaining parameters. The \n means issue a new line at that point. Here is an example of embedded text:
printf("I have %d apples and %d oranges",count_apple, count_orange);
This would substitute ‘count_apple’ for the first %d and ‘count_orange’ for the second. There are almost a dozen different formatters available in ‘printf’.
Another interesting feature of ‘printf’ is its ability to do type conversion. For this example you need to know that %d means print an integer and %c means print a character:
printf("%c %d\n", 'A', 'A');
What do you think the output of this would be? Odds are you guessed wrong: you would see “A 65”, because specifying %d for the character 'A' formats it as the decimal ASCII code for ‘A’, which is 65.
Like ‘printf()’, ‘scanf()’ uses a control string followed by a list of arguments. The main difference is in the argument list: printf() uses variable names, constants, and expressions, while scanf() uses pointers to variables. Fortunately, we don’t have to know anything about pointers to use the function. Just remember these two rules:
1. If you want to read a value for a basic variable type, precede the variable name with an &.
2. If you want to read a string variable, don’t use an &.
Here is a short example of displaying output and prompting for input:
#include <stdio.h>
int main(void)
{
    int age;
    float assets;
    char pet[30];
    printf("enter your age, assets, and favorite pet.\n");
    scanf("%d %f", &age, &assets);
    scanf("%29s", pet); /* no & for char array; 29 leaves room for the null */
    printf("%d $%.0f %s\n", age, assets, pet);
    return 0;
}
Which would look something like this if you were to run it.
enter your age, assets, and favorite pet.
32
10507.32
penguin
32 $10507 penguin
An interesting point here is that the scanf() can read more than one variable in at a time, unlike the COBOL ACCEPT verb. One last point on printf(), the following 2 statements work identically:
printf("Enter your option ");
DISPLAY "Enter your option " NO ADVANCING.
As I mentioned earlier, C is geared towards single characters, not strings. So there is a whole set of functions in stdio that are geared towards reading and writing single characters. Since COBOL cares very little about how big an array you use to read or write, I am not going to get into the specifics; just remember these four function names:
getchar
putchar
getc
putc
I am going to cover just two more COBOL verb comparisons before I get into showing some small program shells. Two of my favorite verbs are STRING and UNSTRING, they are used to concatenate variables and literals and to parse strings based on user defined tokens. In general they are very easy to use, their C counterparts however aren’t. I will just run through the COBOL example and then show the exact same code in C.
01 FULL-NAME.
   03 FN-FILE    PIC X(08) VALUE "MYFILE".
   03 FN-GROUP   PIC X(08) VALUE "MYGROUP".
   03 FN-ACCOUNT PIC X(08) VALUE "MYACCT".
01 WS-FULL-NAME  PIC X(26) VALUE SPACES.
PROCEDURE DIVISION.
A1000-START.
    DISPLAY FULL-NAME.
* displays "MYFILE MYGROUP MYACCT"
    STRING FN-FILE    DELIMITED BY SPACES
           "."        DELIMITED BY SIZE
           FN-GROUP   DELIMITED BY SPACES
           "."        DELIMITED BY SIZE
           FN-ACCOUNT DELIMITED BY SPACES
      INTO WS-FULL-NAME.
    DISPLAY WS-FULL-NAME.
* displays "MYFILE.MYGROUP.MYACCT"
    MOVE SPACES TO FULL-NAME.
    UNSTRING WS-FULL-NAME DELIMITED BY "."
      INTO FN-FILE FN-GROUP FN-ACCOUNT.
    DISPLAY FULL-NAME.
* displays "MYFILE MYGROUP MYACCT"
#include <stdio.h>
#include <string.h>
int main(void)
{
    char ws_full_name[27];
    struct full_file_name {
        char file[9];
        char group[9];
        char acct[9];
    } fn;
    strcpy(fn.file, "MYFILE");
    strcpy(fn.group, "MYGROUP");
    strcpy(fn.acct, "MYACCT");
    strcpy(ws_full_name, fn.file);
    strcat(ws_full_name, ".");
    strcat(ws_full_name, fn.group);
    strcat(ws_full_name, ".");
    strcat(ws_full_name, fn.acct);
    printf("%s\n", ws_full_name); /* displays "MYFILE.MYGROUP.MYACCT" */
    return 0;
}
You know, I don’t think I am going to show the equivalent of UNSTRING, it is just too confusing if you are just getting started with C. The function is called ‘strtok’ and it requires that you use pointers to strings, and since I didn’t really get into pointers at all I don’t want to confuse the issue. Keep in mind however that you MUST understand how pointers in C work or you will never be able to use the language effectively. It’s just that a full discussion of pointers is beyond the scope of this paper.
Anyway, in our C example we used the ‘strcpy’ function to copy a string into a variable. The function also adds the null terminator and essentially initializes our character array. The ‘strcat’ function concatenates the string in the second parameter to the variable named in the first parameter. It looks for the null terminator and then starts writing the string there.
As you can see, it is more difficult and roundabout to deal with strings in C than COBOL. Now don’t get the wrong idea, I think C is great for some things, it’s just that if you are coding a standard business type application a lot of those things aren’t necessary.
CODE SKELETONS
OK, so just what do you have to do to write a program in C? As you know, COBOL has its four divisions that must be used:
IDENTIFICATION DIVISION
ENVIRONMENT DIVISION
DATA DIVISION
PROCEDURE DIVISION
Each of these has its own purpose in life as to what it describes. In C everything is based on the function. As you saw in my previous example, I had ‘main()’ towards the top of my program. You can think of main() as the PROCEDURE DIVISION. It is a function just like everything else in your C program will be, and it is usually the only function you MUST have. I say usually because if you are writing a series of subroutines that are going to go into an RBM you don’t need to name any function main().
So to write the classic ‘Hello World’ program in C, this is all you would need to do.
#include <stdio.h>
int main(void)
{
    printf("Hello World\n");
    return 0;
}
That is a lot less code than it would take under COBOL, but it’s still more than you would have to do in BASIC. The problem that I see here is that pulling in the standard I/O library just for the one printf() call can make your program larger than you would expect for something so trivial, particularly when the library is statically linked.
CODING FOR PORTABILITY OR SPEED
You hear a lot about how portable C is, and how you should code for portability. There are just so many things that change from platform to platform that it could be really tedious to code for portability. I know there are many toolkits that help with that, like Qt, but we’re just talking raw code here.
If you code to take advantage of your CPU’s native architecture you can see a great increase in speed and a reduction in code size. If you code for portability by using only the language’s native constructs you may end up with a larger, slower program. If you want the best of both worlds you will want to code your own intermediate include files that allow you to switch between architectures.
SUMMARY
So what do I conclude from all this? Before I tell you, let me tell you about a letter I read that someone had sent in to a Computer Language magazine. The gentleman was outraged that they had bothered to include COBOL in their magazine, and said that if they were going to do that they might as well include RPG too, since neither one was a real language. His criteria for a real language pretty much consisted of the few things you can do in C that you can’t do in COBOL, such as using pointers and declaring local variables. So why does this gentleman so love a language that has no built-in I/O facility and deals with strings as an array of bytes?
Now this had to have been one of the more asinine statements I have ever seen. COBOL is one of the most popular languages for business applications, and while I don’t care for RPG, IBM has made sure that it has maintained a huge installed base as well. So why do I bother to mention this? Because as I said in my opener, there is still a ton of COBOL out there, and it is probably managing your paycheck and investments, so don’t be too quick to discount it.