JOSHUA HOLBROOK github : twitter

Why COBOL is (almost) worth learning (posted 04 Feb 2012)

Every once in a while, I decide to dive into something obscure. Y’know, for the lulz. A few days ago, I decided to take a look at cobol. After all, opencobol makes it pretty easy to run cobol scripts.

I have to admit though, I was surprised by cobol. I can’t say it’d be my first choice for a non-lulz project, but was almost surprisingly innovative considering its pedigree.

First, cobol is a relatively small language (here’s an almost painfully complete programmer’s guide). Despite being compiled (at least in this case), cobol almost feels like a scripting language, on par with basic. Part of this could be due to the heavy use of keywords, in what seems like a naive effort to make the language more readable (modern examples include inform 7), but I believe a lot of it is due to its unified data model. This data model is simultaneously cobol’s greatest insight, and yet its greatest downfall.

A little bit of history

As we’ve all probably heard, cobol was basically invented by Grace Hopper a really long time ago. In fact, the first official cobol standard was developed in [Fall 1959]. As it turns out, cobol is one of the oldest higher-level languages:

Each of these languages pioneered in different fields. Fortran is best for fast computation, whereas lisp is good for dealing with the sort of problems that tickle a hacker’s brain. Cobol, despite its Milton-esque reputation, was the earliest way to tackle the sort of problems that “CRUD apps” have to handle every day, and in fact its influence is still being felt in the sexiest of systems architectures today.

Tables in COBOL

Since day one, one of cobol’s main uses was handling persistent data. In the beginning, this meant reading and writing to a flat file.

Cobol has a special syntax for for defining “buffers” that looks something like this:

      *> Think of this as a bytes buffer.
       01 SomeString.
         03 SomeSubstring pic x(4) value 'foo '.
         03 SomeOtherSubstring pic x(7) value 'bar baz'.

Conceptually, you end up with a buffer here that looks like:

[foo bar baz]
  |     |_____SomeOtherSubstring

This method of describing buffers is used univerally, with a similar interface for all of the following:

  • variables
  • text files
  • the computer screen
  • ISAM indexed files (think berkeley db)

Databases and COBOL

So, some of the first applications written in cobol basically used these tables, stored in files to varying levels of sophistication, to store data. In fact, the same people that defined cobol also worked on some of the first databases. This work largely preceded relational databases, but it was with cobol that the concepts of not just databases but data description languages and data query languages were developed.

In a sense, cobol’s ISAM-based tables form some kind of reverse ORM.


This exact same mechanism was also used to build user interfaces. For instance, here’s a quick and dirty html templating example I wrote so I could figure out how all this stuff works:

      *> ***************************************************************
      *> Holy shit cobol totally had templates
      *> ***************************************************************
       identification division.
       program-id. template.

       environment division.
       configuration section.

       input-output section.

       data division.
      *> file section.
      *> fd .
      *>     01 .

       working-storage section.
      *> Cobol allows you to map variable names to substrings. This is used
      *> (abused) to build up GUIs and such, and *also* to modify text in
      *> predictable locations of text files.
       01 HtmlTemplate.
          03 L1 pic X(6)          value '<html>'.
          03 L2 pic X(9)          value '  <head>'.
          03 L3 pic X(11)         value '    <title>'.
          03 L4.
            05 filler pic X(6)    value spaces.
            05 PageTitleX pic X(40).
          03 L5 pic X(12)         value '    </title>'.
          03 L6 pic X(10)         value '  </head>'.
          03 L7 pic X(8)          value '  <body>'.
          03 L8.
            05 filler pic x(8)    value '    <h1>'.
            05 PageTextX pic x(40).
          03 L9  pic x(9)         value '    </h1>'.
          03 L10 pic x(9)         value '  </body>'.
          03 L11 pic x(7)         value '</html>'.

      *> Here's some variables!
       01 PageTitle pic x(22) value 'This is a page title.'.
       01 PageText  pic x(23) value 'This is some page text!'.

       procedure division.

      *> Just copy the buffers over, right?
         move PageTitle to PageTitleX.
         move PageText to PageTextX.

      *> There's a better, more idiomatic way to do this, but I haven't quite
      *> figured it out yet. C'est la vie.
         display L1.
         display L2.
         display L3.
         display L4.
         display L5.
         display L6.
         display L7.
         display L8.
         display L9.
         display L10.
         display L11.

       end program template.

Using a similar approach, one may also build those text user interfaces we still see on old cash registers from time to time.

Basically, Cobol On Cogs should exist.

In many ways, programming for the web today isn’t all that different from it was in cobol, conceptually. Of course, anything you can do in cobol, you can probably do better in [perl|python|ruby|javascript|*]. That said, with a few targeted open source libraries cobol could probably feel almost modern:

  • Throw some MVC on that shit. Due to the ease at which cobol can be copypasted together, cobol tends to develop into spaghetti very quickly. Some sort of heavy-handed framework for code organization would probably help things a lot.
  • Build a templating edsl. I suspect that you could build something like a hacky version of haml using copybooks. You have to admit that, even if it sucked, it would be totally badass for just existing.
  • Get your cgi on. OpenCobol actually already supports cgi, and fans are working on this aspect already. An alternate approach would be to call cobc-generated .so libraries from c or similar.