Friday 22 November 2013

Refactoring

Clearly the business of opening a text file, reading through it line by line, and returning some result is going to be a general process.  Therefore let us factor out the general process from the specific application.  How about this:

The top function to process the file takes a file name, some function to be applied to each line of the file, and a starting value to be threaded through the lines of the file. We open the file and then call the sub-function that executes the loop, passing across the stream, the function and the starting value. When this returns it gives us the final value, so we can close the file and return that value, like so:

process_file(Filename, Fun, StartValue) ->
    {ok, Stream} = file:open(Filename, [read]),
    FinalValue = process_file_loop(Stream, Fun, StartValue),
    file:close(Stream),
    FinalValue.
 
Our function that executes the loop will require the stream and the function and an initial value.  It attempts to read a line but if no more is available it returns the value it was given, nothing more to be said.  However if it gets a line it can call the supplied function with the line and with the starting value, and it will get back some new value.  It can then call itself with this new value to continue the loop, like this:

process_file_loop(Stream, Fun, Value) ->
    case io:get_line(Stream, "") of
        eof ->
            Value;
        Line ->
            NewValue = Fun(Line, Value),
            process_file_loop(Stream, Fun, NewValue)
    end.

Right, that's the general case code.  In our specific application the function we want to apply is that bit of code that writes the line to the terminal with the line number and then increments the line number. We don't do anything with the line number in the end but it has to be threaded through.  We don't have to define this as a named function - with the syntax fun(...)->... we can create it as an anonymous function just when we need it.  So we set the process in motion by calling the process_file function passing the file name, the function we want to apply, and the starting value which is the line number 1.  So our main() function is this:


main([Filename]) ->
    process_file(Filename, 
         fun(Line, LineNumber)->
             io:format("~4.10.0B ~s", [LineNumber, Line]),
             LineNumber+1
         end,
         1).

Still works?

C:\Users\polly\Erlang>escript listout.erl hello.erl
0001
0002 main([]) ->
0003     io:format("Hello World");
0004 main([Arg]) ->
0005     io:format("Hello ~s", [Arg]).
C:\Users\polly\Erlang>

Yep.  Hey, I just did a lambda.  Thank you madam, your G&T is on its way.
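The payoff of factoring out the general process is that process_file/3 can be reused unchanged for a different job. Here is a sketch that counts lines and characters by threading a {LineCount, CharCount} tuple through the file instead of a single line number. count_stats/1 is a hypothetical helper invented for this example, and the character count includes each line's newline.

```erlang
#!/usr/bin/env escript
%% Sketch: reusing process_file/3 for a different job.
%% count_stats/1 is a hypothetical helper invented for this example.

main([Filename]) ->
    {Lines, Chars} = count_stats(Filename),
    io:format("~b lines, ~b characters~n", [Lines, Chars]).

%% Thread a {LineCount, CharCount} tuple through the lines of the file.
%% length(Line) counts every character of the line, newline included.
count_stats(Filename) ->
    process_file(Filename,
                 fun(Line, {LineCount, CharCount}) ->
                         {LineCount + 1, CharCount + length(Line)}
                 end,
                 {0, 0}).

process_file(Filename, Fun, StartValue) ->
    {ok, Stream} = file:open(Filename, [read]),
    FinalValue = process_file_loop(Stream, Fun, StartValue),
    file:close(Stream),
    FinalValue.

process_file_loop(Stream, Fun, Value) ->
    case io:get_line(Stream, "") of
        eof ->
            Value;
        Line ->
            process_file_loop(Stream, Fun, Fun(Line, Value))
    end.
```

Only the fun and the starting value changed; the file handling is untouched.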

Thursday 21 November 2013

Retaining a Value

Ok that's fine but now I want to print out a line number against those output lines.  Therefore the line number is an additional argument to the looper function:  set it to 1 on the first call from the main function, and then add one to it each time it is re-called from inside the looper.  Basic.  Like this:


main([Filename]) ->
    {ok, Stream} = file:open(Filename, [read]),
    listout_loop(Stream, 1),
    file:close(Stream).

listout_loop(Stream, LineNumber) ->
    case io:get_line(Stream, "") of
        eof ->
            ok;
        Line ->
            io:format("~b ~s", [LineNumber, Line]),
            listout_loop(Stream, LineNumber+1)
    end.

Speaking in rodent terms, then, in a procedural language you might keep the line number in a variable that you increment each time through the loop, squirrel-like, burying your treasure in the ground for easy access later: whereas in a functional language you keep your variable quantity balanced on your call parameters, hamster-like, running with your essentials stuffed in your cheeks.

So, just for the exercise, how would I return a result from the process?  Say I wanted to see the number of lines?  Well, when I hit eof I can return the LineNumber value - actually no, I will want LineNumber-1 because if I got eof then there was no line number LineNumber.  Then write this out at the end, for no reason except to prove I can do it.

Note incidentally that we don't close the file in the loop function when it gets the eof flag:  i.e. we don't say

    eof ->
        file:close(Stream).

By the Toybox Rule (If You Get It Out You Put It Away) it's the function that opens the file that should close it.
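The Toybox Rule is easy to break by accident, though: if anything inside the loop raises an exception, the call to file:close/1 is never reached. A try ... after ... end block runs its after part whether the body returns normally or crashes. A sketch, with count_lines/1 as a hypothetical helper:

```erlang
#!/usr/bin/env escript
%% Sketch: keeping the Toybox Rule even if the loop crashes.
%% count_lines/1 and count_loop/2 are hypothetical helpers for this example.

main([Filename]) ->
    io:format("(~b lines)~n", [count_lines(Filename)]).

count_lines(Filename) ->
    {ok, Stream} = file:open(Filename, [read]),
    try
        count_loop(Stream, 0)
    after
        %% Runs whether count_loop/2 returns normally or raises an exception.
        file:close(Stream)
    end.

count_loop(Stream, Count) ->
    case io:get_line(Stream, "") of
        eof ->
            Count;
        _Line ->
            count_loop(Stream, Count + 1)
    end.
```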


main([Filename]) ->
    {ok, Stream} = file:open(Filename, [read]),
    LineCount = listout_loop(Stream, 1),
    file:close(Stream),
    io:format("(~b lines)", [LineCount]).

listout_loop(Stream, LineNumber) ->
    case io:get_line(Stream, "") of
        eof ->
            LineNumber-1;
        Line ->
            io:format("~4.10.0B ~s", [LineNumber, Line]),
            listout_loop(Stream, LineNumber+1)
    end.

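The mysterious format definition "~4.10.0B" requests a field four characters wide, in base 10, with 0 as the fill-up character. The B control sequence writes an integer in the base given by the middle (precision) field, so it has nothing to do with binary as such, although "~.2B" would indeed print a number in binary.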

So it looks like this:

C:\Users\polly\Erlang>escript listout.erl hello.erl
0001
0002 main([]) ->
0003     io:format("Hello World");
0004 main([Arg]) ->
0005     io:format("Hello ~s", [Arg]).
(5 lines)
C:\Users\polly\Erlang>
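A quick way to check what each field of ~F.P.PadB does is to format a few numbers with io_lib:format/2, which returns the text instead of printing it. fmt/2 is a hypothetical helper for this sketch; the comments show the strings produced under the standard io formatting rules (field width F, base P, pad character Pad, right-justified by default).

```erlang
#!/usr/bin/env escript
%% Trying out the ~F.P.PadB control sequence:
%% F = field width, P = number base (2..36), Pad = fill character.

%% Format Args with Format and return the result as a flat string.
fmt(Format, Args) ->
    lists:flatten(io_lib:format(Format, Args)).

main(_) ->
    Examples = [fmt("~4.10.0B", [42]),   % "0042"
                fmt("~4.16.0B", [255]),  % "00FF"
                fmt("~.2B", [5]),        % "101"
                fmt("~4B", [7])],        % "   7" (default pad is a space)
    lists:foreach(fun(S) -> io:format("[~s]~n", [S]) end, Examples).
```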

I admit I'm getting to like the idea that starting-lower-case identifiers are just atoms that mean themselves and can go anywhere with impunity.  Something has always told me that Nothing Of Importance Should Depend On The Case Of An Identifier but maybe the advantages justify breaking the rule.  These atoms do what symbols do in Lisp but in Lisp I have sometimes been caught out by forgetting whether I have quoted something or by quoting a quote.  Anyway.

Exercise: make the size of the line number field a parameter that can be passed in on the command line:-)
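For what it's worth, here is one possible answer, assuming the width arrives as a second command-line argument (for example escript listout.erl hello.erl 6). It relies on io:format accepting * in place of the field width, in which case the width is taken from the next item in the argument list.

```erlang
#!/usr/bin/env escript
%% One possible answer to the exercise: escript listout.erl Filename Width
main([Filename, WidthArg]) ->
    Width = list_to_integer(WidthArg),
    {ok, Stream} = file:open(Filename, [read]),
    LineCount = listout_loop(Stream, 1, Width),
    file:close(Stream),
    io:format("(~b lines)~n", [LineCount]).

listout_loop(Stream, LineNumber, Width) ->
    case io:get_line(Stream, "") of
        eof ->
            LineNumber-1;
        Line ->
            %% "*" in the field-width position takes the width from the
            %% argument list, so Width comes before LineNumber.
            io:format("~*.10.0B ~s", [Width, LineNumber, Line]),
            listout_loop(Stream, LineNumber+1, Width)
    end.
```

The width simply rides along as one more parameter, hamster-style.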

Wednesday 20 November 2013

Reading Through a File

Leaving behind the idea of reading a file from the standard input... let's open a file and read through it and list it out to the terminal.  We open a file for reading via the function file:open/2.  This module (file) is documented in the Kernel documentation not the Stdlib one.

The first version is like this:

main([Filename]) ->
    {ok, Stream} = file:open(Filename, [read]),
    listout_loop(Stream),
    file:close(Stream).

listout_loop(Stream) ->
    case io:get_line(Stream, "") of
        eof ->
            ok;
        Line ->
            io:format("~s", [Line]),
            listout_loop(Stream)
    end.

So.  The function file:open takes a file name and a list of options, of which we require just the option to read the file.  It returns either {error, Reason}, a possibility which I ignore here (the match against {ok, Stream} would simply fail with a badmatch error), or else a tuple containing the atom ok and a stream, which I assign to the identifier Stream.  Then I pass this stream to the function listout_loop, which will be recursive in order to iterate through the file.

In my listout_loop I first call io:get_line, passing the Stream and also an empty string; this latter parameter would be a prompt string if I were getting input from the terminal rather than a file.  The get_line function returns either an error, which again I ignore, or, more interestingly, the atom eof or a line of input text, which here I assign to the identifier Line.

The case...of...end expression branches according to the result of the expression between case and of.

If the return value is the atom eof then the function can exit, returning for form's sake the atom ok. Otherwise we print out our Line and then re-call the function to process another.

When the looper function returns of course we close the file.  Job done.

C:\Users\polly\Erlang>escript listout.erl hello.erl

main([]) ->
    io:format("Hello World");
main([Arg]) ->
    io:format("Hello ~s", [Arg]).
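As written, a missing file makes the script crash on the {ok, Stream} match. A sketch of a politer variant matches on the result of file:open/2 with a case expression and reports the reason using file:format_error/1, which turns an atom such as enoent into readable text. The message wording is my own choice for the example.

```erlang
#!/usr/bin/env escript
%% Sketch: the same listing, but reporting a missing file instead of crashing.

main([Filename]) ->
    case file:open(Filename, [read]) of
        {ok, Stream} ->
            listout_loop(Stream),
            file:close(Stream);
        {error, Reason} ->
            %% file:format_error/1 gives a human-readable description of Reason.
            io:format("Cannot open ~s: ~s~n", [Filename, file:format_error(Reason)])
    end.

listout_loop(Stream) ->
    case io:get_line(Stream, "") of
        eof ->
            ok;
        {error, Reason} ->
            {error, Reason};
        Line ->
            io:format("~s", [Line]),
            listout_loop(Stream)
    end.
```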

Tuesday 19 November 2013

Input from the standard input

Can you read the standard input stream in an escript?

Yes, you can, but here is the obvious script to copy the standard input to the standard output:

main([]) ->
    copyLoop().

copyLoop() ->
    case io:get_chars("", 1) of
        eof ->
            ok;
        C ->
            %% eof must be tested first: a bare variable such as C matches anything
            io:put_chars(C),
            copyLoop()
    end.

This does read and write, but does not halt when it gets to the end of a file that you pipe in.  Hmm.  I'm doing plenty of file reading but I'll open the files in code.
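A line-at-a-time variant of the copy loop is sketched below, using io:get_line/2 and testing for eof before anything else. copy_lines/2 is a hypothetical name, and taking the input and output devices as arguments is a choice made here so the same loop can also copy between files. This sketch does not settle whether piped input on Windows ever delivers eof to an escript.

```erlang
#!/usr/bin/env escript
%% Sketch: copy standard input to standard output a line at a time.
%% copy_lines/2 is a hypothetical helper; In and Out are any io devices.

main([]) ->
    copy_lines(standard_io, standard_io).

%% Copy lines from In to Out until get_line reports eof (or an error).
copy_lines(In, Out) ->
    case io:get_line(In, "") of
        eof ->
            ok;
        {error, Reason} ->
            {error, Reason};
        Line ->
            io:put_chars(Out, Line),
            copy_lines(In, Out)
    end.
```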

Thursday 14 November 2013

Temperature Table

In this example I've taken the temperature table from section 1.2 of Kernighan & Ritchie.  The objective is to write a table of temperatures in Fahrenheit and Centigrade, so we will need to write a loop.  In traditional functional programming style I'm going for some recursion.  I realise this is not the guru way to do this but I'm not a guru.

To write out the results I need another option in the format/2 function.  The placeholder ~6.1f writes a floating point number in a field 6 characters wide with 1 digit after the decimal point.  format/2 doesn't allow a float with no decimal places, so I deviate here slightly from K&R.  We also want the placeholder ~n, which just adds a new line to the output.  The documentation for the format function is in the file which on my machine is saved at C:/Program Files/erl5.10.2/lib/stdlib-1.19.2/doc/pdf/stdlib-1.19.2.pdf

Hence it is nice to round up all the PDF files into a documents folder.

Incidentally, who thought of the option to "hide file extension on known file types" and, more importantly, why would anyone want to do this?

The conversion is calculated so:

   Celsius = (5.0/9.0)*(Fahr-32.0),

Now, identifiers starting with an upper case character can be assigned values.  They are not variables because you can't vary them - for example you can't say

     Celsius = Celsius + 20.0.

They can be assigned values once like this:

     Celsius = 20.0

Or they can be assigned values when they are arguments to a function and  you call the function.  This is what puzzled me when I started reading up on functional programming - - if you can't re-assign a value how can you do any computation?  Answer: values get assigned when functions are called.
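To see single assignment in action, here is a small sketch. Strictly speaking, = in Erlang is a pattern match: "assigning" an already-bound name again succeeds if the value is the same and fails with a badmatch error otherwise. to_celsius/1 and bind_twice/2 are hypothetical helpers for illustration.

```erlang
#!/usr/bin/env escript
%% Sketch: single assignment in Erlang.
%% to_celsius/1 and bind_twice/2 are hypothetical helpers for illustration.

%% Each call binds Fahr afresh; nothing is updated in place.
to_celsius(Fahr) ->
    (5.0/9.0)*(Fahr-32.0).

%% "=" is a pattern match: binding Celsius a second time succeeds only
%% if the new value is the same, otherwise it raises a badmatch error.
bind_twice(First, Second) ->
    Celsius = First,
    Celsius = Second.

main(_) ->
    io:format("~.1f ~.1f~n", [to_celsius(32.0), to_celsius(212.0)]),
    20.0 = bind_twice(20.0, 20.0),
    {'EXIT', {{badmatch, 25.0}, _}} = (catch bind_twice(20.0, 25.0)),
    io:format("bind_twice(20.0, 25.0) raised badmatch~n").
```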

Identifiers that start with a lower case letter are atoms - these can be passed around willy nilly and they evaluate to themselves, they are just like symbols in Lisp.

Anyway the line that displays the result goes:

   io:format("~6.1f ~6.1f~n", [Fahr, Celsius]),

We want to loop up to some Upper temperature figure, adding some Step figure to our starting temperature each time.  To get the loop to work we put the important bits in a function called, say, tempTableLoop:

tempTableLoop(Fahr, Step, Upper) ->
   Celsius = (5.0/9.0)*(Fahr-32.0),
   io:format("~6.1f ~6.1f~n", [Fahr, Celsius])

and then to make the function repeat we can call it again, with the appropriate alteration to the first argument, the Fahrenheit figure:

   tempTableLoop(Fahr+Step, Step, Upper)

Obviously we also want the loop to stop, so check that we are still within the Upper limit by checking Fahr against Upper, in an if... expression:

    if
        Fahr =< Upper ->
            Celsius = (5.0/9.0)*(Fahr-32.0),
            io:format("~6.1f ~6.1f~n", [Fahr, Celsius]),
            tempTableLoop(Fahr+Step, Step, Upper)
    end.

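The end keyword there just closes the if construction.  We also need to provide an alternative branch for the if, for the case when we do finally go over the limit, because an if in which no branch matches raises an if_clause error.  We don't need to do anything in that branch, so it just has the condition true, which always succeeds, and returns the atom ok.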

Then our main function, which escript calls as main/1 with the list of command-line arguments (here empty), can just start the loop function with the initial values, and the complete script looks like this:

%%blank, remember
main([]) ->
    tempTableLoop(0.0, 20.0, 300.0).

tempTableLoop(Fahr, Step, Upper) ->
    if
        Fahr =< Upper ->
            Celsius = (5.0/9.0)*(Fahr-32.0),
            io:format("~6.1f ~6.1f~n", [Fahr, Celsius]),
            tempTableLoop(Fahr+Step, Step, Upper);
        true ->
            ok
    end.

The branches of the if are separated by semicolons.

I don't think you can define the looping function inside the main function as you might in Scheme say.
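As it happens, later Erlang releases did add this: from OTP 17 onwards an anonymous fun can be given a name and call itself recursively, so the loop can live inside main. A sketch, which will not run on the Eshell V5.10.2 release used in these posts; it also swaps the if for guards on the fun's clauses, which is a matter of taste.

```erlang
#!/usr/bin/env escript
%% Sketch: the temperature loop as a named fun inside main.
%% Requires OTP 17 or later (named funs).
main([]) ->
    Loop = fun TempLoop(Fahr, Step, Upper) when Fahr =< Upper ->
                   Celsius = (5.0/9.0)*(Fahr-32.0),
                   io:format("~6.1f ~6.1f~n", [Fahr, Celsius]),
                   TempLoop(Fahr+Step, Step, Upper);
               TempLoop(_Fahr, _Step, _Upper) ->
                   ok
           end,
    Loop(0.0, 20.0, 300.0).
```

The name TempLoop is only visible inside the fun itself; outside, the fun is reached through the ordinary variable Loop.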

Anyway:

C:\Users\polly\Erlang>escript temperature.erl
   0.0  -17.8
  20.0   -6.7
  40.0    4.4
  60.0   15.6
  80.0   26.7
 100.0   37.8
 120.0   48.9
 140.0   60.0
 160.0   71.1
 180.0   82.2
 200.0   93.3
 220.0  104.4
 240.0  115.6
 260.0  126.7
 280.0  137.8
 300.0  148.9

Cool:-)

Monday 11 November 2013

Incidentally

I don't subscribe to the opinion, incidentally, that the purpose of a Hello World program is to illustrate the character of a programming language.  No, the point of Hello World is to remove as far as possible any foibles or characteristics peculiar to the language so that you can focus on understanding your development process irrespective of what you are developing.  Can I create a source, compile it, link it, load it, run it and find where the output went?  It's a kind of tracer bullet for your tool set, a barium meal for your development environment.

Friday 8 November 2013

Hello Again World

So let us start at the Place Where People Start.  Write a script in Erlang that will write the text "Hello World" to the terminal.

I call it a "script" because that seems  appropriate when the program is short and is being executed directly from the source code.

Our Hello World script looks like this:

main([]) ->
    io:format("Hello World").

This goes in a file called hello.erl.

Now, the first line of the script is left blank.  This is because escript considers this line to be reserved for system-specific commands to the shell to nominate the program that runs the script.  I'll leave this blank as I'm on Windows 7 here and that trick does not work.

The rest of the file is a definition of the function main().  I told you, it's like C.  The system passes the command-line arguments to main as a list, and the square brackets [] here are the empty list, so this clause applies only when there are no arguments; at first we won't use any.  The symbol -> introduces the body of the definition.  This is a call to the format function in the io library, indicated with a colon thus io:format.  To this function we pass a string containing placeholders and a list of values to be slotted into the string, so it does what printf would do in C.  In this case we are just printing a single string, so no placeholders and no list of arguments.  The full stop . terminates the function definition.

The function io:format is documented in the Stdlib documentation.  It returns just the atom ok, so of course here we are using a function for its side effect, that is to say writing text to the terminal.  You can call io:format three ways: the first way, you just pass a string to be output; the second way, you also pass a list of values to substitute into the string; and the third way, you first specify the output device, which would otherwise default to standard output.  So these are all the same:

    io:format("Hello World").
    io:format("Hello World", []).
    io:format(standard_io, "Hello World", []).

In the Erlang jargon these would be described as format/1, format/2 and format/3.
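Since the first argument of format/3 is an output device, the same call can write to a file just as well as to the terminal. A sketch; write_greeting/2 and the file name greeting.txt are made up for the example.

```erlang
#!/usr/bin/env escript
%% Sketch: the first argument of format/3 can be any output device,
%% here a file opened for writing. write_greeting/2 is hypothetical.

write_greeting(Filename, Name) ->
    {ok, Out} = file:open(Filename, [write]),
    ok = io:format(Out, "Hello ~s~n", [Name]),
    file:close(Out).

main(_) ->
    io:format(standard_io, "Hello ~s~n", ["terminal"]),
    write_greeting("greeting.txt", "file").
```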

When you execute the script from the command line with the escript command, escript executes the program starting from the main function.

So in Erlang jargon we would call this the function main/1, meaning it's called main and takes one argument: the list of command-line arguments, which this clause requires to be empty.

You run this with the command escript like this:

C:\Users\polly\Erlang>escript hello.erl
Hello World

So what about passing arguments?  What happens if I add something to the command line?

C:\Users\polly\Erlang>escript hello.erl charlie
escript: exception error: {function_clause,[{local,main,[["charlie"]]}]}
  in function  escript:code_handler/4 (escript.erl, line 838)
  in call from erl_eval:local_func/5 (erl_eval.erl, line 467)
  in call from escript:interpret/4 (escript.erl, line 774)
  in call from escript:start/1 (escript.erl, line 277)
  in call from init:start_it/1 (init.erl, line 1054)
  in call from init:start_em/1 (init.erl, line 1034)

The escript returns a run-time error message because escript calls main/1 with the list of command-line arguments, here ["charlie"], and the only clause we have defined for main matches just the empty list [].

Erlang tries a pattern match of the arguments it has in its hands and the possible sets of arguments to the main function and finding none that match it declares an error and halts processing.

So we can now allow for an argument to the function by adding a clause to our existing definition:  remove the full stop at the end and change this to a semicolon and we can add a clause for the case with an argument, which we call Arg:

main([]) ->
    io:format("Hello World");
main([Arg]) ->
    io:format("Hello ~s", [Arg]).

So now the pattern [Arg] matches a list with exactly one element, and Arg will be bound to that single argument on the command line after the name of the script.

Then within the format() function we add a second argument, the list  containing [Arg], and add a placeholder ~s in the string to indicate where we want this to be slotted in to the string.

So now this will work with or without an extra argument:

C:\Users\polly\Erlang>escript hello.erl
Hello World
C:\Users\polly\Erlang>escript hello.erl charlie
Hello charlie
C:\Users\polly\Erlang>

Which of course now poses the question:

C:\Users\polly\Erlang>escript hello.erl curly larry mo
escript: exception error: {function_clause,[{local,main,[["curly","larry","mo"]]}]}
  in function  escript:code_handler/4 (escript.erl, line 838)
  in call from erl_eval:local_func/5 (erl_eval.erl, line 467)
  in call from escript:interpret/4 (escript.erl, line 774)
  in call from escript:start/1 (escript.erl, line 277)
  in call from init:start_it/1 (init.erl, line 1054)
  in call from init:start_em/1 (init.erl, line 1034)

We've allowed for one argument but more than one is still an error.  No pattern match for it, you see.

So, we want the final option to take care of two or more items on the command line.  The expression [X|XS] is the Erlang code for a list whose first element is X and the rest of which is XS.  So we want to match against this as follows:

main([]) ->
    io:format("Hello World");
main([Arg]) ->
    io:format("Hello ~s", [Arg]);
main([Arg|More]) ->
    io:format("Hello ~s and~n", [Arg]),
    main(More).

Right, so here we have matched against [Arg|More] in our argument list for the main function.  Note that this doesn't just match the pattern - it doesn't just say, yes, your argument list matches the pattern [Arg|More] - it also assigns the parts of the argument list, the head and tail, to the identifiers you supply, all in one statement.  This I have to admit is neat, remembering that in Lisp I would first check that I had a list with a head and a tail and then if this were so go back and get the CAR and the CDR a couple of lines later.  Not so neat.

How to handle this case?  We write the first element Arg to the output and then re-call the main() function to process More, the rest of the argument list.  The first line here now ends with a comma, meaning there are further lines within this block, to be executed in sequence - - so the comma does what a PROGN would do in Lisp.

C:\Users\polly>escript hello.erl
Hello World
C:\Users\polly>escript hello.erl charlie
Hello charlie
C:\Users\polly>escript hello.erl curly larry moe
Hello curly and
Hello larry and
Hello moe
C:\Users\polly>
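For comparison, a non-recursive sketch that joins all the names into one string with string:join/2 from stdlib. greeting/1 is a hypothetical helper, and the output differs from the recursive version: everything comes out on a single line, such as Hello curly and larry and moe.

```erlang
#!/usr/bin/env escript
%% Sketch: the greeting without recursion, using string:join/2.
%% greeting/1 is a hypothetical helper name.

greeting([]) ->
    "Hello World";
greeting(Args) ->
    "Hello " ++ string:join(Args, " and ").

main(Args) ->
    io:format("~s~n", [greeting(Args)]).
```

The recursive version is closer to the point of the exercise, since it shows [Arg|More] at work; this one just hands the list to a library function.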

Thursday 7 November 2013

See How It Runs!

The Erlang compiler erlc compiles Erlang code into semicompiled code that runs on a virtual machine.  Each module of code occupies one file of source code with the extension .erl and compiles to byte code in a file with extension .beam.  The beam files are executed by a virtual machine under there somewhere.

You can load these into the Erlang Shell for evaluation.

However there is also a modest tool called escript that will execute a file of raw Erlang code if you write it up appropriately - with this you can try out a few short Erlang scripts, maybe open your Kernighan & Ritchie and imagine it's like C.

The REPL interface is fine for its purpose but I also like the DBM/JDI user interface (Don't Bother Me, Just Do It).

Escript will also execute compiled code if you set it out correctly. So someone could issue an Erlang application as a compiled escript file (which is where this is heading...).

Wednesday 6 November 2013

See! It lives!

Everything you need to start development with the Erlang language is available as a download from www.erlang.org at http://www.erlang.org/download.html. The download is 90MB ish. The download installs all the tools - compiler, REPL shell etc - and documentation and the "OTP" which provides comprehensive libraries including even code to build a database and create GUI applications. What they call a platform. Anyway they seem to have thought of everything.

BTW, OTP stands for Outlaw Techno Psychobitch - there is an excellent video presentation also available online (Erlang The Movie The Sequel) that clarifies the marketing decisions behind unexpectedly vivid nomenclature.

When it's installed you get a link to a REPL Shell and a link to the Documentation. The documentation link takes you to an HTML home page within the documentation with links to reference files and to a Quickstart Guide. The Documentation you get with the download is comprehensive. There are matching HTML and PDF versions of the documentation for each module. These are housed in the sub-folders for each module so I swept up all the PDF files in the whole Erlang installation and copied them into one central documents folder to make them easier to browse through. Personally I like documents in PDF best.

The Quick Start tells you how to write some trial functions in a file and load them into your Erlang Shell. A Quick Start is important to give the right first impressions. It needs to (a) be quick but more importantly it needs to (b) start. This one annoyed me at first because I ran the Windows Erlang shell from the default setup prompt. You can't load the example programs into that because you are not in the right folder. Hah!  You can use the cd() shell function to change your working folder but I didn't discover this until later. The Help option on the shell just shows you the version number. However there is a command help() in the shell that lists the commands, but I didn't find that until later either.

You can't define named functions in the shell (you can bind an anonymous fun to a variable, though) - - which some have complained about.  I don't think you can define functions in the Haskell shell either.  Best place for a function definition is in a separate little file anyway, surely?

Anyway the command to exit the shell is q(). The full stop at the end matters.  Better to run the command line Erlang shell with the command erl from the command line and then you can be in whatever folder you want:

C:\Users\polly\Erlang>erl
Eshell V5.10.2 (abort with ^G) 
1> 2+2. 
4
2> q(). 
ok 
3> 
C:\Users\polly\Erlang> 

See, Igor! It lives! It lives!