Friday, April 15, 2016

A Beginner's Super Simple Tutorial for GNU Make

(GNU) Make has been around for a very long time, has been superseded (as some may argue) by CMake, but it is still an incredibly useful tool for... for what? For developing code on the command line. Yes, that's right. We still use editors and terminals to develop code. Integrated "windowy" development environments (IDEs) are great, but often overkill. And it's really hard to run them when you are developing remotely on a supercomputer.

But even a hard-core terminal shell & editor developer needs some help for productivity. Make is a key productivity tool for developing code.

Back in the old days (say just after punch cards were no more; way before my time), most people would write code in one big source file and then would just invoke the compiler to build the executable. So your entire code would be in a single file. Not very convenient for editing. Certainly not convenient for working collaboratively. When codes got bigger, it just became impractical. Compile times became huge, since every time a single line changed in one routine, the entire code needed to be compiled.

So people started breaking up their code into pieces, each of them living in its own source file. That's natural, since code is usually broken down into some kind of logical parts (e.g., subroutines and functions in procedural programming, classes/objects in object-oriented programming). So it makes sense to put logical units of the code into their own source files. But how do you get your executable built from an array of source files? How do you prevent that not everything has to be recompiled all the time?

This is where Make comes in!

Let's assume we are working with Fortran 90+ here and compile things using gfortran (if you like C better, just replace everything Fortran specific with C; or with whateva language you like). Let's say you have three source files.

main_program.F90
subroutine_one.F90
subroutine_two.F90

Note that I am using upper case F90, instead the more common f90. The upper case tells the compiler to pre-process the source files before compiling them. In this way, we can use preprocessor directives, see here if you are interested: https://en.wikipedia.org/wiki/Directive_(programming). We won't use these directives here, but perhaps I'll write a post about them in the future.

So that you can follow along easily, let's actually write these routines:

main_program.F90:

program myprogram

  implicit none

  call subroutine_one
  call subroutine_two

end program myprogram

subroutine_one.F90:

subroutine subroutine_one
  implicit none
  write(*,*) "Hi, I am subroutine one"
end subroutine_one

subroutine_two.F90: 

subroutine subroutine_two
  implicit none
  write(*,*) "Hi, I am subroutine two"
end subroutine_two

Naively, you could build your program this way:
gfortran -g -O3 -o myprogram main_program.F90 subroutine_one.F90 subroutine_two.F90
So every single time you compile, everything gets compiled. This produces an executable called myprogram. By the way, the options I am using in the compile line are: "-g" -- add debug symbols and "-O3" -- optimize aggressively (not really needed here, but I do it out of habit (I want fast code!)).

Recompiling is of course no problem if the routines don't do anything of substance as in our example. But let's for a moment assume that each of these routines does something incredibly complicated that takes minutes or longer to compile. You don't want to have to recompile everything if you made a change in only one of the source files. The trick to avoid this is to first build object files (file suffix ".o"), then to link them together and only remake those whose source files changed.

Let Make take care of figuring out what changed. You tell Make what to do by writing a makefile in which you specify what you want to make and what it depends on. Here is the makefile for our example. Call it Makefile, then Make will automatically find it when you type make on the command line. Here is what will be in your makefile:
FC=gfortran
FCFLAGS=-g -O3

myprogram: main_program.o subroutine_one.o subroutine_two.o
     $(FC) $(FCFLAGS) -o myprogram main_program.o subroutine_one.o subroutine_two.o

main_program.o: main_program.F90
     $(FC) $(FCFLAGS) -c main_program.F90

subroutine_one.o: subroutine_one.F90
     $(FC) $(FCFLAGS) -c subroutine_one.F90

subroutine_two.o: subroutine_two.F90
     $(FC) $(FCFLAGS) -c subroutine_two.F90

clean:
     rm -f *.o

There is a very clear structure to this. The first two lines are definitions of variables you are using later -- the compiler and the compiler flags (you could call these variables anything you want). Then you have lines that start with the target on the left, followed by a colon. After the colon, you list the things that the target depends on (the dependencies). This tells Make what needs to be done before working on the current target. So in our case, myprogram depends on a bunch of object files. The rules for making these sub-targets are specified subsequently. One important thing to remember: In the line where you tell Make what to do for a given target (once the dependencies are met and up-to-date), the first character on that line has to be a tabulator character (i.e., you have to hit your "tab" key). Only this gives the actual command the correct indentation (that's something historic, I guess). But clearly this is not a show stopper!

Note the "clean" target -- I added this so that you can easily clean things up.

So, for example, let's say you want to compile your code, but you want to compile it freshly from scratch. The first thing you do is say (on the command line) make clean. This will get rid of all previously compiled object files. Next you just type make and Make will compile and link everything for you! Now you go and work on one of the source files. You save it and then you want to recompile. Just type make. This time only the source file that you changed is recompiled and then linked together with the other object files to make the executable. Voilá!

I hope you see the benefits of using Make now! Make is of course much more powerful than this simple example can tell. You can read more about more advanced uses of Make here and here and here and at many other places that your favorite search engine will find for you.

Before I close, there is one more thing I want to mention: Make is also great for compiling LaTeX documents, in particular if you are using cross-referencing and a bibliography software like BibTeX.
Here is an example makefile for compiling a LaTeX document called paper.tex to paper.pdf with BibTeX. I assume that you have your BibTeX bibliography entries in a file called paper.bib:
paper.pdf: paper.tex paper.bib
    pdflatex paper.tex
    bibtex paper
    pdflatex paper.tex
    pdflatex paper.tex
This should work. Enjoy!

Saturday, March 28, 2015

A super simple Introduction to version control with Git for busy Astrophysicists

Welcome, collaborator! You may be reading this, because you, I, and others are working on some kind of joint document. It could be a manuscript to be submitted to a journal, some kind of vision document, or (as is often the case) a proposal.

Sending around LaTeX files via email is so 1990s. Let's not do this. Let's also not use Dropbox. Dropbox is great for syncing files between computers and serves as a great backup solution. But it's terrible for collaborating on text documents -- it can't handle conflicts (two people editing the same file at the same time creates a second copy of the file in the dropbox; Dropbox won't merge text for you).

Let's move on and use version control.  It will make working together on the same document infinitely easier and will minimize the time we have to deal with merging different versions of the file we are working on together. There are multiple version control packages. For our project we will use git. You can learn more about git and version control in general on http://git-scm.com. If you are unconvinced, read this: Why use a Version Control System?

Here is how it works:

(1) Preparations
  1. Make sure you have git installed on your computer (I presume you have either a Mac or a PC that runs a flavor of Linux). Just open a terminal, type git, hit enter, and see if the command is found. If not, you need to install git.

  2. Make and send me a password-protected public ssh key as an email attachment (not cut & paste!). The standard path and file name for this are ~/.ssh/id_rsa.pub or ~/.ssh/id_dsa.pub. If you don't have such a key or you don't remember if it's password protected (or you dont remember your password...), type on the command line:
    ssh-keygen -t rsa
    
    and follow the instructions. By all means, give your key a strong password. Then send me the file ~/.ssh/id_rsa.pub as an email attachment (do not cut & paste it into the email body) 

(2) Cloning the repository from the server
By now I have told you details about the repository and sent you the command needed to clone it from the server. Go into your preferred working directory (that's where you keep projects, papers, proposals etc.). Type:
git clone [GITUSER]@[MACHINE HOSTNAME]:[REPOSITORY NAME] 
This will clone the repo -- in other words, it will create a directory with the repository name in your working directory and will download the contents of the repo into it. If this does not happen and instead you are asked to enter a password, then your ssh key is not loaded. In this case, load your ssh key like this:
ssh-add ~/.ssh/id_rsa
Then try again. If this still does not do the trick or if you get an error message about no "ssh-agent" running, then try
ssh-agent
This will spit out a few lines of text. Copy them and paste them back into the prompt, hit enter, then repeat
ssh-add ~/.ssh/id_rsa
This should do the trick!

(3) Pulling in changes.
Before starting to work on a section of the document, you need to make sure to get the most recent changes. You must absolutely always do this before starting to work! Go into the repository directory.
git pull -r 
(the "-r" is to make the pull a "rebase" -- don't worry about the meaning of this, just do it).  

(4) Committing and pushing changes.
Whenever you have completed a significant addition or change to the document -- this may be a completed or edited paragraph, something that takes no longer than about 30 minutes of work, you need to commit and push your changes so that others can pull them onto their computers.
git add [FILE YOU WERE EDITING]
git commit -m "Your commit message goes here"
git pull -r           # this is to pull in any changes by others!
git push
Please give sensible commit messages so that we others know what you changed!

(5) Looking at the history of changes and checking the status of your repo.
That's very easy! First pull in all recent changes via git pull -r. Then use git log to browse through the repository history. Try git status to see what the status of your repo is. This will tell you which files have changed and should be committed. It will also tell you which files you have not yet added (i.e. "untracked" files).

(6) Golden Rules
Working with version controlled documents is easy and painless as long as you obey two simple rules:
  1. Always git pull -r before starting to edit.
  2.  Extremely frequently:
    git add [FILE]
    git commit -m "[COMMIT MESSAGE]"
    git pull -r
    git push
If you stick to these rules, working collaboratively on the same document will be easy and conflicts will be rare.

(7) Adding additional files to the repo
That's easy!
git add [FILENAME]
git commit -m "Added file [FILENAME] to repo"
git pull -r
git push
(8) Resolving merge conflicts
Oh boy. Somebody was editing the same portion of text you were editing (did you perhaps not sufficiently frequently commit and push?!?). You now need to resolve the conflict. Let's do this by example. Let's say you were editing a file called Readme.md. You added, committed, then pulled and ended up with the following git merge error:
remote: Counting objects: 5, done.
remote: Total 3 (delta 0), reused 3 (delta 0) Unpacking objects: 100% (3/3), done.
From github.com:hypercott/ay190test
23ae277..b13d53f master -> origin/master Auto-merging README.md
CONFLICT (content): Merge conflict in README.md
Automatic merge failed; fix conflicts and then commit the result.
Good luck! You now need to resolve the conflict. Let's look at the (hypothetical) file in question.
$ cat README.md 
ay190test 
========= 
<<<<<<< HEAD 
Test345 
=======
Test3
>>>>>>> b13d53f9cf6a8fe5b58e3c9c103f1dab84026161
git has conveniently inserted information on what the conflict is. The stuff right beneath <<<<<< HEAD is what is currently in the remote repository and the stuff just below the ======= is what we have locally. Let’s say you know your local change is correct. Then you edit the file and remove the remote stuff and all conflict stuff around it.
$ cat README.md 
ay190test 
=========
Test3
Next you git add and git commit the file, then you execute git push to update the remote with your local change. This resolves the conflict and all is good!