I haven’t posted in a long time, and the reason is that I’ve been trying to get a production run going for the research on HPCVL (big computing cluster), and it’s always a trial porting your code to another machine with a different compiler. So here’s what happened.
Three weeks ago we decided to get a big MCMC run going on HPCVL, which required me to compile code on a Solaris machine using the Sun compiler. My programs use automake, which means I had lots of fun figuring out how to configure and install them into my home directory on HPCVL (aha! ‘prefix’ flag required!), and had my memory refreshed on changing environment variables/linking libraries many times. So after running configure, ‘make’ choked in several different places, not all of which seemed logical:
- Problem: CC can’t figure out what ‘sqrt’, ‘sin’, etc. mean. Fine, I need a #include <cmath> statement in a header file. Makes sense, but it’s still not clear to me why it compiles on my computer at Stirling. Possibly something to do with gcc 3.2, because it won’t compile without the aforementioned directive on my home computer with gcc 4. In any case, that’s an easy fix.
- Problem: There’s some kind of scope problem involving trying to define a member of a class definition and putting an extraneous scope operator. I think. I still have no idea what the error message meant, but removing the extraneous scope operator made it go away. Good enough.
- Problem: I can’t compile with the cosmology header file as it is because the compiler won’t accept initializing const static int (the Hubble constant) members in the ‘protected:’ section of a class definition. Since this pushes the limits of my knowledge of C++, I try to initialize them in random places elsewhere in the class file, notably under where it says ‘public:’, using a constructor, keeping the original declarations in place. This works, in the sense that the original compiler error message is gone and I now get a different one. This time the error occurred in the linking stage, with one of those horrific error messages that looks like a long string of random letters and numbers and says absolutely nothing helpful, as opposed to the cryptic compilation error message I got earlier that at least told me what line the error was on. It’s complaining about doubly declared variables or something like that, and I notice that at the end of the long string of meaningless characters, the names of the rogue const static ints are appended. This tells me that the problem remains with the initialization of these members, but nothing else. After multiple ham-handed attempts at working around the problems using all sorts of syntactical gymnastics, I finally declared them as global variables. Problem solved.
- After finally getting everything to compile correctly, I tried a few things. Notice problems with the calculation of the rotation curve and density profiles of sample systems. The rotation curve problem is easy, because somehow the wrong calculation for the tangential velocity was included in the original source (long since fixed on my computer here). The density is slightly more difficult, because, for no clear reason, it calculates the density profile correctly, and then spits out the wrong value. More specifically, it gets the correct density values while in the for loop that loops through each bin (which I checked by outputting the density in every bin within the for loop), but outputs the wrong values immediately on exiting the for loop (which I checked by outputting the density values in a separate for loop). I still haven’t figured that one out. I could just output the density within the original for loop, but I would like to know why it chokes after leaving the original loop.
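For the record, the textbook way to handle static const class members (the thing that blew up at link time above) looks roughly like the sketch below. The class name, member names and values are made up for illustration and are not the real cosmology header, and I haven’t checked this against the Sun compiler. The idea is that the class body only declares the members, and each one is defined exactly once in a single .cpp file. Putting those definitions in a header that gets included in several files is one common cause of “multiply defined” linker errors, though whether that is what happened here is a guess.

```cpp
#include <cmath>  // declares std::sqrt; include it explicitly rather than relying on another header

// Hypothetical stand-in for the cosmology class. Names and values are
// illustrative assumptions, not the real header.
class Cosmology {
public:
    // Expansion rate H(z) in km/s/Mpc for a flat matter + Lambda universe.
    static double expansion_rate(double z);

protected:
    // Integral type: declared and initialized inside the class.
    static const int hubble_constant = 70;
    // Non-integral type: declared here, initialized at the definition below.
    static const double omega_matter;
};

// --- these lines belong in exactly one .cpp file, never in the header ---
const int Cosmology::hubble_constant;        // definition; initializer already given above
const double Cosmology::omega_matter = 0.3;  // definition with initializer

double Cosmology::expansion_rate(double z) {
    const double a3 = (1.0 + z) * (1.0 + z) * (1.0 + z);
    return hubble_constant * std::sqrt(omega_matter * a3 + (1.0 - omega_matter));
}
```

Calling `Cosmology::expansion_rate(0.0)` should just give back the Hubble constant, which makes for a quick sanity check that the members were initialized.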
So the analysis programs (more or less) work, or seem to. Compiling the N-body code itself goes smoothly and seems to work fine as well. Now to compile the galaxy-building programs.
- There is no excuse in this day and age to be forced to submit to a 72-character per line limit (really 66 characters, because you can’t use the first six columns except for special cases) like Fortran 77 requires. So I have to change a flag in the makefile. Fine. Except the Sun compiler’s flag can only handle up to 132 characters per line. I shouldn’t be making lines that long; fair enough.
- After compilation, I compare two of the same models generated on HPCVL and on my computer here at Stirling. Slight discrepancy in the central potential. Hmm. Larry suggests doing some detective work to uncover the cause. After multiple write(*,*) statements, I discover that tiny roundoff errors on each machine can lead to big differences when they’re used as arguments to logs and are really close to zero. Shouldn’t be too big a deal.
- I then try building a galaxy. Everything goes (more or less) well. Analyse the model, density still doesn’t work. Phooey. But I’ll manage. Everything else seems okay. So after getting a trial MCMC run started, I did a little more inspecting of the resulting models. I look at one model, generate the disk, bulge and halo, and work out the rotation curve. I notice a slight anomaly: the halo rotation is not being calculated correctly; it’s too low. Output the mass as a function of radius. Only then do I discover, to my horror, that the halo has a giant hole in the centre. Everything else is correct – in particular, the total mass and tidal radius are correct. It’s not distributing the mass correctly. There’s just nothing that ends up in the centre. Nada. I try changing one halo parameter. This time, everything is fine. I try changing the whole parameter set. Everything is fine. Okay. So what’s going wrong? Larry thinks it’s a bug in the interpolation routines that would take a month to find. Since we can’t control where an MCMC chain goes in parameter space, there’s no telling if it’ll find one of these regions where the bug will manifest itself. What a mess. I’m toying with the idea of spending my Christmas holidays rewriting all this code in C. But for now we’ll do the MCMC run on my regular computer and use HPCVL for the simulations.
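As an aside on the roundoff discrepancy in the central potential: since d(log x) = dx/x, the same absolute error in x gets amplified by 1/x, so an error that is invisible near x = 1 becomes an order-unity shift when x is close to zero. Here is a minimal sketch of that effect. The function name `log_shift` is made up for the illustration, it is written in C++ rather than the Fortran the galaxy-building code uses, and it is not taken from that code.

```cpp
#include <cmath>

// Absolute change in log(x) when x is perturbed by a small amount delta.
// Hypothetical helper, for illustration only.
double log_shift(double x, double delta) {
    return std::fabs(std::log(x + delta) - std::log(x));
}
```

With a perturbation of 1e-15, `log_shift(1.0, 1e-15)` is of order 1e-15, whereas `log_shift(1e-15, 1e-15)` is about log 2, roughly 0.69. Same absolute error, wildly different effect on the result.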
Now I’m trying to figure out how to account for the asymmetric drift in the galaxy we’re trying to model. Which, you may notice, is a problem that has actual physics involved. It’s nice to have one of those once in a while – cuz, ya know, I am technically an astrophysicist and not a programmer.