Sunday, August 28, 2011

YouTube Lecture: Novel Enzymes, Rapid Structure Determination, and an Online Computer Game

This is a very interesting video about the general field of computational protein structure prediction - the field in which I am involved. It's a talk by Professor David Baker, Dept. of Biochemistry, University of Washington - probably better known as the guy behind stuff like ROSETTA and FoldIt.

 

Thursday, August 25, 2011

Installing MOPAC2009 on Ubuntu 10.04 LTS - Lucid Lynx (And keep source code open!)

If you try to install MOPAC2009 on a standard Ubuntu 10.04 LTS system, you will get a very weird problem. I unzipped MOPAC2009 in /opt/mopac as described in the installation guide. However, when I tried to run the executable, I got a very odd message:

-bash: ./MOPAC2009.exe: No such file or directory

How odd is that? I could see the executable with ls, and I even tabbed my way to the full filename. It turns out that the error message is misleading: the MOPAC2009.exe executable is compiled as 32-bit, not 64-bit as you'd expect in 2011, and that is where the problem lies, not in a missing file! On a 64-bit system without 32-bit support, what is actually missing is the 32-bit loader and libraries that the executable needs in order to start, and the shell reports that as "No such file or directory". First, let me give the remedy, then my rant. You need to install a set of 32-bit compatibility libraries that will let 32-bit code run. Since we're running Ubuntu, this is of course easy as pie!

sudo apt-get install ia32-libs

Still, it is very odd to get a complaint saying "No such file or directory"! Really! I was going nuts and had started blaming the NFS file system. A big thanks to my favorite website, ubuntuforums.org, where I found other possible causes of a "missing file" error besides an actually missing file.
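If you hit the same message, it can help to check what kind of executable you are dealing with before blaming the file system. The following is a minimal sketch, not part of the MOPAC installation guide: elf_class is a hypothetical helper name, and it assumes the od and tr utilities are available. It reads the fifth byte of the ELF header, which is 1 for 32-bit and 2 for 64-bit executables. The file command reports the same information if you have it installed.

```shell
# elf_class is a hypothetical helper (not from MOPAC or Ubuntu).
# It prints "32-bit" or "64-bit" for an ELF executable by reading
# the class byte at offset 4 of the ELF header.
elf_class() {
    # The first four bytes of an ELF file are 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    if [ "$magic" != "7f454c46" ]; then
        echo "not-elf"
        return 1
    fi
    # Byte 5 (offset 4) is the class: 1 = 32-bit, 2 = 64-bit.
    class=$(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n')
    case "$class" in
        1) echo "32-bit" ;;
        2) echo "64-bit" ;;
        *) echo "unknown" ;;
    esac
}
```

For example, running elf_class /opt/mopac/MOPAC2009.exe should tell you whether the 32-bit compatibility libraries are likely to be what is missing.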


However, the take-home message of my post here really is: Start distributing source code! I'm sure it's possible for users to compile MOPAC (written in Fortran 90/95) if a decent Makefile is included. Having to compile would be the only drawback of distributing source code. Another thing is that binaries make it impossible to optimize the code at compile time or to link against optimized BLAS routines, etc.
But these are not the big issues here. The things that really matter are: (1) you will never find bugs by looking at binaries and (2) you will never know exactly which settings your algorithms are using. What if you needed to slightly tweak a parameter somewhere? What if you get unexpected results, and you know the input is good? You should of course turn to the source code! I have myself found bugs/quirks in both Dalton and GAMESS.

I had a Dalton calculation that kept crashing, and found a coefficient which was mistyped. I quickly contacted the authors, and things were solved the SAME DAY. The fix lives on in the new DALTON2011!
In GAMESS (and this is an even better story!) I discovered how to have semi-empirical methods run in parallel after snooping around in the code for a couple of days, looking for the Fock-matrix diagonalization routines. It turns out that the only thing that prevents parallel semi-empirical methods is a flag in the code. Remove that and voila!

Unfortunately, cheerful stories like these will never happen with programs such as MOPAC. And I'm willing to bet money that Mr. MOPAC would really prefer to receive a bug report along with a code patch, rather than just the bug report.

MOPAC2009 is even free for academic use, and Jimmy Stewart (AKA Mr. MOPAC) has even been helpful in getting us started with a GAMESS implementation of his newest PM6 method. Thanks Jimmy!

UPDATE: Another thing concerning the installation of MOPAC2009. The installation guide recommends using 777 permissions for the installation. This is dangerous, because it means that EVERYONE has write permission to that folder. Stick with 755, which gives write permission only to the owner. Otherwise that folder is an easy target for a hacker. 777 is bad, unless really necessary!
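If the installation has already been set to 777, the following is a minimal sketch of how one might find and fix the problem. The two function names are hypothetical helpers, not something shipped with MOPAC. Instead of forcing 755 on every file, the sketch removes write permission for group and others, which turns 777 into 755 and leaves the other permission bits alone.

```shell
# Hypothetical helpers (not from the MOPAC installation guide).

# List everything below a directory that is writable by "others".
list_world_writable() {
    find "$1" ! -type l -perm -0002
}

# Remove write permission for group and others, recursively.
# On a tree that was set to 777 this results in 755.
drop_shared_write() {
    chmod -R go-w "$1"
}
```

For the install location used above, that would be list_world_writable /opt/mopac followed by drop_shared_write /opt/mopac, most likely run with sudo if the folder is owned by root.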

Tuesday, August 23, 2011

Debian Clusters for Education and Research: The Missing Manual

This is a recommendation of a great site I just found. I wish I had found it earlier! I have just been setting up a research/education cluster at the Center for Computational Molecular Sciences at KU, with everything from NIS to NFS and Torque and even Sun Ray terminals.

This great site guides you through everything you need to get a full, working cluster using only free software. The author calls it "The Missing Manual", and I couldn't agree more!

LINK:


Friday, August 19, 2011

My Talk at the COMS Seminar August 18, 2011.

Here are the slides and videos I showed at my recent talk at a COMS seminar. COMS is the virtual Center for Computational Molecular Sciences at KU. I'm using YouTube to share the videos. The videos were created using the excellent PyMOL MovieSchool. Remember to have FreeMOL installed if you wish to save the movie as an MPEG video file. I'm using the public folder in my Dropbox to share the slides. Couldn't be much easier ...

... or could it be any easier? In my next presentation I will include a link to this blog using a QR code on the last slide like this (created using http://qrcode.kaywa.com/ - for free):



Towards protein structures that agree with spectroscopic data

Chemical shifts assisted protein structure refinement

Download the slides from my talk here:


Protein G MC simulations - OPLS/AA with NMR restraints





Top: OPLS/AA + CamShift 1.35
Bottom: OPLS/AA + H(N) Chemical Shifts