Key Bioinformatics Computer Skills

I’ve been asked several times which computer skills are critical for bioinformatics. Note that I am addressing only the “computer skills” side of things here. This is my list for being a functional, comfortable bioinformatician.

  1. SQL and knowledge of databases. I always recommend that people start with MySQL, because it is cross-platform, very popular, and extremely well developed.
  2. Perl or Python, preferably Perl. It kills me to write this, because I like Python so much more than Perl, but from a “most useful skills” perspective, I think you have to choose Perl.
  3. Basic Linux. Actually, being at a semi-sysadmin level is even better. I always tell people to go “cold turkey”: install Linux on their computer and commit to using it exclusively for a while. (Thanks to OpenOffice and similar tools, this should be mostly doable these days.) This forces a person to get comfortable. Learning to use a Mac from the command line is an OK second option, as is Solaris or another Unix. Still, I’d say Linux is preferable.
  4. Basic bash shell scripting. There are still many cases where a quick shell script ends up being “just the thing to do.” And of course, this all applies to the Mac as well.
  5. Some experience with Java or other “traditional” languages, or a real understanding of modern programming paradigms. This may seem lame or vague, but it is important to understand how traditional programming languages approach problems. At minimum, this ensures some exposure to concepts like object-oriented programming, functional programming, libraries, etc. I know that one can get all of this with Python and, yes, even Perl, but I fear that many bioinformatics people get by without knowing these things, to their detriment.
  6. R + Bioconductor. Bioconductor has so many great packages, and comfort with R lets you solve a lot of problems quickly. R is only growing; if I could buy stock in R, I would!
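
Item 5 is admittedly the vaguest on the list, so a small illustration may help. The sketch below uses Python purely for brevity (the same ideas carry over to Java or Perl), and the `Sequence` class and the toy sequences are hypothetical. It contrasts the object-oriented habit of bundling data with the methods that act on it against the functional habit of building results with `map`, `filter`, and `reduce` rather than mutating shared state.

```python
from dataclasses import dataclass
from functools import reduce


# Hypothetical example class (not from any library).
# Object-oriented style: the data and the behaviour that belongs to it live together.
@dataclass(frozen=True)
class Sequence:
    name: str
    bases: str

    def gc_content(self) -> float:
        """Fraction of G and C bases; 0.0 for an empty sequence."""
        if not self.bases:
            return 0.0
        gc = sum(1 for b in self.bases.upper() if b in "GC")
        return gc / len(self.bases)


# Made-up toy sequences for illustration only.
seqs = [Sequence("s1", "ATGC"), Sequence("s2", "GGCC"), Sequence("s3", "ATAT")]

# Functional style: transform with map, select with filter, combine with reduce,
# producing new values instead of mutating shared state.
gc_values = list(map(lambda s: s.gc_content(), seqs))
gc_rich = [s.name for s in filter(lambda s: s.gc_content() > 0.4, seqs)]
total_length = reduce(lambda acc, s: acc + len(s.bases), seqs, 0)

print(gc_values)     # [0.5, 1.0, 0.0]
print(gc_rich)       # ['s1', 's2']
print(total_length)  # 12
```

In everyday Python a list comprehension would usually replace `map` and `filter`; they are spelled out here only to make the functional vocabulary visible.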

This may seem like a lot, but many of these items fit together very well. For example, one could go “cold turkey” on Linux and commit to doing bioinformatics with a combination of R, Perl, shell scripting, and an SQL database (MySQL). Linking these pieces is very common in bioinformatics, so… not so bad, in the end, I think.
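
To make “linking these pieces” a bit more concrete, here is a minimal sketch of a common hand-off: SQL does the filtering and aggregation, and a script reshapes the result into a tab-separated table that R or a shell pipeline can pick up. It assumes Python’s built-in `sqlite3` module as a stand-in for MySQL so that it runs without a database server; the `genes` table and its rows are made up.

```python
import csv
import io
import sqlite3

# Assumption: sqlite3 stands in for a MySQL server; the table and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE genes (name TEXT, chrom TEXT, length INTEGER)")
conn.executemany(
    "INSERT INTO genes VALUES (?, ?, ?)",
    [("geneA", "chr1", 1200), ("geneB", "chr2", 850), ("geneC", "chr1", 3000)],
)

# Let SQL do the grouping and summing...
rows = conn.execute(
    "SELECT chrom, COUNT(*), SUM(length) FROM genes GROUP BY chrom ORDER BY chrom"
).fetchall()
conn.close()

# ...then emit a tab-separated table for the next tool in the chain.
buf = io.StringIO()
writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
writer.writerow(["chrom", "n_genes", "total_length"])
writer.writerows(rows)
tsv = buf.getvalue()
print(tsv, end="")
```

Against a real MySQL server, the same pattern would go through a driver such as MySQL Connector/Python, whose query placeholders are `%s` rather than `?`. Writing `tsv` to a file instead of a string would let R load it with `read.delim()`.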

As always, comments welcome…


2 Responses

  1. See also http://shirleywho.wordpress.com/2009/02/11/tips-and-tricks-for-software-engineering-in-bioinformatics-talk-by-joel-dudley/

    My own two cents:

    2. I disagree. Don’t start with Perl. Start with Python or Ruby. I am biased, but Perl is on its way out. We’re getting tired of not being able to read others’ (and our own) code.

    3 & 4. http://showmedo.com/videotutorials/series?name=pQZLHo5Df

    5. I would emphasize learning programming that makes use of distributed systems (e.g., functional programming) over low-level “what’s a pointer?” programming.

    Critically missing:

    * Use a version control system, and learn to use it effectively. If you don’t already know one, skip straight ahead to using Git, Mercurial, or Bazaar VCS. If you’re stuck using Subversion, here’s a tutorial: http://showmedo.com/videotutorials/series?name=bfNi2X3Xg

    * Learn and make use of unit testing. Research should be reproducible, and that includes computer programs.

  2. Thanks, Chris, for the interesting comments. I’ll respond here point by point.

    1. Thanks for the reference to the “shirleywho” blog entry on a very similar topic. I encourage readers to look at it.

    2. Why did I say Perl? First, Perl currently seems more widely used in bioinformatics than Python, and certainly more than Ruby. So there is the “popularity issue,” especially if someone wants to move quickly into a job. Second, a frequent criticism of Perl is that it is harder to read (“syntax noise”) than Python or Ruby. Ironically, to me this indicates that it is important to learn it first.
    As for Ruby, well, I think Ruby is cool, but… do you really think that telling someone to focus on Ruby is good career advice in bioinformatics? I don’t.

    5. (Your number 5.) I wrote, essentially, “learn Java or something like it.” Given that Perl/Python/R/bash is pretty much all I need, why would I argue for Java? First, a lot of bioinformatics programming does use Java, so knowing some really helps. Second, as of two years ago, I knew of a large company that used Java for all of its bioinformatics; for a job there, at least some exposure to Java would be critical. Third, well, I think knowing about things like pointers does help.
    But, yeah, there is room to argue here.

    Your suggestions:
    I agree entirely that version control systems and unit testing are essential for good software engineering. I’ll have to write another post on this sometime. I would stress, though, that your point that “Research should be reproducible, and that includes computer programs,” while absolutely correct, does not reflect the field’s current view of things: as long as you can offer the software or website, most journals seem pretty happy.
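
As a small taste of what unit testing looks like in practice, here is a minimal sketch using Python’s standard `unittest` module. The `gc_content` helper is hypothetical, chosen only because it is short; the point is that each test pins down one expected behaviour (a typical case, lowercase input, no G/C, an empty sequence), so a later change cannot silently alter results.

```python
import unittest


def gc_content(seq: str) -> float:
    """Hypothetical helper: fraction of G/C bases in a DNA string (0.0 if empty)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


class TestGcContent(unittest.TestCase):
    def test_balanced_sequence(self):
        self.assertAlmostEqual(gc_content("ATGC"), 0.5)

    def test_lowercase_is_accepted(self):
        self.assertAlmostEqual(gc_content("ggcc"), 1.0)

    def test_no_gc(self):
        self.assertEqual(gc_content("ATAT"), 0.0)

    def test_empty_sequence(self):
        self.assertEqual(gc_content(""), 0.0)
```

Saved as, say, `test_gc.py` (a hypothetical filename), the suite runs with `python -m unittest test_gc`.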
