2009 post: Key Bioinformatics Computer Skills

Note: this was written in 2009, so it is somewhat out of date!

I’ve been asked several times about which computer skills are critical for bioinformatics. Important – note that I am just addressing the “computer skills” side of things here. This is my list for being a functional, comfortable bioinformatician.

  1. SQL and knowledge of databases. I always recommend that people start with MySQL, because it is cross-platform, very popular, and extremely well developed.
  2. Perl or Python. (2017 update: Python wins now!) What I wrote in 2009: Preferably perl. It kills me to write this, because I like python so much more than perl, but from a “getting the most useful skills” perspective, I think you have to choose perl.
  3. basic Linux. Actually, being at a semi-sys admin level is even better. I always tell people to go “cold turkey” and just install Linux on their computer and commit to using it exclusively for a while. (Due to OpenOffice etc, this should be mostly doable these days). This will force a person to get comfortable. Learning to use a Mac from the command line is an ok second option, as is Solaris etc. Still, I’d have to say Linux would be preferred.
  4. basic bash shell scripting. There are still too many cases where this ends up being “just the thing to do”. And of course, this all applies to Mac.
  5. Some experience with Java or other “traditional languages” or a real understanding of modern programming paradigms. This may seem lame or vague. But it is important to understand how traditional programming languages approach problems. At minimum, this ensures some exposure to concepts like object-oriented programming, functional programming, libraries, etc. I know that one can get all of this with python and, yes, even perl – but I fear that so many bioinformatics people get away without knowing these things, to their detriment.
  6. R + Bioconductor. So many great packages in Bioconductor. Comfort with R can solve a lot of problems quickly. R is only growing; if I could buy stock in R, I would!

This may seem like a lot, but many of these items fit together very well. For example, one could go “cold turkey” and just use Linux and commit to doing bioinformatics by using a combination of R, perl and shell scripting, and an SQL-based database (MySQL). It is very common in bioinformatics to link these pieces, so… not so bad, in the end, I think.

As always, comments welcome…

2008 post: Python for Perl Programmers (and Bioinformatics people)

Uh, this is so old that it should be skipped, I think… I’m keeping it up for archival purposes.

keywords:

Mark Bieda python getting started quick tips hints tutorial

I wanted to write a short post about getting started in python.

What you will like about Python as a perl person:
(1) A great thing is the interpreter. This will allow really rapid learning of python. For a perl person, python should come really fast. I was very, very surprised at how quickly I was writing actually useful (not toy) programs to manipulate things.
(2) It is easy to install on Windows and has a decent editor/run environment (IDLE). Python is now a standard part of Linux distros, except for the smallest ones (perl is everywhere, so an advantage to perl here, but only a small one).

Some key things:
(1) The online manuals for python are good (but maybe not great). The Guido tutorial is key; make sure that you get the latest one.
(2) If you like to have a book on python around (I always do for my programming language du jour), then make sure that you have the most recent one.
(3) Why the emphasis on the most recent? Python has added key new features in recent times – like even since version 2.4! So make sure that you have the latest documentation.

Installation and Usage:
(1) For Windows people, use the IDLE editor. Really. You will find it very easy to use and efficient. It comes in the download, so there is no separate installation.
(2) To learn python really fast, just play with commands in the interpreter window. It really is easy and efficient – a very quick way to get up to speed on things.

Some key things for bioinformatics people, in particular:
(1) Sets. Sets are very nice. Intersection, union… all that stuff that you want to use.
(2) A lot of string manipulation functions (actually methods, technically) are available. These will do a lot of what you would do with regular expressions, but see the next point.
(3) Unfortunately, regular expressions are in an external (but standard) library and are a bit different from perl in usage/implementation.
(4) Like perl, the built-in sorting in python is weird (and annoying to set up for anything beyond simple cases), but very useful. Again, here, make sure that you look at the latest documentation.
(5) The SQLite library is now part of the standard package. I haven’t used it yet as part of python – but given that this is a standard part of the distribution, it seems like I could write code that uses it and not worry about portability issues. This is well worth looking at for bioinformatics people.
(6) Remember that tuples are unchangeable (immutable) and lists are changeable. So far, this has led me to be pretty list-oriented, but I am new to this.
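
To make the points above a bit more concrete, here is a minimal sketch that touches each of them: sets, string methods, the re module, sorting with a key function, tuples versus lists, and the sqlite3 module. It assumes a current Python 3 (not the version this post was written against), and the gene names, scores, and the "hits" table are made up purely for illustration.

```python
import re
import sqlite3

# Hypothetical gene lists from two experiments (made-up example data).
chip_hits = {"TP53", "MYC", "GATA1", "SOX2"}
expr_hits = {"MYC", "SOX2", "BRCA1"}

shared = chip_hits & expr_hits      # intersection: genes in both sets
either = chip_hits | expr_hits      # union: genes in at least one set
chip_only = chip_hits - expr_hits   # difference: genes only in chip_hits

# String methods handle a lot of simple parsing without a regex.
line = "chr1\t1000\t2000\tMYC\n"
chrom, start, end, gene = line.rstrip("\n").split("\t")

# Regular expressions live in the standard-library "re" module.
match = re.match(r"chr(\w+)$", chrom)
chrom_name = match.group(1) if match else None

# Sorting with a key function: order (gene, score) pairs by score, highest first.
scores = [("MYC", 2.5), ("SOX2", 7.1), ("TP53", 0.4)]
by_score = sorted(scores, key=lambda pair: pair[1], reverse=True)

# Tuples are immutable; lists can be changed in place.
interval = (int(start), int(end))
genes = [gene]
genes.append("SOX2")

# sqlite3 ships with python; ":memory:" gives a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hits (gene TEXT, score REAL)")
conn.executemany("INSERT INTO hits VALUES (?, ?)", scores)
top_gene = conn.execute(
    "SELECT gene FROM hits ORDER BY score DESC LIMIT 1"
).fetchone()[0]
conn.close()
```

Pasting these lines one at a time into the interpreter window and printing the variables is probably the fastest way to see what each piece does.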

I’ll leave it at that for now. I’ll write more about python later on.