2009 post: Key Bioinformatics Computer Skills

Note: this was written in 2009 so… out of date somewhat!

I’ve been asked several times about which computer skills are critical for bioinformatics. Important – note that I am just addressing the “computer skills” side of things here. This is my list for being a functional, comfortable bioinformatician.

  1. SQL and knowledge of databases. I always recommend that people start with MySQL, because it is cross-platform, very popular, and extremely well developed.
  2. Perl or Python. (2017 update: Python wins now!) What I wrote in 2009: preferably perl. It kills me to write this, because I like python so much more than perl, but from a “getting the most useful skills” perspective, I think you have to choose perl.
  3. basic Linux. Actually, being at a semi-sys admin level is even better. I always tell people to go “cold turkey” and just install Linux on their computer and commit to using it exclusively for a while. (Due to OpenOffice etc, this should be mostly doable these days). This will force a person to get comfortable. Learning to use a Mac from the command line is an ok second option, as is Solaris etc. Still, I’d have to say Linux would be preferred.
  4. basic bash shell scripting. There are still too many cases where this ends up being “just the thing to do”. And of course, this all applies to Mac.
  5. Some experience with Java or other “traditional languages” or a real understanding of modern programming paradigms. This may seem lame or vague. But it is important to understand how traditional programming languages approach problems. At minimum, this ensures some exposure to concepts like object-oriented programming, functional programming, libraries, etc. I know that one can get all of this with python and, yes, even perl – but I fear that so many bioinformatics people get away without knowing these things to their detriment.
  6. R + Bioconductor. So many great packages in Bioconductor. Comfort with R can solve a lot of problems quickly. R is only growing; if I could buy stock in R, I would!

This may seem like a lot, but many of these items fit together very well. For example, one could go “cold turkey” and just use Linux and commit to doing bioinformatics by using a combination of R, perl and shell scripting, and an SQL-based database (MySQL). It is very common in bioinformatics to link these pieces, so… not so bad, in the end, I think.

As always, comments welcome…

2009 post: Free, easy, quick, great PDF creation: Try OpenOffice

keywords: free software, opensource, OpenOffice, grantwriting

I try to give credit where credit is due.

I have written before about using OpenOffice (version 2.4) for “real professional work.” In an earlier post, I wrote about successfully writing an entire grant application using OpenOffice for wordprocessing and figure creation in conjunction with Zotero for references (and the grant was funded, so…).

PDF creation from OpenOffice (use “Export to PDF” in the File menu) simply works great. It is very fast and the pdf quality is excellent. One note – it does not open the pdf automatically – it just stores the file – so pay attention to this. This works much better than printing to a pdf using the Adobe PDF printer or using the Microsoft Office 2007 export to pdf functions (which, besides being slow, caused Microsoft Office to crash occasionally on my machine).

Also, before I forget, I really like OpenOffice Draw for scientific figure creation – I use it a lot in my work and I have been quite happy with it. I’m using Microsoft Office a fair amount now, but I still use draw to make figures. I’ve used Zotero and Draw for well over a year now, with fairly intense use.

Note: This is almost entirely based on using OpenOffice 2.4. The current version is 3.0, which I just downloaded.

2008 post: Free Multiplatform Reference Management? Try Zotero

So 2009… remember? So a lot of this is probably out of date, sorry!

Mark Bieda zotero references computer software citations

You use Endnote, refman, or one of the others. You want a free alternative because (1) you don’t want to worry about licensing issues (like buying a new copy for each computer) (2) you want something that will run under windows, linux, and mac os x (3) you just don’t want to pay or (4) you want to move your references from place to place without having to adapt to the local software choice (i.e. some places will have Endnote, others will have RefMan, others will have other solutions) or (5) you just believe stuff like this should be free.

So: I have been using Zotero for over a year. Zotero is great for everyday web stuff, but here I will just talk about it as a reference manager.

As with my other software comments, this is based on my real experience. I recently wrote an entire grant using Zotero as my only reference manager. And it worked well.

A key thing:
Zotero is heavily and institutionally supported (see the webpage). From the forum comments, you can see that many users are in academe. So it should only get better.

Problems/Weaknesses:
(1) This is clearly still in development. But, as I said, I wrote a grant with it – and it worked well for me, but it is not as smooth as EndNote in many ways.
(2) There are a limited number of citation styles, but this number is growing – and you can define your own. For things like grants, usually you get to choose a style. For a typical paper, you won’t have a large number of references, so a little manual editing is enough. Still, because of this, Endnote really still has a big edge.

Getting it:
(1) Zotero is a firefox extension and, when you go to the site, seems more geared toward web-based research.
(2) Installation is superfast and easy. Firefox is the way to go. No internet explorer version.
(3) You will also need to download plug-ins for either Microsoft Word or Openoffice Writer. I used OpenOffice Writer for my grant.

Basics:
(1) There is a tutorial on the website, unfortunately oriented mostly toward MS Word usage. The same rules apply to OpenOffice Writer.
(2) If you are using OpenOffice Writer, here is something to be careful with: don’t save your files in .doc (MS word) format. I usually do, because I need to send files to colleagues, all of whom have MS word but not OpenOffice Writer. If you do this, you will lose the ability to handle your citations.

Getting going:
download and install Zotero from the Zotero website
download and install the appropriate word processing plugin

To get citations:
(1) you can import from many, many sites – like Pubmed, notably. You just click on a button when you find something you like and it gets imported into Zotero.

Recommendations:
(1) When I last looked (about April, 2008), the documentation for Zotero was generally very good, but the documentation for the citation/reference aspects was very poor. So I strongly suggest that you download a few references and play with a pretend, test document to get a sense of how zotero works and your results. I did this and it really helped me use it. Only took a few minutes of playing around.

2008 post: Linux Installation on HP Pavilion Desktop (June 2008 purchase)

This may be helpful to someone, so I’ll keep this post alive.

Mark Bieda HP Linux install installation

This is just a brief post about my (read: my student’s) experience with installing linux on a new HP Pavilion. This is a standard model available at Futureshop and BestBuy: intel quadcore Q6660 processor, 640 GB hard disk, 3 GB RAM. Nice machine, only $899 here in Canada (sure to be cheaper in the USA).

So I’ve installed linux on several laptops and desktops, including Mandriva, Red Hat, Fedora, Suse. And of course I have run Knoppix and, as indicated in an earlier post, have been using DSL (Damn Small Linux) under VMPlayer for a while now.

So this time, let the undergrad do it!

Here are the notes:
(1) this computer had Windows Vista on it. Home Premium edition. We wanted to keep windows, not because I love windows, but because I have some key software that only runs on windows (e.g. NimbleGen SignalMap for looking at data).
(2) Installation of OpenSuse 10.3 caused a conflict with the windows system which led to a restore operation (nothing was lost, no big deal). So we dropped working on this one – and went to working on Ubuntu 8.04 LTS.
(3) The big problem was that the ethernet card, built into the motherboard, has known problems with talking to current linux distros. The joy of a new computer!
(4) Ubuntu installed well except for the ethernet card deal, which is a big problem.
(5) To solve the ethernet card problem, we just ended up buying a new card for the computer – it was only $19.76 at our friendly University of Calgary MicroIT store. Model is “Gigabit Ethernet PCI Card” from startech.com. The model number appears to be ST1000BT32. This solved the problem, although MFU (My Friendly Undergrad) had to change something in the BIOS to stop it from trying to use the ethernet card on the motherboard (which was not deadly, but led to one of those long pauses in bootup).

The Results
Everything seems to run very well. The computer is happy, it talks to the internet (from both windows and linux) and, as usual, everything runs just a bit (or a lot, depending) faster on the linux side vs the windows side.

On KDE
I am a longtime KDE user, and I really like KDE in this distribution (downloaded and installed as packages in Ubuntu). I guess it is technically Kubuntu, but like I said, the undergrad was doing the installation so… I got to skip on thinking about this stuff.

2008 post: Python for Perl Programmers (and Bioinformatics people)

Uh, this is so old that it should be skipped, I think… I’m keeping it up for archival sake

keywords: Mark Bieda python getting started quick tips hints tutorial

I wanted to write a short post about getting started in python.

What you will like about Python as a perl person:
(1) A great thing is the interpreter. This will allow really rapid learning of python. For a perl person, python should come really fast. I was very, very surprised at how quickly I was writing actually useful (not toy) programs to manipulate things.
(2) It is easy to install in windows and has a decent editor/run environment (IDLE). Python is now a standard part of Linux distros, except for the smallest ones (perl is everywhere, so an advantage to perl here, but only a small one).

Some key things:
(1) The online manuals for python are good (but maybe not great). The Guido tutorial is key; make sure that you get the latest one.
(2) If you like to have a book on python around (I always do for my programming language du jour), then make sure that you have the most recent one.
(3) Why the emphasis on the most recent? Python has added key new features in recent times – like even since version 2.4! So make sure that you have the latest documentation.

Installation and Usage:
(1) For windows people, use the IDLE editor. Really. You will find it very easy to use and efficient. It comes in the download, so no installation deal.
(2) To learn python really fast, just play with commands in the interpreter window. It really is easy and efficient – a very quick way to get up to speed on things.

Some key things for bioinformatics people, in particular:
(1) Sets. Sets are very nice. Intersection, union… all that stuff that you want to use.
(2) A lot of string manipulation functions (actually methods, technically) are available. These will do a lot of what you would do with regular expressions, but see the next point.
(3) Unfortunately, regular expressions are in an external (but standard) library and are a bit different from perl in usage/implementation.
(4) Like perl, the built-in sorting in python is weird (and annoying to set up to do anything beyond simple), but very useful. Again, here, make sure that you look at the latest documentation.
(5) Sqlite library is now part of the standard package. I haven’t used it yet as part of python – but given that this is a standard part of the distribution, it seems like I could write code that uses it and not worry about portability issues. This is well worth looking at for bioinformatics people.
(6) Remember that tuples are unchangeable (immutable) and lists are changeable. So far, this has led me to be pretty list-oriented, but I am new to this.
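
To make the points above a bit more concrete, here is a small sketch of sets, string methods, the re module, sorting with a key function, and tuples vs lists. The gene names, the tab-separated line, and the region values are all made up for illustration, and the syntax assumes a current Python 3 rather than the 2.x versions this post was written against.

```python
import re

# (1) Sets: intersection and union of two made-up gene lists.
chip_hits = {"TP53", "MYC", "EGFR"}
array_hits = {"MYC", "EGFR", "BRCA1"}
shared = chip_hits & array_hits   # genes found in both lists
either = chip_hits | array_hits   # genes found in at least one list

# (2) String methods handle a lot of everyday parsing without regular expressions.
line = "chr1\t1000\t2000\tgeneA\n"
fields = line.rstrip("\n").split("\t")

# (3) Regular expressions live in the standard-library "re" module.
match = re.match(r"chr(\w+)$", fields[0])
chrom_name = match.group(1) if match else None

# (4) Sorting beyond the simple case: pass a key function to sorted().
regions = [("chr2", 500), ("chr1", 2000), ("chr1", 1000)]
by_chrom_then_start = sorted(regions, key=lambda r: (r[0], r[1]))

# (6) Tuples are immutable; lists can be changed in place.
region = ("chr1", 1000)
try:
    region[1] = 1500          # raises TypeError because tuples cannot be modified
    tuple_changed = True
except TypeError:
    tuple_changed = False

starts = [1000, 2000]
starts.append(3000)           # fine: lists are mutable
```

The key function in sorted() is what takes most of the annoyance out of sorting on anything beyond the simple case, since you only describe what to sort by rather than how to compare two items.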

I’ll leave it at that for now. I’ll write more about python later on.

2008 post: I wish I had… started with python earlier…

So far, my bioinformatics work has used a melange of perl, R, and bash scripting. While this has worked pretty well, it does have limits. For one, it is not very portable (the bash scripting, especially). I’ve already had problems with distributing software.

I wanted something that I could distribute in an easier way, yet had the advantages of perl. I found Jython, which is Python-in-Java. For me, the big deal is not use of Java libraries, but rather that the language would compile to Java byte-code and hence would be easy to distribute.

But I found that Python is much more than this: the interactive environment, for one, makes me ok with not having my unix/linux toolbox when I am stuck on the windows side.

And Python has a lot of nice features for bioinformatics work, including convenient types like sets (as of version 2.4) and even comes with sqlite (which I have not used from python, but want to)…

Anyways, for now, I am a fan.

2008 post: Sqlite (Sqlite3) quick tips: if you know SQL already

Mark Bieda SQL Sqlite Sqlite3

I’m a long-time MySQL user, but recently I’ve been using sqlite (sqlite3).
This is a sqlite tutorial, in a sense, if you know SQL.

As with my other stuff, this is based on my real experience of using this system

Why use sqlite?
The basic thing is that it installs super fast (unbelievably, you just download a .exe file for windows and run it). This is in contrast to the big MySQL model. You get to skip all that client-server business (which is really important in many cases, but not for most stuff that I do).

installation and getting started
1. download and (on windows) just place the .exe somewhere. I like to place it in C:\sqlite3\
2. (windows) At the Start button, click Run and enter cmd as the run command. Go to C:\sqlite3 and run
sqlite3 temp.db

Critical stuff to know
.help — gives the list of dot commands. Important and useful
.separator "," — means to separate input and output columns (fields) by commas
.separator "\t" — same but with tabs
(important) – you have to set the separator before attempting to load data from a file into the database
.output myresults.txt — starts directing all query (like SELECT statements) output to myresults.txt
.output stdout — starts directing all query (like SELECT statements) output to stdout; will close any previous output file
.import gooddata.csv mytable — imports data from gooddata.csv to mytable using the current separator value to separate fields
.tables — a list of the tables in the database
.databases — a list of the databases
.schema mytable — statements used to create mytable; will also list indexes (useful!)

Control of Sqlite3:
Ctrl-c — ends Sqlite3
; – a semicolon must be used to end an SQL statement (the dot commands above do not take one)

A typical session
Note: I “made up” this session, so there could be a few small bugs…
create table mytable (idnum varchar(20), salary float, age int);
.separator "\t"
.import persondata.txt mytable
create index idex on mytable(idnum);
select * from mytable where age<30;
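
For comparison, the same kind of session can be run from Python with the sqlite3 module in the standard library. This is only a sketch and not something I have used for real work: the three data rows are invented, an in-memory file stands in for persondata.txt, and the database is created in memory rather than in temp.db.

```python
import csv
import io
import sqlite3

# Stand-in for persondata.txt: tab-separated idnum, salary, age (made-up rows).
person_data = io.StringIO(
    "a001\t52000.0\t28\n"
    "b002\t61000.5\t35\n"
    "c003\t48000.0\t24\n"
)

# ":memory:" keeps everything in RAM; pass a filename such as "temp.db" to keep the data.
conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (idnum varchar(20), salary float, age int)")

# Rough equivalent of .separator "\t" followed by .import
rows = csv.reader(person_data, delimiter="\t")
conn.executemany("insert into mytable values (?, ?, ?)", rows)

conn.execute("create index idex on mytable(idnum)")
young = conn.execute(
    "select * from mytable where age < 30 order by idnum"
).fetchall()
conn.close()
```

The ? placeholders let the library handle quoting of the values, and the result comes back as a list of tuples, one per row.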

How I use sqlite3:
I know SQL “by heart”, so it is pretty easy for me to do things quickly with files, especially when I have to correlate values in files. Sometimes I reformat files in bash, perl, or more recently, Python.

Note that “sets” in Python (a built-in type as of version 2.4) give really good database-like behavior. And sets are fast, in my experience.
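
As a rough illustration of that database-like behavior, here is a sketch that correlates IDs between two tables the way a join would. The gene IDs and values are made up, and the two dictionaries stand in for data that would normally be parsed from two files.

```python
# Hypothetical ID -> value tables, as if parsed from two tab-separated files.
expression = {"geneA": 2.5, "geneB": 0.4, "geneC": 1.1}
binding = {"geneB": 120, "geneC": 87, "geneD": 15}

# Set operations on the IDs play the role of the join condition.
in_both = set(expression) & set(binding)          # like an inner join on the ID
only_expression = set(expression) - set(binding)  # IDs with no partner in the other table

# Pull the matching values together, one row per shared ID.
joined = [(gene, expression[gene], binding[gene]) for gene in sorted(in_both)]
```

For a quick one-off comparison this avoids loading anything into a database at all; once there are more than two tables or more complicated conditions, SQL is usually the easier route.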