2009 post: Free, easy, quick, great PDF creation: Try OpenOffice

keywords: free software, opensource, OpenOffice, grantwriting

I try to give credit where credit is due.

I have written before about using OpenOffice (version 2.4) for “real professional work.” In an earlier post, I wrote about successfully writing an entire grant application using OpenOffice for word processing and figure creation in conjunction with Zotero for references (and the grant was funded, so…).

PDF creation from OpenOffice (use “Export to PDF” in the File menu) simply works great. It is very fast and the PDF quality is excellent. One note: it does not open the PDF automatically; it just saves the file, so keep track of where you saved it. This works much better than printing to a PDF using the Adobe PDF printer or using the Microsoft Office 2007 export-to-PDF functions (which, besides being slow, caused Microsoft Office to crash occasionally on my machine).

Also, before I forget, I really like OpenOffice Draw for scientific figure creation – I use it a lot in my work and I have been quite happy with it. I’m using Microsoft Office a fair amount now, but I still use Draw to make figures. I’ve used Zotero and Draw for well over a year now, with fairly intense use.

Note: This is almost entirely based on using OpenOffice 2.4. The current version is 3.0, which I just downloaded.

2008 post: Bioinformatics: Sequence Alignment Is Central…?

Keywords: Illumina, Sequence Alignment, algorithms, teaching, next-generation sequencing

I haven’t posted in a while; I have been busy teaching bioinformatics. I do receive an occasional email or question about learning bioinformatics, so why don’t I just write what I taught here?

Here, at least, was my thinking on the subject. Remember that I was teaching second year students with a variety of backgrounds.

The first point is that sequence analysis/alignment is the heart of bioinformatics. Ok, you can argue with me on this. But I think that sequence alignment is, without question, a major – if not THE major – success in bioinformatics. Why do I say this?

1. Sequence alignment is non-trivial.

2. Sequence alignment approaches derive from a solid mathematical basis.

3. There are well worked out statistics for sequence alignment.

4. Sequence alignment is extremely prevalent and popular as an application of bioinformatics – not least in evolutionary studies of gene change and, of course, in the analysis of the rapidly growing number of fully sequenced genomes (or even partially sequenced ones, for that matter).

5. New situations that are variants/subsets/offshoots of sequence alignment are emerging that have already produced new algorithmic/computational frameworks. So, although this is arguably a fairly mature area of study (I think so), there is new work being done. Specifically, I am thinking of new sequence alignment approaches for next-generation sequence data (esp. short reads like Illumina, ABI) and (probably) also for metagenomics data. In the case of next-generation sequencing, mostly we want to align near-perfect reads – optimizing this for tens of millions of reads is non-trivial. Some recent work that looks good is ZOOM! in Bioinformatics 2008 24:2431 and SeqMap in Bioinformatics 2008 24:2395. (But note that I have not used either at all yet).
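
To make the "near-perfect reads" problem in point 5 a bit more concrete, here is a minimal Python sketch of one generic idea, seed-and-verify with the pigeonhole principle: if a read may contain at most k mismatches and is cut into k+1 non-overlapping seeds, at least one seed must match the reference exactly. This is NOT the algorithm of ZOOM! or SeqMap (I have not checked how either works); the function names `build_index` and `map_read` and the toy genome are made up for illustration, and it assumes each read is at least `seed_len * (max_mismatches + 1)` bases long.

```python
from collections import defaultdict


def build_index(genome, seed_len):
    """Map every substring of length seed_len to its start positions."""
    index = defaultdict(list)
    for pos in range(len(genome) - seed_len + 1):
        index[genome[pos:pos + seed_len]].append(pos)
    return index


def map_read(read, genome, index, seed_len, max_mismatches=1):
    """Return sorted start positions where read matches with few mismatches.

    Pigeonhole idea: with at most max_mismatches mismatches, at least one
    of the (max_mismatches + 1) non-overlapping seeds matches exactly.
    """
    hits = set()
    for seed_no in range(max_mismatches + 1):
        offset = seed_no * seed_len
        seed = read[offset:offset + seed_len]
        for seed_pos in index.get(seed, []):
            start = seed_pos - offset
            # Skip candidates that would hang off either end of the genome.
            if start < 0 or start + len(read) > len(genome):
                continue
            window = genome[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches:
                hits.add(start)
    return sorted(hits)


# Toy data (hypothetical): a 25-base "genome" and two 8-base reads.
genome = "ACGTTGCATGCCGATAGCTAGGCTA"
index = build_index(genome, seed_len=4)
print(map_read("TGCATGCC", genome, index, seed_len=4))  # exact match: [4]
print(map_read("TGCATGAC", genome, index, seed_len=4))  # one mismatch: [4]
```

The index is built once and reused for every read, which is the whole point when there are tens of millions of reads. Real tools need far more compact indexes than a Python dictionary.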

As a route to teaching bioinformatics, I also like sequence alignment because it touches on major topics in bioinformatics/biology: alignment itself, evolution of sequence (including phylogenetic tree construction), hidden Markov models (profile HMMs, pair HMMs, PAM for alignment), etc. So just by examining sequence alignment, I end up introducing major “techniques” in bioinformatics (note that this point is certainly not original; you see it in the famous Durbin et al. book Biological Sequence Analysis and in other books like Mount’s text Bioinformatics).
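
For readers who have never seen the dynamic programming behind "alignment itself", here is a minimal Python sketch of the classic Needleman-Wunsch global alignment score with a linear gap penalty. The scoring values (match +1, mismatch -1, gap -2) and the function name `global_alignment_score` are arbitrary choices for illustration, not values from any particular course or text, and the sketch only returns the score (no traceback of the alignment).

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty."""
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap          # a[:i] aligned against gaps only
    for j in range(1, cols):
        score[0][j] = j * gap          # b[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            diag = score[i - 1][j - 1] + pair   # align a[i-1] with b[j-1]
            up = score[i - 1][j] + gap          # gap in b
            left = score[i][j - 1] + gap        # gap in a
            score[i][j] = max(diag, up, left)
    return score[-1][-1]


print(global_alignment_score("GATTACA", "GATTACA"))  # 7 matches: 7
print(global_alignment_score("GATTACA", "GATACA"))   # 6 matches, 1 gap: 4
```

The same table-filling pattern, with different recurrences, shows up again in local alignment and in the Viterbi algorithm for HMMs, which is part of why alignment works well as an entry point.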

2008 post: TAMALg: is the package available?

I’ve received a lot of questions recently about TAMALg availability. Unfortunately, there is only a difficult-to-install package available right now; I sent it to someone recently and they had a terrible time getting it going.

I do describe the algorithm in the supplementary materials to the ENCODE spike-in competition paper (Johnson et al., Genome Research 2008).

I would love to have a simple package to distribute, but this is little supported in today’s granting environment; in fact, I don’t think that making algorithms widely available has ever been well-supported by any US funding agency. And I doubt the situation is different here in Canada.

I may be getting another undergrad soon and would task that person with working on the package. As a new faculty member, I am simply overwhelmed with basics like getting my lab going right now.

I do hope that this situation changes and thanks to all for patience.

As I have noted previously, the L2L3combo predictions produced by the TAMALPAIS server (see previous posts on this or just search for “TAMALPAIS Bieda” – no quotes, though) are the same predictions as made by TAMALg. TAMALg also adds the step of estimating enrichment using maxfour-type methodology.

So you can get good TAMALg predictions of sites just by using the webserver. I suggest going this route.

And to repeat – TAMALg is almost certainly NOT what you want for promoter arrays, unless you have a factor found at only a tiny fraction of promoters or one of the newer designs with very long promoter regions (e.g. for 10 kb promoters, it might be ok).

2008 post: Jobs: Postdoctoral Positions in my lab

Update: I just hired an experimental postdoc – thanks to all who applied – and I am temporarily suspending the search for a computational postdoc.

I’m looking for two postdocs: one computational (bioinformatics) postdoc and one molecular biology postdoc.

I just posted this ad to naturejobs, so here is the info:

Positions:

2 Postdoctoral Positions Total

1 Computational (Bioinformatics) Postdoctoral Fellow

1 Experimental (Molecular Biology) Postdoctoral Fellow

Description:

These positions are in the laboratory of Mark Bieda. The lab focuses on (1) development of novel statistical and computational approaches to ChIP-seq and ChIP-chip data and (2) investigating the changes in chromatin marks in cancer using chromatin immunoprecipitation and related molecular biology approaches. These positions offer an excellent opportunity for cross-training (e.g. bioinformatics training for an experimentalist, experimental training for a computational postdoc).

Bioinformatics Position: The computational position will focus on novel statistical and algorithmic methods for analysis of microarray (ChIP-chip) and high-throughput sequencing (ChIP-seq) experiments. This project will afford the opportunity for large-scale experimental validation of predictions within the lab. The successful computational candidate will be comfortable thinking statistically and have good programming skills with a keen interest in large-scale data analysis.

Experimental Position: The experimental position will focus on examining chromatin organization in brain tumor models (primarily gliomas). There is also opportunity for work on other projects in neurogenomics. Previous experience/familiarity with neuroscience is a plus, but not required. The successful candidate will have experience with a wide range of molecular biology techniques.

Both positions offer opportunities for both formal collaborative and informal interactions with other strong research groups, including a very active Brain Tumor Group at the university. The PI is committed to developing the careers of members of the laboratory.

The University of Calgary offers an excellent environment with a rapidly growing pool of biomedical research labs and significant shared facilities. We encourage all qualified persons to apply. The University of Calgary hires on the basis of merit and is committed to employment equity. However, Canadians and permanent residents of Canada will be given priority.

Calgary is a city of ~1 million people and is located only about 1.5 hours from world-renowned recreational areas (Banff and Jasper).

To apply, please send (1) cover letter, (2) CV and (3) names and contact information for three references to Aarif Edoo (aedo@ucalgary.ca). PDF format for application materials is preferred. Letters should be addressed to Mark Bieda, Ph.D.

2008 post: Free Multiplatform Reference Management? Try Zotero

So 2009… remember? So a lot of this is probably out of date, sorry!

Mark Bieda zotero references computer software citations

You use Endnote, refman, or one of the others. You want a free alternative because (1) you don’t want to worry about licensing issues (like buying a new copy for each computer) (2) you want something that will run under windows, linux, and mac os x (3) you just don’t want to pay or (4) you want to move your references from place to place without having to adapt to the local software choice (i.e. some places will have Endnote, others will have RefMan, others will have other solutions) or (5) you just believe stuff like this should be free.

So: I have been using Zotero for over a year. Zotero is great for everyday web stuff, but here I will just talk about it as a reference manager.

As with my other software comments, this is based on my real experience. I recently wrote an entire grant using Zotero as my only reference manager. And it worked well.

A key thing:
Zotero is heavily and institutionally supported (see the webpage). From the forum comments, you can see that many users are in academe. So it should only get better.

Problems/Weaknesses:
(1) This is clearly still in development. But, as I said, I wrote a grant with it – and it worked well for me, but it is not as smooth as EndNote in many ways.
(2) There are a limited number of citation styles, but this number is growing – and you can define your own. For things like grants, usually you get to choose a style. For a typical paper, you won’t have a large number of references, so a little manual editing is manageable. Still, because of this, Endnote really has a big edge.

Getting it:
(1) Zotero is a firefox extension and, when you go to the site, seems more geared toward web-based research.
(2) Installation is superfast and easy. Firefox is the way to go. No internet explorer version.
(3) You will also need to download plug-ins for either Microsoft Word or Openoffice Writer. I used OpenOffice Writer for my grant.

Basics:
(1) There is a tutorial on the website, unfortunately oriented mostly toward MS Word usage. The same rules apply to OpenOffice Writer.
(2) If you are using OpenOffice Writer, here is something to be careful with: don’t save your files in .doc (MS Word) format. I usually save that way, because I need to send files to colleagues, all of whom have MS Word but not OpenOffice Writer. If you do this, you will lose the ability to handle your citations.

Getting going:
(1) Download and install Zotero from the Zotero website.
(2) Download and install the appropriate word processing plugin.

To get citations:
(1) You can import from many, many sites – like Pubmed, notably. You just click on a button when you find something you like and it gets imported into Zotero.

Recommendations:
(1) When I last looked (about April, 2008), the documentation for Zotero was generally very good, but the documentation for the citation/reference aspects was very poor. So I strongly suggest that you download a few references and play with a pretend, test document to get a sense of how zotero works and your results. I did this and it really helped me use it. Only took a few minutes of playing around.

2008 post: Linux Installation on HP Pavilion Desktop (June 2008 purchase)

This may be helpful to someone, so I’ll keep this post alive.

Mark Bieda HP Linux install installation

This is just a brief post about my (read: my student’s) experience with installing linux on a new HP Pavilion. This is a standard model available at Futureshop and BestBuy: intel quadcore Q6600 processor, 640 GB hard disk, 3 GB RAM. Nice machine, only $899 here in Canada (sure to be cheaper in the USA).

So I’ve installed linux on several laptops and desktops, including Mandriva, Red Hat, Fedora, Suse. And of course I have run Knoppix and, as indicated in an earlier post, have been using DSL (Damn Small Linux) under VMPlayer for a while now.

So this time, let the undergrad do it!

Here are the notes:
(1) this computer had Windows Vista on it. Home Premium edition. We wanted to keep windows, not because I love windows, but because I have some key software that only runs on windows (e.g. NimbleGen SignalMap for looking at data).
(2) Installation of OpenSuse 10.3 caused a conflict with the windows system which led to a restore operation (nothing was lost, no big deal). So we dropped working on this one – and went to working on Ubuntu 8.04 LTS.
(3) The big problem was that the ethernet card, built into the motherboard, has known problems with talking to current linux distros. The joy of a new computer!
(4) Ubuntu installed well except for the ethernet card deal, which is a big problem.
(5) To solve the ethernet card problem, we just ended up buying a new card for the computer – it was only $19.76 at our friendly University of Calgary MicroIT store. Model is “Gigabit Ethernet PCI Card” from startech.com. The model number appears to be ST1000BT32. This solved the problem, although MFU (My Friendly Undergrad) had to do something to disable the BIOS from trying to connect to the one in the motherboard (which was not deadly, but led to one of those long pauses in bootup).

The Results
Everything seems to run very well. The computer is happy, it talks to the internet (from both windows and linux) and, as usual, everything runs just a bit (or a lot, depending) faster on the linux side vs the windows side.

On KDE
I am a longtime KDE user, and I really like KDE in this distribution (downloaded and installed as packages in Ubuntu). I guess it is technically Kubuntu, but like I said, the undergrad was doing the installation so… I got to skip on thinking about this stuff.

2008 post: Python for Perl Programmers (and Bioinformatics people)

Uh, this is so old that it should be skipped, I think… I’m keeping it up for archival sake

keywords:

Mark Bieda python getting started quick tips hints tutorial

I wanted to write a short post about getting started in python.

What you will like about Python as a perl person:
(1) A great thing is the interpreter. This will allow really rapid learning of python. For a perl person, python should come really fast. I was very, very surprised at how quickly I was writing actually useful (not toy) programs to manipulate things.
(2) It is easy to install in windows and has a decent editor/run environment (IDLE). Python is now a standard part of Linux distros, except for the smallest ones (perl is everywhere, so an advantage to perl here, but only a small one).

Some key things:
(1) The online manuals for python are good (but maybe not great). The Guido tutorial is key; make sure that you get the latest one.
(2) If you like to have a book on python around (I always do for my programming language du jour), then make sure that you have the most recent one.
(3) Why the emphasis on the most recent? Python has added key new features in recent times – like even since version 2.4! So make sure that you have the latest documentation.

Installation and Usage:
(1) For windows people, use the IDLE editor. Really. You will find it very easy to use and efficient. It comes in the download, so no installation deal.
(2) To learn python really fast, just play with commands in the interpreter window. It really is easy and efficient – a very quick way to get up to speed on things.

Some key things for bioinformatics people, in particular:
(1) Sets. Sets are very nice. Intersection, union… all that stuff that you want to use.
(2) A lot of string manipulation functions (actually methods, technically) are available. These will do a lot of what you would do with regular expressions, but see the next point.
(3) Unfortunately, regular expressions are in a separate (but standard library) module, re, and are a bit different from perl in usage/implementation.
(4) Like perl, the built-in sorting in python is weird (and annoying to set up to do anything beyond simple), but very useful. Again, here, make sure that you look at the latest documentation.
(5) Sqlite library is now part of the standard package. I haven’t used it yet as part of python – but given that this is a standard part of the distribution, it seems like I could write code that uses it and not worry about portability issues. This is well worth looking at for bioinformatics people.
(6) Remember that tuples are unchangeable (immutable) and lists are changeable. So far, this has led me to be pretty list-oriented, but I am new to this.
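
To make points (1)-(4) and (6) a bit more concrete, here is a short sketch that you could paste into the interpreter window. It is written for current Python 3 rather than the 2.x versions this post was based on, and all gene names, headers and peak values are invented for illustration.

```python
import re

# (1) Sets: compare two gene lists (hypothetical names).
chip_hits = {"MYC", "TP53", "E2F1", "RB1"}
expressed = {"TP53", "RB1", "GAPDH"}
both = chip_hits & expressed        # intersection
either = chip_hits | expressed      # union
only_chip = chip_hits - expressed   # difference

# (2) String methods often do the job of a simple regular expression.
header = ">chr1:100-200 sample_read"
name = header.lstrip(">").split()[0]            # "chr1:100-200"
chrom, span = name.split(":")
start, end = (int(x) for x in span.split("-"))

# (3) Regular expressions live in the standard-library module re.
match = re.match(r">(\w+):(\d+)-(\d+)", header)
regex_fields = (match.group(1), int(match.group(2)), int(match.group(3)))

# (4) Sorting beyond the simple case: pass a key function.
peaks = [("chr2", 500, 3.1), ("chr1", 900, 7.4), ("chr1", 100, 5.0)]
by_position = sorted(peaks, key=lambda p: (p[0], p[1]))
by_score = sorted(peaks, key=lambda p: p[2], reverse=True)

# (6) Tuples are immutable; lists are not.
coords = (100, 200)
reads = [100, 200]
reads.append(300)                   # fine: lists can grow
try:
    coords[0] = 150                 # not allowed for a tuple
    tuple_is_immutable = False
except TypeError:
    tuple_is_immutable = True

print(both, only_chip)
print(chrom, start, end, regex_fields)
print(by_position[0], by_score[0])
```

For a perl person, the main adjustment is that the regular expression is a function call on a module rather than built-in syntax, and that sorting takes a key function instead of a comparison block.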

I’ll leave it at that for now. I’ll write more about python later on.
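
A postscript on point (5): for anyone curious what the bundled SQLite support looks like, below is a minimal sketch using the standard-library sqlite3 module with an in-memory database. The table name `peaks`, its columns and the values are hypothetical, chosen only to look like ChIP-chip peak data; this is a starting point to play with, not a tested workflow.

```python
import sqlite3

# An in-memory database; pass a filename instead to keep the data on disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE peaks (chrom TEXT, start INTEGER, score REAL)")

# Use ? placeholders rather than building SQL strings by hand.
conn.executemany(
    "INSERT INTO peaks VALUES (?, ?, ?)",
    [("chr1", 100, 5.0), ("chr1", 900, 7.4), ("chr2", 500, 3.1)],
)
conn.commit()

# Fetch peaks above a score cutoff, strongest first.
rows = conn.execute(
    "SELECT chrom, start FROM peaks WHERE score > ? ORDER BY score DESC",
    (4.0,),
).fetchall()
conn.close()

print(rows)  # [('chr1', 900), ('chr1', 100)]
```

Because the module ships with Python itself, a script like this should run without installing anything extra.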