TAMALg: is the package available?

I’ve received a lot of questions recently about TAMALg availability. Unfortunately, there is only a difficult-to-install package available right now; I sent it to someone recently and they had a terrible time getting it going.

I do describe the algorithm in the supplementary materials to the ENCODE spike-in competition paper (Johnson et al, Genome Research 2008).

I would love to have a simple package to distribute, but this kind of work gets little support in today’s granting environment; in fact, I don’t think that making algorithms widely available has ever been well supported by any US funding agency. And I doubt the situation is different here in Canada.

I may be getting another undergrad soon and would task that person with working on the package. As a new faculty member, I am simply overwhelmed with basics like getting my lab going right now.

I do hope that this situation changes, and thanks to all for your patience.

As I have noted previously, the L2L3combo predictions produced by the TAMALPAIS server (see previous posts on this or just search for “TAMALPAIS Bieda” – no quotes, though) are the same predictions as made by TAMALg. TAMALg also adds the step of estimating enrichment using a maxfour-type methodology.

So you can get good TAMALg predictions of sites just by using the webserver. I suggest going this route.

And to repeat – TAMALg is almost certainly NOT what you want for promoter arrays. The exceptions are a factor that is present in only a tiny fraction of promoters, or one of the newer designs with very long promoter regions (e.g. for 10 kb promoters, it might be ok).

TAMALPAIS and promoter arrays

keywords: TAMALPAIS, NimbleGen, promoter arrays, array analysis problems, Mark Bieda

I’ve been receiving some questions on TAMALPAIS usage for promoter arrays via email.

On the TAMALPAIS website, I say “Do not use this for promoter arrays.”

This is actually not quite true; there are a limited number of cases in which TAMALPAIS will perform well for promoter arrays. In this post, I discuss this.

When TAMALPAIS is ok for promoter arrays:
In short:
1. If your factor only binds to a tiny portion of the promoters (<5%), then TAMALPAIS will perform ok.
2. More correctly, and more importantly: if only a small fraction of the probes on the array fall within binding sites for your factor, then you are ok. So for promoter array designs with long promoters, you might have 15% of the promoters with a binding site but still only a small fraction of the probes in binding sites. (Hopefully this makes sense.)
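To make point 2 concrete, here is a rough back-of-the-envelope sketch in Python. All of the numbers (probes per promoter, probes per binding site) are hypothetical values chosen for illustration, not properties of any actual NimbleGen design, and the function name is my own. It assumes every promoter has the same number of probes and each bound promoter has exactly one site.

```python
def fraction_of_probes_in_sites(frac_promoters_bound, probes_per_promoter, probes_per_site):
    """Rough fraction of all probes on the array that fall inside binding sites.

    Simplifying assumptions: every promoter carries the same number of probes,
    and every bound promoter has exactly one binding site.
    """
    if probes_per_promoter <= 0:
        raise ValueError("probes_per_promoter must be positive")
    return frac_promoters_bound * probes_per_site / probes_per_promoter


# Hypothetical short-promoter design: 15 probes per promoter, about 5 probes per site.
short_design = fraction_of_probes_in_sites(0.15, 15, 5)

# Hypothetical long-promoter design: 100 probes per promoter, about 5 probes per site.
long_design = fraction_of_probes_in_sites(0.15, 100, 5)

print(f"short promoters: {short_design:.2%} of probes in binding sites")
print(f"long promoters:  {long_design:.2%} of probes in binding sites")
```

With the same 15% of promoters bound, the long-promoter design ends up with a much smaller share of probes inside binding sites, which is the situation where TAMALPAIS can still be ok.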

Why do I say “Do not use TAMALPAIS for promoter arrays”?
If you have a factor that binds to (or exists in) a lot of promoter regions – like POLII or some histone modifications – then TAMALPAIS will give you bad results. I don’t want that to happen. Right now, the study of histone mods and POLII is a big deal, so I don’t want people to be unhappy.

If not TAMALPAIS, then what?
There are a number of options. I developed maxfour to score promoters (see Krig et al. 2007 in JBC). I will be releasing an easy-to-use version of this software by fall 2008 (planned, not a promise). This is really the best option with NimbleGen’s current crop of designs, in my opinion. Someone else may have some great promoter array analysis software; I’m not aware of any right now – feel free to email me or leave comments. I don’t mean to be unfair to other bioinformaticians with this.

What about the promoter array analysis server?
Ah, yes. This does very limited analysis – see my post on it in this blog (click the promoter array category button on the sidepanel).

TAMALPAIS known limitation: must be by chromosome

keywords: TAMALPAIS, Mark Bieda

TAMALPAIS KNOWN LIMITATION:

1. The first field of the gff file must be the chromosome name; in particular, it probably needs to look like chr1, chrX, chrY, or chr20.

Further details:

I suspect (but am not sure) that anything of the form chr(anything) will work. Note that using non-standard chr names does have the limitation that the optional secondary analyses, like location and gene finding, will not work.

Non-standard name examples:
For example, chr99 might be ok. Or chrMYGOODONE.

What am I talking about?

If you look at the first lines of your gff, you will see in the first column the location designation. For most gffs, this is like chr1. To see examples, go to the sample data page on the website. You will see that these files are by chr.

To look at your own gff files, it is easy to load them into a text editor in Linux; for Windows, I strongly suggest that you use the excellent Notepad++ (do a Google search, it is completely free).
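If you would rather check the first column with a script than by eye, here is a minimal Python sketch. The function name and the sample lines are made up for illustration. It only assumes that the gff is tab-delimited and that lines starting with # are comments; it is not the check that the TAMALPAIS server itself performs.

```python
import io


def check_gff_first_field(lines, max_report=5):
    """Return (line_number, first_field) pairs whose first field does not start with 'chr'.

    Blank lines and comment lines starting with '#' are skipped.
    At most max_report problems are collected.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        first_field = stripped.split("\t")[0]
        if not first_field.startswith("chr"):
            problems.append((number, first_field))
            if len(problems) >= max_report:
                break
    return problems


# Made-up sample: the second line uses "1" instead of "chr1" in the first field.
sample = io.StringIO(
    "chr1\tNimbleGen\tprobe\t1000\t1050\t0.42\t.\t.\tid=P1\n"
    "1\tNimbleGen\tprobe\t2000\t2050\t0.10\t.\t.\tid=P2\n"
)
print(check_gff_first_field(sample))
```

For a real file, open it and pass the file handle to the function in place of the sample; an empty list means every data line starts with chr.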

TAMALg and TAMALPAIS: NimbleGen data analysis

Ok, I wanted to write about the relationship between TAMALPAIS and TAMALg.

keywords: Mark Bieda, TAMALPAIS, TAMALg, NimbleGen, ChIP-chip

Background
A major part of my research is developing algorithms and statistical models for analysis of ChIP-chip experiments – specifically those done with NimbleGen arrays.
TAMALPAIS (available here) predicts binding sites from NimbleGen array data and also does some basic secondary analyses like localization of binding sites in reference to transcription start sites and which genes have a binding site in the proximal promoter. The website version gives a lot of output.
TAMALg (TAMALpais generalized) was recently ranked #1 in an unbiased competition between algorithms. It uses the exact same prediction approach as TAMALPAIS (technically, it uses the L2L3combo set of predictions – to get these predictions, go to the TAMALPAIS website here). Then, in a second step, it uses the maxfour approach that I developed for promoter arrays (Krig et al., 2007 in JBC) to predict the actual amount of enrichment per binding site.

So the relationship between TAMALPAIS and TAMALg is this:
TAMALPAIS produces the same high-quality peak predictions as TAMALg (and I say high quality because the competition showed this; see this paper abstract). But TAMALPAIS does not do the enrichment prediction. Remember to look at the L2L3combo set from TAMALPAIS to get the same predictions as TAMALg.

Future Stuff
I am planning on producing a downloadable version of TAMALg (probably Jython-based so that it will easily run on many platforms).

Remember! TAMALPAIS and TAMALg are not good for most promoter arrays!

If you have questions, you should contact me (see About tab on this site for contact info).

TAMALPAIS: howto open files

key words: TAMALPAIS, NimbleGen, Mark Bieda, ChIP, server

Background:
TAMALPAIS is the webserver that I created to analyze NimbleGen ChIP-chip data (note that it is not for promoter data). You can find it at:

http://chipanalysis.genomecenter.ucdavis.edu/cgi-bin/tamalpais.cgi

I’ve received queries from a number of people on opening files from my TAMALPAIS server.

Some people have trouble opening these files, so here are instructions:

Mac:
1. On the Mac (modern Macs with OS X, not ancient Macs), this should be easy – just click on the file.

Windows:
(one option: transfer the files to a Mac (see above). If you don’t want to do this (I wouldn’t), then continue)
1. Download the FREE 7-Zip program from www.7-zip.org
2. Install 7-Zip
3. Right-click on the file from TAMALPAIS, select 7-Zip from the menu, then select “Open archive”
4. Click on the files that show up in the archive window. At any point, you can click on the “extract” button in the toolbar in the window (it is the large “minus sign” that is blue/purple).
5. For any of the files ending with .tar.gz, .tar, or .zip, you can repeat this procedure (starting with step #3).

There are a bunch of files in subarchives (that is, in other .tar.gz files within the archive).
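If you are comfortable with a little scripting, the repeated unpacking above can also be done in one go. The following is a minimal Python sketch, not something the server provides; the function name and the "_extracted" folder naming are my own choices, and it relies only on the standard library's shutil.unpack_archive, which picks the format from the file extension. Only run it on archives you trust.

```python
import os
import shutil

# Extensions mentioned in the steps above.
ARCHIVE_SUFFIXES = (".tar.gz", ".tar", ".zip")


def extract_nested(archive_path, dest_dir):
    """Unpack archive_path into dest_dir, then unpack any archives found inside it.

    Each inner archive is unpacked into a sibling folder named
    '<archive file name>_extracted'.
    """
    os.makedirs(dest_dir, exist_ok=True)
    shutil.unpack_archive(archive_path, dest_dir)

    # Take a snapshot of the inner archives before recursing, so that the
    # folders created by the recursion are not scanned a second time here.
    inner_archives = [
        os.path.join(root, name)
        for root, _dirs, files in os.walk(dest_dir)
        for name in files
        if name.endswith(ARCHIVE_SUFFIXES)
    ]
    for path in inner_archives:
        extract_nested(path, path + "_extracted")
```

To use it, call extract_nested with the path to the file you downloaded from the server and the folder where you want the results to go.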

Problems?

If you have problems, contact me using the contact information on the About page of this blog.