I’ve received a lot of questions recently about TAMALg availability. Unfortunately, there is only a difficult-to-install package available right now; I sent it to someone recently and they had a terrible time getting it going.
I do describe the algorithm in the supplementary materials to the ENCODE spike-in competition paper (Johnson et al., Genome Research 2008).
I would love to have a simple package to distribute, but this kind of work gets little support in today’s granting environment; in fact, I don’t think that making algorithms widely available has ever been well supported by any US funding agency. And I doubt the situation is different here in Canada.
I may be getting another undergrad soon and would task that person with working on the package. As a new faculty member, I am simply overwhelmed with basics like getting my lab going right now.
I do hope that this situation changes, and thanks to all for your patience.
As I have noted previously, the L2L3combo predictions produced by the TAMALPAIS server (see previous posts on this, or just search for TAMALPAIS Bieda, without quotes) are the same predictions as those made by TAMALg. TAMALg adds one further step: estimating enrichment using a maxfour-type methodology.
So you can get good TAMALg predictions of sites just by using the webserver. I suggest going this route.
And to repeat: TAMALg is almost certainly NOT what you want for promoter arrays. The exceptions are a factor that binds only a tiny fraction of promoters, or one of the newer designs with very long promoter regions (10 kb promoters, for example, might be ok).
TAMALPAIS NimbleGen Promoter Arrays Array Analysis Problems Mark Bieda
I’ve been receiving some questions on TAMALPAIS usage for promoter arrays via email.
On the TAMALPAIS website, I say “Do not use this for promoter arrays.”
This is actually not quite true; there are a limited number of cases in which TAMALPAIS will perform well for promoter arrays. In this post, I discuss those cases.
When TAMALPAIS is ok for promoter arrays:
1. If your factor only binds to a tiny portion of the promoters (<5%), then TAMALPAIS will perform ok.
2. More precisely (and more importantly): if only a small number of probes on the array fall within binding sites for your factor, then you are ok. For example, with promoter array designs that have long promoters, 15% of the promoters might contain a binding site, yet only a small fraction of the probes would lie within those binding sites.
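A rough way to check point 2 for your own design is to compute the fraction of probes that fall inside binding sites. The sketch below is only an illustration, not part of TAMALPAIS: it assumes each probe is represented by a single (chromosome, position) pair and each site by an inclusive (chromosome, start, end) interval, and the function names are hypothetical.

```python
from bisect import bisect_right
from typing import Dict, List, Tuple


def _merge(intervals: List[Tuple[int, int]]) -> List[List[int]]:
    """Merge overlapping inclusive intervals so each position is covered at most once."""
    merged: List[List[int]] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def fraction_probes_in_sites(
    probes: List[Tuple[str, int]],
    sites: List[Tuple[str, int, int]],
) -> float:
    """Fraction of probes whose position lies inside any binding site (hypothetical helper)."""
    if not probes:
        return 0.0
    grouped: Dict[str, List[Tuple[int, int]]] = {}
    for chrom, start, end in sites:
        grouped.setdefault(chrom, []).append((start, end))
    merged = {chrom: _merge(ivs) for chrom, ivs in grouped.items()}
    starts = {chrom: [s for s, _ in ivs] for chrom, ivs in merged.items()}

    inside = 0
    for chrom, pos in probes:
        ivs = merged.get(chrom)
        if not ivs:
            continue  # no sites on this chromosome
        # The last merged interval starting at or before pos is the only candidate.
        i = bisect_right(starts[chrom], pos) - 1
        if i >= 0 and pos <= ivs[i][1]:
            inside += 1
    return inside / len(probes)
```

For example, `fraction_probes_in_sites([("chr1", 100), ("chr1", 500)], [("chr1", 90, 160)])` gives 0.5. The post does not give an exact probe-fraction cutoff, so how small is "small" is a judgment call; merging overlapping sites first simply keeps a probe from being counted twice.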
Why do I say “Do not use TAMALPAIS for promoter arrays”?
If you have a factor that binds to (or is present in) a lot of promoter regions, like POLII or some histone modifications, then TAMALPAIS will give you bad results. I don’t want that to happen. Right now, studies of histone mods and POLII are a big deal, so I don’t want people to be unhappy.
If not TAMALPAIS, then what?
There are a number of options. I developed maxfour to score promoters (see Krig et al., 2007 in JBC). I will be releasing an easy-to-use version of this software by fall 2008 (planned, not a promise). In my opinion, this is really the best option for NimbleGen’s current crop of designs. Someone else may have some great promoter array analysis software; I’m not aware of any right now, so feel free to email me or leave comments. I don’t mean to be unfair to other bioinformaticians with this.
What about the promoter array analysis server?
Ah, yes. This does very limited analysis; see my post on it in this blog (click the promoter array category button on the side panel).
Note: minor corrections on June 9, 2008
The promoter array server is located at this site
IMPORTANT USAGE NOTE: USE FIREFOX (Internet Explorer appears to cause problems)
What does it do?
1. This does a simple list comparison using NimbleGen .tab files.
In other words, it just outputs the number of entries that the lists have in common among the top 100, top 200, etc.
2. This is a very simple application; just a convenience, really.
3. This does not do an analysis like TAMALPAIS (see the TAMALPAIS category on this blog).
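The list comparison described above can be sketched in a few lines. This is an illustration, not the server's code: it assumes each list has already been reduced to (promoter name, value) pairs, that "top N" means the N highest values, and that ties are broken arbitrarily. The server's actual sort direction and tie handling are not documented here, and the function name is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def top_n_overlap(
    list_a: List[Tuple[str, float]],
    list_b: List[Tuple[str, float]],
    cutoffs: Iterable[int],
) -> Dict[int, int]:
    """For each cutoff N, count promoter names shared by the top N of both lists."""
    # Sort by value, highest first (assumed direction); the input may be unsorted.
    ranked_a = [name for name, _ in sorted(list_a, key=lambda p: p[1], reverse=True)]
    ranked_b = [name for name, _ in sorted(list_b, key=lambda p: p[1], reverse=True)]
    return {n: len(set(ranked_a[:n]) & set(ranked_b[:n])) for n in cutoffs}


if __name__ == "__main__":
    a = [("p1", 0.9), ("p2", 0.5), ("p3", 0.7)]
    b = [("p3", 0.8), ("p1", 0.6), ("p4", 0.1)]
    print(top_n_overlap(a, b, cutoffs=(1, 2, 3)))  # {1: 0, 2: 2, 3: 2}
```

With real files you would pass cutoffs such as (100, 200). Counting shared names at several cutoffs, rather than computing a single correlation, matches the "top 100, top 200" style of output described above.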
This is based on the .tab file format from NimbleGen; the file must begin with a dummy line.
Here is a sample of an acceptable file:
first dummy line
genenameholder CHR10_100017497_100020197 maxfourv02 0.2025
genenameholder CHR10_100164431_100167131 maxfourv02 0.7775
genenameholder CHR10_100196194_100198894 maxfourv02 0.6625
Notes on the format:
1. IMPORTANT: all fields are separated by tabs
2. The first field can vary and can carry meaning for you (the program does not use it).
3. The third field can vary and can carry meaning for you (the program does not use it).
4. The second field is the promoter name used for comparisons
5. The fourth field is the promoter value (numerical value) used for sorting (that is, determining order).
6. It’s ok to have more than four fields; files with 10 fields, for example, are ok too. But the program only looks at the second and fourth fields.
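Reading a file in this format takes only a few lines. The sketch below follows the notes above: it skips the first (dummy) line, splits on tabs, keeps the second field as the promoter name and the fourth as its value, and ignores any extra fields. Raising an error on lines with fewer than four tab-separated fields is an assumption about how a malformed file should be handled, and the function name is hypothetical.

```python
from typing import List, Tuple


def read_tab_text(text: str) -> List[Tuple[str, float]]:
    """Parse NimbleGen-style .tab content into (promoter name, value) pairs."""
    pairs: List[Tuple[str, float]] = []
    lines = text.splitlines()
    for lineno, line in enumerate(lines[1:], start=2):  # line 1 is the dummy line
        if not line.strip():
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) < 4:
            # Space-separated lines end up here: fields must be separated by tabs.
            raise ValueError(f"line {lineno}: expected at least 4 tab-separated fields")
        pairs.append((fields[1], float(fields[3])))  # second and fourth fields only
    return pairs
```

To read a file from disk, pass it the contents, e.g. `read_tab_text(open("my_promoters.tab").read())` (the file name is illustrative). A space-separated copy of the sample above fails loudly instead of silently producing wrong names.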
Notes on usage:
1. The data can be entered unsorted.
TAMALPAIS KNOWN LIMITATION:
1. The first field of the gff file must be the chromosome; in particular, it probably needs to look like chr1, chrX, chrY, or chr20.
I suspect (but am not sure) that anything of the form chr(anything) will work. Note that non-standard chr names have a limitation: the optional secondary analyses, like location and gene finding, will not work.
Non-standard name examples: chr99 might be ok, or chrMYGOODONE.
What am I talking about?
If you look at the first lines of your gff, you will see the location designation in the first column. For most gffs, this is something like chr1. To see examples, go to the sample data page on the website; you will see that those files are organized by chromosome.
To look at your own gff files, it is easy to load them into a text editor in Linux; on Windows, I strongly suggest the excellent Notepad++ (search for it on Google; it is completely free).
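If you want to check a gff before uploading, the sketch below lists the distinct names in its first column (skipping blank lines and lines starting with #, which are comments or directives in GFF) and sorts each name into one of three bins: a standard human-style name (chr1 to chr22, chrX, chrY), some other chr(anything) name, or a name that does not start with chr. The "standard" set and the case-sensitive match are assumptions modeled on the examples in this post, not a description of what TAMALPAIS checks internally; the function names are hypothetical.

```python
import re
from typing import Dict, List

# Assumed "standard" set, modeled on the human examples above (chr1, chrX, chrY, chr20).
STANDARD = {f"chr{i}" for i in range(1, 23)} | {"chrX", "chrY"}
CHR_PATTERN = re.compile(r"chr.+")  # chr(anything); case-sensitive by assumption


def first_column_names(gff_text: str) -> List[str]:
    """Distinct first-column values of a gff file, in order of first appearance."""
    seen: Dict[str, None] = {}
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/directive lines
        seen.setdefault(line.split("\t", 1)[0], None)
    return list(seen)


def classify(name: str) -> str:
    """'standard', 'nonstandard' (chr-prefixed; no secondary analyses), or 'invalid'."""
    if name in STANDARD:
        return "standard"
    if CHR_PATTERN.fullmatch(name):
        return "nonstandard"
    return "invalid"
```

Typical use is `for name in first_column_names(text): print(name, classify(name))`. A name like chr99 or chrMYGOODONE lands in the "nonstandard" bin, matching the caveat above that the secondary analyses would not work for it.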
Ok, I wanted to write about the relationship between TAMALPAIS and TAMALg.
A major part of my research is developing algorithms and statistical models for analysis of ChIP-chip experiments – specifically those done with NimbleGen arrays.
TAMALPAIS (available here) predicts binding sites from NimbleGen array data and also does some basic secondary analyses, like localizing binding sites relative to transcription start sites and identifying which genes have a binding site in the proximal promoter. The website version gives a lot of output.
TAMALg (TAMALpais generalized) was recently ranked #1 in an unbiased competition between algorithms. It uses exactly the same prediction approach as TAMALPAIS (technically, it uses the L2L3combo set of predictions; to get these predictions, go to the TAMALPAIS website here). Then, in a second step, it uses the maxfour approach that I developed for promoter arrays (Krig et al., 2007 in JBC) to predict the actual amount of enrichment per binding site.
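To make the localization idea concrete, here is a sketch of one common way to express a binding site's position relative to a transcription start site: the signed distance from the site's midpoint to the nearest TSS, negative when the site is upstream of the gene. This illustrates the general idea only and is not the code TAMALPAIS runs; the midpoint convention, the gene tuple layout, and the default proximal-promoter window (1000 bp upstream to 500 bp downstream) are all hypothetical choices.

```python
from typing import List, Optional, Set, Tuple

Gene = Tuple[str, str, int, str]  # (name, chrom, tss, strand); assumed layout
Site = Tuple[str, int, int]       # (chrom, start, end), inclusive


def signed_tss_distance(site: Site, tss: int, strand: str) -> int:
    """Distance from site midpoint to TSS; negative means upstream of the gene."""
    _, start, end = site
    midpoint = (start + end) // 2
    d = midpoint - tss
    return -d if strand == "-" else d  # on the minus strand, upstream is higher coordinates


def nearest_tss(site: Site, genes: List[Gene]) -> Optional[Tuple[str, int]]:
    """(gene name, signed distance) for the closest TSS on the same chromosome."""
    best: Optional[Tuple[str, int]] = None
    for name, chrom, tss, strand in genes:
        if chrom != site[0]:
            continue
        d = signed_tss_distance(site, tss, strand)
        if best is None or abs(d) < abs(best[1]):
            best = (name, d)
    return best


def genes_with_proximal_site(
    sites: List[Site], genes: List[Gene], upstream: int = 1000, downstream: int = 500
) -> Set[str]:
    """Genes with at least one site midpoint within [-upstream, +downstream] of the TSS."""
    hits: Set[str] = set()
    for site in sites:
        for name, chrom, tss, strand in genes:
            if chrom == site[0] and -upstream <= signed_tss_distance(site, tss, strand) <= downstream:
                hits.add(name)
    return hits
```

The nested loop is fine for a few thousand sites and genes; for genome-scale lists you would sort TSSs per chromosome and binary-search instead.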
So the relationship between the TAMALPAIS and TAMALg is this:
TAMALPAIS produces the same high-quality peak predictions as TAMALg (and I say high quality because the competition showed this; see this paper abstract). But TAMALPAIS does not do the enrichment prediction. Remember to look at the L2L3combo set from TAMALPAIS to get the same predictions as TAMALg.
I am planning on producing a downloadable version of TAMALg (probably Jython-based so that it will easily run on many platforms).
Remember! TAMALPAIS and TAMALg are not good for most promoter arrays!
If you have questions, please contact me (see the About tab on this site for contact info).
TAMALPAIS is the webserver that I created to analyze NimbleGen ChIP-chip data (note that it is not for promoter data). You can find it at:
I’ve received queries from a number of people about opening files from my TAMALPAIS server. Here are instructions:
On the Mac (modern Macs with OS X, not ancient Macs), this should be easy: just click on the file.
On Windows, one option is to transfer the files to a Mac (see above). If you don’t want to do this (I wouldn’t), then continue:
1. Download the FREE 7-Zip program from www.7-zip.org.
2. Install 7-Zip.
3. Right-click on the file from TAMALPAIS, select 7-Zip from the menu, and then select “Open archive”.
4. Click on the files that show up in the archive window. At any point, you can click the “Extract” button in the window’s toolbar (it is the large blue/purple “minus sign”).
5. For any files ending in .tar.gz, .tar, or .zip, repeat this procedure starting with step 3.
Many of the files are in subarchives (that is, in other .tar.gz files within the archive).
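If you would rather script this than click through 7-Zip, the same "open, extract, and repeat for any .tar.gz, .tar, or .zip inside" procedure can be sketched with Python's standard tarfile and zipfile modules. This is a swapped-in tool, not something the TAMALPAIS server provides. Each nested archive is unpacked into a sibling folder named after it with an _extracted suffix; that naming scheme and the function names are hypothetical. Only open archives from sources you trust.

```python
import os
import tarfile
import zipfile
from typing import List

SUFFIXES = (".tar.gz", ".tar", ".zip")


def _extract_one(path: str, dest: str) -> None:
    """Unpack a single .tar.gz, .tar, or .zip archive into dest."""
    if path.endswith(".zip"):
        with zipfile.ZipFile(path) as zf:
            zf.extractall(dest)
        return
    with tarfile.open(path, "r:*") as tf:  # "r:*" auto-detects gzip compression
        if hasattr(tarfile, "data_filter"):
            tf.extractall(dest, filter="data")  # safer extraction where supported
        else:
            tf.extractall(dest)


def _target_dir(path: str) -> str:
    """Sibling folder for a nested archive, e.g. inner.tar.gz -> inner_extracted."""
    for suffix in SUFFIXES:
        if path.endswith(suffix):
            return path[: -len(suffix)] + "_extracted"
    return path + "_extracted"


def extract_recursively(archive: str, dest: str) -> List[str]:
    """Extract archive into dest, then any archives found inside; return those processed."""
    pending = [(os.path.abspath(archive), os.path.abspath(dest))]
    done: List[str] = []
    while pending:
        path, out_dir = pending.pop()
        if path in done:
            continue
        os.makedirs(out_dir, exist_ok=True)
        _extract_one(path, out_dir)
        done.append(path)
        # Queue archives this extraction produced (the subarchives mentioned above).
        for root, _dirs, files in os.walk(out_dir):
            for name in files:
                nested = os.path.join(root, name)
                if name.endswith(SUFFIXES) and nested not in done:
                    pending.append((nested, _target_dir(nested)))
    return done
```

For example, `extract_recursively("results.tar.gz", "results_out")` unpacks everything under a new results_out folder (both names are illustrative). Using a fresh destination folder keeps unrelated archives out of the search.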
If you have problems, contact me using the contact information on the About page of this blog.