TAMALPAIS and promoter arrays

TAMALPAIS NimbleGen Promoter Arrays Array Analysis Problems Mark Bieda

I’ve been receiving some questions on TAMALPAIS usage for promoter arrays via email.

On the TAMALPAIS website, I say “Do not use this for promoter arrays.”

This is actually not quite true; there are a limited number of cases in which TAMALPAIS will perform well for promoter arrays. In this post, I discuss those cases.

When TAMALPAIS is ok for promoter arrays:
In short:
1. If your factor only binds to a tiny portion of the promoters (<5%), then TAMALPAIS will perform ok.
2. More correct, and more important: if only a small number of probes on the array are within binding sites for your factor, then you are ok. So, for promoter array designs with long promoters, you might have 15% of the promoters with a binding site and still be ok, because each binding site covers only a few of the probes in a long promoter, so only a small number of probes overall are in binding sites.

Why do I say “Do not use TAMALPAIS for promoter arrays”?
If you have a factor that binds to (or exists in) a lot of promoter regions, like POLII or some histone modifications, then TAMALPAIS will give you bad results. I don’t want that to happen. Right now, the study of histone mods and POLII is a big deal, so I don’t want people to be unhappy.

If not TAMALPAIS, then what?
There are a number of options. I developed maxfour to score promoters (see Krig et al. 2007 in JBC). I will be releasing an easy-to-use version of this software by fall 2008 (planned, not a promise). This is really the best option with NimbleGen’s current crop of designs, in my opinion. Someone else may have some great promoter array analysis software; I’m not aware of any right now, so feel free to email me or leave comments. I don’t mean to be unfair to other bioinformaticians with this.

What about the promoter array analysis server?
Ah, yes. This does very limited analysis – see my post on it in this blog (click the promoter array category button on the sidepanel).

On the promoter array analysis server

NimbleGen Promoter Arrays Mark Bieda server Analysis

Note: minor corrections on June 9, 2008
The promoter array server is located at this site

IMPORTANT USAGE NOTE: USE FIREFOX (Internet Explorer appears to create issues)

What does it do?

1. This does a simple list comparison using NimbleGen .tab files.
In other words, it just outputs the number of entries that are the same in the lists for the top 100, top 200, etc.
2. This is a very simple application; just a convenience, really.
3. This does not do an analysis like TAMALPAIS (see the category on TAMALPAIS on this blog).
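
To make the "number of entries that are the same" idea concrete, here is a minimal sketch in Python. It is not the server's actual code. The function name top_n_overlap and the toy promoter names are made up for illustration, and I am assuming that "top" means the highest values come first.

```python
def top_n_overlap(scores_a, scores_b, n):
    """Count promoter names shared by the top-n entries of two lists.

    Each input is a list of (promoter_name, value) pairs, in any order.
    Assumption: a larger value means a higher rank.
    """
    def top_names(scores):
        # Sort by value, largest first, and keep the names of the first n.
        ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
        return {name for name, _ in ranked[:n]}

    return len(top_names(scores_a) & top_names(scores_b))


# Toy data (hypothetical promoter names and values).
list_a = [("P1", 0.9), ("P2", 0.8), ("P3", 0.1), ("P4", 0.05)]
list_b = [("P2", 0.95), ("P4", 0.7), ("P1", 0.2), ("P3", 0.1)]

for n in (2, 3):
    print(n, top_n_overlap(list_a, list_b, n))
```

With real data you would use n = 100, 200, and so on. Ties at the cutoff are broken by input order here, which is just a simplification in this sketch.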

File format:
This is based on the .tab file format from NimbleGen; the file has to begin with a dummy line.
Here is a sample of an ok file format:

first dummy line
genenameholder CHR10_100017497_100020197 maxfourv02 0.2025
genenameholder CHR10_100164431_100167131 maxfourv02 0.7775
genenameholder CHR10_100196194_100198894 maxfourv02 0.6625

Notes on the format:
1. IMPORTANT: all fields are separated by tabs
2. The first field can vary and be meaningful.
3. The third field can vary and be meaningful.
4. The second field is the promoter name used for comparisons
5. The fourth field is the promoter value (numerical value) used for sorting (that is, determining order).
6. It’s ok to have more fields than the four. In other words, files of 10 fields are ok too. But the program will only look at the second and fourth fields.
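
The format notes above can be sketched as a small Python reader. This is only an illustration of the rules as I have described them, not the server's code; the function name parse_tab_text is hypothetical, and the choice to raise an error on short lines is my assumption.

```python
def parse_tab_text(text):
    """Return (promoter_name, value) pairs from .tab-style text.

    Rules followed: the first line is a dummy line and is skipped, fields
    are separated by tabs, only the second and fourth fields are used,
    and extra fields beyond the fourth are ignored.
    """
    pairs = []
    for line in text.splitlines()[1:]:  # skip the dummy first line
        if not line.strip():
            continue  # ignore blank lines (an assumption of this sketch)
        fields = line.split("\t")
        if len(fields) < 4:
            raise ValueError("expected at least 4 tab-separated fields: %r" % line)
        pairs.append((fields[1], float(fields[3])))
    return pairs


sample = (
    "first dummy line\n"
    "genenameholder\tCHR10_100017497_100020197\tmaxfourv02\t0.2025\n"
    "genenameholder\tCHR10_100164431_100167131\tmaxfourv02\t0.7775\n"
    "genenameholder\tCHR10_100196194_100198894\tmaxfourv02\t0.6625\textra\n"
)

for name, value in parse_tab_text(sample):
    print(name, value)
```

The last sample line has a fifth field to show that extra fields do no harm.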

Notes on usage:
1. the data can be entered unsorted