Commentary: ChIP-chip vs ChIP-seq and $$

Ok, so with the rush to ChIP-seq and all the hype (much of it deserved) around “next-generation” sequencing generally, you might think that arrays are dead as used for ChIP (i.e. ChIP-chip).

I don’t think arrays are going away, for simple cost reasons. For the near future, there will be lots of genome-scale ChIP studies and, for these, I strongly support ChIP-seq. It is a lot cheaper for better data. But I see a strong trend toward ChIP studies targeted toward specific biological questions, often questions requiring large sample numbers (e.g. epigenetic changes in cancer).

The financial math really isn’t that hard; with ChIP-seq running ~$5000 per sample for external users and ChIP-chip running at $660 per sample for external users (NimbleGen single arrays), it seems pretty clear that if a fair number of samples are involved, ChIP-chip is the way to go. That is, unless high-res whole genome coverage is absolutely necessary (usually not).

Furthermore, for taking chances on experiments, $660/sample is a lot more appealing on a lab budget than $5000/sample, particularly when you consider that, in the real world, even a poor test of a speculative idea is going to take 2 or 3 samples at minimum (for 3 samples, ~$15,000 for ChIP-seq vs $1980 for ChIP-chip). A lot of labs can blow $2000; blowing $15,000 really hurts.
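To make the arithmetic concrete, here is a minimal Python sketch using the two per-sample prices quoted above. It assumes a flat per-sample price with no volume discounts or setup fees, which is my simplification; the function name and the sample counts are just for illustration.

```python
# Per-sample prices quoted in this post (external-user rates).
CHIP_SEQ_PER_SAMPLE = 5000   # approximate figure from the post
CHIP_CHIP_PER_SAMPLE = 660   # NimbleGen single array, from the post


def experiment_cost(per_sample, n_samples):
    """Total cost, assuming a flat per-sample price (no discounts or setup fees)."""
    return per_sample * n_samples


# Illustrative sample counts; 3 matches the speculative-experiment case above.
for n in (2, 3, 10, 50):
    seq = experiment_cost(CHIP_SEQ_PER_SAMPLE, n)
    chip = experiment_cost(CHIP_CHIP_PER_SAMPLE, n)
    print(f"{n:>3} samples: ChIP-seq ${seq:,} vs ChIP-chip ${chip:,} "
          f"(difference ${seq - chip:,})")
```

The gap grows linearly with sample number, which is the whole point for studies needing many samples.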

Given this analysis, it seems to me that NimbleGen should really push the low end of the market – in other words, try to get the cost even lower on a per sample basis (for fewer spots). I think they are on the right track with their multiplex arrays, but development of these has been disappointingly slow, and last time I looked, the cost structure around the 4plex with 70K/quadrant really wasn’t very attractive.

I may revisit this topic another time, but that is it for now.

TAMALPAIS and promoter arrays

TAMALPAIS NimbleGen Promoter Arrays Array Analysis Problems Mark Bieda

I’ve been receiving some questions on TAMALPAIS usage for promoter arrays via email.

On the TAMALPAIS website, I say “Do not use this for promoter arrays.”

This is actually not quite true; there are a limited number of cases in which TAMALPAIS will perform well for promoter arrays. In this post, I discuss this.

When TAMALPAIS is ok for promoter arrays:
In short:
1. If your factor only binds to a tiny portion of the promoters (<5%), then TAMALPAIS will perform ok.
2. More correctly, and more importantly: if only a small number of probes on the array are within binding sites for your factor, then you are ok. So, for promoter array designs with long promoters, you might have 15% of the promoters with a binding site but still only a small number of probes within those binding sites. (Hopefully this makes sense.)

Why do I say “Do not use TAMALPAIS for promoter arrays”?
If you have a factor that binds to (or exists in) a lot of promoter regions – like POLII or some histone modifications – then TAMALPAIS will give you bad results. I don’t want that to happen. Right now, the study of histone mods and POLII is a big deal, so I don’t want people to be unhappy.

If not TAMALPAIS, then what?
There are a number of options. I developed maxfour to score promoters (see Krig et al. 2007 in JBC). I will be releasing an easy-to-use version of this software by fall 2008 (planned, not a promise). This is really the best option with NimbleGen’s current crop of designs, in my opinion. Someone else may have some great promoter array analysis software; I’m not aware of any right now – feel free to email me or leave comments. I don’t mean to be unfair to other bioinformaticians with this.

What about the promoter array analysis server?
Ah, yes. This does very limited analysis – see my post on it in this blog (click the promoter array category button on the sidepanel).

On the promoter array analysis server

NimbleGen Promoter Arrays Mark Bieda server Analysis

Note: minor corrections on June 9, 2008
The promoter array server is located at this site

IMPORTANT USAGE NOTE: USE FIREFOX (Internet Explorer appears to create issues)

What does it do?

1. This does a simple list comparison using NimbleGen .tab files. In other words, it just outputs the number of entries that are the same in the lists for the top 100, top 200, etc.
2. This is a very simple application; just a convenience, really.
3. This does not do an analysis like TAMALPAIS (see the category on TAMALPAIS on this blog).
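For anyone who wants to see what a top-N list comparison looks like in practice, here is a rough Python sketch. It is not the server's code. It assumes that a higher value means a better-ranked promoter (the post does not state the sort direction), and the function names, promoter names and cutoffs are all made up for illustration.

```python
def top_n_names(scores, n):
    """Names of the n highest-scoring promoters (assumes higher value = better rank)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n])


def top_n_overlap(scores_a, scores_b, cutoffs=(100, 200, 500)):
    """For each cutoff n, count the promoters present in both top-n lists."""
    return {
        n: len(top_n_names(scores_a, n) & top_n_names(scores_b, n))
        for n in cutoffs
    }


# Tiny made-up example; real lists would hold thousands of promoters.
list_a = {"P1": 0.9, "P2": 0.8, "P3": 0.1, "P4": 0.05}
list_b = {"P1": 0.7, "P3": 0.6, "P2": 0.2, "P4": 0.1}
print(top_n_overlap(list_a, list_b, cutoffs=(1, 2, 3)))
# → {1: 1, 2: 1, 3: 3}
```

Sorting inside the function means the input order does not matter.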

File format:
This is based on the .tab file format from NimbleGen; it has to have a dummy line to begin.
Here is a sample of an ok file format:

first dummy line
genenameholder CHR10_100017497_100020197 maxfourv02 0.2025
genenameholder CHR10_100164431_100167131 maxfourv02 0.7775
genenameholder CHR10_100196194_100198894 maxfourv02 0.6625

Notes on the format:
1. IMPORTANT: all fields are separated by tabs
2. The first field can vary and be meaningful.
3. The third field can vary and be meaningful.
4. The second field is the promoter name used for comparisons
5. The fourth field is the promoter value (numerical value) used for sorting (that is, determining order).
6. It’s ok to have more fields than the four. In other words, files of 10 fields are ok too. But the program will only look at the second and fourth fields.
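The format notes above can be sketched as a small Python reader. This is only my illustration of the rules listed here (skip the dummy first line, split on tabs, use the second field as the name and the fourth as the value, ignore any extra fields); it is not the server's actual code, and the function name `read_tab_scores` is hypothetical.

```python
import io


def read_tab_scores(handle):
    """Parse a .tab-style file into {promoter_name: value}."""
    scores = {}
    next(handle, None)  # skip the required dummy first line
    for line in handle:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # tolerate blank lines
        fields = line.split("\t")
        if len(fields) < 4:
            raise ValueError(f"expected at least 4 tab-separated fields: {line!r}")
        # Only the second field (name) and fourth field (value) are used.
        scores[fields[1]] = float(fields[3])
    return scores


# The sample file from this post, with real tab characters between fields.
sample = (
    "first dummy line\n"
    "genenameholder\tCHR10_100017497_100020197\tmaxfourv02\t0.2025\n"
    "genenameholder\tCHR10_100164431_100167131\tmaxfourv02\t0.7775\n"
    "genenameholder\tCHR10_100196194_100198894\tmaxfourv02\t0.6625\n"
)
scores = read_tab_scores(io.StringIO(sample))
print(max(scores, key=scores.get))
```

A file that was saved with spaces instead of tabs would raise the `ValueError` here, which is the most common way this kind of format goes wrong.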

Notes on usage:
1. the data can be entered unsorted