Spot the difference (for geeks)

August 22, 2009

The purpose of this post is to give an example of using the Ensembl genome database to investigate genomes by playing spot the difference between species.

Disclaimer

“I don’t know what the hell I’m talking about!” ~ Me, August 2009

I’m not a biologist. Qualification-wise, I stopped at O level biology (and I don’t think my biology teacher had a good grasp of the subject either).

I’m primarily a software developer and recreational mathematician.  So my job is exploring data, constructing models and testing stuff.

Anti-disclaimer

So what if I’m not a biologist? I’m also not a great fan of accepting opinion on the basis of authority. I read books, magazines and journals. I’m not completely stupid. I do accept expert opinion, but I also like to see data.

This means you shouldn’t take any of my opinions on board, which is just as well because I’m not offering any (did I mention that I don’t know what the hell I’m talking about?).

Objectives

  • I just want to look at some data and use it to influence my opinion. This has got to be better than either “Dawkins told me” or “God told me”.
  • I want people to tell me how to improve my understanding of the Ensembl database (although given the small number of people that look at this blog, that’s probably a tall order!).

(my) methodology

I find this fun, but I am a geek and your mileage may vary.

Go to http://www.ensembl.org/

image

In the dropdown where it says “All genomes” (which is clearly not ALL genomes, but whatever!) I’ll pick “Human” because I’m human and I guess there’s quite a lot of data in there.

Okay. Now I get this:

image

There are some sample entry points (whatever THAT means). Yeah, like I’ll pick those! The purpose of this is to explore, not follow a map.

Now the search box is pretty cool. You seem to be able to put all sorts of biology stuff in there and get results back.

Pick a “biologyish” word: haemoglobin, cardio, limbic, and so on. Choose whatever you want. Some words don’t get many results (lung seemed to be a dead end, whereas pulmonary wasn’t; the moral of the story might be to use big words).

I’ll choose marrow because it was the first word that popped into my head.

If you ONLY choose marrow too then I’ve failed. The purpose of this is to explore, not follow my map (especially as I don’t really know where I’m going).

If you want, choose marrow now, but try other things later.

image

You’ll get something like this back.

image 

You see the little entries at the top? They tend to be dead ends (but I don’t know this to be true).

But the big one at the bottom looks like the database has some info.

Click on that and get a frightening screen.

image

Okay – I want to play spot the difference. See that “Genomic alignments (38)” link? Click on that.

Ah – now we’re getting somewhere.

image

There’re a bunch of letters. I know DNA uses the letters A, C, G and T. This looks like data.
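If you copy a chunk of those letters out of the page, a few lines of Python will tell you how many of each there are. This is only a sketch: the function name `base_counts` and the sample string are made up for illustration, and the sample is not real Ensembl data.

```python
from collections import Counter


def base_counts(sequence):
    """Count each letter in a DNA string, ignoring case and whitespace."""
    # Copy-pasted sequences often contain spaces and line breaks; strip them.
    cleaned = "".join(sequence.split()).upper()
    return Counter(cleaned)


# Made-up sequence for illustration only (not taken from Ensembl).
sample = "ACGTTGCA ACGGT"
counts = base_counts(sample)

for base in "ACGT":
    print(base, counts[base])
```

Anything other than A, C, G and T that turns up in the counts (a dash, say) is worth a second look.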

(bear with me – nearly there)

Now let’s play spot the difference – against the Chimpanzee.

image

Some clever software lines them up against each other (sometimes there are blanks that the software shuffles in for you).
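I don’t know which algorithm Ensembl actually uses to do the lining up, so treat the following as a toy. It is a minimal sketch of one textbook method (Needleman-Wunsch global alignment), which shows where those blanks come from: a dash gets shuffled in wherever that scores better than forcing a mismatch. The function name `align` and the scoring values are my own assumptions.

```python
def align(a, b, match=1, mismatch=-1, gap=-1):
    """Globally align two strings (Needleman-Wunsch), padding with '-'.

    The scoring values are arbitrary illustrative choices.
    """
    rows, cols = len(a) + 1, len(b) + 1

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            step = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + step,  # pair the two letters up
                score[i - 1][j] + gap,       # blank inserted into b
                score[i][j - 1] + gap,       # blank inserted into a
            )

    # Walk back from the bottom-right corner to recover one best alignment.
    out_a, out_b = [], []
    i, j = len(a), len(b)
    while i > 0 or j > 0:
        step = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + step:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1

    return "".join(reversed(out_a)), "".join(reversed(out_b))


# Made-up sequences: the second is the first with one letter missing.
top, bottom = align("ACGTTGCA", "ACGTGCA")
print(top)
print(bottom)
```

Real genome aligners are far more sophisticated than this (it fills in a whole grid, so it would crawl on anything chromosome-sized), but the idea of trading blanks against mismatches is the same.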

image

See that gene compared between the chimp and the human? Their DNA is not identical.

Pretty close though. With humans and chimps it’s a bit like playing “Where’s Wally?”
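If squinting at the screen gets tiring, spotting the differences between two lined-up sequences is easy to automate. A minimal sketch, assuming you have two aligned strings of equal length; the names `spot_differences` and `percent_identity` and both sequences are invented for illustration and are not real human or chimp data.

```python
def spot_differences(seq_a, seq_b):
    """Return (position, letter_a, letter_b) for every column that differs."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, x, y)
        for i, (x, y) in enumerate(zip(seq_a, seq_b))
        if x != y
    ]


def percent_identity(seq_a, seq_b):
    """Percentage of columns where both sequences have the same letter.

    A gap ('-') opposite a letter is counted as a difference.
    """
    if not seq_a:
        raise ValueError("sequences must not be empty")
    differences = spot_differences(seq_a, seq_b)
    return 100.0 * (len(seq_a) - len(differences)) / len(seq_a)


# Made-up aligned sequences, for illustration only.
human_like = "ACGTTGCAAC"
chimp_like = "ACGTTGCTAC"

print(spot_differences(human_like, chimp_like))  # [(7, 'A', 'T')]
print(percent_identity(human_like, chimp_like))  # 90.0
```

Positions here count from zero, so position 7 is the eighth letter along.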

Humans and dogs next.

Some bits are not even close:

image

Some bits are frighteningly close:

image
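One way to find the “not even close” and “frighteningly close” stretches without scrolling is to chop the alignment into chunks and score each chunk separately. Again a sketch under my own assumptions: `window_identity` is a made-up name, the window size is arbitrary, and the sequences are invented rather than real human or dog data.

```python
def window_identity(seq_a, seq_b, window=5):
    """Percent identity for consecutive, non-overlapping windows.

    Returns a list of (start_position, percent_identity) pairs.
    A trailing chunk shorter than the window is ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    if window <= 0:
        raise ValueError("window must be a positive number")

    results = []
    for start in range(0, len(seq_a) - window + 1, window):
        chunk_a = seq_a[start:start + window]
        chunk_b = seq_b[start:start + window]
        same = sum(1 for x, y in zip(chunk_a, chunk_b) if x == y)
        results.append((start, 100.0 * same / window))
    return results


# Made-up aligned sequences: identical first half, different second half.
first = "AAAAACCCCC"
second = "AAAAAGGGGG"

for start, identity in window_identity(first, second):
    print(start, identity)
```

With the made-up data above, the first window comes out at 100% and the second at 0%, which is the extreme version of what the two screenshots show.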

Next steps

You could ask the system to show you on a graph how close the different species are in terms of shared genome.

But isn’t it so much more fun to look at the streams of letters representing molecules making up the definition of a human and a chimp?

So now, try different genes (I picked genes involved with marrow at random).

Try other stuff too (we’ve only looked at a few of the features of this fantastic site). Explore the site. Have some fun.

Conclusion

Don’t talk saft! I don’t have a conclusion.

Get your own conclusions – that’s the point.