
Spot the difference (for geeks)
August 22, 2009
The purpose of this post is to give an example of using the Ensembl genome database to investigate genomes by playing spot the difference between species.
Disclaimer
“I don’t know what the hell I’m talking about!” ~ Me, August 2009
I’m not a biologist. Qualification-wise, I stopped at O-level biology (and I don’t think my biology teacher had a good grasp of the subject either).
I’m primarily a software developer and recreational mathematician. So my job is exploring data, constructing models and testing stuff.
Anti-disclaimer
So what if I’m not a biologist? I’m also not a great fan of accepting opinion on the basis of authority. I read books, magazines and journals. I’m not completely stupid. I do accept expert opinion, but I also like to see data.
This means you shouldn’t take any of my opinions on board, which is just as well because I’m not offering any (did I mention that I don’t know what the hell I’m talking about?).
Objectives
- I just want to look at some data and use it to influence my opinion. This has got to be better than either “Dawkins told me” or “God told me”.
- I want people to tell me how to improve my understanding of the Ensembl database (although given the small number of people that look at this blog, that’s probably a tall order!).
(my) methodology
I find this fun, but I am a geek and your mileage may vary.
Go to http://www.ensembl.org/
In the dropdown where it says “All genomes” (which is clearly not ALL genomes, but whatever!), I’ll pick “Human” because I’m human and I guess there’s quite a lot of data in there.
Okay. Now I get this:
There are some sample entry points (whatever THAT means). Yeah, like I’ll pick those! The purpose of this is to explore, not follow a map.
Now the search box is pretty cool. You seem to be able to put all sorts of biology stuff in there and get results back.
Pick a “biologyish” word: haemoglobin, cardio, limbic, … Choose whatever you want. Some words don’t get many results (lung seemed to be a dead end, whereas pulmonary wasn’t – the moral of the story might be to use big words).
I’ll choose marrow because it was the first word that popped into my head.
If you ONLY choose marrow too, then I’ve failed. The purpose of this is to explore, not follow my map (especially as I don’t really know where I’m going).
If you want, choose marrow now, but try other things later.
You’ll get something like this back.
You see the little entries at the top? They tend to be dead ends (but I don’t know this to be true).
But the big one at the bottom looks like the database has some info.
Click on that and get a frightening screen.
Okay – I want to play spot the difference. See that “Genomic alignments (38)” link? Click on that.
Ah – now we’re getting somewhere.
There’s a bunch of letters. I know DNA uses the letters A, C, G and T, so this looks like data.
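If you copy one of those rows of letters off the page, a few lines of Python will confirm that it really is DNA-shaped data. This is a minimal sketch that assumes you pasted a single row as a plain string, with any blanks shown as `-`; the helper name `base_counts` and the sample row are made up for illustration (the row is NOT real Ensembl data).

```python
from collections import Counter


def base_counts(sequence):
    """Count the letters in one row of an alignment (hypothetical helper).

    Case is ignored and gap characters ('-') are dropped. Returns two dicts:
    counts for A, C, G and T, and counts for any other character
    (for example N, which conventionally means "unknown base").
    """
    counts = Counter(sequence.upper().replace("-", ""))
    dna = {base: counts.get(base, 0) for base in "ACGT"}
    other = {ch: n for ch, n in counts.items() if ch not in "ACGT"}
    return dna, other


# A made-up toy row, NOT copied from Ensembl.
dna, other = base_counts("ATGGCC-TTAGCn")
print("A/C/G/T:", dna)
print("anything else:", other)
```

If the “anything else” dictionary is empty, every letter in the row was one of the four DNA bases.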
(bear with me – nearly there)
Now let’s play spot the difference – against the Chimpanzee.
Some clever software lines them up against each other (sometimes there are blanks that the software shuffles in for you).

See that gene compared between the chimp and the human? Their DNA is not identical.
Pretty close though. With humans and chimps it’s a bit like playing “Where’s Wally?”
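Finding Wally by eye is fun, but the same game can be played in code. This is a rough sketch that assumes you have copied two rows of the alignment as equal-length strings, with the blanks the software shuffles in shown as `-`. The function name and the two rows below are hypothetical toy examples, not real human or chimp sequence.

```python
def spot_the_difference(seq_a, seq_b):
    """Compare two rows of an alignment column by column (hypothetical helper).

    Returns a marker line the same length as the rows: a space where the
    two columns match and '*' where they differ. A gap ('-') opposite a
    base counts as a difference. Case is ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must be the same length")
    return "".join(
        " " if a == b else "*"
        for a, b in zip(seq_a.upper(), seq_b.upper())
    )


# Made-up toy rows, NOT copied from Ensembl.
human = "ATGGCC-TTAGC"
chimp = "ATGGCCATTCGC"
print(human)
print(chimp)
print(spot_the_difference(human, chimp))
```

Printing the marker line directly under the two rows puts a `*` under each column where the letters disagree, which is roughly what you are doing by eye on the alignment page.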
Humans and dogs next.
Some bits are not even close:
Some bits are frighteningly close:
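To find the “not even close” and “frighteningly close” bits without scrolling, you can slide a window along the alignment and work out the percentage of matching letters in each window. This is a minimal sketch under some assumptions: the two rows are equal-length strings with `-` for gaps; gap columns are skipped (one common convention, whereas counting them as mismatches is another); and the window size, step and 90%/50% thresholds are arbitrary choices of mine. The sequences are made-up toy data, not real human or dog DNA.

```python
def percent_identity(seq_a, seq_b):
    """Percentage of aligned columns where both rows have the same base.

    Columns where either row has a gap ('-') are skipped.
    Returns None if there is nothing left to compare.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must be the same length")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gap columns
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else None


def window_identity(seq_a, seq_b, window=10, step=5):
    """Slide a fixed-size window along the alignment and yield
    (start position, percent identity) for each window."""
    for start in range(0, len(seq_a) - window + 1, step):
        end = start + window
        yield start, percent_identity(seq_a[start:end], seq_b[start:end])


# Made-up toy rows, NOT real human/dog data.
human = "ATGCATGCATGCATGCATGCGGGGAAAACC"
dog = "ATGCATGCATGCATGCATGCTTACGTCAGT"

for start, pid in window_identity(human, dog, window=10, step=10):
    if pid is None:
        continue  # window was all gaps
    if pid >= 90:
        label = "frighteningly close"
    elif pid < 50:
        label = "not even close"
    else:
        label = "somewhere in between"
    print(f"{start:>3}  {pid:5.1f}%  {label}")
```

Smaller windows pick out shorter close or distant stretches but are noisier; larger windows smooth things out.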
Next steps
You could ask the system to show you on a graph how close the different species are in terms of shared genome.
But isn’t it so much more fun to look at the streams of letters representing molecules making up the definition of a human and a chimp?
So now, try different genes (I picked genes involved with marrow at random).
Try other stuff too (we’ve only looked at a few of the features of this fantastic site). Explore the site. Have some fun.
Conclusion
Don’t talk saft! I don’t have a conclusion.
Get your own conclusions – that’s the point.