Open Access Debate

Recently in Scientific American, the article "Open Access to Science Under Attack" by David Biello described how advocates of free access to scientific research are now under fire from high-profile public relations flaks and high-powered lobbying groups. He has followed up with a blog post sharing different perspectives on the open access debate.

For the record, I am an advocate of open access. As a theoretical physicist, I took for granted that I could access new research papers from the Los Alamos preprint archive (later moved to Cornell). In biology, you have to pay to read these publications. With the emergence of the Public Library of Science (PLoS), it is hoped that the biology community can adopt the physics model so that scientists can freely access the journals.

January 24, 2007

Motif Discovery and Validation

It might be interesting to give an introduction to the problem of finding motifs in computational biology. The problem is to find over-represented words, or short sequences/patterns, in an entire genome (e.g. human or mouse).

The questions associated are:

  • Given a set of sequences, which motifs are significantly over-represented with respect to a given background model? (Motif Discovery)
  • Given a set of motifs, which ones are actually putative binding sites? (Motif Validation)

The first question requires you to pick a region of interest in the genome, for example, exons in the protein-coding genes. You then scan these regions and learn the interesting words from the sequences. One common application is to find important transcription factor binding sites in eukaryotes (a transcription factor is a protein whose function is to bind to specific genomic sequences, typically towards the 5' end of the gene sequence).
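The counting idea behind motif discovery can be sketched in a few lines of Python. This is a minimal illustration and not a real motif finder: the sequences are invented toy data, the background model is assumed to be independent bases with fixed probabilities, and the helper names `count_kmers` and `enrichment` are my own. It simply compares how often each word of length k is observed with how often the background model expects it.

```python
from collections import Counter
from math import prod


def count_kmers(sequences, k):
    """Count every overlapping word of length k across all sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def enrichment(sequences, k, background):
    """Return {word: observed / expected} under an independent-base background."""
    counts = count_kmers(sequences, k)
    total = sum(counts.values())  # total number of windows scanned
    scores = {}
    for word, observed in counts.items():
        # Expected count = number of windows * probability of the word
        expected = total * prod(background[base] for base in word)
        scores[word] = observed / expected
    return scores


# Toy data (made up for illustration): "TATA" occurs once in each sequence.
toy_sequences = ["GCTATAGC", "CATATACG", "TTATAGGC"]
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}

scores = enrichment(toy_sequences, 4, uniform)
top_word = max(scores, key=scores.get)
print(top_word, round(scores[top_word], 1))
```

A ratio well above 1 only flags a candidate. Deciding whether it is significant needs a proper statistical test and a more realistic background model, and the sketch assumes the sequences contain only the letters A, C, G and T.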

The second question uses existing annotated words (learned from wet-lab experiments) to scan novel regions in the genome. One common way to do this is to use a known data set, e.g. the TRANSFAC database, which contains annotated position weight matrices and consensus words found from experiments.

Glossary:

  • Motif: a family of words where alternative symbols are acceptable.
  • Binding Site/Region: a statistically over-represented signal that may or may not link to a biological process or pathway.
  • Consensus Sequence: a single word that summarizes a well-annotated and characterized motif by taking the most frequent symbol at each position.
  • Position Weight Matrix: a model which records, for each position in the motif, a probability distribution over the symbols that can be observed there.
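To make the glossary concrete, here is a small Python sketch of a position weight matrix being built and then used to scan a sequence, which is the essence of the validation step. Everything in it is an assumption for illustration: the aligned sites are invented rather than taken from TRANSFAC, the background is uniform, a pseudocount of 1 is added to avoid zero probabilities, and the threshold of 4.0 is arbitrary.

```python
from math import log2

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0):
    """Estimate per-position base probabilities from equal-length aligned sites."""
    length = len(sites[0])
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        pwm.append({b: (column.count(b) + pseudocount) / total for b in BASES})
    return pwm


def consensus(pwm):
    """Most probable base at each position."""
    return "".join(max(col, key=col.get) for col in pwm)


def log_odds(word, pwm, background):
    """Sum of log2(P(base | position) / P(base | background)) over the word."""
    return sum(log2(col[b] / background[b]) for b, col in zip(word, pwm))


def scan(sequence, pwm, background, threshold):
    """Return (offset, word, score) for windows scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for i in range(len(sequence) - width + 1):
        word = sequence[i:i + width]
        score = log_odds(word, pwm, background)
        if score >= threshold:
            hits.append((i, word, score))
    return hits


# Hypothetical aligned binding sites, invented for this example.
known_sites = ["TATAAA", "TATATA", "TATAAG", "CATAAA"]
uniform = {b: 0.25 for b in BASES}

pwm = build_pwm(known_sites)
hits = scan("GGCGTATAAAGGCC", pwm, uniform, threshold=4.0)
print(consensus(pwm), [(i, w) for i, w, _ in hits])
```

A high-scoring window is only a putative binding site. As the glossary notes, a statistical signal may or may not correspond to a real biological process, so hits like these still need experimental support.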

January 23, 2007

Good Math, Bad Math

I discovered an interesting blog about basic mathematics. The blogger, MarkCC, is a computer scientist working as a researcher in a corporate lab. The blog is titled "Good Math, Bad Math" and focuses on finding the fun in good math and quashing bad math and those who advocate it.

Here are some articles about basic statistics:

Normal Distribution
Mean, Median and Mode
Standard Deviation
Margin of Error

January 10, 2007

Malaria Genomics Resources

Here are some data resources on malaria genomics from my collection of public data on this infectious disease.

  1. Plasmodium falciparum Genome Projects, Wellcome Trust Sanger Institute: contains the sequencing data for various Plasmodium species (not all are completed). They are sequencing chromosomes 1, 4, 5, 6, 7, 8, 9 and 13. The project status for all the sequencing projects of the three institutes is here.
  2. Plasmodium falciparum Genome Sequencing, TIGR. They are sequencing chromosomes 2, 10, 11 and 14.
  3. Plasmodium falciparum Genome Sequencing, Stanford. They are sequencing chromosome 12.
  4. The Plasmodium Genome Resource, PlasmoDB, current release 5.2: a data repository for Plasmodium and related species, mainly gene annotation and FASTA data.
  5. NCBI Malaria Genetics and Genomics: this web resource provides data and information relevant to malaria genetics and genomics. These resources include organism-specific sequence BLAST databases (Plasmodium falciparum only, all Plasmodium), genome maps, linkage markers, and information about genetic studies.

January 09, 2007

Introduction to Malaria

Malaria is one of the most important vector-borne infectious diseases in the world, and its history dates back to the earliest civilizations. It infects between 300 and 500 million people every year and causes between one and three million deaths annually, mostly among young children in Africa [1,2]. It hinders economic development and is a major cause of poverty in developing countries. Here are some interesting scientific facts about the disease (gathered from public data and scientific publications).

  • The disease is caused by protozoan parasites from four species of the genus Plasmodium: P. falciparum, P. vivax, P. ovale and P. malariae.
  • The scientific understanding of this disease came with the establishment of the germ theory and the development of microbiology. The first and most important discoveries were the malaria parasite itself and its mode of transmission. Patrick Manson proposed that the mosquito acts as a vector, and Ronald Ross demonstrated it experimentally in 1897.
  • What is the life cycle of the parasite? It begins when an infected anopheline mosquito injects sporozoites, the infectious stages, into the blood of the host (for example, a human being). These sporozoites enter and multiply in liver cells, and thousands of the daughter forms, merozoites, are released into the blood. These merozoites invade the red blood cells, in which another phase of multiplication happens. This process repeats indefinitely and gives rise to the symptoms of the disease. However, not all merozoites divide. Some develop into sexual stages, the male and female gametocytes, which are taken up by another mosquito when it feeds. Fertilization and zygote formation then occur in the gut of the mosquito. The zygote develops into an oocyst on the outside of the mosquito gut, and within the oocyst another phase of multiplication takes place. This produces sporozoites that reach the salivary glands, ready to be injected into a new host.
  • From a macroscopic view (translating from the microbiology to the disease picture), the malaria parasites cause symptoms of anemia (light-headedness, shortness of breath, tachycardia, etc.), as well as other general symptoms such as fever, chills, flu-like illness and, in severe cases, coma and death.

References:
[1] F. E. G. Cox, "History of Human Parasitology", Clinical Microbiology Reviews, 2002, 15(4):595-612.
[2] Malaria in Wikipedia
[3] Nature, Focus on Malaria page.

January 08, 2007

Nature Podcast

As Elia pointed out in a comment on my earlier post on Science Friday, another podcast worth listening to is the Nature magazine podcast. Every week, they talk about recent happenings in the scientific world and ask experts to explain their findings. The podcast can be technical at times because the host asks follow-up questions that pertain to the published findings of various scientists. Interestingly, the podcast is presented by Chris Smith from my alma mater, the University of Cambridge, UK.

You can look at the archive for earlier podcasts and transcripts for the show. It is updated every week.

January 05, 2007

My Scientific Interest: Theoretical Physics

A reader of my blog has asked me to talk about my research interests. I will break them down into three major areas: physics, biology and economics. For this entry, I will talk about physics, as the tools I use to study the other two came from this one.

My doctorate is in physics or, more precisely, theoretical astrophysics and cosmology. My PhD thesis is on alternative cosmologies, where I look at some alternatives to mainstream big bang cosmology. The most important part of my thesis is the search for signatures of extra dimensions in the cosmic microwave background (the remnant 3K radiation left over from the big bang after the expansion of the universe). I belong to the group of physicists who believe that experiments are necessary drivers of good theories. Currently, string theorists have an elegant theory known as M-theory that may possibly unify the four forces of nature, namely the electromagnetic, gravitational and nuclear (strong and weak) forces. However, there is no experimental evidence to indicate that the theory is correct. For that matter, we have not found the Higgs boson, the last of the 18 particles predicted by the standard model of particle physics.

The only important result from my PhD thesis, and it is perhaps a small and modest contribution, is a calculation of how the tensor anisotropies [1] (in layman's terms, perturbations related to gravitational waves) can be modified by theories of extra dimensions. In fact, if you check the references that cite my paper, it was recently brought up in a review for the Planck project, which details the efforts of the European Space Agency on the new satellite that it will be sending into space in 2008/2009. My thesis supervisor, Professor Anthony Lasenby, is one of the key players in this project.

After that, I spent a year trying to find a postdoctoral position to do something different, because I was wondering whether my field was too saturated. Perhaps, in the tradition of Crick and Watson or Black and Scholes, I should go into another field and contribute. So I did some work in economics and subsequently moved into computational biology, specializing mainly in the human genome. While I was looking for my next move, I wrote a paper with William Saslaw [2] to resolve the paradox of the "coldness" of the Hubble flow and provided some ideas on how to calculate the binding energy distribution for galaxies. One of my NUS students is currently working with me on the follow-up to that paper. Using my theory, we showed that our results tally with an old experiment conducted by Garcia (1994) on how groups of galaxies are distributed in our Universe.

I was lucky to do my PhD research in the famous Cavendish Laboratory. For those who do not know, it is where the atom was split (Rutherford) and the structure of DNA was discovered (Crick and Watson).

References:
[1] Bernard Leong, Anthony Challinor, Roy Maartens and Anthony Lasenby, "Braneworld Tensor Anisotropies in the CMB", astro-ph/0208015, Phys. Rev. D66:104010, 2002. You can check the citations through one of the links and find the review paper on Planck.
[2] Bernard Leong and William Saslaw, "Gravitational Binding, Virialization and the Peculiar Velocity Distribution of the Galaxies", ApJ, 608:636-646, 2004.

January 03, 2007

Programming Languages for Computational Biology & Bioinformatics

Here are two common open-source toolkits, one for Java and one for Perl, which I use for computational biology (processing and analysis of biological data) and bioinformatics (database management and information technology for biology):

  • BioJava: a project dedicated to providing a Java framework for processing biological data. It includes objects for manipulating biological sequences, file parsers, DAS client and server support, access to BioSQL and Ensembl databases, tools for making sequence analysis GUIs, and powerful analysis and statistical routines including a dynamic programming toolkit.
  • BioPerl: a project dedicated to providing a Perl framework for processing data in bioinformatics.

For beginners, the best place to learn BioJava is the BioJava Cookbook, where you will find recipes for processing sequences and performing some computations in biology.

January 02, 2007

Science Friday

Popularizing science is not an easy job, particularly when the purists complain that a difficult concept in science may be misinterpreted. That's the price of pedagogy over research. As a practitioner of science, I also want to make sure that I can explain what I do to the general public. Of course, it is also good to inspire young minds to do science so that we can come up with cool innovations and technology for our work.

If you want to listen to a podcast on the latest trends in science, with interviews with prominent scientists (who are experts in their areas), a good place to go is Science Friday, hosted by Ira Flatow in the US. Recently, they have started a blog to document their work. Oftentimes, they talk about a recent advance in a scientific area (e.g. malaria research or cosmology), invite an expert to explain the development to the layman, and then get people to phone in with questions for the expert.

When I am doing my research work in the institute, I often turn this on so that I can keep tabs on the science. If you have iTunes and like to listen to podcasts, just search for "Science Friday" in the iTunes store, or get the URL directly from their main website.

January 01, 2007

Prelude

Welcome to my science abode. For some time, I have wanted to merge blogging with my scientific research. At the same time, I want to create a repository that can help me collect scientific resources (publications, data and information) for my own research. It also helps me to clear my mind on the things that I want to understand about the research I am doing.


