Microarray Data - Overview

One of the outstanding problems with microarray data is that it tends to be presented either as large coded tables of statistics or as graphical representations of the genes that one research team deems important. We have generated high-quality data covering the whole genome, at the level of both gene and transcript (as defined by version 56 of Ensembl), and this website is designed to allow us to share Epstein-Barr virus (EBV)-related microarray data with the wider community in a form that is readily accessible to non-bioinformaticians. To this end, all of the gene and transcript data from our microarray experiments are summarised in a graphical format generated with Partek Genomics Suite: just search for your favourite gene! As we publish our array data, both the raw expression level data and the statistical analyses of these data will also be made available as text files.

The focus of our research is the impact of the EBNA3 proteins on host gene expression (see the Allday group website for more details). To that end, we have constructed independent knockouts of each of the EBNA3 genes, and one missing the entire EBNA3 locus, in the B95-8 strain EBV BAC made by Henri-Jacques Delecluse and Wolfgang Hammerschmidt. These mutants and their rescue/revertant viruses have been used to infect BL31 cells, generating a panel of EBV-positive B-cell lines, each missing one or more of the EBNA3s. Additionally, we have generated LCLs (i.e. cell lines grown out from infection of primary B-cells) with EBNA3B knockout virus, and the Kempkes lab have generated LCLs using two EBNA3A mutant viruses (Hertle et al.).

We have quantified the RNA from these cell lines on Affymetrix Human Exon 1.0 ST microarrays. The exon array probes were mapped onto the genome using the X:Map annotation (based on Ensembl version 56), and expression was then summarised at both the gene and transcript level with the MMBGX algorithm (Turro et al. 2009).

To facilitate the interpretation of the data at the gene level, we have generated 'dotplots' that allow the microarray data to be visualised on a gene-by-gene basis, and the individual transcripts for those genes to be examined. The dotplot (see example below) separates the data into columns by virus variant and colours each dot by the type of virus (revertants are classed as wild-type virus for clarity). For the LCL data, dot shape is used to indicate the genetic donor, as this affects the basal expression level of some genes.

The scale of the plot varies from gene to gene, so it is important to note the y-axis values: gene expression is measured on a log2 scale (so a difference of one unit represents a doubling of RNA abundance, two units a quadrupling, and so on). In numerical terms, expression values under 3 essentially represent no expression, while few or no genes achieve expression levels over 12. While there is no easy correlation between RNA and protein levels, we have generally been able to detect protein for mRNA values over 6.
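
For readers who want to convert differences on the y-axis back into fold changes, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and are not part of this website, and the cut-offs of 3 and 6 are simply the rules of thumb quoted above, not hard thresholds.

```python
def fold_change(log2_a: float, log2_b: float) -> float:
    """Fold change in RNA abundance of sample b relative to sample a.

    Both inputs are expression values on the log2 scale used in the
    dotplots, so a difference of d units is a 2**d-fold change.
    """
    return 2.0 ** (log2_b - log2_a)


def describe_level(log2_value: float) -> str:
    """Rough reading of a single log2 expression value.

    Assumption: uses the rules of thumb from the text (under 3 is
    essentially no expression; over 6 is where protein has generally
    been detectable). Values in between are left as uncertain.
    """
    if log2_value < 3:
        return "essentially no expression"
    if log2_value > 6:
        return "expressed; protein generally detectable"
    return "expressed; protein detection uncertain"


if __name__ == "__main__":
    # A one-unit increase is a doubling, two units a quadrupling.
    print(fold_change(6.0, 7.0))
    print(fold_change(6.0, 8.0))
    print(describe_level(2.5))
```

Note that a fold change below 1 indicates a decrease: a drop of two units corresponds to a quarter of the original RNA abundance.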

So all that remains is for you to log in (if necessary), enter the ID for the gene you are interested in (% is a wildcard for the search) and select the data set you are interested in. I hope you find something intriguing!
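
As a rough guide to how the % wildcard can be expected to behave, the sketch below mimics it in Python. It assumes that % matches any run of characters (including none), as in an SQL LIKE pattern, and that matching is case-insensitive; neither assumption is confirmed for this site's search. The function names and the list of gene IDs are purely illustrative.

```python
import re


def wildcard_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a search string containing % into a compiled regex.

    Assumption: % stands for any run of characters, including none.
    Everything else in the search string is matched literally.
    """
    literal_parts = [re.escape(part) for part in pattern.split("%")]
    # Case-insensitive matching is an assumption, not documented behaviour.
    return re.compile(".*".join(literal_parts), re.IGNORECASE)


def search_genes(pattern: str, gene_ids: list) -> list:
    """Return the gene IDs that match the whole wildcard pattern."""
    regex = wildcard_to_regex(pattern)
    return [gene for gene in gene_ids if regex.fullmatch(gene)]


if __name__ == "__main__":
    # Illustrative IDs only; not a listing of what the database holds.
    example_ids = ["BCL2", "BCL2L1", "BCL2L11", "MYC"]
    print(search_genes("BCL2%", example_ids))  # every ID starting with BCL2
    print(search_genes("%L1%", example_ids))   # every ID containing L1
```

Under these assumptions, a search without any % only returns an exact match, so adding a trailing % is the way to pull up a family of related gene IDs.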