A significant issue in the gene expression field is that arrays and lab protocols are constantly evolving. For this reason, individual studies are generally conducted using a single array type at a single lab during a short time window. However, studies are rarely interpreted on their own. For example, in toxicogenomics, one generally interprets the results for a compound in the context of a historical (internal or external) database of compounds. Prior compounds may have been run on older arrays or using somewhat different laboratory protocols. In such cases, cross-platform and cross-laboratory normalization procedures can provide a significant benefit.
For basic normalization, methods such as RMA can be used. RMA was co-developed by Dr. Elashoff's group at Gene Logic. RMA can remove a fair amount of the processing-related variability in expression. However, RMA also removes a fair amount of the real biological variability in the study, thus making experimental groups appear more similar than they really are. RMA can be improved upon using several approaches.
A combination of all of these approaches can provide a reliable solution to the problem of cross-platform normalization. We have successfully implemented automated cross-platform normalization.
SMRI had multiple brain studies, some on the HGU95A, some on the HGU133A, and some on the HGU133 2.0+ array. Further, each study was run in a different laboratory. To provide a more powerful meta-analysis, the studies had to be statistically normalized. First, we performed a bioinformatic matching of probesets across the three array types. Second, we made use of the fact that non-diseased normal controls were included in each study. This allowed us to factor out lab-specific variation from the expression values. Third, gene-specific normalization functions were computed.
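The second step, using the normal controls to factor out lab-specific variation, can be illustrated with a minimal sketch. It assumes log2 expression values already matched to a common gene identifier, and it assumes the lab effect is a simple per-gene additive shift that can be estimated from each study's control mean. The function name `remove_lab_effect`, the data layout, and the numbers are hypothetical and are not the procedure actually used for the SMRI studies.

```python
from statistics import mean


def remove_lab_effect(studies):
    """Center each gene on the mean of that study's normal controls.

    `studies` maps a study name to {"controls": {...}, "samples": {...}},
    where each inner dict maps a sample id to {gene: log2 expression}.
    Sample ids are assumed to be unique within a study.
    """
    adjusted = {}
    for study, data in studies.items():
        controls = data["controls"]
        # Only genes measured in every control of this study get a baseline.
        genes = set.intersection(*(set(v) for v in controls.values()))
        baseline = {g: mean(c[g] for c in controls.values()) for g in genes}
        everyone = {**controls, **data["samples"]}
        adjusted[study] = {
            sid: {g: v - baseline[g] for g, v in values.items() if g in baseline}
            for sid, values in everyone.items()
        }
    return adjusted


# Synthetic illustration: lab2 reads about one log2 unit higher than lab1.
studies = {
    "lab1": {
        "controls": {"c1": {"G1": 8.0, "G2": 5.0}, "c2": {"G1": 8.4, "G2": 5.2}},
        "samples": {"d1": {"G1": 9.2, "G2": 5.1}},
    },
    "lab2": {
        "controls": {"c1": {"G1": 9.0, "G2": 6.0}, "c2": {"G1": 9.4, "G2": 6.2}},
        "samples": {"d1": {"G1": 10.2, "G2": 6.1}},
    },
}
adjusted = remove_lab_effect(studies)
```

After centering, the diseased sample in each lab sits about one log2 unit above its own controls for G1, so the two studies can be compared on a common scale even though their raw values differ.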
Gene Logic's ToxShield application provides toxicogenomic predictions for compounds. It was built using a database of studies run on the RGU34A array. However, some users of the application were running the RAE230 array. Gene-specific normalization functions were used to convert RAE230 arrays into 'pseudo' RGU34A arrays on which predictions could be calculated. Validation studies found a correlation of approximately 95% between the platforms after normalization, in contrast to only ~60% before normalization.
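One way to picture a gene-specific normalization function is as a separate fitted mapping for each gene from the source platform's scale to the target platform's scale. The sketch below assumes a training set of samples measured on both platforms and assumes a straight-line form on the log2 scale; the source does not state how the actual functions were constructed. The names `fit_gene_function`, `fit_all`, and `to_pseudo_target`, and all values, are hypothetical.

```python
from statistics import mean


def fit_gene_function(x, y):
    """Least-squares line y = intercept + slope * x for a single gene."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        # No spread on the source platform: fall back to a pure shift.
        return my - mx, 1.0
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - slope * mx, slope


def fit_all(source_train, target_train):
    """Fit one function per gene present on both platforms.

    Each argument maps gene -> list of log2 values over the same paired samples.
    """
    return {
        gene: fit_gene_function(source_train[gene], target_train[gene])
        for gene in source_train
        if gene in target_train
    }


def to_pseudo_target(array, functions):
    """Convert one source-platform array into a 'pseudo' target-platform array."""
    return {
        gene: intercept + slope * array[gene]
        for gene, (intercept, slope) in functions.items()
        if gene in array
    }


# Synthetic paired training data (three samples run on both platforms).
source_train = {"G1": [6.0, 7.0, 8.0], "G2": [5.0, 6.0, 7.0]}
target_train = {"G1": [7.0, 9.0, 11.0], "G2": [4.5, 5.5, 6.5]}
functions = fit_all(source_train, target_train)

new_array = {"G1": 7.5, "G2": 6.5, "G3": 1.0}  # G3 has no matched function
pseudo = to_pseudo_target(new_array, functions)
# pseudo == {"G1": 10.0, "G2": 6.0}
```

Genes without a match on the target platform are dropped rather than guessed, so any downstream prediction model only sees values for which a function was fitted.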
The procedures for normalizing two-color (or other non-Affymetrix) platforms to Affymetrix generally follow the same steps outlined above, with a few notable exceptions.
We have developed Present/Absent call algorithms for non-Affymetrix arrays. We also compute commonly used Affymetrix QC metrics such as Percent Present and Scale Factor for non-Affymetrix arrays to aid in the normalization procedures. For chip-level normalization, we use a robust lowess procedure. The lowess-normalized values are then used as inputs to the gene-specific normalization functions.
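As a rough illustration of chip-level robust lowess normalization on a two-color array, the sketch below fits a smooth trend of the log ratio M = log2(R/G) against the average log intensity A = 0.5 * log2(R * G) and subtracts it. The M-versus-A formulation, the tricube and bisquare weighting, and the parameter defaults are assumptions based on the common textbook form of lowess, not a description of the procedure referred to above. The functions `robust_lowess` and `lowess_normalize` are hypothetical names, and the data are synthetic.

```python
import math
from statistics import median


def robust_lowess(x, y, frac=0.5, iterations=2):
    """Locally weighted linear fit of y on x, evaluated at each x.

    Uses tricube distance weights and bisquare robustness weights so that
    outlying points have little influence on the fitted trend.
    """
    n = len(x)
    k = min(n, max(3, math.ceil(frac * n)))  # neighbours per local fit
    robust = [1.0] * n
    fitted = list(y)
    for _ in range(iterations + 1):
        for i in range(n):
            dists = [abs(xj - x[i]) for xj in x]
            h = sorted(dists)[k - 1] or 1e-12  # local bandwidth
            w = [robust[j] * (1 - min(dists[j] / h, 1.0) ** 3) ** 3 for j in range(n)]
            sw = sum(w)
            if sw <= 0:
                continue  # every neighbour down-weighted: keep previous fit
            mx = sum(wj * xj for wj, xj in zip(w, x)) / sw
            my = sum(wj * yj for wj, yj in zip(w, y)) / sw
            sxx = sum(wj * (xj - mx) ** 2 for wj, xj in zip(w, x))
            sxy = sum(wj * (xj - mx) * (yj - my) for wj, xj, yj in zip(w, x, y))
            slope = sxy / sxx if sxx > 1e-12 else 0.0
            fitted[i] = my + slope * (x[i] - mx)
        resid = [yi - fi for yi, fi in zip(y, fitted)]
        s = median(abs(r) for r in resid)
        if s < 1e-12:
            break  # fit is already essentially exact
        robust = [(1 - min(abs(r) / (6 * s), 1.0) ** 2) ** 2 for r in resid]
    return fitted


def lowess_normalize(red, green, frac=0.5, iterations=2):
    """Return log ratios with the intensity-dependent trend removed."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    trend = robust_lowess(a, m, frac, iterations)
    return [mi - ti for mi, ti in zip(m, trend)]


# Synthetic array: a constant dye bias of 0.8 log2 units, small wiggle,
# and one spot (index 10) that is about 8-fold up in the red channel.
green = [2 ** (6 + 0.5 * i) for i in range(20)]
red = [
    g * 2 ** (0.8 + 0.05 * math.sin(i) + (3.0 if i == 10 else 0.0))
    for i, g in enumerate(green)
]
normalized = lowess_normalize(red, green)
```

The robustness iterations matter here: without them, the single strongly changed spot would pull the fitted trend toward itself and its neighbours would be over-corrected. The normalized log ratios could then be passed to gene-specific functions like those described earlier.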