
Cross Platform and Cross Laboratory Normalization

A significant issue in the gene expression field is that arrays and lab protocols are constantly evolving. For this reason, individual studies are generally conducted using a single array type at a single lab during a short time window. However, studies are rarely interpreted on their own. For example, in toxicogenomics, one generally interprets the results for a compound in the context of a historical (internal or external) database of compounds. Prior compounds may have been run on older arrays or using somewhat different laboratory protocols. In such cases, cross platform and cross laboratory normalization procedures can provide significant benefits:

  • Decreased variability, increased precision in array analyses
  • Ability to leverage multiple studies for increased power
  • Ability to use reference databases in combination with your own studies

Affymetrix <-> Affymetrix

For basic normalization, methods such as RMA can be used. RMA was co-developed by Dr. Elashoff's group at Gene Logic. RMA removes a fair amount of the processing-related variability in expression. However, it also removes a fair amount of the real biological variability in the study, making experimental groups appear more similar than they really are. RMA can be improved upon using several approaches:

  • Bioinformatic matching of genes/probes
  • QC metrics at the probe/gene/chip level
  • Gene-specific normalization functions
  • Using study control samples/genes in the normalization procedures

A combination of all of these approaches can provide a reliable solution to the problem of cross platform normalization. We have successfully implemented automated cross platform normalization.
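As an illustration of the first approach in the list above, bioinformatic matching can be thought of as pairing probesets from two array types through a shared gene identifier. The sketch below is a simplified assumption, not the production procedure: the function name `match_probesets`, the annotation dictionaries, and the probeset and gene IDs are all hypothetical.

```python
from collections import defaultdict


def match_probesets(annot_a, annot_b):
    """Pair probesets from two array types that map to the same gene ID.

    annot_a, annot_b: dict of probeset ID -> gene ID (hypothetical annotation
    tables; None or "" marks an unannotated probeset).
    Returns dict of gene ID -> (probesets on array A, probesets on array B),
    keeping only genes that are represented on both arrays.
    """
    def by_gene(annot):
        groups = defaultdict(list)
        for probeset, gene in annot.items():
            if gene:  # skip unannotated probesets
                groups[gene].append(probeset)
        return groups

    genes_a, genes_b = by_gene(annot_a), by_gene(annot_b)
    shared = sorted(set(genes_a) & set(genes_b))
    return {g: (sorted(genes_a[g]), sorted(genes_b[g])) for g in shared}


# Toy annotations, invented for illustration only.
older_array = {"p95_1": "GENE1", "p95_2": "GENE2", "p95_3": None}
newer_array = {"p133_1": "GENE1", "p133_2": "GENE1", "p133_3": "GENE3"}
matches = match_probesets(older_array, newer_array)
```

Note that one gene can map to several probesets on the newer array. In practice such one-to-many matches would need an explicit rule (for example, probe-level sequence comparison or a QC-based choice), which this sketch leaves open.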

Case Study #1

SMRI had multiple brain studies, some on the HGU95A, some on the HGU133A, and some on the HGU133 2.0+ array. Further, each study was run in a different laboratory. To enable a more powerful meta-analysis, the studies had to be statistically normalized. First, we performed a bioinformatic matching of probesets across the three array types. Second, we made use of the fact that non-diseased normal controls were included in each study. This allowed us to factor out lab-specific variation from the expression values. Third, gene-specific normalization functions were computed.
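One simple way to picture the second step is to express every gene in a study relative to the mean of that study's normal controls, so that an additive lab offset (on the log scale) cancels out. The sketch below assumes log2 expression values and a purely additive lab effect; the function name `center_on_controls`, the sample IDs, and the values are hypothetical, and the actual procedure used in the case study may differ.

```python
from statistics import mean


def center_on_controls(study, control_ids):
    """Express each gene relative to the mean of the study's normal controls.

    study: dict of sample ID -> dict of gene -> log2 expression.
    control_ids: sample IDs of the non-diseased controls in this study.
    Returns the same structure with the per-gene control mean subtracted.
    """
    genes = list(next(iter(study.values())).keys())
    baseline = {g: mean(study[s][g] for s in control_ids) for g in genes}
    return {
        sample: {g: values[g] - baseline[g] for g in genes}
        for sample, values in study.items()
    }


# Toy data: lab B reads 1.5 log2 units higher than lab A for the same biology.
lab_a = {"c1": {"GENE1": 8.0}, "c2": {"GENE1": 8.5}, "d1": {"GENE1": 9.25}}
lab_b = {"c1": {"GENE1": 9.5}, "c2": {"GENE1": 10.0}, "d1": {"GENE1": 10.75}}

centered_a = center_on_controls(lab_a, ["c1", "c2"])
centered_b = center_on_controls(lab_b, ["c1", "c2"])
# After centering, the diseased sample shows the same shift in both labs.
```

Centering on controls only removes an offset. Differences in scale between labs or arrays are what the gene-specific normalization functions in the third step are meant to address.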

Case Study #2

Gene Logic's ToxShield application provides toxicogenomic predictions for compounds. It was built using a database of studies run on the RGU34A array. However, some users of the application were running the RAE230 array. Gene-specific normalization functions were used to convert RAE230 arrays into 'pseudo' RGU34A arrays on which predictions could be calculated. Validation studies found a correlation of approximately 95% between the platforms after normalization, compared with only ~60% before normalization.
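The form of the gene-specific normalization functions is not specified here. As a minimal sketch, assume each gene gets its own straight line, fitted by least squares on samples that were run on both platforms, and that the line is then applied to new arrays to produce 'pseudo' values on the older platform. The function names and the toy numbers below are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least-squares fit y = slope * x + intercept.

    Assumes x contains at least two distinct values.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_gene_functions(new_platform, old_platform):
    """Fit one line per gene from paired samples.

    new_platform, old_platform: dict of gene -> list of log2 values, with the
    same samples in the same order on both platforms.
    """
    return {
        gene: fit_line(new_platform[gene], old_platform[gene])
        for gene in new_platform
        if gene in old_platform
    }


def to_pseudo_array(sample, functions):
    """Convert one new-platform sample into 'pseudo' old-platform values."""
    return {
        gene: slope * sample[gene] + intercept
        for gene, (slope, intercept) in functions.items()
        if gene in sample
    }


# Toy paired training data, invented for illustration only.
new_vals = {"geneA": [6.0, 7.0, 8.0, 9.0], "geneB": [10.0, 11.0, 12.0]}
old_vals = {"geneA": [7.0, 7.5, 8.0, 8.5], "geneB": [10.0, 12.0, 14.0]}

functions = fit_gene_functions(new_vals, old_vals)
pseudo = to_pseudo_array({"geneA": 10.0, "geneB": 10.5}, functions)
```

Fitting a separate function per gene lets each gene have its own scale and offset between platforms, which a single chip-level normalization cannot capture.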

Affymetrix <-> Non-Affymetrix (e.g. Two-Color Arrays)

The procedures for normalizing two-color (or other non-Affymetrix) platforms to Affymetrix generally follow those outlined above, with a few notable exceptions.

  • Need for a Present/Absent call that is similar to Affymetrix's (surprisingly, there are no standards for calls on two-color or non-Affymetrix single-color arrays)
  • Need for good QC metrics (again, standards are lacking in this area)
  • Need for a robust chip-level normalization comparable to RMA on Affymetrix

We have developed Present/Absent call algorithms for non-Affymetrix arrays. We also compute commonly used Affymetrix QC metrics such as Percent Present and Scale Factor for non-Affymetrix arrays to aid in the normalization procedures. For chip level normalization, we use a robust lowess procedure. The lowess normalized values are then used as inputs to the gene-specific normalization functions.
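To make the chip-level step concrete, the sketch below applies a robust lowess fit to a two-color array in the common M-versus-A form, where M = log2(red/green) and A is the average log2 intensity, and subtracts the fitted trend from M. This is an assumption about how lowess would typically be applied to two-color data, not a description of the implementation used here. The pure-Python smoother (local linear fits with tricube weights and bisquare robustness iterations) is written for readability only; the function names, the `frac` and `iterations` defaults, and the toy intensities are hypothetical.

```python
import math
from statistics import median


def _tricube(u):
    """Tricube kernel weight for a scaled distance u >= 0."""
    return (1 - u ** 3) ** 3 if u < 1 else 0.0


def _local_line(x0, xs, ys, ws):
    """Weighted least-squares line through (xs, ys), evaluated at x0."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    if sxx == 0:  # only one distinct x carries weight
        return my
    sxy = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    return my + (sxy / sxx) * (x0 - mx)


def robust_lowess(xs, ys, frac=0.5, iterations=2):
    """Robust lowess fit of ys on xs, returning fitted values at each x."""
    n = len(xs)
    k = max(2, math.ceil(frac * n))  # neighbours used in each local fit
    robust = [1.0] * n
    fitted = [0.0] * n
    for _ in range(iterations + 1):
        for i, x0 in enumerate(xs):
            dists = sorted(abs(x - x0) for x in xs)
            h = dists[k - 1] or 1e-12  # bandwidth: k-th nearest distance
            kernel = [_tricube(abs(x - x0) / h) for x in xs]
            ws = [kw * rw for kw, rw in zip(kernel, robust)]
            if sum(ws) == 0:  # every neighbour was down-weighted to zero
                ws = kernel
            fitted[i] = _local_line(x0, xs, ys, ws)
        residuals = [y - f for y, f in zip(ys, fitted)]
        s = median(abs(r) for r in residuals)
        if s < 1e-12:  # fit is already essentially exact
            break
        # Bisquare weights: points with large residuals count less next pass.
        robust = [
            (1 - (r / (6 * s)) ** 2) ** 2 if abs(r) < 6 * s else 0.0
            for r in residuals
        ]
    return fitted


def ma_normalize(red, green, frac=0.5):
    """Remove the intensity-dependent dye trend from log2(red/green)."""
    m = [math.log2(r / g) for r, g in zip(red, green)]
    a = [0.5 * math.log2(r * g) for r, g in zip(red, green)]
    trend = robust_lowess(a, m, frac=frac)
    return [mi - ti for mi, ti in zip(m, trend)]


# Toy array with a purely intensity-dependent dye bias and no real signal.
exponents = range(6, 14)
green = [2.0 ** e for e in exponents]
red = [2.0 ** (1.2 * e - 1) for e in exponents]
normalized = ma_normalize(red, green)
```

This version refits every point against all others, so it scales poorly with array size. For real arrays, an established implementation such as the `lowess` function in statsmodels would be the more practical choice. The normalized values would then feed into the gene-specific normalization functions.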
