Massively Parallel Sequencing in Biopharmaceutical QC Testing – Is it too much information?
13th May 2011
By: Dr Daniel Galbraith, CSO
There is a tendency towards conservatism in the safety and QC testing schemes designed for biopharmaceuticals, relying on tried and tested methodologies, and rightly so. In the early 1990s PCR was a useful research tool and was beginning to be used in targeted adventitious agent testing of cell lines and products, particularly virus vaccines, because of its sensitivity and its ability to detect “silent” infections of cell lines. This is not the place to discuss the pitfalls and problems of using PCR, but one factor that always caused headaches in the design of a robust PCR test was ensuring that the primers would pick up as wide a range of variants of the target virus as possible. Even with our best efforts it is impossible to defend against the argument that we may miss a variant, and PCR would never replace catch-all tests such as in vitro assays. Recent advances in genomics and proteomics are giving us the tools to expand our ability to validate cell lines and materials and to qualify cell-derived products, including vaccines: adventitious agent testing that bypasses the specificity constraints of PCR, functional characterisation of expression cassettes, and even sequencing of virus vaccine and gene-therapy vector lots.1,2,3
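The primer-coverage headache described above can be made concrete with a small sketch. The primer, the variant sequences and the coverage metric below are purely illustrative inventions, not any validated assay; real primer design also has to consider melting temperature, secondary structure and mismatch tolerance.

```python
# Illustrative sketch: estimating how many known variants a degenerate
# primer would match. All sequences here are made-up toy examples.

# IUPAC ambiguity codes map each primer letter to the bases it can pair with.
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"C", "G"},
    "W": {"A", "T"}, "K": {"G", "T"}, "M": {"A", "C"},
    "B": {"C", "G", "T"}, "D": {"A", "G", "T"},
    "H": {"A", "C", "T"}, "V": {"A", "C", "G"},
    "N": {"A", "C", "G", "T"},
}

def primer_matches(primer: str, target: str) -> bool:
    """True if the degenerate primer matches anywhere in the target (exact windows only)."""
    plen = len(primer)
    for start in range(len(target) - plen + 1):
        window = target[start:start + plen]
        if all(base in IUPAC[p] for p, base in zip(primer, window)):
            return True
    return False

def coverage(primer: str, variants: list[str]) -> float:
    """Fraction of variant sequences the primer would hit."""
    hits = sum(primer_matches(primer, v) for v in variants)
    return hits / len(variants)

# Toy example: one degenerate position (R = A or G) covers two of three variants.
variants = ["TTGACGATCCA", "TTGACAATCCA", "TTGACTATCCA"]
print(coverage("GACRAT", variants))  # matches the first two variants only
```

Even on this toy scale the point of the paragraph holds: whichever degeneracies are designed in, a variant outside the primer's repertoire is simply not detected.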
On the surface, the application of random-primed PCR amplification and Massively Parallel Sequencing methodology to QC testing appears to be the ultimate tool available today, although newer platforms can now sequence individual nucleic acid molecules directly, without amplification. Either way, all “next generation” sequencing platforms will generate data on adventitious agents in a product at the DNA/RNA level, establish the nucleotide sequence of an expression cassette or mRNA, or allow analysis of a virus vaccine or vector and any variants of the same arising during production.

There is, however, the not insignificant problem of showing that the methodology will routinely detect very low levels of potential contaminants, of allowing for artefacts, and of validating the database and algorithms required to mine the vast amount of data generated. Moreover, the same sensitive, broad-brush approach generates a volume of data which is simply “viral noise”. Why? One of the more fascinating observations from the human genome sequencing project and subsequent studies is the large percentage of viral-derived sequences present, reported at over 8%, including sequences related to retroviruses, Bornaviruses, Filoviruses, Circoviridae and Parvoviridae.4,5,6,7 It is safe to assume that a similar picture will apply across all species, and viral noise will be generated from both DNA and RNA analyses. One interesting approach utilised by Onions and Kolman has focused on the transcriptome, which may prove useful for detecting latent viruses; they have also devised a “cell-free” analysis focused on packaged sequences, i.e. potential virions, thereby excluding the large number of cellular sequences that are not packaged. What do any suspect data mean? Detecting well-characterised viral sequences leads to fairly clear-cut responses but, as Keith Peden commented,8 novel viruses will be discovered, and discovery does not provide any information on pathogenicity.
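The “viral noise” problem amounts to subtracting reads that match host-derived sequence, including endogenous viral elements, before flagging anything as a candidate adventitious agent. Real pipelines align reads against large curated databases with scoring thresholds; the sketch below is a deliberately simplified stand-in using exact k-mer matching and invented sequences.

```python
# Illustrative sketch of host-sequence subtraction: reads sharing k-mers
# with the host reference (including endogenous viral elements) are
# treated as "viral noise"; the remainder are candidates for follow-up.
# The k-mer length, sequences and match rule are toy assumptions.

K = 8

def kmers(seq: str, k: int = K) -> set[str]:
    """All overlapping substrings of length k in the sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def classify_reads(reads, host_sequences):
    """Split reads into host-like 'noise' and unmatched candidate reads."""
    host_index = set()
    for seq in host_sequences:
        host_index |= kmers(seq)
    noise, candidates = [], []
    for read in reads:
        # Any shared k-mer counts as host-derived here; a real method
        # would use alignment scores and statistical thresholds.
        if kmers(read) & host_index:
            noise.append(read)
        else:
            candidates.append(read)
    return noise, candidates

host = ["ACGTACGTACGTTTGGCCAA"]      # host genome fragment (toy)
reads = ["ACGTACGTACGT",             # shares k-mers with host: noise
         "GGTTCCAAGGTTCCAA"]         # no host match: needs investigation
noise, candidates = classify_reads(reads, host)
```

Note that the unmatched candidates are exactly the reads the article worries about: a sequence surviving subtraction tells you nothing, on its own, about infectivity or pathogenicity.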
The relevance of such viruses to safety would need to be assessed. Similarly, what does a given percentage of variants within a virus preparation signify? The data generated by these methodologies will, in the future, provide enormous amounts of information; our challenge is to interpret the results and to apply sensible actions in response to the findings.