How to find a long indel from Nsp2 alignment

Motivation

Zhou et al. (2009) reported that a unique 30-amino-acid deletion in the Nsp2-coding region is a key feature for classifying a strain as a highly pathogenic porcine reproductive and respiratory syndrome virus (PRRSV). Nsp2 (nonstructural protein 2) has been shown to undergo remarkable genetic variation, primarily in its middle region, while remaining highly conserved in the N-terminal putative protease domain and the C-terminal predicted transmembrane region (Han et al. 2007). This post shows how to find a fairly large deletion in a specific coding region with positional tolerance.

Figure 1. The 30-amino-acid deletion in the Nsp2 of highly pathogenic PRRSV (Zhou et al. 2009).

Method and Implementation

Pairwise alignment between a sequence of interest and a reference sequence (ORF1a of the VR-2332 strain) is an essential step for finding insertions and/or deletions (indels for short). The two sequences were aligned with BLAST (Altschul et al. 1999), e