RDP will, without any prior information on the names and numbers of ancestral populations, deconstruct chromosome-scale SNP/sequence datasets into any number of sub-datasets containing groups of aligned SNPs/nucleotides that share common ancestries. It will do this by applying a range of established heuristic recombination event detection and analysis tools that have previously been usable only in the study of virus genome-scale datasets.
RDP is a PC application that takes aligned nucleotide sequence data in any of the standard alignment formats (e.g. NEXUS, FASTA, ClustalW, PAUP, PHYLIP). For the analysis of SNP data from multiple individuals, the SNPs need to be arranged in the order in which they occur on their respective chromosomes, and SNPs from different chromosomes should be analysed separately. SNPs must also be phased (i.e. all the SNPs in each input sequence must be derived from the same chromosome). Finally, the phased SNPs must be aligned so that missing data/indels in particular individuals are represented by the gap character “-”. These SNP alignments can then be loaded directly into the program for analysis. Alternatively, if full sequences of either individual chromosomes or concatenated exome sequences on individual chromosomes are to be analysed, the sequences will need to be aligned (with a program such as Mauve) and saved in a format such as XMFA before the program will be able to load them. It will be possible to analyse up to 1000 chromosomes at a time with the tool.
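The input requirements above can be checked before loading a file into the program. The following is a minimal sketch, not part of RDP itself: the function names `parse_fasta` and `check_alignment` are hypothetical, and the set of accepted characters (IUPAC nucleotide codes plus the gap character) is an assumption rather than a documented rule of the program. It checks only that a FASTA alignment has no more than 1000 sequences, that all sequences have the same length, and that no unexpected characters are present. It cannot check phasing or SNP order.

```python
from typing import Dict, List

MAX_SEQUENCES = 1000  # upper limit stated in the text above
# Assumption: IUPAC nucleotide codes plus "-" for missing data/indels.
ALLOWED = set("ACGTUNRYSWKMBDHV-")


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {name: sequence}, joining wrapped lines."""
    records: Dict[str, List[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].strip()
            if name in records:
                raise ValueError(f"duplicate sequence name: {name}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}


def check_alignment(records: Dict[str, str]) -> List[str]:
    """Return a list of problems; an empty list means all checks passed."""
    if not records:
        return ["no sequences found"]
    problems = []
    if len(records) > MAX_SEQUENCES:
        problems.append(f"too many sequences: {len(records)} > {MAX_SEQUENCES}")
    lengths = {len(seq) for seq in records.values()}
    if len(lengths) > 1:
        problems.append(f"unequal sequence lengths: {sorted(lengths)}")
    for name, seq in records.items():
        bad = sorted(set(seq) - ALLOWED)
        if bad:
            problems.append(f"unexpected characters in {name}: {bad}")
    return problems
```

For example, `check_alignment(parse_fasta(open("snps.fasta").read()))` (with a hypothetical file name) returns an empty list when the basic checks pass.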
RDP will output result files in three different formats:
- The output in “.rdp” file format will allow a user to examine the output in great depth using the program RDP4. RDP4 implements a wide range of heuristic and parametric recombination analysis, tree drawing and matrix tools and also provides a variety of data representation features.
- The output in “.csv” file format will allow a user to browse the results in any standard spreadsheet application (such as Microsoft Excel). It will detail the positions of recombination breakpoints, the identity of recombinant sequences, the identity of sequences resembling the parental sequences, and the degree of statistical support, determined with five different recombination detection methods, for the hypothesis that the tracts of sequence falling between the identified breakpoints were derived through recombination.
- RDP will provide “distributed alignments” of SNPs/genome fragments derived from any number of user-specified ancestral populations, i.e. it will split the component nucleotides of input sequences into different alignments based on the ancestral populations from which they were derived (note that although the recombination analysis carried out by the tool will not require specification of the number of ancestral populations, it will be up to the user to specify how many alignments the program should split the data into).
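To illustrate the idea of a “distributed alignment”, here is a small sketch of how nucleotides might be split into per-population alignments once each tract has been assigned to a population. It is an illustration only and does not describe RDP's actual output layout: the function `distribute`, the tract tuple format, and the choice to fill positions that belong to another population with gap characters are all assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical tract record: (sequence name, start, end (exclusive), population index)
Tract = Tuple[str, int, int, int]


def distribute(
    alignment: Dict[str, str],
    tracts: List[Tract],
    n_pops: int,
    default_pop: int = 0,
) -> List[Dict[str, str]]:
    """Split an alignment into n_pops alignments, one per ancestral population.

    Sites not covered by any tract are assigned to default_pop. In each
    output alignment, sites belonging to another population become "-".
    """
    # Per-sequence, per-site population labels, starting from the default.
    labels = {name: [default_pop] * len(seq) for name, seq in alignment.items()}
    for name, start, end, pop in tracts:
        if not 0 <= pop < n_pops:
            raise ValueError(f"population index {pop} out of range")
        for i in range(start, end):
            labels[name][i] = pop

    result = []
    for pop in range(n_pops):
        result.append({
            name: "".join(
                base if labels[name][i] == pop else "-"
                for i, base in enumerate(seq)
            )
            for name, seq in alignment.items()
        })
    return result


aln = {"s1": "ACGTAC", "s2": "ACGTAC"}
parts = distribute(aln, [("s1", 2, 4, 1)], n_pops=2)
# parts[0]["s1"] is "AC--AC" and parts[1]["s1"] is "--GT--"
```

Every output alignment keeps the original column coordinates, so the sub-datasets remain comparable site by site.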
The key methods implemented by RDP are five of the heuristic recombination detection methods implemented in the computer program RDP4: RDP, GENECONV, MAXCHI, CHIMAERA and 3SEQ. The combined results of these methods will then be processed by the tool using the same algorithm implemented in RDP4 for the identification of recombinant sequences, the identification of sequences resembling parental sequences and the identification of recombination breakpoint locations. Crucially, the RDP4 algorithm does not require any prior information on either the number of underlying populations or the identity of likely admixed individuals. Also, by identifying sequences with shared recombinant histories, the algorithm counts recombination events relatively accurately and can therefore detect evidence of recombination hot-spots and cold-spots.
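As a rough illustration of how a heuristic breakpoint scan of this kind can work, the sketch below slides a window along a pair of aligned sequences and computes a 2x2 chi-square statistic comparing matches and mismatches on either side of the window centre, in the spirit of the maximum chi-square approach. It is a simplified teaching example, not the MAXCHI implementation in RDP4: the function name `max_chi_scan` is hypothetical, all ungapped columns are used rather than only variable sites, and no significance testing or multiple-comparison correction is performed.

```python
from typing import Optional, Tuple


def max_chi_scan(
    seq_a: str, seq_b: str, half_window: int
) -> Tuple[Optional[int], float]:
    """Return (alignment column, chi-square) for the strongest candidate breakpoint.

    The returned column is the first site to the right of the candidate
    breakpoint. Returns (None, 0.0) if no window shows any signal.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned (equal length)")
    # Keep only columns where neither sequence has a gap.
    cols = [i for i, (x, y) in enumerate(zip(seq_a, seq_b)) if x != "-" and y != "-"]
    match = [1 if seq_a[i] == seq_b[i] else 0 for i in cols]

    best_pos, best_chi2 = None, 0.0
    for k in range(half_window, len(match) - half_window + 1):
        a = sum(match[k - half_window:k])   # matches left of centre
        b = half_window - a                 # mismatches left of centre
        c = sum(match[k:k + half_window])   # matches right of centre
        d = half_window - c                 # mismatches right of centre
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        chi2 = 0.0 if denom == 0 else (a + b + c + d) * (a * d - b * c) ** 2 / denom
        if chi2 > best_chi2:
            best_pos, best_chi2 = cols[k], chi2
    return best_pos, best_chi2


# Two sequences that are identical for 20 sites and then differ for 20 sites.
pos, chi2 = max_chi_scan("A" * 40, "A" * 20 + "C" * 20, half_window=10)
# pos is 20, the first site of the divergent tract
```

A sharp change in the proportion of matching sites between the two halves of the window gives a large statistic, which is why the peak falls at the boundary between the similar and divergent tracts.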
RDP applies a number of recombination detection and analysis methods. It runs well under Windows 95/98/NT/XP/Vista/7 and may or may not run properly under Windows 8. RDP also runs well on most Windows emulators. For Mac users PlayOnMac is recommended and for Linux users Wine is recommended. You may download:
- The most up-to-date reasonably stable version of the program (RDP4 Beta 4.60 including VB runtimes, accessory apps, LDHat lookup tables and a large 3Seq p-value lookup table)
- A probably more stable older version of the program (RDP4 Beta 4.56 including VB runtimes, accessory apps, LDHat lookup tables and a large 3Seq p-value lookup table)
- An old version of the program (RDP3.44 including VB runtimes, accessory apps, LDHat lookup tables and a large 3Seq p-value lookup table)
- A user manual
- A very old version of the program, RDP2, that can be used to load and explore older project files (RDP2 and earlier)
More information on RDP can be found at: http://web.cbio.uct.ac.za/~darren/rdp.html
Please cite: Martin DP, Murrell B, Golden M, Khoosal A, & Muhire B (2015) RDP4: Detection and analysis of recombination patterns in virus genomes. Virus Evolution 1: vev003. doi: 10.1093/ve/vev003