Simulations with Reads2MapTools
There are three different ways to generate simulated data to build maps in Reads2MapTools:
- Converting PedigreeSim software output to OneMap raw data: function
pedsim2raw
This function just convert the PedigreeSim software files to OneMap
raw data. The only source of errors are missing data controlled by
miss.perc
argument.
- Converting PedigreeSim output to VCF file simulating allele depth
with different distributions: function
pedsim2vcf
Simulating the VCF you can include other sources of errors. The map
is simulated in PedigreeSim and a statistical model can be selected to
simulate the reference and alternative alleles counts for each genotype.
The genotypes can be reestimated according to the alleles counts
simulated using the genotype callers updog (function
updog_genotype
), polyRAD (function
polyRAD_genotype
) and SuperMASSA (function
supermassa_genotype
).
- Use a chromosome of a reference genome to simulate a bi-parental
cross and RADseq Illumina reads for each individual: workflow
SimulateReads.wdl
The workflow SimulatedReads
is available in Reads2Map
repository in github. It performs the simulation of RADseq Illumina
reads for outcrossing population, the SNP calling is made in Freebayes and GATK software, the
genotype calling in updog, supermassa
and polyRAD, and build
genetics maps using every combination in OneMap and GUSMap. The workflow is
wroten in Workflow Description Language
(WDL) and cromwell workflow
management system. The parameters of every software implented can be
easily changed. All the results can be evaluated in a shiny
app.
Run PedigreeSim
It will require java installed. You can run PedigreeSim directly and
just use the output files in OneMap or you can use a function named
run_pedsim
that facilitates this procedure.
The function do not provide every possibilities offered by PedigreeSim software. If you want to change any parameter that is not available in the function, please use directly the PedigreeSim software. Here is the software documentation for more information.
# Required packages for this tutorial
library(onemap)
library(Reads2MapTools)
library(ggplot2)
library(vcfR)
# For outcrossing population
run_pedsim(chromosome = "Chr1", n.marker = 54, tot.size.cm = 100, centromere = 50,
n.ind = 200, mk.types = c("A1", "A2", "A3", "A4", "B1.5", "B2.6", "B3.7",
"C.8", "D1.9", "D1.10", "D1.11", "D1.12", "D1.13",
"D2.14", "D2.15", "D2.16", "D2.17", "D2.18"),
n.types = rep(3,18), pop = "F1",
name.mapfile = "mapfile.map", name.founderfile="founderfile.gen",
name.chromfile="sim.chrom", name.parfile="sim.par",
name.out="sim_out")
# For F2 population
run_pedsim(chromosome = c("Chr1", "Chr2"), n.marker = c(75, 75),
tot.size.cm = c(100,100), centromere = c(50,50),
n.ind = 200, mk.types = c("A.H.B","C.A", "D.B"),
n.types = rep(50,3), pop = "F2",
name.mapfile = "mapfile_f2.map", name.founderfile="founderfile_f2.gen",
name.chromfile="sim_f2.chrom", name.parfile="sim_f2.par",
name.out="sim_f2")
The function allows to create f2 intercross and backcross populations
from bi-parental cross of inbred lines and segregating F1 population
from bi-parental cross of heterozygous parents. You can define it in
pop
argument. You can also define the number of chromosomes
(argument chromosome
), the number of markers in each
chromosome (n.marker
), the total size of the groups in cM
(tot.size.cm
), the position of centromere
(centromere
), number of individuals in the population
(n.ind
), the marker types (mk.types
, see the
table in session Creating the data file
of Outcrossing
populations vignette) and the number of markers of each type
(n.types
).
We suggest you to open the output files founderfile
,
chromfile
, mapfile
and parfile
to
check if they agree with your intentions before proceed to other
analysis.
Simulate OneMap raw data
Once you run PedigreeSim, you should have the output file
genotypes.dat
. To convert this to OneMap raw file, you just
need to specify the cross type (cross
), which ones are the
parents (parent1
and parent2
) and if you want
to include missing genotypes (miss.perc
). Only cross types
outcross
and f2 intercross
are supported by
now.
# For outcrossing population
pedsim2raw(genofile = "sim_out_genotypes.dat", cross="outcross", parent1 = "P1", parent2 = "P2", out.file = "sim_out.example.raw", miss.perc = 0)
<- read_onemap("sim_out.example.raw") onemap.obj
## Working...
##
## --Read the following data:
## Type of cross: outcross
## Number of individuals: 200
## Number of markers: 54
## Chromosome information: no
## Position information: no
## Number of traits: 0
plot(onemap.obj, all = F)
# For F2 population
pedsim2raw(genofile = "sim_f2_genotypes.dat", cross="f2 intercross", parent1 = "P1", parent2 = "P2", out.file = "sim_f2.example.raw", miss.perc = 0)
<- read_onemap("sim_f2.example.raw") onemap.obj
## Working...
##
## --Read the following data:
## Type of cross: f2
## Number of individuals: 200
## Number of markers: 150
## Chromosome information: no
## Position information: no
## Number of traits: 0
plot(onemap.obj)
Simulate VCF file
The same output file from PedigreeSim, the genotypes.dat
can be used to simulate a VCF file together with the PedigreeSim
mapfile
and chrom
. The advantages to simulate
a VCF instead of onemap raw file is that VCF is a standard file format
and can store a lot of other information included the allele counts,
usually in the field AD
or DPR
. The
pedsim2vcf
function can simulate the allele counts using
negative binomial or updog distributions (argument method
).
The main parameters for the distributions are defined with arguments
mean.depth
, that defines the mean allele depth in the
progeny, p.mean.depth
that defines the mean allele depth in
the parents, argument disper.par
defines the dispersion
parameter, mean.phred
defines the mean phred score of the
sequencing technology used. The function can also simulate missing data
(miss.perc
). Through argument pos
and
chr
you can define vectors with physical position and
chromosome of each marker. With argument haplo.ref
you
define which one of the haplotypes in genotypes.dat
will
the considered the reference. Establishing phase
as TRUE,
the VCF will have phased genotypes. After allele counts are simulated,
the genotypes are reestimated using a binomial distribution. The VCF
generated by this function only have one or two FORMAT fields, the GT
and AD (if counts = TRUE
). Dominant markers are not
supported by this function.
# Dominant markers are not supported then, simulate other dataset with only codominant markers
run_pedsim(chromosome = "Chr1", n.marker = 42, tot.size.cm = 100, centromere = 50,
n.ind = 200, mk.types = c("B3.7", "D1.10", "D2.15"),
n.types = rep(14,3), pop = "F1",
name.mapfile = "mapfile.map", name.founderfile="founderfile.gen",
name.chromfile="sim.chrom", name.parfile="sim.par",
name.out="sim_out")
# For outcrossing population
pedsim2vcf(inputfile = "sim_out_genotypes.dat",
map.file = "mapfile.map",
chrom.file = "sim.chrom",
out.file = "simu_out.vcf",
miss.perc = 0,
counts = TRUE,
mean.depth = 100,
p.mean.depth = 100,
chr.mb = 10,
method = "updog",
mean.phred = 20,
bias=1,
od=0.00001,
pos=NULL,
chr=NULL,
phase = FALSE,
disper.par=2)
## Counts simulation changed 0.2828854 % of the given genotypes
library(vcfR)
<- read.vcfR("simu_out.vcf") vcfR_out.obj
## Scanning file to determine attributes.
## File attributes:
## meta lines: 4
## header_line: 5
## variant count: 42
## column count: 211
##
Meta line 4 read in.
## All meta lines processed.
## gt matrix initialized.
## Character matrix gt created.
## Character matrix gt rows: 42
## Character matrix gt cols: 211
## skip: 0
## nrows: 42
## row_num: 0
##
Processed variant: 42
## All variants processed
<- onemap_read_vcfR(vcfR.object = vcfR_out.obj,
onemap_out.obj cross = "outcross", parent1 = "P1", parent2 = "P2")
plot(onemap_out.obj, all=F)
# For F2 population
# Dominant markers are not supported by this function
run_pedsim(chromosome = c("Chr1", "Chr2"), n.marker = c(75, 75),
tot.size.cm = c(100,100), centromere = c(50,50),
n.ind = 200, mk.types = c("A.H.B"),
n.types = 150, pop = "F2",
name.mapfile = "mapfile_f2.map", name.founderfile="founderfile_f2.gen",
name.chromfile="sim_f2.chrom", name.parfile="sim_f2.par",
name.out="sim_f2")
pedsim2vcf(inputfile = "sim_f2_genotypes.dat",
map.file = "mapfile_f2.map",
chrom.file = "sim_f2.chrom",
out.file = "simu_f2.vcf",
miss.perc = 0,
counts = TRUE,
mean.depth = 100,
p.mean.depth = 100,
chr.mb = 10,
method = "updog",
mean.phred = 20,
bias=1,
od=0.001,
pos=NULL,
chr=NULL,
phase = FALSE,
disper.par=2)
## Counts simulation changed 0.1412151 % of the given genotypes
<- read.vcfR("simu_f2.vcf") vcfR_f2.obj
## Scanning file to determine attributes.
## File attributes:
## meta lines: 4
## header_line: 5
## variant count: 150
## column count: 212
##
Meta line 4 read in.
## All meta lines processed.
## gt matrix initialized.
## Character matrix gt created.
## Character matrix gt rows: 150
## Character matrix gt cols: 212
## skip: 0
## nrows: 150
## row_num: 0
##
Processed variant: 150
## All variants processed
<- onemap_read_vcfR(vcfR.object = vcfR_f2.obj, cross = "f2 intercross", parent1 = "P1", parent2 = "P2", f1 = "F1")
onemap_f2.obj
plot(onemap_f2.obj, all=F)
Graphical view of genotypes and allele counts
Function create_depth_profile
generates dispersion
graphics with x and y axis pointing respectively the reference and
alternative allele counts. The function is only available for biallelic
markers and for outcrossing and f2 intercross population. Each dot
represent a genotype considering mks
markers and
inds
individuals. If are established NULL for both
arguments, all markers and individuals are considered. The color of the
dots are according with the genotypes present in onemap object
(GTfrom = onemap
) or in VCF file
(GTfrom = vcf
) or the color can represent the error rate (1
- highest genotype probability) of each genotype in onemap object
(GTfrom = prob
). A rds file is generated with the data in
the graphic (rds.file
). The alpha
argument
control the transparency of color of each dot, regulate this parameter
is a good idea when having big amount of markers and individuals.
# For outcrossing population
create_depths_profile(onemap.obj = onemap_out.obj,
vcfR.object = vcfR_out.obj,
parent1 = "P1",
parent2 = "P2",
vcf.par = "AD",
recovering = FALSE,
mks = NULL,
inds = NULL,
GTfrom = "vcf",
alpha = 0.1,
rds.file = "depths_out.rds")
## Summary of reference counts:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 28.00 52.00 68.74 90.00 512.00
## Summary of alternative counts:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 13.00 35.00 43.46 63.00 390.00
# For f2 intercross population
create_depths_profile(onemap.obj = onemap_f2.obj,
vcfR.object = vcfR_f2.obj,
parent1 = "P1",
parent2 = "P2",
vcf.par = "AD",
recovering = FALSE,
mks = NULL,
inds = NULL,
GTfrom = "vcf",
alpha = 0.1,
rds.file = "depths_f2.rds")
## Summary of reference counts:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 28.0 51.0 66.2 87.0 714.0
## Summary of alternative counts:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 10.00 34.00 43.43 63.00 399.00
Reestimate genotypes
If you choose to simulate allele counts in pedsim2vcf
it
is also possible to reestimate the genotypes using this simulated allele
counts with other distributions. This will include in your data errors
coming from random sampling if considering only poisson or negative
binomial distributions and/or dispersion, outliers and bias errors if
using updog model. Other advantage is that reestimating genotypes with
any of this software you can obtain genotypes probabilities to be used
in the map building instead of only genotypes.
warning: The reestimation of genotypes is only performed in biallelic codominant markers. If you have multiallelic markers in your VCF, they will be kept with the same genotype.
Updog, polyRAD and supermassa are software designed for genotyping polyploid species. This is a more complex procedure compared with diploid species. These software consider not only the proportion of alleles to define the genotypes, but other aspects as the expected distribution in the progeny according with parents genotypes. See more about them in respective manuals.
Genotypes probabilities usage
OneMap 3.0 have three options to define error probability in HMM
emission phase. The default method is consider a global error rate for
every genotype \(10^{-5}\). Until this
present version, only this procedure was implemented. Now, with the
function create_probs
, we offer the option to change the
global error rate (global_error
argument), or consider a
error rate by genotype (genotypes_errors
) or a genotype
probability for each genotype (genotypes_probs
).
With updog
The function updog_genotype
make interface of OneMap
with Updog to perform the genotype calling.
<- updog_genotype(vcf = "simu_out.vcf",
onemap_geno.updog out_vcf = "updog.vcf",
vcf.par = "AD",
parent1="P1",
parent2="P2",
crosstype = "outcross",
f1=NULL,
recovering = FALSE,
cores = 4,
depths = NULL,
global_error = NULL,
use_genotypes_errors = TRUE,
use_genotypes_probs = FALSE)
## | *.#,%
## ||| *******/
## ||||||| (**..#**. */ **/
## ||||||||| */****************************/*%
## ||| &****..,*.************************/
## ||| (....,,,*,...****%********/(******
## ||| ,,****%////,,,,./.****/
## ||| /**// .*///....
## ||| .*/*/%# .,/ .,
## ||| , **/ #% .* ..
## ||| ,,,*
##
## Working on it...done!New onemap object contains 42 biallelic markers
create_depths_profile(onemap.obj = onemap_geno.updog,
vcfR.object = vcfR_out.obj,
parent1 = "P1",
parent2 = "P2",
vcf.par = "AD",
recovering = FALSE,
GTfrom = "vcf")
## Summary of reference counts:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 28.00 52.00 68.74 90.00 512.00
## Summary of alternative counts:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 13.00 35.00 43.46 63.00 390.00
With polyRAD
The function polyRAD_genotype
make interface of OneMap
with polyRAD to perform the genotype calling.
polyRAD_genotype_vcf(vcf="simu_out.vcf",
outfile = "polyrad.vcf",
parent1="P1",
parent2="P2")
## Scanning file to determine attributes.
## File attributes:
## meta lines: 4
## header_line: 5
## variant count: 42
## column count: 211
##
Meta line 4 read in.
## All meta lines processed.
## gt matrix initialized.
## Character matrix gt created.
## Character matrix gt rows: 42
## Character matrix gt cols: 211
## skip: 0
## nrows: 42
## row_num: 0
##
Processed variant: 42
## All variants processed
## Scanning file to determine attributes.
## File attributes:
## meta lines: 214
## header_line: 215
## variant count: 42
## column count: 211
##
Meta line 214 read in.
## All meta lines processed.
## gt matrix initialized.
## Character matrix gt created.
## Character matrix gt rows: 42
## Character matrix gt cols: 211
## skip: 0
## nrows: 42
## row_num: 0
##
Processed variant: 42
## All variants processed
<- onemap_read_vcfR(vcf = "polyrad.vcf", cross = "outcross", parent1 = "P1", parent2 = "P2")
onemap_geno.polyRAD
create_depths_profile(onemap.obj = onemap_geno.polyRAD,
vcfR.object = vcfR_out.obj,
parent1 = "P1",
parent2 = "P2",
vcf.par = "AD",
recovering = FALSE,
GTfrom = "vcf")
## Summary of reference counts:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 6.00 37.00 52.00 71.68 93.50 214.00
## Summary of alternative counts:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 25.00 49.00 57.46 80.00 225.00
With supermassa
The function supermassa_genotype
make interface of
OneMap with supermassa to perform the genotype calling.
Warning: It requires ‘python 2’ installed.
<- supermassa_genotype(vcf = "simu_out.vcf",
onemap_geno.supermassa vcf.par = "AD",
out_vcf = "supermassa.vcf",
cross = "outcross",
parent1="P1",
parent2="P2",
f1=NULL,
recovering = FALSE,
cores = 4,
depths = NULL,
global_error = NULL,
use_genotypes_errors = FALSE,
use_genotypes_probs = TRUE,
rm_multiallelic = F)
create_depths_profile(onemap.obj = onemap_geno.supermassa,
vcfR.object = vcfR_out.obj,
parent1 = "P1",
parent2 = "P2",
vcf.par = "AD",
recovering = FALSE,
GTfrom = "vcf")
Example of usage
Here we want to compare the best map using updog counts simulations and updog, polyrad and supermassa genotyping and genotypes probabilities. As a simple example, we will simulate only one family and one chromosome.
for(depth in c(100, 50, 10)){
run_pedsim(chromosome = "Chr1", n.marker = 42, tot.size.cm = 100, centromere = 50,
n.ind = 200, mk.types = c("B3.7","D1.10", "D2.15"),
n.types = rep(14,3), pop = "F1",
name.mapfile = "mapfile.map", name.founderfile="founderfile.gen",
name.chromfile="sim.chrom", name.parfile="sim.par",
name.out="sim_out")
# VCF with unmodified genotypes
pedsim2vcf(inputfile = "sim_out_genotypes.dat",
map.file = "mapfile.map",
chrom.file = "sim.chrom",
out.file = "simu_out.vcf",
miss.perc = 0,
counts = FALSE)
<- read.vcfR("simu_out.vcf")
vcfR_out.obj <- onemap_read_vcfR(vcfR.object = vcfR_out.obj, cross = "outcross", parent1 = "P1", parent2 = "P2")
onemap_out.obj
<- rf_2pts(onemap_out.obj)
twopts <- make_seq(twopts, order(as.numeric(onemap_out.obj$POS)))
seq_ord <- map(seq_ord, phase_cores = 4)
map_real <- rf_graph_table(map_real, main = paste0("Real_",depth))
p ggsave(p, filename = paste0("real", depth,".jpeg"))
# Simulating counts
pedsim2vcf(inputfile = "sim_out_genotypes.dat",
map.file = "mapfile.map",
chrom.file = "sim.chrom",
out.file = "simu_out.vcf",
miss.perc = 0,
counts = TRUE,
mean.depth = depth,
p.mean.depth = depth,
chr.mb = 10,
method = "updog",
mean.phred = 20,
bias=0.8,
od=0.0001,
pos=NULL,
chr=NULL,
phase = FALSE,
disper.par=2)
<- read.vcfR("simu_out.vcf")
vcfR_out.obj <- onemap_read_vcfR(vcfR.object = vcfR_out.obj, cross = "outcross", parent1 = "P1", parent2 = "P2")
onemap_out.obj
<- updog_genotype(vcf = "simu_out.vcf",
onemap_geno.updog out_vcf = "updog.vcf",
vcf.par = "AD",
parent1="P1",
parent2="P2",
crosstype = "outcross",
f1=NULL,
recovering = FALSE,
cores = 4,
depths = NULL,
global_error = NULL,
use_genotypes_errors = TRUE,
use_genotypes_probs = FALSE)
<- rf_2pts(onemap_geno.updog)
twopts <- make_seq(twopts, order(as.numeric(onemap_geno.updog$POS)))
seq_ord <- map(input.seq = seq_ord, phase_cores = 4, rm_unlinked = T) # If HMM find problems between two markers, one of them will be automatically discarted and the sequence of markers without it will be returned
map_updog while(is(map_updog, "vector")){ # if the result is a sequence of marker numbers, then HMM is run again
# This will be repeteated until HMM can run with no problems
<- make_seq(twopts, map_updog)
seq_temp <- map(input.seq = seq_temp, phase_cores = 4, rm_unlinked = T)
map_updog
}<- rf_graph_table(map_updog, main = paste0("Updog depth",depth))
p ggsave(p, filename = paste0("updog", depth,".jpeg"))
polyRAD_genotype_vcf(vcf="simu_out.vcf",
outfile = "polyrad.vcf",
parent1="P1",
parent2="P2")
<- onemap_read_vcfR(vcf = "polyrad.vcf", cross = "outcross", parent1 = "P1", parent2 = "P2")
onemap_geno.polyRAD
<- rf_2pts(onemap_geno.polyRAD)
twopts <- make_seq(twopts, order(as.numeric(onemap_geno.polyRAD$POS)))
seq_ord <- map(input.seq = seq_ord, phase_cores = 4, rm_unlinked = T)
map_polyRAD while(is(map_polyRAD, "vector")){
<- make_seq(twopts, map_polyRAD)
seq_temp <- map(input.seq = seq_temp, phase_cores = 4, rm_unlinked = T)
map_polyRAD
}<- rf_graph_table(map_polyRAD, main = paste0("polyRAD depth", depth))
p ggsave(p, filename = paste0("polyrad", depth,".jpeg"))
<- supermassa_genotype(vcf = "simu_out.vcf",
onemap_geno.supermassa vcf.par = "AD",
out_vcf = "supermassa.vcf",
cross = "outcross",
parent1="P1",
parent2="P2",
f1=NULL,
recovering = FALSE,
cores = 4,
depths = NULL,
global_error = NULL,
use_genotypes_errors = FALSE,
use_genotypes_probs = TRUE,
rm_multiallelic = F)
<- rf_2pts(onemap_geno.supermassa)
twopts <- make_seq(twopts, order(as.numeric(onemap_geno.supermassa$POS)))
seq_ord <- map(input.seq = seq_ord, phase_cores = 4, rm_unlinked = T)
map_supermassa while(is(map_supermassa, "vector")){
<- make_seq(twopts, map_supermassa)
seq_temp <- map(input.seq = seq_temp, phase_cores = 4, rm_unlinked = T)
map_supermassa
}<- rf_graph_table(map_supermassa, main = paste0("supermassa depth", depth))
p ggsave(p, filename = paste0("supermassa", depth,".jpeg"))
.05 <- create_probs(onemap_geno.updog, global_error = 0.05)
onemap_0
<- rf_2pts(onemap_0.05)
twopts <- make_seq(twopts, order(as.numeric(onemap_0.05$POS)))
seq_ord .05 <- map(input.seq = seq_ord, phase_cores = 4, rm_unlinked = T)
map_0while(is(map_0.05, "vector")){
<- make_seq(twopts, map_0.05)
seq_temp .05 <- map(input.seq = seq_temp, phase_cores = 4, rm_unlinked = T)
map_0
}<- rf_graph_table(map_0.05, main = paste0("0.05", depth))
p ggsave(p, filename = paste0("0.05", depth,".jpeg"))
draw_map2(map_real, map_updog, map_polyRAD, map_supermassa, map_0.05, main = depth,
group.names = c("Real", "UD", "PR", "SM", "0.05"),
output = paste0(depth,"_maps.jpeg"))
}
Comparing estimated and simulated haplotypes
The simulations here included are very useful to test the efficiency
of our algorithms, we made them available in case to be useful for other
purposes. Using functions run_pedsim
,
pedsim2vcf
with argument phased TRUE
, we can
convert the phased VCF using vcf2progeny_haplotypes
, which
keeps the original haplotype. With that, we can compare the haplotypes
simulated and the ones estimated by the onemap approach.
Here we will use the file sim_out_cod_genotypes.dat
to
get one phased VCF. See that now we don’t want to simulate read counts
or genotype errors; we just want the original haplotypes in VCF format,
then we set counts = FALSE
and don’t need to define the
related arguments.
pedsim2vcf(inputfile = "sim_out_genotypes.dat",
map.file = "mapfile.map",
chrom.file = "sim.chrom",
out.file = "simu_out_phased.vcf",
miss.perc = 0,
counts = FALSE,
chr.mb = 10,
pos="cM",
chr=NULL,
phase = TRUE)
Now, we can convert the phased VCF to onemap_progeny_haplotypes
object using function vcf2progeny_haplotypes
. The position,
the group names, and individuals id will be according to the ones
contained in VCF file. This function can take some time to run if you
select too many individuals or groups.
<- read.vcfR("simu_out_phased.vcf") vcfR.object
## Scanning file to determine attributes.
## File attributes:
## meta lines: 4
## header_line: 5
## variant count: 42
## column count: 211
##
Meta line 4 read in.
## All meta lines processed.
## gt matrix initialized.
## Character matrix gt created.
## Character matrix gt rows: 42
## Character matrix gt cols: 211
## skip: 0
## nrows: 42
## row_num: 0
##
Processed variant: 42
## All variants processed
<- vcf2progeny_haplotypes(vcfR.object = vcfR.object, ind.id = c("F1_004", "F1_005"), group_names = "Chr1", parent1 = "P1", parent2 = "P2", crosstype = "outcross")
progeny_dat plot(progeny_dat)
This is the simulated dataset. If OneMap algorithm could get all information needed, it could reproduce these haplotypes exactly how they were in simulated data. Let’s see how OneMap estimates these haplotypes:
<- onemap_read_vcfR(vcfR.object = vcfR.object, cross = "outcross", parent1 = "P1", parent2 = "P2", only_biallelic = FALSE)
onemap_test
# Just to remember of which dataset we are talking about onemap_test
## This is an object of class 'onemap'
## Type of cross: outcross
## No. individuals: 200
## No. markers: 42
## CHROM information: yes
## POS information: yes
## Percent genotyped: 100
##
## Segregation types:
## B3.7 --> 14
## D1.10 --> 14
## D2.15 --> 14
##
## No. traits: 0
<- rf_2pts(onemap_test) twopts
## Computing 861 recombination fractions ...
<- make_seq(twopts, "all")
seq1 <- map(seq1)
map_test <- progeny_haplotypes(map_test, ind = c(4,5), most_likely = FALSE)
progeny_est plot(progeny_est)
See that, besides markers are not in their exact position, the recombination breakpoints are the same as the ones simulated. The marker position could only be the same as the ones simulated if we have a very big (or maybe infinite) population size. Still, here we have only 200 individuals, but this is not a problem once we could reproduce the haplotypes.
Simulate Illumina reads
This one is a bit more complex and the its tools are stored in the github repository Reads2Map.
Remove generated files
system("rm sim* founderfile* mapfile* *rds ")
## Warning in system("rm sim* founderfile* mapfile* *rds "): 'rm' not found
## [1] 127