Building Linkage Map for Inbred Diploid Populations with OneMap
This tutorial will show how to build a linkage map for inbred lines populations with only the most updated methods implemented in OneMap. The complete tutorial, containing examples of all OneMap functions for inbred lines is available here.
Input File
Here, we will work with a tomato genotyping-by-sequencing dataset with 148 RILs and two parental lines. See further details about the dataset in the NCBI experiment SRR7060267. We previously used Reads2Map workflow to perform the alignment of the sequences with the tomato SL4.0 reference genome using BWA and the SNP calling with STACKs.
- You can download the resulted VCF file from here.
It is common that GBS datasets provides a large amount of markers, you may want to filter some of them before starting the analysis in R to optmize time and computational resources. For filtering the VCF file, we suggest a simple and efficient tool called vcftools.
If using a Linux based operational system or WSL in Windows you can install vcftools using:
See other installation options here and the vcftools manual for further information.
We filtered our VCF file by maximum percentage of missing data of 30% for a marker across all samples and a Minor Allele Frequency (MAF) of 5%.
vcftools --gzvcf tomato_stacks_markers.vcf.gz --maf 0.05 --max-missing 0.3 --out tomato_stacks_filtered --recode
The resulted VCF file has 22.257 markers.
We can convert the VCF file to an onemap object using the function
onemap_read_vcfR
:
## Registered S3 method overwritten by 'vegan':
## method from
## rev.hclust dendextend
onemap.obj <- onemap_read_vcfR(vcf = "tomato_stacks_filtered.recode.vcf.gz",
cross = "ri self",
parent1 = "NCEBR",
parent2 = "LA2093")
## 1706 Markers were removed of the dataset because one or both of parents have no informed genotypes (are missing data)
## 1160 Markers were removed from the dataset because one or both of the parents are heterozygotes, we do not expect heterozygotes parents in RILs populations.
## 124 Markers were removed from the dataset because they are monomorphic for the parents, these markers are not informative for the genetic map.
The function also filter markers that are considered non-informative for mapping purposes.
OneMap has other options for input file format, check the session Creating the data file
in the complete tutorial.
Visualize the markers:
Filters
- Missing data
While converting the VCF file the function also replace by missing data genotypes in the progeny that are not expected by the marker segregation. Therefore, we will need to filter by missing data again:
## Number of markers removed from the onemap object: 5840
By the graphic you can see that there are individuals with more missing data than others. We can also filter out by individuals:
## Number of indiduals removed from the onemap object: 32
## This is an object of class 'onemap'
## Type of cross: riself
## No. individuals: 116
## No. markers: 13413
## CHROM information: yes
## POS information: yes
## Percent genotyped: 80
##
## Segregation types:
## AA : BB --> 13413
##
## No. traits: 0
- Redundant markers
We can also keep only one of the markers that has same genotypes cross all individuals:
bins <- find_bins(onemap.obj.mis.ind, exact = FALSE)
onemap.bins <- create_data_bins(onemap.obj.mis.ind, bins)
onemap.bins
## This is an object of class 'onemap'
## Type of cross: riself
## No. individuals: 116
## No. markers: 4203
## CHROM information: yes
## POS information: yes
## Percent genotyped: 82
##
## Segregation types:
## AA : BB --> 4203
##
## No. traits: 0
- Segregation distortion
Estimate recombination fraction
## Computing 8830503 recombination fractions:
##
## 0% .................................................. 21%
## 21% .................................................. 41%
## 41% .................................................. 58%
## 58% .................................................. 72%
## 72% .................................................. 83%
## 83% .................................................. 91%
## 91% .................................................. 97%
## 97% .................................................. 99%
## 99% ...................... 100%
Group and ordering
For this dataset we have the information of each marker chromosome
location according to the reference genome used in the SNP calling
analysis. However, OneMap also present functions that can be used to
group and order in case there is no such information available. See
further details in the complete tutorial session using only recombination information
.
##
## SL4.0ch00 SL4.0ch01 SL4.0ch02 SL4.0ch03 SL4.0ch04 SL4.0ch05 SL4.0ch06 SL4.0ch07
## 5 588 473 350 410 272 262 302
## SL4.0ch08 SL4.0ch09 SL4.0ch10 SL4.0ch11 SL4.0ch12
## 294 345 254 310 338
chrom.names <- unique(twopts$data.name$CHROM)
seqs <- heatmaps <- list()
for(i in 1:length(chrom.names)){
print(paste("Evaluating", chrom.names[i]))
# Get sequence of markers based on their chromosome location
temp_seq <- make_seq(twopts, chrom.names[i])
# Remove distorted and order by genome location
seq_num <- temp_seq$seq.num[which(temp_seq$seq.num %in% segr.idx)]
seqs[[i]] <- ord_by_geno(make_seq(twopts, seq_num))
heatmaps[[i]] <- rf_graph_table(seqs[[i]], inter = FALSE, mrk.axis = "none")
}
## [1] "Evaluating SL4.0ch00"
## [1] "Evaluating SL4.0ch02"
## [1] "Evaluating SL4.0ch04"
## [1] "Evaluating SL4.0ch06"
## [1] "Evaluating SL4.0ch01"
## [1] "Evaluating SL4.0ch11"
## [1] "Evaluating SL4.0ch03"
## [1] "Evaluating SL4.0ch05"
## [1] "Evaluating SL4.0ch07"
## [1] "Evaluating SL4.0ch08"
## [1] "Evaluating SL4.0ch09"
## [1] "Evaluating SL4.0ch10"
## [1] "Evaluating SL4.0ch12"
Check the ordering by the recombination fraction heatmaps:
## Warning: package 'ggpubr' was built under R version 4.1.3
## Loading required package: ggplot2
Estimate multipoint genetic distances
Once the ordering is done, we can estimate the genetic distances using the Hidden Markov Model multipoint approach. This can take some time depending on the number of markers. We can use parLapply to parallelize the process:
library(parallel)
n.cores <-2
clust <- makeCluster(n.cores)
maps <- parLapply(clust, seqs[-1], map)
stopCluster(clust)
Overview of the relation between genomic and linkage map distances:
Add markers from non-assembled sequences
Try to add the markers from chromosome 0 to the other groups:
## Selecting markers:
## group 1
## ............................................................
## ............................................................
## ............................................................
## ............................................................
## ............................................................
## ............................................................
## ............................................................
## .......................................................
## Selecting markers:
## group 1
## ............................................................
## ............................................................
## ............................................................
## ............................................................
## ............................................................
## ............................................................
## ..................................................
## group 2
## ..
## Selecting markers:
## group 1
## ............................................................
## ............................................................
## ............................................................
## ..............
## group 2
## ..
## Selecting markers:
## group 1
## ............................................................
## ............................................................
## ............................................................
## ............................................................
## ............................................................
## ............................................................
## ............................................................
## ............................................................
## ............................................................
## ..............................................
## group 2
## ..
## Selecting markers:
## group 1
## ............................................................
## ............................................................
## ............................................................
## ............................................................
## ............................................................
## .........
## group 2
## ..
## Selecting markers:
## group 1
## ............................................................
## ............................................................
## ............................................................
## ............................................................
## ......................
## group 2
## ..
## Selecting markers:
## group 1
## ............................................................
## ............................................................
## ............................................................
## ...........
## group 2
## ..
## Selecting markers:
## group 1
## ............................................................
## ............................................................
## ............................................................
## ............................................................
## ..........................................................
## group 2
## ..
## Selecting markers:
## group 1
## ............................................................
## ............................................................
## ............................................................
## ............................................................
## ....................................................
## group 2
## ..
## Selecting markers:
## group 1
## ............................................................
## ............................................................
## ............................................................
## ............................................................
## ............................................................
## ............................................
## group 2
## ..
## Selecting markers:
## group 1
## ............................................................
## ............................................................
## ............................................................
## ............................................................
## ...........
## group 2
## ..
## Selecting markers:
## group 1
## ............................................................
## ............................................................
## ............................................................
## ............................................................
## ............................................................
## .....................................
## group 2
## ..
## There are one or more markers that grouped in more than one sequence
## List of 14
## $ data.name :List of 11
## ..- attr(*, "class")= Named chr [1:2] "onemap" "riself"
## .. ..- attr(*, "names")= chr [1:2] "" "ri self"
## $ twopt :List of 8
## ..- attr(*, "class")= Named chr [1:2] "rf_2pts" "riself"
## .. ..- attr(*, "names")= chr [1:2] "" "ri self"
## $ mk.names : chr [1:4203] "13960:1:-" "11456:1:-" "753587:1:-" "732462:1:+" ...
## $ input.seqs :List of 12
## $ input.unlink.mks: int [1:4] 6 2 1 13
## $ out.seqs :List of 12
## $ n.unlinked : int 0
## $ n.repeated : int 1
## $ n.mar : int 3960
## $ OD : num 3
## $ ax.rf : num 0.5
## $ sequences :List of 12
## $ repeated :List of 2
## $ unlinked : Named num(0)
## ..- attr(*, "names")= chr(0)
## - attr(*, "class")= chr "group_seq"
## List of 12
## $ SL4.0ch02: int [1:473] 602 5 607 605 610 613 604 616 612 617 ...
## $ SL4.0ch04: int [1:410] 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 ...
## $ SL4.0ch06: int [1:195] 2102 2103 2101 2104 2105 2106 2107 2108 2109 2110 ...
## $ SL4.0ch01: int [1:587] 15 16 17 14 18 19 20 21 22 24 ...
## $ SL4.0ch11: int [1:310] 3557 3558 3559 3560 3561 3562 3564 3565 3566 3563 ...
## $ SL4.0ch03: num [1:262] 1232 1220 1219 1208 1218 ...
## $ SL4.0ch05: int [1:192] 1829 1830 1831 1832 1834 1835 1836 1833 1837 1838 ...
## $ SL4.0ch07: int [1:299] 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 ...
## $ SL4.0ch08: int [1:293] 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 ...
## $ SL4.0ch09: int [1:345] 2958 2959 2960 2961 2962 2963 2964 2965 2966 2967 ...
## $ SL4.0ch10: int [1:252] 3303 3305 3306 3308 3309 3310 3311 3312 3313 3314 ...
## $ SL4.0ch12: int [1:338] 3867 3866 3868 3869 3870 3871 3872 3873 3874 3875 ...
## List of 12
## $ SL4.0ch02: int [1:476] 602 5 607 605 610 613 604 616 612 617 ...
## $ SL4.0ch04: int [1:410] 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 ...
## $ SL4.0ch06: int [1:195] 2102 2103 2101 2104 2105 2106 2107 2108 2109 2110 ...
## $ SL4.0ch01: int [1:587] 15 16 17 14 18 19 20 21 22 24 ...
## $ SL4.0ch11: int [1:310] 3557 3558 3559 3560 3561 3562 3564 3565 3566 3563 ...
## $ SL4.0ch03: num [1:262] 1232 1220 1219 1208 1218 ...
## $ SL4.0ch05: int [1:192] 1829 1830 1831 1832 1834 1835 1836 1833 1837 1838 ...
## $ SL4.0ch07: int [1:299] 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 ...
## $ SL4.0ch08: int [1:293] 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 ...
## $ SL4.0ch09: int [1:345] 2958 2959 2960 2961 2962 2963 2964 2965 2966 2967 ...
## $ SL4.0ch10: int [1:252] 3303 3305 3306 3308 3309 3310 3311 3312 3313 3314 ...
## $ SL4.0ch12: int [1:338] 3867 3866 3868 3869 3870 3871 3872 3873 3874 3875 ...
# Identify best ordering for the inserted markers:
insert <- seqs[[1]]$seq.num[seqs[[1]]$seq.num %in% lgs_group$out.seqs$SL4.0ch02]
chr02 <- try_seq_by_seq(sequence = maps[[1]], markers = insert, cM.thr = 40, lod.thr = -100)
## Marker 6 was included
## Marker 2 was included
## Marker 13 was included
## [1] 6 2 13
## Completed
## Output file: ./maps.png
## Completed
## Output file: ./chr03.png
Adjusting global error
See further information about the usage of global error in the HMM in the Reads2Map manuscript.
clust <- makeCluster(n.cores)
maps_error <- parLapply(clust, maps, function(x) onemap::map(x, global_error = 0.05))
stopCluster(clust)
draw_map2(maps_error, output = "maps_error.png")
## Completed
## Output file: ./maps_error.png