Building Linkage Map for Inbred Diploid Populations with OneMap

This tutorial will show how to build a linkage map for inbred lines populations with only the most updated methods implemented in OneMap. The complete tutorial, containing examples of all OneMap functions for inbred lines is available here.

Install OneMap

From GitHub (version: 3.1.0):

devtools::install_github("Cristianetaniguti/onemap")

Input File

Here, we will work with a tomato genotyping-by-sequencing dataset with 148 RILs and two parental lines. See further details about the dataset in the NCBI experiment SRR7060267. We previously used Reads2Map workflow to perform the alignment of the sequences with the tomato SL4.0 reference genome using BWA and the SNP calling with STACKs.

You can download the resulted VCF file from here.

It is common that GBS datasets provides a large amount of markers, you may want to filter some of them before starting the analysis in R to optmize time and computational resources. For filtering the VCF file, we suggest a simple and efficient tool called vcftools.

If using a Linux based operational system or WSL in Windows you can install vcftools using:

sudo apt-get install vcftools

See other installation options here and the vcftools manual for further information.

We filtered our VCF file by maximum percentage of missing data of 30% for a marker across all samples and a Minor Allele Frequency (MAF) of 5%.

vcftools --gzvcf tomato_stacks_markers.vcf.gz --maf 0.05 --max-missing 0.3 --out tomato_stacks_filtered --recode

The resulted VCF file has 22.257 markers.

We can convert the VCF file to an onemap object using the function onemap_read_vcfR:

library(onemap)

## Registered S3 method overwritten by 'vegan':
##   method     from      
##   rev.hclust dendextend

onemap.obj <- onemap_read_vcfR(vcf = "tomato_stacks_filtered.recode.vcf.gz", 
                               cross = "ri self",
                               parent1 = "NCEBR", 
                               parent2 = "LA2093")

## 1706 Markers were removed of the dataset because one or both of parents have no informed genotypes (are missing data)
## 1160 Markers were removed from the dataset because one or both of the parents are heterozygotes, we do not expect heterozygotes parents in RILs populations.
## 124 Markers were removed from the dataset because they are monomorphic for the parents, these markers are not informative for the genetic map.

The function also filter markers that are considered non-informative for mapping purposes.

OneMap has other options for input file format, check the session Creating the data file in the complete tutorial.

Visualize the markers:

plot(onemap.obj)

Filters

Missing data

While converting the VCF file the function also replace by missing data genotypes in the progeny that are not expected by the marker segregation. Therefore, we will need to filter by missing data again:

onemap.obj.mis <- filter_missing(onemap.obj, threshold = 0.3)

## Number of markers removed from the onemap object:  5840

plot(onemap.obj.mis)

By the graphic you can see that there are individuals with more missing data than others. We can also filter out by individuals:

onemap.obj.mis.ind <- filter_missing(onemap.obj.mis, threshold = 0.35, by = "individuals")

## Number of indiduals removed from the onemap object:  32

plot(onemap.obj.mis.ind)

onemap.obj.mis.ind

##   This is an object of class 'onemap'
##     Type of cross:      riself 
##     No. individuals:    116 
##     No. markers:        13413 
##     CHROM information:  yes 
##     POS information:    yes 
##     Percent genotyped:  80 
## 
##     Segregation types:
##             AA : BB -->  13413
## 
##     No. traits:         0

Redundant markers

We can also keep only one of the markers that has same genotypes cross all individuals:

bins <- find_bins(onemap.obj.mis.ind, exact = FALSE)
onemap.bins <- create_data_bins(onemap.obj.mis.ind, bins)
onemap.bins

##   This is an object of class 'onemap'
##     Type of cross:      riself 
##     No. individuals:    116 
##     No. markers:        4203 
##     CHROM information:  yes 
##     POS information:    yes 
##     Percent genotyped:  82 
## 
##     Segregation types:
##             AA : BB -->  4203
## 
##     No. traits:         0

Segregation distortion

segr.test <- test_segregation(onemap.bins)
plot(segr.test)

segr.idx <- select_segreg(segr.test, distorted = FALSE, numbers = TRUE)

Estimate recombination fraction

twopts <- rf_2pts(onemap.bins)

## Computing 8830503 recombination fractions:
## 
##  0%  ..................................................  21%
##  21% ..................................................  41%
##  41% ..................................................  58%
##  58% ..................................................  72%
##  72% ..................................................  83%
##  83% ..................................................  91%
##  91% ..................................................  97%
##  97% ..................................................  99%
##  99% ......................  100%

Group and ordering

For this dataset we have the information of each marker chromosome location according to the reference genome used in the SNP calling analysis. However, OneMap also present functions that can be used to group and order in case there is no such information available. See further details in the complete tutorial session using only recombination information.

# Create sequences objects for each chromosome
table(twopts$data.name$CHROM)

## 
## SL4.0ch00 SL4.0ch01 SL4.0ch02 SL4.0ch03 SL4.0ch04 SL4.0ch05 SL4.0ch06 SL4.0ch07 
##         5       588       473       350       410       272       262       302 
## SL4.0ch08 SL4.0ch09 SL4.0ch10 SL4.0ch11 SL4.0ch12 
##       294       345       254       310       338

chrom.names <- unique(twopts$data.name$CHROM)

seqs <- heatmaps <- list()
for(i in 1:length(chrom.names)){
  print(paste("Evaluating", chrom.names[i]))
  # Get sequence of markers based on their chromosome location
  temp_seq <- make_seq(twopts, chrom.names[i])
  # Remove distorted and order by genome location
  seq_num <- temp_seq$seq.num[which(temp_seq$seq.num %in% segr.idx)]
  seqs[[i]] <- ord_by_geno(make_seq(twopts, seq_num))
  heatmaps[[i]] <- rf_graph_table(seqs[[i]], inter = FALSE, mrk.axis = "none")
}

## [1] "Evaluating SL4.0ch00"
## [1] "Evaluating SL4.0ch02"
## [1] "Evaluating SL4.0ch04"
## [1] "Evaluating SL4.0ch06"
## [1] "Evaluating SL4.0ch01"
## [1] "Evaluating SL4.0ch11"
## [1] "Evaluating SL4.0ch03"
## [1] "Evaluating SL4.0ch05"
## [1] "Evaluating SL4.0ch07"
## [1] "Evaluating SL4.0ch08"
## [1] "Evaluating SL4.0ch09"
## [1] "Evaluating SL4.0ch10"
## [1] "Evaluating SL4.0ch12"

names(seqs) <- chrom.names

Check the ordering by the recombination fraction heatmaps:

library(ggpubr)

## Warning: package 'ggpubr' was built under R version 4.1.3

## Loading required package: ggplot2

p_genomic <- ggarrange(plotlist = heatmaps, common.legend = TRUE)
p_genomic

# Order using UG - other possible algorithms: mds_onemap, rcd, record, seriation, order_seq
seq7_ug <- ug(seqs[[7]], hmm = FALSE)  
# Build recombination fraction matrix heatmap graphic
heatmap7_ug <- rf_graph_table(seq7_ug, inter = FALSE, mrk.axis = "none")

heatmap7_ug

heatmaps[[7]]

# Fix genomic ordering
edit_seq7 <- edit_order_onemap(input.seq = seq7_ug)
new_seq7 <- make_seq(edit_seq7)
heatmap7_edit <- rf_graph_table(new_seq7, inter = FALSE, mrk.axis = "none")
heatmap7_edit

seqs[[7]] <- new_seq7

Estimate multipoint genetic distances

Once the ordering is done, we can estimate the genetic distances using the Hidden Markov Model multipoint approach. This can take some time depending on the number of markers. We can use parLapply to parallelize the process:

library(parallel)
n.cores <-2 
clust <- makeCluster(n.cores)
maps <- parLapply(clust, seqs[-1], map)
stopCluster(clust)

Overview of the relation between genomic and linkage map distances:

mbXcm <- plot_genome_vs_cm(maps)
mbXcm

Add markers from non-assembled sequences

Try to add the markers from chromosome 0 to the other groups:

lgs_group <- group_seq(twopts, seqs = seqs[-1], unlink.mks = seqs[[1]], repeated = FALSE)

##    Selecting markers: 
##    group    1 
##     ............................................................
##     ............................................................
##     ............................................................
##     ............................................................
##     ............................................................
##     ............................................................
##     ............................................................
##     .......................................................
##    Selecting markers: 
##    group    1 
##     ............................................................
##     ............................................................
##     ............................................................
##     ............................................................
##     ............................................................
##     ............................................................
##     ..................................................
##    group    2 
##     ..
##    Selecting markers: 
##    group    1 
##     ............................................................
##     ............................................................
##     ............................................................
##     ..............
##    group    2 
##     ..
##    Selecting markers: 
##    group    1 
##     ............................................................
##     ............................................................
##     ............................................................
##     ............................................................
##     ............................................................
##     ............................................................
##     ............................................................
##     ............................................................
##     ............................................................
##     ..............................................
##    group    2 
##     ..
##    Selecting markers: 
##    group    1 
##     ............................................................
##     ............................................................
##     ............................................................
##     ............................................................
##     ............................................................
##     .........
##    group    2 
##     ..
##    Selecting markers: 
##    group    1 
##     ............................................................
##     ............................................................
##     ............................................................
##     ............................................................
##     ......................
##    group    2 
##     ..
##    Selecting markers: 
##    group    1 
##     ............................................................
##     ............................................................
##     ............................................................
##     ...........
##    group    2 
##     ..
##    Selecting markers: 
##    group    1 
##     ............................................................
##     ............................................................
##     ............................................................
##     ............................................................
##     ..........................................................
##    group    2 
##     ..
##    Selecting markers: 
##    group    1 
##     ............................................................
##     ............................................................
##     ............................................................
##     ............................................................
##     ....................................................
##    group    2 
##     ..
##    Selecting markers: 
##    group    1 
##     ............................................................
##     ............................................................
##     ............................................................
##     ............................................................
##     ............................................................
##     ............................................
##    group    2 
##     ..
##    Selecting markers: 
##    group    1 
##     ............................................................
##     ............................................................
##     ............................................................
##     ............................................................
##     ...........
##    group    2 
##     ..
##    Selecting markers: 
##    group    1 
##     ............................................................
##     ............................................................
##     ............................................................
##     ............................................................
##     ............................................................
##     .....................................
##    group    2 
##     ..

## There are one or more markers that grouped in more than one sequence

str(lgs_group, max.level = 1)

## List of 14
##  $ data.name       :List of 11
##   ..- attr(*, "class")= Named chr [1:2] "onemap" "riself"
##   .. ..- attr(*, "names")= chr [1:2] "" "ri self"
##  $ twopt           :List of 8
##   ..- attr(*, "class")= Named chr [1:2] "rf_2pts" "riself"
##   .. ..- attr(*, "names")= chr [1:2] "" "ri self"
##  $ mk.names        : chr [1:4203] "13960:1:-" "11456:1:-" "753587:1:-" "732462:1:+" ...
##  $ input.seqs      :List of 12
##  $ input.unlink.mks: int [1:4] 6 2 1 13
##  $ out.seqs        :List of 12
##  $ n.unlinked      : int 0
##  $ n.repeated      : int 1
##  $ n.mar           : int 3960
##  $ OD              : num 3
##  $ ax.rf           : num 0.5
##  $ sequences       :List of 12
##  $ repeated        :List of 2
##  $ unlinked        : Named num(0) 
##   ..- attr(*, "names")= chr(0) 
##  - attr(*, "class")= chr "group_seq"

str(lgs_group$input.seqs, max.level = 1)

## List of 12
##  $ SL4.0ch02: int [1:473] 602 5 607 605 610 613 604 616 612 617 ...
##  $ SL4.0ch04: int [1:410] 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 ...
##  $ SL4.0ch06: int [1:195] 2102 2103 2101 2104 2105 2106 2107 2108 2109 2110 ...
##  $ SL4.0ch01: int [1:587] 15 16 17 14 18 19 20 21 22 24 ...
##  $ SL4.0ch11: int [1:310] 3557 3558 3559 3560 3561 3562 3564 3565 3566 3563 ...
##  $ SL4.0ch03: num [1:262] 1232 1220 1219 1208 1218 ...
##  $ SL4.0ch05: int [1:192] 1829 1830 1831 1832 1834 1835 1836 1833 1837 1838 ...
##  $ SL4.0ch07: int [1:299] 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 ...
##  $ SL4.0ch08: int [1:293] 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 ...
##  $ SL4.0ch09: int [1:345] 2958 2959 2960 2961 2962 2963 2964 2965 2966 2967 ...
##  $ SL4.0ch10: int [1:252] 3303 3305 3306 3308 3309 3310 3311 3312 3313 3314 ...
##  $ SL4.0ch12: int [1:338] 3867 3866 3868 3869 3870 3871 3872 3873 3874 3875 ...

str(lgs_group$out.seqs, max.level = 1) # three markers grouped in the chromosome 2

## List of 12
##  $ SL4.0ch02: int [1:476] 602 5 607 605 610 613 604 616 612 617 ...
##  $ SL4.0ch04: int [1:410] 1420 1421 1422 1423 1424 1425 1426 1427 1428 1429 ...
##  $ SL4.0ch06: int [1:195] 2102 2103 2101 2104 2105 2106 2107 2108 2109 2110 ...
##  $ SL4.0ch01: int [1:587] 15 16 17 14 18 19 20 21 22 24 ...
##  $ SL4.0ch11: int [1:310] 3557 3558 3559 3560 3561 3562 3564 3565 3566 3563 ...
##  $ SL4.0ch03: num [1:262] 1232 1220 1219 1208 1218 ...
##  $ SL4.0ch05: int [1:192] 1829 1830 1831 1832 1834 1835 1836 1833 1837 1838 ...
##  $ SL4.0ch07: int [1:299] 2362 2363 2364 2365 2366 2367 2368 2369 2370 2371 ...
##  $ SL4.0ch08: int [1:293] 2664 2665 2666 2667 2668 2669 2670 2671 2672 2673 ...
##  $ SL4.0ch09: int [1:345] 2958 2959 2960 2961 2962 2963 2964 2965 2966 2967 ...
##  $ SL4.0ch10: int [1:252] 3303 3305 3306 3308 3309 3310 3311 3312 3313 3314 ...
##  $ SL4.0ch12: int [1:338] 3867 3866 3868 3869 3870 3871 3872 3873 3874 3875 ...

# Identify best ordering for the inserted markers:
insert <- seqs[[1]]$seq.num[seqs[[1]]$seq.num %in% lgs_group$out.seqs$SL4.0ch02]

chr02 <- try_seq_by_seq(sequence = maps[[1]], markers = insert, cM.thr = 40, lod.thr = -100)

## Marker 6 was included 
## Marker 2 was included 
## Marker 13 was included

p_chr02 <- rf_graph_table(chr02, mrk.axis = "none")
insert

## [1]  6  2 13

p_chr02

maps[[1]] <- chr02

draw_map2(maps, output = "maps.png")

## Completed
## Output file: ./maps.png

maps

chr03 <- ord_by_geno(maps[[6]])
chr03 <- map(chr03)

draw_map2(chr03, output = "chr03.png")

## Completed
## Output file: ./chr03.png

Chromosome 3 with genomic order

maps[[6]] <- chr03

Adjusting global error

See further information about the usage of global error in the HMM in the Reads2Map manuscript.

clust <- makeCluster(n.cores)
maps_error <- parLapply(clust, maps, function(x) onemap::map(x, global_error = 0.05))
stopCluster(clust)

draw_map2(maps_error, output = "maps_error.png")

## Completed
## Output file: ./maps_error.png

maps with 5% global error

Visualize haplotypes

haplo <- progeny_haplotypes(maps_error, group_names = chrom.names[-1], ind = 1)
plot(haplo)

Add redundant markers

maps_done <- list()
for(i in 1:length(maps_error)){
  maps_done[[i]] <- add_redundants(maps_error[[i]], onemap.obj.mis.ind, bins)
}

mbXcm <- plot_genome_vs_cm(maps_done)
mbXcm

Export output

# Export linkage map distances:
write_map(maps_done, file.out = "maps.out")