Inbred Based Populations

Starting in version 2.0-0, OneMap can also deal with inbred-based populations, that is, populations that have homozygous parental lines in the genealogy (F2s, backcrosses and RILs). As a consequence, linkage phases do not need to be estimated. Since version 2.3.0, phases are estimated for F2 populations to proper generate progeny haplotypes not only the recombination fraction.

In this vignette we explain how to proceed the analysis in an F2 population. The same procedure can be used for backcrosses and RILs as well, and therefore users should not have any difficulty in analyzing their data. However, there are a number of differences from genetic mapping in outcrossing species; please read the proper vignette.

If you are not familiar with R, we recommend first the reading of vignette Introduction to R. You do not need to be an expert in R to build your linkage map, but some concepts are necessary and will help you through the process.

There is a github OneMap version which is constantly improved, we strong recommend all users to try this version. In augusto-garcia/onemap github page you can find instructions to install the package from github and also more fancy tutorials.

Creating the data file

For F2s, backcrosses and RILs, two input formats are accepted. The user can choose between the standard OneMap file format or the same raw file used by MAPMAKER/EXP (Lander et al., 1987). Therefore, one should have no difficulty in using data sets already available for MAPMAKER/EXP when deciding to try OneMap.

Both types of raw file can contain phenotypic information, but this will not be used during map construction, that requires only genotypic information (made available by molecular markers).

Creating MAPMAKER/EXP data file

The MAPMAKER/EXP raw file, combined with the map file produced by OneMap, can be readily used for QTL mapping using R/qtl (Broman et al., 2008) or QTL Cartographer (Wang et al., 2010), among others.

Here, we briefly present how to set up this data file. For more detailed information see the MAPMAKER/EXP manual (Lincon et al., 1993), available here.

The first line of your data file should be:

data type xxxx

where xxxx is one of the following data types:

xxxx Population type
f2 backcross Backcross
f2 intercross F2
ri self RIL, produced by selfing
ri sib RIL, produced by sib mating

The second line should contain the number of individuals in the progeny, the number of markers and the number of quantitative traits. So, for example, a valid line would be

10 5 2

for a data set with 10 individuals (yes, very small, but this is just an example), 5 markers and 2 traits evaluated.

Then, the genotype information is included for each marker. The character * indicates the beginning of information of a marker, followed by the marker name. For instance, here is an example of such a file for an F2 population with 10 individuals, 5 markers and 2 quantitative traits:

data type f2 intercross
10 5 2

*M1 A B H H A - B A A B
*M2 C - C C C - - C C A
*M3 D B D D - - B D D B
*M4 C C C - A C C A A C
*M5 C C C C C C C C C C

*weight 10.2 - 9.4 11.3 11.9 8.9 - 11.2 7.8 8.1 
*length 1.7 2.1 - 1.8 2.0 1.0 - 1.7 1.0 1.1

The codification for genotypes is the following:

Code Meaning
A homozygous for allele A (from parent 1 - AA)
B homozygous for allele B (from parent 2 - BB)
H heterozygous carrying both alleles (AB)
C Not homozygous for allele A (Not AA)
D Not homozygous for allele B (Not BB)
- Missing data for the individual at this marker

The symbols option (not included in this example), used in MAPMAKER/EXP files, is also accepted (please, see its manual for details).

The quantitative trait data should come after the genotypic data and has a similar format, except the trait values for each individual must be separated by at least one space, a tab or a line break. A dash (-) indicates missing data.

This file must be saved in plain text format using a simple text editor such as notepad on Microsoft Windows. Historically, MAPMAKER/EXP uses the .raw extension for this file; however, you can use any other extensions, such as .txt.

If you want to see more examples about this file type, open mapmaker_example_bc.raw and mapmaker_example_f2.raw, both available with OneMap and saved in the directory extdata on your computer, in the folder where you installed OneMap (use system.file(package="onemap") to see where it is located on your computer).

Now, let us load OneMap:

To save your project anytime, type:

if you are using Windows; otherwise, adapt the code. Notice that you need to specify where to save and the name of the file. You can also use the toolbar, of course.

Creating OneMap data file

The OneMap data file has few differences compared to MAPMAKER/EXP format. As MAPMAKER/EXP format, the input OneMap file is a text file, where the first line indicates the cross type and the second line provides information about the number of individuals and the number of markers, but, because the format also supports to keep physical markers locations informations, the followed numbers indicate the presence/absence (1/0) of chromosome and position informations and, after the presence/absence(1/0) of phenotypic data.

The third line contains sample IDs. Then, the genotype information is included separately for each marker. The character * indicates the beginning of information input for a new marker, followed by the marker name. Next, there is a code indicating the marker type according with:

Code Type
A.H.B Codominant marker
C.A Dominant marker for allele B
D.B Dominant marker for allele A
A.H Marker for backcross
A.B Marker for ril self/sib cross

Finally, after each marker type, comes the genotype data for the segregating population. Missing data are indicated with the character - (minus sign) and an empty space separates the information for each individual. Positions and phenotype information, if present, follows genotypic data with a similar structure. Details are found in the help of function read_onemap.

Here is an example of such file for 10 individuals and 5 markers (the three zeros in the second line indicate that there is no chromosome information, physical position information or phenotypic data, respectively). It is very similar to a MAPMAKER/EXP file, but has additional information about the cross type.

data type f2 intercross
10 5 1 0 1
I1 I2 I3 I4 I5 I6 I7 I8 I9 I10
*M1 A.H.B  ab a - ab b a ab - ab b
*M2 A.H.B  a - ab ab - b a - a ab
*M3 C.A    c a a c c - a c a c
*M4 A.H.B  ab b - ab a b ab b - a
*M5 D.B    b b d - b d b b b d
*CHROM 1 1 1 2 2 2 3 3 3 3
*fen1 10.3 11.2 11.1 - 9.8 8.9 11.0 10.7 - 10.1
*fen2 42 49 - 45 51 42 28 32 38 40

The input file must be saved in text format, with extensions like .raw. It is a good idea to open the text files called onemap_example_f2.raw, onemap_example_bc, onemap_example_riself (available in extdata with OneMap and saved in the directory you installed it) to see how this file should be. You can see where OneMap is installed using the command system.file(package="onemap").

Simulating the data file (new!)

Since version 2.3.0, OneMap offers wrapper functions for PedigreeSim (Voorrips and Maliepaard, 2012) software to simulate mapping populations.

Run PedigreeSim (new!)

First, download PedigreeSim java file. It will require java installed. You can run PedigreeSim directly and just use the output files in OneMap or you can use a wrapper function named run_pedsim that facilitates this procedure.

The function do not provide every possibilities offered by PedigreeSim software. If you want to change any parameter that is not available in the function, please use directly the PedigreeSim software. Here is the software documentation for more information.

The function allows to create f2 intercross (F2) and backcross (BC) populations from bi-parental cross of inbred lines and segregating F1 population from bi-parental cross of heterozygous parents. You can define it in pop argument. You must change the path.pedsim to the path where the PedigreeSim.jar is stores in your system. You can also define the number of chromosomes (argument chromosome), the number of markers in each chromosome (n.marker), the total size of the groups in cM (tot.size.cm), the position of centromere (centromere), number of individuals in the population (n.ind), the marker types (mk.types, see the table in session Creating the data file) and the number of markers of each type (n.types).

We suggest you to open the output files founderfile, chromfile, mapfile and parfile to check if they agree with your intentions before proceed to other analysis.

Convert PedigreeSim output to OneMap raw data (new!)

Once you run PedigreeSim (directly or using our run_pedsim function), you should have the output file genotypes.dat (see an example file sim_cod_F2_genotypes.dat in extdata in installed OneMap directory). To convert this to OneMap raw file, you just need to specify the cross type (cross) and which ones are the parents (parent1 and parent2). When simulating f2 intercross, you also need to define which is the F1 individual (f1) and if you want to include missing genotypes (miss.perc). Only cross types outcross and f2 intercross are supported by now.

Convert PedigreeSim output to VCF file (new!)

The same output file from PedigreeSim, the genotypes.dat, can be used to simulate a VCF file together with the PedigreeSim mapfile and chrom. The advantages to simulate a VCF instead of OneMap raw file is that VCF is a standard file format and can store a lot of other information included the allele counts (usually in the field AD or DPR). The pedsim2vcf function can simulate the allele counts using negative binomial or updog distributions (argument method). The main arguments for the distributions are defined with arguments mean.depth, that defines the mean allele depth in the progeny, p.mean.depth that defines the mean allele depth in the parents, argument disper.par defines the dispersion parameter, mean.phred defines the mean phred score of the sequencing technology used. The function can also simulate missing data (miss.perc). Through arguments pos and chr you can define vectors with physical position and chromosome of each marker. With argument haplo.ref you define which one of the haplotypes in genotypes.dat will be considered the reference. Establishing phase as TRUE, the VCF will have phased genotypes. After allele counts are simulated, the genotypes are re-estimated using a binomial distribution. The VCF generated by this function only have one or two FORMAT fields, the GT and AD (if counts = TRUE). Dominant markers are not supported by this function.

## Counts simulation changed 0.1149425 % of the given genotypes

The function prints on screen the percentage of genotypes changed between simulated by PedigreeSim and re-estimated using a binomial distribution and simulated allele counts. Notice that this percentage increase if you reduce the mean depth or increase bias or dispersion parameter.

In the session Importing data from VCF file below you will see how to import VCF files as OneMap objects and also how to convert it in OneMap raw data.

Importing data

From MAPMAKER/EXP file

Once you created your data file with raw data, you can use OneMap function read_mapmaker to import it to OneMap:

The first argument is the directory where the input file is located, modify it accordingly. The second one is the data file name.

In this example, an object named mapmaker_example_f2.raw was created. Notice that if you leave the argument dir blank, the file will be read from your current working directory. To set a working directory, see Introduction to R (Importing and Exporting Data).

For this example, we will use a simulated data set from an F2 population which is distributed along with OneMap. Because this particular data set is distributed along with the package, you can load it typing

To see what this data set is about, type

##   This is an object of class 'onemap'
##     Type of cross:      f2 
##     No. individuals:    200 
##     No. markers:        66 
##     CHROM information:  no 
##     POS information:    no 
##     Percent genotyped:  85 
## 
##     Segregation types:
##        AA : AB : BB -->  36
##         Not AA : AA -->  15
##         Not BB : BB -->  15
## 
##     No. traits:         1 
##     Missing trait values: 
##   Trait_1: 0

As you can see, the data consists of a sample of 200 individuals genotyped for 66 markers (36 co-dominant (AA, AB or BB), 15 dominant in one parent (Not AA or AA) and 15 dominant in the other parent (Not BB or BB) with 15% of missing data. You can also see that there is phenotypic information for one trait in the data set, that can be used for QTL mapping.

From OneMap raw file

The same procedure is made for OneMap raw file, but, instead of using the function read_mapmaker we use read_onemap to read the OneMap format.

In this example, an object named onemap_example_f2.raw was created. The data set containing the same markers and individual of the mapmaker_example_f2.raw file. Would be a good idea to open this two files in a text editor and compare them to better understand the differences between the two kinds of input files. We can read the onemap_example_f2.raw using:

Or, because this particular data are available together with OneMap package:

To see what this data set is about, type

##   This is an object of class 'onemap'
##     Type of cross:      f2 
##     No. individuals:    200 
##     No. markers:        66 
##     CHROM information:  no 
##     POS information:    no 
##     Percent genotyped:  85 
## 
##     Segregation types:
##        AA : AB : BB -->  36
##         Not AA : AA -->  15
##         Not BB : BB -->  15
## 
##     No. traits:         1 
##     Missing trait values: 
##   Trait_1: 0

As you can see, the mean difference in the output object is that read_onemap function keeps chromosome and position informations. Because the objects mapmaker_example_f2 and onemap_example_f2 are practically the same, from now we will use only onemap_example_f2.

Importing data from VCF file

If you are working with biallelic markers, as SNPs and indels (only codominant markers A.H.B), in VCF (Variant Call Format) files, you can import information from VCF to OneMap using onemap_read_vcfR function.

With the onemap_read_vcfR you can convert the object from vcfR package directly to onemap. The onemap_read_vcfR function keeps chromosome and position information for each marker at the end of raw file.

We will use the same example file vcf_example_f2.vcf to show how it works.

First, we convert the VCF file to vcfR object:

## 
##    *****       ***   vcfR   ***       *****
##    This is vcfR 1.8.0 
##      browseVignettes('vcfR') # Documentation
##      citation('vcfR') # Citation
##    *****       *****      *****       *****
## Scanning file to determine attributes.
## File attributes:
##   meta lines: 8
##   header_line: 9
##   variant count: 26
##   column count: 203
## 
Meta line 8 read in.
## All meta lines processed.
## gt matrix initialized.
## Character matrix gt created.
##   Character matrix gt rows: 26
##   Character matrix gt cols: 203
##   skip: 0
##   nrows: 26
##   row_num: 0
## 
Processed variant: 26
## All variants processed

As described in the vcfR package vignette, memory use is a important consideration when using vcfR. Depending of your dataset, the object created can be very large and occupy a lot of memory.

After, you can use onemap_read_vcfR function to convert this object to onemap object. The parameters used is the vcfR.object we just created, the identification of each parent (here you must define only one sample for each parent) and the cross type.

Depending of your dataset, this function can take some time to run.

After the conversion, we can save the vcfR.object as a .RData and to remove it from the workspace, once it can occupy a lot of memory and turn the other process too slow.

From version 2.0.6 to 2.1.1005, OneMap had the vcf2raw function to convert vcf to .raw. Now, this function is defunct, but it can be replaced by a combination of onemap_read_vcfR and write_onemap_raw functions. See Exporting .raw file from onemap object session to further information about write_onemap_raw.

Visualization of raw data

Before building your linkage map, you should take a look at your data set. First, notice that by reading the raw data into OneMap, an object of classes onemap and f2 was produced:

## [1] "onemap" "f2"
##               f2 intercross 
##      "onemap"          "f2"

In fact, functions read_mapmaker and read_onemap will produce objects of classes backcross, riself, risib or f2, according to the information in the data file for inbred-based populations. Therefore, you can use OneMap’s version of function plot to produce a graphic with information about the raw data. It will automatically recognize the class of the object and produce the graphic. To see it in action, try:

The graphic is self-explanatory. If you want to save it, see the help for function plot.onemap:

This graphic shows that missing data is somehow randomly distributed; also, the proportion of dominant markers is relatively high for this data set. In OneMap’s notation, codominant markers are classified as of B type; dominant ones, by C type (for details about this notation, see the vignette for outcrossing species). You can see the number of loci within each type using function plot_by_segreg_type:

So, as shown before, the object onemap_example_f2 has 36 codominant markers and 30 dominant ones and the vcf_example_f2has only codominant markers.

Graphical view of genotypes and allele depths (new!)

Function create_depth_profile generates dispersion graphics with x and y axis representing, respectively, the reference and alternative allele depths. The function is only available for biallelic markers in VCF files with allele counts information. Each dot represent a genotype for mks markers and inds individuals. If both arguments receives NULL, all markers and individuals are considered. Dots are colored according with the genotypes present in OneMap object (GTfrom = onemap) or in VCF file (GTfrom = vcf). A rds file is generated with the data in the graphic (rds.file). The alpha argument control the transparency of color of each dot. Control this parameter is a good idea when having big amount of markers and individuals. The x_lim and y_lim control the axis scale limits, by default, it uses the maximum value of the counts.

Here is an example for the simulated dataset.

## Scanning file to determine attributes.
## File attributes:
##   meta lines: 4
##   header_line: 5
##   variant count: 30
##   column count: 212
## 
Meta line 4 read in.
## All meta lines processed.
## gt matrix initialized.
## Character matrix gt created.
##   Character matrix gt rows: 30
##   Character matrix gt cols: 212
##   skip: 0
##   nrows: 30
##   row_num: 0
## 
Processed variant: 30
## All variants processed
## Summary of reference counts: 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00   28.00   51.00   64.86   85.00  429.00 
## Summary of alternative counts: 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00   13.00   35.00   44.15   64.00  344.00

Selecting genotypes from vcf (GTfrom = “vcf”) the colors are separated by VCF genotypes: 0/0 homozygote for the reference allele, 0/1 heterozygote and 1/1 homozygote for the alternative allele. Depending of your VCF, you can also have phased genotypes, which are presented by pipe (|) instead of bar (/).

Combining OneMap objects

If you have more than one dataset of markers, all from the same cross type, you can use the function combine_onemap to merge them into only one onemap object.

In our example, we have two onemap objects:

  • first: onemap_example_f2 (equivalent to mapmaker_example_f2) with 66 markers and 200 individuals
  • second: vcf_example_f2 with 25 biallelic markers and 192 individuals.

The combine_function recognizes the correspondent individuals by the ID, thus, it is important define exactly same IDs to respective individuals in both raw files. Compared with the first file, the second file do not have markers informations for 8 individuals. The combine_onemap will complete those informations with NA.

In our examples, we have only genotypic information, but the function can also merge the phenotypic information if it exists.

##   This is an object of class 'onemap'
##     Type of cross:      f2 
##     No. individuals:    200 
##     No. markers:        91 
##     CHROM information:  yes 
##     POS information:    yes 
##     Percent genotyped:  84 
## 
##     Segregation types:
##        AA : AB : BB -->  61
##         Not AA : AA -->  15
##         Not BB : BB -->  15
## 
##     No. traits:         1 
##     Missing trait values: 
##   Trait_1: 0

The function arguments are the names of the OneMap objects you want to combine.

Plotting markers genotypes from the outputted OneMap object, we can see that there are more missing data - (black vertical lines) for some individuals, because they were missing in the second file.

Find redundant markers

It is possible that there are redundant markers in your dataset, specially when dealing with too many markers. Redundant markers have the same genotypic informations that others markers, usually because didn’t happen recombination events between each other. They will not increase informations to the map, but will increase computational effort during the map building. Therefore, it is a good practice to remove them to build the map and, once the map is already builded, they can be added again.

First, we use the function find_bins to group the markers into bins according with their genotypic informations. In other words, markers with the same genotypic informations will be at the same bin.

## This is an object of class 'onemap_bin'
##     No. individuals:                         200 
##     No. markers in original dataset:         91 
##     No. of bins found:                       90 
##     Average of markers per bin:              1.011111 
##     Type of search performed:                non exact

The first argument is the OneMap object and the exact argument specify if only markers with exact same informations will be at the same bin. Using FALSE at this second argument, missing data will not be considered and the marker with the lowest amount of missing data will be the representative marker on the bin.

Our example dataset have only two redundant marker. We can create a new OneMap object without them, using the create_data_bins function. This function keeps only the most representative marker of each bin from bins object.

##   This is an object of class 'onemap'
##     Type of cross:      f2 
##     No. individuals:    200 
##     No. markers:        90 
##     CHROM information:  yes 
##     POS information:    yes 
##     Percent genotyped:  84 
## 
##     Segregation types:
##        AA : AB : BB -->  60
##         Not AA : AA -->  15
##         Not BB : BB -->  15
## 
##     No. traits:         1 
##     Missing trait values: 
##   Trait_1: 0

The arguments for create_data_bins function are the OneMap object and the object created by find_bins function.

Exporting .raw file from OneMap object

The functions onemap_read_vcfR generates new OneMap objects without use a input .raw file. Also, the function combine_onemap manipulates the information of the original .raw file and creates a new data set. In both cases, you do not have a input file .raw that contains the same informations of your current onemap object If you want to create a new input file with the data set you are working after using this functions, you can use the function write_onemap_raw.

The file new_dataset.raw will be generated in your working directory. In our example, it contains markers from onemap_example_f2 and vcf_example_f2 data sets.

Segregation tests

Now, it should be interesting to see if markers are segregating following what is expected by Mendel’s law. You first need to use function test_segregation using as argument an object of class onemap.

This will produce an object of class onemap_segreg_test:

## [1] "onemap_segreg_test"

You cannot see the results if you simply type the object name; use OneMap’s version of the print function for objects of class onemap_segreg_test:

(Nothing is shown!)

##    Marker    H0 Chi-square    p-value % genot.
## 1      M1 1:2:1  0.2068966 0.90172266     87.0
## 2      M2   3:1  0.2926829 0.58850636     82.0
## 3      M3   3:1  0.1240310 0.72470300     86.0
## 4      M4   3:1  0.2926829 0.58850636     82.0
## 5      M5   3:1  0.1597633 0.68937452     84.5
## 6      M6   3:1  0.6206897 0.43079112     87.0
## 7      M7 1:2:1  3.1851852 0.20339760     81.0
## 8      M8   3:1  0.1656442 0.68401234     81.5
## 9      M9   3:1  0.1269841 0.72157973     84.0
## 10    M10 1:2:1  1.7218935 0.42276165     84.5
## 11    M11   3:1  1.3909465 0.23824532     81.0
## 12    M12 1:2:1  1.6781609 0.43210768     87.0
## 13    M13   3:1  0.1542857 0.69447296     87.5
## 14    M14 1:2:1  5.3372093 0.06934892     86.0
## 15    M15   3:1  0.4032922 0.52539392     81.0
##  [ reached 'max' / getOption("max.print") -- omitted 75 rows ]

This shows the results of the Chi-square test for the expected Mendelian segregation pattern of each marker locus. This depends of course on marker type, because codominant markers can show heterozygous genotypes. The appropriate null hypothesis is selected by the function. The proportion of individuals genotyped is also shown.

To declare statistical significance, remember that you should consider that multiple tests are being performed. To guide you in the analysis, function Bonferroni_alpha shows the alpha value that should be considered for this number of loci if applying Bonferroni’s correction with global alpha of 0.05:

## [1] 0.0005555556

You can subset object f2_test to see which markers are distorted under Bonferroni’s criterion, but it is easier to see the proportion of markers that are distorted by drawing a graphic using OneMap’s version of the function plot for objects of class onemap_segreg_test:

The graphic is self-explanatory: p-values were transformed by using -log10(p-values) for better visualization. A vertical line shows the threshold for tests if Bonferroni’s correction is applied. Significant and non-significant tests are identified. In this particular example, no test was statistically significant, so none will be discarded.

Please, remember that Bonferroni’s correction is conservative, and also that discarding marker data might not be a good approach to your analysis. This graphic is just to suggest a criterion, so use it with caution.

You can see a list of markers with non-distorted segregation using function select_segreg:

##  [1] "M1"    "M2"    "M3"    "M4"    "M5"    "M6"    "M7"    "M8"    "M9"   
## [10] "M10"   "M11"   "M12"   "M13"   "M14"   "M15"   "M16"   "M17"   "M18"  
## [19] "M19"   "M20"   "M21"   "M22"   "M23"   "M24"   "M25"   "M26"   "M27"  
## [28] "M28"   "M29"   "M30"   "M31"   "M32"   "M33"   "M34"   "M35"   "M36"  
## [37] "M37"   "M38"   "M39"   "M40"   "M41"   "M42"   "M43"   "M44"   "M45"  
## [46] "M46"   "M47"   "M48"   "M49"   "M50"   "M51"   "M52"   "M53"   "M54"  
## [55] "M55"   "M56"   "M57"   "M58"   "M59"   "M60"   "M61"   "M62"   "M63"  
## [64] "M64"   "M65"   "M66"   "SNP1"  "SNP2"  "SNP3"  "SNP5"  "SNP6"  "SNP7" 
## [73] "SNP8"  "SNP9"  "SNP10"
##  [ reached getOption("max.print") -- omitted 15 entries ]

To get a list of distorted ones (none in this example):

## character(0)

It is not recommended, but you can define a different threshold value changing the threshold argument of the function select_segreg.

For the next steps will be useful to know the numbers of each markers with segregation distortion, so then you can keep those out of your map building analysis. These numbers refer to the lines where markers are located on the data file.

To access the corresponding number for of this markers you can change the numbers argument:

##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
## [51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75
##  [ reached getOption("max.print") -- omitted 15 entries ]
## integer(0)

Estimating two-point recombination fractions

After visualizing raw data and checking for segregation distortion, let us now estimate recombination fractions between all pairs of markers (two-point tests). This is necessary to allow us to test which markers are linked. At this point, you should pay no attention if markers show segregation distortion or not, that is, simply use all of them.

There are two optional arguments in function rf_2pts: LOD and max.rf which indicate the minimum LOD Score and the maximum recombination fraction to declare linkage (they default to 3.0 and 0.5, respectively).

The default for the recombination fraction is easy to understand, because if max.rf < 0.5 we could state that markers are linked. The LOD Score is the statistic used to evaluate the significance of the test for max.rf = 0.50. This needs to take into consideration the number of tests performed, which of course depends on the number of markers. Function suggest_lod can help users to find an initial value to use for their linkage test. For this example:

## [1] 4.145858

Thus, one should consider using LOD = 4.145858 for the tests. Please, notice that this is just a guide, not a value to take without any further consideration. For now, we will keep the default values, but later will show that results do not change in our example by using LOD = 3 or LOD = 4.145858.

If you want to see the results for a single pair of markers, say M12 and M42, use:

##   Results of the 2-point analysis for markers: M12 and M42 
##   Criteria: LOD =  3 , Maximum recombination fraction =  0.5 
## 
##           rf     LOD
## CC 0.6842077 2.29369
## CR 0.3157923 2.29369
## RC 0.6842077 2.29369
## RR 0.3157923 2.29369

Since version 2.3.0, we estimate phases for F2 populations too, them here you can see the recombination fractions and LOD values for each possible phase. For RILs and backcross you will obtain only one value.

This was possible because OneMap has a version of the print function that can be applied to objects of class rf_2pts:

## [1] "rf_2pts" "f2"

However, objects of this type are too complex to print if you do not specify a pair of markers:

##   This is an object of class 'rf_2pts'
## 
##   Criteria: LOD = 3 , Maximum recombination fraction = 0.5 
## 
##   This object is too complex to print
##   Type 'print(object, c(mrk1=marker, mrk2=marker))' to see
##     the analysis for two markers
##     mrk1 and mrk2 can be the names or numbers of both markers

Strategies for this tutorial example

In this example we follow two different strategies:

  • Using only recombinations informations.

  • Using the recombinations and also the reference genome informations, once our example have CHROM and POS informations for some of the markers.

First, we will apply the strategy using only recombinations informations. In a second part of this tutorial we show a way to use also reference genome informations.

Using only recombinations informations

Assigning markers to linkage groups

To assign markers to linkage groups, first use the function make_seq to create a (un-ordered) sequence with all markers:

Function make_seq is used to create sequences from objects of several different classes. Here, the first argument is of class rf_2pts and the second argument specifies which markers one wants to use ("all" indicates that all markers will be analyzed). The object mark_all_f2 is of class sequence:

## [1] "sequence"

If you want to form groups with a subset of markers, say M1, M3 and M7, use:

In this case, it was easy because marker names and order in the objects (indicated in vector c(1, 3, 7)) are closed related, that is, you can easily know the position of markers in the object once you know their names. However, this is not true for real data sets, where markers do not have simple names such as M1 or M2.

A good example is to use the vector of markers without segregation distortion that we selected when applying the Chi-square tests.

In our example, there are no markers with segregation distortion, then the object mark_no_dist_f2 is equivalent to mark_all_f2.

Forming the groups

You can assign markers to linkage groups using the function group:

##    Selecting markers: 
##    group    1 
##     ............................................................
##     .............................
##   This is an object of class 'group'
##   It was generated from the object "mark_all_f2"
## 
##   Criteria used to assign markers to groups:
##     LOD = 3 , Maximum recombination fraction = 0.5 
## 
##   No. markers:            90 
##   No. groups:             1 
##   No. linked markers:     90 
##   No. unlinked markers:   0 
## 
##   Printing groups:
##   Group 1 : 90 markers
##     M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12 M13 M14 M15 M16 M17 M18 M19 M20 M21 M22 M23 M24 M25 M26 M27 M28 M29 M30 M31 M32 M33 M34 M35 M36 M37 M38 M39 M40 M41 M42 M43 M44 M45 M46 M47 M48 M49 M50 M51 M52 M53 M54 M55 M56 M57 M58 M59 M60 M61 M62 M63 M64 M65 M66 SNP1 SNP2 SNP3 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10 SNP11 SNP12 SNP13 SNP14 SNP15 SNP16 SNP17 SNP18 SNP19 SNP20 SNP21 SNP22 SNP23 SNP24 SNP26

This will show the linkage groups which are formed with criteria defined by max.rf and LOD. These criterias are applied as thresholds when assigning markers to linkage groups. If not modified, the same values used for the object twopts (from two-point analysis) will be maintained (so, LOD = 3.0 and max.rf = 0.5 in this example).

Users can easily change the default values. For example, using LOD suggested by suggest_lod (rounded up):

##    Selecting markers: 
##    group    1 
##     ............................................................
##     .............................
##   This is an object of class 'group'
##   It was generated from the object "mark_all_f2"
## 
##   Criteria used to assign markers to groups:
##     LOD = 4.145858 , Maximum recombination fraction = 0.5 
## 
##   No. markers:            90 
##   No. groups:             1 
##   No. linked markers:     90 
##   No. unlinked markers:   0 
## 
##   Printing groups:
##   Group 1 : 90 markers
##     M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12 M13 M14 M15 M16 M17 M18 M19 M20 M21 M22 M23 M24 M25 M26 M27 M28 M29 M30 M31 M32 M33 M34 M35 M36 M37 M38 M39 M40 M41 M42 M43 M44 M45 M46 M47 M48 M49 M50 M51 M52 M53 M54 M55 M56 M57 M58 M59 M60 M61 M62 M63 M64 M65 M66 SNP1 SNP2 SNP3 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10 SNP11 SNP12 SNP13 SNP14 SNP15 SNP16 SNP17 SNP18 SNP19 SNP20 SNP21 SNP22 SNP23 SNP24 SNP26

In our case, nothing different happen. (The parentheses above are just to avoid typing LGs_f2 in a new row to have the object printed).

We can see that the markers were assigned to only one linkage group. This division isn’t so trustful, mostly if you are also working with dominant markers. If you have reference genome information for some of your markers, you can separate the groups using it and group_seq function (further information in Using the recombinations and the reference genome informations) to add the markers without reference genome information. We can also confirm the separation of linkage groups in the ordering procedure.

Notice the class of object LGs_f2:

## [1] "group"

Ordering markers within linkage groups

After assignin markers to linkage groups, the next step is to order the markers within each group.

First, let us choose the mapping function used to display the genetic map. We can choose between Kosambi or Haldane mapping functions. To use Haldane, type

To use Kosambi’s function:

If none is setted, kosambi function is applied.

Linkage group 1

First, you must extract it from the object of class group. Let us extract the group 1 using function make_seq:

The first argument is an object of class group and the second is a number indicating which linkage group will be extracted. In this case, the object LGs_f2, generated by function group, is of class group. In fact, this function can handle different classes of objects.

If you type

## 
## Number of markers: 90
## Too many markers  - not printing their names
## 
## Parameters not estimated.

you will see which markers are comprised in the sequence. But notice that no parameters have been estimated so far (the function says Parameters not estimated). This refers to the fact that so far we only attributed markers to linkage groups, but we did not perform any analysis for them as a group - only as pairs. (Does it seem complicated? Do not worry, you will understand details in a moment).

Notice the class of object LG1_f2:

## [1] "sequence"

To order markers in this group, you can use a two-point based algorithm such as Seriation (Buetow and Chakravarti, 1987), Rapid Chain Delineation (Doerge, 1996), Recombination Counting and Ordering (Van Os et al., 2005) and Unidirectional Growth (Tan and Fu, 2006):

The algorithms provided different results (results not printed in this vignette). For an evaluation and comparison of these methods, see Mollinari et al. (2009).

In this particular case, seriation will return an expected error:

NOTE: (new!) If you are working with f2 intercross mapping population and have many markers (more than 60), we suggest to speed up seriation, rcd, record and ug using BatchMap parallelization approach. See section Speed up analysis with parallelization for more information.

Other method that has demonstrated be very efficiently in ordering markers uses multidimensional scaling (MDS) approach. OneMap has the mds_onemap function that makes interface with MDSMap package to apply this approach to order markers. This is particularly useful if you are dealing with many markers (up to 20). The method also provides diagnostics graphics and parameters to find outliers to help users to filter the dataset. You can find more informations in MDSMap vignette. Our mds_onemap do not present all possibilities of analysis that the MDSMap package presents. See help page ?mds_onemap to check which ones we implemented.

## Stress: 0.416784646843688
## Mean Nearest Neighbour Fit: 14.655917074491
## Markers omitted:

It is not the case of this example, but you can have an error like this:

ERROR: The linkage between markers 1 and 2 did not reached the OneMap default criteria. They are probably segregating independently. Use function map_avoid_unlinked to remove these markers automatically.

This means that some marker is not considered linked by the multipoint approach. Remember that mds, rcd, seriation, record and ug do not order the markers using all information available as the HMM chain uses. Therefore, when estimating the genetic distance with the HMM chain, maybe it finds inconsistencies in the ordering. To solve that, we can remove this marker and repeat the analysis, argument rm_unlinked = TRUE will do this automatically for you. It is available in all these ordering functions.

NOTE: (new!) If you are working with f2 intercross mapping population and have many markers (more than 60), we suggest to speed up mds using BatchMap parallelization approach. See section Speed up analysis with parallelization for more information.

By now, we ordered our group in several ways, but, which one result in the best order? We can check it plotting the color scale recombination fraction matrix and see if the generated maps obey the colors patterns expected.

Plotting the recombination fraction matrix

It is possible to plot the recombination fraction matrix and LOD Scores based on a color scale using the function rf_graph_table. This matrix can be useful to make some diagnostics about the map.

Ordered markers are presented in both axis. Hotter the colors lower the recombination fraction between markers related in each cell. Good map orders have hot colors in the diagonal and it is gradually turns to blue at the superior left and inferior right corners. This pattern means that markers that have low recombination fractions are really positioned closely in the map.

Let’s see the maps we have until now:

With default arguments, the graphic cells represents the recombination fractions. If you change the argument to graph.LOD = TRUE, LOD score values are plotted. The color scale varies from red (small distances and big LODs) to dark blue. You can also change the number of colors from rainbow palette with argument n.colors, add/remove graphic main and axis title (main and lab.xy), and shows marker numbers, instead of names in the axis (mrk.axis).

We can see that any of the algorithms gave a optimal ordering, there are a many red cells out of the diagonal, the color pattern is broke in all graphics, but we need to do an effort to find which one of the algorithms gave the best result. Then, we continue the analysis doing hand manipulations.

Here, ug and record approaches are the ones that gave results more close to the expected pattern. We can also see that probably there are more than one group, because of the big gaps. Let’s see the map generated by ug algorithm with more details using interactive mode of rf_graph_table function:

An interactive version of the graphic will pop up (not shown here) in your internet browser end generated a html file in your work directory. Hover the mouse cursor over the cell corresponding to two markers, you can see some useful information about them.

Markers from 89 to 11 seems to form a separated group (we will call this group LG1). Markers from 23 to 55 will be our LG2. Markers from 39 to 50 show strong evidence that constitute other separated group, including markers (LG3).

Let’s separate them and order again using the same method:

## 
## order obtained using UG algorithm:
## 
##  89 90 33 37 8 51 25 5 61 66 54 45 32 43 11 2 10 41 
## 
## calculating multipoint map using tol  1e-04 .

## 
## order obtained using UG algorithm:
## 
##  23 29 60 68 44 67 36 40 26 34 63 31 17 75 12 73 74 58 35 70 71 72 7 13 6 30 69 1 4 42 46 53 3 9 27 55 
## 
## calculating multipoint map using tol  1e-04 .

## 
## order obtained using UG algorithm:
## 
##  47 39 38 19 49 59 28 80 76 78 85 14 82 83 16 65 79 77 62 15 21 24 20 52 81 84 64 48 57 22 18 86 87 88 56 50 
## 
## calculating multipoint map using tol  1e-04 .

Besides the groups now is better separated, the order is still not the best we can do. With less markers in each group, we can apply a multipoint strategy to better order those markers, starting with group 1.

Linkage group 1

When possible (i.e., when groups have a small number of markers, in general up to 10 or 11), one should select the best order by comparing the multipoint likelihood of all possible orders between markers (exhaustive search). This procedure is implemented in the function compare. Although feasible for up to 10 or 11 markers, with 7 or more markers it will take a couple of hours until you see the results (depending of course on the computational resources available).

All our groups have more than 7 markers, so using function compare is infeasible. Thus we will apply a heuristic that shows reliable results. First, we will choose a moderate number of markers, say 6, to create a framework using the function compare, and then we will position the remaining markers into this framework using function try_seq. The way we choose these initial markers in inbred-based populations is somewhat different from what we did for outcrossing populations, where there is a mixture of segregation patterns (see the vignette for details).

In our scenario, we recommend two methods:

  1. Randomly choose a number of markers and calculate the multipoint likelihood of all possible orders (using the function compare). If the LOD Score of the second best order is greater than a given threshold, say, 3, then take the best order to proceed with the next step. If not, repeat the procedure.
  2. Use some two-point based algorithm to construct a map; then, take equally spaced markers from this map. Then, create a framework of ordered markers using the function compare. Next, try to map the remaining markers, one at a time, beginning with co-dominant ones (most informative ones), then add the dominant ones.

You can do this procedure manually, in a similar way as done for outcrossing species (see the vignette for details). However, this procedure is automated in function order_seq, which we will use here (it will take some time to run):

## 
## Cross type: f2 
## Using segregation types of the markers to choose initial subset
## 
## Comparing 60 orders:     
## 
## 
  |                                                                            
  |                                                                      |   0%    
  |                                                                            
  |=                                                                     |   2%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |====                                                                  |   5%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |======                                                                |   8%
  |                                                                            
  |=======                                                               |  10%
  |                                                                            
  |========                                                              |  12%
  |                                                                            
  |=========                                                             |  13%
  |                                                                            
  |==========                                                            |  15%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |=============                                                         |  18%
  |                                                                            
  |==============                                                        |  20%
  |                                                                            
  |===============                                                       |  22%
  |                                                                            
  |================                                                      |  23%
  |                                                                            
  |==================                                                    |  25%
  |                                                                            
  |===================                                                   |  27%
  |                                                                            
  |====================                                                  |  28%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |======================                                                |  32%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |========================                                              |  35%
  |                                                                            
  |==========================                                            |  37%
  |                                                                            
  |===========================                                           |  38%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |=============================                                         |  42%
  |                                                                            
  |==============================                                        |  43%
  |                                                                            
  |================================                                      |  45%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |==================================                                    |  48%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |====================================                                  |  52%
  |                                                                            
  |=====================================                                 |  53%
  |                                                                            
  |======================================                                |  55%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |=========================================                             |  58%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |===========================================                           |  62%
  |                                                                            
  |============================================                          |  63%
  |                                                                            
  |==============================================                        |  65%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |================================================                      |  68%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |==================================================                    |  72%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |======================================================                |  77%
  |                                                                            
  |=======================================================               |  78%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |=========================================================             |  82%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |============================================================          |  85%
  |                                                                            
  |=============================================================         |  87%
  |                                                                            
  |==============================================================        |  88%
  |                                                                            
  |===============================================================       |  90%
  |                                                                            
  |================================================================      |  92%
  |                                                                            
  |=================================================================     |  93%
  |                                                                            
  |==================================================================    |  95%
  |                                                                            
  |====================================================================  |  97%
  |                                                                            
  |===================================================================== |  98%
  |                                                                            
  |======================================================================| 100%
## 
## 
## 
## Running try algorithm
## 61 --> M61   : ......
## 66 --> M66   : .......
## 54 --> M54   : .......
## 43 --> M43   : ........
## 10 --> M10   : .........
## 41 --> M41   : ..........
##  8 --> M8    : ...........
## 51 --> M51   : ............
##  5 --> M5    : .............
## 45 --> M45   : .............
## 11 --> M11   : ..............
##  2 --> M2    : ...............
## 32 --> M32   : ...............
## 
## LOD threshold = 3 
## 
## Positioned markers: 89 33 90 37 8 51 25 61 54 45 32 43 11 10 41 
## 
## Markers not placed on the map: 5 66 2 
## 
## 
## Calculating LOD-Scores
##  5 --> M5    : ................
## 66 --> M66   : ................
##  2 --> M2    : ................
## 
## 
## Placing remaining marker(s) at most likely position
## 66 --> M66   : ................
##  5 --> M5    : .................
##  2 --> M2    : ..................
## 
## Estimating final genetic map using tol = 10E-5.

The first argument is an object of class sequence (LG1_ug2_f2). n.init = 5 means that five markers will be used in the compare step. The argument subset.search = "twopt" indicates that these five markers should be chosen by using a two point method, which will be Rapid Chain Delineation, as indicated by the argument twopt.alg = "rcd". THRES = 3 indicates that the try_seq step will only add markers to the sequence which can be mapped with LOD Score greater than 3.

Check the order obtained by this procedure:

## 
## Best sequence found.
## Printing map:
## 
## Markers           Position           Parent 1       Parent 2
## 
## 89 SNP24              0.00           a |  | b       a |  | b 
## 33 M33                5.53           a |  | b       a |  | b 
## 90 SNP26              6.97           a |  | b       a |  | b 
## 37 M37               13.32           a |  | b       a |  | b 
##  8 M8                19.35           a |  | o       o |  | o 
## 51 M51               31.93           o |  | a       o |  | o 
## 25 M25               37.58           a |  | b       a |  | b 
## 61 M61               43.31           a |  | b       a |  | b 
## 54 M54               51.76           a |  | b       a |  | b 
## 45 M45               54.48           a |  | o       o |  | o 
## 32 M32               59.07           o |  | o       o |  | a 
## 43 M43               63.48           a |  | b       a |  | b 
## 11 M11               67.27           a |  | o       o |  | o 
## 10 M10               75.42           a |  | b       a |  | b 
## 41 M41               83.65           a |  | b       a |  | b 
## 
## 15 markers            log-likelihood: -1196.176 
## 
## 
## 
## The following markers could not be uniquely positioned.
## Printing most likely positions for each unpositioned marker:
## 
## ------------------------ 
## |    |   5 |  66 |   2 |
## |----|-----|-----|-----| 
## |    |     |     |     |
## | 89 |     |     |     | 
## |    |     |     |     |
## | 33 |     |     |     | 
## |    |     |     |     |
## | 90 |     |     |     | 
## |    |     |     |     |
## | 37 |     |     |     | 
## |    |     |     |     |
## |  8 |     |     |     | 
## |    |     |     |     |
## | 51 |     |     |     | 
## |    | *** |     |     |
## | 25 |     |     |     | 
## |    |     |     |     |
## | 61 |     |     |     | 
## |    |     | *** |     |
## | 54 |     |     |     | 
## |    |     |     |     |
## | 45 |     |     |     | 
## |    |     |     |     |
## | 32 |     |     |     | 
## |    |     |     |     |
## | 43 |     |     |     | 
## |    |     |     | *   |
## | 11 |     |     |     | 
## |    |     |     | *** |
## | 10 |     |     |     | 
## |    |     |     |     |
## | 41 |     |     |     | 
## |    |     |     |     |
## ------------------------ 
## 
## '***' indicates the most likely position(s) (LOD = 0.0)
## 
## '**' indicates very likely positions (LOD > -1.0)
## 
## '*' indicates likely positions (LOD > -2.0)

Note that markers 5, 66 and 2 could not be safely mapped to a single position (LOD Score > THRES in absolute value). The output displays the safe order and the most likely positions for markers not mapped, where *** indicates the most likely position, and * corresponds to other plausible positions. (If you are familiar with MAPMAKER/EXP, you will recognize the representation).

To get the safe order, use

and to get the order with all markers (i.e., including the ones not mapped to a single position), use:

## 
## Printing map:
## 
## Markers           Position           Parent 1       Parent 2
## 
## 89 SNP24              0.00           a |  | b       a |  | b 
## 33 M33                5.53           a |  | b       a |  | b 
## 90 SNP26              6.97           a |  | b       a |  | b 
## 37 M37               13.33           a |  | b       a |  | b 
##  8 M8                19.15           a |  | o       o |  | o 
## 51 M51               30.87           o |  | a       o |  | o 
##  5 M5                34.62           a |  | o       o |  | o 
## 25 M25               37.24           a |  | b       a |  | b 
## 61 M61               42.79           a |  | b       a |  | b 
## 66 M66               44.47           a |  | b       a |  | b 
## 54 M54               51.42           a |  | b       a |  | b 
## 45 M45               53.93           a |  | o       o |  | o 
## 32 M32               58.95           o |  | o       o |  | a 
## 43 M43               63.28           a |  | b       a |  | b 
## 11 M11               66.80           a |  | o       o |  | o 
##  2 M2                69.58           a |  | o       o |  | o 
## 10 M10               77.02           a |  | b       a |  | b 
## 41 M41               85.50           a |  | b       a |  | b 
## 
## 18 markers            log-likelihood: -1256.273

which places markers 5, 66 and 2 into their most likely positions.

Although some old publications presented maps with only safe orders, we see no reason not to use option force, and recommend it for users. In next, steps we can make modifications in the map based in more information.

The order_seq function can perform two rounds of the try_seq step, first using THRES and then THRES - 1 as the threshold. This generally results in safe orders with more markers mapped, but takes longer to run. To do this, type (it will take some time to run):

## 
## Cross type: f2 
## Using segregation types of the markers to choose initial subset
## 
## Comparing 60 orders:     
## 
## 
  |                                                                            
  |                                                                      |   0%    
  |                                                                            
  |=                                                                     |   2%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |====                                                                  |   5%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |======                                                                |   8%
  |                                                                            
  |=======                                                               |  10%
  |                                                                            
  |========                                                              |  12%
  |                                                                            
  |=========                                                             |  13%
  |                                                                            
  |==========                                                            |  15%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |=============                                                         |  18%
  |                                                                            
  |==============                                                        |  20%
  |                                                                            
  |===============                                                       |  22%
  |                                                                            
  |================                                                      |  23%
  |                                                                            
  |==================                                                    |  25%
  |                                                                            
  |===================                                                   |  27%
  |                                                                            
  |====================                                                  |  28%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |======================                                                |  32%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |========================                                              |  35%
  |                                                                            
  |==========================                                            |  37%
  |                                                                            
  |===========================                                           |  38%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |=============================                                         |  42%
  |                                                                            
  |==============================                                        |  43%
  |                                                                            
  |================================                                      |  45%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |==================================                                    |  48%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |====================================                                  |  52%
  |                                                                            
  |=====================================                                 |  53%
  |                                                                            
  |======================================                                |  55%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |=========================================                             |  58%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |===========================================                           |  62%
  |                                                                            
  |============================================                          |  63%
  |                                                                            
  |==============================================                        |  65%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |================================================                      |  68%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |==================================================                    |  72%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |======================================================                |  77%
  |                                                                            
  |=======================================================               |  78%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |=========================================================             |  82%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |============================================================          |  85%
  |                                                                            
  |=============================================================         |  87%
  |                                                                            
  |==============================================================        |  88%
  |                                                                            
  |===============================================================       |  90%
  |                                                                            
  |================================================================      |  92%
  |                                                                            
  |=================================================================     |  93%
  |                                                                            
  |==================================================================    |  95%
  |                                                                            
  |====================================================================  |  97%
  |                                                                            
  |===================================================================== |  98%
  |                                                                            
  |======================================================================| 100%
## 
## 
## 
## Running try algorithm
## 61 --> M61   : ......
## 66 --> M66   : .......
## 54 --> M54   : .......
## 43 --> M43   : ........
## 10 --> M10   : .........
## 41 --> M41   : ..........
##  8 --> M8    : ...........
## 51 --> M51   : ............
##  5 --> M5    : .............
## 45 --> M45   : .............
## 11 --> M11   : ..............
##  2 --> M2    : ...............
## 32 --> M32   : ...............
## 
## LOD threshold = 3 
## 
## Positioned markers: 89 33 90 37 8 51 25 61 54 45 32 43 11 10 41 
## 
## Markers not placed on the map: 5 66 2 
## 
## 
## 
## Trying to map remaining markers with LOD threshold  2 
##  5 --> M5    : ................
## 66 --> M66   : .................
##  2 --> M2    : ..................
## 
## LOD threshold = 2 
## 
## Positioned markers: 89 33 90 37 8 51 5 25 61 66 54 45 32 43 11 10 41 
## 
## Markers not placed on the map: 2 
## 
## 
## Calculating LOD-Scores
##  2 --> M2    : ..................
## 
## 
## Placing remaining marker(s) at most likely position
##  2 --> M2    : ..................
## 
## Estimating final genetic map using tol = 10E-5.

The output is too big to be included here, so please try it to see what happens. In short, for this particular sequence, the touchdown step could additionally map markers 5 and 66, but this depends on the dataset. Let us continue our analysis using the order with all markers as suggested by the function order_seq:

## 
## Printing map:
## 
## Markers           Position           Parent 1       Parent 2
## 
## 89 SNP24              0.00           a |  | b       a |  | b 
## 33 M33                5.53           a |  | b       a |  | b 
## 90 SNP26              6.97           a |  | b       a |  | b 
## 37 M37               13.33           a |  | b       a |  | b 
##  8 M8                19.15           a |  | o       o |  | o 
## 51 M51               30.87           o |  | a       o |  | o 
##  5 M5                34.62           o |  | a       o |  | o 
## 25 M25               37.24           a |  | b       a |  | b 
## 61 M61               42.79           a |  | b       a |  | b 
## 66 M66               44.47           a |  | b       a |  | b 
## 54 M54               51.42           a |  | b       a |  | b 
## 45 M45               53.93           a |  | o       o |  | o 
## 32 M32               58.95           o |  | o       o |  | a 
## 43 M43               63.28           a |  | b       a |  | b 
## 11 M11               66.80           a |  | o       o |  | o 
##  2 M2                69.58           a |  | o       o |  | o 
## 10 M10               77.02           a |  | b       a |  | b 
## 41 M41               85.50           a |  | b       a |  | b 
## 
## 18 markers            log-likelihood: -1256.273

Finally, to check for alternative orders, use the ripple_seq function:

The second argument, ws = 5, means that subsets (windows) of five markers will be permuted sequentially (5! orders for each window), to search for other plausible orders. The LOD argument means that only orders with LOD Score smaller than 3 will be printed.

The output shows sequences of five numbers, because ws = 5. They can be followed by an OK, if there are no alternative orders with LOD Scores smaller than LOD = 3 in absolute value, or by a list of alternative orders.

In this example, the first two windows showed alternative orders with LOD smaller than LOD = 3. However, the best order was that obtained with the order_seq function (LOD = 0.00). If there were an alternative order more likely than the original, one should check the difference between them and, if necessary, change the order with (for example) functions drop_marker (see Section about using an arbitrary order) and try_seq, or simply by typing the new order. For that, use LG2_f2_final$seq.num to obtain the original order; then make the necessary changes (by copying and pasting) and use the function map to reestimate the genetic map for the new order.

Linkage group 2

Here we will use multipoint approach to re-order the LG2. The approach is the automatic usage of the try algorithm:

The second round of try_seq added markers 23, 29, 67, 44, 68, 36, 40, 26, 63, 31, 17, 12, 75, 74, 58, 35, 13, 6, 70, 72, 30, 69, 1, 46, 42, 3, 27 and 55 (try it; results not shown).

Get the order with all markers:

## 
## Printing map:
## 
## Markers           Position           Parent 1       Parent 2
## 
## 23 M23                0.00           o |  | o       a |  | o 
## 60 M60                7.35           a |  | o       o |  | o 
## 29 M29               10.70           a |  | b       b |  | a 
## 67 SNP1              19.28           a |  | b       b |  | a 
## 44 M44               21.93           a |  | b       b |  | a 
## 68 SNP2              23.81           a |  | b       b |  | a 
## 36 M36               26.56           o |  | o       o |  | a 
## 40 M40               33.51           a |  | b       b |  | a 
## 26 M26               37.16           a |  | b       b |  | a 
## 63 M63               44.94           a |  | b       b |  | a 
## 34 M34               44.94           o |  | o       o |  | a 
## 31 M31               53.09           o |  | o       o |  | a 
## 17 M17               57.15           a |  | b       b |  | a 
## 73 SNP8              62.81           a |  | b       b |  | a 
## 12 M12               64.25           a |  | b       b |  | a 
## 75 SNP10             66.14           a |  | b       b |  | a 
## 74 SNP9              72.51           a |  | b       b |  | a 
## 58 M58               80.93           a |  | o       o |  | o 
## 35 M35               87.85           a |  | o       o |  | o 
## 13 M13               94.66           o |  | o       o |  | a 
##  6 M6                99.87           o |  | o       o |  | a 
## 71 SNP6             106.84           a |  | b       b |  | a 
## 70 SNP5             112.25           a |  | b       b |  | a 
##  7 M7               113.93           a |  | b       b |  | a 
## 72 SNP7             117.11           a |  | b       b |  | a 
## 30 M30              127.43           a |  | b       b |  | a 
## 69 SNP3             130.22           a |  | b       b |  | a 
##  1 M1               135.35           a |  | b       b |  | a 
## 46 M46              143.90           a |  | b       b |  | a 
## 53 M53              148.09           o |  | o       o |  | a 
## 42 M42              149.84           a |  | o       o |  | o 
##  4 M4               156.20           a |  | o       o |  | o 
##  3 M3               157.93           o |  | o       o |  | a 
##  9 M9               168.74           o |  | o       o |  | a 
## 27 M27              170.85           a |  | b       b |  | a 
## 55 M55              178.49           a |  | b       b |  | a 
## 
## 36 markers            log-likelihood: -2470.256

Our heatmap is already very good with automatic approach, but I may want to see what happen if I remove a marker and try to reposition it. To remove a marker use function drop_marker:

After, you need to reestimate the parameters using the same previous order. For that, use map function:

NOTE: (new!) If you are working with f2 intercross mapping population and have many markers (more than 60), we suggest to speed up map using BatchMap parallelization approach. See section Speed up analysis with parallelization for more information.

See what happened:

And try to reposition the marker:

## 23 --> M23   : ....................................
## 
## LOD scores correspond to the best linkage phase combination
## for each position
## 
## The symbol "*" outside the box indicates that more than one
## linkage phase is possible for the corresponding position
## 
## 
##        Marker tested: 23
## 
##        Markers     LOD
##      =====================
##      |                   |
##      |             0.00  |  1  *
##      |  60               | 
##      |            -7.47  |  2  *
##      |  29               | 
##      |           -10.94  |  3  *
##      |  67               | 
##      |           -33.03  |  4  *
##      |  44               | 
##      |           -36.37  |  5  *
##      |  68               | 
##      |           -25.68  |  6  *
##      |  36               | 
##      |           -25.91  |  7  *
##      |  40               | 
##      |           -51.10  |  8  *
##      |  26               | 
##      |           -49.10  |  9  *
##      |  63               | 
##      |           -64.40  |  10  *
##      |  34               | 
##      |           -49.74  |  11  *
##      |  31               | 
##      |           -59.98  |  12  *
##      |  17               | 
##      |           -65.67  |  13  *
##      |  73               | 
##      |           -80.64  |  14  *
##      |  12               | 
##      |           -80.00  |  15  *
##      |  75               | 
##      |           -65.74  |  16  *
##      |  74               | 
##      |           -43.80  |  17  *
##      |  58               | 
##      |           -41.83  |  18  *
##      |  35               | 
##      |           -36.25  |  19  *
##      |  13               | 
##      |           -63.11  |  20  *
##      |   6               | 
##      |           -61.07  |  21  *
##      |  71               | 
##      |           -75.71  |  22  *
##      |  70               | 
##      |           -83.75  |  23  *
##      |   7               | 
##      |           -82.03  |  24  *
##      |  72               | 
##      |           -62.13  |  25  *
##      |  30               | 
##      |           -90.26  |  26  *
##      |  69               | 
##      |           -77.84  |  27  *
##      |   1               | 
##      |           -75.91  |  28  *
##      |  46               | 
##      |           -83.76  |  29  *
##      |  53               | 
##      |           -62.39  |  30  *
##      |  42               | 
##      |           -59.54  |  31  *
##      |   4               | 
##      |           -53.67  |  32  *
##      |   3               | 
##      |           -46.96  |  33  *
##      |   9               | 
##      |           -49.06  |  34  *
##      |  27               | 
##      |           -47.35  |  35  *
##      |  55               | 
##      |           -16.48  |  36  *
##      |                   |
##      =====================

The result show us that the best position for this marker (with higher LOD) is the 1 (as before). Let’s now include the marker at this position using:

Check the final map (results not shown):

Print it:

## 
## Printing map:
## 
## Markers           Position           Parent 1       Parent 2
## 
## 23 M23                0.00           o |  | o       a |  | o 
## 60 M60               20.56           a |  | o       o |  | o 
## 29 M29               23.28           a |  | b       b |  | a 
## 67 SNP1              31.78           a |  | b       b |  | a 
## 44 M44               34.45           a |  | b       b |  | a 
## 68 SNP2              36.34           a |  | b       b |  | a 
## 36 M36               39.16           o |  | o       o |  | a 
## 40 M40               46.12           a |  | b       b |  | a 
## 26 M26               49.76           a |  | b       b |  | a 
## 63 M63               57.55           a |  | b       b |  | a 
## 34 M34               57.65           o |  | o       o |  | a 
## 31 M31               65.81           o |  | o       a |  | o 
## 17 M17               69.86           a |  | b       b |  | a 
## 73 SNP8              75.52           a |  | b       b |  | a 
## 12 M12               76.97           a |  | b       b |  | a 
## 75 SNP10             78.85           a |  | b       b |  | a 
## 74 SNP9              85.22           a |  | b       b |  | a 
## 58 M58               93.64           a |  | o       o |  | o 
## 35 M35              100.56           a |  | o       o |  | o 
## 13 M13              107.37           o |  | o       a |  | o 
##  6 M6               112.58           o |  | o       a |  | o 
## 71 SNP6             119.56           a |  | b       b |  | a 
## 70 SNP5             124.96           a |  | b       b |  | a 
##  7 M7               126.64           a |  | b       b |  | a 
## 72 SNP7             129.82           a |  | b       b |  | a 
## 30 M30              140.15           a |  | b       b |  | a 
## 69 SNP3             142.93           a |  | b       b |  | a 
##  1 M1               148.06           a |  | b       b |  | a 
## 46 M46              156.61           a |  | b       b |  | a 
## 53 M53              160.81           o |  | o       o |  | a 
## 42 M42              162.55           a |  | o       o |  | o 
##  4 M4               168.91           a |  | o       o |  | o 
##  3 M3               170.65           o |  | o       a |  | o 
##  9 M9               181.46           o |  | o       a |  | o 
## 27 M27              183.56           a |  | b       b |  | a 
## 55 M55              191.20           a |  | b       b |  | a 
## 
## 36 markers            log-likelihood: -2476.71

This is the final version of the map for this linkage group.

Linkage group 3

Automatic usage of try algorithm.

A careful examination of the graphics can be a good source of information about how markers were placed.

Now, get the order with all markers:

## 
## Printing map:
## 
## Markers           Position           Parent 1       Parent 2
## 
## 47 M47                0.00           a |  | b       a |  | b 
## 19 M19                7.68           a |  | o       o |  | o 
## 39 M39                8.98           o |  | o       o |  | a 
## 38 M38               15.59           a |  | b       a |  | b 
## 59 M59               23.86           a |  | o       o |  | o 
## 49 M49               24.32           o |  | o       a |  | o 
## 76 SNP11             32.50           a |  | b       a |  | b 
## 28 M28               33.81           a |  | b       a |  | b 
## 80 SNP15             36.11           a |  | b       a |  | b 
## 78 SNP13             42.82           a |  | b       a |  | b 
## 85 SNP20             52.38           a |  | b       a |  | b 
## 83 SNP18             57.63           a |  | b       a |  | b 
## 14 M14               59.25           a |  | b       a |  | b 
## 82 SNP17             61.56           a |  | b       a |  | b 
## 16 M16               67.72           a |  | b       a |  | b 
## 65 M65               72.93           a |  | b       a |  | b 
## 77 SNP12             78.35           a |  | b       a |  | b 
## 79 SNP14             88.64           a |  | b       a |  | b 
## 62 M62               98.09           a |  | b       a |  | b 
## 15 M15              101.64           a |  | o       o |  | o 
## 21 M21              105.74           a |  | b       a |  | b 
## 24 M24              107.72           o |  | o       a |  | o 
## 20 M20              114.56           o |  | o       a |  | o 
## 50 M50              138.46           o |  | o       a |  | o 
## 56 M56              141.91           a |  | b       a |  | b 
## 87 SNP22            153.60           a |  | b       a |  | b 
## 86 SNP21            159.91           a |  | b       a |  | b 
## 18 M18              163.36           a |  | b       a |  | b 
## 88 SNP23            164.83           a |  | b       a |  | b 
## 22 M22              170.26           a |  | b       a |  | b 
## 57 M57              173.68           a |  | b       a |  | b 
## 48 M48              179.82           a |  | b       a |  | b 
## 64 M64              185.51           a |  | o       o |  | o 
## 52 M52              187.45           a |  | b       a |  | b 
## 84 SNP19            189.98           a |  | b       a |  | b 
## 81 SNP16            196.26           a |  | b       a |  | b 
## 
## 36 markers            log-likelihood: -2633.912

Check heatmap:

Markers 34, 39, 50, 56, 20, 24 and 64 seems to broke the color pattern. We will remove them and use try_seq to find better locations:

## 
## Cross type: f2 
## Using segregation types of the markers to choose initial subset
## 
## Comparing 60 orders:     
## 
## 
  |                                                                            
  |                                                                      |   0%    
  |                                                                            
  |=                                                                     |   2%
  |                                                                            
  |==                                                                    |   3%
  |                                                                            
  |====                                                                  |   5%
  |                                                                            
  |=====                                                                 |   7%
  |                                                                            
  |======                                                                |   8%
  |                                                                            
  |=======                                                               |  10%
  |                                                                            
  |========                                                              |  12%
  |                                                                            
  |=========                                                             |  13%
  |                                                                            
  |==========                                                            |  15%
  |                                                                            
  |============                                                          |  17%
  |                                                                            
  |=============                                                         |  18%
  |                                                                            
  |==============                                                        |  20%
  |                                                                            
  |===============                                                       |  22%
  |                                                                            
  |================                                                      |  23%
  |                                                                            
  |==================                                                    |  25%
  |                                                                            
  |===================                                                   |  27%
  |                                                                            
  |====================                                                  |  28%
  |                                                                            
  |=====================                                                 |  30%
  |                                                                            
  |======================                                                |  32%
  |                                                                            
  |=======================                                               |  33%
  |                                                                            
  |========================                                              |  35%
  |                                                                            
  |==========================                                            |  37%
  |                                                                            
  |===========================                                           |  38%
  |                                                                            
  |============================                                          |  40%
  |                                                                            
  |=============================                                         |  42%
  |                                                                            
  |==============================                                        |  43%
  |                                                                            
  |================================                                      |  45%
  |                                                                            
  |=================================                                     |  47%
  |                                                                            
  |==================================                                    |  48%
  |                                                                            
  |===================================                                   |  50%
  |                                                                            
  |====================================                                  |  52%
  |                                                                            
  |=====================================                                 |  53%
  |                                                                            
  |======================================                                |  55%
  |                                                                            
  |========================================                              |  57%
  |                                                                            
  |=========================================                             |  58%
  |                                                                            
  |==========================================                            |  60%
  |                                                                            
  |===========================================                           |  62%
  |                                                                            
  |============================================                          |  63%
  |                                                                            
  |==============================================                        |  65%
  |                                                                            
  |===============================================                       |  67%
  |                                                                            
  |================================================                      |  68%
  |                                                                            
  |=================================================                     |  70%
  |                                                                            
  |==================================================                    |  72%
  |                                                                            
  |===================================================                   |  73%
  |                                                                            
  |====================================================                  |  75%
  |                                                                            
  |======================================================                |  77%
  |                                                                            
  |=======================================================               |  78%
  |                                                                            
  |========================================================              |  80%
  |                                                                            
  |=========================================================             |  82%
  |                                                                            
  |==========================================================            |  83%
  |                                                                            
  |============================================================          |  85%
  |                                                                            
  |=============================================================         |  87%
  |                                                                            
  |==============================================================        |  88%
  |                                                                            
  |===============================================================       |  90%
  |                                                                            
  |================================================================      |  92%
  |                                                                            
  |=================================================================     |  93%
  |                                                                            
  |==================================================================    |  95%
  |                                                                            
  |====================================================================  |  97%
  |                                                                            
  |===================================================================== |  98%
  |                                                                            
  |======================================================================| 100%
## 
## 
## 
## Running try algorithm
## 78 --> SNP13 : ......
## 85 --> SNP20 : ......
## 83 --> SNP18 : ......
## 14 --> M14   : ......
## 82 --> SNP17 : ......
## 16 --> M16   : ......
## 65 --> M65   : .......
## 77 --> SNP12 : ........
## 79 --> SNP14 : .........
## 62 --> M62   : .........
## 21 --> M21   : .........
## 87 --> SNP22 : ..........
## 86 --> SNP21 : ...........
## 18 --> M18   : ...........
## 88 --> SNP23 : ...........
## 22 --> M22   : ...........
## 57 --> M57   : ............
## 48 --> M48   : .............
## 52 --> M52   : ..............
## 84 --> SNP19 : ...............
## 81 --> SNP16 : ................
## 19 --> M19   : .................
## 59 --> M59   : .................
## 15 --> M15   : ..................
## 49 --> M49   : ...................
## 
## LOD threshold = 3 
## 
## Positioned markers: 47 38 59 76 28 80 16 65 77 15 21 81 84 52 48 57 22 87 
## 
## Markers not placed on the map: 19 49 78 85 83 14 82 79 62 86 18 88 
## 
## 
## Calculating LOD-Scores
## 19 --> M19   : ...................
## 49 --> M49   : ...................
## 78 --> SNP13 : ...................
## 85 --> SNP20 : ...................
## 83 --> SNP18 : ...................
## 14 --> M14   : ...................
## 82 --> SNP17 : ...................
## 79 --> SNP14 : ...................
## 62 --> M62   : ...................
## 86 --> SNP21 : ...................
## 18 --> M18   : ...................
## 88 --> SNP23 : ...................
## 
## 
## Placing remaining marker(s) at most likely position
## 14 --> M14   : ...................
## 18 --> M18   : ....................
## 82 --> SNP17 : .....................
## 83 --> SNP18 : ......................
## 88 --> SNP23 : .......................
## 19 --> M19   : ........................
## 85 --> SNP20 : .........................
## 62 --> M62   : ..........................
## 79 --> SNP14 : ...........................
## 86 --> SNP21 : ............................
## 78 --> SNP13 : .............................
## 49 --> M49   : ..............................
## 
## Estimating final genetic map using tol = 10E-5.

Adding one by one we can see how each one change the map log-likelihood, map size and the color pattern in heatmap and judge if we will remove them.

## 34 --> M34   : ...............................
## 39 --> M39   : ...............................
## 50 --> M50   : ................................
## 56 --> M56   : ................................
## 20 --> M20   : ................................
## 24 --> M24   : .................................
## 64 --> M64   : ..................................

Check the final map (not shown):

The fifth window presented alternative orders that seems better than the current one. Let’s try to substitute the current order by the best presented by ripple. See that, instead of 59-49-76-28-80 we have 59-49-76-80-28. Let’s find this window and substitute the sequence:

Now, we estimate the distances for this already known order using map:

Print it:

Using the recombinations and the reference genome informations

In our example, we have reference genome chromosome and position informations for some of the markers, here we will exemplify one method of using this informations to help build the genetic map.

With the CHROM informations in the input file, you can identify markers belonging to some chromosome using the function make_seq with the rf_2pts object. For example, assign the string "1" for the second argument to get chromosome 1 makers. The output sequence will be automatically ordered by POS informations.

## 
## Number of markers: 9
## Markers in the sequence:
## SNP1 SNP2 SNP3 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10
## 
## Parameters not estimated.

Adding markers with no reference genome informations

According to CHROM informations we have three defined linkage groups, now we can try to group the markers without chromosome informations to them using recombination informations. For this, we can use the function group_seq:

##    Selecting markers: 
##    group    1 
##     ............................................................
##     ..............
##    Selecting markers: 
##    group    1 
##     ............................................................
##     ..................
##    Selecting markers: 
##    group    1 
##     ............................................................
##     .......
## There are one or more markers that grouped in more than one sequence

The function works as the function group, but considering pre existing sequences. Setting seqs argument with the string "CHROM", it will considered the pre existing sequences according to CHROM information. You can also indicate others pre existing sequences if it make sense for your study. For that, you should inform a list with objects of class sequences, as the example:

In this case, the command had the same effect of the previous, because we indicate chromosome sequences, but others sequences can be used.

The unlink.mks argument receive a object of class sequence, this define which markers will be tested to group with the sequences in seqs. In our example, we will indicate only the markers with no segregation distortion, using the sequence mark_no_dist. It is also possible to use the string "all" to test all the remaining markers at the rf_2pts object.

In some cases, the same marker can group to more than one sequence, those markers will be considered repeated. We can choose if we want to remove or not (FALSE/TRUE) them of the output sequences, with the argument rm.repeated. Anyway, their numbers will be informed at the list repeated in the output object. In the example case, there are no repeated markers. However, if they exists, it could indicate that their groups actually constitute the same group. Also, genotyping errors can generate repeated markers. Anyway, they deserves better investigations.

We can access detailed information about the results just printing:

##   This is an object of class 'group_seq'
##   Criteria used to assign markers to groups:
##     LOD = , Maximum recombination fraction = 
## 
##   No. markers in input sequences:
##                        CHR1 :   9 markers
##                        CHR2 :   13 markers
##                        CHR3 :   2 markers
## 
##   No. unlinked input markers:   66 markers
## 
##   No. markers in output sequences:
##                        CHR1 :   9 markers
##                        CHR2 :   13 markers
##                        CHR3 :   2 markers
##   No. unlinked:                 0 markers
##   No. repeated:                 66 markers
## 
##   Printing output sequences:
##   Group CHR1 : 9 markers
##     SNP1 SNP2 SNP3 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10 
## 
##   Group CHR2 : 13 markers
##     SNP11 SNP12 SNP13 SNP14 SNP15 SNP16 SNP17 SNP18 SNP19 SNP20 SNP21 SNP22 SNP23 
## 
##   Group CHR3 : 2 markers
##     SNP24 SNP26 
## 
##   Unlinked markers: 0 markers
##     
## 
##   Repeated markers: 66  markers
##      
##   Group CHR1 : 66 markers
##     M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12 M13 M14 M15 M16 M17 M18 M19 M20 M21 M22 M23 M24 M25 M26 M27 M28 M29 M30 M31 M32 M33 M34 M35 M36 M37 M38 M39 M40 M41 M42 M43 M44 M45 M46 M47 M48 M49 M50 M51 M52 M53 M54 M55 M56 M57 M58 M59 M60 M61 M62 M63 M64 M65 M66 
## 
##   Group CHR2 : 66 markers
##     M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12 M13 M14 M15 M16 M17 M18 M19 M20 M21 M22 M23 M24 M25 M26 M27 M28 M29 M30 M31 M32 M33 M34 M35 M36 M37 M38 M39 M40 M41 M42 M43 M44 M45 M46 M47 M48 M49 M50 M51 M52 M53 M54 M55 M56 M57 M58 M59 M60 M61 M62 M63 M64 M65 M66 
## 
##   Group CHR3 : 66 markers
##     M1 M2 M3 M4 M5 M6 M7 M8 M9 M10 M11 M12 M13 M14 M15 M16 M17 M18 M19 M20 M21 M22 M23 M24 M25 M26 M27 M28 M29 M30 M31 M32 M33 M34 M35 M36 M37 M38 M39 M40 M41 M42 M43 M44 M45 M46 M47 M48 M49 M50 M51 M52 M53 M54 M55 M56 M57 M58 M59 M60 M61 M62 M63 M64 M65 M66

Also, we can access the numbers of repeated markers with:

## $CHR1
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
## [51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
## 
## $CHR2
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
## [51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66
## 
## $CHR3
##  [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
## [26] 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50
## [51] 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66

The same way, we can access the output sequences:

## 
## Number of markers: 9
## Markers in the sequence:
## SNP1 SNP2 SNP3 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10
## 
## Parameters not estimated.
## 
## Number of markers: 9
## Markers in the sequence:
## SNP1 SNP2 SNP3 SNP5 SNP6 SNP7 SNP8 SNP9 SNP10
## 
## Parameters not estimated.

For this function, optional arguments are LOD and max.rf, which define thresholds to be used when assigning markers to linkage groups. If none provided (default), criteria previously defined for the object rf_2pts are used.

In our example, all markers without genomic information grouped with all chromosomes, then this approach did not give us more information about grouping and from here we could follow the same strategy did in Ordering markers within linkage groups.

Map estimation for an arbitrary order

If you have some information about the order of the markers, for example, from a reference genome or previously published paper, you can define a sequence of those markers in a specific order (using the function make_seq) and then use the function map to estimate the final genetic map (based on multi point analysis). For example, let us assume that we know that the following markers are ordered in this sequence: M47, M38, M59, M16, M62, M21, M20, M48 and M22. In this case, select them from the two point analysis, and use function map:

Warning: If you find an error message like:

Error in as_mapper(.f, ...) : argument ".f" is missing, with no default

It’s because the map function has a very common name and you can have in your environment other function with the same name. In the case of the presented error, R is using map function from purrr package instead of OneMap, to solve this, simple specify that you want OneMap function:

## 
## Printing map:
## 
## Markers           Position           Parent 1       Parent 2
## 
## 47 M47                0.00           a |  | b       a |  | b 
## 38 M38               14.69           a |  | b       a |  | b 
## 59 M59               24.45           a |  | o       o |  | o 
## 16 M16               38.96           a |  | b       a |  | b 
## 62 M62               49.22           a |  | b       a |  | b 
## 21 M21               56.19           a |  | b       a |  | b 
## 20 M20               65.10           o |  | o       a |  | o 
## 48 M48               73.14           a |  | b       a |  | b 
## 22 M22               81.80           a |  | b       a |  | b 
## 
## 9 markers            log-likelihood: -1004.624

NOTE: (new!) If your map population is F2 intercross and your sequence has many markers (more than 60), we suggest to speed up map using BatchMap parallelization approach. See section Speed up analysis with parallelization for more information.

This is a subset of the first linkage group. When used this way, the map function searches for the best combination of phases between markers and prints the results.

To see the correspondence between marker names and numbers, use function marker_type:

##   Marker 47 ( ) --> AA : AB : BB (1:2:1)  
##   Marker 38 ( ) --> AA : AB : BB (1:2:1)  
##   Marker 59 ( ) --> Not  AA : AA (3:1)  
##   Marker 16 ( ) --> AA : AB : BB (1:2:1)  
##   Marker 62 ( ) --> AA : AB : BB (1:2:1)  
##   Marker 21 ( ) --> AA : AB : BB (1:2:1)  
##   Marker 20 ( ) --> Not  BB : BB (3:1)  
##   Marker 48 ( ) --> AA : AB : BB (1:2:1)  
##   Marker 22 ( ) --> AA : AB : BB (1:2:1)

If one needs to add or drop markers from a predefined sequence, functions add_marker and drop_marker can be used. For example, to add markers M18, M56 and M50 at the end of LG3seq_f2_map:

## 
## Number of markers: 12
## Markers in the sequence:
## M47 M38 M59 M16 M62 M21 M20 M48 M22 M18 M56 M50
## 
## Parameters not estimated.

Removing markers M59 and 21 from LG3seq_f2_map:

## 
## Number of markers: 10
## Markers in the sequence:
## M47 M38 M16 M62 M20 M48 M22 M18 M56 M50
## 
## Parameters not estimated.

Drawing the genetic map

Once all linkage groups were obtained using both strategies, we can draw a map for each strategy using the function draw_map. Since version 2.1.1007, OneMap has a new version of draw_map, called draw_map2. The new function draws elegant linkage groups, and presents new arguments to personalize your draw.

If you prefer the old function, we also keep it. Follow examples how to use both of them.

Draw_map

We can draw a genetic map for all linkage groups using the function draw_map. First we have to create a list of ordered linkage groups:

Then use function draw_map for this list:

We also can draw a map for a specific linkage group:

Function draw_map draws a very simple graphic representation of the genetic map. More recently, we developed a new version called draw_map2 that draws a more sophisticated figure. Furthermore, once the distances and the linkage phases are estimated, other map figures can be drawn by the user with any appropriate software.

Draw_map2

The same figures did with draw_map can be done with draw_map2 function. But it has different capacities and arguments. Here are some examples, but you can find more options in the help page ?write_map2.

Let’s draw all three groups built:

You can include all sequence objects referring to the groups as the first arguments. The main argument define the main title of the draw and group.names define the names of each group. If no output file and file extension is defined, the draw will be generated at you working directory as map.eps. The eps extension is only the default option but there are others that can be used. You can have access to a list of them at the help page.

We also can draw a map for a specific linkage group:

You can also change the default colors using the col.group and col.mark arguments.

With argument tag you can highlight some markers at the figure according to you specific purpose.

Speed up analysis with parallelization (new!)

Warning: By now, only available for Unix-like operating systems

As already mentioned, OneMap uses HMM multipoint approach to estimate genetic distances, an very robust method, but it can take time to run if you have many markers. In 2017, Schiffthaler et. al release an OneMap fork with modifications in CRAN and in github with the possibility of parallelize the HMM chain dividing markers in batches and use different cores for each phase. Their approach speed up our HMM and keeps the genetic distances estimation quality. It allows to divide the job in maximum four cores according with the four possible phases for outcrossing and f2 mapping populations. We add this parallelized approach to the functions: map, mds_onemap, seriation, rcd, record and ug. For better efficiency it is important that batches are composed by 50 markers or more, therefore, this approach is only recommended for linkage groups with many markers.

Here we will show an example how to use the BatchMap approach in some functions that requires HMM. For this, we will simulate a group with 300 markers (we don’t want this vignette takes too much time to run, but usually maps with markers from high-throughput technologies result larger groups). Before start, you can see the time spent for each approach in this example:

Without parallelization (h) With parallelization (h)
rcd 0.6801889 0.1260436
record_map 1.8935892 0.3330297
ug_map 1.1002725 0.2256356
mds_onemap 2.1478900 0.4443492
map 2.1042114 0.6156722

Now, let’s simulate the mapping population and markers.

To prepare the data with defined bach size we use function pick_batch_sizes. It selects a batch size that splits the data into even groups. Argument size defines the batch size next to which an optimum size will be searched. overlap defines the number of markers that overlap between present batch and next. This is used because pre-defined phases at these overlap markers in present batch are used to start the HMM in the next batch. The around argument defines how much the function can vary around the defined number in size to search for the optimum batch size.

Some aspects should be considered to define these arguments, because if the batch size were set too high, there would be less gain in execution time. If the overlap size would be too small, phases would be incorrectly estimated and large gaps would appear in the map, inflating its size. In practise, these values will depend on many factors such as population size, marker quality and species. BatchMap authors recommended to try several configurations on a subset of data and select the best performing one.

Speed up map estimation for an arbitrary order (map function)

Because we simulate this dataset we know the correct order. We can use map_overlapping_batches to estimate genetic distance in this case. This is equivalent to map, but with parallelized process.

Similarly with map, using argument rm_unlinked = TRUE the function will return a vector with marker numbers without the problematic marker. To repeat the analysis removing automatically all problematic markers use map_avoid_unlinked:

As you can see in above maps, heuristic ordering algorithm do not return an optimal order result, mainly if you don’t have many individuals in your population. Because of the erroneous order, generated map size are not close to the simulated size (100 cM) and their heapmaps don’t present the expected color pattern. Two of them get close to the color pattern, they are the ug and the MDS method. They present good global ordering but not local. If you have a reference genome, you can use its position information to rearrange local ordering.

Export estimated progeny haplotypes (new!)

Function progeny_haplotypes generates a data.frame with progeny phased haplotypes estimated by OneMap HMM. For progeny, the HMM results in probabilities for each possible genotypes, than the generated data.frame contains all possible genotypes. If most_likely = TRUE the most likely genotype receives 1 and the rest 0 (if there are two most likely both receive 0.5), if most_likely = FALSE genotypes probabilities will be according with the HMM results. You can choose which individual to be evaluated in ind. The data.frame is composed by the information: individual (ind) and group (grp) ID, position in centimorgan (pos), progeny homologs (homologs) and from each parent the allele came (parents).

##    ind       grp      pos prob homologs parents
## 1    1 LG2_final  0.00000    1       H1      P1
## 2    1 LG2_final 20.56485    1       H1      P1
## 3    1 LG2_final 23.27857    1       H1      P1
## 4    1 LG2_final 31.78339    1       H1      P1
## 5    1 LG2_final 34.44508    1       H1      P1
## 6    1 LG2_final 36.34228    1       H1      P1
## 7    1 LG2_final 39.16418    1       H1      P1
## 8    1 LG2_final 46.11760    1       H1      P1
## 9    1 LG2_final 49.76391    1       H1      P1
## 10   1 LG2_final 57.55453    1       H1      P1
## 11   1 LG2_final 57.65453    1       H1      P1
## 12   1 LG2_final 65.80558    1       H1      P1
##  [ reached 'max' / getOption("max.print") -- omitted 276 rows ]

You can also have a view of progeny estimated haplotypes using plot. It shows which markers came from each parents homologues. position argument defines if haplotypes will be plotted by homologs (stack) or alleles (split). split option is a good way to better view the likelihoods of each alleles.

Final remarks

At this point it should be clear that any potential OneMap user must have some knowledge about genetic mapping and also the R language, because the analysis is not done with only one mouse click. In the future, perhaps a graphical interface will be made available to make this software a lot easier to use.

We do hope that OneMap is useful to researchers interested in genetic mapping in outcrossing or inbred-based populations. Any suggestions and critics are welcome.

Session Info

## R version 3.6.1 (2019-07-05)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 16.04.6 LTS
## 
## Matrix products: default
## BLAS:   /usr/lib/libblas/libblas.so.3.6.0
## LAPACK: /usr/lib/lapack/liblapack.so.3.6.0
## 
## locale:
##  [1] LC_CTYPE=pt_BR.UTF-8       LC_NUMERIC=C              
##  [3] LC_TIME=pt_BR.UTF-8        LC_COLLATE=en_US.UTF-8    
##  [5] LC_MONETARY=pt_BR.UTF-8    LC_MESSAGES=en_US.UTF-8   
##  [7] LC_PAPER=pt_BR.UTF-8       LC_NAME=C                 
##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
## [11] LC_MEASUREMENT=pt_BR.UTF-8 LC_IDENTIFICATION=C       
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] vcfR_1.8.0       onemap_2.3.1     rmdformats_0.3.5 knitr_1.29      
## 
## loaded via a namespace (and not attached):
##  [1] colorspace_1.4-1          ellipsis_0.3.1           
##  [3] class_7.3-15              rio_0.5.16               
##  [5] htmlTable_2.0.0           RcppArmadillo_0.9.900.1.0
##  [7] base64enc_0.1-3           rstudioapi_0.11          
##  [9] mice_3.9.0                farver_2.0.3             
## [11] codetools_0.2-16          splines_3.6.1            
## [13] doParallel_1.0.15         memuse_4.0-0             
## [15] Formula_1.2-3             polynom_1.4-0            
## [17] jsonlite_1.6.1            MDSMap_1.1               
## [19] broom_0.5.6               cluster_2.1.0            
## [21] png_0.1-7                 shiny_1.5.0              
## [23] compiler_3.6.1            backports_1.1.8          
## [25] assertthat_0.2.1          Matrix_1.2-17            
## [27] fastmap_1.0.1             later_1.1.0.1            
## [29] acepack_1.4.1             htmltools_0.5.0          
## [31] tools_3.6.1               gtable_0.3.0             
## [33] glue_1.4.1                reshape2_1.4.4           
## [35] dplyr_1.0.0               ggthemes_4.2.0           
## [37] Rcpp_1.0.4.6              carData_3.0-4            
## [39] cellranger_1.1.0          vctrs_0.3.1              
## [41] ape_5.3                   gdata_2.18.0             
## [43] nlme_3.1-141              iterators_1.0.12         
## [45] crosstalk_1.1.0.1         pinfsc50_1.1.0           
## [47] xfun_0.15                 stringr_1.4.0            
## [49] openxlsx_4.1.5            mime_0.9                 
## [51] miniUI_0.1.1.1            lifecycle_0.2.0          
## [53] weights_1.0.1             gtools_3.8.2             
## [55] princurve_2.1.4           candisc_0.8-3            
## [57] MASS_7.3-51.4             scales_1.1.1             
## [59] heplots_1.3-5             hms_0.5.3                
## [61] promises_1.1.1            parallel_3.6.1           
## [63] smacof_2.1-0              RColorBrewer_1.1-2       
## [65] yaml_2.2.1                curl_4.3                 
## [67] gridExtra_2.3             ggplot2_3.3.2            
## [69] updog_1.2.0               rpart_4.1-15             
## [71] reshape_0.8.8             latticeExtra_0.6-29      
## [73] stringi_1.4.6             highr_0.8                
## [75] foreach_1.5.0            
##  [ reached getOption("max.print") -- omitted 49 entries ]

References

Adler, J. R in a Nutshell A Desktop Quick Reference, 2009.

Broman, K. W., Wu, H., Churchill, G., Sen, S., Yandell, B. qtl: Tools for analyzing QTL experiments R package version 1.09-43, 2008.

Buetow, K. H., Chakravarti, A. Multipoint gene mapping using seriation. I. General methods. American Journal of Human Genetics 41, 180-188, 1987.

Doerge, R.W. Constructing genetic maps by rapid chain delineation. Journal of Agricultural Genomics 2, 1996.

Garcia, A.A.F., Kido, E.A., Meza, A.N., Souza, H.M.B., Pinto, L.R., Pastina, M.M., Leite, C.S., Silva, J.A.G., Ulian, E.C., Figueira, A. and Souza, A.P. Development of an integrated genetic map of a sugarcane Saccharum spp. commercial cross, based on a maximum-likelihood approach for estimation of linkage and linkage phases. Theoretical and Applied Genetics 112, 298-314, 2006.

Haldane, J. B. S. The combination of linkage values and the calculation of distance between the loci of linked factors. Journal of Genetics 8, 299-309, 1919.

Jiang, C. and Zeng, Z.-B. Mapping quantitative trait loci with dominant and missing markers in various crosses from two inbred lines. Genetica 101, 47-58, 1997.

Kosambi, D. D. The estimation of map distance from recombination values. Annuaire of Eugenetics 12, 172-175, 1944.

Lander, E. S. and Green, P. Construction of multilocus genetic linkage maps in humans. Proc. Natl. Acad. Sci. USA 84, 2363-2367, 1987.

Lander, E.S., Green, P., Abrahanson, J., Barlow, A., Daly, M.J., Lincoln, S.E. and Newburg, L. MAPMAKER, An interactive computing package for constructing primary genetic linkage maps of experimental and natural populations. Genomics 1, 174-181, 1987.

Lincoln, S. E., Daly, M. J. and Lander, E. S. Constructing genetic linkage maps with MAPMAKER/EXP Version 3.0: a tutorial and reference manual. A Whitehead Institute for Biomedical Research Technical Report 1993.

Matloff, N. The Art of R Programming. 2011. 1st ed. San Francisco, CA: No Starch Press, Inc., 404 pages.

Margarido, G. R. A., Souza, A.P. and Garcia, A. A. F. OneMap: software for genetic mapping in outcrossing species. Hereditas 144, 78-79, 2007.

Mollinari, M., Margarido, G. R. A., Vencovsky, R. and Garcia, A. A. F. Evaluation of algorithms used to order markers on genetics maps. Heredity 103, 494-502, 2009.

Oliveira, K.M., Pinto, L.R., Marconi, T.G., Margarido, G.R.A., Pastina, M.M., Teixeira, L.H.M., Figueira, A.M., Ulian, E.C., Garcia, A.A.F., Souza, A.P. Functional genetic linkage map on EST-markers for a sugarcane (Saccharum spp.) commercial cross. Molecular Breeding 20, 189-208, 2007.

Oliveira, E. J., Vieira, M. L. C., Garcia, A. A. F., Munhoz, C. F.,Margarido, G. R.A., Consoli, L., Matta, F. P., Moraes, M. C., Zucchi, M. I., and Fungaro,M. H. P. An Integrated Molecular Map of Yellow Passion Fruit Based on Simultaneous Maximum-likelihood Estimation of Linkage and Linkage Phases J. Amer. Soc. Hort. Sci. 133, 35-41, 2008.

Tan, Y., Fu, Y. A novel method for estimating linkage maps. Genetics 173, 2383-2390, 2006.

Van Os H, Stam P, Visser R.G.F., Van Eck H.J. RECORD: a novel method for ordering loci on a genetic linkage map. Theor Appl Genet 112, 30-40, 2005.

Voorrips, R.E. MapChart: software for the graphical presentation of linkage maps and QTLs. Journal of Heredity 93, 77-78, 2002.

Wang S., Basten, C. J. and Zeng Z.-B. Windows QTL Cartographer 2.5. Department of Statistics, North Carolina State University, Raleigh, NC, 2010. https://brcwebportal.cos.ncsu.edu/qtlcart/

Wu, R., Ma, C.X., Painter, I. and Zeng, Z.-B. Simultaneous maximum likelihood estimation of linkage and linkage phases in outcrossing species. Theoretical Population Biology 61, 349-363, 2002a.

Wu, R., Ma, C.-X., Wu, S. S. and Zeng, Z.-B. Linkage mapping of sex-specific differences. Genetical Research 79, 85-96, 2002b.

Statistical Genetics Lab
Department of Genetics
Luiz de Queiroz College of Agriculture
University of São Paulo

2020-06-29