Admixture analyses

From ISOGG Wiki

Admixture analysis (more properly known as biogeographical ancestry analysis) is a method of inferring someone's geographical origins based on an analysis of their genetic ancestry. An admixture analysis is one of the components of an autosomal DNA test. Companies which offer such tests include 23andMe, AncestryDNA, Family Tree DNA, MyHeritage DNA and Living DNA.

Admixture calculations

Admixture calculations provide a genetic ancestry analysis for individuals whose DNA has been genotyped for high-density single-nucleotide polymorphism (SNP) data. Because the different genotyping methods (mostly SNP chips) test different SNPs, their results need substantial overlap to allow meaningful comparisons. Admixture analysis usually builds ancestral components, also called clusters, from a reference dataset of samples. Both the reference datasets (regional, continental or worldwide) and the ancestral components (their number and age) vary widely depending on the setup and analysis method used. A new sample (one not included in the reference dataset) is normally compared with the ancestral components by calculating the percentage of each component in its DNA. Additional tools also allow the prediction of ancestral populations. The analysis is strongly limited by the diversity and accuracy of the reference dataset; for example, analyzing an Asian individual with an admixture tool based on a European dataset will not give meaningful results.
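The percentage calculation can be sketched in Python. This is a minimal illustration, not the code of any particular calculator: it assumes each ancestral component is described by a fixed allele frequency at every SNP (the kind of information a reference dataset provides) and estimates one new sample's proportions with the standard expectation-maximization (EM) update used for admixture models. The function name, inputs and example frequencies are hypothetical.

```python
from typing import List, Sequence


def admixture_proportions(
    genotypes: Sequence[int],
    freqs: Sequence[Sequence[float]],
    max_iter: int = 500,
    tol: float = 1e-10,
) -> List[float]:
    """Estimate ancestry proportions for one sample by EM (illustrative sketch).

    genotypes[j] -- copies (0, 1 or 2) of the counted allele at SNP j
    freqs[k][j]  -- hypothetical frequency of that allele in component k
    """
    n_comp = len(freqs)
    n_snps = len(genotypes)
    eps = 1e-9  # keep frequencies strictly between 0 and 1 to avoid division by zero
    q = [1.0 / n_comp] * n_comp  # start from equal proportions
    for _ in range(max_iter):
        totals = [0.0] * n_comp
        for j, g in enumerate(genotypes):
            f = [min(max(freqs[k][j], eps), 1.0 - eps) for k in range(n_comp)]
            # expected frequency of the counted allele in this sample
            p = sum(q[k] * f[k] for k in range(n_comp))
            for k in range(n_comp):
                # share of the g counted and (2 - g) other alleles attributed to component k
                totals[k] += g * q[k] * f[k] / p + (2 - g) * q[k] * (1.0 - f[k]) / (1.0 - p)
        new_q = [t / (2 * n_snps) for t in totals]  # each SNP contributes 2 alleles
        done = max(abs(a - b) for a, b in zip(new_q, q)) < tol
        q = new_q
        if done:
            break
    return q


if __name__ == "__main__":
    freqs = [[0.9, 0.8, 0.2], [0.1, 0.3, 0.7]]  # two hypothetical components, three SNPs
    print([round(100 * x, 1) for x in admixture_proportions([2, 1, 1], freqs)])
```

Every SNP adds one term to the sums, so a sample that shares few SNPs with the reference dataset is estimated from less information; this is one reason the SNP overlap between tests matters.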

Accuracy and sophistication

Most calculators use a shared subset of the up to 0.7 million SNPs provided by Family Finder, AncestryDNA, 23andMe, etc. These are compared with publicly available datasets and the companies' own proprietary datasets. As can be seen from the Autosomal DNA testing comparison chart, the accuracy and sophistication of these analyses vary greatly and have not yet reached the quality needed for accurate genetic genealogy research. The public dbSNP (Build 137) database contains about 45 million human SNPs, and comprehensive whole-genome sequencing (WGS) of all human populations could substantially increase that number and make much better calculators possible.[1]

DTC providers admixture analysis

An admixture analysis is included for everyone who has been tested by the following companies. For further details, see the Autosomal DNA testing comparison chart.

23andMe - Ancestry Composition

The Ancestry Composition feature offers a map view which displays one's ancestral components from various regions of the world as of 500 years ago, a split view for those who have one or both parents also tested by 23andMe, and a breakdown by chromosome. Three settings are available: conservative, standard, and speculative. Overall accuracy is reasonably good, but predictions within Europe are still not optimal, particularly in the speculative mode. Ancestry Finder provides a breakdown of one's ancestry by country.

Family Tree DNA - Population Finder

Population Finder was the first incarnation of the admixture analysis provided with Family Tree DNA's "Family Finder" test. It was replaced by a new feature known as MyOrigins in May 2014. Population Finder used principal component analysis (PCA) to estimate biogeographical percentages of autosomal DNA. The population samples used in the analysis were continental groups (Africa, America, East Asia, Europe, Middle Eastern, Oceania, and South Asia). The analysis did not include the X chromosome. For historical details of the test, see Understanding results: Population Finder in the Internet Archive. The Population Finder analysis was relatively non-specific, particularly for people with European ancestry.

For an explanation of the workings of Population Finder and the meaning of the Middle Eastern percentages seen in many Population Finder results, see the guest blog post by Doug McDonald on biogeographical analysis.

AncestryDNA - Genetic Ethnicity

For background on the AncestryDNA Ethnicity Estimates see the AncestryDNA Ethnicity Estimates White Paper 2018.

Genographic Project - Who Am I

Since a relatively limited number of autosomal SNPs are available in the Geno 2.0 data for analysis, the biogeographical ancestry analysis is somewhat limited relative to other similar tools, particularly relative to Ancestry Composition. The two closest reference populations are given for each person who is tested. However, these predictions, particularly the second closest reference population, are frequently inaccurate.
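Listing the closest reference populations amounts to a nearest-neighbour search. The Python sketch below is an assumption about how such a ranking can work, not the Genographic Project's actual method: it compares a person's admixture percentages with the average percentages of each reference population using Euclidean distance. All population and component names are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def closest_populations(
    sample: Dict[str, float],
    references: Dict[str, Dict[str, float]],
    n: int = 2,
) -> List[Tuple[str, float]]:
    """Return the n reference populations nearest to the sample's admixture percentages."""

    def distance(ref: Dict[str, float]) -> float:
        components = set(sample) | set(ref)  # a missing component counts as 0%
        return math.sqrt(sum((sample.get(c, 0.0) - ref.get(c, 0.0)) ** 2 for c in components))

    ranked = sorted(references.items(), key=lambda item: distance(item[1]))
    return [(name, round(distance(pct), 2)) for name, pct in ranked[:n]]


if __name__ == "__main__":
    refs = {  # hypothetical reference populations and components
        "Population A": {"North": 75.0, "South": 25.0},
        "Population B": {"North": 50.0, "South": 50.0},
        "Population C": {"North": 0.0, "South": 100.0},
    }
    print(closest_populations({"North": 70.0, "South": 30.0}, refs))
```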

Analysis projects / Admixture Calculators

These analyses are provided by various sites: some require you to send in your raw data, some are online tools, and some can be run on your own PC.

Comparison of SNP coverage/overlap of admixture calculators with common autosomal/X tests

See also Autosomal SNP comparison chart and Autosomal DNA testing comparison chart

Autosomal/X test | G25 sim | AncestryDNA | 23andMe | MyHeritage | FamilyTreeDNA | Living DNA | 1240k capture array | WGS Extract (30x) | YSEQ WGS 30x
Version, in use since | Nganasankhan | v2, May 2016 | v5, Aug. 2017 | v2, Nov. 2016? | v2 | v2, 2016? | Mathieson, Reich et al. 2015 | 23andMe CombinedKit 2023 | 23andMe all_hg19 2023
Number of autosomal/X SNPs tested/defined | Correlation | 637,639/28,892 | 630,132/16,530 | 576,157/29,694 | 612,272/16,271 | 683,503/15,028 | ~1,240,000 | 2,010,232/51,970 | 1,450,113/41,984
Eurogenes G25 (~300,000) | best in 2017[2] | | | | | | | |
LM Genetics K47 (76,267) | 0.9990 | 67,703; 89% | 21,477; 28% | 28,960; 38% | 27,015; 35% | | | 76,149; 99.9% | 73,550; 96.4%
MDLP K27 (118,536) | 0.9990 | 107,753; 91% | 33,708; 28% | 47,824; 40% | 44,513; 38% | | | 118,349; 99.8% | 114,457; 96.6%
HarappaWorld K16 (188,173) | 0.9988 | 171,503; 91% | 52,910; 28% | 73,065; 39% | 68,610; 36% | | | 187,890; 99.9% | 187,597; 99.7%
Eurogenes K36 (165,688) | 0.9981 | 155,228; 94% | 52,532; 32% | 72,407; 44% | 68,158; 41% | | | 165,401; 99.8% | 165,110; 99.7%
Dodecad Globe13 (166,255) | 0.9982 | 152,175; 92% | 47,411; 29% | 66,026; 40% | 62,231; 37% | | | 166,014; 99.9% | 165,758; 99.7%
Eurogenes K13 (182,705) | 0.9971 | 172,294; 94% | 59,077; 32% | 78,222; 43% | 73,805; 40% | | | 182,402; 99.8% | 182,109; 99.7%

Based on single results from either GEDmatch or Admixture Studio. Please edit/expand missing values or send them to ChrisR et al.
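The overlap percentages in the comparison table appear to be the number of a calculator's SNPs that are also reported by a test, divided by the calculator's total number of SNPs: for example, 67,703 of LM Genetics K47's 76,267 SNPs is about 88.8%, shown as 89% in the AncestryDNA column. A small Python sketch of that arithmetic, assuming SNPs are identified by rsID; the rsIDs in the example are made up:

```python
from typing import Iterable, Tuple


def coverage(calculator_snps: Iterable[str], test_snps: Iterable[str]) -> Tuple[int, float]:
    """Return how many calculator SNPs a test reports, and that count as a percentage."""
    calc = set(calculator_snps)
    shared = len(calc & set(test_snps))
    return shared, round(100 * shared / len(calc), 1)


if __name__ == "__main__":
    # hypothetical rsIDs: 2 of the calculator's 4 SNPs are on the test
    print(coverage(["rs1", "rs2", "rs3", "rs4"], ["rs2", "rs4", "rs9"]))
    # the LM Genetics K47 / AncestryDNA figure from the table
    print(round(100 * 67_703 / 76_267, 1))
```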

Eurogenes analysis by David Wesolowski

David does free analyses of raw data files from both 23andMe and FTDNA's Family Finder using the programs ADMIXTURE, BEAGLE, PLINK and ADMIXMAP. Results are distributed as Excel spreadsheets and as .png files. See http://eurogenes.blogspot.com/ and http://www.bga101.blogspot.com for background. Also see http://www.23andme.com/you/community/thread/5182. Information on how to interpret the results can be found in the archived copy of http://bga101.blogspot.com/2010/10/brief-guide-to-output-youre-seeing.html. If you are interested in participating in his project, contact him at .

Admixture analysis for Scandinavians by Anders Pålsen

Anders does a free admixture analysis for people of Scandinavian ancestry who have been tested by 23andMe. Participants must have their primary ancestry from Norway, Sweden or Finland. The raw 23andMe data files are analyzed using the program ADMIXTURE, and the ancestry is presented in a STRUCTURE-like graph. For additional background see http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2010-10/1286348059 and http://archiver.rootsweb.ancestry.com/th/read/GENEALOGY-DNA/2010-11/1289143880. If you are interested in participating in his project, contact him at .

L M Genetics by Lukasz Macuga

Lukasz provides a detailed report based on the Eurogenes K36 calculator on GEDmatch. The report includes a correlation map of your ancestral regions, population estimations and ancestry statistics in the form of multidimensional plots and dendrograms. For further details, see the L M Genetics website.

Magnus Ducatus Lituaniae Project by Verenich, Kull

A biogeographical analysis project for the territories of the former Grand Duchy of Lithuania. Admin: Vadim Verenich; co-admin: Leon Kull. See the Magnus Ducatus Lituaniae Project blog for further details.

McDonald's BGA project by Doug McDonald

Doug McDonald does two types of free tests. One is like 23andMe's "Advanced Global Similarity", except that he uses more "dimensions". For people with ancestry outside Europe, four of these are shown. For pure Europeans his world graph is essentially identical to 23andMe's, so instead he shows a European graph, which includes (at lower right) the Adygei, a tribe living on the eastern shores of the Black Sea. The higher dimensions do not give additional information for pure Europeans, so they are not shown. The results are sent to participants as graphs in .png files. Doug also does quantitative tests. These come in three flavors, which differ in the comparison panels used: the first without South Asia (represented by Pakistan) and the Middle East, the second with South Asia, and the third with all three. See the ISOGG Wiki page on McDonald's BGA project for the qualifying criteria.

Dodecad Ancestry Project by Dienekes Pontikos

See http://dodecad.blogspot.com for details. Also see the summary written on November 7, 2010 on his anthropology blog. This analysis is currently closed to participants, but Dienekes says that he "may or may not process data from relatives, or non-target groups that was already sent to me and that was not assigned a DOD number." Contact him directly at to see if he might be willing to accept your data at some future point in time.

Analysis projects: Do it yourself (DIY)

GEDmatch online admixture applications

This free online service was created by John Olson and Curtis Rogers at www.gedmatch.com. The large data transfers and heavy usage sometimes lead to server problems; donations are welcome to help fund the service. Various admixture (ethnicity or deep ancestry) tools are included:

DIYDodecad

Dienekes Pontikos published the Do-It-Yourself Dodecad tool free of charge for non-commercial use. DIYDodecad can do admixture analysis on 32-bit or 64-bit Windows or Linux machines. The analysis is carried out using calculator files and autosomal SNP raw data converted to a standard format. The admixture calculator gives percentages for the different population clusters.

Versions

  • v1.0 July 2011: Dodecad v3 calculator included, Dodecad Oracle possible
  • v2.0 August 2011: new features including by-chromosome and by-segment ancestry analysis, etc.
  • v2.1 September 2011: allows incomplete genotype files to be used, including files from platforms other than Illumina

Standardize raw data

Converting your data from the company-specific format to a common format requires the R software, which can be downloaded and installed from http://www.r-project.org/. Follow the instructions in the DIYDodecad readme.txt.
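The same kind of conversion can be sketched in Python; this is not the DIYDodecad R procedure. It assumes the usual tab-separated raw data layouts with "#" comment lines: four columns (rsid, chromosome, position, genotype) for 23andMe-style files and five columns (rsid, chromosome, position, allele1, allele2) for AncestryDNA-style files. The "common format" produced here, a dictionary from rsID to a genotype with its alleles in sorted order, is a hypothetical choice for illustration.

```python
import io
from typing import Dict, Iterable


def read_raw_data(lines: Iterable[str]) -> Dict[str, str]:
    """Return {rsid: genotype} from a 23andMe- or AncestryDNA-style raw data file.

    Assumed layouts (tab-separated, '#' comment lines):
      23andMe:     rsid  chromosome  position  genotype         ("--" = no call)
      AncestryDNA: rsid  chromosome  position  allele1  allele2  ("0" = no call)
    """
    genotypes = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        if fields[0].lower() == "rsid":  # AncestryDNA column header
            continue
        if len(fields) == 4:
            call = fields[3]
        elif len(fields) == 5:
            call = fields[3] + fields[4]
        else:
            continue  # unknown layout: skip the line
        if "-" in call or "0" in call:
            continue  # drop no-calls
        genotypes[fields[0]] = "".join(sorted(call))  # "GA" and "AG" both become "AG"
    return genotypes


if __name__ == "__main__":
    sample = "# rsid\tchromosome\tposition\tgenotype\nrs1\t1\t100\tGA\nrs2\t1\t200\t--\n"
    print(read_raw_data(io.StringIO(sample)))
```

An open file object can be passed directly, since iterating over it yields lines.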

Calculator files

Different calculator files from various projects are published regularly. Numbers in the calculator file usually describe the number of population clusters. You should look at their blogs for new versions:

SPatial Ancestry analysis (SPA)

A method for predicting an individual's ancestry or geographical origin.

SnpMap

A small program for viewing SNP data and seeing how the data compare with other populations and regions of the world.

  • SnpMap version 1.0.4, June 2011

ADMIXTURE and PLINK

Razib Khan has provided tutorials for users who wish to perform DIY analyses on their autosomal DNA results using the software programs ADMIXTURE and PLINK:
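After ADMIXTURE has been run on PLINK binary files (for example `admixture mydata.bed 3`, which writes `mydata.3.Q` and `mydata.3.P`), the .Q file holds one line per individual, in the same order as the .fam file, with K ancestry fractions. The Python sketch below is a hypothetical helper, not part of either program: it pairs those rows with the individual IDs from the second column of the .fam file and converts them to percentages.

```python
from typing import Dict, Iterable, List


def read_admixture_q(q_lines: Iterable[str], fam_lines: Iterable[str]) -> Dict[str, List[float]]:
    """Map each individual ID in a PLINK .fam file to its ADMIXTURE proportions in percent.

    Accepts open files or any iterable of lines, e.g.
    with open("mydata.3.Q") as q, open("mydata.fam") as fam: read_admixture_q(q, fam)
    """
    ids = [line.split()[1] for line in fam_lines if line.strip()]  # column 2 = individual ID
    rows = [[float(x) for x in line.split()] for line in q_lines if line.strip()]
    if len(ids) != len(rows):
        raise ValueError(".fam and .Q files list different numbers of individuals")
    return {iid: [round(100 * x, 1) for x in row] for iid, row in zip(ids, rows)}
```

If the same individual ID occurs in two families, the later entry overwrites the earlier one; keying on (family ID, individual ID) would avoid that.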

Blog posts and articles

Scientific papers

Videos

Ancestry reimagined: dismantling the myth of genetic ethnicities by Kostas Kampourakis:

What can DNA tests really tell us about our ancestry? A short tutorial from Prosanta Chakrabarty:

Ethnicity percentages demystified. A lecture given by Debbie Kennett at Family Tree Live in April 2019:

Further reading

See also

References

  1. Francioli et al. 2014. Whole-genome sequence variation, population structure and demographic history of the Dutch population. Nature Genetics, Figure 1 (Venn diagram). http://dx.doi.org/10.1038/ng.3021
  2. Eurogenes Blog, October 2017. https://eurogenes.blogspot.com/2017/10/genetic-ancestry-online-store-to-be.html