Private Alleles Reveal Minor Ancestry
Learning Curve: Interpreting Your Rare Matches
It is widely accepted that STRs are non-coding in nature and are therefore not implicated in gene expression. By the same token, scientists used to think that a single STR value (the number of repeats at one marker) in your autosomal profile could not be correlated with population or ancestry, and that you needed at least four STR alleles to reach meaningful conclusions.
In 2007, researchers found that a remarkable “private allele” was present in about one-third of Native Americans. The value of 9 at the locus D9S1120 occurred in 31.7% of Native American samples.
Soon the race was on to discover more private alleles: single values indicative of population affinity, traits or disease association. In 2020, a team of scientists at the Centre for Forensics in Sydney, Australia discovered that a total of 50 unique traits were associated with 24 standard DNA fingerprint markers mentioned in more than 50 separate studies. TH01 had the greatest number of associations, with 27 traits linked to 40 different genotypes. For instance, a value of 9 repeats at TH01 was linked, albeit to a very small extent, to the trait of extraversion.
Even though all linkages were very slight, they were statistically significant, showing that single alleles characteristic of certain populations could be viewed as meaningful traces of ancestry and familial association.
This November, DNA Consultants will introduce the first private allele test, Primeval DNA Allele Report, priced at $99.
The origins of the new product may be sought in an important discovery made concerning Native American genotypes more than ten years ago. The classic article on private alleles is K.B. Schroeder et al., “A private allele ubiquitous in the Americas,” Biology Letters (2007) 3, 218-223. Schroeder is at UC-Davis where the largest collection of American Indian DNA resides. The team found D9S1120=9 is a strong “private allele,” meaning it is private to Native American (and Western Beringia) populations and not found in most others. Native Americans have it at a frequency of 31.7%. 0% of Africans have it. The genetic strand associated with it has been dubbed the American Modal Haplotype (confusingly, since AMH was previously used for Atlantic Modal Haplotype in Y tests falling in the R1b category). Note D9S1120 is not part of our standard array of loci.
As reported in the literature, different ethnicities get different frequencies of STRs because they inherit their alleles from a pool of the same primeval ancestors. Because they had different ancestors, they are not likely to have the same alleles. For example, the most common value for a Caucasian male at the locus designated as TH01 is 6. I have it in my profile, and so do 23.2% of other Caucasian males like me. That is its modal value. A slight (< 1 percent) susceptibility to malaria is also characteristic of this allele.
The original gene for the TH01 STR a hundred thousand years ago probably had 6 repeats. As mutations took place, the number of repeats increased or decreased. Autosomal DNA mutates randomly at about the same rate as mitochondrial DNA. There is a satisfyingly large amount of variation, captured in our world forensic profile database. Over time, since any value is just as likely to add as subtract a repeat, the patterns remain clustered around the oldest values. The new repeats did not spread through the entire population so they are found at smaller frequencies. They are all, to a certain extent, primeval DNA stories revealing the filaments of the great tapestry of human migrations.
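The clustering argument can be illustrated with a small calculation. The sketch below is only a toy model, not the method behind our database: it assumes a symmetric stepwise mutation process in which each generation a repeat count gains or loses one repeat with equal probability. The mutation rate `mu`, the number of generations and the function name `repeat_distribution` are all hypothetical choices made for illustration.

```python
# Exact repeat-count distribution under a symmetric stepwise mutation model.
# Assumptions (not from the article): a per-generation mutation rate `mu`,
# each mutation adds or removes exactly one repeat with equal probability,
# and every lineage starts from the same ancestral value.
# No lower bound on the repeat count is enforced; with a small `mu` the
# probability of reaching implausible values is negligible.

def repeat_distribution(ancestral=6, mu=0.002, generations=500):
    """Return {repeat_count: probability} after the given number of generations."""
    dist = {ancestral: 1.0}
    for _ in range(generations):
        nxt = {}
        for repeats, p in dist.items():
            # No mutation this generation.
            nxt[repeats] = nxt.get(repeats, 0.0) + p * (1 - mu)
            # One repeat gained or lost, equally likely.
            nxt[repeats + 1] = nxt.get(repeats + 1, 0.0) + p * mu / 2
            nxt[repeats - 1] = nxt.get(repeats - 1, 0.0) + p * mu / 2
        dist = nxt
    return dist


dist = repeat_distribution()
mode = max(dist, key=dist.get)
print("most common repeat count:", mode)
for repeats in range(mode - 3, mode + 4):
    print(repeats, f"{dist[repeats]:.4f}")
```

Under these assumptions the ancestral value stays the most common one, and neighboring values tail off symmetrically on either side, which is the pattern described above.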
Which repeats are most valuable in a DNA profile? Clearly, the repeats that occur at the lowest frequency, because a random person is less likely to carry the same repeat. The chances are about 1 in a billion if we look at a combination of alleles. The lower frequency values are thus stronger evidence in forensic cases.
Let us look at a case study for TH01 variants. Only 0.8% of men have a value of 10 repeats. Compare that to nearly a quarter who carry the most common value of 6. Studies have found that among this group of less than one percent of men, male impulsive violent behavior can be identified at a risk factor of 1 in 10,000, and the crime of rape at a risk of 5 percent. Such a statistic would be important if comparing two suspects in a court of law, one with 6 and the other with 10 repeats at TH01.
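To see why a rare value carries more weight in a combination, here is a rough sketch using the product rule, which multiplies per-locus frequencies on the assumption that the loci are independent. It is a simplification: it uses one allele frequency per locus, whereas real casework works with genotype frequencies covering both alleles. The two TH01 frequencies are the ones quoted in this article; the frequencies for the four other loci are invented purely for illustration.

```python
from math import prod


def combined_frequency(allele_freqs):
    """Multiply per-locus allele frequencies (assumes the loci are independent)."""
    return prod(allele_freqs)


# Hypothetical frequencies for four other loci, invented for illustration only.
other_loci = [0.10, 0.15, 0.20, 0.05]

common = combined_frequency([0.232] + other_loci)  # TH01 = 6, frequency quoted in the article
rare = combined_frequency([0.008] + other_loci)    # TH01 = 10, frequency quoted in the article

print(f"with the common TH01 value: 1 in {1 / common:,.0f}")
print(f"with the rare TH01 value:   1 in {1 / rare:,.0f}")
print(f"ratio between the two: {common / rare:.0f}")
```

Whatever the other loci contribute, swapping the common TH01 value for the rare one shrinks the combined frequency by the ratio of the two allele frequencies, so each rare value makes the whole profile that much more distinctive.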
In the scheme of things, then, the smaller frequencies are more interesting, since they are inherited by fewer people and are thus more distinctive. DNA Consultants’ Primeval DNA Allele Report is the first ancestry test to emphasize the minority side of one’s admixture rather than the majority or consensus results.
All the Rare Genes from History are, effectively, private alleles. The King Tut and other Egyptian Rare Genes are private to Egyptian and other African-descended genetic pools. The distribution maps show all the Egyptian genes are centered on Cairo, the Egyptian heartland for thousands of years. They are not as well demarcated as the American Modal Haplotype (a repeat of 9 at D9S1120), which is present in large frequencies in Native American populations and entirely absent in other ethnic groups (it is not found at all in Africans). But the low frequency alleles in your Primeval DNA Allele Report all have the characteristic of ethnic salience: they run in families and appear to be important, though little-studied, genotypes.
Let us examine in some detail a specific case, a customer who has the biallelic readout of 11, 14 at D5S818. The 14 is rather rare. If we put 14 by itself into our population database, we find that many populations have a result of 0 (shown on the map as blank, non-populated dots). The highest result is 22.9% for India – Mongoloid – Meitei, arrived at by dividing 1 by 2 × RMP (the inverse of RMP multiplied by 2, because there are 2 alleles per locus). The lowest non-zero value was 0.14% for White Kentuckians and the worldwide average was 1.5%, quite a spread.
The frequency of .229 corresponds to the value listed in the data tabs for the population at column CM named India – Mongoloid – Meitei under D5S818=14. This is the frequency reported in the relevant article in the forensic literature.
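The conversion between RMP and frequency described above can be written out as a short sketch. It assumes RMP is stored as the plain number that the formula divides into 1; how the figure is actually stored in the database is not stated here, so the function names and that reading are assumptions.

```python
def frequency_from_rmp(rmp):
    """Allele frequency as 1 / (2 x RMP), the formula given in the text."""
    if rmp <= 0:
        raise ValueError("rmp must be positive")
    return 1 / (2 * rmp)


def rmp_from_frequency(freq):
    """Inverse of the formula above: the RMP that yields a given frequency."""
    if freq <= 0:
        raise ValueError("freq must be positive")
    return 1 / (2 * freq)


# Round trip for the Meitei frequency quoted in the text.
rmp = rmp_from_frequency(0.229)
print(f"RMP implied by a frequency of 0.229: {rmp:.4f}")
print(f"frequency recovered from that RMP:  {frequency_from_rmp(rmp):.3f}")
```

The round trip simply confirms that the two directions of the formula agree; it does not validate the database values themselves.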
Because of its salience, D5=14 could possibly serve as a sort of private allele. If someone had it, it could indicate that the customer had a minor amount of South Asian ancestry, as the top four matches are in India and their frequencies are of a substantially larger order of magnitude. It doesn’t have the strong earmarks of the D9=9 American Modal Haplotype, that is, absence in most other populations, but it does have a definite pattern of distribution.
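One way to put a number on this kind of salience is to compare a population's frequency with a reference frequency. The sketch below uses only the three D5S818=14 figures quoted in this article; the `enrichment` and `is_salient` helpers and the tenfold threshold are hypothetical, chosen for illustration rather than taken from our report logic.

```python
def enrichment(population_freq, reference_freq):
    """How many times more common the allele is than in the reference."""
    if reference_freq <= 0:
        raise ValueError("reference_freq must be positive")
    return population_freq / reference_freq


def is_salient(population_freq, reference_freq, threshold=10.0):
    """Flag an allele as salient when enrichment meets a hypothetical threshold."""
    return enrichment(population_freq, reference_freq) >= threshold


# Frequencies for D5S818 = 14 quoted in the article.
meitei = 0.229      # India - Mongoloid - Meitei
worldwide = 0.015   # worldwide average
kentucky = 0.0014   # White Kentuckians

print(f"Meitei vs worldwide average: {enrichment(meitei, worldwide):.1f}x")
print(f"Meitei vs White Kentuckians: {enrichment(meitei, kentucky):.1f}x")
print("salient against worldwide average:", is_salient(meitei, worldwide))
```

A ratio like this does not prove ancestry by itself; it only makes the "definite pattern of distribution" easier to compare from one allele to the next.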
We might conclude:
Customer has allele of 14 in D5, which is rare in many populations and prominent in India, suggesting a possible trace of Indian or Romani ancestry in the family.
The table below shows some other private alleles besides the AMH mentioned in S. Kanthaswamy et al., “The Enhancement of the Native American CODIS STR Database for Use in Forensic Casework” (July 2019), though not all are part of the standard 15-locus testing array. Does your profile have any of them?
List of STRs Considered to Be Private Alleles for Native American Tribal Affiliation. Source: S. Kanthaswamy et al., “The Enhancement of the Native American CODIS STR Database for Use in Forensic Casework,” National Criminal Justice Reference Service, Office of Justice Programs (2019).
Nicole Wyner, Mark Barash and Dennis McNevin, “Forensic Autosomal Short Tandem Repeats and Their Potential Association with Phenotype,” Frontiers in Genetics 11 (2020) 884, doi: 10.3389/fgene.2020.00884, PMCID: PMC7425049.