Do You Know Your European Origins by Country?
Review of European DNA Testing
By Donald N. Yates
Most people who buy a DNA test want to know what countries in Europe their ancestors came from. But the favored approaches of major companies like 23andMe have so far not yielded entirely satisfactory results, at least to judge from consumer feedback. This review article explores the reasons for this failing and proposes that DNA Consultants’ EURO DNA database based on forensic population data may be a more accurate measure of nationalities in our background than complicated and expensive microarray genotyping.
From the earliest colonial settlements until 1960, over 50 million immigrants settled in what is now the U.S., most of them from Europe. Before 1881, about 86% of the total arrived from northwest Europe, principally England, Wales, Scotland, Ireland, Germany, the Low Countries and Scandinavia. Under the New Immigration that followed between 1894 and 1914, immigrants from southern, central and eastern Europe accounted for 69% of the total. Many of those were Russian, Polish, Lithuanian, Ukrainian, Hungarian, Romanian and Galician Jews.
Despite their strong European roots, most Americans know little about what nationalities contributed to their family tree. Many families single out one country of origin and ignore others. In the 2013 American Community Survey, German Americans (14.6%), Irish Americans (10.5%), English Americans (7.7%) and Italian Americans (5.4%) were the four largest self-reported European ancestry groups in the United States, forming 38.2% of the total population.
And then there are those who report just being “American.” They are often of English, Scottish, Scotch-Irish and/or Welsh ancestry that they cannot trace, an ancestry predominant in the upper South (such as Kentucky and Tennessee). They amounted to nearly 10% in the 2010 Census, and the trend is growing rapidly. Also, according to a Wikipedia article, two-thirds of white Americans have two or more different European nationalities, often four or more, and many “American” respondents may be cases where the person does not think any one ancestry is dominant enough to identify with.
Present-day European countries and major cities (Wikivoyage). Russia east to the Urals and five percent of Turkey’s landmass fall in Europe. The broadly linguistic regions were similar as early as the sixteenth century and have been reaffirmed by DNA studies: British Isles (lilac), Scandinavia (blue-green), Russia (blue), Baltic (light green), Central Europe (green), Balkans (light blue), Greece and Turkey (purple), Caucasus (violet), Italy (orange), Low Countries (yellow), France (brown) and Iberia (rose).
An important article published last year by geneticists at Harvard and 23andMe drew back the veil on Americans’ European ancestry. It was titled “The Genetic Ancestry of African Americans, Latinos, and European Americans across the United States” and appeared in the prestigious American Journal of Human Genetics. The authors found a higher degree of genetic mixing among all groups than previously suspected. “This study sheds light on the fine-scale differences in ancestry within and across the United States and informs our understanding of the relationship between racial and ethnic identities and genetic ancestry,” according to the authors Katarzyna Bryc et al.
According to the 23andMe study, African Americans had about one-quarter European genes (Y chromosome studies had put the figure as high as 30%), and some had significant amounts of American Indian ancestry (Oklahoma blacks led the country). Latinos carry an average of 18% Native American ancestry, 65% European ancestry (mostly from the Iberian Peninsula) and 6% African ancestry (compared to 3.5% for European Americans).
Such fine-scale genetic analysis was made possible by affordable microchip technology involving more than 800,000 SNPs tracked longitudinally through cohort groups. But the analysis did not distinguish between different European ancestries, certainly not on a country-specific scale, and 23andMe’s European results—just as much as Ancestry.com’s or those of other companies using the “genetic strand” approach—have not exactly received a conqueror’s welcome in the ancestry market.
Chronology of European DNA Tests
Foundational to emerging European DNA studies was a 2008 article by Oscar Lao of the Department of Forensic Medicine in Rotterdam and co-authors: “Correlation between Genetic and Geographic Structure in Europe,” Current Biology 18/16: 1241-48. This study found that valid and meaningful genetic populations in Europe were defined by linguistic boundaries, which in turn largely coincide with modern national borders. This thesis makes sense: people throughout history have usually married someone nearby who spoke the same language. The work of the late Martin Lucas of DNA Tribes underscored this bedrock population structure, at least on a regional basis, if not a country-specific one. A burst of studies over the past five years has begun to paint in the genetic histories of various countries, such as England, Ireland and Belgium. Most of these ask for participants with four grandparents of the same local ancestry.
Previous European analyses had been content to match your Y chromosome or mitochondrial type to countries of origin reported by customers. The advantages of autosomal DNA are apparent if one considers that sex-linked tests target only two of your lines (your father’s male line and mother’s female line), whereas if you go back even five generations you have 16 male ancestors and 16 female ancestors (your 3rd great-grandparents). According to uniparental schemes of ancestry I should be 100% English. The diversity and surprising variety come in only if you dig beneath the surface and sift back through the generations.
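The arithmetic behind this comparison is easy to check. The short sketch below is only an illustration: the function name is made up for the example, and it ignores pedigree collapse, where the same ancestor turns up in more than one line. It counts the ancestral lines five generations back and the share of them that a Y-chromosome test plus a mitochondrial test can reach.

```python
def ancestors_at_generation(generations_back: int) -> int:
    """Number of ancestral lines n generations back (2**n), ignoring pedigree collapse."""
    if generations_back < 0:
        raise ValueError("generations_back must be non-negative")
    return 2 ** generations_back


# Five generations back are the 3rd great-grandparents.
total_lines = ancestors_at_generation(5)
male_ancestors = total_lines // 2
female_ancestors = total_lines // 2

# A Y-chromosome test follows one line and a mitochondrial test follows one other.
uniparental_lines = 2
coverage = uniparental_lines / total_lines

print(f"{total_lines} ancestors: {male_ancestors} male, {female_ancestors} female")
print(f"Share of lines reached by sex-linked tests: {coverage:.4f}")
```

In other words, the two sex-linked tests together speak for about 6% of the lines at that depth, and the other 30 go unexamined.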
It is suspected that the results even of “autosomal” (non-sex-linked) testing have not been entirely rid of skewed results and sample biases. Samples often come from medical studies, and genetic research is aimed largely at medicine rather than ancestry, which introduces an unavoidable bias, not to mention the suspicious preponderance of countries like England, Germany and the U.S. to the detriment of the nations of Eastern and Southern Europe. What about a truly autosomal method that completely ignores the gender of the tested person? What about a database of European countries that is equal, comprehensive and unequivocal? What about a method that compares you only to Europeans, not European Americans? In short, what about a good European DNA test plain and simple that gives genealogy enthusiasts what they want?
Just such a product is available for under a hundred dollars with the EURO DNA Ancestry Test from DNA Consultants. It forms part of the company’s atDNA autosomal ancestry database, now in version 7.0, released in late June (N = 9,983). Since 2009, we have worked with Professor Wendell Paulson at Arizona State University, Mathematics Department, to develop a 10-locus STR frequency database for European countries/populations, forming part of our DNA Fingerprint Test. The 10 loci are: D8S1179, D21S11, D3S1358, TH01, D16S539, D2S1338, D19S433, VWA, D18S51 and FGA. On this basis, we have incorporated data for the following 39 populations from publications or online sources:
| Albania/Kosovo (n = 136) | Austria (n = 222) | Belarus (n = 176) | Belgium – Flemish (n = 231) |
| Belgium (n = 206) | Bosnia and Herzegovina (n = 171) | Croatia (n = 200) | Czech Republic (n = 200) |
| Denmark (n = 200) | England/Wales (n = 437) | Estonia (n = 150) | Finland (n = 230) |
| France (n = 208) | France – North (Lille) (n = 200) | France – South (Toulouse) (n = 335) | Germany (n = 662) |
| Greece (n = 208) | Hungary (n = 224) | Ireland (n = 304) | Italy (n = 209) (Replaced Italy n = 103) |
| Lithuania (n = 300) | Macedonia (n = 100) | Montenegro (n = 200) | Netherlands (n = 231) |
| Northern Ireland (n = 207) | Norway (n = 202) | Poland (n = 206) | Portugal (n = 150) |
| Romania (n = 243) | Russia (n = 184) | Scotland – Highlands (Dundee) (n = 228) | Scotland – Lowlands (Glasgow) (n = 494) |
| Serbia (n = 100) | Slovakia (n = 247) | Slovenia (n = 207) | Spain (n = 449) |
| Sweden (n = 424) | Switzerland (n = 402) | Turkey (n = 500) | |
This covers all European countries of significance in genealogy with the exception of Ukraine and Latvia. The former appears in the World Matches part of reports, and while we are unaware of strictly Latvian data commensurate with the European standard, the neighboring countries of Estonia and Lithuania are represented in our current list. Minor countries like Iceland and Malta are not included, though data were available for them. The 39-country basis replaces the earlier 22-country basis limited to ENFSI (mostly European Union members) and goes beyond the partially updated STRBase 2.0.
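As a quick consistency check, the sample sizes in the population list can be added up and compared with the database size quoted for atDNA 7.0 (N = 9,983). The sketch below simply transcribes the n values from the list; the variable names are illustrative.

```python
# Sample sizes (n) transcribed from the population list in this article.
EURO_POPULATIONS = {
    "Albania/Kosovo": 136, "Austria": 222, "Belarus": 176, "Belgium – Flemish": 231,
    "Belgium": 206, "Bosnia and Herzegovina": 171, "Croatia": 200, "Czech Republic": 200,
    "Denmark": 200, "England/Wales": 437, "Estonia": 150, "Finland": 230,
    "France": 208, "France – North (Lille)": 200, "France – South (Toulouse)": 335,
    "Germany": 662, "Greece": 208, "Hungary": 224, "Ireland": 304, "Italy": 209,
    "Lithuania": 300, "Macedonia": 100, "Montenegro": 200, "Netherlands": 231,
    "Northern Ireland": 207, "Norway": 202, "Poland": 206, "Portugal": 150,
    "Romania": 243, "Russia": 184, "Scotland – Highlands (Dundee)": 228,
    "Scotland – Lowlands (Glasgow)": 494, "Serbia": 100, "Slovakia": 247,
    "Slovenia": 207, "Spain": 449, "Sweden": 424, "Switzerland": 402, "Turkey": 500,
}

population_count = len(EURO_POPULATIONS)
total_samples = sum(EURO_POPULATIONS.values())

print(f"{population_count} populations, {total_samples} individuals in total")
```

The count comes to 39 populations and the total to 9,983 individuals, matching the figures given in the text.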
How good is the EURO DNA Test? One customer, Jonah Womack, wrote to us in 2012:
I just wanted to compliment everyone at DNA Consultants. My father had always said our ancestors were from Czechoslovakia, and I was curious enough to put it to the test. Within one week of mailing my sample, I had the answers I was looking for. I was so happy to share the news with my father; the top 3 matches were all from eastern Slovakia. That objective evidence led to him sharing family stories I would have likely never known. All I can say is thank you, and this was money well spent.
With the new version of atDNA 7.0, I naturally raced to input my own DNA profile and check my EURO results. An early analysis with ENFSI (available online since 2004) gave me the following Top Ten results:
The mystery of Finland and Estonia may be explained by the large Native American admixture in my genes: recent research has suggested that Finno-Ugric peoples and Native Americans share a wide degree of deep ancestry in the so-called “ghost populations” of Stone Age northeast Europe or Ancient North Eurasians (ANE).
But I was unaware of any Swiss, Swedish or Danish ancestors and felt dissatisfied with the list.
After improvements and additions, my new EURO results look like this:
| I | Scotland – Highlands (n = 228) |
| II | England/Wales (n = 437) |
| III | Netherlands (n = 231) |
| IV | Finland (n = 230) |
| V | Estonia (n = 150) |
| VI | Belgium – Flemish (n = 231) |
| VII | Scotland – Lowlands (n = 494) |
| VIII | Romania (n = 243) |
| IX | Northern Ireland (n = 207) |
| X | Portugal (n = 150) |
The listing continues with Italy, Czech Republic and Germany. The average frequency falls between #30 France and #31 Denmark. With most countries roughly on a par with each other and only a few extreme outliers, the picture suggests that my European origins are a lot more diverse than the Top Ten would indicate. The countries below average frequency were Denmark (n = 200), Croatia (n = 200), Russia (n = 184), Belgium (n = 206), Belarus (n = 176), Austria (n = 222), Bosnia and Herzegovina (n = 171), Macedonia (n = 100) and Lithuania (n = 300). On the face of it, I was less likely to have ancestry in any of these countries, and sure enough, I was not aware of any from my genealogical research. Statistically, I am ten times more likely to have Scottish, English or Dutch ancestry than Macedonian, Bosnian/Herzegovinian or Lithuanian.
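This article does not spell out how the ranking is computed. One common way of using forensic STR frequency tables, assumed here purely for illustration, is to estimate how frequent a person's genotype would be in each population under Hardy-Weinberg proportions (p² for a homozygote, 2pq for a heterozygote), multiply across loci, and sort the populations by the result. The population names, allele frequencies, profile and frequency floor below are all invented for the sketch; none of them come from atDNA, ENFSI or any published table.

```python
from math import prod

# INVENTED allele frequencies for two loci; illustration only, not real data.
HYPOTHETICAL_DB = {
    "Population A": {"TH01": {"6": 0.20, "9.3": 0.30}, "FGA": {"21": 0.20}},
    "Population B": {"TH01": {"6": 0.10, "9.3": 0.20}, "FGA": {"21": 0.10}},
    "Population C": {"TH01": {"6": 0.25, "9.3": 0.10}, "FGA": {"21": 0.15}},
}

# A made-up two-locus profile: heterozygous at TH01, homozygous at FGA.
PROFILE = {"TH01": ("6", "9.3"), "FGA": ("21", "21")}

MIN_FREQ = 0.01  # assumed floor for alleles absent from a population's table


def genotype_frequency(allele_freqs, a, b):
    """Hardy-Weinberg genotype frequency: p*p if homozygous, 2*p*q otherwise."""
    p = allele_freqs.get(a, MIN_FREQ)
    q = allele_freqs.get(b, MIN_FREQ)
    return p * p if a == b else 2 * p * q


def profile_frequency(pop_freqs, profile):
    """Multiply the per-locus genotype frequencies (assumes independent loci)."""
    return prod(
        genotype_frequency(pop_freqs[locus], a, b)
        for locus, (a, b) in profile.items()
    )


def rank_populations(db, profile):
    """Return (population, frequency) pairs, most likely population first."""
    scores = {pop: profile_frequency(freqs, profile) for pop, freqs in db.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = rank_populations(HYPOTHETICAL_DB, PROFILE)
top_name, top_score = ranking[0]
bottom_name, bottom_score = ranking[-1]
likelihood_ratio = top_score / bottom_score

for rank, (name, score) in enumerate(ranking, start=1):
    print(rank, name, f"{score:.6f}")
print(f"{top_name} vs {bottom_name}: ratio {likelihood_ratio:.1f}")
```

The ratio on the last line is the kind of figure behind a statement such as “ten times more likely.” A real comparison would use all ten loci and published frequency tables for each population.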
DNA Analysis Checked by Surname
I next wanted to see how the top countries tallied with a surname count. Both parents had English surnames (Cooper and Yates), and this seemed to be reflected in the prominent position of England/Wales, while a Scottish grandmother (McDonald) and Dutch grandmother (Goble) seemed to justify Highlands Scotland and the Netherlands. We have already explained Finland. But what about the other countries?
Looking at the surname origins of my thirty-two 3rd-great-grandparents, I obtained the following statistics:
- 34% Scottish (Mitchell, McDonald, Johnson, Kitchens, Mason, Forester, Pickard, Proctor, Lackey)
- 25% English/Welsh (Barnes, Yates, Thomas, Goodson, Kimbrell, Cooper, Blevins, Wooten)
- 13% Dutch (Hooten, Goble, Shankles)
- 9% Irish (Ellard, Denney)
- 6% German (Graben, Redwine)
- 6% Portuguese/Jewish (Storer, Bondurant)
- 3% Hungarian (Sizemore)
An effective 3%, my 3rd-great grandmother Yates, who was a Creek Indian, had no surname. So that accounts for all strains and fits well with the new EURO results. The top three ancestries both in terms of autosomal DNA frequency and my Ahnentafel were Highlands Scottish, English/Welsh and Dutch. These were the most familiar ethnic origins mentioned in family stories and traditions.
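The percentages in the surname list can be reconciled with a head count of thirty-two ancestors. The counts below are inferred from the rounded percentages, not stated in the article (the surnames shown are presumably distinct names, some carried by more than one ancestor), so treat them as an assumption.

```python
import math

TOTAL_ANCESTORS = 32  # 3rd-great-grandparents

# Head counts INFERRED from the rounded percentages; an assumption, not source data.
inferred_counts = {
    "Scottish": 11,
    "English/Welsh": 8,
    "Dutch": 4,
    "Irish": 3,
    "German": 2,
    "Portuguese/Jewish": 2,
    "Hungarian": 1,
    "Creek (no surname)": 1,
}


def round_half_up(value):
    """Round to the nearest integer with .5 going up (built-in round() goes to even)."""
    return math.floor(value + 0.5)


percentages = {
    origin: round_half_up(100 * count / TOTAL_ANCESTORS)
    for origin, count in inferred_counts.items()
}

for origin, pct in percentages.items():
    print(f"{pct:>3}%  {origin}")
print("Ancestors accounted for:", sum(inferred_counts.values()))
```

Under these counts every one of the thirty-two lines is accounted for, and the rounded percentages total 99 rather than 100 only because of rounding.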
Autosomal Population Analysis versus Genetic Strands
Let us compare these EURO results to 23andMe’s tabulation, expressed as percentages instead of a country breakdown ranked by likelihood. First of all, 23andMe has me as 99.2% European, with only 0.4% East Asian and Native American, in contradiction to the 8-25% Native American found in other tests from companies employing a percentage score. Of the 99.2% European, 46.7% is British and Irish, in agreement with my highest-ranked countries according to atDNA (nos. 1 and 7 Scotland, 2 England/Wales, and 9 and 16 Northern Ireland and Ireland). 40.1% is “broadly Northern European.” Minor amounts are “broadly Southern European” (0.3%) and “broadly European” (2.8%), while <0.1% is “unassigned.” Of the Northern European, there is 5.3% French and German and 4.0% Scandinavian.
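The 23andMe figures quoted here do add up once the French and German and Scandinavian portions are counted alongside the other categories. A small check, using only the percentages given in the paragraph above (the grouping into a Northern European subtotal is my reading of it):

```python
from decimal import Decimal

# Percentages as quoted in the text; Decimal avoids binary floating-point drift.
northern_european = {
    "British and Irish": Decimal("46.7"),
    "French and German": Decimal("5.3"),
    "Scandinavian": Decimal("4.0"),
    "Broadly Northern European": Decimal("40.1"),
}
other_european = {
    "Broadly Southern European": Decimal("0.3"),
    "Broadly European": Decimal("2.8"),
}

northern_total = sum(northern_european.values())
european_total = northern_total + sum(other_european.values())

print("Northern European subtotal:", northern_total)
print("European total:", european_total)
```

The total comes to the reported 99.2%, with the “unassigned” sliver of less than 0.1% lost in rounding.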
There is an air of scientific certitude about 23andMe’s EURO analysis. The listing of ancestry composition appears comprehensive and exhaustive. It adds up. But it is important to point out that the categories are regional, not country-specific. The only countries mentioned are France and Germany, which are not distinguished but lumped together—a choice that would create consternation in most Frenchmen and Germans. There are obvious flaws and limitations in their data and its interpretation.
One limitation is the special inclusion of “Ashkenazi” (of which I am said to have 0.0%) without a mention of “Sephardic,” historically the more numerous branch of Judaism. The DNA Fingerprint has discrete data for four Jewish populations in the World Populations (Israeli Sephardim, Hungarian Ashkenazi Jews, Chuetas, Majorca), as well as four ethnic markers, one of which is strong in Ashkenazi Jews and another in Sephardic Jews.
The 23andMe approach could be called the omnium-gatherum method, with numerous blind spots. It is not, strictly speaking, evenly valid or consistent, and it leaves a good deal to be desired in reliability, too. Throughout history, Jews have converted or hidden their ancestry. We cannot expect them to come pouring out in the 21st century to self-identify for DNA surveys even if they retain knowledge of their Jewish past. Yes, perhaps some Ashkenazi Jews will sign up for the program and so identify, but one wonders about a medical motive and bias.
Unsurprisingly, Ancestry.com produced similar results for me: 99% European, 0% Native American, with 61% coming from “Great Britain,” 15% Ireland and 0% “European Jewish” (apparently equivalent to 23andMe’s Ashkenazi). Presumably, Ireland comprehends only the country by that name, with Northern Ireland counted under “Great Britain” as part of the United Kingdom, although I have no knowledge of that much Irish in my family tree and Ireland ranks only 16th in my DNA Consultants results. Both Ancestry and 23andMe use high-throughput genotyping microarrays from Illumina, involving as many as 800,000 SNPs.
The Illumina HumanOmniExpress BeadChip platform is also used in Family Tree DNA’s Family Finder autosomal DNA testing service (which I have not taken). A good description of the microarray process for genotyping technology can be found on a page at 23andMe, with a link to further information on the Illumina website.
In sum, next-generation genotyping technology seems to be accurate enough in assessing the broad picture of your European ancestry, but it is incapable of giving you a country breakdown. Only DNA Consultants’ EURO test, part of its DNA Fingerprint Plus ($279) and available separately for as little as $99, can list and rank the countries of Europe where your ancestors likely originated. It does this not on the basis of genome-wide assessment of hundreds of thousands of SNPs but by comparing your DNA profile to the scores of nearly 10,000 Europeans identified according to 39 actual country and regional population names, from Albania to Turkey.
My EURO results matched amazingly well with what I knew from extensive genealogy research about my European forebears, beginning with all the English and Scottish lines right down to minor lines from Portugal and Hungary. With its “false Finnish” match it also indirectly confirmed the Native American ancestry that was evident in abundance in my world matches. Now if I could only find the elusive Romanians (no. 8) in my tree . . . .