Sunday 8 February 2015

Genetic genealogy needs horizontal pedigree charts

Making the most of your autosomal DNA ancestry test requires understanding some simple odds and finding a good way of visualizing how genetic match connections work.

The trick is to build a picture that fits in your brain and doesn't leave you feeling overwhelmed by a morass of potentially connecting pathways. I've got one and I'll share it with you below in the hopes that it works for you too.

The most basic, probably universal, chart for "family" looks something like this:

When visualizing "ancestry", a common approach builds on the standard family chart by adding to it vertically. This is the vertical pedigree chart, which looks something like this:

You may recognize that as the structure used by Ancestry, FamilyTreeDNA and others for tree display. The tendency for genealogy and genetic testing companies to use the vertical pedigree visualization is a damn shame.

I think it is the major limit on efficiently identifying the Most Recent Common Ancestor (MRCA) between genetic matches. You'll see why in a moment.

The alternative ancestry charting method is the horizontal pedigree chart:


Notice how:
  1. this is a much more space-efficient chart that is easy to display on a computer screen, (it's basically a table) and
  2. each column is a nice, easy to read list of all the ancestors belonging to each ancestry level in your tree.
GEDMATCH, to its credit, uses a horizontal pedigree chart, although it's not space efficient (it does not list many generations). Why am I going on about space efficiency and the benefits of listing names per generation?

Odds, that's why.

When you receive your autosomal test results, you typically get a list of 700-1000 other testers who share at least one DNA segment with you. Looking at your list of matches and the estimated relationship to each one (provided by the testing company), you'll notice that you have a handful of relatively close matches but the vast bulk of your matches, say 995 of your 1000, will be more distant than that.

Pretend, for a moment, that all the connecting relationships for the 1000 matches were already known. The average relationship across the group would probably be something like 5th or 6th cousins. So, what do you need to know in order to identify the Most Recent Common Ancestor (MRCA) between you and the vast majority of your matches -- all these people, who are, on average, your 5th cousins?

Odds are, you need to know the fourth-great-grandparents of each tester.

If you have two full fifth cousins and you take a list of the 64 fourth-great-grandparents for each, two names (one couple) will appear on both lists.
(simulated tree)
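
For readers comfortable with a little scripting, the list comparison can be sketched in Python. This is only an illustration under stated assumptions, not a tool any testing company provides: each tree is assumed to be a dictionary keyed by ahnentafel number (1 = tester, 2n = father of n, 2n+1 = mother of n), the function names and the people are invented for the example, and exact string matching stands in for the judgment needed to reconcile real-world spellings.

```python
def generation_slice(tree, generations_back):
    """Names whose ahnentafel numbers fall in one generation.

    Generation g occupies numbers 2**g through 2**(g + 1) - 1, so the
    fourth-great-grandparents (6 generations back) are numbers 64 to 127.
    """
    first = 2 ** generations_back
    return {name for num, name in tree.items() if first <= num < 2 * first}


def shared_ancestors(tree_a, tree_b, generations_back):
    """Names that appear at the same generation in both trees."""
    return generation_slice(tree_a, generations_back) & generation_slice(
        tree_b, generations_back
    )


def synthetic_tree(label, generations_back=6):
    """Hypothetical tree with a unique placeholder name in every slot."""
    first = 2 ** generations_back
    return {n: f"{label}-ancestor-{n}" for n in range(first, 2 * first)}


tree_a = synthetic_tree("A")
tree_b = synthetic_tree("B")

# Plant one shared couple; a husband and wife sit on adjacent numbers.
tree_a[70], tree_a[71] = "John Example", "Mary Sample"
tree_b[100], tree_b[101] = "John Example", "Mary Sample"

common = shared_ancestors(tree_a, tree_b, 6)
print(sorted(common))
```

The shared couple sits at different positions in the two trees (slots 70/71 versus 100/101), which is why comparing whole generation lists works where comparing position by position would not.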

So, in order to effectively use your test results 99.5% of the time, you need to have lists of fourth, fifth, and sixth great-grandparents to compare. Unfortunately, none of the testing companies provide an easy way of doing this*.

None provide single view horizontal pedigrees to the fourth-great-grandparent level (or beyond). Instead, the tree structures they provide for testers to add information to are difficult to access and use.

I estimate that 90% of the completed, already researched, genealogies in the testing pool are not available by clicking on a match's name. This is a massively wasted opportunity.

As this charting method shows, in terms of odds, most matches will resolve through a shared person or couple in the list of your 64, 128, or 256 "lines" (i.e. the 4th, 5th, or 6th great grandparent level of your tree -- the farther you complete your tree, the more known lines you have and the more information you have available to figure out how you relate to someone). Most people have no trouble understanding they have a maternal and paternal side, but the exponential expansion of lines to the level of their fourth-great-grandparents is not yet part of how they see the process. Unless everyone is provided with a horizontal pedigree chart to complete to the relevant levels, efforts to identify MRCAs quickly stall.
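
The exponential expansion is easy to check with a few lines of Python. The function names here are my own, chosen for the illustration; "great degree" means the number in front of "great-grandparent" (so 0 is the grandparents).

```python
def lines_at_level(great_degree):
    """Number of ancestors at the Nth-great-grandparent level.

    Grandparents (degree 0) number 4, and each level back doubles that.
    """
    return 2 ** (great_degree + 2)


def total_ancestors_through(great_degree):
    """Everyone from the parents out to the given level, inclusive."""
    return 2 ** (great_degree + 3) - 2


for degree in (4, 5, 6):
    print(degree, lines_at_level(degree), total_ancestors_through(degree))
```

A chart completed to the fourth-great-grandparent level therefore has 126 name slots in total, half of which sit in the last column alone.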

It gets trickier to identify the connecting relationship if fewer names are known (on either tree), but the same principle applies: use the testing company's relationship estimate to pick the level of your tree and your match's tree that should contain an overlapping couple or person (half relationships can be considered by going out one level farther than the estimate predicts). If you can't find a match, look at any missing areas on either side and consider whether the DNA and the combined information from both of you provide a clue about who the missing people could be.
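
That rule of thumb, and the hunt for missing areas, can be sketched as follows. This is a rough sketch only: it encodes the heuristic exactly as described in this post (the estimated level, plus one level farther for half relationships), the function names are hypothetical, and the tree is again assumed to be a dictionary keyed by ahnentafel number.

```python
def levels_to_check(estimated_cousin_degree, include_half=True):
    """Great-grandparent degrees where an overlap is expected.

    An Nth cousin estimate points at the (N-1)th-great-grandparents;
    half relationships are covered by also checking one level farther.
    """
    if estimated_cousin_degree < 1:
        raise ValueError("cousin degree must be 1 or more")
    base = estimated_cousin_degree - 1
    return [base, base + 1] if include_half else [base]


def missing_slots(tree, great_degree):
    """Ahnentafel numbers at a level that have no name filled in yet."""
    first = 2 ** (great_degree + 2)
    return [n for n in range(first, 2 * first) if n not in tree]


# Hypothetical, mostly empty tree: only one fourth-great-grandparent known.
my_tree = {64: "John Example"}

for degree in levels_to_check(5):
    gaps = missing_slots(my_tree, degree)
    print(f"{degree}x-great-grandparents: {len(gaps)} slots still unknown")
```

The list of unknown slots is the "awareness of what is missing": if no overlap turns up, those numbered gaps are where the shared ancestor may be hiding.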

This is how genetic genealogy can break through brick walls.

A seven- to nine-generation horizontal pedigree model provides a way of easily working with a complex situation. For full fifth cousin matches there are 32 potential pathways on your side and 32 potential pathways on your match's side (because the two sides of the final complete path between you and the match will connect at a couple). While this means that there are over one thousand potential pathways to investigate (32 x 32 = 1,024, odds that can seem overwhelming), checking two reasonably complete lists of 32 pairs of fourth-great-grandparents to find a common pair is not that hard.
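
To make the arithmetic concrete, here is a small Python sketch contrasting the two ways of looking at the problem. The couples are invented placeholders and the function names are mine; the point is only that testing every pathway pairing means 1,024 checks, while comparing two lists of 32 couples finds the same answer in one pass.

```python
from itertools import product


def common_couples_brute_force(mine, theirs):
    """Test every pathway pairing and count how many were checked."""
    checked = 0
    found = []
    for a, b in product(mine, theirs):
        checked += 1
        if a == b:
            found.append(a)
    return found, checked


def common_couples(mine, theirs):
    """Compare the two lists directly, as you would by eye on two charts."""
    return sorted(set(mine) & set(theirs))


# 32 hypothetical fourth-great-grandparent couples per tester, one shared.
shared = ("John Example", "Mary Sample")
mine = [(f"A-husband-{i}", f"A-wife-{i}") for i in range(31)] + [shared]
theirs = [shared] + [(f"B-husband-{i}", f"B-wife-{i}") for i in range(31)]

found, checked = common_couples_brute_force(mine, theirs)
print(checked, found)
print(common_couples(mine, theirs))
```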

So, in summary: to succeed at genetic genealogy you need to have a model of your tree and your matches' trees that allows you to easily identify the overlapping ancestors, namely shared fourth-, fifth-, and sixth-great-grandparents. Horizontal pedigree charts which run at least to the fourth-great-grandparent level allow you to do that efficiently and with an awareness of what is missing. Other methods are not as easy or effective.

***

A second reason why genetic genealogy needs horizontal pedigree charts is substantially more obvious than the one outlined above: they can provide a spatial representation of ancestry composition. Testing companies that provide ancestry composition estimates do not provide a charting tool that reveals regional contributions to the tester's DNA, but the horizontal pedigree chart can easily do this as well:
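
The averaging behind a chart like this can be sketched in Python. It is a simplified illustration only: it assumes each ancestor in a given generation contributes an equal share on average (real inheritance varies from person to person), the region labels are invented, and the function name is my own.

```python
from collections import defaultdict
from fractions import Fraction


def expected_composition(regions_by_ahnentafel, generations_back):
    """Average expected share of DNA per region at one generation.

    Each of the 2**g ancestors at generation g counts for 1 / 2**g;
    slots with no entry are tallied as "unknown".
    """
    first = 2 ** generations_back
    share = Fraction(1, first)
    totals = defaultdict(Fraction)
    for n in range(first, 2 * first):
        totals[regions_by_ahnentafel.get(n, "unknown")] += share
    return dict(totals)


# Hypothetical grandparents (ahnentafel 4 to 7); one of them is unknown.
grandparents = {4: "Scotland", 5: "Scotland", 6: "Ireland"}
composition = expected_composition(grandparents, 2)

for region, fraction in composition.items():
    print(region, fraction)
```

Colour-coding the chart by birth place shows the same thing visually, with the width of each coloured band standing in for the fraction.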











And finally, completing such a chart would give testers something to do during the long wait between sending off the kit and getting their results.

Updated: Template - this is an Excel file I use (it is bigger than the above and set up to print on 11 x 17 at a copy shop). It is also expandable -- you can copy the table into a new worksheet and then each person in the last column becomes the base person of their own table, assigning them the ahnentafel number next to their name.
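
If you do expand the template this way, the numbering in the new worksheet can be worked out with a short calculation. The Python below is a sketch of that arithmetic, not something built into the Excel file; the function name is hypothetical and it assumes standard ahnentafel numbering (father = 2n, mother = 2n+1).

```python
def global_ahnentafel(base_number, local_number):
    """Convert a position in an expansion table to the overall number.

    base_number  -- ahnentafel number of the person heading the new table
    local_number -- that table's own numbering, where the base person is 1
    """
    if base_number < 1 or local_number < 1:
        raise ValueError("ahnentafel numbers start at 1")
    depth = local_number.bit_length() - 1  # generations above the base person
    return base_number * 2 ** depth + (local_number - 2 ** depth)


# Person 70 from the last column of the main chart heads a new table.
print(global_ahnentafel(70, 1))  # the base person
print(global_ahnentafel(70, 2))  # their father
print(global_ahnentafel(70, 5))  # their father's mother
```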

*Note for clarity: Apparently AncestryDNA does have a pedigree view option (I am not sure how many generations it shows on one screen). As a Canadian, I had used AncestryDNA for haplotype testing many years ago and those accounts, deleted by the company last year, did not have a pedigree tree view (or trees, if I remember correctly). Apparently those who can order the autosomal testing (US, Ireland) do have access to this.

Updated 2015-02-09 with template (see bottom). 2015-02-10 template link updated and switched to viewable sharing as someone is editing the template with their own information. Please let me know in the comments if the viewable template cannot be downloaded, thx.

25 comments:

  1. Excellent post. I've been utilizing charts in spreadsheets for some time trying to break down my brick walls.

  2. AncestryDNA does display a seven-generation horizontal tree, and you can switch to pedigree view in all trees. It's not nearly as compact as your example, but at least it allows vertical scrolling.

    Another thing I particularly like about your example is the intuitive way you can visualize the amount of each ancestor's DNA contribution (on the average, of course).

    Replies
    1. should probably mention I am Canadian, so I can access Ancestry, but not AncestryDNA

  3. I'd love to get a chart like that. Where can I get one?

    Replies
    1. This comment has been removed by the author.

    2. This comment has been removed by the author.

    3. hi, had to replace the file with a non-editable version as someone is editing it with their own info, this is viewable and, I hope, downloadable: https://onedrive.live.com/redir?resid=27E5620F4E143C7D!524&authkey=!ABc6bp6EP3leJM0&ithint=file%2cxlsx

  4. Wow! Thank you so much. I laid my Dad out like this today. It is helping me focus into this project much easier. Yes, the computer programs do manage to do this somewhat, but this is very compact and usable. Thank you.

  5. Thanks for this! I think you were reading my mind! I just hadn't taken the time to sit down and set up the template. I used the template tonight to fill in my tree, and colorized the name lines, making it easier to follow back. I love the results! I think this will make my life a whole lot easier!

  6. Okay I missed your spreadsheet. Thank you very much for that!!! Great tool!

  7. I created a similar color-coded chart as yours for ancestry composition, but instead for birth county, as most of the branches of my tree are from distinct geographical locations in the UK (see www.genealogyjunkie.net/pedigree-charts.html). Because of this, I find geographical location to be more productive than surnames when trying to make a genealogical connection with a DNA match, unless the name is more unusual. I can then give the URL to my matches, so they know which branches of my Ancestry tree to focus on when looking for common ancestors. But I wish my matches would provide me with a similar color-coded chart, as going through locations is slow and laborious when looking at a tree you have no familiarity with.

    But looking at my own tree now, I see I need to update it, as I made a few minor breakthroughs. Another idea that came to mind is a separate color-coded chart for those ancestors who have been confirmed with DNA – like a visual "DNA score card".

    I had started to create an Excel spreadsheet similar to yours (the chart in the URL above is in Word), but having the cells in the pedigree chart sheet auto-populate from a list of names by Ahnentafel number on a separate sheet (and other columns could be added for locations) – but I got waylaid and it is now not near the top of my To Do list!

    Replies
    1. hi Sue, checked out your link and not surprised that you have also found this structure useful (I played around with several different versions and it really does reduce to a standard simply by what you can fit on a screen) and that you also used color coding. In the actual charts I've done (the pictures here are modifications I made to simplify and get it to show up as a good image on the blog) I coded by birth country for the same reason - to quickly communicate potential branch intersections. The other things I've done include adding color coded text for military activity and including haplotype information, which is what I am covering in the next blog and the whole reason I put this post up this week -- to introduce the 7-9 g horizontal pedigree structure. Best of luck to you!

  8. Is there a way to load a GEDcom into this or do we need to enter it manually? Just trying to find a way to save some time. Thank you, this is an excellent tool!

    Replies
    1. Unfortunately I am not a programmer and don't have an app for loading it from a gedcom -- I could visualize kicking data out in CSV format from a database into the column structure but not sure how it would handle all the merging, centering, etc. It could work if the gedcom had assigned ahnentafel numbers for each person relative to the base person and the sheet was programmed with fields. However, what we really, really, need is for all the genealogy and autosomal testing companies to place the chart structure on the user profile page and prompt testers to complete them...

    2. I have been playing with treeseek.com. It is set up for FamilySearch but will also let you upload a gedcom; maybe someone like them could do something like this. I use the 9 generation fan... also great to see what I still need to fill in.

    3. I have also used, and can recommend the TreeSeek fans for those looking for a fan chart. Super easy to upload and generate a file and they print perfectly on 11 x 17 at a copy shop as well.

  9. This comment has been removed by the author.

  10. Text format, editable in any editor and also more scalable (no need to expand the whole binary tree, even few paths/lines are sufficient):

    http://siberean.livejournal.com/14874.html

    Replies
    1. I have used text trees like that in the (extremely character limited) text block on the 23andme profile page and they were not well understood by newbie genealogists. The goal is not to invent a new chart (which I don't think this really is, although I will say my dad finds the excel file print-outs very readable in his old age) -- there are plenty of great charts for different purposes -- but to optimize the (input and) tree display interface that will enable genetic matches to figure out MRCAs. I do think that the image above is basically the way the testing companies should go to help customers achieve that goal, which is part of the service they are selling that is not working as well as it could at the moment.

  11. This comment has been removed by the author.

  12. I wonder if you've seen the Condensed Pedigree Chart generated by Second Site. Here is mine, for example: http://doriswheeler.org/ui262.htm. I can add flags and accents at will, a feature I love. The only thing missing is the ability to show locations, and I've asked John Cardinal to add that in a future upgrade.

  13. This comment has been removed by the author.
