
11 Oct 2018
Abstract
Consumer genomics databases have reached the scale of millions of individuals. Recently, law enforcement authorities have exploited some of these databases to identify suspects via distant familial relatives. Using genomic data of 1.28 million individuals tested with consumer genomics, we investigated the power of this technique. We project that about 60% of the searches for individuals of European descent will result in a third cousin or closer match, which can allow their identification using demographic identifiers. Moreover, the technique could implicate nearly any US individual of European descent in the near future. We demonstrate that the technique can also identify research participants of a public sequencing project. Based on these results, we propose a potential mitigation strategy and discuss policy implications for human subject research.
[Image]
Fig. 2 Tracing a person of interest from a distant match using demographic identifiers.
(A) The possible relatives of a match (green) in a database. Each square represents a potential degree of relatedness. The range corresponds to the 5th to 95th percentile of shared IBD in cM from ref. (16). Red: relatives that could fit a bona fide 3C match (~100 cM). The average number of relatives is denoted in the top-left corner of each square, based on a fertility rate of 2.5 children per couple. Nie/Nep: Niece/Nephew; G2: Great-great; G3: Great-great-great; A/U: Aunt/Uncle.
(B) An example of the geographical dispersion of 3rd cousins or 2nd cousins once removed around the matched relative. Every circle denotes 100 km.
(C and D) The distribution of the expected age differences between matches and their potential relatives with a genetic distance of third cousins. The main text reports a conservative scenario, in which the age estimator of the target is in the highest bin of each histogram (red arrow). The age distribution is shown at (C) 10-year resolution and (D) 1-year resolution.
(E) The entire pipeline of using demographic identifiers along with a long-range familial match to identify a US person (blue: average number of people).
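As a rough illustration of how a fixed fertility rate translates into an average number of relatives, as in panel (A), the following minimal sketch counts expected full cousins of a given degree. It assumes a deliberately simple model that is not taken from the paper: every couple has exactly the average number of children, generations do not overlap, and there is no inbreeding or half-sibship. The function name `expected_cousins` and its parameters are hypothetical, and the values need not match those printed in the figure.

```python
def expected_cousins(degree: int, fertility: float = 2.5) -> float:
    """Expected number of full cousins of a given degree (1 = first cousins).

    Simplifying assumptions (illustrative only): every couple has
    `fertility` children, generations are discrete, and no two
    ancestral lines overlap.
    """
    if degree < 1:
        raise ValueError("degree must be >= 1")
    # Ancestral couples located `degree + 1` generations above the focal
    # person (e.g. 2 grandparent couples for first cousins).
    ancestral_couples = 2 ** degree
    # Children of each such couple other than the focal person's ancestor.
    collateral_children = fertility - 1
    # Each collateral child founds a line running `degree` generations down.
    descendants_per_line = fertility ** degree
    return ancestral_couples * collateral_children * descendants_per_line


if __name__ == "__main__":
    for d in (1, 2, 3):
        print(d, expected_cousins(d))
    # 1 7.5
    # 2 37.5
    # 3 187.5
```

Under these assumptions the count grows by a factor of twice the fertility rate with each additional degree, which is why distant cousins vastly outnumber close ones.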