Thomas Evans and Laurence Ettwiller of New England Biolabs don’t hesitate to answer that question in their recent paper: it’s right in the title, “DNA damage is a pervasive cause of sequencing errors, directly confounding variant identification”. The word “variant” appears 88 times in the paper, and some phrases, like “Variants originating from real in-vivo variants”, are hard to decipher because the word is used to mean at least three different things:

  1. a germline variant is a position in an individual’s germline DNA that is different from the reference genome sequence
  2. a somatic variant is a position in a somatic cell’s DNA that is different from that individual’s germline sequence
  3. a sequence read variant is a position in a specific DNA sequencing read that is different from the reference genome sequence, which I’ll call an observed non-reference allele

Much of the interest in this paper that I’ve seen on Twitter (and my own spit-take when I saw it as a preprint just after submitting a low-coverage sequencing paper) arises from its potential effects on variants in those first two senses, which are the foundation of human genetics and cancer genetics, respectively.