Cybergenetics is a Pittsburgh-based bioinformation company that uses advanced mathematics to translate DNA data into useful information. Their flagship TrueAllele® technology provides accurate and objective DNA match statistics, allowing forensic teams to resolve complex forensic evidence. In this interview, Cybergenetics Chief Scientific Officer Dr. Mark W. Perlin discusses the problem with mixed DNA data, and explains how it can be resolved with TrueAllele®.
Please describe the story behind the company: What sparked the idea, and how has it evolved so far?
Cybergenetics started out working on the Human Genome Project, 25 years ago. We had several technologies, and one of them was to automate the interpretation of Short Tandem Repeat (STR) data. There are about 100,000 short tandem repeats scattered throughout the human genome. They have a lot of variation, useful for genetic testing and gene discovery. To resolve their PCR stutter artifacts, we used computers to automatically transform the STR data into genotypes. That’s what led me to leave Carnegie Mellon 25 years ago and form Cybergenetics.
A few years later, in 1998, we were contacted by the British government. At the time, they were the pioneers in forensic DNA, creating the methods, kits, protocols, and databases. But they got ahead of themselves. While they had automated the production of data, this created an interpretation backlog of 350,000 offender samples from cheek swabs. They needed to get these cheek swab genotypes onto their DNA database. They had a building in Birmingham with 100 people, working three shifts a day, trying to manually review the data in duplicate or triplicate. Even then, they had a genotyping error rate of one in 2000.
The question was, could the British Forensic Science Service automate DNA interpretation the same way they had automated the lab? The FSS contacted Cybergenetics. We adapted our TrueAllele® interpretation technology from genetics to forensics for simple cheek swabs. We eliminated their DNA backlog. But then the FSS had this other problem that came from casework on crime scenes – the DNA mixture problem.
Most DNA evidence is a mixture of two or more people. There may be one or two major contributors with 80 or 90% of the DNA, but there may be other minor contributors, the third or fourth person. In one case we later worked on, the mixture had seven people’s DNA on one handgun.
I thought about that mixture problem 20 years ago during a trip to England. Using math that was similar to what I had done with the stutter problem, I figured out how to separate the mixed data. The computer unmixed the genotypes into each of their constituent contributors, and then compared them.
The software evolved, and by the year 2000, we had a method that could separate these unsolvable mixtures into the genotypes of each contributor. And then compare the genotypes to get accurate match statistics.
Around 2008, Cybergenetics had its first case. A Pittsburgh area dentist, who was estranged from his wife, had a sudden intruder break into his house and slash him to death. He died on the floor, splattering blood all over the unsigned divorce papers. Suspicion fell on his estranged wife’s live-in boyfriend, who happened to be a state trooper. The suspect had a little scratch over his eyebrow, and the knife he usually played with had gone missing.
The DNA collected from under the victim’s fingernails was a mixture of two people: his own DNA, and 7% of some unknown person’s DNA. The FBI produced excellent STR data from the biological sample. Using the limited interpretation methods they had, which involved discarding and simplifying data, they reached a match statistic of 13,000. But there are 16 million people in Pennsylvania, so that was not a basis for prosecuting a major case.
The Pennsylvania Attorney General’s Office sent Cybergenetics the FBI’s DNA data. We found a more accurate match statistic of 189 billion. Because our TrueAllele software used the high peaks and the low peaks, we were able to accurately separate the genotypes to make that DNA comparison.
We had to show through scientific studies that TrueAllele is reliable in order for it to be accepted in court. We brought in other DNA mixture methods to show that as the power of data interpretation increases, you get more information. In the end, the judge and jury accepted the science. The trooper was convicted and sentenced to life in prison. And the state appeals court in Pennsylvania later declared that TrueAllele is reliable science.
What are the current challenges that forensic teams are struggling with, and how does your technology help?
The main challenge with interpreting DNA is the limitations of the human mind. When people see data, it often doesn’t exactly follow what they imagine STR data is supposed to look like. They follow lab protocols where they throw out the DNA evidence because they can’t interpret the data. As a result, millions of items of evidence have gone uninterpreted or under-interpreted. There are so many cases I can tell you about.
Here’s an interesting case that Cybergenetics worked on about six years ago. A police officer is watching two cars shooting at each other. They start driving towards him without even knowing that he’s there and they start shooting at him. They end up finding these people, they find the cars and there’s a handgun with DNA on it. It’s a four-person mixture, and the crime lab says it’s uninterpretable. So what do you do in court? There’s DNA evidence, but it’s been thrown out by the crime lab, and they called it uninterpretable.
So, the District Attorney’s Office sends the case to Cybergenetics. We run TrueAllele on the data that the crime lab provides; we separate the mixtures and find two main suspects. One person isn’t there; they dropped the charges against him and he went free. The second person has a match statistic of about half a million to the gun, so he pleads guilty and the case is over. There’s no trial. There’s no argument. That’s the science, and everyone just moves on. This is a practical example of justice based on truth, instead of arguments between lawyers.
We’ve seen many salacious DNA cases of multiple homicides, wife murders, husband burners, rape and child molesters, and so on. On our website, we have posted about a hundred cases that went to trial. That shows the technology at work. We get DNA that humans can’t interpret, but the TrueAllele computing method can, and the results are reliable.
About 10 crime labs are using TrueAllele. They’ve processed 10,000 cases of their own.
Our business model is interesting because we let interested clients, usually lawyers or sometimes police, send their DNA data to us from the crime lab. We screen their data for free, and send back a preliminary report that indicates whose DNA is in the evidence, who’s not there, and to what degree. Then, it’s up to the client to decide whether or not they want to pay for us to verify and report on that information, or testify about it. We’ve cleared hundreds of people of crimes, just from police queries where they sent us the data and we said no, it’s not your guy.
How come this is not part of the standard investigative process?
For cultural reasons, people have been slow to adopt it, but some have copied our technology and are selling it, so, between all the different software, it’s getting around.
When forensic analysts are trained, they have a concept that they will do everything themselves. They like handling the DNA, preparing and extracting the samples, amplifying it in the lab, running it on the DNA sequencer to separate it, and then doing the interpretation by eye. But people can’t solve 100-dimensional statistical problems that involve a lot of variation and a lot of variables, so they’re very limited in what they can do.
Lab analysts also have the problem that when they testify in court, they will be cross-examined. If they’re not comfortable understanding newer technology, then some may say, “If I can’t see it, I won’t testify to it.” So that’s been a cultural obstacle that we’ve seen for 20 years.
Sometimes the obstacles are lifted. When we reanalyzed the World Trade Center DNA data, 15 years ago, the question was what’s the information in connecting these 18,000 victim remains with the 2,700 missing people.
We started doing innocence cases five years ago. We’ve now helped exonerate 10 innocent men and free them from prison. As a result, more and more defense attorneys are seeing that TrueAllele is a tool of truth: it can show to what extent someone is there, or is not there. The older methods were one-sided tools. You could include somebody statistically, but you couldn’t exclude them. On top of that, a lot of DNA labs were run by police departments, so some people suggested they could be biased.
TrueAllele uses statistical analysis with a probability model to weigh the evidence. It doesn’t even know what people it’s going to be comparing against. If you get a big match statistic like a million or a billion, that’s statistical support that someone was there. If you get a small number like one in a million or one in a billion, that’s statistical support that they were not there. TrueAllele is completely objective.
The best way to expand TrueAllele use in law enforcement would be to open up the national DNA databases. These databases are based on failed interpretation methods from the past – not even 5% of the evidence of mixtures ends up in these databases. They should use modern technology to represent evidence, and not keep a closed monopoly to some law enforcement labs. Rather open it up to anyone with legitimate interest or technology, to solve crimes and free the innocent.
In simple terms, how does your technology work?
PCR (polymerase chain reaction) is an amazing tool for copying small snippets of DNA molecules, but it introduces random variation and artifacts.
A genotype is made of two alleles, one from each parent, which can be the same or different. And that allele is in one of the 10 or 20 regions, called loci, that you’re going to be copying and analyzing. It used to be 4, then it moved to 8, then 10, and then 15. It’s now up to almost 25 locations that are looked at together in one test. Not much of the genome, but enough for testing human identity.
Suppose you have 50 cells. When you break each cell open and look at one location, you will find two alleles, one from your mother and one from your father. But when you amplify the alleles, you don’t get the same number of copies. You could get a tenfold difference, though usually, it’s more within a factor of two. Forensic scientists call these copying differences “stochastic effects.”
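The copying differences described above can be illustrated with a small simulation. This is only a sketch under a simplifying assumption: PCR is treated as a branching process in which each molecule is copied with a fixed probability per cycle. The function names, cycle count, and efficiency value are hypothetical choices for illustration, not laboratory parameters from the interview.

```python
import random


def amplify(start_copies, cycles, efficiency, rng):
    """Toy PCR: in each cycle, every molecule is copied with probability `efficiency`."""
    copies = start_copies
    for _ in range(cycles):
        # Count how many of the current molecules were successfully duplicated.
        new_copies = sum(1 for _ in range(copies) if rng.random() < efficiency)
        copies += new_copies
    return copies


def allele_imbalance(start_copies, cycles, efficiency, rng):
    """Amplify two alleles independently and return the larger/smaller copy ratio."""
    a = amplify(start_copies, cycles, efficiency, rng)
    b = amplify(start_copies, cycles, efficiency, rng)
    return max(a, b) / min(a, b)


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same numbers
    for start in (2, 50):
        ratios = [allele_imbalance(start, 10, 0.8, rng) for _ in range(20)]
        print(start, "starting copies, worst imbalance:", round(max(ratios), 2))
```

In this toy model, the imbalance between the two alleles tends to be larger when there are fewer starting copies, which is one way to picture why low-level DNA is harder to read by eye.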
PCR is not a faithful amplifier – it has some distortion. When people look at that data variation, particularly with mixtures of two or more people, they can’t figure out what’s going on. There are too many possibilities, so they just throw out the data. That’s the usual result with complex DNA evidence. The labs won’t report it, so unless you call Cybergenetics, it’s gone.
Instead of throwing out the artifacts because of PCR variation, TrueAllele can measure that variation and exploit it to get accurate genotype probabilities of each possible suspect. The general scientific approach is that if there’s variation, you measure it and use it, you don’t throw out the data. As you said, it did not have a tremendous impact on world practice overnight, because many labs were more comfortable not reporting their data.
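To make the idea of using the variation (rather than discarding data) more concrete, here is a toy sketch of unmixing one locus of a two-person mixture. It is not TrueAllele's model, which the interview does not describe in detail. The assumptions are all illustrative: the mixture weight is known, peak heights are proportional to allele dosage, noise is Gaussian with a made-up `sigma`, and every genotype is equally likely beforehand. The function name `genotype_posterior` and the allele labels are hypothetical.

```python
import math
from itertools import combinations_with_replacement


def genotype_posterior(peaks, weight, sigma=0.02):
    """Return a probability for each candidate minor-contributor genotype.

    peaks  : dict mapping allele label -> observed peak height
    weight : assumed fraction of DNA from the minor contributor
    sigma  : assumed noise level on the normalized peak heights
    """
    alleles = sorted(peaks)
    total = sum(peaks.values())
    observed = {a: peaks[a] / total for a in alleles}
    genotypes = list(combinations_with_replacement(alleles, 2))

    scores = {}
    for minor in genotypes:
        likelihood = 0.0
        # Sum over every possible major genotype (uniform prior, an assumption).
        for major in genotypes:
            expected = {a: 0.0 for a in alleles}
            for a in major:
                expected[a] += (1 - weight) / 2
            for a in minor:
                expected[a] += weight / 2
            sq_err = sum((observed[a] - expected[a]) ** 2 for a in alleles)
            likelihood += math.exp(-sq_err / (2 * sigma ** 2))
        scores[minor] = likelihood

    norm = sum(scores.values())
    return {g: s / norm for g, s in scores.items()}


if __name__ == "__main__":
    # Hypothetical peak heights: two tall peaks and two short ones.
    peaks = {10: 930, 11: 930, 12: 70, 13: 70}
    posterior = genotype_posterior(peaks, weight=0.07)
    best = max(posterior, key=posterior.get)
    print("most probable minor genotype:", best)
```

The point of the sketch is that the short peaks are kept and weighed, so the minor contributor's genotype comes out as a probability distribution instead of being thrown away.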
How do you use the data that you receive and what are you hoping to achieve from it?
We have a huge disclaimer at the bottom of our emails that says, “do not send us DNA samples,” as we only accept data.
The question that we’re always asked is, who’s in the sample and who isn’t? Is there somebody we don’t know about? Can you tell us about the people we don’t even have as references?
Cybergenetics has the TrueAllele computer separate the mixed data into genotypes. It unmixes the mixture up to probability, though there’s always uncertainty. We then compare the genotypes, relative to coincidence, to the point where we get a match statistic.
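The comparison "relative to coincidence" can be sketched as a likelihood ratio. The example below assumes a simplified form: at each locus, the probability the evidence assigns to the reference person's genotype is divided by how common that genotype is in the population, and the loci are multiplied together as if independent. The function name, locus names, and all probabilities are hypothetical, and this is not presented as TrueAllele's actual computation.

```python
def match_statistic(evidence_probs, population_probs, reference):
    """Multiply per-locus ratios of evidence probability to population probability.

    evidence_probs   : locus -> {genotype: probability inferred from the evidence}
    population_probs : locus -> {genotype: frequency in the population}
    reference        : locus -> genotype of the person being compared
    """
    ratio = 1.0
    for locus, genotype in reference.items():
        # A genotype the evidence rules out gets probability 0.
        evidence_p = evidence_probs[locus].get(genotype, 0.0)
        coincidence_p = population_probs[locus][genotype]
        ratio *= evidence_p / coincidence_p
    return ratio


if __name__ == "__main__":
    # Made-up numbers for two hypothetical loci.
    evidence = {
        "locus_A": {(12, 13): 0.9, (12, 12): 0.1},
        "locus_B": {(8, 9): 0.8, (9, 9): 0.2},
    }
    population = {
        "locus_A": {(12, 13): 0.01, (12, 12): 0.02},
        "locus_B": {(8, 9): 0.02, (9, 9): 0.05},
    }
    suspect = {"locus_A": (12, 13), "locus_B": (8, 9)}
    print("match statistic:", round(match_statistic(evidence, population, suspect)))
```

A ratio above 1 means the evidence makes the person's genotype more probable than chance would; a ratio below 1 means less probable.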
For reporting, we would say that a match between the victim’s fingernails and the trooper is 189 billion times more probable than coincidence. That’s understandable language because there’s a DNA match, and some coincidence – what’s the probability of either outcome? That’s what computers do well.
What we’re hoping to achieve is to find out who’s there in the DNA evidence, who’s not, and who else might be there. Basically, to unravel the crime scene so that other people can figure out what happened. Our focus is not so much on what happened at the scene, as opposed to who was there.
I’ll give you an example from a case I testified in about three years ago. There’s a young man who has friends over at his apartment. Suddenly there’s a struggle, the neighbors hear noise, gunshots ring out, and the host is found on the floor with two dead friends next to him. The host calls 911, as his neighbor stands with a shotgun to his head, and the police arrive.
The prosecution says it’s first-degree murder. That this was an intentional homicide; the defendant invited friends over to kill them. The defense says it was self-defense. Maybe manslaughter. They were just hanging out, there was a struggle, and then they started fighting and guns went off.
The blood on the wall and DNA in their fingernails all indicate what might have happened. So the prosecutor asked us to look at the evidence. He couldn’t use the results because they didn’t help his case. But, under the law, prosecutors must give it over to the defense. The defense looked at our results, and decided to use them in their favor.
In the end, in court you had two lawyers not arguing over the science or trying to attack one another. Instead, the scientific facts of whose DNA was there were enough for the defense to argue that there was a struggle; otherwise, these people’s DNA wouldn’t have been mixed together. Had it been a homicide, he would have just shot them.
We provide scientific facts, but we’re not lawyers or investigators, we’re just scientists and technologists. That’s the best use of science.
What are some legal or ethical issues you’ve had to deal with, and how were they resolved?
They fall into two main categories. One is reliability, and the other is transparency.
How do you know your method works? How do you know the results are accurate? How do you know that the results are reproducible? How do you know it’s reliable? For that, we have scientific and legal standards that help us.
In America, we have the Daubert standard, the main legal standard for reliability, which is based primarily on testing. You can bring experts into court and they can argue forever about nonsense, but the real question is what happens when you test with one software as opposed to another software.
An analogy is that it’s like assessing a scale on which you weigh people. The question isn’t whether or not it is possible to weigh people, or if people have weight, or if mass exists in the universe. Those aren’t the real questions. In the real world, you look at empirical testing. Does it work?
The match statistic is also known in science as the weight of evidence. It’s the match magnitude, or log-likelihood ratio. We don’t generally talk about logarithms in court, because that’s not what lawyers study, but that’s what the statistic is. The number of zeros in a number determines the weight of the evidence. Is it six zeros because it’s a million, or is it minus six zeros because it’s one in a million?
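The "number of zeros" idea is simply the base-10 logarithm of the match statistic. The short sketch below applies it to figures quoted in this interview; the function names and the wording of the labels are hypothetical, chosen only for illustration.

```python
import math


def weight_of_evidence(match_statistic):
    """Return log10 of the match statistic, roughly its 'number of zeros'."""
    return math.log10(match_statistic)


def direction(match_statistic):
    """Say which way the evidence points, based on the sign of the logarithm."""
    weight = weight_of_evidence(match_statistic)
    if weight > 0:
        return "supports inclusion"
    if weight < 0:
        return "supports exclusion"
    return "uninformative"


if __name__ == "__main__":
    # 13,000 and 189 billion are the two statistics quoted for the dentist case.
    for statistic in (13_000, 189e9, 1e6, 1e-6):
        print(statistic, round(weight_of_evidence(statistic), 2), direction(statistic))
```

On this scale, a statistic of a million has a weight of 6 and one in a million has a weight of minus 6, while 13,000 and 189 billion come out at roughly 4.11 and 11.28.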
In some sense, TrueAllele is merely a DNA scale for making a comparison. Therefore we have to prove that our scale works. The Daubert standards are really simple. About 40 validation studies have been done so far by us, by others, or in collaboration.
Validation studies establish error rates, which we can calculate on every match statistic we report. That’s something almost no one else does. In fact, the old interpretation methods were never validated and they don’t work. We and others have done studies that show they merely report random numbers. They weren’t tested. They don’t have error rates.
The third prong of Daubert is peer review. We have peer-reviewed articles that describe our methods, and eight published validation studies that describe the testing. The data go up to 10 unknown contributors in a mixture, and down to a fraction of a cell.
There are now about six different forensic standards for probabilistic genotyping and its validation. We provide documents that show how we comply with every paragraph of every standard.
We do even more under the transparency component, which you can think of as ethical. We are so transparent that often, people can’t believe it when they get all the results from us.
We provide four gigabytes of information: software that you can review the results with, descriptions of how the technology works, all the papers and studies of the data in the case, and the opportunity to test our software for free.
If you visit our website, you will see over 100 lectures and talks we’ve given, including a series called “How TrueAllele Works.” We know it’s very useful, because people who’ve watched our series have copied our software.
Which trends and technologies do you expect to see more of in your industry in the future?
The most important problem that we’re working on with some groups is opening the past. What’s happened over the last 20 years with failed DNA analysis? How can we solve crimes and free innocent people by using automation on all those old unreported DNA samples?
My vision for the future is the same as the British vision was 20 years ago. You don’t need people in the lab, because the PCR, DNA extractions, robots, and sequencers are all automated. You don’t need people to interpret the data because that’s done much more reliably by computers like TrueAllele.
So what I see is complete automation, where all people have to do is understand the concepts, write reports, and explain them to society. Just like we do at Cybergenetics. Nobody has to be an expert in how MRI works to take an MRI scan, write a report on it, or run the machine. Rather, we rely on effective technology that has been tested.