Op-Ed: If a Pre-Trial Risk Assessment Tool Does Not Satisfy These Criteria, It Needs to Stay Out of the Courtroom

Written by WLA Guest

By Hayley Tsukayama and Jamie Williams, Electronic Frontier Foundation

Algorithms should not decide who spends time in a California jail. But that’s exactly what will happen under S.B. 10, a new law slated to take effect in October 2019. The law, which Governor Jerry Brown signed in September, requires the state’s criminal justice system to replace cash bail with an algorithmic pretrial risk assessment. Each county in California must use some form of pretrial risk assessment to categorize every person arrested as a “low,” “medium,” or “high” risk of failing to appear for court, or committing another crime that poses a risk to public safety. Under S.B. 10, if someone receives a “high” risk score, the person must be detained prior to arraignment, effectively placing crucial decisions about a person’s freedom into the hands of companies that make assessment tools.

Some see risk assessment tools as being more impartial than judges because they make determinations using algorithms. But that assumption ignores the fact that algorithms, when not carefully calibrated, can cause the same sort of discriminatory outcomes as existing systems that rely on human judgement—and even make new, unexpected errors. We doubt these algorithmic tools are ready for prime time, and the state of California should not have embraced their use before establishing ways to scrutinize them for bias, fairness, and accuracy.

EFF in July joined more than a hundred advocacy groups to urge jurisdictions in California and across the country already using these algorithmic tools to stop until they considered the many risks and consequences of their use. Our concerns are now even more urgent in California, with less than a year to implement S.B. 10. We urge the state to start working now to make sure that S.B. 10 does not reinforce existing inequity in the criminal justice system, or even introduce new disparities.

This is not a merely theoretical concern. Researchers at Dartmouth College found in January that one widely used tool, COMPAS, incorrectly classified black defendants as being at risk of committing a misdemeanor or felony within two years at a rate of 40%, versus 25.4% for white defendants.

There are ways to minimize bias and unfairness in pretrial risk assessment, but doing so requires proper guidance and oversight. S.B. 10 offers no guidance for how counties should calculate risk levels. It also fails to lay out procedures to protect against unintentional, unfair, biased, or discriminatory outcomes.

The state’s Judicial Council is expected to post the first of its rules mandated by S.B. 10 for public comment within the coming days. The state should release information—and soon—about the various algorithmic tools counties can consider, for public review. To date, we don’t even have a list of the tools up for consideration across the state, let alone the information and data needed to assess them and safeguard against algorithmic bias.

We offer four key criteria that anyone using a pretrial risk assessment tool must satisfy to ensure that the tool reduces existing inequities in the criminal justice system rather than reinforcing them, and avoids introducing new disparities. Counties must engage the public in setting goals, assess whether the tools they are considering use the right data for their communities, and ensure the tools are fair. They must also be transparent and open to regular independent audits and future correction.

Policymakers and the Public, Not Companies, Must Decide What A Tool Prioritizes

As the state considers which tools to recommend, the first step is to decide what its objective is. Is the goal to have fewer people in jail before trial? Is it to cut down on unfairness and inequality? Is it both? How will the state measure whether the tool is working?

These are complex questions. It is, for example, possible to optimize an algorithm to maximize “true positives,” meaning to correctly identify those who are likely to fail to appear, or to commit another dangerous crime if released. Optimizing an algorithm that way, however, also tends to increase the number of “false positives,” meaning more people will be held in custody unnecessarily.

It’s also important to define what constitutes success. A system that recommends detention for everyone, after all, would have both a 100% true positive rate and a 100% false positive rate—and would be horribly unjust. As Matthias Spielkamp wrote for the MIT Technology Review: “What trade-offs should we make to ensure justice and lower the massive social costs of incarceration?”
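To make the trade-off concrete, here is a minimal sketch in Python. The scores, outcomes, and thresholds are invented for illustration and do not come from any real tool or dataset. The function `rates` is a hypothetical helper that flags everyone at or above a threshold as "high" risk and reports the resulting true positive and false positive rates.

```python
def rates(scores, outcomes, threshold):
    """Return (true_positive_rate, false_positive_rate) when everyone
    scoring at or above `threshold` is flagged as high risk."""
    tp = fp = fn = tn = 0
    for score, bad_outcome in zip(scores, outcomes):
        flagged = score >= threshold
        if flagged and bad_outcome:
            tp += 1  # correctly flagged
        elif flagged and not bad_outcome:
            fp += 1  # flagged, but would have been fine if released
        elif bad_outcome:
            fn += 1  # missed
        else:
            tn += 1  # correctly not flagged
    return tp / (tp + fn), fp / (fp + tn)


# Hypothetical risk scores and what actually happened (True = failed to
# appear or was re-arrested). Made-up numbers, for illustration only.
scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 0.95]
outcomes = [False, False, False, True, False, False, True, False, True, True]

for threshold in (0.0, 0.5, 0.85):
    tpr, fpr = rates(scores, outcomes, threshold)
    print(f"threshold={threshold}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```

With this toy data, a threshold of 0.0 flags everyone and yields a 100% true positive rate and a 100% false positive rate. Raising the threshold lowers both numbers together. Where to set it is a policy choice, not something the arithmetic can settle.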

Lawmakers, the courts, and the public—not the companies who make and sell algorithmic tools—should decide together what we want pretrial risk assessment tools to prioritize and how to ensure that they are fair.

The Data and Assumptions Used to Develop the Algorithm Must Be Scrutinized

Part of the problem is that many of these pretrial risk assessment tools must be trained by examining existing data. But the assumptions a developer makes when creating an assessment don’t always apply to the communities upon which they are used. For example, the dataset used to train a machine-learning algorithm might not be representative of the community that will eventually use the risk assessment. If the risk assessment tool was developed with bad training data, i.e. it “learned” from bad data, it will produce bad risk assessments.

How might the training data for a machine-learning algorithm be bad?

For example, the rate of re-arrest of released defendants could be used as a way to measure someone’s risk to public safety when building an algorithm. But does the re-arrest rate actually tell us about risk to public safety? In fact, not all jurisdictions define re-arrest in the same way. Some include only re-arrests that actually result in bail revocation, but some include traffic or misdemeanor offenses that don’t truly reflect a risk to society.

Training data can also often be “gummed up by our own systemic biases.” Data collected by the Stanford Open Policing Project shows that officers’ own biases cause them to stop black drivers at higher rates than white drivers and to ticket, search, and arrest black and Hispanic drivers during traffic stops more often than whites. Using a rate of arrest that includes traffic offenses could therefore introduce more racial bias into the system, rather than reduce it.

Taking the time to clean datasets and carefully vet tools before implementation is necessary to protect against unfair, biased, or discriminatory outcomes.

Fairness and Bias Must Be Considered and Corrected

Beyond examining the training data algorithms use, it’s also important to understand how the algorithm makes its decisions. The fairness of any algorithmic system should be defined and reviewed before implementation as well as throughout the system’s use. Does an algorithm treat all groups of people the same? Is the system optimizing for fairness, for public safety, for equal treatment, or for the most efficient allocation of resources?
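One way an independent auditor might start asking whether a tool treats groups the same is to compare false positive rates across groups. The sketch below is only an illustration: the groups "A" and "B", the records, and the helper `false_positive_rate_by_group` are all hypothetical, and equal false positive rates are just one of several possible definitions of fairness.

```python
from collections import defaultdict


def false_positive_rate_by_group(records):
    """records: iterable of (group, flagged_high_risk, had_bad_outcome).

    Returns, for each group, the share of people with no bad outcome
    who were nevertheless flagged as high risk.
    """
    false_positives = defaultdict(int)
    negatives = defaultdict(int)
    for group, flagged, bad_outcome in records:
        if not bad_outcome:
            negatives[group] += 1
            if flagged:
                false_positives[group] += 1
    return {g: false_positives[g] / negatives[g] for g in negatives}


# Made-up audit records, for illustration only.
records = (
    [("A", True, False)] * 3 + [("A", False, False)] * 3
    + [("A", True, True), ("A", False, True)]
    + [("B", True, False)] * 1 + [("B", False, False)] * 3
    + [("B", True, True), ("B", False, True)]
)

fpr = false_positive_rate_by_group(records)
print(fpr)  # group A: 0.5, group B: 0.25
```

In this invented example, people in group A who would not have had a bad outcome are flagged twice as often as their counterparts in group B. A gap like that is the kind of result that should prompt further scrutiny before and during a tool's use.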

Biased decision-making is a trap that both simple and complicated algorithms can fall into. Even a tool that uses carefully vetted data can produce unfair assessments if it focuses too narrowly on a single measure of success. (See, for example, Goodhart’s Law.) Algorithmic systems used in criminal justice, education policy, insurance, and lending have exhibited these problems.

It’s important to note that simply eliminating race or gender data will not make a tool fair because of the way machine learning algorithms process information. Sometimes machine learning algorithms will make prejudiced or biased decisions even if data on demographic categories is deliberately excluded—a phenomenon called “omitted variable bias” in statistics. For example, if a system is asked to predict a person’s risk to public safety, but lacks information about their access to supportive resources, it could improperly learn to use their postal code as a way to determine their threat to public safety.

In this way, risk assessments can use factors that appear neutral, such as a person’s income level, but produce the same unequal results as if they had used prohibited factors such as race or sex.
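The proxy effect can be shown with a deliberately simplified sketch. Everything here is hypothetical: the groups, the neighborhood labels `ZONE_1` and `ZONE_2`, and the rule `flag_high_risk`, which stands in for whatever a real tool might learn. The rule never sees group membership, only the neighborhood.

```python
# Hypothetical population of (group, neighborhood) pairs. Group A lives
# mostly in ZONE_1 and group B mostly in ZONE_2. Illustration only.
people = (
    [("A", "ZONE_1")] * 8 + [("A", "ZONE_2")] * 2
    + [("B", "ZONE_1")] * 2 + [("B", "ZONE_2")] * 8
)


def flag_high_risk(neighborhood):
    """A 'group-blind' rule: it looks only at the neighborhood."""
    return neighborhood == "ZONE_1"


def flag_rate(group):
    """Share of a group's members that the rule flags as high risk."""
    flags = [flag_high_risk(zone) for g, zone in people if g == group]
    return sum(flags) / len(flags)


rate_a = flag_rate("A")
rate_b = flag_rate("B")
print(f"group A flagged: {rate_a:.0%}, group B flagged: {rate_b:.0%}")
```

Although group membership is never an input, the rule flags 80% of group A and 20% of group B in this toy population, because neighborhood stands in for group. Removing a demographic column does not remove the disparity when another column carries the same information.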

Automated assessments can also fail to take important, but less obvious, information about people’s lives into account, reducing people to the sum of their data and ignoring their humanity. A risk assessment may not, for example, consider familial relationships and responsibilities. Yet a person who is the primary caregiver for a sick relative may be at significantly higher risk of failing to appear in court without ever intending to abscond. If these familial relationships are not considered, the system may conflate such life circumstances with a risk of flight, which would lead to inaccurate, potentially biased, and discriminatory outcomes.

There are sensible solutions to address omitted variable bias, and they must be applied properly to offset existing biases inherent in the training data.

The Public and Independent Experts Must Be Informed and Consulted

Any government decision to adopt a system or tool that uses algorithmic decision-making is a policy decision—whether the system is being used for pretrial risk assessment or to determine whether to cut people off from healthcare—and the public needs to be able to hold the government accountable for those decisions. Thus, even when decision makers have thought through the steps we’ve outlined as they choose vendors, it’s equally vital that they let the public and independent data scientists review them.

Developers must be upfront about how their tools work, so that courts, policy makers, and the public understand how tools fit their communities. If these tools are allowed to be a “black box”— a system or device that doesn’t reveal how it reaches its conclusions—then it robs the public of their right to understand what the algorithm does and to test its fairness and accuracy. Without knowing what goes into the black box, it’s hard to assess the fairness and validity of what comes out of it.

The public must have access to the source code and the materials used to develop these tools, and the results of regular independent audits of the system, to ensure tools are not unfairly detaining innocent people or disproportionately affecting specific classes of people.

Transparency gives people a way to measure progress and ensure government accountability. As Algorithm Watch says, “The fact that most [algorithmic decision making] procedures are black boxes to the people affected by them is not a law of nature. It must end.”

California Needs To Address These Issues Immediately

As California looks to implement S.B. 10, it should not rely on vendor companies’ marketing promises. We urge the state to vet thoroughly any algorithmic tools considered—and enable independent experts and auditors to do the same. There must be thorough and independent evaluations of whether the tools up for consideration are fair and appropriate.

Any recommendation to take away someone’s liberty must receive immediate human review. These considerations should have been baked into S.B. 10 from the start. But it is critical that California satisfy these four criteria now, and that policymakers across the country considering similar laws build these critical safeguards directly into their legislation.

This commentary first appeared on the Electronic Frontier Foundation’s Deeplinks Blog. The nonprofit Electronic Frontier Foundation works to defend civil liberties in the digital world.

Hayley Tsukayama is a legislative activist focusing on state legislation for the Electronic Frontier Foundation.
Jamie Williams is a staff attorney on EFF’s civil liberties team.

Image by the Electronic Frontier Foundation.
