In 2016, ProPublica released a study that found that algorithms used across the US to predict future criminals – algorithms that come up with “risk assessments” by crunching answers to questions such as whether a defendant’s parents ever did jail time, how many people they know who take illegal drugs, how often they’ve missed bond hearings, or if they believe that hungry people have a right to steal – are biased against black people.
ProPublica reached that conclusion after analyzing what it called “remarkably unreliable” risk assessments assigned to defendants:
Only 20% of the people predicted to commit violent crimes actually went on to do so.
What ProPublica’s data editors couldn’t do was inspect the algorithms used to come up with those scores: they’re proprietary.
The algorithms behind the risk assessment scores widely used throughout the country’s criminal justice systems aren’t the only ones found to be discriminatory: studies have also found that black faces are disproportionately targeted by facial recognition technology, and that the algorithms themselves are less accurate at identifying black faces – particularly those of black women.
It’s because of such research findings that New York City has passed a bill to study biases in the algorithms used by the city. According to Motherboard, the bill is thought to be the first in the country to push for open-sourcing of the algorithms used by courts, police and city agencies.
The bill, Intro 1696-A, would require the creation of a task force that “provides recommendations on how information on agency automated decision systems may be shared with the public and how agencies may address instances where people are harmed by agency automated decision systems.”
Passed by the City Council on 11 December, the bill could be signed into law by Mayor Bill de Blasio by month’s end.
The bill’s current form doesn’t go as far as criminal justice reformers and civil liberties groups would like.
An earlier version introduced by council member James Vacca, of the Bronx, would have forced all agencies that base decisions on algorithms – be it for policing or public school assignments – to make those algorithms publicly available.
The watered-down version calls only for a task force to study the possibility of bias in algorithms, including discrimination based on “age, race, creed, color, religion, national origin, gender, disability, marital status, partnership status, caregiver status, sexual orientation, alienage or citizenship status.”
The idea of an “open-source” version was resisted by Tech:NYC, a high-tech industry trade group that counts among its members companies such as Facebook, Google, eBay and Airbnb, as well as hundreds of small startups like Meetup.
Tech:NYC policy director Taline Sanassarian testified at an October hearing that the group was concerned that the proposal would have a chilling effect on companies that might not want to work with the city if doing so required making their proprietary algorithms public. She also suggested that open-sourcing the algorithms could lead to Equifax-like hacking:
Imposing disclosure requirements that will require the publishing of confidential and proprietary information on city websites could unintentionally provide an opportunity for bad actors to copy programs and systems. This would not only devalue the code itself, but could also open the door for those looking to compromise the security and safety of systems, potentially exposing underlying sensitive citizen data.
But most of the technologists in the room didn’t agree with her, according to Civic Hall.
Civic Hall quoted Noel Hidalgo, executive director of the civic technology organization BetaNYC, who said in written testimony that “Democracy requires transparency; [neither] copyright nor ‘trade secrets’ should ever stand in the way of an equitable, accountable municipal government.”
Another technologist who spoke in favor of the open-sourcing of the algorithms was Sumana Harihareswara, who said that open-source tools and transparency are the way to get better security, not worse.
If there are businesses in our community that are making money off of citizen data and can’t show us the recipe for the decisions they’re making, they need to step up, and they need to get better, and we need to hold them accountable.
Joshua Norkin, a lawyer with the Legal Aid Society, told Motherboard’s Roshan Abraham that it’s “totally ridiculous” to say that government has some kind of obligation to protect proprietary algorithms:
There is absolutely nothing that obligates the city to look out for the interests of a private company looking to make a buck on a proprietary algorithm.
The argument over whether open source or proprietary technology is more secure should sound familiar. In fact, the same debate is taking place now, a year before our next US election, with regard to how to secure voting systems.
Terry
If it’s democracy anyone is after, here’s what to do. Have a vote on every important issue from everyone involved. Go with the democratic majority flow. It’s democracy, stupid. Nothing else is, although it may claim to be.
Giulio Douhet
BetaNYC, despite claiming titles like technologist, appears not to have any understanding of the sort of algorithms involved in machine learning. Idiots. They need to examine the training data, but it’s a liar’s game of statistics once politics gets involved.
Giving a platform to people that don’t know what they’re talking about benefits no one. I’m disappointed in you, Naked Security.
Paul Ducklin
To be fair to us, we’re reporting how the meetings and decision went down, not “giving a platform” to BetaNYC. The computer security angle is the issue of whether ranking systems (or, as we mention, voting systems) should be open or not. Google’s famously isn’t, mainly to prevent people from gaming it for SEO reasons; but where should the public sector be in matters like this? I think our readers can make up their own minds about BetaNYC, which is part of this story whether you like it or not.
Mahhn
This should be a really bad joke… NY is upset because their Minority Report software doesn’t work, so they are going to spend more money to try and understand why it doesn’t work, but they are already calling it racist. It was a fricking movie! The people that bought the software must be the most gullible people on earth. I can see the conversation: We got this magic software, it predicts future criminals. (later that year) Something is wrong with our magic software, we could only convict 20% of the people it charged. It must be broken, let’s throw other people’s money at it until it’s magic like it was supposed to be, weeeee other people’s money, spend spend spend woohooo.. (dem state)
Paul Ducklin
I’m inclined to take this view, too, at least the way this story is written (there might be a bit more to it than we’ve covered here, but I’ll assume we’re close enough).
If there really is a 20% true positive rate in any major subgroup (diced however you want – race, gender, height, visual acuity, number of languages spoken, latitude of birthplace, third letter of first name), then is it really *bias* of the sort described here that needs eliminating?
“We used to get it completely wrong for 80% of black people but only 41% of whites. Thanks to our bias investigation, we’re now at a perfectly equitable condition where it’s an even 50% of both groups that get falsely accused. Huzzah and hooray!” But would that be a result, or would it simply mask the fact that the best the system can do is to be “less badly wrong”?
In other words, if you think there is a flaw in the system and you decide to rush in and try to explain it (and fix it) in terms of a pre-determined list of prejudices…you may end up with nothing better than a still-flawed system that nevertheless seems to have won a gold star for improvement.
When you bring a hammer, surprise, surprise, all your problems suddenly start looking like nails.
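To put rough numbers on that argument, here’s a minimal sketch in Python. The confusion-matrix counts are invented purely to mirror the 80%/41%/50% figures in the comment above (they are not ProPublica’s data or methodology); the code computes each group’s false positive rate and precision before and after the error rates are “equalized”:

```python
# Minimal sketch, not ProPublica's methodology: invented confusion-matrix
# counts that mirror the 80% / 41% / 50% figures in the comment above.

def rates(tp, fp, tn, fn):
    """Return (false positive rate, precision) from confusion-matrix counts."""
    fpr = fp / (fp + tn)        # share of people who did NOT reoffend but were flagged high risk
    precision = tp / (tp + fp)  # share of high-risk flags that turned out to be correct
    return fpr, precision

# "Before": wildly different false positive rates between two groups.
before = {
    "group_a": dict(tp=20, fp=80, tn=20, fn=10),  # FPR 0.80
    "group_b": dict(tp=20, fp=41, tn=59, fn=10),  # FPR 0.41
}

# "After": false positive rates equalized at 0.50, but precision is still poor.
after = {
    "group_a": dict(tp=20, fp=50, tn=50, fn=10),
    "group_b": dict(tp=20, fp=50, tn=50, fn=10),
}

for label, groups in (("before", before), ("after", after)):
    for name, counts in groups.items():
        fpr, precision = rates(**counts)
        print(f"{label:6} {name}: FPR={fpr:.2f}  precision={precision:.2f}")
```

On these made-up numbers, the “after” scenario reports an even 0.50 false positive rate for both groups, yet roughly 70% of the people flagged as high risk in each group still didn’t reoffend. Equal error rates alone don’t tell you whether the system has actually become any good.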
Cal Frye
I’m open to companies protecting their trade secrets, provided they will also bear full responsibility for the fairness of the results, including being the lead defendant in court proceedings questioning those results and the liability for damages. Releasing the source and training data is the protection against such suits.