Using Deep Learning and Google Street View to Estimate the Demographic Makeup of the US

As digital imagery becomes ubiquitous and machine vision techniques improve, automated data analysis may provide a cheaper and faster alternative to human review. Here, the authors present a method that attempts to determine socioeconomic trends from 50 million images of street scenes, gathered in 200 American cities by Google Street View cars. Using deep learning-based computer vision techniques, the authors attempted to determine the make, model, and year of all motor vehicles encountered in particular neighborhoods. Data from this census of motor vehicles, which enumerated 22M automobiles in total (8% of all automobiles in the US), was used to estimate income, race, education, and voting patterns, with single-precinct resolution. (The average US precinct contains approximately 1000 people.) The resulting associations are surprisingly simple and powerful.

Abstract: The United States spends more than $1B each year on initiatives such as the American Community Survey (ACS), a labor-intensive door-to-door study that measures statistics relating to race, gender, education, occupation, unemployment, and other demographic factors. Although a comprehensive source of data, the lag between demographic changes and their appearance in the ACS can exceed half a decade. As digital imagery becomes ubiquitous and machine vision techniques improve, automated data analysis may provide a cheaper and faster alternative. Here, we present a method that determines socioeconomic trends from 50 million images of street scenes, gathered in 200 American cities by Google Street View cars. Using deep learning-based computer vision techniques, we determined the make, model, and year of all motor vehicles encountered in particular neighborhoods. Data from this census of motor vehicles, which enumerated 22M automobiles in total (8% of all automobiles in the US), was used to accurately estimate income, race, education, and voting patterns, with single-precinct resolution. (The average US precinct contains approximately 1000 people.) The resulting associations are surprisingly simple and powerful. For instance, if the number of sedans encountered during a 15-minute drive through a city is higher than the number of pickup trucks, the city is likely to vote for a Democrat during the next Presidential election (88% chance); otherwise, it is likely to vote Republican (82%). Our results suggest that automated systems for monitoring demographic trends may effectively complement labor-intensive approaches, with the potential to detect trends with fine spatial resolution, in close to real time.
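The sedan-versus-pickup rule quoted in the abstract is simple enough to write down directly. The sketch below is an illustration only, not the authors' code: the function name and return format are hypothetical, and the two vehicle counts are assumed to come from some upstream detector. The percentages are the ones stated in the abstract.

```python
from __future__ import annotations


def predict_city_vote(sedans: int, pickups: int) -> tuple[str, float]:
    """Apply the abstract's rule of thumb to vehicle counts from one drive.

    `sedans` and `pickups` are the numbers of each vehicle type seen during
    a 15-minute drive through a city. Returns the predicted party and the
    chance quoted in the abstract for that prediction.
    """
    if sedans > pickups:
        # More sedans than pickups: likely Democrat (88% per the abstract).
        return ("Democrat", 0.88)
    # Otherwise (including ties): likely Republican (82% per the abstract).
    return ("Republican", 0.82)


# Hypothetical counts, invented for illustration.
print(predict_city_vote(sedans=120, pickups=80))
print(predict_city_vote(sedans=40, pickups=95))
```

The rule itself is a single comparison; the hard part of the paper is producing the counts, that is, detecting and classifying vehicles in the imagery.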


The Ethics of Algorithms: Mapping the Debate

More and more often, algorithms mediate social processes, business transactions, governmental decisions, and how we perceive, understand, and interact among ourselves and with the environment. Gaps between the design and operation of algorithms and our understanding of their ethical implications can have severe consequences affecting individuals as well as groups and whole societies. This paper attempts to clarify the ethical importance of algorithmic mediation by: 1) providing a prescriptive map to organize the debate; 2) reviewing the current discussion of ethical aspects of algorithms; and 3) assessing the available literature in order to identify areas requiring further work to develop the ethics of algorithms.

Abstract: In information societies, operations, decisions and choices previously left to humans are increasingly delegated to algorithms, which may advise, if not decide, about how data should be interpreted and what actions should be taken as a result. More and more often, algorithms mediate social processes, business transactions, governmental decisions, and how we perceive, understand, and interact among ourselves and with the environment. Gaps between the design and operation of algorithms and our understanding of their ethical implications can have severe consequences affecting individuals as well as groups and whole societies. This paper makes three contributions to clarify the ethical importance of algorithmic mediation. It provides a prescriptive map to organise the debate. It reviews the current discussion of ethical aspects of algorithms. And it assesses the available literature in order to identify areas requiring further work to develop the ethics of algorithms.


Accountability for the Use of Algorithms in a Big Data Environment

Decision makers, both in the private and public sphere, increasingly rely on algorithms operating on Big Data. As a result, special mechanisms of accountability concerning the making and deployment of algorithms are becoming more urgent. In the upcoming EU General Data Protection Regulation, concepts such as accountability and transparency are guiding principles. Yet, the authors argue that the accountability mechanisms present in the regulation cannot be applied in a straightforward way to algorithms operating on Big Data. The complexities and the broader scope of algorithms in a Big Data setting call for effective, appropriate accountability mechanisms.

Abstract: Accountability is the ability to provide good reasons in order to explain and to justify actions, decisions, and policies for a (hypothetical) forum of persons or organizations. Since decision-makers, both in the private and in the public sphere, increasingly rely on algorithms operating on Big Data for their decision-making, special mechanisms of accountability concerning the making and deployment of algorithms in that setting become gradually more urgent. In the upcoming General Data Protection Regulation, the importance of accountability and closely related concepts, such as transparency, as guiding protection principles, is emphasized. Yet, the accountability mechanisms inherent in the regulation cannot be appropriately applied to algorithms operating on Big Data and their societal impact. First, algorithms are complex. Second, algorithms often operate on a random group-level, which may pose additional difficulties when interpreting and articulating the risks of algorithmic decision-making processes. In light of the possible significance of the impact on human beings, the complexities and the broader scope of algorithms in a big data setting call for accountability mechanisms that transcend the mechanisms that are now inherent in the regulation.


Accountable Algorithms

Many important decisions historically made by people are now made by computers. Algorithms can count votes, approve loan and credit card applications, target citizens or neighborhoods for police scrutiny, select taxpayers for an audit, and grant or deny immigration visas. This paper argues that accountability mechanisms and legal standards that govern such decision processes have not kept pace with technology. The tools currently available to policymakers, legislators, and courts were developed to oversee human decision-makers and often fail when applied to computers instead: for example, how do you judge the intent of a piece of software? The authors propose that additional approaches are needed to ensure that automated decision systems — with their potentially incorrect, unjustified or unfair results — are accountable and governable. This article describes a new technological toolkit that can be used to verify that automated decisions comply with key standards of legal fairness.

Abstract: Many important decisions historically made by people are now made by computers. Algorithms count votes, approve loan and credit card applications, target citizens or neighborhoods for police scrutiny, select taxpayers for an IRS audit, and grant or deny immigration visas.

The accountability mechanisms and legal standards that govern such decision processes have not kept pace with technology. The tools currently available to policymakers, legislators, and courts were developed to oversee human decision-makers and often fail when applied to computers instead: for example, how do you judge the intent of a piece of software? Additional approaches are needed to make automated decision systems — with their potentially incorrect, unjustified or unfair results — accountable and governable. This Article reveals a new technological toolkit to verify that automated decisions comply with key standards of legal fairness.

We challenge the dominant position in the legal literature that transparency will solve these problems. Disclosure of source code is often neither necessary (because of alternative techniques from computer science) nor sufficient (because of the complexity of code) to demonstrate the fairness of a process. Furthermore, transparency may be undesirable, such as when it permits tax cheats or terrorists to game the systems determining audits or security screening.

The central issue is how to assure the interests of citizens, and society as a whole, in making these processes more accountable. This Article argues that technology is creating new opportunities — more subtle and flexible than total transparency — to design decision-making algorithms so that they better align with legal and policy objectives. Doing so will improve not only the current governance of algorithms, but also — in certain cases — the governance of decision-making in general. The implicit (or explicit) biases of human decision-makers can be difficult to find and root out, but we can peer into the “brain” of an algorithm: computational processes and purpose specifications can be declared prior to use and verified afterwards.

The technological tools introduced in this Article apply widely. They can be used in designing decision-making processes from both the private and public sectors, and they can be tailored to verify different characteristics as desired by decision-makers, regulators, or the public. By forcing a more careful consideration of the effects of decision rules, they also engender policy discussions and closer looks at legal standards. As such, these tools have far-reaching implications throughout law and society.

Part I of this Article provides an accessible and concise introduction to foundational computer science concepts that can be used to verify and demonstrate compliance with key standards of legal fairness for automated decisions without revealing key attributes of the decision or the process by which the decision was reached. Part II then describes how these techniques can assure that decisions are made with the key governance attribute of procedural regularity, meaning that decisions are made under an announced set of rules consistently applied in each case. We demonstrate how this approach could be used to redesign and resolve issues with the State Department’s diversity visa lottery. In Part III, we go further and explore how other computational techniques can assure that automated decisions preserve fidelity to substantive legal and policy choices. We show how these tools may be used to assure that certain kinds of unjust discrimination are avoided and that automated decision processes behave in ways that comport with the social or legal standards that govern the decision. We also show how algorithmic decision-making may even complicate existing doctrines of disparate treatment and disparate impact, and we discuss some recent computer science work on detecting and removing discrimination in algorithms, especially in the context of big data and machine learning. And lastly in Part IV, we propose an agenda to further synergistic collaboration between computer science, law and policy to advance the design of automated decision processes for accountability.
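The idea that rules can be "declared prior to use and verified afterwards" can be illustrated with a hash-based commitment. This is a minimal sketch under assumptions: the abstract does not name the specific techniques in the toolkit, so the commitment scheme here is a standard computer science construction chosen for illustration, and the function names and the example rule text are hypothetical.

```python
from __future__ import annotations

import hashlib
import hmac
import secrets


def commit(policy: bytes) -> tuple[str, bytes]:
    """Publish a digest that binds the decision-maker to `policy`.

    The digest can be announced before any decisions are made; the random
    nonce is kept secret so the digest does not reveal the policy.
    """
    nonce = secrets.token_bytes(32)
    digest = hashlib.sha256(nonce + policy).hexdigest()
    return digest, nonce


def verify(commitment: str, nonce: bytes, policy: bytes) -> bool:
    """Check, after the fact, that the revealed policy matches the digest."""
    expected = hashlib.sha256(nonce + policy).hexdigest()
    return hmac.compare_digest(expected, commitment)


# Hypothetical rule text, invented for illustration.
rule = b"select applicants by uniform random draw"
published, secret_nonce = commit(rule)

# An overseer later receives the nonce and the rule and checks them.
print(verify(published, secret_nonce, rule))                  # True
print(verify(published, secret_nonce, b"some other rule"))    # False
```

A commitment of this kind shows only that the announced rule was not changed after the fact. Showing that the rule was actually applied in each case, without revealing it, requires further machinery that this sketch does not cover.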


Why a Right to Explanation of Automated Decision Making Does Not Exist in the General Data Protection Regulation

This paper argues that the GDPR lacks precise language as well as explicit and well-defined rights and safeguards against harmful automated decision-making, and therefore runs the risk of being toothless. The authors propose a number of legislative steps that, they argue, would improve the transparency and accountability of automated decision-making when the GDPR comes into force in 2018.

Abstract: Since approval of the EU General Data Protection Regulation (GDPR) in 2016, it has been widely and repeatedly claimed that a ‘right to explanation’ of decisions made by automated or artificially intelligent algorithmic systems will be legally mandated by the GDPR. This right to explanation is viewed as an ideal mechanism to enhance the accountability and transparency of automated decision-making. However, there are several reasons to doubt both the legal existence and the feasibility of such a right. In contrast to the right to explanation of specific automated decisions claimed elsewhere, the GDPR only mandates that data subjects receive limited information (Articles 13-15) about the logic involved, as well as the significance and the envisaged consequences of automated decision-making systems, what we term a ‘right to be informed’. Further, the ambiguity and limited scope of the ‘right not to be subject to automated decision-making’ contained in Article 22 (from which the alleged ‘right to explanation’ stems) raises questions over the protection actually afforded to data subjects. These problems show that the GDPR lacks precise language as well as explicit and well-defined rights and safeguards against automated decision-making, and therefore runs the risk of being toothless. We propose a number of legislative steps that, if taken, may improve the transparency and accountability of automated decision-making when the GDPR comes into force in 2018.


Exposure Diversity as a Design Principle for Recommender Systems

Some argue that algorithmic filtering and adaptation of online content to personal preferences and interests decreases the diversity of information to which users are exposed. Notwithstanding the question of whether these claims are correct, this paper discusses whether and how recommendations can also be designed to stimulate more diverse exposure to information and to discourage potential “filter bubbles” rather than create them. Combining insights from democratic theory, computer science, and law, the authors suggest design principles and explore the potential and possible limits of “diversity sensitive design.”

Abstract: Personalized recommendations in search engines, social media and also in more traditional media increasingly raise concerns over potentially negative consequences for diversity and the quality of public discourse. The algorithmic filtering and adaption of online content to personal preferences and interests is often associated with a decrease in the diversity of information to which users are exposed. Notwithstanding the question of whether these claims are correct or not, this article discusses whether and how recommendations can also be designed to stimulate more diverse exposure to information and to break potential ‘filter bubbles’ rather than create them. Combining insights from democratic theory, computer science and law, the article makes suggestions for design principles and explores the potential and possible limits of ‘diversity sensitive design’.


Algorithmic Transparency via Quantitative Input Influence: Theory and Experiments with Learning Systems

In this paper, the authors develop a formal foundation to improve the transparency of decision-making systems that employ machine learning. Specifically, they introduce a family of Quantitative Input Influence (QII) measures that attempt to capture the degree of influence of inputs on outputs of systems. These measures can provide a foundation for the design of transparency reports that accompany system decisions (e.g., explaining a specific credit decision) and for testing tools useful for internal and external oversight (e.g., to detect algorithmic discrimination).

Abstract: Algorithmic systems that employ machine learning play an increasing role in making substantive decisions in modern society, ranging from online personalization to insurance and credit decisions to predictive policing. But their decision-making processes are often opaque: it is difficult to explain why a certain decision was made. We develop a formal foundation to improve the transparency of such decision-making systems. Specifically, we introduce a family of Quantitative Input Influence (QII) measures that capture the degree of influence of inputs on outputs of systems. These measures provide a foundation for the design of transparency reports that accompany system decisions (e.g., explaining a specific credit decision) and for testing tools useful for internal and external oversight (e.g., to detect algorithmic discrimination). Distinctively, our causal QII measures carefully account for correlated inputs while measuring influence. They support a general class of transparency queries and can, in particular, explain decisions about individuals (e.g., a loan decision) and groups (e.g., disparate impact based on gender). Finally, since single inputs may not always have high influence, the QII measures also quantify the joint influence of a set of inputs (e.g., age and income) on outcomes (e.g., loan decisions) and the marginal influence of individual inputs within such a set (e.g., income). Since a single input may be part of multiple influential sets, the average marginal influence of the input is computed using principled aggregation measures, such as the Shapley value, previously applied to measure influence in voting. Further, since transparency reports could compromise privacy, we explore the transparency-privacy tradeoff and prove that a number of useful transparency reports can be made differentially private with very little addition of noise.
Our empirical validation with standard machine learning algorithms demonstrates that QII measures are a useful transparency mechanism when black box access to the learning system is available. In particular, they provide better explanations than standard associative measures for a host of scenarios that we consider. Further, we show that in the situations we consider, QII is efficiently approximable and can be made differentially private while preserving accuracy.
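One way to read "degree of influence of inputs on outputs" is as an intervention: replace a single input with values drawn from the population, leave the other inputs fixed, and count how often the decision changes. The sketch below illustrates that reading for one individual with black-box access to a classifier. It is a simplified illustration, not the authors' implementation; the function name, the toy classifier, and the data are all invented, and the joint-influence and Shapley aggregation steps mentioned in the abstract are omitted.

```python
from typing import Callable, Sequence, Tuple

Row = Tuple[float, ...]


def unary_influence(
    classify: Callable[[Row], bool],
    x: Row,
    feature: int,
    population: Sequence[Row],
) -> float:
    """Fraction of interventions on `feature` that flip the decision for `x`.

    Each intervention swaps x[feature] for the value that one population
    member has for that feature, leaving every other input of `x` unchanged.
    Because only one input is replaced at a time, an input that is merely
    correlated with the decisive one receives no credit.
    """
    baseline = classify(x)
    flipped = 0
    for row in population:
        intervened = x[:feature] + (row[feature],) + x[feature + 1:]
        if classify(intervened) != baseline:
            flipped += 1
    return flipped / len(population)


# Toy black box, invented for illustration: approve when income >= 50.
def approve(row: Row) -> bool:
    income, _age = row
    return income >= 50


# Invented population of (income, age) pairs; age tracks income closely.
people = [(30.0, 25.0), (40.0, 35.0), (60.0, 45.0), (80.0, 55.0)]
applicant = (60.0, 45.0)

print(unary_influence(approve, applicant, 0, people))  # income: 0.5
print(unary_influence(approve, applicant, 1, people))  # age: 0.0
```

In this toy population, age is strongly associated with approval, yet intervening on age never changes the outcome. This is the kind of case in which the abstract reports that QII gives better explanations than associative measures.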


Data-Driven Discrimination at Work

A data revolution is transforming the workplace. Employers are increasingly relying on algorithms to decide who gets interviewed, hired, or promoted. Although algorithms can help to avoid biased human decision-making, they also risk introducing new sources of bias. Data mining techniques may cause employment decisions to be based on correlations rather than causal relationships; they may obscure the bases on which employment decisions are made; and they may exacerbate inequality because error detection is limited and feedback effects can compound bias. Given these risks, this paper argues for a legal response to classification bias, a term that describes the use of classification schemes, like data algorithms, to sort or score workers in ways that worsen inequality or disadvantage along the lines of race, sex, or other protected characteristics.

Abstract: A data revolution is transforming the workplace. Employers are increasingly relying on algorithms to decide who gets interviewed, hired, or promoted. Although data algorithms can help to avoid biased human decision-making, they also risk introducing new sources of bias. Algorithms built on inaccurate, biased, or unrepresentative data can produce outcomes biased along lines of race, sex, or other protected characteristics. Data mining techniques may cause employment decisions to be based on correlations rather than causal relationships; they may obscure the basis on which employment decisions are made; and they may further exacerbate inequality because error detection is limited and feedback effects compound the bias. Given these risks, I argue for a legal response to classification bias, a term that describes the use of classification schemes, like data algorithms, to sort or score workers in ways that worsen inequality or disadvantage along the lines of race, sex, or other protected characteristics. Addressing classification bias requires fundamentally rethinking anti-discrimination doctrine. When decision-making algorithms produce biased outcomes, they may seem to resemble familiar disparate impact cases; however, mechanical application of existing doctrine will fail to address the real sources of bias when discrimination is data-driven. A close reading of the statutory text suggests that Title VII directly prohibits classification bias. Framing the problem in terms of classification bias leads to some quite different conclusions about how to apply the anti-discrimination norm to algorithms, suggesting both the possibilities and limits of Title VII’s liability-focused model.