We believe that the first fundamental step toward more equitable methods of data mining is educational. Our research into course listings at other schools and in our own computer science department shows that awareness and education of the ethical consequences surrounding data mining are lacking. This deficiency is unacceptable because it perpetuates injustice. There are people working hard to solve some of these issues, but how does one even start? We live in an age filled with data; our society runs on it. Data mining is used across industries such as healthcare, policing, telecommunications, and social media, and the ethical issues that stem from its applications extend far beyond the few we’ve discussed in this paper. Taking on all of these problems is a momentous task, and no single policy prescription or silver bullet can alleviate them. This is why we believe the first fundamental step is to move upstream and influence the minds of the future creators of these technologies. Our curriculum (see Appendix) will not just be a lecture on the various negative consequences of data mining; that would be a superficial outlook, since data mining is already deeply integrated into society. To even begin to devise solutions, students have to gain a better understanding of the technology itself and examine the assumptions built into the design of these algorithms. In 2020, we should stamp out injustice and inequity wherever we see it; to that end, we have laid out a data mining ethics course in the hope that students will take what they learn and use it to create more equitable technologies after graduation.

Our course consists of four modules: Technology as a Socio-Technical System, Introduction to Data Mining, Biases in Data Mining, and Privacy & Data Mining. The modules are ordered so that students are equipped to analyze the resulting ethical issues by the time we introduce data mining itself. At the start of the course, we plan to teach students about technology as a socio-technical system. Technology is hard to define, and any superficial misconception that it is solely an object must be debunked. We will rely heavily on Matthewman’s Technology and Social Theory to broaden our perception of technology as a collection of objects, activities, and knowledge. We will discuss the faults of theories like technological determinism and social determinism before settling on a middle ground in which technology and society mutually shape each other. From there it becomes clear that while technology does help shape our daily lives, humans and societal contexts play an integral role in how that technology is adopted and applied. This leads to the notion that technology is value-laden: humans create technology, and we instill in it our own morals and characteristics. Once students have a better understanding of our relationship with technology and its place within society, we can move on to the specific technology of data mining. We will use this second module to discuss the rise of big data through the documentary The Human Face of Big Data. The proliferation of data has led society to adopt techniques for extracting patterns from large datasets, and we must recognize why the practice came about. We will cover the basics of data mining, as well as machine learning and its underlying algorithms, at a level where students from any academic background can understand and take part in the discussions.
We will also examine the extent to which data mining is now used across numerous industries. From there we will move on to our third module, which covers our first in-depth ethical issue: bias.
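To give a concrete sense of what “extracting patterns from large datasets” means at the introductory level this module aims for, a tiny illustration (with hypothetical shopping data, not part of the actual course materials) might look like this:

```python
from collections import Counter
from itertools import combinations

# A toy version of the core idea of data mining: finding patterns in many
# records. Here we count which pairs of items are most often bought together
# in a small, made-up set of shopping baskets.
baskets = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"milk", "eggs", "coffee"},
    {"bread", "milk", "coffee"},
    {"bread", "eggs"},
]

pair_counts = Counter()
for basket in baskets:
    # count every pair of items that appears together in one basket
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# The most frequent pair is the strongest "pattern" in this tiny dataset
print(pair_counts.most_common(3))
```

Real data mining applies the same counting-and-generalizing idea to millions of records, which is exactly why the assumptions baked into the data matter so much.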

We use two main contexts to discuss bias in data mining: facial recognition and predictive policing. The technology used to detect facial features is based on algorithms that analyze huge amounts of data. In one recent case, Zoom had trouble detecting African American faces when virtual backgrounds were enabled (Dickey, 2021). This may seem harmless, but it is degrading to anyone negatively affected by the bias, and it should not be an issue today. A context in which data mining carries perhaps more serious consequences is policing. With advancements in technology, law enforcement has increasingly relied on predictive policing programs to stop crime before it takes place. Behind some of these programs is a database of individuals deemed more threatening based on their number of encounters with police officers. But given that minorities are far more likely to be stopped by police because of their ethnicity (Racial Profiling 2.0, 2020), these databases create a feedback loop that strengthens the bias in the system. CBSN’s Racial Profiling 2.0 documentary will help elucidate the inner workings of some of these programs, as well as their effects on the communities in which they are used. We will then conduct an activity with hands-on analysis of datasets. At this point in the class, students can apply their new understanding of technology as a socio-technical system when analyzing bias in data mining. They will know by now that data mining and its algorithms are not value-neutral. The underlying issue in many of these cases is either the biased data being fed into the algorithms or the inherent bias in all humans, which is reflected in the technology when it is created.
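The feedback loop described above can be sketched in a few lines of Python. This is a hypothetical simulation with made-up numbers, not a model of any real policing program: two neighborhoods have the same true crime rate, but one starts with a biased encounter history, and patrols are allocated by encounter counts.

```python
import random

random.seed(42)

# Hypothetical illustration: neighborhoods A and B have the SAME underlying
# crime rate, but B starts with more recorded police encounters. Because
# patrols are allocated in proportion to past encounters, the over-policed
# neighborhood keeps accumulating more encounters each year.
TRUE_CRIME_RATE = 0.05           # identical in both neighborhoods
encounters = {"A": 10, "B": 30}  # B starts with a biased encounter history

for year in range(10):
    total = sum(encounters.values())
    for hood in encounters:
        # patrols assigned proportionally to past encounters
        patrols = 100 * encounters[hood] / total
        # each patrol records an encounter at the (equal) true crime rate;
        # more patrols -> more recorded encounters, regardless of actual risk
        encounters[hood] += sum(
            random.random() < TRUE_CRIME_RATE for _ in range(round(patrols))
        )

print(encounters)  # B's recorded "threat" stays ahead of A's
```

Even though both neighborhoods are equally risky, the initial bias in the data is preserved and amplified in absolute terms, which is the feedback loop students will be asked to identify in the hands-on activity.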

In the fourth module we will take a close look at our next ethical issues: politics and privacy. We will start by analyzing the documentary The Great Hack, using the Cambridge Analytica scandal as a case study in how social media companies can influence political elections. Next we will cover an even more recent issue: TikTok and the privacy concerns surrounding it. Then we will pivot to a more general overview of how the notion of privacy has changed as more and more aspects of our lives have moved online. We will briefly cover the legal aspects: our “right to privacy” and how it is protected by laws and regulations at the state and federal levels. We will then look at how our economic system has created an environment in which companies are built around monetizing our data, through excerpts from Professor Zuboff’s book The Age of Surveillance Capitalism. Nowhere is this more obvious than in the social media industry, and we will take the time to watch and discuss the Netflix documentary The Social Dilemma. Students will learn about the problematic business models of these companies, as well as just how much influence they have over the information we receive. It is our hope that students come away from this module understanding how surveillance capitalism erodes democracy and human autonomy by influencing our thoughts and behaviors through social media. To wrap up, students will complete another activity in which they devise the policy they think best tackles the problem of surveillance capitalism and the monetization of personal information.

One of the strongest aspects of our course will be its interdisciplinary nature. We believe Lafayette is uniquely suited to hold such a class within the EGRS major. The class was structured specifically to avoid the more technical side of data mining and instead analyze its place and uses in society at large. We want to draw students of all academic backgrounds into the discussion, not just computer science majors. Our analysis of the underlying ethical issues will benefit from the diversity of perspectives of majors such as Policy Studies, Economics, and A&S, to name a few.

We conducted interviews with Professors Lopez and Liew at Lafayette and gained valuable insight into the nature of data mining ethics education, as well as advice on how to make the class impactful and effective. Both professors stressed that students must understand the inherent biases in datasets, as well as how the choices we make introduce bias into an algorithm. The best way to get this point across is through practical data lessons in which students manipulate datasets for themselves and draw their own conclusions. We want to teach students to challenge the existing data rather than simply accepting it and running it through an algorithm. These activities were designed so that students can see how the data, and the choices we make about the data, can be biased, rather than just the algorithm itself. Students do not need a deep knowledge of statistics to complete the activity; in keeping with the course’s interdisciplinary nature, we plan on using Excel and simple analyses. Professor Liew mentioned that not many undergraduates get access to large datasets like the ones we plan on using. Working with data (especially in our age of big data) is an important skill no matter what career you are interested in, and we hope this unique activity will further entice students to our class.
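As a sketch of the kind of lesson this activity aims at (using hypothetical numbers, and Python rather than Excel), consider how a data-collection choice, not the algorithm itself, can bias even a simple average:

```python
import random
import statistics

random.seed(1)

# The "algorithm" here is just a mean, yet a sampling choice biases it.
# Hypothetical town: half commute by car (~20 min), half by bus (~45 min).
car = [random.gauss(20, 5) for _ in range(500)]
bus = [random.gauss(45, 5) for _ in range(500)]
population = car + bus

true_mean = statistics.mean(population)  # close to 32.5 minutes

# Data-collection choice: the survey was run in a parking garage,
# so only car commuters ended up in the dataset.
surveyed = car
biased_mean = statistics.mean(surveyed)  # close to 20 minutes

print(f"true average:   {true_mean:.1f} min")
print(f"survey average: {biased_mean:.1f} min")  # understates commutes
```

The averaging step is perfectly neutral; the conclusion is wrong because of who was left out of the data, which is exactly the habit of questioning datasets that both professors emphasized.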

We also plan to make multimedia an integral part of our class in order to avoid dull, droning lectures. By incorporating documentaries, interactive websites, replays of congressional hearings, and talks by speakers from outside Lafayette, we hope to engage students more fully while delivering an impactful message about how relevant this topic is in society today. A complete list of our multimedia sources can be found below our course syllabus (see Appendix).
