“The world’s most valuable resource is no longer oil, but data,” according to a 2017 Economist article. The claim seems even more relevant today, as technology becomes ever more intertwined with our daily lives. Yet while seemingly true on the surface, the headline misunderstands the purpose of data and our relationship with it.

Figure 2. Big Data (MIT Technology Review, 2019)

Data is not necessarily significant in and of itself; it becomes useful when you derive meaning from the raw information. Enter data mining, the process of discovering patterns within large datasets. Data mining is technically just one step in the broader Knowledge Discovery in Databases (KDD) process – finding knowledge in data – but it has become the buzzword people associate with extracting information from data. Humans can’t pore through millions of data points on their own, so we employ computers to do it for us. Data mining therefore draws on fields such as machine learning, statistics, and computer science to discern patterns and trends in raw information. As digitization becomes more widespread, so do applications for data mining. Retail stores, supermarkets, financial institutions, telecommunication companies, police departments, and social media companies are just some examples of where data mining can be used. The information gleaned through the mining process can make businesses more effective and efficient by reducing costs, increasing sales, improving marketing, and raising customer retention, among numerous other applications.
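To make “discovering patterns” concrete, here is a minimal sketch of frequent-itemset mining, the technique behind retail market-basket analysis. The transactions, item names, and support threshold below are invented for illustration; real systems use optimized algorithms such as Apriori or FP-Growth over millions of transactions.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transaction data; a real retail dataset holds millions of rows.
transactions = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
    {"bread", "eggs"},
]

def frequent_pairs(transactions, min_support=0.4):
    """Return item pairs that appear in at least min_support of all baskets."""
    counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    n = len(transactions)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

# {('bread', 'milk'): 0.6, ...} -- "customers who buy bread often buy milk"
print(frequent_pairs(transactions))
```

The pattern itself is trivial; the value comes from running the same counting logic over data far too large for a person to inspect by hand.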

These benefits come at a cost, however, and as data mining becomes more prevalent in society, people are beginning to shed light on some of the ethical issues that have arisen. At the core of data mining is information on individuals. Some people are uncomfortable with companies sifting through their personal information and using it for their own purposes. “Big data” critics and concerned citizens alike have raised questions about how much access businesses should have to user data and how they actually use it. The ethical issues surrounding data mining differ by industry and application, but we have boiled them down to a few main problems: discrimination, privacy, and political influence.

To study discrimination, we turn to law enforcement. As our technological capabilities advance, law enforcement has found ways to integrate that technology into its work. Today, police departments are using data mining in what is known as predictive policing to help them fight crime. Numerous programs carry out this process. One example, PredPol, is currently used by 60 police departments in the US (Puente, 2019). The software trains its algorithm on historical event datasets to highlight areas where crime is likely to take place. Another app, Neighbors, is widely used by police to identify potential suspects: users grant police departments access to their Ring security cameras and upload feeds to the app, ostensibly so police can watch crimes being committed. What analysts have found, however, is that the majority of the “suspects” police act upon are minorities, and some have not committed any crime at all. A third program, “LASER,” implemented by the LAPD, targeted the individuals it judged most likely to commit crimes. The software reached its conclusions by assigning each person a score based on a number of factors, including past criminal history and police interactions; the higher the score, the more “at risk” the person was considered. The program was eventually dissolved, but citizens remain concerned that subsequent arrests have been based on the software’s predictions (Racial Profiling 2.0, 2020).
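To illustrate how such a point-based score might work, here is a deliberately simplified sketch. The factors, weights, and names are hypothetical – this does not reproduce LASER’s actual formula – but the structure shows how mere police contact, absent any conviction, can come to dominate a person’s score.

```python
# Hypothetical point-based risk score in the spirit of programs like LASER.
# Factors and weights are invented for illustration only.
def risk_score(person):
    score = 0
    score += 5 * person.get("violent_arrests", 0)
    score += 3 * person.get("police_stops", 0)   # mere contact adds points
    score += 5 if person.get("on_parole") else 0
    return score

people = [
    {"name": "A", "violent_arrests": 1, "police_stops": 2},  # score 11
    {"name": "B", "police_stops": 6},                        # score 18
]

# Rank by score: B outranks A despite having no arrests on record,
# purely on the strength of repeated police stops.
print(sorted(people, key=risk_score, reverse=True))
```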

Many people believe these programs are unjust and biased. The predictions are based on the data fed into the algorithm, but what if that data is itself biased? We know that minorities are more likely than white people to be stopped by police, so what happens when a minority is pulled over for an unfounded reason? That individual is entered into a database that deems them more likely to commit a crime, which leads them to be more heavily scrutinized by police and more likely to be pulled over again. Because police are biased against minorities, predictive policing programs that use this data perpetuate, and possibly even amplify, that bias.
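A toy simulation makes this feedback loop concrete. Everything below is invented for illustration – two areas with identical true crime rates and a patrol budget allocated in proportion to past records – so this is a sketch of the dynamic, not a model of any real predictive-policing system.

```python
import random

random.seed(42)

TRUE_RATE = 0.05                 # identical underlying crime rate in both areas
records = {"A": 12, "B": 10}     # area A starts higher due to biased past stops
PATROLS_PER_YEAR = 100

for year in range(1, 11):
    total = sum(records.values())
    for area in records:
        # Patrols are allocated in proportion to *recorded* incidents.
        patrols = round(PATROLS_PER_YEAR * records[area] / total)
        # Each patrol records a crime with the same probability in both areas.
        records[area] += sum(random.random() < TRUE_RATE for _ in range(patrols))
    print(year, records)

# Although both areas have the same true crime rate, the initial imbalance
# never washes out: area A keeps receiving more patrols, so it keeps
# accumulating more records, and the data appears to "confirm" the bias.
```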

For privacy concerns, we look to social media companies. While we stay glued to our phones, these companies collect data on what we’re reading, what we’re watching, and which pages we engage with most. They can then sell that data to ad companies, which target us with promotions designed to grab our attention. The Netflix documentary The Social Dilemma portrayed these companies as puppeteers who compile profiles of us from our data and then control what we see on our screens. There is a huge economic incentive to gather user data: Facebook, for example, made $67.9 billion from advertising in 2019, more than 98% of its total revenue for that year (Iyengar, 2020). People are becoming increasingly concerned with online privacy and how companies use their data (Rainie, 2020). More than 3 billion people use social media, and that number grows every year (Dean, 2020). The average user spends multiple hours on these platforms every day, generating an almost unfathomable amount of data. This makes social media an ideal case study for what can happen to our online information. Although privacy policies receive heightened attention today, people are still not protected, since the third-party corporations that receive the data often handle it under different policies than the social media companies they got it from. In some cases, users are not even aware of, and have not consented to, their data being analyzed or sold, and they have no idea what it is being used for. This lack of transparency is particularly concerning, and the privacy issues will only get worse as social media becomes more prevalent. These platforms exist for users to express themselves freely online; if people are constantly worried about their data being passed around to third-party corporations, they will become more guarded or may stop using social media altogether.

Next, we look at how data mining can be problematic in a political context. Previously, we saw how social media companies, through data mining and the transfer of data to third parties, decide to a certain extent what we see on our screens. It would appear, then, that we lose some autonomy when engaging with our devices. All this information being put in front of us can influence our way of thinking; if political in nature, it could persuade us to join certain protests or even vote a certain way in an election. What is especially concerning about this subtle process is that we have no say in which ads are shown to us. One might ask, then: are these political views even our own? It is a scary thought that social media companies might in fact determine the outcome of political elections by controlling our flow of information. In 2016, this actually happened. The Trump team hired Cambridge Analytica to run the digital arm of its campaign. Cambridge Analytica gathered data from surveys and apps on Facebook to build profiles of American voters – they bragged of having around 5,000 data points per person (The Great Hack) – and then targeted users with ads, specifically users whose profiles indicated they could be swayed to vote for Trump in the upcoming election. They obtained this data from Facebook illegally, without users’ knowledge or consent. In the documentary The Great Hack, the producers clearly present this large-scale data analysis as a threat to individual autonomy and democracy.

Our solution to the ethical issues that arise from data mining does not involve changing policy or implementing new regulations, at least not directly. We believe that focusing on education about these technologies and algorithms is a fundamental first step toward enacting real change, because the lack of education surrounding the consequences of these methods perpetuates injustice. To that end, we have laid out a data mining ethics course that teaches students to incorporate ethical considerations into the design process, in the hope that they will use this knowledge to create more equitable technology after they graduate. The curriculum will highlight the issues data mining creates through lectures, multimedia sources, readings, and visiting professor talks, the goal being a more engaging and diverse class than a standard lecture. The main modules of the course are: 1) Technology as a Socio-Technical System, 2) Introduction to Data Mining, 3) Biases in Data Mining, and 4) Privacy and Data Mining.

Module one analyzes data mining as a socio-technical system. It comes first because it is the most important and will frame our discussion of data mining for the entire semester. Students need to see how technology and society shape each other. From there, we will discuss how technology is inherently human – instilled with human values because we created it – and how that may cause some of the resulting problems. By learning from the start that data mining is not a singular piece of technology but an entire socio-technical system, students will better understand the issues surrounding it and, we hope, be better equipped to fix them in the future. In this theme we will focus on the community: how it should be incorporated into the design phase and how it can provide feedback on the technology.

Our modules are designed to be interdisciplinary, and we hope to attract students beyond those with engineering backgrounds. The issues that arise from data mining affect many different people in many different ways, and including students from all sorts of academic backgrounds will allow us to view these problems more comprehensively. We hope that by educating students on these problems, and by framing data mining as a socio-technical system, they will take that knowledge and change the way data mining is used, alleviating some of the most prevalent issues today.

 
