When designing a curriculum that helps students recognize and understand the importance of data mining ethics, it is helpful to look at the educational systems currently in place, especially those for computer science majors. After speaking with two professors in the Computer Science Department at Lafayette College, we were able to better understand how current courses incorporate ethics into their curricula. Professor Chun Wai Liew, an Associate Professor and Department Head of Computer Science at Lafayette College, shared his views with us about data mining ethics. He stated that one problem within the computer science field is the tendency to treat machines and data as neutral. Data and machines carry biases and need to be treated as such in order to proactively fight against that bias. A course at Lafayette College, CS 200 – Computers and Society, “examines the computer’s cultural context: the managerial, political, legal, ethical, psychological, and philosophical implications of computing.” Professor Liew emphasized the importance of letting students see the tangible effects of bias in data, stating that “class can be less effective when focusing on the abstract, students need to encounter it.” (Prof. Chun Wai Liew). In a separate interview, Professor Christian Lopez, an Assistant Professor of Computer Science at Lafayette, described a potential activity that would allow students to witness certain biases in data firsthand. Giving students access to data and having them run algorithms that have disproportionate effects on some students rather than others clearly shows the reality of biased data. Along with creating case studies in the classroom, the professors also recommended documentaries, which can deliver powerful messages to people who do not fully understand the effects of bias or the privacy issues that arise from data-driven software.
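The classroom activity Professor Lopez described could take many forms. One minimal sketch, assuming a hypothetical toy dataset, is shown below: a deliberately simple scoring rule is applied to two invented groups of students, and the fraction of each group the rule selects is compared. The records, group labels, the `threshold` of 70, and the function names are all illustrative assumptions, not part of any course discussed here. The 0.8 cutoff follows the commonly cited "four-fifths rule" heuristic for flagging possible disparate impact; it is used only as a discussion prompt, not as a definitive test of fairness.

```python
from collections import defaultdict

# Hypothetical toy records: (group, test_score). The scores are invented
# for illustration and carry no real-world meaning.
RECORDS = [
    ("A", 82), ("A", 75), ("A", 91), ("A", 68), ("A", 88),
    ("B", 71), ("B", 64), ("B", 79), ("B", 58), ("B", 66),
]


def select(score, threshold=70):
    """A deliberately simple 'algorithm': select anyone at or above the threshold."""
    return score >= threshold


def selection_rates(records, threshold=70):
    """Return the fraction of each group that the rule selects."""
    selected = defaultdict(int)
    totals = defaultdict(int)
    for group, score in records:
        totals[group] += 1
        if select(score, threshold):
            selected[group] += 1
    return {group: selected[group] / totals[group] for group in totals}


def disparate_impact_ratio(rates):
    """Lowest group selection rate divided by the highest group selection rate."""
    return min(rates.values()) / max(rates.values())


if __name__ == "__main__":
    rates = selection_rates(RECORDS)
    ratio = disparate_impact_ratio(rates)
    for group, rate in sorted(rates.items()):
        print(f"group {group}: selection rate {rate:.2f}")
    print(f"disparate impact ratio: {ratio:.2f}")
    # The "four-fifths rule" heuristic treats a ratio below 0.8 as a warning sign.
    print("below 0.8 threshold" if ratio < 0.8 else "at or above 0.8 threshold")
```

With this invented data, the rule selects 80% of group A but only 40% of group B, giving a ratio of 0.50. Having students change the hypothetical threshold and rerun the script makes a concrete point for discussion: the cutoff, like the choice of data, is a human decision.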
When discussing what undergraduates can feasibly be expected to learn, Professor Liew indicated that a clear understanding of the ethical issues is both necessary and the appropriate limit of what should be taught to undergraduate students. We agreed that if classes were to focus on complicated solutions to the current issues in data mining ethics, they would lose sight of the goal, which should be to create awareness.
To summarize both interviews, our key takeaways from the discussions with Professor Liew and Professor Lopez are as follows. First, the approach to data cannot be neutral. One cannot assume that data is naturally unbiased, because every decision about which data to include and which to exclude is a human decision. Second, students must see data bias in practice rather than only discuss it in the abstract. This teaches them what questions to ask and how to ask them. Lastly, the goal of a data mining ethics course should be to create awareness of these issues so that, after graduation, students are able to tackle the ethical issues that arise in data mining. For the course to be feasible, both professors also stressed the importance of making it attractive, so that students will take it and come away with information they can apply outside of academia.
Figure 16: Image promoting education in ethics (Ethical Education, 2020)
Other colleges with computer science and data science majors sometimes provide an ethics course involving law and data mining, or at least a module covering some form of engineering ethics. Reviewing courses at different colleges shows how much weight ethics receives in computer science related subjects. Data mining ethics is not a new topic for undergraduate or graduate schools; however, as industries evolve and use data in new ways, courses must evolve with the technology and with the ethical understanding of how that data is used. Learning how different schools do and do not incorporate these topics into their curricula will help clarify the need for adaptation and help prepare students for what they will face in the future. The schools we chose to examine are MIT, Harvard University, the University of Pennsylvania, Stanford University, Tufts University, Boston University, and the University of Chicago.
MIT’s computer science program lists specialties in algorithms and data structures, artificial intelligence, computer design and engineering, computer networks, cryptography, data mining, and others. Courses vary under each specialty; however, very few focus strictly on ethics. One course, titled Ethics and the Law on the Electronic Frontier, covers topics including policy making and the structure of law, regulating the decentralized internet, regulating government use of surveillance technology, and the transparency challenge, which examines the tension between consumers’ rights and the data they unknowingly provide (MIT Courseware). While this was the only computer science course that focused specifically on ethical issues, other courses included modules or briefer discussions of ethics. For an institution as prestigious as MIT, the lack of focus on the ethical issues stemming from technology, and from data mining in particular, is startling.
Harvard University is also renowned for its computer science program. The course titled Privacy and Technology aims to answer “What is privacy, and how is it affected by recent developments in technology?” (Harvard. Edu, Course Listings). This course covers a wide range of topics, including privacy, policy, and ethics, by examining case studies such as “database anonymity, research ethics, wiretapping, surveillance, and others.” Like MIT, Harvard University has only a small collection of courses whose primary focus is ethics in data and computer science, but it does incorporate some of these ideas into other classes. For example, the course titled “Great Ideas in Computer Science” is described as “an introduction to the most important discoveries and intellectual paradigms in computer science.” (Harvard. Edu, Course Listings). Although this class is meant for people with little or no experience in data science, it explores some of the historical ethical dilemmas presented by data. Because data science and computer science are both highly technical, it makes sense that courses in each field are technically heavy. “Great Ideas in Computer Science,” however, is not strictly about technical practice; it also addresses the humanistic and societal consequences of data and software. When approaching the numerous issues surrounding data mining ethics, it is necessary to combine the technical with the sociological in order to work effectively toward an equitable solution.
The University of Pennsylvania has a course called “Science of Data Ethics.” While this course does address some of the social issues surrounding data mining, it is still highly technical and requires experience in statistics and probability, although it demands less technical background than some of the other courses reviewed. The course covers a wide range of topics and includes experimental modules that students complete in order to observe ethical issues in data mining firsthand. It focuses mainly on privacy issues and analyzes the mechanisms currently in place to support privacy rights. It also examines different ethical theories and applies them to the modern world. While most of the material in this class is drawn from scientific literature, the class also uses some “mainstream media and other articles” (UPenn. Edu, Science of Data Ethics). We believe this is highly beneficial: because this topic is rapidly evolving and becoming a larger part of our lives, it is important to recognize and understand the real-life applications of data mining and whom they affect most. This course appears to cover important material while providing real-life examples from the media and in-class data experiments that let students see the tangible effects of ethical issues surrounding data mining.
Another course we reviewed is “Data Privacy and Ethics,” taught at Stanford University. This course aligns directly with the material we researched, and its course description reads as follows: “This course engages with difficult ethical challenges in the modern practice of data science. The three main focuses are data privacy, personalization and targeting algorithms, and online experimentation. The focus on privacy will raise both practical and theoretical considerations. As part of the module on experimentation, students will be required to complete the Stanford IRB training for social behavioral research. The course will assume a strong familiarity with the practice of machine learning and data science.” (Stanford. Edu, Data Privacy and Ethics). Judging from this description and the other material listed for the course, “Data Privacy and Ethics” appears to analyze in depth a wide range of ethical issues, such as privacy, policing, policy, and theoretical topics surrounding data mining. The course draws on articles, scientific research, and many other sources to inform students of these issues. It also requires experience in computer science and coding, which may prevent some students from taking it; however, because of the in-depth analysis students must perform on data, the course seems better suited to students within the field.
Tufts University provides a multitude of courses that apply to data mining ethics. One in particular, “Social Context Comp: Exploration of CS Ethics,” is described as follows: “Computing permeates our lives and environment, raising issues of fairness, safety, security, and privacy, among others, in a variety of contexts, including social media, connected vehicles, and socio-economic stratification. It is increasingly difficult for computing professionals to avoid these kinds of issues. This course aims to equip practitioners with background knowledge (including some relevant history) and conceptual tools (including ethical frameworks and ways of thinking about risk) for thinking constructively-both as computing professionals and as members of society-about challenging ethical and policy issues in which information technology plays a key role. As part of this process, we will apply this thinking to a number of relevant historical and contemporary case studies. Upon completing this course, students will be in an improved position to arrive at defensible ethical analyses and conclusions. This course assumes a basic knowledge of computer science, software engineering, and/or information systems, such as one might obtain from an introductory or survey course or from practical experience. An interest in current events related to these is also helpful.” (Tufts. Edu, CS Undergraduate Course Descriptions). This course also aligns directly with the diversity of ethical issues raised by data mining. One benefit is that it requires minimal coding experience, allowing students from other disciplines to take the course and be exposed to the issues it raises. The course description also emphasizes the wide range of contexts in which these issues arise, which gives students a well-rounded experience.
Boston University has a highly respected Data Science major and offers a course titled “Data Ethics: Analytics in Social Context,” which relates directly to data mining ethics and the issues raised earlier in this paper. This course focuses mainly on privacy issues and the “blurry line between the private and the public spheres in the digital age.” (Boston University. Edu, Courses). Like other courses reviewed, it aims to place analytics in a social context, as its title suggests; however, its narrowed focus on privacy allows students to develop a deeper understanding of that specific content rather than covering other contexts, such as the economic and political issues that result from data mining. The course is offered in the College of Engineering at Boston University and is therefore not available to students in other disciplines. Boston University’s size and segmented programs produce students with specialized interests rather than the well-rounded understanding a liberal arts education provides. This can be either advantageous or disadvantageous to the student; however, taking “Data Ethics: Analytics in Social Context” as an example, it would plausibly be better if students from different disciplines had access to a course on societal contexts, rather than only Data Science majors.
As seen, some schools dedicate an entire course, or even multiple courses, to data mining ethics and related topics, while others make data mining ethics a module within a larger course. The University of Chicago uniquely offers a stand-alone, three-hour course titled “Introduction to Ethics in Data Analytics.” “[Because it is] a stand-alone offering, the course has no formal syllabus”; however, the course description states that “the goal of the course is to offer students a workable, introductory understanding of current ethical challenges they will face in their careers as data science professionals.” The course covers topics including “privacy; data, discrimination, and disparate impact; and algorithmic bias.” (Graham School University of Chicago. Edu, Introduction to Ethics in Data Analytics). This course is interesting because it explores the different contexts surrounding data mining ethics with less emphasis on students’ technical coding ability. Although it introduces the topics relevant to data mining ethics and allows students to begin an ethical conversation about them, the class is only three hours long and cannot convey the importance and detail of these issues as fully as a full-semester course would. Neither this course nor the others we reviewed explicitly includes an analysis of specific software in use today, which would require discussing how such software can be changed to better society instead of creating societal discrepancies. For example, PredPol, the software discussed earlier as producing disproportionate results for low-income and minority ethnic populations, is still in use today. “Introduction to Ethics in Data Analytics” is one of the few courses that focuses on data biases; however, an in-depth understanding of these technologies is necessary to allow future generations to make productive changes.
Although Lafayette College, MIT, and Harvard University incorporate ethics into some of their courses, some of the ethical topics mentioned above are missing. In an interview on Digital Surveillance and People Power, Virginia Eubanks states, “we have to rebuild engineering education to give students the tools to even ask the questions we’re talking about. Because just giving them a week of ethics or that one history requirement isn’t enough.” (Eubanks, Digital Surveillance and People Power). Some of the courses discussed do analyze and introduce real-life topics in data mining ethics that need reform; however, each could be improved or expanded so that students can more effectively pursue solutions to those issues. The syllabus for the seminar “Engineering and Society” at Lafayette College states that the course objectives are to “examine the ways cultural values shape technologies, social foundations define the role of engineers, and engineers influence the broader world in efforts to achieve progress.” (B.R. Cohen). Analyzing the cultural influences and effects of technology is crucial to achieving progress, especially in data mining. Programs, especially those leading to careers in creating algorithms, should share some of the objectives of Engineering and Society while also teaching the technical skills needed to apply solutions to social, economic, and political issues. When attempting to reform education on these issues, it is important to consider the feasibility of creating a course that can effectively educate students about data mining ethics. As Lafayette College’s Professor Liew stated in his interview, the goal should be to educate, inform, and create awareness surrounding these issues; however, as seen in other course curricula, an analysis of potential solutions can also help lead to a more equitable future.