Instructor:          Prof. Christian Lopez, 569 Rockwell Integrated Science Center, lopezbec@lafayette.edu

Class:                     (SEE MOODLE/BANNER)

Office Hours:     (SEE MOODLE)

Prerequisite:      An introductory statistics course AND an introductory computing course (SEE BANNER)

________________________________________________________________________________________________________

Course Description

This survey course introduces the principles of data science. Specifically, it covers how to (i) collect and manage large sets of data, (ii) summarize and visualize data to convey information in a meaningful way, and (iii) implement basic machine learning models, all while taking ethics and privacy into consideration.

We will use example datasets from multiple domains and problems to study the effectiveness of different techniques and approaches. This course can be viewed as a fusion of a computing course focused on programming and algorithms and a statistics course focused on estimation and inference. Although we will use multiple programming languages (mainly Python 3 and R), this is not a programming class. Students are therefore not required to be “experts” in any language; instead, they are expected to have a basic understanding of programming constructs (e.g., functions, loops, variables).

Student Learning Outcomes

Upon completion of this course, students will be able to:

  • Acquire data through web-scraping and data APIs.
  • Extract, transform and load messy and unstructured datasets.
  • Use SQL to create tables and insert, modify, retrieve, and delete data.
  • Work with a database composed of at least two tables.
  • Perform Exploratory Data Analysis using multiple techniques and tools.
  • Create effective visualizations.
  • Understand the advantages and disadvantages of different visualization approaches.
  • Apply machine learning methods and assess the quality of model output (predictions).
  • Develop scripts to build pipelines.
  • Effectively communicate results.
  • Work effectively and collaboratively in teams on data science projects.
  • Discuss the ethical and privacy considerations in the decision to collect, store, use, and/or display a piece of data (or parts of it).
