29. Cloud Computing
Forthcoming…