The Chicago Guide to Data Science
30. Scalable Data Processing
Forthcoming…