# Quantitative Social Science: An Introduction

432 pages, Hardcover

Japanese

2018

9780691167039

Princeton University Press

I decided to write this book in order to convince the next generation of students and researchers that data analysis is a powerful tool for answering many important and interesting questions about societies and human behavior. Today's societies confront a number of challenging problems, including those in economics, politics, education, and public health. Data-driven approaches are useful for solving these problems, and we need more talented individuals to work in this area. I hope that this book will entice young students and researchers into the fast-growing field of quantitative social science.

This book grew out of the two undergraduate courses I have taught at Princeton over the last several years: POL 245: Visualizing Data and POL 345: Quantitative Analysis and Politics. While teaching these courses, I realized that students need to be exposed to exciting ideas from actual quantitative social science research as early in the course as possible. For this reason, unlike traditional introductory statistics textbooks, this book features data analysis from the very beginning, using examples directly taken from published social science research. The book provides readers with extensive data analysis experience before introducing probability and statistical theories. The idea is that by the time they reach those challenging chapters, readers will understand why those materials are necessary in order to conduct quantitative social science research.

The book starts with a discussion of causality in both experimental and observational studies using the examples of racial discrimination and get-out-the-vote campaigns. We then cover measurement and prediction as two other primary goals of data analysis in social science research. The book also includes a chapter on the analysis of textual, network, and spatial data, giving readers a glimpse of modern quantitative social science research. Probability and statistical theories are introduced after these data analysis chapters. The mathematical level of the book is kept to a minimum and neither calculus nor linear algebra is used. However, the book introduces probability and statistical theories in a conceptually rigorous manner so that readers can understand the underlying logic. (Taken from the preface of the book)

(Written by Kosuke Imai, Professor of Government and of Statistics, Harvard University / Professor of Graduate Schools for Law and Politics / 2018)

## Table of Contents

**1 Introduction**

1.1 Overview of the Book

1.2 How to Use this Book

1.3 Introduction to R

1.3.1 Arithmetic Operations

1.3.2 Objects

1.3.3 Vectors

1.3.4 Functions

1.3.5 Data Files

1.3.6 Saving Objects

1.3.7 Packages

1.3.8 Programming and Learning Tips

1.4 Summary

1.5 Exercises

1.5.1 Bias in Self-Reported Turnout

1.5.2 Understanding World Population Dynamics

**2 Causality**

2.1 Racial Discrimination in the Labor Market

2.2 Subsetting the Data in R

2.2.1 Logical Values and Operators

2.2.2 Relational Operators

2.2.3 Subsetting

2.2.4 Simple Conditional Statements

2.2.5 Factor Variables

2.3 Causal Effects and the Counterfactual

2.4 Randomized Controlled Trials

2.4.1 The Role of Randomization

2.4.2 Social Pressure and Voter Turnout

2.5 Observational Studies

2.5.1 Minimum Wage and Unemployment

2.5.2 Confounding Bias

2.5.3 Before-and-After and Difference-in-Differences Designs

2.6 Descriptive Statistics for a Single Variable

2.6.1 Quantiles

2.6.2 Standard Deviation

2.7 Summary

2.8 Exercises

2.8.1 Efficacy of Small Class Size in Early Education

2.8.2 Changing Minds on Gay Marriage

2.8.3 Success of Leader Assassination as a Natural Experiment

**3 Measurement**

3.1 Measuring Civilian Victimization during Wartime

3.2 Handling Missing Data in R

3.3 Visualizing the Univariate Distribution

3.3.1 Bar Plot

3.3.2 Histogram

3.3.3 Box Plot

3.3.4 Printing and Saving Graphs

3.4 Survey Sampling

3.4.1 The Role of Randomization

3.4.2 Nonresponse and Other Sources of Bias

3.5 Measuring Political Polarization

3.6 Summarizing Bivariate Relationships

3.6.1 Scatter Plot

3.6.2 Correlation

3.6.3 Quantile–Quantile Plot

3.7 Clustering

3.7.1 Matrix in R

3.7.2 List in R

3.7.3 The k-Means Algorithm

3.8 Summary

3.9 Exercises

3.9.1 Changing Minds on Gay Marriage: Revisited

3.9.2 Political Efficacy in China and Mexico

3.9.3 Voting in the United Nations General Assembly

**4 Prediction**

4.1 Predicting Election Outcomes

4.1.1 Loops in R

4.1.2 General Conditional Statements in R

4.1.3 Poll Predictions

4.2 Linear Regression

4.2.1 Facial Appearance and Election Outcomes

4.2.2 Correlation and Scatter Plots

4.2.3 Least Squares

4.2.4 Regression towards the Mean

4.2.5 Merging Data Sets in R

4.2.6 Model Fit

4.3 Regression and Causation

4.3.1 Randomized Experiments

4.3.2 Regression with Multiple Predictors

4.3.3 Heterogeneous Treatment Effects

4.3.4 Regression Discontinuity Design

4.4 Summary

4.5 Exercises

4.5.1 Prediction Based on Betting Markets

4.5.2 Election and Conditional Cash Transfer

4.5.3 Government Transfer and Poverty Reduction in Brazil

**5 Discovery**

5.1 Textual Data

5.1.1 The Disputed Authorship of The Federalist Papers

5.1.2 Document-Term Matrix

5.1.3 Topic Discovery

5.1.4 Authorship Prediction

5.1.5 Cross Validation

5.2 Network Data

5.2.1 Marriage Network in Renaissance Florence

5.2.2 Undirected Graph and Centrality Measures

5.2.3 Twitter-Following Network

5.2.4 Directed Graph and Centrality

5.3 Spatial Data

5.3.1 The 1854 Cholera Outbreak in London

5.3.2 Spatial Data in R

5.3.3 Colors in R

5.3.4 US Presidential Elections

5.3.5 Expansion of Walmart

5.3.6 Animation in R

5.4 Summary

5.5 Exercises

5.5.1 Analyzing the Preambles of Constitutions

5.5.2 International Trade Network

5.5.3 Mapping US Presidential Election Results over Time

**6 Probability**

6.1 Probability

6.1.1 Frequentist versus Bayesian

6.1.2 Definition and Axioms

6.1.3 Permutations

6.1.4 Sampling with and without Replacement

6.1.5 Combinations

6.2 Conditional Probability

6.2.1 Conditional, Marginal, and Joint Probabilities

6.2.2 Independence

6.2.3 Bayes’ Rule

6.2.4 Predicting Race Using Surname and Residence Location

6.3 Random Variables and Probability Distributions

6.3.1 Random Variables

6.3.2 Bernoulli and Uniform Distributions

6.3.3 Binomial Distribution

6.3.4 Normal Distribution

6.3.5 Expectation and Variance

6.3.6 Predicting Election Outcomes with Uncertainty

6.4 Large Sample Theorems

6.4.1 The Law of Large Numbers

6.4.2 The Central Limit Theorem

6.5 Summary

6.6 Exercises

6.6.1 The Mathematics of Enigma

6.6.2 A Probability Model for Betting Market Election Prediction

6.6.3 Election Fraud in Russia

**7 Uncertainty**

7.1 Estimation

7.1.1 Unbiasedness and Consistency

7.1.2 Standard Error

7.1.3 Confidence Intervals

7.1.4 Margin of Error and Sample Size Calculation in Polls

7.1.5 Analysis of Randomized Controlled Trials

7.1.6 Analysis Based on Student’s t-Distribution

7.2 Hypothesis Testing

7.2.1 Tea-Tasting Experiment

7.2.2 The General Framework

7.2.3 One-Sample Tests

7.2.4 Two-Sample Tests

7.2.5 Pitfalls of Hypothesis Testing

7.2.6 Power Analysis

7.3 Linear Regression Model with Uncertainty

7.3.1 Linear Regression as a Generative Model

7.3.2 Unbiasedness of Estimated Coefficients

7.3.3 Standard Errors of Estimated Coefficients

7.3.4 Inference about Coefficients

7.3.5 Inference about Predictions

7.4 Summary

7.5 Exercises

7.5.1 Sex Ratio and the Price of Agricultural Crops in China

7.5.2 File Drawer and Publication Bias in Academic Research

7.5.3 The 1932 German Election in the Weimar Republic

**8 Next**

General Index

R Index