R Econometrics
Purpose
This skill helps economists run rigorous econometric analyses in R, including Instrumental Variables (IV), Difference-in-Differences (DiD), and Regression Discontinuity Design (RDD). It generates publication-ready code with proper diagnostics and robust standard errors.
When to Use
- Running causal inference analyses
- Estimating treatment effects with panel data
- Creating publication-ready regression tables
- Implementing modern econometric methods (two-way fixed effects, event studies)
Instructions
Step 1: Understand the Research Design
Before generating code, ask the user:
- What is your identification strategy? (IV, DiD, RDD, or simple regression)
- What is the unit of observation? (individual, firm, country-year, etc.)
- What fixed effects do you need? (entity, time, two-way)
- How should standard errors be clustered?
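The Step 1 answers map almost directly onto `fixest` syntax: fixed effects go after a `|` in the formula, and clustering is a one-sided formula. The sketch below is a hypothetical base-R helper (`build_design()` is illustrative, not part of `fixest`). It only assembles the two formulas that `feols()` accepts and does not estimate anything.

```r
# Hypothetical helper (not part of fixest): turn Step 1 answers into a
# fixest-style formula (y ~ x | fe1 + fe2) and a one-sided cluster formula.
build_design <- function(outcome, treatment, fixed_effects = character(0),
                         cluster = NULL) {
  fml_text <- paste(outcome, "~", paste(treatment, collapse = " + "))
  # fixest separates fixed effects from regressors with a vertical bar
  if (length(fixed_effects) > 0) {
    fml_text <- paste(fml_text, "|", paste(fixed_effects, collapse = " + "))
  }
  list(
    formula = as.formula(fml_text),
    cluster = if (is.null(cluster)) NULL else
      as.formula(paste("~", paste(cluster, collapse = " + ")))
  )
}

# Example: DiD with state and year fixed effects, clustered by state
design <- build_design("outcome", "treat_post",
                       fixed_effects = c("state", "year"),
                       cluster = "state")
deparse(design$formula)  # "outcome ~ treat_post | state + year"
deparse(design$cluster)  # "~state"
```

The two pieces can then be passed straight to the estimator, for example `feols(design$formula, data = df, cluster = design$cluster)`.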
Step 2: Generate Analysis Code
Based on the research design, generate R code that:
- Uses the `fixest` package (modern, fast, and feature-rich for panel data)
- Includes proper diagnostics:
  - For IV: first-stage F-statistics and weak-instrument tests
  - For DiD: parallel-trends visualization and event-study plots
  - For RDD: bandwidth selection and density (manipulation) tests
- Uses robust or clustered standard errors appropriate for the data structure
- Creates publication-ready output using `modelsummary` or `etable`
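For the IV diagnostics, a minimal sketch on simulated data (the data-generating process, the `firm` grouping, and the true effect of 1.5 are all assumptions made for illustration). It uses `fixest`'s three-part IV formula `outcome ~ exogenous | fixed effects | endogenous ~ instruments`, where the second-stage coefficient on the instrumented regressor is named `fit_x`. `fitstat()` with `ivf` reports the first-stage F-statistic and `ivwald` its Wald version computed with the model's (here clustered) variance matrix.

```r
library(fixest)

# Simulated data (assumption): z is a valid instrument, u is an unobserved
# confounder that biases OLS, and the true effect of x on y is 1.5
set.seed(42)
n <- 2000
df_iv <- data.frame(
  z    = rnorm(n),
  u    = rnorm(n),
  firm = sample(1:50, n, replace = TRUE)
)
df_iv$x <- 0.8 * df_iv$z + df_iv$u + rnorm(n)
df_iv$y <- 1.5 * df_iv$x + 2 * df_iv$u + rnorm(n)

# Naive OLS with firm fixed effects: biased upward because x is correlated with u
ols_model <- feols(y ~ x | firm, data = df_iv, cluster = ~firm)

# IV: no exogenous regressors (1), firm fixed effects, x instrumented by z
iv_model <- feols(y ~ 1 | firm | x ~ z, data = df_iv, cluster = ~firm)

summary(iv_model, stage = 1)       # first-stage regression of x on z
fitstat(iv_model, ~ ivf + ivwald)  # first-stage F and its Wald version
etable(ols_model, iv_model)        # side-by-side comparison
```

Comparing the OLS and IV columns makes the direction of the endogeneity bias visible, which is worth reporting alongside the first-stage F.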
Step 3: Structure the Output
Always include:
```r
# 1. Setup and packages
# 2. Data loading and preparation
# 3. Descriptive statistics
# 4. Main specification
# 5. Robustness checks
# 6. Visualization
# 7. Export results
```
Step 4: Add Documentation
Include comments explaining:
- Why each specification choice was made
- Interpretation of key coefficients
- Limitations and assumptions
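Interpretation comments are most useful when they state the exact magnitude. A minimal sketch, assuming a log-transformed outcome and a binary regressor such as `treat_post`; `log_coef_to_pct()` is a hypothetical helper and 0.05 is an illustrative value, not an estimate. For a dummy regressor in a log-outcome model, the exact percent effect is 100 × (exp(β) − 1) rather than 100 × β, and the gap grows with |β|.

```r
# Hypothetical helper: exact percent change implied by the coefficient on a
# binary regressor when the outcome is in logs
log_coef_to_pct <- function(beta) 100 * (exp(beta) - 1)

beta_hat <- 0.05  # illustrative value only
# Example comment to place next to the estimate:
# "treat_post = 0.05 implies about a 5.1% higher outcome after treatment"
round(log_coef_to_pct(beta_hat), 2)  # 5.13
round(log_coef_to_pct(-0.20), 2)     # -18.13
```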
Example Prompts
- “Run a DiD analysis with state and year fixed effects, clustering at the state level”
- “Estimate the effect of X on Y using Z as an instrument”
- “Create an event study plot showing treatment effects by year”
- “Run a sharp RDD with optimal bandwidth selection”
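For the RDD prompt, the core estimator is a local linear regression on each side of the cutoff. The sketch below is base R with a triangular kernel and a bandwidth `h` fixed by assumption, so it is not an optimal-bandwidth procedure; data-driven (MSE-optimal) bandwidths with robust bias-corrected inference are usually obtained with `rdrobust::rdrobust()` or `rdrobust::rdbwselect()`, and the density test with `rddensity::rddensity()`, which are separate CRAN packages. `rdd_local_linear()` and the simulated data are hypothetical.

```r
# Sharp RDD sketch: local linear fit on each side of the cutoff with
# triangular kernel weights. The bandwidth h is fixed, NOT data-driven.
rdd_local_linear <- function(y, x, cutoff = 0, h = 1) {
  xc <- x - cutoff                  # center the running variable
  w <- pmax(0, 1 - abs(xc) / h)     # triangular kernel; zero outside [-h, h]
  d <- as.numeric(xc >= 0)          # sharp design: treated iff x >= cutoff
  keep <- w > 0
  # d * xc = d + xc + d:xc, so each side gets its own intercept and slope
  fit <- lm(y ~ d * xc, weights = w, subset = keep)
  unname(coef(fit)["d"])            # estimated jump in E[y | x] at the cutoff
}

# Simulated example with a true jump of 2 at x = 0
set.seed(1)
x <- runif(5000, min = -2, max = 2)
y <- 1 + 0.5 * x + 2 * (x >= 0) + rnorm(5000, sd = 0.5)

tau_hat <- rdd_local_linear(y, x, cutoff = 0, h = 0.8)

# Sensitivity check: halve and double the bandwidth
tau_by_h <- sapply(c(0.4, 0.8, 1.6),
                   function(h) rdd_local_linear(y, x, cutoff = 0, h = h))
```

Reporting estimates across several bandwidths, as in `tau_by_h`, is a cheap robustness check even when the main bandwidth comes from `rdrobust`.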
Example Output
```r
# ============================================
# Difference-in-Differences Analysis
# ============================================
# Setup
library(tidyverse)
library(fixest)
library(modelsummary)

# Load data
# Assumed columns: state, year, outcome, treatment_year
# (treatment_year is NA for never-treated states)
df <- read_csv("data.csv")

# Treated states: those with a non-missing treatment year
treatment_states <- unique(df$state[!is.na(df$treatment_year)])

# Prepare treatment variable
# The TWFE DiD below assumes all treated states adopt in the same year;
# with staggered adoption, use sunab() or the did package instead.
df <- df %>%
  mutate(
    treated = state %in% treatment_states,
    # FALSE & NA is FALSE, so never-treated rows get 0 instead of NA
    treat_post = as.integer(treated & year >= treatment_year)
  )

# ----------------------------------------
# Main DiD Specification
# ----------------------------------------
# Two-way fixed effects; the coefficient on treat_post is the DiD estimate
did_model <- feols(
  outcome ~ treat_post | state + year,
  data = df,
  cluster = ~state
)

# View results
summary(did_model)

# ----------------------------------------
# Event Study
# ----------------------------------------
# Create relative time variable; never-treated states get a placeholder
# (-1000) that is used as a reference level, so they stay in as controls
df <- df %>%
  mutate(rel_time = if_else(treated, year - treatment_year, -1000))

# Event study regression (t = -1 is the omitted pre-treatment period)
event_study <- feols(
  outcome ~ i(rel_time, treated, ref = c(-1, -1000)) | state + year,
  data = df,
  cluster = ~state
)

# Plot coefficients
iplot(event_study,
      main = "Event Study: Effect on Outcome",
      xlab = "Years Relative to Treatment")

# ----------------------------------------
# Robustness: Alternative Specifications
# ----------------------------------------
# Different clustering
did_robust <- feols(
  outcome ~ treat_post | state + year,
  data = df,
  cluster = ~state + year  # Two-way clustering by state and year
)

# ----------------------------------------
# Export Results
# ----------------------------------------
dir.create("results", showWarnings = FALSE)  # ensure the output folder exists
modelsummary(
  list("Main" = did_model, "Two-way Cluster" = did_robust),
  stars = c('*' = 0.1, '**' = 0.05, '***' = 0.01),
  output = "results/did_table.tex"
)
```
Requirements
Software
- R 4.0+
Packages
- `fixest` - Fast fixed-effects estimation
- `modelsummary` - Publication-ready tables
- `tidyverse` - Data manipulation
- `ggplot2` - Visualization (installed as part of `tidyverse`)
Install with:
```r
install.packages(c("fixest", "modelsummary", "tidyverse"))
```
Best Practices
- Always cluster standard errors at the level of treatment assignment
- Run pre-trend tests for DiD designs
- Report first-stage F-statistics for IV (a common rule of thumb is F > 10)
- Use `feols` over `lm` for panel data (faster, with built-in fixed effects and clustered standard errors)
- Document all specification choices in your code comments
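A pre-trend test can be run as a joint Wald test that all lead (pre-treatment) coefficients of an event study are zero. A sketch on a simulated panel with a single adoption year; the panel, the 2008 adoption date, and the -1000 placeholder for never-treated states are assumptions. It relies on `fixest`'s `i()` naming coefficients like `rel_time::-5` and on `wald()` selecting coefficients with a regular expression in `keep`.

```r
library(fixest)

# Simulated panel (assumption): 40 states, 2000-2015, states 1-20 treated in 2008
set.seed(7)
panel <- expand.grid(state = 1:40, year = 2000:2015)
panel$treated <- panel$state <= 20
panel$rel_time <- ifelse(panel$treated, panel$year - 2008, -1000)  # -1000 = never treated
panel$outcome <- 0.1 * panel$state + 0.05 * (panel$year - 2000) +
  1 * (panel$treated & panel$year >= 2008) + rnorm(nrow(panel))

# Event study with t = -1 and the never-treated placeholder as references
es <- feols(outcome ~ i(rel_time, ref = c(-1, -1000)) | state + year,
            data = panel, cluster = ~state)

# Joint test H0: all lead coefficients (negative relative times) equal zero
pre <- wald(es, keep = "^rel_time::-", print = FALSE)
pre[["p"]]  # p-value of the joint pre-trend test
```

A large p-value is consistent with parallel pre-trends but does not prove them; the event-study plot should be reported alongside the test.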
Common Pitfalls
- ❌ Not clustering standard errors at the right level
- ❌ Ignoring weak instruments in IV estimation
- ❌ Using TWFE with staggered treatment timing (use the `did` package or `sunab()` instead)
- ❌ Not reporting robustness checks
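The staggered-timing pitfall can be checked against `fixest`'s bundled example panel `base_stagg`, in which never-treated units have `year_treated = 10000` and `time_to_treatment = -1000`. The sketch compares a naive TWFE event study with the Sun and Abraham interaction-weighted estimator via `sunab(cohort, period)`, then aggregates it to a single ATT. It is a demonstration on example data, not a full workflow; the `did` package's `att_gt()` (Callaway and Sant'Anna) is an alternative.

```r
library(fixest)
data(base_stagg)  # example staggered-adoption panel shipped with fixest

# Naive TWFE event study: can be biased with staggered timing and
# heterogeneous treatment effects
twfe <- feols(y ~ x1 + i(time_to_treatment, treated, ref = c(-1, -1000)) |
                id + year,
              data = base_stagg, cluster = ~id)

# Sun and Abraham estimator: sunab(cohort variable, period variable)
sa <- feols(y ~ x1 + sunab(year_treated, year) | id + year,
            data = base_stagg, cluster = ~id)

iplot(list(twfe, sa), sep = 0.5)  # overlay both sets of dynamic effects
summary(sa, agg = "att")          # single aggregated ATT
```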
References
- fixest documentation
- Cunningham (2021) Causal Inference: The Mixtape
- Angrist & Pischke (2009) Mostly Harmless Econometrics
Changelog
v1.0.0
- Initial release with IV, DiD, RDD support