Executive Summary
Dataset Scale
- 58,123 branch records
- 3,207 unique banks
- 393 MSAs
Methodology
- Spatial econometric modeling
- Haversine distance calculation
- HHI concentration analysis
Key Tools
- R (tidyverse, geosphere)
- Statistical inference
- Data visualization
Research Question
What determines a bank's geographic presence—its "lending radius"—in a given market? This project tests two competing hypotheses:
- Competition Hypothesis: Banks expand geographically in response to competitive intensity (HHI)
- Opportunity Hypothesis: Banks expand based on market size and deposit availability
Understanding these dynamics has significant implications for financial inclusion policy, competitive strategy, and addressing banking deserts in underserved communities.
Data Engineering & Feature Construction
Data Source & Cleaning
The analysis uses publicly available Federal Reserve branch-level deposit data, cleaned and engineered for spatial analysis:
Initial Dataset: 75,000+ branch observations with 40+ variables
Final Analytical Dataset: 58,123 branches (77% retention) after outlier removal and geographic validation
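The source does not spell out the validation rules, so the following is only a minimal sketch of what a coordinate check might look like. The column names (`branch_id`, `lat`, `lon`), the sample rows, and the rule that a zero coordinate marks a failed geocode are all assumptions made for illustration.

```r
# Hypothetical raw branch records; column names and values are illustrative
raw <- data.frame(
  branch_id = 1:5,
  lat = c(40.71, NA, 95.00, 34.05, 41.88),
  lon = c(-74.01, -87.63, -80.00, -118.24, 0)
)

# Keep rows whose coordinates are present, in range, and non-zero
# (treating zero as a failed geocode is an assumption, not a documented rule)
valid_coords <- !is.na(raw$lat) & !is.na(raw$lon) &
  raw$lat >= -90 & raw$lat <= 90 &
  raw$lon >= -180 & raw$lon <= 180 &
  !(raw$lat == 0 | raw$lon == 0)

clean <- raw[valid_coords, ]
retention <- nrow(clean) / nrow(raw)  # share of records kept
```

Tracking `retention` alongside each filter makes it easier to see which rule accounts for the drop from the initial to the final dataset.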
Key Feature Engineering
1. Dependent Variable: Geographic Lending Radius
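Calculated as the mean pairwise distance between all branches operated by a single bank within an MSA, where the distance between any two branches (1 and 2) is given by the Haversine formula:
$$d = 2r \arcsin\left(\sqrt{\sin^2\left(\frac{\phi_2 - \phi_1}{2}\right) + \cos\phi_1 \cos\phi_2 \sin^2\left(\frac{\lambda_2 - \lambda_1}{2}\right)}\right)$$
Where φ represents latitude, λ represents longitude (both in radians), and r is Earth's radius (6,371 km)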
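The radius calculation described here can be sketched in base R as follows. The function names and the three sample coordinates are hypothetical. The geosphere package offers `distHaversine()` for the same purpose, but it expects longitude/latitude order and defaults to a radius in metres, so this sketch writes the formula out directly to keep the units explicit.

```r
# Haversine great-circle distance in kilometres.
# Inputs are in decimal degrees; r is Earth's mean radius in km.
haversine_km <- function(lat1, lon1, lat2, lon2, r = 6371) {
  to_rad <- pi / 180
  phi1 <- lat1 * to_rad
  phi2 <- lat2 * to_rad
  dphi <- (lat2 - lat1) * to_rad
  dlambda <- (lon2 - lon1) * to_rad
  a <- sin(dphi / 2)^2 + cos(phi1) * cos(phi2) * sin(dlambda / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))  # pmin guards against rounding above 1
}

# Mean pairwise distance across all branches of one bank in one MSA.
# Returns NA when fewer than two branches exist (no pair to measure).
mean_pairwise_km <- function(lat, lon) {
  n <- length(lat)
  if (n < 2) return(NA_real_)
  pairs <- combn(n, 2)  # 2 x choose(n, 2) matrix of index pairs
  mean(haversine_km(lat[pairs[1, ]], lon[pairs[1, ]],
                    lat[pairs[2, ]], lon[pairs[2, ]]))
}

# Hypothetical three-branch bank (coordinates invented for illustration)
branches <- data.frame(
  lat = c(40.7128, 40.7580, 40.6782),
  lon = c(-74.0060, -73.9855, -73.9442)
)
radius_km <- mean_pairwise_km(branches$lat, branches$lon)
```

Distances come out in kilometres because r is given in km. Results quoted in miles would need a conversion (1 km ≈ 0.6214 mi), a step the source does not describe. Single-branch banks have no pairwise distance, so how they are handled is a modeling choice; this sketch returns NA for them.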
2. Independent Variable: Market Concentration (HHI)
Herfindahl-Hirschman Index calculated at the MSA level:
$$\mathrm{HHI} = \sum_{i=1}^{N} s_i^2$$
Where $s_i$ is the market share (by deposits) of bank i in the MSA and N is the number of banks operating in that MSA
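A minimal sketch of the HHI calculation, assuming branch deposits have already been summed to one row per bank per MSA. The data frame and its values are hypothetical. The source does not say whether shares are expressed in percent (0 to 10,000 scale) or as fractions (0 to 1 scale), so the scale is left as a parameter.

```r
# HHI for one market: sum of squared deposit shares.
# scale = 100 gives the 0-10,000 convention (shares in percent);
# scale = 1 gives the 0-1 convention (shares as fractions).
hhi <- function(deposits, scale = 100) {
  shares <- scale * deposits / sum(deposits)
  sum(shares^2)
}

# Hypothetical bank-level deposits within two MSAs
bank_deposits <- data.frame(
  msa = c("A", "A", "A", "B", "B"),
  bank = c("b1", "b2", "b3", "b1", "b4"),
  deposits = c(50, 30, 20, 90, 10)
)

# One HHI value per MSA
hhi_by_msa <- tapply(bank_deposits$deposits, bank_deposits$msa, hhi)
```

In this toy data, MSA "A" (shares of 50%, 30%, 20%) is less concentrated than MSA "B" (90%, 10%). The choice of scale changes the magnitude of the regression coefficient on HHI but not its t-statistic.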
3. Control Variable: Market Size
Log-transformed total deposit volume per MSA, used to reduce the right skew of the raw deposit distribution:
$$\text{MarketSize}_m = \ln(\text{TotalDeposits}_m)$$
Figure 2: Market Concentration (HHI) vs. Lending Radius - No systematic relationship (p = 0.502)
Figure 3: Market size drives geographic spread (R² = 0.68)
Statistical Modeling & Inference
Model Specification
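Ordinary Least Squares (OLS) regression with robust standard errors:
$$\text{Radius}_{i,m} = \beta_0 + \beta_1 \, \text{HHI}_m + \beta_2 \ln(\text{TotalDeposits}_m) + \varepsilon_{i,m}$$
Where i indexes banks and m indexes MSAs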
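The specification can be sketched in base R as below. The data are simulated stand-ins, the variable names are hypothetical, and the HC1 variant is an assumption: the source says only that the standard errors are robust. In practice `sandwich::vcovHC(fit, type = "HC1")` with `lmtest::coeftest()` is a common route to the same numbers; the estimator is written out here so the example needs no extra packages.

```r
set.seed(1)

# Simulated stand-in data; none of these values come from the study
n <- 200
dat <- data.frame(
  hhi = runif(n, 500, 5000),
  log_deposits = rnorm(n, 14, 1.5)
)
# Heteroskedastic noise: the error spread grows with market size
dat$radius <- -45 + 6 * dat$log_deposits +
  rnorm(n, sd = 1 + 0.3 * dat$log_deposits)

fit <- lm(radius ~ hhi + log_deposits, data = dat)

# HC1 sandwich estimator: (X'X)^-1 X' diag(e^2) X (X'X)^-1 * n / (n - k)
X <- model.matrix(fit)
e <- residuals(fit)
bread <- solve(crossprod(X))
meat <- crossprod(X * e)  # equals X' diag(e^2) X
vcov_hc1 <- bread %*% meat %*% bread * nrow(X) / (nrow(X) - ncol(X))

robust_se <- sqrt(diag(vcov_hc1))
t_stat <- coef(fit) / robust_se
p_value <- 2 * pt(abs(t_stat), df = df.residual(fit), lower.tail = FALSE)
results <- cbind(estimate = coef(fit), robust_se, t_stat, p_value)
```

Because HHI varies only across MSAs while the outcome varies by bank and MSA, clustering standard errors at the MSA level could be worth considering as an alternative. The source does not say whether that was done.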
Regression Results
| Variable | Coefficient | Std. Error | t-statistic | p-value |
|---|---|---|---|---|
| Intercept | -45.23 | 3.41 | -13.26 | < 0.001 *** |
| HHI (Market Concentration) | 0.0012 | 0.0018 | 0.67 | 0.502 |
| ln(Total Deposits) | 5.87 | 0.34 | 17.26 | < 0.001 *** |
Model R² = 0.683, Adjusted R² = 0.681, N = 3,207 bank-MSA combinations
*** indicates significance at p < 0.001
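As a quick consistency check, the t-statistics and p-values can be recomputed from the reported coefficients and standard errors. The sketch below assumes a residual degrees of freedom of N minus three estimated coefficients; the data frame simply restates the table.

```r
# Values copied from the regression table above
tab <- data.frame(
  term = c("Intercept", "HHI", "ln(Total Deposits)"),
  estimate = c(-45.23, 0.0012, 5.87),
  std_error = c(3.41, 0.0018, 0.34)
)

# t-statistic is the coefficient divided by its standard error
tab$t_raw <- tab$estimate / tab$std_error

# Two-sided p-values; residual df = N - 3 estimated coefficients
df_resid <- 3207 - 3
tab$p_value <- 2 * pt(abs(tab$t_raw), df = df_resid, lower.tail = FALSE)

tab$t_stat <- round(tab$t_raw, 2)
```

The rounded t-statistics match the table, and the recomputed HHI p-value lands at roughly 0.5. Small differences in the third decimal are expected because the table's coefficients and standard errors are themselves rounded.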
Key Findings & Interpretation
Finding 1: Competition Doesn't Drive Spatial Strategy
The HHI coefficient is statistically insignificant (p = 0.502) and economically negligible (β = 0.0012), so the data provide no evidence that banks systematically alter their geographic spread in response to market concentration.
Finding 2: Market Size Dominates Branch Dispersion
A one-unit increase in log(deposits), equivalent to multiplying market deposits by about 2.7 (e ≈ 2.718), corresponds to a 5.87-mile increase in average lending radius, significant at p < 0.001. This effect is both statistically and economically substantial.
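Because deposits enter the model in logs, the coefficient translates into a radius change for any proportional change in market size. The short sketch below works through a few cases; the helper name `radius_change` is hypothetical, and the result is in whatever distance unit the reported coefficient uses.

```r
beta_log_deposits <- 5.87  # coefficient from the regression table

# Predicted change in lending radius when deposits are multiplied by `factor`
radius_change <- function(factor, beta = beta_log_deposits) {
  beta * log(factor)
}

change_e <- radius_change(exp(1))      # one log-unit, about 2.72x deposits
change_double <- radius_change(2)      # doubling deposits
change_10pct <- radius_change(1.10)    # 10% increase in deposits
```

Doubling deposits therefore implies a smaller change than the headline coefficient (5.87 × ln 2, a little over 4), and a 10% increase implies roughly half a unit.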
Finding 3: High Explanatory Power
The model explains 68.3% of variance in lending radius using just two variables, with market size as the primary driver. This suggests spatial expansion follows economic opportunity rather than competitive positioning.
Model Validation & Robustness
Figure 4: HHI vs Lending Radius - Animated demonstration of null relationship (β = 0.0012, p = 0.502)
Figure 5: Hypothesis Testing Comparison - Competition rejected vs Market Size confirmed
Robustness Checks Performed
- Heteroskedasticity-robust standard errors (reported above)
- Outlier sensitivity analysis (Cook's distance threshold at 4/n)
- Alternative functional forms (quadratic, log-log specifications)
- Geographic fixed effects (regional dummy variables)
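The outlier sensitivity check listed above can be sketched as follows, using simulated stand-in data with one deliberately planted influential point. The variable names and data are hypothetical; only the 4/n threshold comes from the text.

```r
set.seed(42)

# Simulated stand-in data; none of these values come from the study
n <- 150
dat <- data.frame(
  hhi = runif(n, 500, 5000),
  log_deposits = rnorm(n, 14, 1.5)
)
dat$radius <- -45 + 6 * dat$log_deposits + rnorm(n, sd = 4)
dat$radius[1] <- dat$radius[1] + 80  # plant one influential observation

fit_all <- lm(radius ~ hhi + log_deposits, data = dat)

# Flag observations whose Cook's distance exceeds the 4/n rule of thumb
influential <- cooks.distance(fit_all) > 4 / n

# Refit without the flagged observations and compare coefficients
fit_trim <- lm(radius ~ hhi + log_deposits, data = dat[!influential, ])
comparison <- rbind(all = coef(fit_all), trimmed = coef(fit_trim))
```

If the coefficients in `comparison` barely move between the two fits, the conclusions are not being driven by a handful of extreme bank-MSA observations.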
Strategic & Policy Implications
For Banking Strategy
Banks should base branch expansion decisions on market deposit volumes rather than on competitive positioning. The results favor moving early into high-deposit markets over reacting defensively to competitor moves.
For Regulatory Policy
Findings challenge the effectiveness of competition-focused branch policies. Regulators seeking to improve banking access in underserved areas should focus on economic development and deposit growth rather than micro-managing competitive structure.
Ethical Considerations
⚠ Critical Caveat: The market-size finding could be misinterpreted to justify neglecting low-deposit communities, potentially exacerbating banking deserts. Policy must balance economic efficiency with spatial equity through targeted incentives for rural and low-income area expansion.
Technical Skills Demonstrated
Data Wrangling
- Large-scale dataset cleaning (75K+ observations)
- Missing data imputation strategies
- Geographic data validation
- Feature engineering from raw data
Statistical Analysis
- OLS regression with robust inference
- Hypothesis testing (t-tests, F-tests)
- Model diagnostics & validation
- Causal interpretation frameworks
Spatial Analysis
- Haversine distance calculations
- MSA-level aggregation
- Geographic clustering analysis
- Market delineation methods
Tools & Languages
- R (tidyverse, geosphere, base lm() for OLS)
- R Markdown for reproducible research
- Data visualization (ggplot2)
- Statistical computing
Academic Context & Citations
This analysis builds on recent spatial banking literature:
Begenau, A., Oberfield, E., Rossi-Hansberg, E., & Wenning, D. (2024). Banks in Space. National Bureau of Economic Research Working Paper 32262.
Bouakez, H., Côté, J., & D'Souza, C. (2020). A Spatial Model of Bank Branches in Canada. Bank of Canada Staff Working Paper 2020-4.