Phase 1 · Spec
Day-level panel. Cycle: Mar 1 – Apr 30 of the following year (425 days). Treated cycle = 2025; control cycle = 2023. Standard errors clustered by genre. 365 day-of-year FE + 10 floating-holiday dummies. Pooled spec: dataset×day + dataset×cycle + dataset×genre FE.
Critical · Pre-May-22 parallel-trends test (coworker concern)
Joint Wald F-test on event-study leads (bins 4-10, ref=bin 11 just before May 22). Linear slope test on the same leads. Rambachan-Roth M-sensitivity bounds.
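As a rough illustration of the joint test described above, the sketch below computes the Wald statistic W = b'V⁻¹b for a vector of lead coefficients and divides it by the number of restrictions to get the F-form. The coefficient values, the standard errors, and the zero-covariance assumption are all hypothetical placeholders, not estimates from this analysis. In practice V would be the cluster-robust covariance block for bins 4-10, and the p-value would come from a statistics library.

```python
def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def joint_wald(leads, vcov):
    """Return (W, F) for H0: all lead coefficients are zero.

    W = b' V^{-1} b, and F = W / q with q = number of leads.
    """
    x = solve(vcov, leads)
    wald = sum(b * xi for b, xi in zip(leads, x))
    return wald, wald / len(leads)


# Hypothetical lead coefficients for bins 4-10 (reference bin 11 omitted).
leads = [0.01, -0.02, 0.03, 0.05, 0.02, -0.01, 0.00]
# Hypothetical standard errors; off-diagonal covariances set to zero
# purely to keep the illustration short.
ses = [0.03, 0.03, 0.03, 0.03, 0.03, 0.03, 0.03]
vcov = [[ses[i] ** 2 if i == j else 0.0 for j in range(7)] for i in range(7)]

wald, f_stat = joint_wald(leads, vcov)
print(f"W = {wald:.3f}, F = {f_stat:.3f} on {len(leads)} restrictions")
```

The linear solve avoids forming V⁻¹ explicitly, which is the usual numerically safer choice for a small covariance block.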
Bin-by-bin pre-May-22 coefficients
Modern PT test · Rambachan-Roth M-sensitivity for Nov-24 effect
Honest CI: β̂ ± [1.96·SE + M·max|pre-lead|], where β̂ is the estimate of βNov24_inc. Breakdown M* = (|β̂| − 1.96·SE) / max|pre-lead|, i.e. the largest M at which the CI still excludes 0. By the usual convention, M* > 1 ⇒ result robust to post-period trend violations as large as the worst observed pre-trend.
Android: M* = 1.27, barely above the M=1 threshold. CI includes 0 at M=1.5. PT-fragile, but survives the standard RR test.
Pooled: M* = 2.53. Pooling halves max|pre| (it averages out Android noise) and substantially improves inference. CI excludes 0 at M=2.
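The honest-CI and breakdown formulas above are simple enough to check by hand. The sketch below implements them directly. The input values (`beta_hat`, `se`, `max_pre`) are hypothetical and chosen only to show the mechanics; they are not the Android, iOS, or Pooled estimates.

```python
def honest_ci(beta_hat, se, max_pre, m, z=1.96):
    """CI widened by M times the largest absolute pre-period lead."""
    half_width = z * se + m * max_pre
    return beta_hat - half_width, beta_hat + half_width


def breakdown_m(beta_hat, se, max_pre, z=1.96):
    """Largest M for which the honest CI still excludes zero."""
    return (abs(beta_hat) - z * se) / max_pre


def excludes_zero(ci):
    lower, upper = ci
    return lower > 0 or upper < 0


# Hypothetical inputs, for illustration only.
beta_hat, se, max_pre = 0.25, 0.05, 0.10

m_star = breakdown_m(beta_hat, se, max_pre)
print(f"M* = {m_star:.2f}")
for m in (0.5, 1.0, 1.5, 2.0):
    ci = honest_ci(beta_hat, se, max_pre, m)
    status = "excludes 0" if excludes_zero(ci) else "includes 0"
    print(f"M = {m}: CI = ({ci[0]:+.3f}, {ci[1]:+.3f}) {status}")
```

Because the half-width grows linearly in M, the CI excludes 0 exactly when M < M*. This is why a reported M* of 1.27 is consistent with passing at M=1 and failing at M=1.5, and an M* of 2.53 with passing at M=2.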
Event-study (vs C2023)
95% CI shaded. Reference bin 11 (just before May 22). Orange line = May 22 cutoff (bin 12). Blue line = Nov 24 cutoff (bin 38).
Phase 5 · Step 13 · Code-sufficiency stratification (paper §5.3 mechanism)
Paper §5.3 predicts that code-sufficient genres (Utilities, Productivity, Games, etc.) show a larger β than code-insufficient genres (Social, Shopping, Medical).
Description-keyword (paper §5.3, AI-branded vs not)
Split the cohort by an AI-keyword regex on title/description. If the shock were just a surge in AI-branded apps, the non-AI group would show a small β.
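A minimal sketch of such a split is shown below. The keyword list, the `title` and `description` field names, and the sample apps are all assumptions made for illustration; the regex actually used in the analysis is not given here.

```python
import re

# Hypothetical keyword list. Word boundaries keep "ai" from matching
# inside words such as "air" or "paint".
AI_PATTERN = re.compile(
    r"\b(?:ai|gpt|chatgpt|llm|artificial intelligence|machine learning)\b",
    re.IGNORECASE,
)


def is_ai_branded(app):
    """True if the title or description mentions an AI keyword."""
    text = f"{app.get('title', '')} {app.get('description', '')}"
    return AI_PATTERN.search(text) is not None


def split_cohort(apps):
    """Partition apps into (ai_branded, non_ai) lists."""
    ai_branded = [app for app in apps if is_ai_branded(app)]
    non_ai = [app for app in apps if not is_ai_branded(app)]
    return ai_branded, non_ai


# Hypothetical records, for illustration only.
apps = [
    {"title": "AI Photo Enhancer", "description": "Sharpen photos instantly."},
    {"title": "Air Quality Now", "description": "Local pollution readings."},
    {"title": "Notes Plus", "description": "Summaries powered by ChatGPT."},
    {"title": "Paint Studio", "description": "Draw and sketch on a canvas."},
]

ai_branded, non_ai = split_cohort(apps)
print([app["title"] for app in ai_branded])
print([app["title"] for app in non_ai])
```

Each subgroup would then be run through the same event-study spec, so the two β estimates can be compared directly.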
Verdict summary
- iOS βNov24_inc = +38% (M* = 3.39, PT clean)
- Cross-platform entry effect replicates (iOS +38%, Android +27%, Pooled +30%)
- Code-sufficient mechanism (paper §5.3) replicated on both
- Description-keyword: broader entry, not AI-only
- Pooled spec improves inference (M*=2.53)
- Android strict Wald PT fails (p=0.0003); bin 7 (Apr 19) at +14% suggests a pre-May-22 ramp, likely Bass diffusion of earlier coding-agent shocks (Cursor/Claude Code releases, Feb-Mar 2025)
- Android RR M*=1.27 barely passes M=1; CI includes 0 at M=1.5
- Pooled spec strict Wald PT is marginal (p=0.026), but RR M*=2.53 is strong
- Dropping iOS V1 (importance-sample) eliminates the marginal PT failure issue there