When Code Gets Cheap · iOS + Android · two-node DiD · 2026-05-16

iOS + Android entry-margin replication, with parallel-trends honesty

Two datasets: iOS (friend's uniform sample, 203K apps) and Android (new Play Store scrape, 305K apps × 49 countries). iOS V1 (the paper's importance-sampled set) was dropped; V2 has cleaner pre-trends and more apps. A coworker raised a pre-May-22 parallel-trends concern: iOS passes cleanly, Android fails the strict Wald test but survives Rambachan-Roth at M = 1. The pooled spec gives M* = 2.53.

iOS     · βNov24_inc +38.0% · M* = 3.39 · joint Wald p = 0.76 · PT clean ✓
Android · βNov24_inc +26.5% · M* = 1.27 · joint Wald p = 0.0003 · PT fails strict Wald, survives RR M=1
Pooled  · βNov24_inc +30.4% · M* = 2.53 · joint Wald p = 0.026 · pooling improves inference

Phase 1 · Spec

log_apps_gdc = α_g + γ_d + δ_c + β_M22·(treated × post_May22) + β_Nov24·(treated × post_Nov24) + Σ_h β_h·holiday_dc + ε_gdc
Day-level observations. Each cycle runs Mar 1 – Apr 30 of the following year (425 days). Treated = 2025 cycle; control = 2023 cycle. SEs clustered by genre. 365 day-of-year FE plus 10 floating-holiday dummies. Pooled spec: dataset×day + dataset×cycle + dataset×genre FE. A sketch of the regression follows.
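
A minimal sketch of this spec in statsmodels, assuming a day-level DataFrame df with hypothetical column names (log_apps, treated, post_may22, post_nov24, genre, doy, cycle, hol_0..hol_9); the real pipeline may differ.

    # Phase 1 DiD spec (sketch); `df` and every column name here are assumptions.
    import statsmodels.formula.api as smf

    hol = " + ".join(f"hol_{h}" for h in range(10))  # 10 floating-holiday dummies
    fml = (
        "log_apps ~ treated:post_may22 + treated:post_nov24 "
        f"+ {hol} + C(genre) + C(doy) + C(cycle)"    # FE: α_g, γ_d (day-of-year), δ_c
    )
    res = smf.ols(fml, data=df).fit(
        cov_type="cluster", cov_kwds={"groups": df["genre"]}  # cluster by genre
    )
    print(res.params.filter(like="treated"))         # β_M22 and β_Nov24_inc

Because post_may22 stays on after Nov 24, the Nov-24 coefficient is incremental on top of β_M22, matching the βNov24_inc naming used throughout.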

Critical · Pre-May-22 parallel-trends test (coworker concern)

Three checks: (1) a joint Wald F-test on the event-study leads (bins 4-10, ref = bin 11, the last bin before May 22); (2) a linear slope test on the same leads; (3) Rambachan-Roth M-sensitivity bounds. A sketch of the first two is below.
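
A sketch of checks (1) and (2), assuming the event-study fit es names its lead coefficients lead_4 .. lead_10 (bin 11 omitted as reference); both the fit object and the coefficient names are assumptions.

    # Pre-trend checks (sketch): joint Wald on leads + inverse-variance-weighted slope.
    import numpy as np
    import statsmodels.api as sm

    leads = [f"lead_{b}" for b in range(4, 11)]          # bins 4-10

    # (1) joint Wald: all pre-May-22 leads zero simultaneously
    wald = es.wald_test(", ".join(f"{c} = 0" for c in leads), scalar=True)
    print("joint Wald p =", wald.pvalue)

    # (2) linear slope: WLS of lead estimates on bin index,
    #     weighted by each estimate's inverse variance
    b = es.params[leads].to_numpy()
    w = 1.0 / es.bse[leads].to_numpy() ** 2
    x = sm.add_constant(np.arange(4, 11))
    wls = sm.WLS(b, x, weights=w).fit()
    print("slope =", wls.params[1], "slope p =", wls.pvalues[1])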

Dataset · joint Wald F · p (joint χ²) · linear slope/wk · slope p · sig. bins · max|pre| · PT verdict

Bin-by-bin pre-May-22 coefficients

Dataset · bin 4 (Mar 29) · bin 5 (Apr 5) · bin 6 (Apr 12) · bin 7 (Apr 19) · bin 8 (Apr 26) · bin 9 (May 3) · bin 10 (May 10) · bin 11 (ref)

Modern PT test · Rambachan-Roth M-sensitivity for Nov-24 effect

Honest CI: βNov24_inc ± [1.96·SE + M·max|pre-lead|]. Breakdown value M* = (|β̂| − 1.96·SE) / max|pre|. By the convention in this literature, M* > 1 means the result stays significant even if post-period trend violations are as large as the worst observed pre-trend.
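
The arithmetic is simple enough to sanity-check by hand; a sketch with illustrative inputs (the SE and max|pre| below are placeholders chosen to land near the iOS row, not the fitted values).

    # Rambachan-Roth relative-magnitudes arithmetic as defined above (sketch).
    def honest_ci(beta, se, max_pre, M):
        """CI widened by M times the worst observed pre-lead violation."""
        half = 1.96 * se + M * max_pre
        return beta - half, beta + half

    def breakdown_M(beta, se, max_pre):
        """Largest M at which the honest CI still excludes zero."""
        return (abs(beta) - 1.96 * se) / max_pre

    beta, se, max_pre = 0.322, 0.06, 0.06   # placeholders: +38% ≈ log(1.38)
    for M in (0.5, 1.0, 1.5, 2.0):
        print(M, honest_ci(beta, se, max_pre, M))
    print("M* =", breakdown_M(beta, se, max_pre))   # ≈ 3.4 with these inputs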

Dataset · βNov24_inc · max|pre| · M* (breakdown) · M=0.5 CI · M=1.0 CI · M=1.5 CI · M=2.0 CI · RR verdict
• iOS: M* = 3.39, very robust; the CI excludes 0 even at M = 2.
• Android: M* = 1.27, barely clears the M = 1 threshold; the CI starts including 0 at M = 1.5. PT-fragile, but it survives the standard RR test.
• Pooled: M* = 2.53; pooling halves max|pre| (averaging out Android noise) and substantially improves inference. The CI excludes 0 at M = 2.

Event-study (treated 2025 vs control 2023)

95% CI shaded. Reference bin 11 (just before May 22). Orange line = May 22 cutoff (bin 12). Blue line = Nov 24 cutoff (bin 38).
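
For reference, a sketch of the weekly binning the figure's axis implies, assuming a datetime column date and a Mar 1 cycle start (hypothetical names); under this scheme bin 4 starts Mar 29 and bin 38 contains Nov 24, consistent with the labels above.

    # Weekly event-time bins (sketch); `df`, `date`, `treated` are assumptions.
    import pandas as pd

    cycle_start = pd.Timestamp("2025-03-01")   # treated cycle; 2023 analogous
    df["bin"] = (df["date"] - cycle_start).dt.days // 7

    # treated × bin event-study dummies, omitting reference bin 11
    dummies = pd.get_dummies(df["bin"], prefix="bin").drop(columns=["bin_11"])
    es_terms = dummies.mul(df["treated"], axis=0)

In the Phase 1 spec, these terms would replace the two post dummies to produce the plotted coefficients.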

Phase 5 · Step 13 · Code-sufficiency stratification (paper §5.3 mechanism)

Paper §5.3 predicts code-sufficient genres (Utilities, Productivity, Games, etc.) should show a LARGER β than code-insufficient genres (Social, Shopping, Medical). One single-regression version of the test is sketched below.
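
A sketch that recovers the stratum gap and its p-value in one regression, assuming a hypothetical 0/1 column cs (1 = code-sufficient genre) on the Phase 1 frame; the cs main effect is absorbed by the genre FE.

    # §5.3 stratification (sketch); `cs` and other column names are assumptions.
    import statsmodels.formula.api as smf

    fml = (
        "log_apps ~ treated:post_may22 + treated:post_nov24 "
        "+ treated:post_may22:cs + treated:post_nov24:cs "   # CS minus non-CS gaps
        "+ C(genre) + C(doy) + C(cycle)"
    )
    res = smf.ols(fml, data=df).fit(
        cov_type="cluster", cov_kwds={"groups": df["genre"]}
    )
    # The :cs coefficients are the stratum differences (e.g. the +12.8pp iOS
    # gap in the table); their p-values test the mechanism directly.
    print(res.summary())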

Dataset · Stratum · βM22 · M22 % · βNov24_inc · Nov24 % · p
Cross-platform mechanism confirmed. iOS: code-sufficient (CS) +41.7% vs non-CS +28.9% (gap +12.8pp). Android: CS +32.3% vs non-CS +12.4% (gap +19.9pp).

Description-keyword (paper §5.3, AI-branded vs not)

Split the cohort by an AI-keyword regex on title/description (sketch below). If the shock were just an AI-branded surge, the non-AI subset would show a small β.
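
A sketch of the split with an illustrative keyword list; the actual pattern behind the table is not reproduced here, and all column names are assumptions.

    # AI-branding flag (sketch); pattern and column names are assumptions.
    import re

    AI_PAT = re.compile(r"\b(ai|gpt|llm|chatbot|copilot|assistant)\b", re.IGNORECASE)

    def is_ai_branded(title: str, description: str) -> bool:
        """True if the title or description matches the AI-keyword pattern."""
        return bool(AI_PAT.search(f"{title} {description}"))

    df["ai_branded"] = [
        is_ai_branded(t, d) for t, d in zip(df["title"], df["description"])
    ]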

Dataset · Stratum · βM22 · M22 % · βNov24_inc · Nov24 %
Broader entry, not just AI-branded. The non-AI subset shows a LARGER β than the AI-branded subset in both datasets: the shock affects the generic entry pipeline, not just AI-themed products.

Verdict summary

✓ Passes
  • iOS βNov24_inc = +38% (M* = 3.39, PT clean)
  • Cross-platform entry effect replicates (iOS +38%, Android +27%, Pooled +30%)
  • Code-sufficient mechanism (paper §5.3) replicated on both
  • Description-keyword: broader entry, not AI-only
  • Pooled spec improves inference (M*=2.53)
⚠ Honest caveats
  • Android strict Wald PT fails (p=0.0003); bin 7 (Apr 19) at +14% suggests a pre-May-22 ramp, likely Bass diffusion of earlier coding-agent shocks (Cursor/Claude Code GA Feb-Mar 2025)
  • Android RR M*=1.27 barely passes M=1; CI includes 0 at M=1.5
  • Pooled spec strict Wald p=0.026 marginal, but RR M*=2.53 strong
  • Dropping iOS V1 (the paper's importance sample) removes the marginal PT-failure issue that dataset showed