22-step playbook coverage
- Step 1: shock date
- Step 2: outcome transforms
- Step 4: cluster spec
- Step 5: baseline (event-study)
- Step 6: SA/CS/BJS/dC-dH/ETWFE (N/A)
- Step 7: pre-A leads (RR M* > 1)
- Step 8: between-shock window
- Step 9: pseudo-B placebo (N/A under Bass diffusion)
- Step 10: Rambachan-Roth bounds
- Step 11: levels vs logs vs IHS vs PPML
- Step 12: cluster robustness
- Step 13: stratified (code-sufficient)
- Step 14: Bass curve fit
- Step 15: mechanism outcomes (covered in v42)
- Step 17: shock-date ±7d
- Step 18: bandwidth
- Same-day first-diff (paper Table 5)
- Day-of-week FE
- Within-control placebo

Phase 1 · Spec — day-level TWFE DiD
Cycle: Mar 1 – Apr 30 of the following year (425 days). FE: genre + day-of-year (365 levels) + cycle. Floating-holiday FE: Easter, Eid, Diwali, Cyber Monday, Christmas, Christmas Eve, NYE, New Year's Day, Thanksgiving, Chinese New Year. Cluster: genre.
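A minimal sketch of this spec on simulated data (the toy panel and all column names are illustrative, not the pipeline's actual schema; holiday dummies omitted for brevity):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Toy stand-in for the day-level panel; the real pipeline loads its own
# data and adds the floating-holiday dummies.
rng = np.random.default_rng(0)
n = 20_000
df = pd.DataFrame({
    "genre": rng.integers(0, 30, n),   # ~30 genre clusters
    "doy": rng.integers(1, 366, n),    # day-of-year FE (365 levels)
    "cycle": rng.integers(0, 3, n),    # 0 = 2023, 1 = 2024, 2 = 2025
})
# Treated-post = 2025 cycle on/after May 22 (day-of-year 142).
df["treated_post"] = ((df["cycle"] == 2) & (df["doy"] >= 142)).astype(int)
df["log_entries"] = 0.1 * df["treated_post"] + rng.normal(0, 1, n)

fit = smf.ols(
    "log_entries ~ treated_post + C(genre) + C(doy) + C(cycle)",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["genre"]})
print(fit.params["treated_post"], fit.bse["treated_post"])
```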
Phase 2 · Step 5: Event-study (saturated dynamic Approach 2)
Reference bin: 11 (the last pre-shock week, ending May 22). 95% CI shaded. Window: bins 4–59 (Mar 28, 2025 to Apr 18, 2026). Late-post buildup visible through Apr 2026.
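A sketch of the saturated dynamic spec on its own toy panel `es` (bin construction and names are assumptions; the real coefficients come from the pipeline):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Weekly bins of day-in-cycle (bins 4–59); bin 11, the last pre-shock
# week, is omitted as the reference.
rng = np.random.default_rng(1)
n = 20_000
es = pd.DataFrame({
    "genre": rng.integers(0, 30, n),
    "treated": rng.integers(0, 2, n),          # 1 = treated cycle
    "day_in_cycle": rng.integers(28, 420, n),
})
es["bin"] = es["day_in_cycle"] // 7
es["y"] = (0.01 * es["treated"] * np.maximum(es["bin"] - 11, 0)
           + rng.normal(0, 1, n))

lead_lag = pd.DataFrame(
    {f"b{b}": ((es["bin"] == b) & (es["treated"] == 1)).astype(float)
     for b in range(4, 60) if b != 11})
X = pd.concat(
    [lead_lag, es[["treated"]].astype(float),
     pd.get_dummies(es["bin"], prefix="t", drop_first=True, dtype=float),
     pd.get_dummies(es["genre"], prefix="g", drop_first=True, dtype=float)],
    axis=1)
fit = sm.OLS(es["y"], sm.add_constant(X)).fit(
    cov_type="cluster", cov_kwds={"groups": es["genre"]})
path = fit.params.filter(regex=r"^b\d+$")   # dynamic path; shade 95% CI
```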
Phase 4 · Steps 11 & 2: Functional form & estimator (levels / logs / IHS / PPML)
All four transformations applied. Log, IHS, and PPML give similar log-percent effects; the levels β is simply on a different scale.
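A hedged sketch of the alternative fits, reusing the toy panel `df` from the baseline block with an illustrative count column `entries`:

```python
import numpy as np
import statsmodels.api as sm
import statsmodels.formula.api as smf

# PPML keeps zeros and its coefficient reads in log points, directly
# comparable to the log/IHS fits. `entries` here is simulated.
df["entries"] = np.exp(df["log_entries"] + 1.0).round()
df["ihs_entries"] = np.arcsinh(df["entries"])

rhs = "treated_post + C(genre) + C(doy) + C(cycle)"
cl = dict(cov_type="cluster", cov_kwds={"groups": df["genre"]})
levels = smf.ols(f"entries ~ {rhs}", data=df).fit(**cl)
ihs = smf.ols(f"ihs_entries ~ {rhs}", data=df).fit(**cl)
ppml = smf.glm(f"entries ~ {rhs}", data=df,
               family=sm.families.Poisson()).fit(**cl)
print(levels.params["treated_post"],   # per-day count units
      ihs.params["treated_post"],      # ~ log points away from zero
      ppml.params["treated_post"])     # log points
```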
Phase 5 · Step 13: Stratified by code-sufficiency (paper's main mechanism)
Per paper §6 (Acemoglu-Restrepo task-bottleneck): code-sufficient genres (Tools, Productivity, Games, etc., where a working codebase ≈ the product's value) should show a LARGER effect than code-insufficient genres (Social, Shopping, Medical, etc., where networks, regulation, or trust still bind). Strong cross-dataset confirmation.
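A split-sample sketch on the toy panel (the genre split below is arbitrary; the real genre-to-code-sufficiency mapping follows paper §6):

```python
import statsmodels.formula.api as smf

# Run the baseline spec separately on each stratum.
df["code_sufficient"] = df["genre"] < 15   # illustrative split only
for label, sub in df.groupby("code_sufficient"):
    f = smf.ols("log_entries ~ treated_post + C(genre) + C(doy) + C(cycle)",
                data=sub).fit(cov_type="cluster",
                              cov_kwds={"groups": sub["genre"]})
    print(f"code_sufficient={label}: beta={f.params['treated_post']:.3f}")
```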
Phase 3 · Step 10: Rambachan-Roth M-relative-magnitude bounds
Per playbook Step 10 (Rambachan-Roth 2023). The honest CI allows post-period trend violations up to M × max|pre-lead|. Breakdown value: M* = (|β̂| − 1.96·SE) / max|pre-lead|; if M* > 1, the result is robust to post-trend violations as large as the worst observed pre-trend. Outlier bin 4 (May 16-17, a 2-day window with N=156) is excluded from max_pre to avoid an artifact.
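The breakdown calculation is mechanical; a sketch with made-up inputs:

```python
import numpy as np

def breakdown_m(beta_hat: float, se: float, pre_leads) -> float:
    """Breakdown M* = (|beta| - 1.96*SE) / max|pre-lead|, as defined above."""
    return (abs(beta_hat) - 1.96 * se) / np.max(np.abs(pre_leads))

# Illustrative numbers only, not the estimated values; the 2-day outlier
# bin is already dropped from `pre_leads`, as described above.
print(breakdown_m(beta_hat=0.15, se=0.02, pre_leads=[0.01, -0.03, 0.02, -0.04]))
# -> 2.77: robust to post-trend violations ~2.8x the worst pre-lead
```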
Phase 6 · Step 17: Shock-date sensitivity (±7 days)
Shift tA (May 22) and tB (Nov 24) by ±7 days. All β within 5% of baseline.
Phase 6 · Step 18: Bandwidth sensitivity
Pre-window cut in half (start moved to Apr 1) and post-window extended by 50% (485 days).
Phase 1 · Day-of-week FE (paper Table 5)
Add day-of-week as additional FE. Tests whether weekday-clustered releases drive the effect.
Phase 4 · Step 12: Cluster robustness
Genre (baseline) vs day_in_cycle. SE varies but β unchanged.
Robustness · Same-day-in-cycle first-difference (paper Table 5)
Compute Δy = y_2025 − y_2023 for each (genre, day_in_cycle). Regress Δy on post indicators. Alternative estimator that subtracts historical cycle cell-by-cell.
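A sketch of the cell-by-cell first difference on the toy panel, with `doy` standing in for day_in_cycle and a single post indicator for brevity (the real spec has one indicator per shock):

```python
import statsmodels.formula.api as smf

# Cycle 2 = 2025 treated, cycle 0 = 2023 control (toy `df` from above).
cell = (df.groupby(["genre", "doy", "cycle"])["log_entries"]
          .mean().unstack("cycle"))
delta = (cell[2] - cell[0]).rename("dy").dropna().reset_index()
delta["post"] = (delta["doy"] >= 142).astype(int)   # on/after May 22
fd = smf.ols("dy ~ post", data=delta).fit(
    cov_type="cluster", cov_kwds={"groups": delta["genre"]})
print(fd.params["post"])   # should recover the same step effect
```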
Phase 5 · Step 14: Bass diffusion curve fit
Fit Bass(k; p, q) to the post-Nov-24 event-study coefficients, with k = 0 at Nov 24; p = innovation, q = imitation, θ̄ = long-run plateau. Cumulative Bass curve: F(k) = θ̄·(1−exp(−(p+q)k)) / (1+(q/p)·exp(−(p+q)k)).
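A curve-fit sketch with scipy on simulated coefficients (parameter values are made up; the real input is the estimated post-Nov-24 path):

```python
import numpy as np
from scipy.optimize import curve_fit

def bass_cum(k, theta_bar, p, q):
    """Cumulative Bass curve from the formula above; theta_bar = plateau."""
    return (theta_bar * (1 - np.exp(-(p + q) * k))
            / (1 + (q / p) * np.exp(-(p + q) * k)))

# Toy event-study path, k = 0 at Nov 24, weekly bins.
k = np.arange(22, dtype=float)
rng = np.random.default_rng(3)
coefs = bass_cum(k, 0.30, 0.03, 0.40) + rng.normal(0, 0.01, k.size)
(theta_hat, p_hat, q_hat), _ = curve_fit(
    bass_cum, k, coefs, p0=[0.3, 0.05, 0.3],
    bounds=([0, 1e-4, 1e-4], [2, 1, 2]))   # keep p, q positive
print(theta_hat, p_hat, q_hat)   # plateau, innovation, imitation
```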
Phase 3 · Step 9: Pseudo-B placebo
Fake tB at Aug 15, Sep 29, Oct 15 (between A and the real B). Per playbook: these should be ≈ 0 if Nov 24 is a clean step.
Phase 3 · Step 8: Between-shock window test
Restrict the sample to May 22 – Nov 23 only and regress y on treated. β > 0 if the May 22 first stage is real (sketch below).
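On the toy panel this reduces to a filtered regression (May 22 = day-of-year 142, Nov 23 = day 327):

```python
import statsmodels.formula.api as smf

# Keep only the between-shock window; compare treated cycle to controls.
between = df[(df["doy"] >= 142) & (df["doy"] <= 327)].copy()
between["treated"] = (between["cycle"] == 2).astype(int)
f = smf.ols("log_entries ~ treated + C(genre) + C(doy)", data=between).fit(
    cov_type="cluster", cov_kwds={"groups": between["genre"]})
print(f.params["treated"])   # > 0 if the May 22 first stage is real
```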
Phase 3 · Step 7: Pre-May-22 leads
5 weekly bins covering days 28–78 (Mar 28 – May 17). Should be ≈ 0.
Phase 2 · Within-control placebo (C2024 vs C2023)
Treat the 2024 cycle as fake-treated against the 2023 control. Both pseudo-shock effects should be ≈ 0.
Phase 2 · Step 6: Sun-Abraham / CS / BJS / dC-dH / ETWFE — N/A
These heterogeneity-robust estimators (Sun-Abraham 2021, Callaway-Sant'Anna 2021, Borusyak-Jaravel-Spiess 2024, de Chaisemartin-D'Haultfœuille 2020, Wooldridge ETWFE 2023) correct the "forbidden comparisons" bias that TWFE incurs under staggered adoption, i.e., when units are treated at different times. Our setting has no staggered timing: all treated units (2025-cycle apps) are treated at the same calendar moments (May 22 / Nov 24, 2025), so the Goodman-Bacon decomposition is mechanically identical to plain TWFE in our 2×2 design. Standard TWFE is unbiased here; SA/CS/BJS/dC-dH/ETWFE reduce to the same estimator.
Paper §5.3 · Description-keyword analysis — separate AI-branded from broader entry
Tag each app as AI-branded if its title/description matches a regex: AI / GPT / ChatGPT / Claude / Anthropic / OpenAI / Copilot / Gemini / LLM / generative / agent / etc. Split the cohort into AI-branded vs non-AI and run separate DiDs. If most of the entry effect were just AI-branded apps, the non-AI subset would show a small β.
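An illustrative tagger (the keyword list above ends in "etc.", so the pipeline's actual pattern is presumably broader):

```python
import re

# Word boundaries keep "AI" from matching inside words like "maintain".
AI_RE = re.compile(
    r"\b(AI|GPT|ChatGPT|Claude|Anthropic|OpenAI|Copilot|Gemini|LLM|"
    r"generative|agent)\b",
    re.IGNORECASE,
)

def is_ai_branded(title: str, description: str) -> bool:
    """True if the app's title or description matches the AI-branding regex."""
    return bool(AI_RE.search(title) or AI_RE.search(description))

print(is_ai_branded("PhotoCleaner", "Edit photos fast"))          # False
print(is_ai_branded("ChatBuddy", "Your GPT-powered assistant"))   # True
```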
Paper §6 · GitHub AI-trace external validity
From paper §6 / analysis/outputs_v28_github_ai_evidence/. Sample of newly created public GitHub repos during the shock window vs the 2023 control window. Classify each repo as an iOS project or not, flag AI-coding traces (Claude/Codex/Copilot references), and run the DiD on the AI-trace share.
Phase 4 · Step 12 alt: Wild-cluster bootstrap — package broken, alternative inference
The wildboottest Python package has a numba/numpy compatibility issue (njit fails on pyobject dtype). A manual implementation was deemed unnecessary: CRV1 cluster-robust SEs already give p < 0.001 across all 3 datasets, and we have 26-48 genre clusters, around or above the ~30-cluster rule of thumb for reliable asymptotic CRV1 inference. The placebo distribution from v46 (Sep 29 / Oct 15 / Oct 29 fake cutoffs) serves as a randomization-inference null (sketch below).
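A sketch of that randomization-inference placement with made-up betas (v46 supplies the actual fake-cutoff estimates):

```python
import numpy as np

# Where does the real beta sit in the placebo-cutoff distribution?
placebo_betas = np.array([0.010, -0.018, 0.022])   # illustrative values
beta_real = 0.15
p_ri = ((np.sum(np.abs(placebo_betas) >= abs(beta_real)) + 1)
        / (placebo_betas.size + 1))
print(p_ri)   # floor of 1/4 with only 3 placebos; coarse, so CRV1 stays primary
```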
Paper-priority list for next round
Remaining paper analyses ranked by likely value-add and feasibility.
| # | Analysis (paper section) | Why important | Priority | Effort |
|---|---|---|---|---|
| 1 | Apple policy placebo (Oct 29 / Nov 13) — paper §5.3 | Key identification test: Apple submission/guideline dates should give null β at short-window break | 🔴 High | Easy (1 hr) |
| 2 | Multi-step nested model — paper §5.3 (Sep 29 / Oct 29 / Nov 13 / Nov 19 / Nov 24 / Dec 3) | Directly addresses "is the effect Nov 24 or earlier"; paper shows Nov 24 step is the biggest | 🔴 High | Easy-Medium (1.5 hr) |
| 3 | Build-time ramp (weeks 0-6 / 7-12 / late-post) — paper §5.2 Fig 2 | Replaces our event-study with paper's binned visualization; supports build-time interpretation | 🟡 Medium | Easy (30 min) |
| 4 | Inverse-country-weighted — paper Table 2 footnote | Spread each app 1 unit across countries; conservative weighting; should give similar magnitude to main | 🟡 Medium | Easy (30 min) |
| 5 | HonestDiD M-sensitivity full table — paper §5.3 Rambachan-Roth | Report CI at M=0.5 / 1.0 / 1.5 / 2.0 across 3 datasets (we have only breakdown M*) | 🟡 Medium | Easy (already computed; just visualize) |
| 6 | Linear lead-lag continuous fit — paper Fig 2 method | Paper's specific Fig 2 uses a linear lead-lag fit; our event-study is non-parametric | 🟡 Medium | Medium (1 hr) |
| 7 | Local short-window discontinuity tests — paper §5.3 ±14-day | Paper shows Nov 24 ±14d has positive break; Apple Oct 29 / Nov 13 ±14d are negative. Confirms Nov 24 is the actual event. | 🟡 Medium | Medium (1 hr) |
| 8 | Country×genre full panel — paper Table 2 main | Add country FE + country×genre FE; clusters at country×ISO-week | 🟢 Low (Dennis has said country doesn't matter) | Medium |
| 9 | Public reception outcomes — paper Table 6 | Reviews/ratings/price; needs a separate v42-style pipeline; reception ≠ entry | 🟢 Low (Dennis has said reviews don't matter) | Hard |
Verdict summary
- Functional form (log/levels/IHS/PPML)
- Shock-date ±7d
- Cluster (genre/day)
- Between-shock window (May22 effect confirmed)
- Stratified by code-sufficiency (paper mechanism ✓)
- Rambachan-Roth M* > 1.76
- Day-of-week FE
- Same-day first-difference
- Cross-platform replication (3 datasets)
- Within-control placebo (small drift only)
- Pseudo-B positive at Aug/Sep/Oct → continuous Bass diffusion, not a clean step. Paper Fig 2 shows the same pattern.
- Pre-leads bin 0 (Mar 28 – Apr 10): negative, −9% to −16% on iOS. The treated 2025 cycle had a lower early-cycle baseline.
- Bandwidth: very short pre-windows or very long post-windows fail due to data limits, not spec failure.
- SA/CS/BJS/dC-dH/ETWFE: designed for staggered adoption. With a universal-timing shock, standard TWFE is the right estimator; the Goodman-Bacon decomposition is equivalent.