How CounterFact evaluates a policy change
CounterFact compares a proposed policy with what happened under the current policy, estimates the outcome difference, and checks whether the logs support that estimate.
CounterFact answers two questions
What changes under the proposed policy?
How reliable is the estimate?
Estimated change
Estimate the target outcome under a proposed policy and compare it with the current policy.
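One common way to form such an estimate offline is inverse propensity scoring (IPS), which reweights logged outcomes by how much more (or less) often the proposed policy would have taken each logged action. The sketch below is illustrative only; the function names and data layout are assumptions, not CounterFact's actual API.

```python
# Hypothetical sketch: IPS estimate of a proposed policy's outcome from
# logs collected under the current policy. Names and layout are
# illustrative assumptions.

def ips_estimate(logs, proposed_prob):
    """logs: list of (action, propensity, outcome) tuples recorded under
    the current policy; proposed_prob(action) is the proposed policy's
    probability of taking that action."""
    total = 0.0
    for action, propensity, outcome in logs:
        weight = proposed_prob(action) / propensity  # importance weight
        total += weight * outcome
    return total / len(logs)

# Toy logs: the current policy chose "a" 80% of the time.
logs = [("a", 0.8, 1.0), ("a", 0.8, 1.0), ("b", 0.2, 0.0), ("a", 0.8, 1.0)]

# Proposed policy: always choose "a".
proposed = {"a": 1.0, "b": 0.0}.get

current_value = sum(o for _, _, o in logs) / len(logs)  # logged average
proposed_value = ips_estimate(logs, proposed)           # counterfactual
estimated_change = proposed_value - current_value
```

The estimated change is what the headline number reports; whether the importance weights are trustworthy is a separate question, handled by the evidence verdict below.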
Evidence verdict
Check whether there is enough data, whether estimates agree, and whether the outcome is ready to measure.
If outcomes take time to appear, CounterFact treats early results cautiously.
Reliable: supports live-test planning.
Directional: useful for prioritization.
Improve Logging: capture missing data first.
Outside Scope: reframe the policy question.
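The checks above can be thought of as a decision rule that maps evidence quality to one of the four verdicts. A minimal sketch, with check names and ordering that are assumptions rather than CounterFact's actual rules:

```python
# Illustrative mapping from evidence checks to the four verdicts above.
# Check names and their ordering are assumptions, not CounterFact's rules.

def verdict(in_scope, logging_complete, enough_data, estimators_agree,
            outcome_mature):
    if not in_scope:
        return "Outside Scope"      # reframe the policy question
    if not logging_complete:
        return "Improve Logging"    # capture missing data first
    if enough_data and estimators_agree and outcome_mature:
        return "Reliable"           # supports live-test planning
    return "Directional"            # useful for prioritization

# A promising estimate can still earn a weak verdict if evidence is thin:
print(verdict(in_scope=True, logging_complete=True, enough_data=False,
              estimators_agree=True, outcome_mature=True))
```

Note the ordering: scope and logging problems are checked first, because no amount of data makes up for the wrong question or missing fields.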
Estimated change and evidence strength are separate. A policy can look promising while the evidence remains weak.
Related articles
Practical notes on logging quality, available actions, and where offline evaluation can fail.
Why Offline Wins Keep Dying in A/B Tests
Why strong offline metrics often fail in live tests, and what to examine before trusting them.
Candidate Sets: The Invisible Boundary of Offline Evaluation
Why not knowing which actions were available at decision time is one of the most dangerous failure modes in offline evaluation.
The Hidden Foundation of Offline Policy Evaluation
What propensities are, why they need to be logged at decision time, and what breaks when they are missing or wrong.
Beyond A/B Testing: How Off-Policy Evaluation Transforms Recommendation Systems
An earlier framing of CounterFact's roots in recommendation and ranking evaluation.