How CounterFact evaluates a policy change

CounterFact compares a proposed policy with what happened under the current policy, estimates the outcome difference, and checks whether the logs support that estimate.

CounterFact answers two questions

What changes under the proposed policy?

How reliable is the estimate?

Estimated change

Estimate the target outcome under a proposed policy and compare it with the current policy.

Current policy: What happened in the logs

Proposed policy: The rule or action strategy to test

Outcome difference: Estimated change vs current policy
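One common way to estimate this kind of difference from logs is inverse propensity scoring (IPS). The sketch below is illustrative only and is not confirmed to be the estimator CounterFact uses. The log field names (`action`, `propensity`, `outcome`), the function `ips_outcome_difference`, and the example policy `always_a` are all assumptions made for the example. It also assumes each logged record stores the probability with which the current policy chose its action.

```python
from typing import Any, Callable, Dict, List

# Hypothetical log format: one record per decision, holding the action taken,
# the probability the current policy assigned to that action, and the outcome.
LogRecord = Dict[str, Any]


def ips_outcome_difference(
    logs: List[LogRecord],
    proposed_prob: Callable[[LogRecord], float],
) -> float:
    """Estimate (proposed policy value) - (current policy value) with IPS.

    proposed_prob(record) returns the probability that the proposed policy
    would pick the logged action in the logged context.
    """
    n = len(logs)
    if n == 0:
        raise ValueError("no logged decisions to evaluate")

    # Current policy: the average outcome actually observed in the logs.
    current_value = sum(float(r["outcome"]) for r in logs) / n

    # Proposed policy: reweight each outcome by how much more (or less)
    # likely the proposed policy is to take the logged action.
    proposed_value = sum(
        proposed_prob(r) / float(r["propensity"]) * float(r["outcome"])
        for r in logs
    ) / n

    return proposed_value - current_value


# Toy logs: the current policy picks "A" or "B" with equal probability.
logs = [
    {"action": "A", "propensity": 0.5, "outcome": 1.0},
    {"action": "A", "propensity": 0.5, "outcome": 1.0},
    {"action": "B", "propensity": 0.5, "outcome": 0.0},
    {"action": "B", "propensity": 0.5, "outcome": 1.0},
]


def always_a(record: LogRecord) -> float:
    """Proposed policy assumed for the example: always choose action "A"."""
    return 1.0 if record["action"] == "A" else 0.0


difference = ips_outcome_difference(logs, always_a)
print(f"Estimated outcome difference: {difference:+.2f}")
```

In this toy data the observed average outcome is 0.75 and the reweighted estimate for the proposed policy is 1.0, so the estimated difference is +0.25. With only four records, that number says nothing yet about how far the estimate can be trusted.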

Evidence verdict

Check whether there is enough data, whether estimates agree, and whether the outcome is ready to measure.

Data coverage: Enough examples to compare

Method agreement: Do estimates point the same way?

Stress checks: Does the result hold up?
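Two of these checks can be approximated with standard diagnostics. The sketch below is a hedged illustration, not CounterFact's actual implementation: it uses the Kish effective sample size of the importance weights as a stand-in for data coverage, and a simple sign comparison as a stand-in for method agreement. The function names and the example numbers are hypothetical, and no thresholds are implied by the source.

```python
from typing import List


def effective_sample_size(weights: List[float]) -> float:
    """Kish effective sample size: (sum of w)^2 / (sum of w^2).

    A value far below len(weights) means a few records dominate the estimate.
    """
    total = sum(weights)
    squares = sum(w * w for w in weights)
    return 0.0 if squares == 0 else total * total / squares


def estimates_agree(estimates: List[float]) -> bool:
    """True when every estimate is strictly positive or every one is strictly negative."""
    if not estimates:
        return False
    return all(e > 0 for e in estimates) or all(e < 0 for e in estimates)


# Importance weights (proposed probability / logged propensity) for four records.
weights = [2.0, 2.0, 0.0, 0.0]
ess = effective_sample_size(weights)

# Hypothetical estimates of the same outcome difference from three methods.
agree = estimates_agree([0.25, 0.18, 0.31])

print(f"Effective sample size: {ess:.1f} of {len(weights)} records")
print(f"Estimates point the same way: {agree}")
```

Here only two of the four records carry any weight, so the effective sample size is 2.0 even though the three estimates agree in sign.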

If outcomes take time to appear, CounterFact treats early results cautiously.
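One way to picture this caution is to separate decisions whose outcome window has fully elapsed from those that are still pending. This is a minimal sketch under assumed names (`split_by_maturity`, `outcome_window`) and an assumed 14-day window; the source does not say how CounterFact handles delayed outcomes internally.

```python
from datetime import datetime, timedelta
from typing import List, Tuple


def split_by_maturity(
    decision_times: List[datetime],
    as_of: datetime,
    outcome_window: timedelta,
) -> Tuple[List[datetime], List[datetime]]:
    """Split decisions into those old enough to have a final outcome and the rest."""
    mature = [t for t in decision_times if as_of - t >= outcome_window]
    pending = [t for t in decision_times if as_of - t < outcome_window]
    return mature, pending


decision_times = [
    datetime(2024, 1, 1),
    datetime(2024, 1, 10),
    datetime(2024, 1, 25),
]

# Assumption for the example: outcomes need 14 days to appear.
mature, pending = split_by_maturity(
    decision_times,
    as_of=datetime(2024, 1, 31),
    outcome_window=timedelta(days=14),
)

print(f"{len(mature)} mature decisions, {len(pending)} still pending")
```

Estimating only from the mature records avoids counting a not-yet-observed outcome as a failure, at the cost of a smaller sample.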

Reliable

Supports live-test planning

Directional

Useful for prioritization

Improve Logging

Capture missing data first

Outside Scope

Reframe the policy question

Estimated change and evidence strength are separate. A policy can look promising while the evidence remains weak.
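The separation can be made concrete by keeping the two answers in independent fields. The structure below is a hypothetical sketch, not CounterFact's data model: the class and field names are assumptions, the verdict labels and next steps are taken from the text above, and `looks_promising` assumes that a higher outcome is better.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    """Evidence verdicts, each paired with the next step named in the text."""

    RELIABLE = "Supports live-test planning"
    DIRECTIONAL = "Useful for prioritization"
    IMPROVE_LOGGING = "Capture missing data first"
    OUTSIDE_SCOPE = "Reframe the policy question"


@dataclass(frozen=True)
class EvaluationResult:
    estimated_change: float  # what changes under the proposed policy
    verdict: Verdict         # how reliable that estimate is

    def looks_promising(self) -> bool:
        # Assumption: a positive change in the outcome is an improvement.
        return self.estimated_change > 0

    def next_step(self) -> str:
        return self.verdict.value


# A promising estimate whose evidence is only directional.
result = EvaluationResult(estimated_change=0.25, verdict=Verdict.DIRECTIONAL)

print(f"Promising: {result.looks_promising()}; next step: {result.next_step()}")
```

Because neither field is derived from the other, a large estimated change never upgrades the verdict on its own.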

Related articles

Practical notes on logging quality, available actions, and where offline evaluation can fail.

Why Offline Wins Keep Dying in A/B Tests

Why strong offline metrics often fail in live tests, and what to examine before trusting them.


Candidate Sets: The Invisible Boundary of Offline Evaluation

Why not knowing which actions were available at decision time is one of the most dangerous failure modes in offline evaluation.


The Hidden Foundation of Offline Policy Evaluation

What propensities are, why they need to be logged at decision time, and what breaks when they are missing or wrong.


Beyond A/B Testing: How Off-Policy Evaluation Transforms Recommendation Systems

An earlier framing of CounterFact's roots in recommendation and ranking evaluation.


External resources

Tutorial: Counterfactual Learning and Evaluation for Recommender Systems

ACM RecSys 2021 tutorial on YouTube covering foundations, implementations, and recent advances in counterfactual learning and offline evaluation for recommender systems.

