How CounterFact evaluates a decision policy

CounterFact uses logged decision data to evaluate a candidate policy before rollout and shows whether the evidence is strong enough to rely on.

Logged decision data

  • Decisions
  • Actions
  • Context
  • Outcomes
  • Outcome horizon
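The logged inputs listed above can be pictured as one record per decision. The Python sketch below is an illustration under stated assumptions, not CounterFact's schema. The field names are hypothetical. The `propensity` field is an added assumption: off-policy estimators such as inverse propensity scoring generally need the probability the logging policy gave to the action it took. The `is_mature` helper shows how an outcome horizon decides whether an outcome can be read yet.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Any, Optional


@dataclass
class LoggedDecision:
    """One logged decision (illustrative field names, not CounterFact's schema)."""

    decision_id: str
    decided_at: datetime        # when the logged decision was made
    context: dict[str, Any]     # features available at decision time
    action: str                 # action the logging policy actually took
    propensity: float           # ASSUMED field: logging probability of `action`
    outcome: Optional[float]    # None until the outcome has been observed
    outcome_horizon: timedelta  # how long after the decision the outcome is measured

    def is_mature(self, now: datetime) -> bool:
        """True once the horizon has elapsed and an outcome was recorded."""
        return self.outcome is not None and now >= self.decided_at + self.outcome_horizon


# Usage: a decision logged on Jan 1 with a 7-day horizon is not mature on Jan 5.
d = LoggedDecision("d-1", datetime(2024, 1, 1), {"basket_value": 72.0},
                   "discount", 0.3, 1.0, timedelta(days=7))
print(d.is_mature(datetime(2024, 1, 5)))  # False
```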

Candidate policy

The proposed action rule or assignment.
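For offline evaluation, a candidate policy is commonly represented as a function from a decision's context to a probability for each available action. The sketch below illustrates this. The rule, the action names, and the 50.0 threshold are all hypothetical.

```python
from typing import Any, Mapping

ActionProbs = dict[str, float]  # action name -> probability under the candidate policy


def discount_rule(context: Mapping[str, Any]) -> ActionProbs:
    """Hypothetical candidate policy: discount baskets above a hypothetical threshold.

    A deterministic rule puts all probability on one action; a randomized
    assignment would spread it across several.
    """
    if context.get("basket_value", 0.0) > 50.0:
        return {"discount": 1.0, "none": 0.0}
    return {"discount": 0.0, "none": 1.0}


print(discount_rule({"basket_value": 72.0}))  # {'discount': 1.0, 'none': 0.0}
```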

CounterFact evaluation

Checks whether the logged decisions can support a pre-rollout read of the candidate policy.

Outcome read

What appears to change under the candidate policy.

Evidence verdict

Whether the logged evidence supports that read.

Next step

What to do with the result.

The result page also shows supporting detail in three sections: "Why this verdict", "Evaluation summary", and "Estimate comparison".

What the result means

CounterFact separates the result into an Outcome read, which says what appears to change under the candidate policy, and an Evidence verdict, which says how strongly the logged data supports trusting that read. A promising read with weak evidence is not enough to act on, while a no-clear-change read with strong evidence can still be useful.

What CounterFact checks before trusting the read

Policy coverage
Does the logged behavior cover the candidate policy?
Data readiness
Are decisions, actions, context, and outcomes usable?
Estimator agreement
Do multiple estimators point to the same read?
Precision check
Is the interval narrow enough to interpret?
Robustness
Does the result hold under stress and sensitivity checks?
Outcome maturity
Are outcomes measured over the right horizon?
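Several of these checks have standard counterparts in off-policy evaluation. The sketch below is not CounterFact's implementation. It shows generic versions of three checks:

- Policy coverage, via importance weights and Kish's effective sample size.
- Estimator agreement, by comparing inverse propensity scoring (IPS) with self-normalized IPS (SNIPS).
- Precision, via a rough normal-approximation interval.

It assumes each row carries a logging propensity greater than zero. The row layout, function names, and tolerance are hypothetical.

```python
import math
from typing import Callable, Sequence

# Hypothetical row layout: (context, logged_action, logging_propensity, outcome).
Row = tuple[dict, str, float, float]
Policy = Callable[[dict], dict[str, float]]  # context -> action probabilities


def importance_weights(rows: Sequence[Row], policy: Policy) -> list[float]:
    """w_i = pi_candidate(a_i | x_i) / pi_logging(a_i | x_i); needs propensity > 0."""
    return [policy(x).get(a, 0.0) / p for x, a, p, _ in rows]


def coverage_diagnostics(weights: Sequence[float]) -> dict[str, float]:
    """Policy coverage: mean weight near 1 and a high effective-sample-size share."""
    n = len(weights)
    total = sum(weights)
    sq = sum(w * w for w in weights)
    ess = total * total / sq if sq > 0 else 0.0  # Kish effective sample size
    return {"mean_weight": total / n, "ess_fraction": ess / n}


def ips(weights: Sequence[float], outcomes: Sequence[float]) -> float:
    """Inverse propensity scoring estimate of the candidate policy's mean outcome."""
    return sum(w * r for w, r in zip(weights, outcomes)) / len(weights)


def snips(weights: Sequence[float], outcomes: Sequence[float]) -> float:
    """Self-normalized IPS: divides by the total weight instead of n."""
    total = sum(weights)
    return sum(w * r for w, r in zip(weights, outcomes)) / total if total > 0 else math.nan


def ips_interval(weights: Sequence[float], outcomes: Sequence[float],
                 z: float = 1.96) -> tuple[float, float]:
    """Precision: rough normal-approximation interval around the IPS estimate."""
    terms = [w * r for w, r in zip(weights, outcomes)]
    n = len(terms)
    mean = sum(terms) / n
    var = sum((t - mean) ** 2 for t in terms) / (n - 1)
    half = z * math.sqrt(var / n)
    return mean - half, mean + half


def estimators_agree(a: float, b: float, tol: float = 0.05) -> bool:
    """Estimator agreement within a hypothetical absolute tolerance."""
    return abs(a - b) <= tol


# Logging policy chose uniformly (propensity 0.5); the candidate always discounts.
rows = [({}, "discount", 0.5, 1.0), ({}, "none", 0.5, 0.0),
        ({}, "discount", 0.5, 0.0), ({}, "none", 0.5, 1.0)]
w = importance_weights(rows, lambda x: {"discount": 1.0})
outcomes = [row[3] for row in rows]
print(coverage_diagnostics(w))               # {'mean_weight': 1.0, 'ess_fraction': 0.5}
print(ips(w, outcomes), snips(w, outcomes))  # 0.5 0.5
```

An ESS fraction of 0.5 here reflects that only half of the logged decisions took the action the candidate would take. With more actions or a rarer candidate choice, that share shrinks. This is one way weak coverage shows up.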

What the verdicts mean

  • Reliable

    Strong offline read; still validate before rollout.

  • Directional

    Useful for prioritization, not deployment proof.

  • Limited Evidence

    Logging or data gaps block a trustworthy read.

  • Outside Scope

    The setup does not fit this evaluation approach.

A no-clear-change Outcome read can still be useful when the evidence supports that the candidate policy is unlikely to move the outcome much.
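One generic way to tell "no clear change, with evidence" apart from "too uncertain to say" is a practical-equivalence check, the idea behind equivalence testing. The sketch below compares an effect interval with a margin of change considered too small to matter. It is not CounterFact's rule. The function name, labels, and margin are hypothetical.

```python
def classify_read(lo: float, hi: float, margin: float) -> str:
    """Classify an effect interval [lo, hi] against a practical margin.

    `margin` is a hypothetical, user-chosen size below which a change is
    considered too small to matter.
    """
    if lo > hi or margin <= 0:
        raise ValueError("need lo <= hi and margin > 0")
    if -margin <= lo and hi <= margin:
        return "no meaningful change"  # whole interval is inside the margin
    if lo > 0 or hi < 0:
        return "clear change"          # interval excludes zero
    return "inconclusive"              # too wide to say either way


print(classify_read(-0.004, 0.006, margin=0.01))  # no meaningful change
print(classify_read(-0.03, 0.05, margin=0.01))    # inconclusive
```

The second case is typical of a promising read with weak evidence. The center of the interval is positive, but the interval is too wide to act on.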

Limited Evidence and Outside Scope are honest verdicts, not failures.

What CounterFact does not claim

CounterFact does not guarantee production impact or replace rollout monitoring. Not every dataset is suitable for offline evaluation. Sometimes the right next step is better logging or instrumentation.

