
Episode 06

How to Collect Impartial Trading Data (Manual Backtesting Done Right)
Key Takeaways
  • Most retail backtesting fails because sample selection and interpretation contaminate results before probabilities are even calculated.

  • If scenarios, outcomes, and variables are already defined, the only remaining threat is emotional data collection.

  • Backtesting must reflect the environment you actually trade; mixing sessions mixes behavior and distorts probability.

  • Manual backtesting builds structural discipline and conviction before scale is introduced.

  • Data collection is not interpretation — classification comes first, analysis comes later.

  • Outliers must be excluded using predefined rules, not removed after reviewing results.

  • Impartial repetition creates stable datasets; stability is required before probability has meaning.

The Problem This Episode Solves

 

Most traders do not fail because they avoid backtesting.

 

They fail because their backtesting produces false confidence.

 

Retail backtesting typically breaks down for one of four reasons:

 

  1. Scenarios are undefined.

  2. Outcomes are not binary.

  3. Variables are undefined.

  4. Sample selection shifts emotionally.

 

The first three distort structure.

The fourth distorts reality.

 

If scenarios are loosely defined, you end up measuring multiple different behaviors under one label. Instead of isolating the behavior of a specific scenario, you measure the average of many unrelated conditions without realizing it.

 

If outcomes are vague — “it kind of worked” or “it moved a bit” — probabilities become subjective. Clean statistical comparison becomes impossible.

 

If variables are not defined, you cannot isolate what actually caused different outcomes. Without that clarity, strategy improvement becomes guesswork.

 

And finally, if you select backtesting windows or examples based on recent performance or emotional bias, your dataset stops reflecting market behavior and starts reflecting your psychology.

 

If you’ve followed the series correctly, the first three threats have already been neutralized. Anchor events are defined. Outcomes are binary. Variables are structured.

 

The only remaining threat is emotional sample selection.

 

This episode addresses that directly.

​

Where We Are in the Pipeline

 

Within the quant-inspired framework, this is the first validation stage.

 

The hypothesis has been constructed.

Variables have been defined.

Now we test whether the hypothesis holds up in the data.

 

Within the series progression:

 

  • Anchor event selected

  • Features defined

  • Market regimes structured

  • Binary hypothesis built

  • Variables formalized

 

Now we move into impartial validation.

 

The objective is simple:

 

Collect data without interpretation.

​

Objectives of This Stage

​

The purpose of this episode is to establish disciplined manual backtesting.

 

Specifically:

 

  • Select unbiased backtesting windows

  • Apply fixed scenario and variable definitions

  • Collect probability-focused data correctly

  • Avoid common retail backtesting bias

  • Exclude outliers using rules, not discretion

 

This is not optimization.

This is not refinement.

This is clean measurement.

 

Manual vs Automated Backtesting

 

There are two broad approaches:

 

Manual backtesting applies predefined rules by hand, one example at a time.

 

Automated backtesting codes those rules so a computer applies them to historical data.

 

At the beginning of strategy development, manual backtesting is superior.

 

It forces you to:

 

  • Recognize anchor events precisely

  • Identify variables consistently

  • Confirm regime correctly

  • Build conviction in your structural logic

​

Automation increases sample size.

Manual work increases structural understanding.

 

At this stage, structural clarity matters more than scale.

 

Selecting an Unbiased Backtesting Window

 

One of the most overlooked factors in retail backtesting is session selection.

 

Markets behave differently across sessions.

Liquidity changes.

Participants change.

Volatility profiles change.

 

A scenario in the Asian session is not the same environment as the same scenario in London or New York.

 

If you combine sessions indiscriminately, you mix behavioral environments. That corrupts probability.
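
A short arithmetic sketch can make the distortion concrete. The counts below are hypothetical, invented purely for illustration, and are not results from any backtest. They show how two sessions with very different hit rates collapse into a pooled figure that describes neither.

```python
# Hypothetical counts, invented for illustration only.
sessions = {
    "asian": {"valid": 12, "total": 40},
    "london": {"valid": 28, "total": 40},
}


def hit_rate(valid: int, total: int) -> float:
    """Share of examples in which the objective was reached."""
    return valid / total


# Hit rate measured separately inside each session.
per_session = {
    name: hit_rate(s["valid"], s["total"]) for name, s in sessions.items()
}

# Hit rate measured after pooling both sessions into one dataset.
pooled = hit_rate(
    sum(s["valid"] for s in sessions.values()),
    sum(s["total"] for s in sessions.values()),
)

print(per_session)  # {'asian': 0.3, 'london': 0.7}
print(pooled)       # 0.5
```

In this made-up case the pooled 50% matches neither the 30% environment nor the 70% environment, which is the sense in which mixing sessions corrupts probability.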

 

Backtesting must reflect how you actually trade.

 

The process is straightforward:

 

  1. Align with real availability.

  2. Choose liquid, repeatable conditions.

  3. Maintain consistency across datasets.

 

For example, my backtesting window runs from one hour before London open to one hour before New York open. This aligns with availability and captures the most liquid and repeatable market behavior.

 

Once defined, that window does not change.

 

Consistency prevents environmental mixing.
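
One way to keep the window fixed is to encode it once and check every candidate example against it. The sketch below is a minimal illustration, not the author's tooling. The open times (08:00 for London and 13:00 for New York, both in London local time) are assumptions chosen as placeholders, and the names `in_window`, `WINDOW_START`, and `WINDOW_END` are hypothetical.

```python
from datetime import date, datetime, time, timedelta

# Assumed session opens, both expressed in London local time.
# These are placeholders; substitute the opens you actually trade.
LONDON_OPEN = time(8, 0)
NEW_YORK_OPEN = time(13, 0)
BUFFER = timedelta(hours=1)


def shift(t: time, delta: timedelta) -> time:
    """Shift a time of day by delta, using a throwaway date for the arithmetic."""
    return (datetime.combine(date(2000, 1, 1), t) + delta).time()


# One hour before London open to one hour before New York open.
WINDOW_START = shift(LONDON_OPEN, -BUFFER)
WINDOW_END = shift(NEW_YORK_OPEN, -BUFFER)


def in_window(ts: datetime) -> bool:
    """True if ts (assumed to be London local time) falls inside the fixed window."""
    return WINDOW_START <= ts.time() < WINDOW_END


print(in_window(datetime(2024, 3, 5, 9, 30)))   # True
print(in_window(datetime(2024, 3, 5, 14, 0)))   # False
```

The sketch treats timestamps as naive London local times for simplicity. The UK and the US change their clocks on different dates, so the gap between the two opens shifts for a few weeks each year; a fuller version would use timezone-aware timestamps.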

 

The Manual Backtesting Procedure

​

Once the window is locked, the procedure is mechanical.

 

First, confirm the scenario before opening charts. Anchor, features, variables, and regime definitions must already exist.

 

Second, operate strictly within the defined time window.

 

Third, identify the complete scenario:

 

  • Anchor event

  • Market regime

  • Features/Variables

​

Only after the full scenario is defined do you assess outcome.

 

Fourth, record — do not interpret.

 

You do not analyze probabilities yet.

You do not speculate about patterns.

You do not adjust rules mid-process.

 

You simply classify:

 

Valid or invalid.

 

That is all.

 

Interpretation happens later.
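
The "record, do not interpret" rule can be reflected in the shape of the log itself. The following is a minimal sketch, assuming a CSV file as the store; the `Observation` class, its field names, and `append_observation` are all hypothetical and not part of the original method. The record deliberately has no free-text notes column, so there is nowhere to write an interpretation during collection.

```python
import csv
from dataclasses import asdict, dataclass, fields
from datetime import date
from pathlib import Path


@dataclass(frozen=True)
class Observation:
    """One backtested example. All field names are hypothetical."""
    trade_date: date
    anchor_event: str
    regime: str
    variables: str    # a short code agreed before collection starts
    valid: bool       # True = objective reached before invalidation
    screenshot: str   # file name kept for later auditing


def append_observation(path: Path, obs: Observation) -> None:
    """Append one row, writing the header only when the file is new."""
    is_new = not path.exists()
    with path.open("a", newline="") as f:
        writer = csv.DictWriter(
            f, fieldnames=[fl.name for fl in fields(Observation)]
        )
        if is_new:
            writer.writeheader()
        writer.writerow(asdict(obs))
```

A call might look like `append_observation(Path("observations.csv"), Observation(date(2024, 3, 5), "weak_displacement_15m", "regime_a", "v1", True, "2024-03-05_01.png"))`, where every value shown is a placeholder.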

​

Handling Outliers Impartially

​

Outliers must be defined before reviewing results.

 

Removing examples after seeing performance is curve fitting.

 

In my model, outliers are predefined as:

 

  • Forecasted high-impact news events

  • Abnormal volatility spikes

  • US or UK bank holidays

​

These exclusions are structural, not emotional.

 

They are defined in advance.

 

Anything outside predefined criteria remains in the dataset.
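
Predefined exclusions can be written down as a rule that is fixed before any outcome is seen. The sketch below is one possible encoding, not the author's implementation. The names `ExclusionRules` and `exclusion_reason` are hypothetical, and the volatility rule (a day range above three times a typical range) is an assumption, since the text does not define what counts as an abnormal spike.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class ExclusionRules:
    """Exclusion criteria fixed before any results are reviewed."""
    news_days: FrozenSet[date] = frozenset()      # forecasted high-impact news
    bank_holidays: FrozenSet[date] = frozenset()  # US or UK bank holidays
    spike_multiple: float = 3.0                   # assumed threshold, not from the source


def exclusion_reason(
    day: date, day_range: float, typical_range: float, rules: ExclusionRules
) -> Optional[str]:
    """Return the rule that excludes this day, or None if the day stays in the dataset."""
    if day in rules.news_days:
        return "news"
    if day in rules.bank_holidays:
        return "bank_holiday"
    if day_range > rules.spike_multiple * typical_range:
        return "volatility_spike"
    return None
```

The function takes no outcome as input, so an example cannot be excluded because of how it performed.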

​

Example: Applying the Process

​

Within the defined session window, the process is executed identically each time.

 

An anchor event appears — weak displacement through 15-minute swing liquidity.

 

Market structure is identified across timeframes.

 

In the example provided:

 

  • Weekly: Lack of structure

  • Daily: Structured

  • 4-hour: Lack of structure

  • Hourly: Lack of structure

​

This classifies as a lack of structure on the hourly timeframe and below.

 

Supplementary confluences are evaluated using the predefined 5-minute validation rule.

 

Initiation liquidity is identified and confirmed at the correct timeframe.

 

Once the scenario is fully classified, the objective is defined.

 

Then price action is replayed.

 

Did price reach the objective before invalidation?

 

Yes.

 

The outcome is recorded as valid.

 

No interpretation.

 

Just classification.

 

For bookkeeping, the date and screenshots are logged so that each example can be audited later if required.

 

This is repeated for every example.
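
The question "did price reach the objective before invalidation?" can be stated as a mechanical check over the replayed candles. This is a minimal sketch under stated assumptions, not the author's procedure. The function name `classify_outcome` and the `(high, low)` bar format are hypothetical. The rule that a candle touching both levels counts as invalid is an assumption made here to stay conservative, because candle data alone cannot show which level was hit first.

```python
from typing import Iterable, Optional, Tuple

Bar = Tuple[float, float]  # (high, low) for one replayed candle


def classify_outcome(
    bars: Iterable[Bar], objective: float, invalidation: float
) -> Optional[bool]:
    """Return True (valid), False (invalid), or None if neither level was touched.

    Direction is inferred from the levels: an objective above the
    invalidation level is treated as an upward scenario, otherwise downward.
    """
    upward = objective > invalidation
    for high, low in bars:
        if upward:
            hit_objective = high >= objective
            hit_invalidation = low <= invalidation
        else:
            hit_objective = low <= objective
            hit_invalidation = high >= invalidation
        if hit_invalidation:
            # Assumption: a candle touching both levels is recorded as invalid.
            return False
        if hit_objective:
            return True
    return None


# Placeholder prices for illustration only.
bars = [(1.1010, 1.1000), (1.1030, 1.1005)]
print(classify_outcome(bars, objective=1.1025, invalidation=1.0990))  # True
```

The result is a single classification per example, which matches the binary outcome the collection stage requires.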

​

The Critical Discipline

 

The strength of this method lies in repetition.

 

Every scenario is:

 

  • Identified the same way

  • Classified the same way

  • Evaluated the same way

 

No discretionary interpretation is introduced during collection.

 

By separating data collection from analysis, you prevent narrative contamination.

​

Why This Matters

​

Improper backtesting builds confidence.

Proper backtesting builds evidence.

 

Confidence feels good.

Evidence holds under scrutiny.

 

This stage ensures that when probabilities are calculated in the next episode, they reflect behavior — not bias.

​

Closing Thoughts

​

By now you should understand:

 

  • Why most retail backtesting produces false confidence

  • How session selection influences probability

  • Why manual backtesting builds structural discipline

  • How to collect data impartially

  • How to exclude outliers without curve fitting

​

The goal is not to prove the strategy works.

 

The goal is to build a clean dataset.

 

In Episode 07, raw data becomes probability structure.

 

Now that the collection process is defined, the next step is interpretation — converting rows into meaning.


© 2026 One of None Trading. All rights reserved.
Trading foreign exchange (forex) and CFDs carries a high level of risk and may not be suitable for all investors. Past performance is not indicative of future results.
All information presented on this website is for educational and informational purposes only and should not be considered financial or investment advice. One of None Trading does not provide any guarantees of profit or performance.
