
Episode 07

Structuring & Evaluating Trading Data (From Features To Predictive Value)
Key Takeaways
  • Data alone does not prove an edge; predictive value only exists if outcome probabilities change under different conditions.

  • A high win rate does not define an edge; a structural edge is the difference in probability between scenarios.

  • Probability must be measured before profitability to determine whether results come from structure or variance.

  • Variables measured in isolation produce averaged behaviour that hides the true probabilities of specific scenarios.

  • Comparing distributions while holding other variables constant reveals whether a variable actually has predictive value.

  • Small probability differences can still represent meaningful edge if they are consistent and repeatable.

  • Variables that behave similarly should be combined to reduce data fragmentation and increase statistical confidence.

The Problem This Episode Solves

 

Most traders collect data.

 

They log their trades, track their wins and losses, and calculate percentages. From this information, they assume they can determine whether or not they have an edge.

 

The problem is that data alone does not prove predictive value.

 

If changing market conditions or scenarios does not change the outcome distribution, then the strategy does not possess a structural edge; it is simply producing variance.

 

An edge is not defined by a high win rate.

An edge is defined by a difference in probability under different conditions.

 

Understanding this distinction is critical.

 

Why Probability Matters Before Profitability

​

Many traders attempt to jump directly into measuring profitability or expected value. However, doing so without first understanding probability can lead to false conclusions.

 

Profitability answers a very simple question:

 

Did this make money?

 

Probability answers a different and far more important question:

 

Does behaviour change under different conditions?

 

Without identifying whether behaviour changes across different environments, it becomes extremely difficult to determine whether profitability is the result of true structural advantage or merely random variance.

 

In other words:

 

  • Probability reveals whether behaviour changes.

  • Profitability reveals whether that behavioural change can be monetised.

​

You must first understand behaviour before you can confidently monetise it.

​

Where We Are in the Strategy Development Process

​

This is Episode 07 of the series.

 

Within the broader quant-inspired strategy pipeline, we are currently in the validation stage, where the goal is to determine whether the original hypothesis holds significance.

 

Within the specific framework used in this series, we are still completing the hypothesis validation phase, before moving into conditional dependence.

 

If you have been following the process so far, you should already have completed several key steps:

 

  • Selected an anchor event

  • Defined the features of the strategy

  • Established market regimes

  • Constructed a binary hypothesis

  • Defined variables

  • Collected impartial historical data

​

The purpose of this episode is to transform that raw data into structured probability.

 

The objectives are to:

 

  • Convert collected examples into probability tables

  • Test whether variables change outcome distribution

  • Measure probability for each structured scenario

  • Identify meaningful separation between environments

  • Distinguish probability from profitability

 

From Raw Data to Structured Probability

 

At this stage of the process, your collected data should resemble a structured log.

 

Each example should include:

 

  • Whether the outcome was valid or invalid

  • The date of the scenario

  • Screenshots of the relevant charts

  • The variables present during the event

​

The screenshots and dates are primarily for traceability. They allow you to revisit and verify the exact market scenario if necessary.

 

Some additional columns may exist within the dataset that are not yet relevant at this stage of analysis. They can remain for internal consistency, or because they will become useful later in the research process.

 

For now, the only important components are:

 

  • Valid outcomes

  • Invalid outcomes

  • The variables associated with each example

 

Structuring the Data

​

To convert raw observations into probabilities, the data must first be grouped according to how the hypothesis itself is structured.

 

In this model, the hypothesis is defined by three core components:

 

  • Initiation liquidity

  • Supplementary confluences

  • Market regime

​

Rather than measuring every possible combination immediately, the first step is to measure each variable independently.

 

For example:

 

  1. Group all examples where a specific initiation liquidity occurs.

  2. Count how many resulted in valid outcomes.

  3. Count how many resulted in invalid outcomes.

  4. Calculate the probability of a valid outcome occurring.

 

This process is repeated for every variable individually.

 

Once completed, each variable will have its own probability distribution.
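
For readers who keep their log in a spreadsheet or dataframe, a minimal sketch of this grouping step is shown below. It is only an illustration, not part of the episode's workflow: the column names ("outcome", "initiation", "confluence", "regime") and the sample values are hypothetical placeholders for however your own log is labelled.

```python
import pandas as pd

# Hypothetical log: one row per collected example, with its outcome
# and the variables that were present at the time of the anchor event.
log = pd.DataFrame({
    "outcome":    ["valid", "valid", "invalid", "valid", "invalid"],
    "initiation": ["15m_swing", "15m_swing", "session_high", "session_high", "15m_swing"],
    "confluence": ["fvg", "none", "fvg", "fvg", "none"],
    "regime":     ["trending", "ranging", "trending", "trending", "ranging"],
})

def probability_table(df, variable):
    """Valid / invalid counts and P(valid) for each state of a single variable."""
    counts = (
        df.groupby(variable)["outcome"]
          .value_counts()
          .unstack(fill_value=0)
          .reindex(columns=["valid", "invalid"], fill_value=0)
    )
    counts["p_valid"] = counts["valid"] / (counts["valid"] + counts["invalid"])
    return counts

for variable in ["initiation", "confluence", "regime"]:
    print(probability_table(log, variable), "\n")
```

Running the same helper over every variable column reproduces the per-variable probability distributions described above.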

​

Organising Probability Tables

​

A practical way to organise this information is by creating separate pages for each variable.

 

For example:

 

  • A page for 15-minute swing liquidity as initiation

  • A page for fair value gap confluence

  • A page for specific market regimes

​

Each page contains all examples where that variable is present.

 

Within the page:

 

  • Valid examples appear at the top

  • Invalid examples appear below

  • Totals are recorded

  • Probabilities are calculated

 

This allows you to easily see the distribution of outcomes associated with each variable.

 

For example:

 

  • 494 valid outcomes

  • 207 invalid outcomes

​

This produces a probability of roughly 70% (494 / 701 ≈ 0.70) of a valid outcome occurring under that variable.

 

Once completed across all variables, you will have a clear probability distribution for every structural component of the strategy.

 

Creating an Overview of Variable Probabilities

 

After calculating probabilities for each variable, it becomes useful to create a summary page that displays the probabilities for all variables in one location.

 

This overview allows you to quickly compare variables across categories such as:

 

  • Market regimes

  • Confluences

  • Initiation liquidity
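
If you built per-variable tables along the lines of the earlier sketch, one simple way to assemble such an overview is to stack them into a single table. This again assumes the hypothetical `log` and `probability_table` helper from that sketch.

```python
# Stack every variable's per-state probabilities into one overview table
overview = pd.concat(
    {v: probability_table(log, v) for v in ["regime", "confluence", "initiation"]},
    names=["variable", "state"],
)
print(overview.sort_values("p_valid", ascending=False))
```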

 

At this stage, the data should not yet be used for final conclusions.

 

Instead, it should be treated as informational.

 

The goal here is simply to observe whether certain variables appear to produce similar distributions or significantly different distributions.

 

If two variables appear to behave very similarly, they may later be combined to reduce fragmentation and increase sample size.

 

However, these decisions must only be made after further analysis.

​

Why Variables Cannot Be Measured in Isolation

​

While analysing variables individually provides useful insights, it does not tell the complete story.

 

Variables do not operate independently.

 

Their behaviour depends on the interaction between conditions.

 

To illustrate this concept, consider the question:

 

What makes a car fast?

 

Suppose we define three contributing factors:

 

  • Engine size

  • Tire type

  • Road surface

​

Each factor has two possible states.

 

If we measure engine size alone, we might conclude that larger engines produce faster cars.

 

However, this ignores the influence of tires and road conditions.

 

A large engine combined with bald tires on an icy road will not produce a fast car.

 

By measuring variables independently, we produce an averaged result that does not accurately represent individual scenarios.

 

This is exactly what happens when trading variables are measured in isolation.

 

Without understanding how variables interact, the results become diluted and misleading.

​

Scenario Combinations

​

Once multiple variables exist, their combinations increase rapidly.

 

If three variables each have two possible states, there are eight potential combinations.

 

This phenomenon highlights the importance of variable discipline.

 

Too many variables create too many combinations, which fragments the dataset and reduces sample sizes.

 

When sample sizes become too small, reliable conclusions become impossible.

 

Therefore, the number of variables must remain carefully controlled.
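
A quick way to see how fast this fragmentation grows is to enumerate the combinations directly. The sketch below simply counts them for the hypothetical variables used earlier.

```python
from itertools import product

# With k variables of 2 states each there are 2**k scenarios,
# so every added variable roughly halves the sample available per scenario.
states = {
    "initiation": ["15m_swing", "session_high"],
    "confluence": ["fvg", "none"],
    "regime":     ["trending", "ranging"],
}
scenarios = list(product(*states.values()))
print(len(scenarios))  # 8
```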

​

Measuring Conditional Scenarios

​

To solve this problem, the next step is to measure variable combinations.

 

This process follows a similar structure to the previous step but introduces one adjustment.

 

Instead of grouping by individual variables, we group by overall scenarios.

 

For example:

 

  1. Select a specific initiation variable.

  2. Within that group, subdivide by confluence type.

  3. Within each of those groups, observe market regimes.

  4. Measure valid and invalid outcomes for each combination.

​

At the end of this process, every scenario will have its own probability.

 

This produces a much clearer understanding of how variables interact.
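
Under the same hypothetical log used earlier, grouping by all three variable columns at once yields one probability per scenario, for example:

```python
# P(valid) for every full scenario: initiation x confluence x regime
scenario_table = (
    log.groupby(["initiation", "confluence", "regime"])["outcome"]
       .value_counts()
       .unstack(fill_value=0)
       .reindex(columns=["valid", "invalid"], fill_value=0)
)
scenario_table["p_valid"] = scenario_table["valid"] / (
    scenario_table["valid"] + scenario_table["invalid"]
)
print(scenario_table)
```

Scenarios with very few examples should be flagged rather than trusted; tiny samples are exactly the fragmentation problem described above.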

 

Determining Predictive Significance

​

Once probabilities exist for each scenario, the next task is determining whether variables actually influence outcomes.

 

This can be done through a simple comparison process:

 

  1. Hold all variables constant except one.

  2. Change the variable being tested.

  3. Compare the resulting probability distributions.

​

If altering the variable produces a meaningful difference in probabilities, the variable likely holds predictive value.

 

If probabilities remain similar across different values of the variable, the variable likely does not influence the outcome.

 

Predictive variables are those that shift outcome distributions.
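
Continuing the hypothetical sketch, this comparison amounts to filtering the log on the fixed variables and re-running the per-variable table on the one being tested:

```python
# Hold confluence and regime fixed, vary only the initiation variable,
# and compare P(valid) across its states.
fixed = log[(log["confluence"] == "fvg") & (log["regime"] == "trending")]
print(probability_table(fixed, "initiation"))
```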

 

Recognising Meaningful Differences

​

Variables must be evaluated relative to one another.

 

A single probability value does not reveal significance by itself.

 

For example:

 

  • One state of a variable producing a 10% success rate

  • Another state producing a 90% success rate

 

This difference strongly suggests that the variable influences outcomes.

 

However, smaller differences require broader comparison across multiple scenarios before conclusions can be drawn.

 

Patterns must be observed across the entire dataset, not from isolated cases.
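
The episode does not prescribe a formal statistical test, but if you want a rough sanity check that a probability gap is larger than chance would explain for your sample sizes, a standard two-way count comparison such as a chi-square test is one option. The counts below are purely illustrative.

```python
from scipy.stats import chi2_contingency

# Illustrative counts only: [valid, invalid] for two scenarios
counts = [[494, 207],   # scenario A
          [118, 102]]   # scenario B
chi2, p_value, dof, expected = chi2_contingency(counts)
print(round(p_value, 4))  # a small p-value suggests the gap is not pure variance
```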

​

Minimal Edge Is Still Valuable

​

One important psychological concept must be understood.

 

A meaningful edge does not need to produce dramatic results.

 

Most professional trading edges are small but consistent.

 

Some scenarios within the data may appear improbable.

 

Others may appear highly probable.

 

This is normal.

 

Not every scenario needs to be traded. The objective is to identify the scenarios where probability provides a meaningful advantage.

​

Simplifying Variables

 

If analysis reveals that certain variables behave nearly identically, they should be combined.

 

Combining variables:

 

  • Increases sample size

  • Reduces dataset fragmentation

  • Simplifies the strategy

 

This is one of the major benefits of conducting probability analysis before building a final trading model.
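
In practice, combining variables can be as simple as relabelling the near-identical states and recomputing the tables on the larger combined sample. The merge below is hypothetical and reuses the earlier sketch.

```python
# Hypothetical: suppose "session_high" and "15m_swing" behave nearly identically.
# Merge them into one label and recompute on the larger combined sample.
merged = log.replace({"initiation": {"session_high": "15m_swing"}})
print(probability_table(merged, "initiation"))
```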

 

Why This Work Matters

​

This process may appear time-consuming, but it serves a critical purpose.

 

Probability analysis isolates behavioural causality.

 

It answers the question:

 

What causes the behaviour observed in the market?

 

Profitability, by contrast, only measures capital outcomes.

 

Without isolating behavioural causes first, it is impossible to determine whether profits come from:

 

  • Structural edge

  • Trading skill

  • Random variance

​

Long-term traders must know that their profits come from structure, not luck.

 

Measuring probability is the only reliable way to confirm this.

 

What Comes Next

​

During the process of collecting and analysing data, you will inevitably discover new observations about your anchor event.

 

Many traders make the mistake of rebuilding their entire strategy around these observations.

 

Instead, the correct approach is to test each observation individually.

 

The next episode will introduce the concept of sub-hypotheses.

 

In Episode 08, we will cover:

 

  • Engineering sub-hypotheses

  • Isolating new behavioural observations

  • Testing them without corrupting the original dataset

​

This allows the strategy to evolve without destroying the integrity of the original research.
