Episode 05

From Hypothesis to Measurable Data (Operationalising Trading Concepts)

Key Takeaways

Most traders believe they are collecting data, but without defined variables and allowable values they are only collecting descriptions.
Features are not measurable on their own — they must be decomposed into recordable, mutually exclusive variables to isolate behavioral influence.
Undefined or overlapping variables cause identical scenarios to be logged differently and different scenarios to be logged identically, corrupting probabilities.
Valid variables must be behaviorally relevant, recordable in every example, mutually exclusive, identifiable live, and justified by the behavioral separation they create, not added for complexity.
States exist to prevent behavioral mixing within variables, but they are only necessary when separation has not already occurred at the variable level.
Grouping supplementary confluences into finite classifications prevents double counting, dataset fragmentation, and loss of statistical clarity.
Objective measurement requires a fixed identification process so every scenario is classified the same way every time.

The Problem This Episode Solves

Most traders believe they are collecting data.

In reality, they are collecting descriptions.

When variables are undefined and allowable values are not clearly constrained, identical examples get logged differently. Different examples get logged identically. Results become narrative instead of measurement.

This episode exists to correct that problem.

The objective here is to understand why undefined variables corrupt trading data, to distinguish features from variables and states, to learn a structured process for defining variables and states, and to formalize the variable system used in my own strategy.

By the end of this stage, you should be able to collect trading data impartially and build datasets that reflect actual probability — not interpretation.

We are still within the hypothesis stage of the quant pipeline. Anchor events, features, market regimes, and binary outcomes should already be defined. What remains is determining what exactly must be measured in relation to that binary hypothesis.

A simple “yes” or “no” outcome is not sufficient. Measurement requires structure beneath it.

Why Features Are Not Measurable

In earlier episodes, features were defined as descriptive tools used to make price action interpretable. But description is not measurement.

Markets are not measured by naming phenomena. They are measured by instrumenting price action through variables.

Consider a simple analogy.

If “weather” is the feature influencing the decision to go to the beach, weather itself is not measurable. You cannot measure “weather” directly. You must break it down into measurable components such as temperature, wind speed, and rainfall.

Only once those measurable components are defined can the influence of the feature be evaluated objectively.

The same applies to trading.

If liquidity or inefficiencies are features, they must be decomposed into variables before their influence on outcomes can be measured.

Feature Buckets: Structural Organisation

A feature bucket is an organizational layer used to group related concepts and maintain structural clarity.

In my strategy, there are two primary feature buckets:

Initiation Liquidity
Supplementary Confluences

Feature buckets are not mathematically required. They are organizational tools. Their purpose is to prevent logical sprawl and maintain consistency.

Initiation liquidity answers:

Where was the anchor event initiated?

Supplementary confluences answer:

What supporting behavior existed around the initiation?

From these buckets, variables are defined.

The Variable Selection Process

Defining variables requires discipline. The process follows five rules.

First, start from the scenario — not the feature. Ask: what differences could plausibly change the outcome of this specific scenario? Variables must be behaviorally relevant to the anchor event, not descriptive abstractions detached from it.

Second, variables must be recordable in every example. This does not mean they must be present every time. It means their presence or absence can always be logged objectively.

Third, variables must be mutually exclusive. A single example cannot fall into multiple categories simultaneously. If it can, probability cannot be isolated correctly.

Fourth, variables must be identifiable live. Anything that can only be determined retrospectively measures outcome, not condition.

Fifth, complexity must earn its place. If a variable does not create plausible behavioral separation, it fragments data unnecessarily and reduces sample reliability.
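
Rules two and three can be checked mechanically. The following is a minimal sketch, assuming each category of a variable is written as a yes/no rule over a logged example. The function names, the `gap` field, and the category names are hypothetical and used only for illustration. Rules one, four, and five remain judgment calls that code cannot verify.

```python
from typing import Callable, Dict, List

Rule = Callable[[dict], bool]


def matching_categories(example: dict, categories: Dict[str, Rule]) -> List[str]:
    """Return the name of every category whose rule matches the example."""
    return [name for name, rule in categories.items() if rule(example)]


def check_variable(examples: List[dict], categories: Dict[str, Rule]) -> List[str]:
    """Report examples that break recordability or mutual exclusivity."""
    problems = []
    for i, example in enumerate(examples):
        matches = matching_categories(example, categories)
        if not matches:
            problems.append(f"example {i}: not recordable (no category matched)")
        elif len(matches) > 1:
            problems.append(f"example {i}: not mutually exclusive {matches}")
    return problems


# Hypothetical logged examples: 'gap' is None when no gap was present.
EXAMPLES = [{"gap": "filled"}, {"gap": "unfilled"}, {"gap": None}]

# Every example lands in exactly one category, including absence.
GOOD = {
    "already_filled": lambda e: e["gap"] == "filled",
    "unfilled": lambda e: e["gap"] == "unfilled",
    "absent": lambda e: e["gap"] is None,
}

# Overlapping categories, and no way to log absence.
BAD = {
    "already_filled": lambda e: e["gap"] == "filled",
    "any_gap": lambda e: e["gap"] is not None,
}

print(check_variable(EXAMPLES, GOOD))
print(check_variable(EXAMPLES, BAD))
```

The first definition produces no problems. The second flags one example as falling into two categories and another as falling into none.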

Understanding States

Variables sometimes require states.

A state is a predefined allowed value a variable can take. States prevent mixing different behaviors inside a single variable and improve comparability.

For example, temperature as a variable could be divided into five-degree increments. Without states, every distinct reading becomes its own category, which fragments the data excessively.

However, states are not always required.

If a variable is already binary or categorical, and behavioral separation already exists at the variable level, additional states only introduce fragmentation without adding clarity.

In my strategy, states are not required because variables are already categorical and mutually exclusive.
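
To make the temperature example concrete, here is a minimal sketch of states as five-degree bins. The function name and the label format are assumptions chosen for illustration.

```python
import math


def temperature_state(temp: float, width: float = 5.0) -> str:
    """Map a continuous temperature reading to a fixed-width state label."""
    lower = math.floor(temp / width) * width
    return f"{lower:g} to <{lower + width:g}"


# Two different readings share one state, so they can be compared as a group.
print(temperature_state(22.4))
print(temperature_state(23.9))
print(temperature_state(25.0))
```

Readings of 22.4 and 23.9 fall into the same state, while 25.0 starts the next one. A variable that is already categorical needs no such step.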

My Variable Structure: The Pyramid

My system follows a simple hierarchy:

Feature Bucket
→ Feature Type
→ Variable
→ Final Classification

Initiation Liquidity

Feature Bucket: Initiation Liquidity

Feature Types:

Swing Liquidity
Low Resistance Liquidity

Variables: Timeframe-based distinctions, beginning at the 15-minute timeframe (since the anchor event exists there).

Examples:

15-minute swing liquidity
15-minute low resistance liquidity
1-hour swing liquidity
4-hour swing liquidity

Timeframe matters because higher timeframes plausibly hold greater behavioral significance. This satisfies the behavioral relevance requirement.

Each variable is:

Recordable
Mutually exclusive (when a level qualifies on several timeframes, only the highest timeframe is logged)
Identifiable live
Behaviorally hypothesized

Therefore, valid.

No states are added because timeframe separation already creates the necessary behavioral segmentation.
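
The highest-timeframe rule can be sketched as follows. This is an illustration under stated assumptions: the timeframe ranking covers only the timeframes named in the examples above, the function and constant names are hypothetical, and the text does not say how to resolve two feature types on the same timeframe, so the sketch raises an error rather than inventing a rule.

```python
from typing import List, Tuple

# Only the timeframes named in the examples; extend as needed.
TIMEFRAME_RANK = {"15-minute": 0, "1-hour": 1, "4-hour": 2}


def classify_initiation(candidates: List[Tuple[str, str]]) -> str:
    """Pick one label from (timeframe, feature_type) pairs for the same level.

    The pair with the highest timeframe wins, so each example gets one value.
    """
    if not candidates:
        raise ValueError("no initiation liquidity identified")
    top_rank = max(TIMEFRAME_RANK[timeframe] for timeframe, _ in candidates)
    top = {pair for pair in candidates if TIMEFRAME_RANK[pair[0]] == top_rank}
    if len(top) > 1:
        # Assumption: a tie between feature types is left unresolved here.
        raise ValueError(f"ambiguous feature type on highest timeframe: {sorted(top)}")
    timeframe, feature_type = next(iter(top))
    return f"{timeframe} {feature_type} liquidity"


print(classify_initiation([("15-minute", "swing"), ("1-hour", "swing")]))
print(classify_initiation([("15-minute", "low resistance")]))
```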

Supplementary Confluences

Feature Bucket: Supplementary Confluences

Feature Types:

Fair Value Gaps
Inverse Fair Value Gaps

Variables:

Already Filled
Unfilled

The behavioral logic is straightforward.

If a fair value gap has already been filled, the inefficiency has theoretically been rebalanced. An unfilled inefficiency may hold greater behavioral significance.

Again, each variable:

Is recordable
Is mutually exclusive
Is identifiable live
Has behavioral logic

Therefore, valid.

Preventing Fragmentation: The Six-Class System

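Supplementary confluences could theoretically fragment into an unmanageable number of combinations, since any mix of filled and unfilled gaps of either type can sit around the same initiation.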

To prevent double counting and preserve behavioral meaning, I group them into six classifications:

Fair Value Gaps only
Inverse Fair Value Gaps only
Already Filled FVGs only
Already Filled IFVGs only
Conglomerate (any mixed combination)
None (no confluences respected)

Every scenario falls into one of these six classes.

This prevents infinite subdivision while preserving structural clarity.
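
A minimal sketch of the six-class grouping, assuming each respected confluence is logged as a (kind, already_filled) pair and that the first two classes refer to unfilled gaps. The enum and function names are hypothetical.

```python
from enum import Enum
from typing import Iterable, Tuple


class ConfluenceClass(Enum):
    FVG_ONLY = "fvg_only"
    IFVG_ONLY = "ifvg_only"
    FILLED_FVG_ONLY = "filled_fvg_only"
    FILLED_IFVG_ONLY = "filled_ifvg_only"
    CONGLOMERATE = "conglomerate"
    NONE = "none"


# Assumption: "FVG only" and "IFVG only" mean unfilled gaps.
PURE_CLASSES = {
    ("FVG", False): ConfluenceClass.FVG_ONLY,
    ("IFVG", False): ConfluenceClass.IFVG_ONLY,
    ("FVG", True): ConfluenceClass.FILLED_FVG_ONLY,
    ("IFVG", True): ConfluenceClass.FILLED_IFVG_ONLY,
}


def classify_confluences(respected: Iterable[Tuple[str, bool]]) -> ConfluenceClass:
    """Collapse any number of respected confluences into exactly one class."""
    distinct = set(respected)  # three unfilled FVGs count once, not three times
    if not distinct:
        return ConfluenceClass.NONE
    if len(distinct) > 1:
        return ConfluenceClass.CONGLOMERATE
    return PURE_CLASSES[next(iter(distinct))]


print(classify_confluences([]))
print(classify_confluences([("FVG", False), ("FVG", False)]))
print(classify_confluences([("FVG", False), ("IFVG", True)]))
```

Collapsing to the set of distinct types is what prevents double counting: several gaps of the same type still produce a single classification.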

Objective Identification: The 5-Minute Rule

Determining whether a confluence was respected requires consistency.

The 15-minute anchor is broken into 5-minute substructures. If price closes beyond a confluence on the 5-minute timeframe, that confluence is invalidated.

This prevents subjective interpretation and ensures only respected confluences are logged.

It is mechanical, repeatable, and live-applicable.
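
The check can be sketched in a few lines. This is an illustration only: it assumes a confluence is stored as a price zone with a low and a high, that "closing beyond" means a 5-minute close below the zone for a confluence expected to hold as support, or above it for one expected to hold as resistance, and that the caller supplies the relevant 5-minute closes. None of these details, nor the function name, come from the text above.

```python
from typing import Sequence


def is_respected(closes_5m: Sequence[float], zone_low: float,
                 zone_high: float, side: str) -> bool:
    """Return True if no 5-minute close went beyond the confluence zone.

    side: 'support' if price is expected to hold above the zone,
          'resistance' if price is expected to hold below it.
    """
    if side == "support":
        return all(close >= zone_low for close in closes_5m)
    if side == "resistance":
        return all(close <= zone_high for close in closes_5m)
    raise ValueError("side must be 'support' or 'resistance'")


# Hypothetical prices: a wick through the zone does not matter, only closes do.
print(is_respected([101.2, 100.6, 100.9], 100.5, 101.0, "support"))
print(is_respected([101.2, 100.4, 100.9], 100.5, 101.0, "support"))
```

The first call keeps the confluence because every close stays at or above the zone low. The second invalidates it because one close finishes below.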

Market Structure Variables

From Episode 03, market structure context is already defined:

Lack of weekly and below
Lack of 4-hour and below
Lack of hourly and below
In favor
Against

These remain categorical and require no additional states.

Full Scenario Application

Once market structure, initiation liquidity, and supplementary confluences are defined, the full scenario becomes measurable.

Example classification:

Initiation Liquidity: 15-minute low resistance buyside liquidity

Market Structure: Lack of weekly and below

Supplementary Confluence: Inverse FVG

Objective: Nearest opposing confluence

Outcome is then measured relative to the predefined binary criteria.

This process is repeated identically for every example.

No discretion. No reinterpretation.

Only measurement.
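
One way to enforce this in a log is to make free-text entries impossible. The sketch below assumes each scenario is stored as a record whose fields accept only predefined values. The class and field names are hypothetical, the initiation liquidity values are only the four examples listed earlier, and the objective and the buyside or sellside direction are left out because their allowable values are not defined here.

```python
from dataclasses import dataclass
from enum import Enum


class InitiationLiquidity(Enum):
    # Only the four examples listed in the text; the full set may be larger.
    SWING_15M = "15-minute swing liquidity"
    LOW_RESISTANCE_15M = "15-minute low resistance liquidity"
    SWING_1H = "1-hour swing liquidity"
    SWING_4H = "4-hour swing liquidity"


class MarketStructure(Enum):
    LACK_WEEKLY_AND_BELOW = "Lack of weekly and below"
    LACK_4H_AND_BELOW = "Lack of 4-hour and below"
    LACK_HOURLY_AND_BELOW = "Lack of hourly and below"
    IN_FAVOR = "In favor"
    AGAINST = "Against"


class ConfluenceClass(Enum):
    FVG_ONLY = "Fair Value Gaps only"
    IFVG_ONLY = "Inverse Fair Value Gaps only"
    FILLED_FVG_ONLY = "Already Filled FVGs only"
    FILLED_IFVG_ONLY = "Already Filled IFVGs only"
    CONGLOMERATE = "Conglomerate"
    NONE = "None"


@dataclass(frozen=True)
class ScenarioRecord:
    initiation_liquidity: InitiationLiquidity
    market_structure: MarketStructure
    confluence_class: ConfluenceClass
    outcome: bool  # True if the predefined binary criteria were met

    def __post_init__(self) -> None:
        # Reject free-text labels so every field holds an allowed value.
        expected = {
            "initiation_liquidity": InitiationLiquidity,
            "market_structure": MarketStructure,
            "confluence_class": ConfluenceClass,
            "outcome": bool,
        }
        for field_name, field_type in expected.items():
            if not isinstance(getattr(self, field_name), field_type):
                raise TypeError(f"{field_name} must be a {field_type.__name__}")


# The example classification above; the outcome value is a placeholder.
record = ScenarioRecord(
    InitiationLiquidity.LOW_RESISTANCE_15M,
    MarketStructure.LACK_WEEKLY_AND_BELOW,
    ConfluenceClass.IFVG_ONLY,
    outcome=True,
)
print(record)
```

Passing a description such as "15m sweep" in place of an enum member raises a TypeError, so two people logging the same scenario cannot produce two different labels.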

Why This Stage Matters

Without defined variables:

Data fragments
Behaviors mix
Probabilities distort

With defined variables:

Behavioral separation becomes testable
Subsets remain large enough for statistical relevance
Strategy components become measurable

This is the difference between logging trades and building a model.
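
The effect on subset size is simple arithmetic. The sketch below assumes a hypothetical sample of 600 logged examples and uses four initiation liquidity values (only the examples listed earlier), six confluence classes, and five market structure values. The function name is illustrative, and real subsets will not be evenly filled, so the average is an upper-level guide rather than a guarantee.

```python
from math import prod
from typing import Sequence


def average_examples_per_subset(sample_size: int,
                                values_per_variable: Sequence[int]) -> float:
    """Average number of examples per unique combination of variable values."""
    return sample_size / prod(values_per_variable)


SAMPLE_SIZE = 600  # hypothetical number of logged examples

# 4 initiation values x 6 confluence classes x 5 market structure values.
current = average_examples_per_subset(SAMPLE_SIZE, [4, 6, 5])

# Adding one more variable with three values triples the number of subsets.
with_extra = average_examples_per_subset(SAMPLE_SIZE, [4, 6, 5, 3])

print(current)
print(round(with_extra, 2))
```

Each added variable multiplies the number of subsets, which is why a variable without plausible behavioral separation costs more than it returns.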

Closing Thoughts

By now, you should understand:

Why undefined variables corrupt data
The difference between features, variables, and states
How to define valid variables
How to prevent fragmentation
How the full variable system operates within my strategy

The next stage is conditioning — understanding how these variables interact and influence probability.

Join me in Episode 06.
