From Hypothesis to Measurable Data
Operationalising trading concepts into structured, consistent data.
Most traders believe they are collecting data. In reality, they are collecting descriptions.
When variables are undefined and allowable values are not clearly constrained, identical examples get logged differently. Different examples get logged identically. Results become narrative instead of measurement.
The objective here is to understand why undefined variables corrupt trading data, to differentiate features from variables and states, to learn a structured process for defining variables and states, and to formalise the variable system used in this strategy. By the end of this stage, you should be able to collect trading data impartially and build datasets that reflect actual probability rather than interpretation.
We are still within the hypothesis stage of the quant pipeline. Anchor events, features, market regimes, and binary outcomes should already be defined. What remains is determining what exactly must be measured in relation to that binary hypothesis. A simple yes or no outcome is not sufficient. Measurement requires structure beneath it.
In earlier episodes, features were defined as descriptive tools used to make price action interpretable. But description is not measurement. Markets are not measured by naming phenomena — they are measured by instrumenting price action through variables.
Consider a simple analogy. If "weather" is the feature influencing the decision to go to the beach, weather itself is not measurable. You cannot measure "weather" directly. You must break it down into measurable components such as temperature, wind speed, and rainfall. Only once those components are defined can the influence of the feature be evaluated objectively.
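To make the analogy concrete, here is a minimal Python sketch of that decomposition. The component names, units, and thresholds are illustrative assumptions, not part of the strategy.

```python
# The feature "weather" is not measurable; its components are.
weather = {
    "temperature_c": 24.0,   # measurable component
    "wind_speed_kmh": 12.0,  # measurable component
    "rainfall_mm": 0.0,      # measurable component
}

# The decision is evaluated against the components,
# never against the undefined feature itself.
go_to_beach = (
    weather["temperature_c"] >= 22
    and weather["wind_speed_kmh"] < 25
    and weather["rainfall_mm"] == 0
)
print(go_to_beach)  # True for the sample values above
```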
The same applies to trading. If liquidity or inefficiencies are features, they must be decomposed into variables before their influence on outcomes can be measured.
A feature bucket is an organisational layer used to group related concepts and maintain structural clarity. In this strategy, there are two primary feature buckets: Initiation Liquidity and Supplementary Confluences.
Feature buckets are not mathematically required. They are organisational tools. Their purpose is to prevent logical sprawl and maintain consistency. Initiation liquidity answers: where was the event anchor initiated? Supplementary confluences answer: what supporting behaviour existed around the initiation?
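As a minimal sketch, the two buckets can be written down as a plain mapping. The bucket and feature names follow the text; the structure itself is an organisational choice, not a requirement.

```python
# Feature buckets group related concepts; they carry no mathematics.
FEATURE_BUCKETS = {
    "initiation_liquidity": [       # where was the event anchor initiated?
        "swing_liquidity",
        "low_resistance_liquidity",
    ],
    "supplementary_confluences": [  # what supporting behaviour existed?
        "fair_value_gap",
        "inverse_fair_value_gap",
    ],
}
```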
Defining variables requires discipline. The process follows five rules.
First, start from the scenario — not the feature. Ask: what differences could plausibly change the outcome of this specific scenario? Variables must be behaviourally relevant to the anchor event, not descriptive abstractions detached from it.
Second, variables must be recordable in every example. This does not mean they must be present every time — it means their presence or absence can always be logged objectively. Third, variables must be mutually exclusive. A single example cannot fall into multiple categories simultaneously. Fourth, variables must be identifiable live — anything that can only be determined retrospectively measures outcome, not condition. Fifth, complexity must earn its place. If a variable does not create plausible behavioural separation, it fragments data unnecessarily and reduces sample reliability.
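Rules two and three are the easiest to enforce mechanically at logging time. Below is a hedged sketch of such a check; the function name, arguments, and the "absent" sentinel are illustrative assumptions, not part of the source strategy.

```python
def record(example_id: str, variable: str,
           observed: set[str], allowed: set[str]) -> str:
    """Return the single value to log for one variable on one example."""
    matches = observed & allowed
    if len(matches) > 1:
        # Rule three: categories are mutually exclusive; one example,
        # one category. Anything else corrupts the dataset.
        raise ValueError(
            f"{example_id}/{variable}: multiple categories {matches}"
        )
    # Rule two: absence is itself an objective, recordable observation.
    return matches.pop() if matches else "absent"
```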
Variables sometimes require states. A state is a predefined allowed value a variable can take. States prevent mixing different behaviours inside a single variable and improve comparability.
However, states are not always required. If a variable is already binary or categorical, and behavioural separation already exists at the variable level, additional states only introduce fragmentation without adding clarity. In this strategy, states are not required because variables are already categorical and mutually exclusive.
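For illustration only, since this strategy does not use states: if a hypothetical variable such as gap size did need them, states would constrain the allowed values up front. The GapSize enum below is an invented example, not part of the strategy.

```python
from enum import Enum

class GapSize(Enum):
    SMALL = "small"
    MEDIUM = "medium"
    LARGE = "large"

GapSize("medium")   # valid: the value is a predefined state
# GapSize("huge")   # raises ValueError: 'huge' is not a valid GapSize
```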
Feature bucket: Initiation Liquidity. Feature types: Swing Liquidity and Low Resistance Liquidity. Variables are timeframe-based distinctions beginning at the 15-minute timeframe, since the event anchor is defined on that timeframe.
Examples: 15-minute swing liquidity, 15-minute low resistance liquidity, 1-hour swing liquidity, 4-hour swing liquidity. Timeframe matters because higher timeframes plausibly hold greater behavioural significance. Each variable is recordable, mutually exclusive, identifiable live, and grounded in a behavioural hypothesis. Where multiple timeframes are present, the highest timeframe identifier is used, as sketched below.
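A short sketch of the highest-timeframe rule, assuming labels are reduced to their timeframe and that minute counts give the ordering; both are simplifying assumptions.

```python
TIMEFRAME_MINUTES = {"15m": 15, "1h": 60, "4h": 240}

def initiation_label(present: list[str]) -> str:
    """Record the single highest-timeframe liquidity identifier."""
    if not present:
        # Rule two: the variable must be recordable in every example.
        raise ValueError("no initiation liquidity identified")
    return max(present, key=TIMEFRAME_MINUTES.__getitem__)

print(initiation_label(["15m", "4h"]))  # -> "4h"
```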
Feature bucket: Supplementary Confluences. Feature types: Fair Value Gaps and Inverse Fair Value Gaps. Variables: Already Filled and Unfilled.
The behavioural logic is straightforward. If a fair value gap has already been filled, the inefficiency has theoretically been rebalanced. An unfilled inefficiency may hold greater behavioural significance.
To prevent double counting and preserve behavioural meaning, supplementary confluences are grouped into six classifications: Fair Value Gaps only, Inverse Fair Value Gaps only, Already Filled FVGs only, Already Filled IFVGs only, Conglomerate (any mixed combination), and None (no confluences respected). Every scenario falls into one of these six classes.
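The six-way grouping reduces to a small, order-independent lookup. A minimal sketch, assuming each scenario's respected confluences arrive as a set of shorthand type names:

```python
def classify_confluences(respected: set[str]) -> str:
    """Map the respected confluence types to one of the six classes."""
    if not respected:
        return "none"                 # no confluences respected
    pure = {
        frozenset({"fvg"}): "fvg_only",
        frozenset({"ifvg"}): "ifvg_only",
        frozenset({"filled_fvg"}): "filled_fvg_only",
        frozenset({"filled_ifvg"}): "filled_ifvg_only",
    }
    # Any mixed combination collapses into a single class, which
    # prevents double counting across types.
    return pure.get(frozenset(respected), "conglomerate")

print(classify_confluences({"fvg", "ifvg"}))  # -> "conglomerate"
```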
Determining whether a confluence was respected requires consistency. The 15-minute anchor is broken into 5-minute substructures. If price closes beyond a confluence on the 5-minute timeframe, that confluence is invalidated. This prevents subjective interpretation and ensures only respected confluences are logged. It is mechanical, repeatable, and live-applicable.
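A sketch of that respect test, assuming a bullish confluence zone that is invalidated by any 5-minute close below its lower bound; the parameter names and directional convention are assumptions.

```python
def confluence_respected(five_min_closes: list[float],
                         zone_low: float) -> bool:
    """True if no 5-minute close breaks beyond the confluence."""
    return all(close >= zone_low for close in five_min_closes)

print(confluence_respected([101.2, 100.8, 101.5], zone_low=100.5))  # True
print(confluence_respected([101.2, 100.1], zone_low=100.5))         # False
```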
Once market structure, initiation liquidity, and supplementary confluences are defined, the full scenario becomes measurable. Example classification: Initiation Liquidity — 15-minute low resistance buyside liquidity. Market Structure — lack of weekly and below. Supplementary Confluence — Inverse FVG. Objective — nearest opposing confluence.
Outcome is then measured relative to the predefined binary criteria. This process is repeated identically for every example. No discretion. No reinterpretation. Only measurement.
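Putting the pieces together, one logged example might look like the record below. The schema and field values are an assumed encoding of the classification above, not a prescribed format.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ScenarioRecord:
    initiation_liquidity: str      # e.g. "15m_low_resistance_buyside"
    market_structure: str          # e.g. "lack_weekly_and_below"
    supplementary_confluence: str  # one of the six classes
    objective: str                 # e.g. "nearest_opposing_confluence"
    outcome: bool                  # the predefined binary criterion

row = ScenarioRecord(
    initiation_liquidity="15m_low_resistance_buyside",
    market_structure="lack_weekly_and_below",
    supplementary_confluence="ifvg_only",
    objective="nearest_opposing_confluence",
    outcome=True,
)
```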
Without defined variables, data fragments, behaviours mix, and probabilities distort. With defined variables, behavioural separation becomes testable, subsets remain large enough for statistical relevance, and strategy components become measurable. This is the difference between logging trades and building a model.
The next stage is conditioning — understanding how these variables interact and influence probability.