Episode 02
Feature Engineering For Traders (Structuring Price Data For Analysis)
Key Takeaways
- Feature engineering = turning raw OHLC price into repeatable labels (features) that can be tested later.
- Discretionary trading is often mental feature engineering — but without fixed definitions, it can’t be measured or improved.
- A strategy-building process should start with an anchor event that “pins” observations to a consistent moment.
- A good anchor event has: (1) a clear, objective definition, (2) enough frequency for the style, (3) a binary trigger (happened / did not happen).
- Anchor event selection process: observe a repeating behaviour → identify the first observable event before it → write a binary rule.
- Good features follow the checklist: definable, repeatable, finite, relevant, non-redundant.
- The episode’s feature set is grouped into Liquidity and Inefficiencies, ready for later testing.
The Problem This Episode Solves
Most retail trading strategies fail long before the first trade is placed, and it is rarely because “psychology is bad.” The more common problem is a lack of definitions. When a strategy cannot be written into repeatable labels, it cannot be tested. If it cannot be tested, it cannot be improved, maintained, or trusted over time.
The goal of this episode is to explain what feature engineering means in a trading context, why undefined discretion is effectively unmeasurable feature engineering, and how to convert raw OHLC price movement into structured concepts that can be tested later in the series. This episode does not attempt to prove anything. It focuses on building a clean “language layer” that makes later measurement possible.
A feature is a label. It becomes valuable only after testing.
Where This Episode Fits in the Series
The overall framework being followed is:
ORIENTATION → HYPOTHESES → CONDITIONAL DEPENDENCE → OBJECTIVES & EXECUTION → PROFITABILITY
This episode sits in the ORIENTATION phase. The purpose here is to create the definitions and labels that will later be measured. That means statistical significance, probabilities, expectancy, and profitability are intentionally not the focus yet. Those concepts arrive later, after the feature set is stable and consistent.
What Feature Engineering Is
Feature engineering is the act of transforming raw candle information into structured variables that might have predictive value. In trading, the raw input is OHLC data. The output is a set of consistent labels that describe market behaviour and context in a way that can be recorded into a dataset.
A clean way to visualise the process is:
Price (OHLC) → Feature Labels → Dataset Rows → (later) Testing
Raw charts are messy. They contain unlimited information at once. Feature engineering exists to compress that mess into a small number of measurable ideas. Nothing becomes “true” simply because it has been labelled. Labels exist so that testing can happen later.
A helpful analogy is sports scouting. Raw footage of an athlete is not very useful on its own, because it is difficult to compare one game to another. Once the footage is converted into statistics—speed, accuracy, decision-making under pressure, and so on—patterns can be measured and compared. Trading charts are the footage. Features are the stats.
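In code, that compression can be sketched as follows. This is a minimal illustration with placeholder features (a bullish flag and a body-size fraction), not the feature set used in this series:

```python
# Minimal sketch of the pipeline: Price (OHLC) -> feature labels -> dataset row.
# Both feature functions are illustrative placeholders.

def is_bullish(candle):
    """Binary label: did the candle close above its open?"""
    return candle["close"] > candle["open"]

def body_fraction(candle):
    """Finite label: signed body size as a fraction of the total range."""
    rng = candle["high"] - candle["low"]
    return (candle["close"] - candle["open"]) / rng if rng else 0.0

def to_row(candle):
    """Compress one raw candle into a small, measurable dataset row."""
    return {"bullish": is_bullish(candle),
            "body_frac": round(body_fraction(candle), 2)}

candles = [
    {"open": 100, "high": 105, "low": 99, "close": 104},
    {"open": 104, "high": 106, "low": 101, "close": 102},
]
rows = [to_row(c) for c in candles]
print(rows)
```

Each row is deliberately small and finite: that is what makes the later "Dataset Rows → Testing" steps possible.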
Why Discretionary Trading Is Often “Mental Feature Engineering”
Discretionary trading still relies on features, even if those features are not written down. Terms like “rejection candle,” “momentum,” “liquidity sweep,” “imbalance,” or “strong move” are all labels being applied to price behaviour. The difference is that in most discretionary approaches, the labels are not defined mechanically, so they shift over time.
When definitions shift, results cannot be separated into strategy versus trader. Winning becomes proof of skill, and losing becomes proof of weak mindset, even when the underlying strategy might simply be negative expectancy. Discretion is not the enemy. Undefined discretion is. If a concept cannot be defined clearly enough that someone else can mark it the same way, it is not testable. If it is not testable, it cannot be validated, refined, or maintained.
A simple example is the phrase “strong rejection candle.” Strength relative to what—wick size, candle range, close location, relative size compared to the last N candles, location on the chart? Without definition, the label is just interpretation.
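One way to resolve that ambiguity is to commit to an explicit rule. The sketch below is a hypothetical definition with arbitrary thresholds, shown only to demonstrate what "defined mechanically" looks like:

```python
def strong_bullish_rejection(o, h, l, c):
    """One hedged, mechanical reading of "strong rejection candle":
    lower wick at least twice the body, close in the upper third of
    the range. Thresholds are illustrative, not the series' rule."""
    rng = h - l
    if rng == 0:
        return False
    body = abs(c - o)
    lower_wick = min(o, c) - l
    return lower_wick >= 2 * body and (c - l) / rng >= 2 / 3

print(strong_bullish_rejection(100, 101, 95, 100.5))  # long lower wick, high close
print(strong_bullish_rejection(100, 105, 99, 104))    # ordinary bullish candle
```

Whether these particular thresholds are good is irrelevant here; the point is that two people running this function on the same candle will always produce the same label.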
Anchor Events
Feature engineering becomes meaningful when features are built around an anchor event. An anchor event is the first clearly observable trigger that defines when measurement begins. Without an anchor event, features become floating concepts, and datasets become inconsistent because each sample starts at a different “moment” depending on interpretation.
Selection Criteria for a Good Anchor Event
A good anchor event must satisfy three requirements.
First, it must be clear and objective. The event should be observable on a chart with minimal interpretation. If it requires intuition to identify, it cannot anchor a dataset reliably.
Second, it must be frequent enough to collect. Frequency is relative to the style of trading being done. A scalping approach may require an anchor event that occurs frequently, while a swing approach can tolerate an anchor event that appears less often. The point is that the event must occur often enough to build sample size within the intended approach.
Third, it must be a binary trigger. The event must resolve cleanly into a yes/no label: it happened or it did not. A binary trigger prevents ambiguity and makes later comparisons meaningful.
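In code, a binary trigger is simply a function that returns only True or False. The rule below (the candle trades above the previous candle's high) is a hypothetical anchor used purely for illustration:

```python
def anchor_fired(prev_candle, candle):
    """Hypothetical binary anchor event: this candle trades above the
    previous candle's high. It resolves to a clean yes/no on every
    pair of candles -- the third anchor-event requirement."""
    return candle["high"] > prev_candle["high"]

prev = {"open": 100, "high": 103, "low": 99, "close": 102}
cur = {"open": 102, "high": 104, "low": 101, "close": 103}
print(anchor_fired(prev, cur))
```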
The Process for Selecting an Anchor Event
Selecting an anchor event is a structured process rather than a creative one.
The first step is to identify a behaviour that is repeatedly observed. This is not the same as choosing an entry model. It is simply noticing something that occurs often enough to matter.
The second step is to ask a specific question: what is the first observable event that occurs prior to this behaviour? This question forces the search for a trigger that is earlier than the behaviour itself and is visible in price.
The third step is to convert that event into a rule with a binary label. If the event cannot be described as “present” or “not present,” the event is not yet suitable as an anchor. Refining the rule continues until the trigger is unambiguous.
Features are then selected as structured context around that anchor event. The event is the “start point.” Features describe what is true about the market at that start point.
How Features Should Be Selected
Once an anchor event exists, features can be selected. A feature is simply a structured concept that describes context around the anchor event and can be logged consistently.
A practical selection checklist is:
- Definable: another person could label it the same way.
- Repeatable: it appears across time rather than only in rare situations.
- Finite: it collapses into a small set of categories or a simple yes/no state rather than infinite interpretation.
- Relevant: there is a plausible connection to the behaviour being measured.
- Non-redundant: it does not double-count the same information as another feature.
Features should begin small in number. A smaller feature set preserves sample size and makes later analysis cleaner. Features can always be expanded later once the dataset is stable, but beginning with too many concepts fragments the dataset and creates false confidence.
After the anchor event is defined, feature selection is straightforward: observe the anchor event repeatedly and identify concepts or variables that appear to matter, then keep only those that can be defined mechanically and that satisfy the checklist.
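The workflow above can be sketched as a filter: scan the data, and log a context row only when the anchor event fires. Both the anchor rule and the features here are hypothetical placeholders, not the strategy's actual definitions:

```python
def anchor_fired(prev, cur):
    return cur["high"] > prev["high"]          # hypothetical binary trigger

def context_features(prev, cur):
    """Small, finite context snapshot taken at the anchor moment."""
    return {"bullish": cur["close"] > cur["open"],
            "gapped_open": cur["open"] > prev["high"]}

candles = [
    {"open": 100, "high": 103, "low": 99, "close": 102},
    {"open": 102, "high": 104, "low": 101, "close": 103},
    {"open": 103, "high": 103.5, "low": 100, "close": 101},
]

# One dataset row per anchor occurrence; non-anchor candles are skipped.
dataset = [context_features(p, c)
           for p, c in zip(candles, candles[1:])
           if anchor_fired(p, c)]
print(dataset)
```

Anchoring the rows this way is what keeps every sample starting at the same "moment", so later comparisons are like-for-like.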
The Features Used in This Strategy
The feature set used in this strategy is built from OHLC-visible concepts and grouped into two categories.
One category is liquidity, which describes location: where price is interacting with likely concentrations of resting orders.
The second category is inefficiencies, which describe how price moved and whether movement left imbalance or state-change behaviour behind.
These concepts are introduced here as feature candidates. Statistical significance is not assumed in this episode. Measurement happens later once the dataset has been structured.
Liquidity as a Feature
Liquidity, as a feature, is a label for locations where stop clusters and resting orders are likely concentrated. These often appear around highs, lows, swing points, and obvious structural reference points. Liquidity is used as a location-based feature to describe where the anchor event occurs and what nearby reference points are present.
Within this strategy, liquidity is also separated into two mechanical variables. Timeframe discussion is intentionally saved for later episodes; the focus here is purely on definitional structure.
Swing Liquidity
Swing liquidity is labelled when a swing high or swing low is formed in a way that is mechanically contained. In simple terms, the candle to the left of the high or low does not exceed the high or low that was generated. This forms a clean swing point that can be labelled consistently and treated as a likely location of resting stops or liquidity.
Low Resistance Buy-side / Sell-side Liquidity
Low resistance liquidity is labelled when a high or low is formed in a way that implies less “resistance” immediately prior to the print. In this variable, the candle to the left of the high or low does exceed the high or low that was generated. This creates a slightly different liquidity structure and is treated as a distinct, mechanically defined variable within the liquidity feature group.
These two variables exist to keep liquidity definitions finite and consistent while still capturing meaningful differences in structure. The exact timeframes these are applied to are discussed later in the series.
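Because the two variables differ only in whether the left candle exceeds the printed high or low, they reduce to a single comparison. The sketch below translates the stated rule for a high (the low case mirrors it); the exact indexing convention and the weak inequality are assumptions made for the example:

```python
def label_high(left_candle, candle):
    """Label the high printed by `candle` using the candle to its left.
    Direct translation of the two stated liquidity variables; the low
    case is the mirror image. Conventions here are illustrative."""
    if left_candle["high"] <= candle["high"]:
        return "swing_liquidity"          # left candle does not exceed the high
    return "low_resistance_liquidity"     # left candle does exceed the high

left = {"open": 100, "high": 103, "low": 99, "close": 102}
cur = {"open": 102, "high": 105, "low": 101, "close": 104}
print(label_high(left, cur))
```

Either way the label is finite: every qualifying high falls into exactly one of the two categories.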
Inefficiencies as Features
Inefficiency features describe the quality of movement. Rather than focusing on location, they focus on whether price moved in a way that suggests imbalance or incomplete trading.
Fair Value Gaps / Imbalances
An imbalance is used as a label for rapid movement that leaves inefficient pricing behind. It captures the idea that price moved quickly through an area and did not trade evenly, leaving a zone that may be relevant later. The label itself is not a claim of predictive value. It exists as a structured description that can be tested later.
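As a concrete illustration, one common mechanical definition of a bullish imbalance uses three consecutive candles: if the third candle's low sits above the first candle's high, the zone between them never traded. The series may refine the definition later; this version is shown only as an example of a finite, checkable label:

```python
def bullish_fvg(c1, c2, c3):
    """Return the (low, high) bounds of a bullish fair value gap across
    three consecutive candles, or None if no gap exists. One common
    definition, not necessarily the one used later in the series."""
    if c3["low"] > c1["high"]:
        return (c1["high"], c3["low"])
    return None

c1 = {"open": 100.0, "high": 101.0, "low": 99.0, "close": 100.5}
c2 = {"open": 100.5, "high": 104.0, "low": 100.4, "close": 103.8}
c3 = {"open": 103.8, "high": 106.0, "low": 102.5, "close": 105.0}
print(bullish_fvg(c1, c2, c3))
```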
Inverse / State-Change Inefficiencies
An inverse or state-change inefficiency is a label for behaviour change after price trades through an area. It captures the idea that an area that once behaved one way may behave differently after being invalidated or traded through. This is again not presented as proof. It is presented as a structured state label that can be logged and tested later.
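A state-change label can likewise be made mechanical. The sketch below flips a bullish zone from "active" to "inverted" once a later close trades through its lower bound; the close-through rule is an illustrative assumption, not the series' definition:

```python
def zone_state(zone_low, closes_after):
    """Flip a bullish zone from "active" to "inverted" once any later
    close trades through its lower bound. Illustrative rule only."""
    for close in closes_after:
        if close < zone_low:          # closed through the zone
            return "inverted"
    return "active"

print(zone_state(101.0, [103.0, 102.0, 100.4]))
print(zone_state(101.0, [103.0, 102.0, 101.5]))
```

The value of the label is that the same zone is always in exactly one state at any point in time, so the state can be logged alongside the other features.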
“Theory” in This Framework
In this series, “theory” refers to the conceptual logic behind a feature and the ability to identify it consistently on charts. That includes what the concept represents and what the mechanical conditions are for marking it. The most effective way to absorb the theory component is visually, so chart-based demonstrations are shown directly in the video. This transcript focuses on the definitional structure and how the concepts are being framed for later testing.
Why These Features Fit the Checklist
The reason these concepts were selected is not because they are popular, but because they can be made mechanical and logged consistently using OHLC data. Liquidity functions as a location feature and can be broken into finite mechanical variables. Inefficiencies function as movement-quality features and can be identified with consistent definitions. Together, they create a feature set that is repeatable, finite, and capable of being tested later without requiring intuition to reinterpret definitions.
This strategy also relies on feature types that are structurally compatible with the model: supplementary confluences are consistently present as context features, and initiation liquidity functions as a consistent anchor structure. That architecture avoids building a dataset where features only appear rarely, which would collapse sample size and prevent meaningful comparison.
What Should Be Done After This Episode
The outcome of this episode is not a strategy. It is a clean framework for converting charts into labels.
The next step is to define an anchor event using the process outlined earlier. Once the anchor event exists, feature selection becomes simple: observe the anchor event repeatedly and identify concepts or variables that can be defined mechanically and that satisfy the checklist. Start small. Prioritise clarity and repeatability. The purpose is to build a stable language layer that can be measured later.
Bridge to the Next Episode
Once features can be labelled consistently, the next challenge becomes obvious: markets do not behave the same way all the time. If feature labels are applied without accounting for changing market conditions, datasets become inconsistent and conclusions become unreliable. The next episode explains market regimes and why mechanical structure matters for maintaining dataset integrity across time.