Feature Engineering For Traders
Structuring raw price data into testable, measurable features.
Most retail trading strategies fail long before the first trade is placed, and it is rarely because "psychology is bad." The more common problem is a lack of definitions. When a strategy cannot be written down as repeatable labels, it cannot be tested. If it cannot be tested, it cannot be improved, maintained, or trusted over time.
The goal of this episode is to explain what feature engineering means in a trading context, why undefined discretion is effectively unmeasurable feature engineering, and how to convert raw OHLC price movement into structured concepts that can be tested later in the series. This episode does not attempt to prove anything. It focuses on building a clean language layer that makes later measurement possible.
A feature is a label. It becomes valuable only after testing.
The overall framework being followed is: ORIENTATION → HYPOTHESES → CONDITIONAL DEPENDENCE → OBJECTIVES & EXECUTION → PROFITABILITY.
This episode sits in the ORIENTATION phase. The purpose here is to create the definitions and labels that will later be measured. Statistical significance, probabilities, expectancy, and profitability are intentionally not the focus yet. Those concepts arrive later, after the feature set is stable and consistent.
Feature engineering is the act of transforming raw candle information into structured variables that might have predictive value. In trading, the raw input is OHLC data. The output is a set of consistent labels that describe market behaviour and context in a way that can be recorded into a dataset.
The process: Price (OHLC) → Feature Labels → Dataset Rows → (later) Testing.
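The pipeline above can be sketched in a few lines. The candle representation and the two example labels (`closed_bullish`, `close_in_upper_third`) are illustrative assumptions, not features from the strategy itself:

```python
# Minimal sketch: raw OHLC candles -> feature labels -> dataset rows.
# Candle fields and both example labels are hypothetical, for illustration only.

def make_candle(o, h, l, c):
    return {"open": o, "high": h, "low": l, "close": c}

def label_candle(candle):
    """Turn one raw candle into a dataset row of binary feature labels."""
    rng = candle["high"] - candle["low"]
    return {
        # Hypothetical label 1: did the candle close above its open?
        "closed_bullish": candle["close"] > candle["open"],
        # Hypothetical label 2: did it close in the upper third of its range?
        "close_in_upper_third": rng > 0
        and (candle["close"] - candle["low"]) / rng >= 2 / 3,
    }

candles = [make_candle(100, 110, 99, 109), make_candle(109, 112, 101, 102)]
dataset = [label_candle(c) for c in candles]
```

Every candle produces one row of yes/no answers, which is exactly the shape later testing needs.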
Raw charts are messy. They contain unlimited information at once. Feature engineering exists to compress that mess into a small number of measurable ideas. Nothing becomes "true" simply because it has been labelled. Labels exist so that testing can happen later.
A helpful analogy is sports scouting. Raw footage of an athlete is not very useful on its own, because it is difficult to compare one game to another. Once the footage is converted into statistics — speed, accuracy, decision-making under pressure — patterns can be measured and compared. Trading charts are the footage. Features are the stats.
Discretionary trading still relies on features, even if those features are not written down. Terms like "rejection candle," "momentum," "liquidity sweep," "imbalance," or "strong move" are all labels being applied to price behaviour. The difference is that in most discretionary approaches, the labels are not defined mechanically, so they shift over time.
When definitions shift, results cannot be separated into strategy versus trader. Winning becomes proof of skill, and losing becomes proof of weak mindset — even when the underlying strategy might simply have negative expectancy. Discretion is not the enemy. Undefined discretion is. If a concept cannot be defined clearly enough that someone else can mark it the same way, it is not testable.
A simple example is the phrase "strong rejection candle." Strength relative to what — wick size, candle range, close location, relative size compared to the last N candles, location on the chart? Without definition, the label is just interpretation.
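To make the point concrete, here is one possible mechanical answer for the bearish case. The thresholds (upper wick at least 60% of the range, close in the lower third) are arbitrary assumptions; what matters is that they are written down and fixed, so anyone would mark the same candles:

```python
# One hypothetical mechanical definition of a "strong bearish rejection candle".
# The 0.6 wick fraction and lower-third close location are illustrative choices.

def is_bearish_rejection(open_, high, low, close, wick_frac=0.6, close_frac=1 / 3):
    rng = high - low
    if rng <= 0:
        return False  # degenerate candle: no range to measure against
    upper_wick = high - max(open_, close)
    return (upper_wick / rng >= wick_frac) and ((close - low) / rng <= close_frac)
```

With a definition like this, "strong rejection" stops being interpretation and becomes a yes/no label.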
Feature engineering becomes meaningful when features are built around an anchor event. An anchor event is the first clearly observable trigger that defines when measurement begins. Without an anchor event, features become floating concepts, and datasets become inconsistent because each sample starts at a different moment depending on interpretation.
A good anchor event must satisfy three requirements. First, it must be clear and objective — observable on a chart with minimal interpretation. Second, it must be frequent enough to collect — occurring often enough to build sample size within the intended approach. Third, it must be a binary trigger — resolving cleanly into a yes/no label.
Selecting an anchor event is a structured process. Identify a behaviour that is repeatedly observed. Ask: what is the first observable event that occurs prior to this behaviour? Then convert that event into a rule with a binary label. If the event cannot be described as present or not present, it is not yet suitable as an anchor.
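A sketch of what the final step looks like. The event chosen here (a candle closing above the highest high of the previous N candles) is a hypothetical example, not the series' actual anchor; the point is the shape: a rule that resolves to a clean yes/no:

```python
# Hypothetical anchor event: latest close exceeds the highest high of the
# prior `lookback` candles. Resolves to a binary present / not-present label.

def anchor_fired(closes, highs, lookback=3):
    """Return True if the latest close exceeds the highest high of the prior candles."""
    if len(highs) < lookback + 1 or len(closes) != len(highs):
        return False  # not enough history to evaluate the rule
    prior_high = max(highs[-lookback - 1:-1])
    return closes[-1] > prior_high
```

When the function returns True, measurement begins and features are logged; when it returns False, nothing is recorded.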
Once an anchor event exists, features can be selected. A practical selection checklist: a feature should be definable, meaning another person could label it the same way. It should be repeatable, meaning it appears across time rather than only in rare situations. It should be finite, collapsing into a small set of categories or a yes/no state. It should be relevant, with a plausible connection to the behaviour being measured. It should not be redundant, meaning it does not double-count the same information as another feature.
Features should begin small in number. A smaller feature set preserves sample size and makes later analysis cleaner. Features can always be expanded later once the dataset is stable, but beginning with too many concepts fragments the dataset and creates false confidence.
The feature set used in this strategy is built from OHLC-visible concepts and grouped into two categories: liquidity, a location feature describing where price is interacting with likely order concentration; and inefficiencies, which describe how price moved and whether movement left imbalance or state-change behaviour behind.
These concepts are introduced here as feature candidates. Statistical significance is not assumed in this episode. Measurement happens later once the dataset has been structured.
Swing liquidity is labelled when a swing high or swing low is formed in a way that is mechanically contained. The candle to the left of the high or low does not exceed the high or low that was generated. This forms a clean swing point that can be labelled consistently and treated as a likely location of resting stops or liquidity.
Low resistance liquidity is labelled when a high or low is formed in a way that implies less resistance immediately prior to the print. The candle to the left of the high or low does exceed the high or low that was generated. This creates a slightly different liquidity structure and is treated as a distinct, mechanically defined variable within the liquidity feature group.
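The two liquidity definitions above can be collapsed into a single classifier for the swing-high case (the swing-low case mirrors it). The pivot convention used here, comparing the pivot high against the candles on either side, is one reading of the definitions and is an assumption for illustration:

```python
# Classify a swing high into the two liquidity labels defined above.
# A swing high is assumed printed when the pivot high exceeds the candle to its
# right; the candle to the left then decides which label applies.

def classify_swing_high(left_high, pivot_high, right_high):
    if pivot_high <= right_high:
        return None  # no swing high printed yet
    if left_high <= pivot_high:
        return "swing_liquidity"       # left candle does not exceed the high
    return "low_resistance_liquidity"  # left candle does exceed the high
```

Either way, the output is a finite category that can be logged into a dataset row without interpretation.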
An imbalance is used as a label for rapid movement that leaves inefficient pricing behind. It captures the idea that price moved quickly through an area and did not trade evenly, leaving a zone that may be relevant later. The label itself is not a claim of predictive value. It exists as a structured description that can be tested later.
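One common mechanical way to label a bullish imbalance, assumed here purely for illustration, uses three consecutive candles: if the first candle's high sits below the third candle's low, the middle candle skipped through a zone that never traded evenly:

```python
# Hypothetical three-candle bullish imbalance: the gap between candle 1's high
# and candle 3's low is the zone price moved through without trading evenly.

def bullish_imbalance(c1_high, c3_low):
    """Return (low, high) bounds of the untraded zone, or None if none exists."""
    if c3_low > c1_high:
        return (c1_high, c3_low)
    return None
```

The returned bounds are a description of what happened, not a prediction; whether the zone matters is a question for later testing.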
An inverse or state-change inefficiency is a label for behaviour change after price trades through an area. It captures the idea that an area that once behaved one way may behave differently after being invalidated or traded through. This is presented as a structured state label that can be logged and tested later — not as proof of anything.
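A state-change label can be logged as a simple state machine. The field names and the flip rule below (a bullish zone becomes "inverted" once price closes fully below it) are assumptions made for illustration:

```python
# Hypothetical state-change label for a bullish zone: "active" until price
# closes through the zone's low, after which the label flips to "inverted".

def update_zone_state(zone, candle_close):
    """zone = {"low": ..., "high": ..., "state": "active" | "inverted"}."""
    if zone["state"] == "active" and candle_close < zone["low"]:
        zone["state"] = "inverted"  # traded through: record the behaviour change
    return zone
```

Each zone then carries a finite state that can be written into the dataset alongside the other labels.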
These concepts were selected not because they are popular, but because they can be made mechanical and logged consistently using OHLC data. Liquidity functions as a location feature and can be broken into finite mechanical variables. Inefficiencies function as movement-quality features and can be identified with consistent definitions. Together, they create a feature set that is repeatable, finite, and capable of being tested later without requiring intuition to reinterpret definitions.
The outcome of this episode is not a strategy. It is a clean framework for converting charts into labels. The next step is to define an anchor event and begin selecting features around it. Start small. Prioritise clarity and repeatability.
Once features can be labelled consistently, the next challenge becomes obvious: markets do not behave the same way all the time. If feature labels are applied without accounting for changing market conditions, datasets become inconsistent and conclusions become unreliable. The next episode explains market regimes and why mechanical structure matters for maintaining dataset integrity across time.