BX Score Explained

The BX Score is the primary performance indicator of Benchmark X.

It is a composite score designed to measure how well an AI trading strategy performs, taking into account not only profitability, but also risk, behavior, stability, and market interaction.

The BX Score exists to answer a question that raw PnL cannot:

Is this strategy genuinely good, or merely lucky under specific conditions?


Why PnL Alone Is Not Enough

Profit alone is an incomplete and often misleading metric.

Two strategies may generate the same profit while exhibiting very different characteristics:

  • One may take excessive leverage and risk liquidation

  • Another may produce steady returns with controlled drawdowns

  • One may overtrade aggressively

  • Another may trade selectively with higher signal quality

Benchmark X treats profit as necessary but insufficient.

The BX Score is designed to capture quality of performance, not just magnitude.


Components of the BX Score

The BX Score is derived from multiple metric groups, each representing a different dimension of strategy behavior.

1. Core Performance Metrics

These metrics describe raw performance outcomes.

  • Net PnL

  • Profit Factor

  • Win Rate

  • Average Trade Return

These metrics answer: Did the strategy make money?
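As an illustration only, the four core metrics can be computed from a list of closed trades roughly as follows. The input format (one realized PnL and one fractional return per trade) and the names `core_metrics` and `CoreMetrics` are hypothetical assumptions, not part of Benchmark X.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CoreMetrics:
    net_pnl: float
    profit_factor: float
    win_rate: float
    avg_trade_return: float


def core_metrics(trade_pnls: Sequence[float], trade_returns: Sequence[float]) -> CoreMetrics:
    """Compute core performance metrics from closed trades.

    Assumed inputs (hypothetical format): one realized PnL per trade in account
    currency, and one return per trade as a fraction of the capital it used.
    """
    if not trade_pnls or len(trade_pnls) != len(trade_returns):
        raise ValueError("expected one PnL and one return per closed trade")
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    # Profit factor is undefined without losing trades; report infinity in that case.
    profit_factor = gross_profit / gross_loss if gross_loss > 0 else float("inf")
    wins = sum(1 for p in trade_pnls if p > 0)  # zero-PnL trades do not count as wins
    return CoreMetrics(
        net_pnl=sum(trade_pnls),
        profit_factor=profit_factor,
        win_rate=wins / len(trade_pnls),
        avg_trade_return=sum(trade_returns) / len(trade_returns),
    )
```

For example, trade PnLs of 100, -50, 30 and -20 give a net PnL of 60, a profit factor of 130 / 70 (about 1.86) and a win rate of 0.5.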


2. Risk Metrics

These metrics evaluate how much risk was taken to achieve results.

  • Maximum Drawdown

  • Volatility of returns

  • Risk-adjusted Alpha

  • Downside deviation

These metrics answer: How dangerous was the strategy’s path to profitability?
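A minimal sketch of three of these risk measures follows, assuming an equity curve of strictly positive account values and a series of periodic returns. Risk-adjusted Alpha is omitted because it requires a benchmark return series that this page does not specify; the function names are illustrative, not Benchmark X's API.

```python
import math
import statistics
from typing import Sequence


def max_drawdown(equity: Sequence[float]) -> float:
    """Largest peak-to-trough decline of an equity curve, as a positive fraction of the peak."""
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def volatility(returns: Sequence[float]) -> float:
    """Sample standard deviation of periodic returns (not annualized)."""
    return statistics.stdev(returns)


def downside_deviation(returns: Sequence[float], target: float = 0.0) -> float:
    """Root-mean-square of shortfalls below `target`; returns above target count as zero."""
    shortfalls = [min(0.0, r - target) for r in returns]
    return math.sqrt(sum(s * s for s in shortfalls) / len(shortfalls))
```

On the equity curve 100, 120, 90, 110, 80, 130, the maximum drawdown is the fall from 120 to 80, i.e. one third, even though the curve ends at a new high.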


3. Risk-Adjusted Return Metrics

These metrics normalize performance relative to risk.

  • Sharpe Ratio

  • Sortino Ratio

  • Return-to-Drawdown ratio

These metrics answer: Was the return justified by the risk?
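The standard textbook forms of these ratios can be sketched as below. The annualization factor of 365 periods per year (daily returns in a market that trades every day) and the zero default risk-free rate are assumptions; this page does not state which conventions Benchmark X uses.

```python
import math
import statistics
from typing import Sequence


def sharpe_ratio(returns: Sequence[float], risk_free: float = 0.0,
                 periods_per_year: int = 365) -> float:
    """Annualized Sharpe ratio: mean excess return over its sample standard deviation."""
    excess = [r - risk_free for r in returns]
    sd = statistics.stdev(excess)
    if sd == 0:
        raise ValueError("Sharpe ratio is undefined for constant returns")
    return statistics.mean(excess) / sd * math.sqrt(periods_per_year)


def sortino_ratio(returns: Sequence[float], target: float = 0.0,
                  periods_per_year: int = 365) -> float:
    """Like Sharpe, but the denominator only penalizes returns below `target`."""
    excess = [r - target for r in returns]
    downside = math.sqrt(sum(min(0.0, e) ** 2 for e in excess) / len(excess))
    if downside == 0:
        raise ValueError("Sortino ratio is undefined with no returns below target")
    return statistics.mean(excess) / downside * math.sqrt(periods_per_year)


def return_to_drawdown(equity: Sequence[float]) -> float:
    """Total return over the period divided by the maximum drawdown (both as fractions)."""
    total_return = equity[-1] / equity[0] - 1
    peak, mdd = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        mdd = max(mdd, (peak - value) / peak)
    if mdd == 0:
        raise ValueError("return-to-drawdown is undefined without a drawdown")
    return total_return / mdd
```

The Sortino ratio differs from Sharpe only in its denominator, so a strategy whose volatility is mostly upside scores noticeably better on Sortino than on Sharpe.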


4. Behavioral Metrics

These metrics evaluate how the strategy trades.

  • Trade frequency

  • Average holding duration

  • Leverage utilization

  • Position concentration

  • Recovery behavior after drawdowns

These metrics answer: Does the strategy behave consistently and responsibly?
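One possible way to derive several behavioral metrics from a session's trade log is sketched below. The `Trade` record and its fields are hypothetical; leverage is approximated as mean position notional divided by account equity, and concentration is measured with the Herfindahl-Hirschman index, a common choice that this page does not confirm. Recovery behavior is left out because it depends on how drawdown episodes are defined.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass
class Trade:
    symbol: str
    opened_at: float  # seconds since epoch (hypothetical field)
    closed_at: float  # seconds since epoch (hypothetical field)
    notional: float   # absolute position size in quote currency (hypothetical field)


def behavioral_metrics(trades: Sequence[Trade], session_seconds: float,
                       equity: float) -> Dict[str, float]:
    """Summarize how a strategy traded during one session."""
    if not trades:
        raise ValueError("no trades to summarize")
    n = len(trades)
    exposure: Dict[str, float] = defaultdict(float)
    for t in trades:
        exposure[t.symbol] += t.notional
    total = sum(exposure.values())
    return {
        "trades_per_day": n / (session_seconds / 86_400),
        "avg_holding_seconds": sum(t.closed_at - t.opened_at for t in trades) / n,
        # Mean position notional relative to account equity: a rough leverage proxy.
        "avg_leverage": sum(t.notional for t in trades) / n / equity,
        # Herfindahl-Hirschman index of exposure by symbol: 1.0 means a single symbol.
        "concentration_hhi": sum((v / total) ** 2 for v in exposure.values()),
    }
```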


5. Stability & Consistency Metrics

These metrics capture temporal robustness.

  • Performance stability over time

  • Variance between battle sessions

  • Sensitivity to market regime changes

These metrics answer: Is the strategy reliable, or regime-dependent?
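The stability questions above can be approximated with simple proxies such as the ones below. These are illustrative assumptions rather than Benchmark X's definitions: dispersion of per-session scores, the share of rolling windows with a positive (simply summed, not compounded) return, and the spread of mean returns across market regimes.

```python
import statistics
from typing import Mapping, Sequence


def session_dispersion(session_scores: Sequence[float]) -> float:
    """Sample standard deviation of per-session results; lower means more consistent."""
    return statistics.stdev(session_scores)


def positive_window_ratio(returns: Sequence[float], window: int) -> float:
    """Share of rolling windows whose summed return is positive (a simple stability proxy)."""
    if not 0 < window <= len(returns):
        raise ValueError("window must be between 1 and len(returns)")
    sums = [sum(returns[i:i + window]) for i in range(len(returns) - window + 1)]
    return sum(1 for s in sums if s > 0) / len(sums)


def regime_sensitivity(returns_by_regime: Mapping[str, Sequence[float]]) -> float:
    """Spread between the best and worst mean return across regimes; 0 means regime-agnostic."""
    means = [statistics.mean(r) for r in returns_by_regime.values()]
    return max(means) - min(means)
```

A strategy that earns 3% per period in trending markets but nothing in ranging ones shows a regime sensitivity of 0.03, flagging the regime dependence the question above asks about.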


How the BX Score Is Computed

The BX Score is calculated as a weighted aggregation of normalized metrics.

At a high level, each component:

  • Is normalized so that no metric dominates merely because of its scale

  • Is bounded so that extreme outliers cannot distort the result

  • Is weighted according to deterministic rules

Weights are designed to favor sustainable, repeatable performance over short-term spikes.
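The normalize, bound, and weight steps can be illustrated with a small sketch. This page does not list the actual normalization method, bounds, or weights, so the min-max scaling against fixed reference bounds, the `BOUNDS` and `WEIGHTS` tables, and the 0 to 100 output range below are all hypothetical.

```python
from typing import Mapping, Tuple

# Hypothetical per-metric reference bounds: (worst, best).
# When worst > best, lower raw values are better (e.g. drawdown).
BOUNDS: Mapping[str, Tuple[float, float]] = {
    "sharpe": (0.0, 3.0),
    "max_drawdown": (0.5, 0.0),
    "profit_factor": (1.0, 2.5),
}
# Hypothetical weights; they sum to 1, so the score stays within [0, 100].
WEIGHTS: Mapping[str, float] = {"sharpe": 0.4, "max_drawdown": 0.35, "profit_factor": 0.25}


def normalize(value: float, worst: float, best: float) -> float:
    """Map value onto [0, 1] (0 = worst bound, 1 = best bound), clipping outliers."""
    scaled = (value - worst) / (best - worst)
    return min(1.0, max(0.0, scaled))


def bx_score(metrics: Mapping[str, float]) -> float:
    """Deterministic weighted sum of normalized, bounded metrics, scaled to 0-100."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        worst, best = BOUNDS[name]
        total += weight * normalize(metrics[name], worst, best)
    return 100.0 * total
```

With a Sharpe ratio of 1.5, a 20% maximum drawdown and a profit factor of 4.0, this sketch yields 66: the profit factor is clipped at its best bound, so an unusually high value cannot outweigh the other components.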


Market Context Awareness

Benchmark X does not treat all market conditions equally.

The BX Score incorporates market context signals, such as:

  • Volatility regime

  • Liquidity conditions

  • Funding environments

  • Trending vs. range-bound behavior

This allows the system to distinguish between:

  • Strategies that exploit temporary anomalies

  • Strategies that adapt across environments
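Two of these context signals could be labeled as sketched below. The volatility thresholds are hypothetical parameters, and trend vs. range is classified with Kaufman's efficiency ratio, a standard indicator swapped in here for illustration; this page does not say how Benchmark X detects regimes. Liquidity and funding signals are omitted because they require order-book and funding-rate data.

```python
import statistics
from typing import Sequence


def volatility_regime(returns: Sequence[float], low: float, high: float) -> str:
    """Label a window 'low', 'normal' or 'high' volatility using hypothetical thresholds."""
    vol = statistics.stdev(returns)
    if vol < low:
        return "low"
    if vol > high:
        return "high"
    return "normal"


def efficiency_ratio(prices: Sequence[float]) -> float:
    """Kaufman efficiency ratio: net move / total path length, in [0, 1]."""
    path = sum(abs(b - a) for a, b in zip(prices, prices[1:]))
    return abs(prices[-1] - prices[0]) / path if path > 0 else 0.0


def trend_or_range(prices: Sequence[float], threshold: float = 0.5) -> str:
    """Call a window trending when most of its price path is net directional movement."""
    return "trend" if efficiency_ratio(prices) >= threshold else "range"
```

Tagging each session with labels like these lets results be grouped by context, so a strategy that only scores well in one regime stands out from one that scores well in all of them.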


Interpreting the BX Score

The BX Score is intended to be interpreted relatively, not absolutely.

  • A higher BX Score indicates stronger performance within the same market context

  • Scores are comparable across strategies evaluated in similar Battle Rooms

  • Long-term averages carry more weight than single-session results

A single high score does not imply dominance. Consistent high scores imply credibility.
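The two interpretation rules can be made concrete with a sketch: rank a score only against a cohort from the same context, and give a strategy's own long-term average more weight the more sessions it has. The shrinkage toward the cohort mean and the constant `k` are hypothetical choices used only to illustrate "long-term averages carry more weight"; they are not Benchmark X's published method.

```python
from typing import Sequence


def percentile_rank(score: float, cohort: Sequence[float]) -> float:
    """Percentage of cohort scores strictly below `score` (cohort = same Battle Room/context)."""
    below = sum(1 for s in cohort if s < score)
    return 100.0 * below / len(cohort)


def credibility_weighted_mean(session_scores: Sequence[float], cohort_mean: float,
                              k: float = 4.0) -> float:
    """Blend a strategy's mean session score with the cohort mean.

    The strategy's own average gets weight n / (n + k), so one strong session moves
    the result little while many consistent sessions dominate it. `k` is a
    hypothetical tuning constant.
    """
    n = len(session_scores)
    own_mean = sum(session_scores) / n
    w = n / (n + k)
    return w * own_mean + (1 - w) * cohort_mean
```

With a cohort mean of 50, a single session at 80 yields 56, whereas eight sessions at 80 yield 70: repeated high scores earn credibility that one outlier cannot.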


BX Score vs Leaderboards

Leaderboards display rankings. The BX Score explains why those rankings exist.

Two strategies may appear adjacent on a leaderboard while having very different risk profiles. The BX Score allows users, funds, and platforms to understand these differences.


Transparency and Governance

The BX Score framework is:

  • Publicly documented

  • Deterministic

  • Subject to governance updates

Changes to scoring weights or formulas require governance approval and are versioned.

This ensures:

  • Backward comparability

  • Auditability

  • Long-term trust
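One way to make versioned scoring auditable is to stamp every published score with the version and a content fingerprint of the configuration that produced it. The sketch below assumes hypothetical `ScoringVersion` and `ScoreRecord` types; it illustrates the idea rather than describing Benchmark X's governance tooling.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ScoringVersion:
    """A hypothetical immutable record of one approved scoring configuration."""
    version: str                              # e.g. "v1" (hypothetical labeling scheme)
    weights: Tuple[Tuple[str, float], ...]    # (metric name, weight) pairs

    def fingerprint(self) -> str:
        """SHA-256 of a canonical JSON encoding; any change to the weights changes the hash."""
        payload = json.dumps(
            {"version": self.version, "weights": sorted(self.weights)},
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class ScoreRecord:
    """A published score, stamped with the scoring version that produced it."""
    strategy_id: str
    score: float
    scoring_version: str
    scoring_fingerprint: str


def publish(strategy_id: str, score: float, config: ScoringVersion) -> ScoreRecord:
    return ScoreRecord(strategy_id, score, config.version, config.fingerprint())
```

Because the fingerprint ignores the order in which weights are listed but changes with any weight value, an auditor can check that two scores were produced under identical rules before comparing them.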