AI/ML// case study

AlphaStream — ML Trading Signal Engine

Five models. Two hundred indicators. No black boxes.

A Python-based ML signal engine with 200+ technical indicators and 5 ensemble models — explainable, auditable, and open source.

Role
Design + build + operate
Client
Sage Ideas (Internal)
Category
AI/ML
Status
Operational
Indicators

200+

ML Models

5

GitHub Stars

5★

External Forks

2

Living architecture

Surface ⇄ System

AlphaStream is presented as both the product people touch and the operating system underneath it: UI, data model, integration path, evidence, and outcome.

Build an AI workflow
  1. Visible product: Screenshots and product frames show the user-facing surface without pretending concept art is production proof.
  2. Operating architecture: The case includes a system map so the architecture is visible, not buried in prose.
  3. Evidence register: Metrics, build logs, diagrams, CI artifacts, and links separate actual work from agency theater.
  4. Commercial path: The page routes qualified buyers toward a matching build, automation, or lab entry.

AlphaStream product surface · surface
ML pipeline · system
Market data → feature engineering → five-model ensemble → walk-forward validation → streaming signal output. Every stage is observable, every prediction has provenance.

case flow

Surface ⇄ System

Problem: AI/ML · Surface: 2 screens · System: mapped · Proof: 4 metrics · Route: service
A case study should prove both layers: the surface people see, and the system that keeps the product alive after launch.

AlphaStream operating map

The diagram is intentionally simplified: it shows the buying logic and operating path, not a decorative fantasy architecture.

client

Sage Ideas (Internal)

category

AI/ML

evidence

2 assets

Proof board

Receipts before claims.

This page separates shipped surface, system map, real metrics, and available artifacts so the work can be inspected instead of just admired.

proof assets

7

Screens, gallery, artifacts

screens

2

Real product surfaces

artifacts

3

Available during discovery

Primary evidence

200 indicators. 5 models. Open source.

Indicators

200+

ML Models

5

GitHub Stars

5★

External Forks

2

Surface

Product screenshots and interface frames show the user-facing layer. If real assets are unavailable, the page says so instead of dressing mockups as production proof.

System

Architecture diagrams, build logs, and artifacts make the hidden operating layer visible to technical buyers.

View AlphaStream on GitHub
motion proof map · alphastream · real-system storyboard
Data: OHLCV → Features: 200+ indicators → Models: 5 ensemble → Backtest: walk-forward → Signal: explainable

Signal engine proof loop

Surface, system, proof, route.

This storyboard turns the case study into a moving operating map: the buyer sees what was built, where the system lives, and which proof points are actually available.

indicators
200+
models
5
forks
2
01// the problem

What was broken.

Most algorithmic trading tools are black boxes — a signal output with no visibility into why it fired, what inputs drove it, or how it would have performed historically. For a technically sophisticated trader, that's not a tool. It's a guess with a UI.

AlphaStream was built around a different premise: every signal should be explainable, every model should be auditable, and the entire system should run in Python on hardware you control.

The challenge: building a signal engine simultaneously comprehensive enough to cover 200+ technical indicators across multiple timeframes, fast enough to process live market data without falling behind, and transparent enough that a practitioner can understand exactly what drove each output.

02// the approach

How it was built.

Data layer: market data ingestion from multiple sources, normalized into a unified OHLCV + extended data model. Indicator layer: 200+ technical indicators computed via pandas, TA-Lib, and custom implementations — RSI, MACD, Bollinger Bands, ADX, Ichimoku, custom momentum composites.
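As a rough illustration of what a pandas-based indicator can look like, here is a minimal sketch of two of the indicators named above. The function names, default periods, and column names are assumptions for the example, not the AlphaStream API.

```python
import pandas as pd


def rsi(close: pd.Series, period: int = 14) -> pd.Series:
    """Relative Strength Index with Wilder-style smoothing (EMA, alpha = 1/period)."""
    delta = close.diff()
    gain = delta.clip(lower=0.0)      # up-moves only
    loss = (-delta).clip(lower=0.0)   # down-moves only, as positive numbers
    avg_gain = gain.ewm(alpha=1.0 / period, min_periods=period, adjust=False).mean()
    avg_loss = loss.ewm(alpha=1.0 / period, min_periods=period, adjust=False).mean()
    rs = avg_gain / avg_loss          # inf when there were no losses, which maps to RSI = 100
    return 100.0 - 100.0 / (1.0 + rs)


def bollinger_bands(close: pd.Series, window: int = 20, n_std: float = 2.0) -> pd.DataFrame:
    """Rolling mean with bands n_std population standard deviations either side."""
    mid = close.rolling(window).mean()
    std = close.rolling(window).std(ddof=0)
    return pd.DataFrame(
        {"bb_mid": mid, "bb_upper": mid + n_std * std, "bb_lower": mid - n_std * std}
    )
```

Both functions are pure: they take a close-price series and return new values without touching shared state, which is what makes an indicator easy to version and replay.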

ML layer: 5 models trained per instrument/timeframe — XGBoost (gradient boosting, primary signal), LightGBM (secondary signal, speed-optimized), Random Forest (confidence calibration), Ridge Regression (trend baseline), and an Ensemble Voter combining all four with learned weights. Backtesting via walk-forward validation with held-out test sets — no look-ahead bias.
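The voter's combination step can be sketched as a weighted soft-vote. This is a minimal illustration, assuming each base model's output has already been mapped to a probability of an up-move; the model keys, probabilities, and weights below are made-up values, and how AlphaStream actually learns its weights is not shown here.

```python
from typing import Mapping


def soft_vote(probas: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Weighted average of per-model P(up); weights are normalized by their sum."""
    total = sum(weights[name] for name in probas)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(probas[name] * weights[name] for name in probas) / total


# Hypothetical per-model probabilities and weights, for illustration only.
probas = {"xgboost": 0.62, "lightgbm": 0.58, "random_forest": 0.55, "ridge": 0.51}
weights = {"xgboost": 0.4, "lightgbm": 0.3, "random_forest": 0.2, "ridge": 0.1}
ensemble_p = soft_vote(probas, weights)
```

Normalizing by the weight sum means the result stays a valid probability even if the learned weights do not add up to exactly 1.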

Feature engineering is where the edge lives. The 200+ indicators aren't noise — they're the vocabulary the models learn from. The ensemble architecture ensures no single model dominates, and the agreement score tells you when the models disagree (which is itself a signal).
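The text above does not define the agreement score, so the following is one plausible reading, offered as an assumption: the share of base models whose directional vote matches the most common vote. The function name and vote labels are hypothetical.

```python
from collections import Counter
from typing import Sequence


def agreement_score(votes: Sequence[str]) -> float:
    """Share of base models that agree with the most common vote (1.0 = unanimous)."""
    if not votes:
        raise ValueError("need at least one vote")
    _, top_count = Counter(votes).most_common(1)[0]
    return top_count / len(votes)


# Three of four base models say "long", so the score is 0.75.
score = agreement_score(["long", "long", "long", "flat"])
```

A low score flags bars where the models split, which a practitioner might treat as a reason to size down or stay flat.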

03// architecture

The system map.

How the pieces talk to each other.

AlphaStream ML Pipeline

Market data flows through 200+ engineered indicators into a five-model ensemble (XGBoost, LightGBM, Random Forest, Ridge, Ensemble Voter) which is walk-forward validated and emits signals with SHAP explainers.

  • Market Data: OHLCV + alt
  • 200+ Indicators: momentum / vol / regime
  • XGBoost: gradient boosting
  • LightGBM: histogram-based
  • Random Forest: bagging
  • Ridge: linear baseline
  • Ensemble Voter: weighted soft-vote
  • Walk-Forward CV: rolling windows
  • Signal: long / flat / short
  • SHAP Values: explainability

Edges: ingest → features → predictions → output
Groups: models · data flow · output

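The last hop of the map, from ensemble probability to a long / flat / short signal, could look something like the sketch below. The function name and both thresholds are assumptions chosen for illustration; the only threshold shown elsewhere on this page is the 0.55 long cutoff in the API excerpt.

```python
def to_signal(p_up: float, long_above: float = 0.55, short_below: float = 0.45) -> str:
    """Map P(up) to a discrete signal with a neutral band in the middle."""
    if not 0.0 <= p_up <= 1.0:
        raise ValueError("p_up must be a probability in [0, 1]")
    if p_up > long_above:
        return "long"
    if p_up < short_below:
        return "short"
    return "flat"  # inside the neutral band: no position
```

The neutral band keeps the engine flat when the ensemble is close to a coin flip, rather than forcing a direction on every bar.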
04// the numbers

Measured, not asserted.

The real figures from the engagement, printed verbatim. Bars are scaled against the largest comparable magnitude in the set — a secondary cue, never the source of truth.

metric · value (scale 0 – 200)
Indicators
200+
ML Models
5
GitHub Stars
5★
External Forks
2
05// built ui

Selected screens.

Real product surfaces from the engagement — not stock illustrations.

Live dashboard — 14 strategies running, 200+ indicators streaming, latency under 200ms.

06// evidence

What it actually looks like.

Architecture diagrams, CI runs, and dashboards from the engagement.

architecture · ML pipeline
Market data → feature engineering → five-model ensemble → walk-forward validation → streaming signal output. Every stage is observable, every prediction has provenance.
screenshot · Visual diff
Percy visual regression on the strategy dashboard. Pixel-level diffs catch chart regressions before they reach the people who actually trade off them.
07// the build log

What shipped.

The verbatim ship record, given timeline structure.

  1. log · entry 01

    Python package with clean CLI and programmatic API. 200+ indicator implementations (TA-Lib + pandas + custom). Pipeline of 5 trained models (XGBoost, LightGBM, RF, Ridge, Ensemble). Backtesting engine with walk-forward validation.

  2. log · entry 02

    Signal output with explainability layer (feature importance, SHAP values). Public GitHub repository: 5★, 2 forks, active maintenance. Full documentation including strategy examples.

08// the outcome

What it proved.

5 GitHub stars from practitioners in the quant/algo trading community, and 2 forks by external developers extending the system for their own use cases.

Walk-forward backtests across multiple instruments and timeframes show consistent signal quality. SHAP explainability output lets practitioners see per-signal feature attribution.

ML signal engines for trading don't require a hedge fund infrastructure team. A well-engineered Python package with the right architecture can be built, maintained, and extended by a single practitioner — and released as open source without compromising the core thesis.

09// artifacts

Available on request.

  • GitHub repository → github.com/jteixeira/alphastream
  • Strategy documentation
  • Backtesting methodology notes
// references

Talk to people about this work.

No fabricated quotes. Reference contacts are shared during discovery, with both parties' consent.

Reference available

Engineering lead

Fintech · 5 years

Worked alongside on production trading systems for 5+ years. Available for technical reference calls — code quality, on-call discipline, incident behavior.

Reference call shared during discovery, both consenting.
Reference available

Founder

Studio engagement

Engaged Sage Ideas for a Ship + Operate combination. Willing to talk about scope discipline, timeline accuracy, and what handoff actually looked like.

Reference call shared during discovery, both consenting.
A signal you can't explain is a signal you can't ship. Every prediction comes back with its feature importances attached.
// build log · entry 04
// honesty

What almost happened.

Every project has near-misses — decisions that, if we'd kept going, would have shipped a hole. This is the diff between the version that almost made it to prod and the version that did.

// near-miss · 01 · diff

before: Backtests were going to use the same data the models trained on. A subtle look-ahead bug, the kind that makes Sharpe ratios look magical and PnL look real.

after: Strict purged k-fold cross-validation with embargo bars. Splits are walk-forward, no overlap, and the engine refuses to score a model whose train window touches the eval window.

cost: Sharpe dropped from ~3.1 (fake) to ~1.4 (real). Still beats the SPY benchmark out-of-sample.
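The "refuses to score" rule can be sketched as a guard that runs before any evaluation. This is a minimal illustration under assumed names (`Window`, `assert_no_leakage`) and an assumed bar-index representation, not the engine's actual check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Window:
    start: int  # first bar index, inclusive
    end: int    # last bar index, exclusive


def assert_no_leakage(train: Window, test: Window, embargo: int) -> None:
    """Raise unless the train window ends at least `embargo` bars before the test window starts."""
    if train.end + embargo > test.start:
        raise ValueError(
            f"train window ends at bar {train.end}, inside the {embargo}-bar embargo "
            f"before test start {test.start}; refusing to score"
        )


# Passes: 10 bars separate the end of training from the start of evaluation.
assert_no_leakage(Window(0, 100), Window(110, 160), embargo=10)
```

Failing loudly here is the point: a backtest that cannot be scored is cheaper than one that reports a Sharpe ratio inflated by leaked bars.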

// near-miss · 02 · diff

before: Feature engineering pipeline was about to ship as a notebook: 200+ indicators computed inline, no versioning, no reproducibility.

after: Each indicator is a pure function in `features/`, registered in a manifest, tagged with a semver. Backtests log the exact feature-set hash they used.

cost: Two extra weeks. Now every published result is replayable from one commit.
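One way to picture the manifest and feature-set hash is the sketch below. The `REGISTRY` dict, the `register` decorator, the `momentum_5` indicator, and the choice of SHA-256 over a sorted name/version list are all assumptions for illustration; the real manifest format is not shown on this page.

```python
import hashlib
import json
from typing import Callable, Dict, Sequence

FeatureFn = Callable[[Sequence[float]], float]
REGISTRY: Dict[str, dict] = {}  # hypothetical in-memory manifest: name -> {fn, version}


def register(name: str, version: str) -> Callable[[FeatureFn], FeatureFn]:
    """Decorator that records a pure indicator function and its semver in the manifest."""
    def decorator(fn: FeatureFn) -> FeatureFn:
        REGISTRY[name] = {"fn": fn, "version": version}
        return fn
    return decorator


@register("momentum_5", version="1.0.0")
def momentum_5(close: Sequence[float]) -> float:
    """Latest close minus the close five bars earlier."""
    return close[-1] - close[-6]


def feature_set_hash(registry: Dict[str, dict] = REGISTRY) -> str:
    """Stable SHA-256 over the sorted (name, version) pairs in the manifest."""
    manifest = sorted((name, entry["version"]) for name, entry in registry.items())
    payload = json.dumps(manifest, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

Because the hash depends only on names and versions, bumping any indicator's semver changes the hash, and a logged hash pins a backtest to one exact feature set.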

// from the repo

Inline excerpts.

Trimmed, but real. The patterns that keep look-ahead out of the backtests and attach an explanation to every signal.

Purged walk-forward split
python
# features/cv.py — production excerpt
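from typing import Iterator, NamedTuple

import pandas as pd


class Split(NamedTuple):
    # Shape assumed from the call at the bottom of purged_kfold.
    train: pd.Index
    test: pd.Index
    k: int


def purged_kfold(idx: pd.Index, n_splits: int, embargo: int) -> Iterator[Split]:
    """Walk-forward CV with a hard embargo gap between train and test.

    embargo: number of bars dropped from the end of the train window,
             immediately before the test window, so leakage from
             rolling-window features cannot bleed in.
    """
    fold_size = len(idx) // (n_splits + 1)
    for k in range(n_splits):
        test_start = (k + 1) * fold_size
        test_end = test_start + fold_size
        train_end = max(0, test_start - embargo)  # stop training `embargo` bars early
        train = idx[:train_end]  # expanding window: every bar before the gap
        test = idx[test_start:test_end]
        yield Split(train=train, test=test, k=k)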
// No look-ahead. The split refuses to leak future bars into the training window.
Explainable prediction envelope
python
# api/predict.py
@app.post("/predict")
def predict(req: PredictRequest) -> PredictResponse:
    x = build_feature_row(req.symbol, req.ts)
    proba = model.predict_proba(x)[0, 1]
    shap_values = explainer(x).values[0]  # row 0 of the Explanation; assumes a single-output shap.Explainer
    top = sorted(
        zip(FEATURE_NAMES, shap_values),
        key=lambda kv: abs(kv[1]),
        reverse=True,
    )[:5]
    return PredictResponse(
        symbol=req.symbol,
        proba=float(proba),
        signal="long" if proba > 0.55 else "flat",
        attributions=[{"feature": f, "shap": float(v)} for f, v in top],
        model_version=MODEL_VERSION,
        feature_hash=FEATURE_HASH,
    )
// Every signal carries its top-N feature attributions — in the API response, not a separate dashboard.
live · build a1556e2 · 2026-06-19 03:29Z
// solo studio // no analytics resold // every commit human-reviewed