In this tutorial, we explore skfolio, a scikit-learn compatible portfolio optimization library that helps us build, compare, and evaluate different investment strategies in a structured Python workflow. We start by loading S&P 500 price data, converting it into returns, and creating a time-based train-test split suitable for financial analysis. From there, we build simple baseline portfolios, test mean-variance optimization, compare alternative risk measures, apply risk-parity methods, and use hierarchical clustering techniques such as Hierarchical Risk Parity (HRP) and Nested Clusters Optimization (NCO). We then move into more advanced portfolio construction ideas, including robust covariance estimators, Black-Litterman views, factor models, pre-selection pipelines, walk-forward validation, and hyperparameter tuning with GridSearchCV.
import subprocess, sys

def _pip_install(pkg):
    subprocess.check_call([sys.executable, "-m", "pip", "install", "-q", pkg])

try:
    import skfolio  # noqa: F401
except ImportError:
    _pip_install("skfolio")
    import skfolio  # noqa: F401

print(f"skfolio version: {skfolio.__version__}")
import warnings
warnings.filterwarnings("ignore")

import numpy as np
import pandas as pd
import plotly.io as pio

try:
    pio.renderers.default = "colab"
except Exception:
    pio.renderers.default = "notebook"

from sklearn import set_config
from sklearn.model_selection import GridSearchCV, KFold, train_test_split
from sklearn.pipeline import Pipeline

from skfolio import RatioMeasure, RiskMeasure, Population
from skfolio.datasets import load_factors_dataset, load_sp500_dataset
from skfolio.preprocessing import prices_to_returns
from skfolio.model_selection import WalkForward, cross_val_predict
from skfolio.optimization import (
    EqualWeighted, InverseVolatility, Random,
    MeanRisk, ObjectiveFunction,
    RiskBudgeting,
    HierarchicalRiskParity,
    NestedClustersOptimization,
)
from skfolio.moments import (
    EmpiricalCovariance, LedoitWolf, DenoiseCovariance, GerberCovariance,
    EmpiricalMu, EWMu, ShrunkMu,
)
from skfolio.prior import EmpiricalPrior, BlackLitterman, FactorModel
from skfolio.pre_selection import SelectKExtremes

set_config(transform_output="pandas")
prices = load_sp500_dataset()
print("Prices shape:", prices.shape)
print(prices.tail(3))
X = prices_to_returns(prices)
print("\nReturns shape:", X.shape)
X_train, X_test = train_test_split(X, test_size=0.33, shuffle=False)
print(f"Train: {X_train.index.min().date()} → {X_train.index.max().date()} ({len(X_train)} days)")
print(f"Test : {X_test.index.min().date()} → {X_test.index.max().date()} ({len(X_test)} days)")We install and import all the required libraries, including skfolio, scikit-learn, pandas, NumPy, and Plotly. We load the S&P 500 dataset, convert asset prices into returns, and prepare the data for portfolio optimization. We split the returns into training and test sets in chronological order to avoid look-ahead bias.
benchmarks = {
    "1/N (EqualWeighted)": EqualWeighted(),
    "Inverse-Volatility": InverseVolatility(),
    "Random (Dirichlet)": Random(),
}
baseline_population = Population([])
for name, mdl in benchmarks.items():
    mdl.fit(X_train)
    ptf = mdl.predict(X_test)
    ptf.name = name
    baseline_population.append(ptf)
    print(f"{name:25s} Sharpe={ptf.annualized_sharpe_ratio:.3f} "
          f"AnnRet={ptf.annualized_mean:.3%} AnnVol={ptf.annualized_standard_deviation:.3%}")
min_var = MeanRisk(risk_measure=RiskMeasure.VARIANCE)
min_var.fit(X_train)
print("\nMin-Variance weights (top 5):")
print(pd.Series(min_var.weights_, index=X_train.columns).sort_values(ascending=False).head())
ptf_min_var = min_var.predict(X_test)
ptf_min_var.name = "Min Variance"
max_sharpe = MeanRisk(
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    risk_measure=RiskMeasure.VARIANCE,
)
max_sharpe.fit(X_train)
ptf_max_sharpe = max_sharpe.predict(X_test)
ptf_max_sharpe.name = "Max Sharpe"
ef = MeanRisk(
    risk_measure=RiskMeasure.VARIANCE,
    efficient_frontier_size=20,
    portfolio_params=dict(name="EF"),
)
ef.fit(X_train)
ef_population_test = ef.predict(X_test)
print(f"\nEfficient frontier produced {len(ef_population_test)} portfolios.")
fig = ef_population_test.plot_measures(
    x=RiskMeasure.ANNUALIZED_VARIANCE,
    y=RatioMeasure.ANNUALIZED_SHARPE_RATIO,
)
fig.show()

We create simple benchmark portfolios using equal weighting, inverse volatility, and random allocation. We then build mean-variance portfolios, including minimum-variance and maximum Sharpe-ratio strategies. We also generate an efficient frontier and visualize the trade-off between portfolio risk and performance.
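As a sanity check on what ObjectiveFunction.MAXIMIZE_RATIO optimizes, we can recompute the in-sample Sharpe ratio directly from the fitted weights. This is a minimal sketch assuming 252 trading days and a zero risk-free rate; skfolio's internal annualization may differ in detail:

# Manual in-sample Sharpe from the Max Sharpe weights (hypothetical check).
w = np.asarray(max_sharpe.weights_)
mu = X_train.mean().values        # daily mean returns
sigma = X_train.cov().values      # daily sample covariance
ann_ret = 252 * (w @ mu)
ann_vol = np.sqrt(252 * (w @ sigma @ w))
print(f"Manual in-sample Sharpe ≈ {ann_ret / ann_vol:.3f}")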
risk_measures = {
    "Min CVaR-95": RiskMeasure.CVAR,
    "Min Semi-Variance": RiskMeasure.SEMI_VARIANCE,
    "Min CDaR": RiskMeasure.CDAR,
    "Min Max Drawdown": RiskMeasure.MAX_DRAWDOWN,
}
risk_pop = Population([ptf_min_var, ptf_max_sharpe])
for name, rm in risk_measures.items():
    m = MeanRisk(risk_measure=rm)
    m.fit(X_train)
    p = m.predict(X_test)
    p.name = name
    risk_pop.append(p)
print("\nRisk-measure comparison on test set:")
_summary = risk_pop.summary()
_wanted = ["Annualized Sharpe Ratio", "Annualized Sortino Ratio",
           "CVaR at 95%", "Maximum Drawdown", "Max Drawdown"]
_have = [r for r in _wanted if r in _summary.index]
print(_summary.loc[_have].T)
rb_var = RiskBudgeting(risk_measure=RiskMeasure.VARIANCE)
rb_cvar = RiskBudgeting(risk_measure=RiskMeasure.CVAR)
rb_var.fit(X_train); rb_cvar.fit(X_train)
ptf_rb_var = rb_var.predict(X_test); ptf_rb_var.name = "Risk Parity (Var)"
ptf_rb_cvar = rb_cvar.predict(X_test); ptf_rb_cvar.name = "Risk Parity (CVaR)"
hrp = HierarchicalRiskParity(risk_measure=RiskMeasure.VARIANCE)
hrp.fit(X_train)
ptf_hrp = hrp.predict(X_test); ptf_hrp.name = "HRP"
nco = NestedClustersOptimization(
    inner_estimator=MeanRisk(risk_measure=RiskMeasure.CVAR),
    outer_estimator=RiskBudgeting(risk_measure=RiskMeasure.VARIANCE),
    cv=KFold(n_splits=5),
    n_jobs=-1,
)
nco.fit(X_train)
ptf_nco = nco.predict(X_test); ptf_nco.name = "Nested Clusters"
hrp.hierarchical_clustering_estimator_.plot_dendrogram().show()

We compare different risk measures, including CVaR, semi-variance, CDaR, and maximum drawdown. We build risk-budgeting portfolios to distribute risk contributions more evenly across assets. We also apply hierarchical methods, such as HRP and Nested Clusters Optimization, to capture asset relationships through clustering.
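To make the risk-parity idea concrete, we can verify that the fitted RiskBudgeting weights roughly equalize each asset's share of total portfolio variance. A small sketch using the in-sample sample covariance (skfolio's own moment estimators may differ slightly):

# Fractional variance contributions: w_i * (Sigma w)_i / (w' Sigma w).
w_rp = np.asarray(rb_var.weights_)
sigma = X_train.cov().values
contrib = w_rp * (sigma @ w_rp) / (w_rp @ sigma @ w_rp)
print(pd.Series(contrib, index=X_train.columns).round(4))  # each entry ≈ 1/n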
robust = MeanRisk(
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    risk_measure=RiskMeasure.VARIANCE,
    prior_estimator=EmpiricalPrior(
        mu_estimator=ShrunkMu(),
        covariance_estimator=DenoiseCovariance(),
    ),
)
robust.fit(X_train)
ptf_robust = robust.predict(X_test); ptf_robust.name = "Max Sharpe (Robust)"
gerber = MeanRisk(
    risk_measure=RiskMeasure.VARIANCE,
    prior_estimator=EmpiricalPrior(covariance_estimator=GerberCovariance()),
)
gerber.fit(X_train)
ptf_gerber = gerber.predict(X_test); ptf_gerber.name = "Min Var (Gerber)"
assets = list(X_train.columns)
group_a = assets[:10]; group_b = assets[10:]
groups = pd.DataFrame(
    {a: ["GroupA" if a in group_a else "GroupB"] for a in assets},
    index=["Sector"],
)
constrained = MeanRisk(
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    risk_measure=RiskMeasure.VARIANCE,
    min_weights=0.0,
    max_weights=0.20,
    transaction_costs=0.0005,
    groups=groups,
    linear_constraints=[
        "GroupA <= 0.6",
        "GroupB >= 0.2",
    ],
    l2_coef=0.01,
)
constrained.fit(X_train)
ptf_constr = constrained.predict(X_test); ptf_constr.name = "Constrained MV"
print("\nConstrained portfolio weights:")
print(pd.Series(constrained.weights_, index=assets).round(4))
print("\nAvailable tickers:", list(X_train.columns))
bl_views = [
    "AAPL == 0.0008",
    "JPM - BAC == 0.0002",
]
bl = MeanRisk(
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    risk_measure=RiskMeasure.VARIANCE,
    prior_estimator=BlackLitterman(views=bl_views),
)
bl.fit(X_train)
ptf_bl = bl.predict(X_test); ptf_bl.name = "Black-Litterman"

We improve portfolio stability by using robust estimators such as shrunk mean, denoised covariance, and Gerber covariance. We add real-world constraints like maximum asset weights, group limits, transaction costs, and L2 regularization. We also apply Black-Litterman views to combine market-based assumptions with our own return expectations.
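To see why shrinkage and denoising stabilize the optimization, it helps to compare the conditioning of the raw sample covariance with a shrunk estimate. As a stand-in illustration, this sketch uses scikit-learn's LedoitWolf (an assumption for the sketch; the skfolio estimators configured above are not identical to it):

# Illustrative conditioning check with sklearn's Ledoit-Wolf shrinkage.
from sklearn.covariance import LedoitWolf as SkLedoitWolf

sample_cov = X_train.cov().values
lw = SkLedoitWolf().fit(X_train.values)
print(f"Sample cov condition number : {np.linalg.cond(sample_cov):.1f}")
print(f"Shrunk cov condition number : {np.linalg.cond(lw.covariance_):.1f}")
print(f"Estimated shrinkage intensity: {lw.shrinkage_:.3f}")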
factor_prices = load_factors_dataset()
X_full, F_full = prices_to_returns(prices, factor_prices)
X_tr, X_te, F_tr, F_te = train_test_split(
    X_full, F_full, test_size=0.33, shuffle=False
)
fm = MeanRisk(
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    risk_measure=RiskMeasure.VARIANCE,
    prior_estimator=FactorModel(),
)
fm.fit(X_tr, F_tr)
ptf_fm = fm.predict(X_te); ptf_fm.name = "Factor Model"
print(f"\nFactor-model Sharpe: {ptf_fm.annualized_sharpe_ratio:.3f}")
pipe = Pipeline([
    ("preselect", SelectKExtremes(k=8, highest=True)),
    ("optimize", MeanRisk(
        objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
        risk_measure=RiskMeasure.VARIANCE)),
])
pipe.fit(X_train)
ptf_pipe = pipe.predict(X_test); ptf_pipe.name = "Top-8 + Max Sharpe"
wf_model = MeanRisk(
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    risk_measure=RiskMeasure.VARIANCE,
)
mp_portfolio = cross_val_predict(
    wf_model, X,
    cv=WalkForward(train_size=252*2, test_size=63),
    n_jobs=-1,
)
mp_portfolio.name = "Walk-Forward Max Sharpe"
print(f"\nWalk-forward portfolio Sharpe={mp_portfolio.annualized_sharpe_ratio:.3f} "
      f"CalmarRatio={mp_portfolio.calmar_ratio:.3f}")
mp_portfolio.plot_cumulative_returns().show()
tuned = MeanRisk(
    objective_function=ObjectiveFunction.MAXIMIZE_RATIO,
    risk_measure=RiskMeasure.VARIANCE,
    prior_estimator=EmpiricalPrior(mu_estimator=EWMu(alpha=0.1)),
)
grid = GridSearchCV(
    estimator=tuned,
    cv=WalkForward(train_size=252*2, test_size=63),
    n_jobs=-1,
    param_grid={
        "l2_coef": [0.0, 0.01, 0.1],
        "prior_estimator__mu_estimator__alpha": [0.05, 0.1, 0.2, 0.5],
    },
)
grid.fit(X_train)
print("\nBest params:", grid.best_params_)
print(f"Best CV score (Sharpe): {grid.best_score_:.3f}")
ptf_tuned = grid.best_estimator_.predict(X_test); ptf_tuned.name = "Tuned Max Sharpe"
final = Population([
    *baseline_population,
    ptf_min_var, ptf_max_sharpe,
    ptf_rb_var, ptf_rb_cvar,
    ptf_hrp, ptf_nco,
    ptf_robust, ptf_gerber,
    ptf_constr, ptf_bl, ptf_fm,
    ptf_pipe, ptf_tuned,
])
_full = final.summary()
_wanted_final = [
    "Annualized Mean", "Annualized Standard Deviation",
    "Annualized Sharpe Ratio", "Annualized Sortino Ratio",
    "CVaR at 95%", "Maximum Drawdown", "Max Drawdown",
]
_have_final = [r for r in _wanted_final if r in _full.index]
summary = _full.loc[_have_final].T.sort_values(
    "Annualized Sharpe Ratio", ascending=False
)
print("\n" + "=" * 80)
print("FINAL HORSE RACE — sorted by Sharpe (out-of-sample test set)")
print("=" * 80)
print(summary.to_string())
final.plot_cumulative_returns().show()
final.plot_composition().show()
ptf_rb_var.plot_contribution(measure=RiskMeasure.VARIANCE).show()
print("\nDone. Try swapping risk measures, adding constraints, or wiring in")
print("your own returns DataFrame — every estimator follows the sklearn API.")We build a factor model to explain asset returns using external factor data and optimize based on that structure. We create a pre-selection pipeline, run walk-forward validation, and tune hyperparameters using GridSearchCV. We finally compare all portfolio strategies in a single horse race using summary metrics, cumulative returns, composition, and risk-contribution plots.
In conclusion, we completed a full portfolio optimization workflow with skfolio, moving from basic benchmark strategies to advanced, model-driven portfolio construction techniques. We compared equal-weighted, inverse-volatility, mean-variance, risk-parity, hierarchical, robust, constrained, Black-Litterman, factor-based, and tuned portfolios on an out-of-sample test set. By using skfolio's scikit-learn-style API, we kept the workflow modular, readable, and easy to extend with new constraints, risk measures, estimators, or custom return data.