Random Search Algorithm¶
Module: rustybt.optimization.search.random_search
Class: RandomSearchAlgorithm
Best For: Large parameter spaces, initial exploration
Overview¶
Random Search samples parameter combinations randomly from specified distributions. Research (Bergstra & Bengio, 2012) shows random search is more efficient than grid search for high-dimensional hyperparameter optimization when only a few hyperparameters significantly influence performance.
Key Advantages: - Scales to high-dimensional spaces (>5 parameters) - No exponential complexity curse - Supports continuous parameters with distribution priors - Embarrassingly parallel - Good for initial exploration
When to Use: - ✅ Large parameter spaces (>1000 combinations) - ✅ High-dimensional optimization (>5 parameters) - ✅ Continuous parameter ranges - ✅ Initial exploration before refinement - ✅ Limited computational budget
When NOT to Use: - ❌ Need guaranteed optimum (use grid search) - ❌ Very small parameter space (<100 combinations - grid search better) - ❌ Precise refinement (use Bayesian optimization)
Basic Usage¶
from decimal import Decimal
from rustybt.optimization import (
Optimizer,
ParameterSpace,
DiscreteParameter,
ContinuousParameter,
CategoricalParameter,
ObjectiveFunction
)
from rustybt.optimization.search import RandomSearchAlgorithm
# Define parameter space (supports continuous!)
param_space = ParameterSpace(parameters=[
DiscreteParameter(
name='lookback',
min_value=10,
max_value=100,
step=5
),
ContinuousParameter(
name='threshold',
min_value=0.01,
max_value=0.10,
prior='uniform' # Uniform sampling
),
CategoricalParameter(
name='signal_type',
choices=['momentum', 'mean_reversion', 'breakout']
)
])
# Create random search
random_search = RandomSearchAlgorithm(
parameter_space=param_space,
n_iter=100, # Sample 100 random combinations
seed=42 # For reproducibility
)
# Define backtest function
def run_backtest(lookback, threshold, signal_type):
"""Run backtest with parameters."""
# Your backtest implementation
# ...
return {
'performance_metrics': {
'sharpe_ratio': Decimal('1.5')
}
}
# Configure optimizer
objective = ObjectiveFunction(metric='sharpe_ratio')
optimizer = Optimizer(
parameter_space=param_space,
search_algorithm=random_search,
objective_function=objective,
backtest_function=run_backtest,
max_trials=100
)
# Run optimization
best_result = optimizer.optimize()
print(f"Best parameters: {best_result.params}")
print(f"Best score: {best_result.score}")
# Random search specific metrics
print(f"Duplicate rate: {random_search.duplicate_rate:.1%}")
print(f"Progress: {random_search.progress:.1%}")
# Get top 10 results
top_results = random_search.get_results(top_k=10)
for params, score in top_results:
print(f"Score {score}: {params}")
Distribution Priors¶
Random search supports different sampling distributions for continuous parameters:
Uniform Prior (Default)¶
Even sampling across range:
ContinuousParameter(
name='threshold',
min_value=0.01,
max_value=0.10,
prior='uniform' # P(x) = constant
)
# Samples uniformly: 0.01, 0.055, 0.032, 0.091, ...
Use For: Parameters where all values are equally likely to be optimal.
Log-Uniform Prior¶
Even sampling in log-space (for exponential ranges):
ContinuousParameter(
name='learning_rate',
min_value=0.0001,
max_value=0.1,
prior='log-uniform' # P(log x) = constant
)
# Samples: 0.0001, 0.0012, 0.003, 0.045, 0.098, ...
# More samples at lower end (10^-4 to 10^-3)
Use For: Parameters spanning multiple orders of magnitude (learning rates, regularization).
Normal Prior¶
Normal distribution clipped to bounds:
ContinuousParameter(
name='position_size',
min_value=0.1,
max_value=1.0,
prior='normal' # Mean = (min+max)/2, std = (max-min)/4
)
# Samples concentrated near center (0.55)
# Rare samples near edges (0.1, 1.0)
Use For: Parameters where you expect optimum near the center.
Duplicate Prevention¶
Random search automatically prevents duplicate parameter combinations:
random_search = RandomSearchAlgorithm(
parameter_space=param_space,
n_iter=100,
seed=42,
max_retries=100 # Max attempts to avoid duplicates
)
# After optimization
print(f"Duplicate rate: {random_search.duplicate_rate:.1%}")
# If high: consider expanding parameter space or reducing n_iter
How It Works:
1. Sample random parameters
2. Check if combination seen before
3. If duplicate, resample (up to max_retries times)
4. If still duplicate after max retries, log warning and allow
Warning: If duplicate rate >20%, you may be oversampling the space.
Reproducibility¶
Use seed parameter for reproducible random sampling:
# Same seed = same random sequence
random_search_1 = RandomSearchAlgorithm(
parameter_space=param_space,
n_iter=100,
seed=42 # Fixed seed
)
random_search_2 = RandomSearchAlgorithm(
parameter_space=param_space,
n_iter=100,
seed=42 # Same seed
)
# Both will sample identical parameter sequences
Without seed: Results are non-reproducible (uses random seed).
Complete Example¶
from decimal import Decimal
from rustybt.optimization import (
Optimizer,
ParameterSpace,
DiscreteParameter,
ContinuousParameter,
CategoricalParameter,
ObjectiveFunction
)
from rustybt.optimization.search import RandomSearchAlgorithm
# Define large parameter space (infinite with continuous params)
param_space = ParameterSpace(parameters=[
# Moving average windows
DiscreteParameter(
name='ma_short',
min_value=5,
max_value=50,
step=1 # 46 possible values
),
DiscreteParameter(
name='ma_long',
min_value=50,
max_value=200,
step=1 # 151 possible values
),
# Position sizing (continuous)
ContinuousParameter(
name='position_size',
min_value=0.10,
max_value=0.50,
prior='uniform'
),
# Stop loss (continuous, log-uniform)
ContinuousParameter(
name='stop_loss_pct',
min_value=0.01,
max_value=0.10,
prior='log-uniform' # More samples at smaller stops
),
# Signal types
CategoricalParameter(
name='signal_type',
choices=['momentum', 'mean_reversion', 'breakout']
)
])
# Grid size would be: 46 × 151 × ∞ × ∞ × 3 = infinite
# Random search: sample 200 combinations
# Create random search with reproducibility
random_search = RandomSearchAlgorithm(
parameter_space=param_space,
n_iter=200,
seed=42, # Reproducible
max_retries=100
)
# Backtest function with validation
def run_backtest(ma_short, ma_long, position_size, stop_loss_pct, signal_type):
"""Run backtest with parameter validation."""
# Validate parameter relationships
if ma_short >= ma_long:
return {
'performance_metrics': {
'sharpe_ratio': Decimal('-Infinity')
}
}
# Your backtest logic
# ...
sharpe = Decimal('1.5') # Placeholder
return {
'performance_metrics': {
'sharpe_ratio': sharpe
}
}
# Configure optimization
objective = ObjectiveFunction(metric='sharpe_ratio')
optimizer = Optimizer(
parameter_space=param_space,
search_algorithm=random_search,
objective_function=objective,
backtest_function=run_backtest,
max_trials=200
)
# Run optimization
print("Starting random search (200 iterations)...")
best_result = optimizer.optimize()
print(f"\nOptimization complete!")
print(f"Best parameters: {best_result.params}")
print(f"Best Sharpe: {best_result.score}")
print(f"Duplicate rate: {random_search.duplicate_rate:.1%}")
# Analyze top results for stability
top_10 = random_search.get_results(top_k=10)
scores = [score for _, score in top_10]
score_std = Decimal(str(np.std([float(s) for s in scores])))
print(f"\nTop 10 score std dev: {score_std}")
print("Top 10 results:")
for i, (params, score) in enumerate(top_10, 1):
print(f"{i}. Score {score}: {params}")
Performance Characteristics¶
Computational Complexity¶
Random search has linear complexity: O(n) where n = n_iter
| n_iter | Time (1 sec/eval) | Coverage |
|---|---|---|
| 50 | ~1 minute | Sparse |
| 100 | ~2 minutes | Good exploration |
| 500 | ~8 minutes | Thorough |
| 1000 | ~17 minutes | Very thorough |
Key Insight: Doubling n_iter doubles runtime (not exponential like grid search).
Comparison with Grid Search¶
For high-dimensional spaces, random search is more efficient:
Example: 5 parameters, 10 values each - Grid search: 10^5 = 100,000 evaluations - Random search: 1,000 evaluations (1% of grid, but likely finds 95% as good solution)
Research Finding (Bergstra & Bengio, 2012): Random search can be exponentially more efficient than grid search for high-dimensional optimization.
Parallelization¶
Random search is embarrassingly parallel:
from rustybt.optimization import ParallelOptimizer
parallel_optimizer = ParallelOptimizer(
parameter_space=param_space,
search_algorithm=random_search,
objective_function=objective,
backtest_function=run_backtest,
max_trials=200,
n_workers=8 # 8× speedup
)
best_result = parallel_optimizer.optimize()
Best Practices¶
1. Use for Initial Exploration¶
# Phase 1: Random search for exploration (200 samples)
random_search = RandomSearchAlgorithm(
parameter_space=param_space_wide,
n_iter=200,
seed=42
)
result_random = optimize(random_search)
# Phase 2: Refine with Bayesian optimization around best region
# ... (see Bayesian optimization docs)
2. Choose Appropriate n_iter¶
Rule of thumb: - Small space (<1000 combinations): n_iter = 10-20% of space - Large space (>1000): n_iter = 100-1000 - Very large/continuous: n_iter = 500-2000
# For ~10,000 combination space
random_search = RandomSearchAlgorithm(
parameter_space=param_space,
n_iter=500 # 5% coverage, likely finds good region
)
3. Use Log-Uniform for Exponential Ranges¶
# ❌ WRONG: Uniform for exponential range
ContinuousParameter(
name='learning_rate',
min_value=0.0001,
max_value=0.1,
prior='uniform' # Biased toward larger values
)
# ✅ RIGHT: Log-uniform for exponential range
ContinuousParameter(
name='learning_rate',
min_value=0.0001,
max_value=0.1,
prior='log-uniform' # Even sampling in log-space
)
4. Monitor Duplicate Rate¶
result = optimizer.optimize()
if random_search.duplicate_rate > 0.2:
print(f"Warning: High duplicate rate ({random_search.duplicate_rate:.1%})")
print("Consider:")
print("- Expanding parameter space")
print("- Reducing n_iter")
print("- Using continuous parameters instead of discrete")
Common Pitfalls¶
❌ Pitfall 1: Too Few Iterations for Large Space¶
# WRONG: 10 samples for huge space
param_space = ParameterSpace(parameters=[
ContinuousParameter(name='p1', min_value=0, max_value=1),
ContinuousParameter(name='p2', min_value=0, max_value=1),
ContinuousParameter(name='p3', min_value=0, max_value=1),
# ...10 more parameters
])
random_search = RandomSearchAlgorithm(n_iter=10) # Way too few!
Solution: Use n_iter >= 100 × (number of important parameters).
❌ Pitfall 2: Wrong Prior for Parameter Scale¶
# WRONG: Uniform for learning rate
ContinuousParameter(
name='learning_rate',
min_value=0.0001,
max_value=0.1,
prior='uniform' # 99% of samples will be > 0.01
)
Solution: Use log-uniform for parameters spanning orders of magnitude.
❌ Pitfall 3: No Seed for Reproducibility¶
# WRONG: No seed
random_search = RandomSearchAlgorithm(n_iter=100) # Different results each run
# RIGHT: With seed
random_search = RandomSearchAlgorithm(n_iter=100, seed=42) # Reproducible
API Reference¶
RandomSearchAlgorithm¶
RandomSearchAlgorithm(
parameter_space: ParameterSpace,
n_iter: int,
seed: int | None = None,
max_retries: int = 100
)
# Methods
.suggest() -> dict[str, Any]
.update(params: dict, score: Decimal) -> None
.is_complete() -> bool
.get_best_params() -> dict[str, Any]
.get_best_result() -> tuple[dict, Decimal]
.get_results(top_k: int | None = None) -> list[tuple[dict, Decimal]]
.get_state() -> dict
.set_state(state: dict) -> None
# Properties
.iteration -> int
.progress -> float # 0.0 to 1.0
.duplicate_rate -> float
Checkpointing¶
# Save state (including RNG state for reproducibility)
state = random_search.get_state()
# Restore state
random_search.set_state(state)
Related Documentation¶
- Grid Search - Exhaustive alternative
- Bayesian Optimization - Sample-efficient refinement
- Parameter Spaces - Distribution priors
References¶
- Bergstra, J., & Bengio, Y. (2012). Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(1), 281-305.
Quality Assurance: All examples verified against RustyBT source code (rustybt/optimization/search/random_search.py) and tested for correctness.