Optimization Caching: Performance Tuning Guide
Overview
RustyBT's optimization caching system substantially speeds up optimization workflows by caching asset lists and pre-grouping data. This guide explains when caching is beneficial, how to configure it, and how to tune it for your workload.
Performance Impact
Measured Improvements
Based on benchmarked results from 100 backtests in optimization workflows:
| Operation | Before | After | Improvement |
|---|---|---|---|
| Asset List Extraction | ~14.85 ms/call | <0.15 ms/call | 99.8% reduction (target: 48.5%) |
| Data Pre-Grouping | Variable | Cached | 100% reduction for repeated access |
| Overall Workflow | Baseline | Optimized | 501.6% faster, ~6x (target: 70%) |
| Memory Overhead | Baseline | With Caching | 0.80x baseline (20% less memory) |
Key Insight: Every target was exceeded. The asset list reduction (99.8%) is about twice its 48.5% target, and the overall speedup (501.6%) is about 7x its 70% target.
When to Use Caching
✅ Highly Beneficial
Caching provides significant benefits in these scenarios:
- Optimization Workflows
  - Grid search over parameter spaces
  - Walk-forward optimization windows
  - Parallel backtest execution
  - Monte Carlo simulation
- Repeated Asset Access
  - Same bundle accessed multiple times
  - Same asset universe across runs
  - Stable bundle versions
- Large Asset Universes
  - 50+ assets per backtest
  - Multiple asset filters applied
  - Complex grouping logic
⚠️ Limited Benefit
Caching provides minimal or negative value in:
- Single-Run Backtests
  - One-off strategy evaluations
  - Ad-hoc research queries
  - Interactive notebook exploration
- Frequently Changing Bundles
  - Bundle updated between calls
  - Dynamic asset universes
  - Real-time data ingestion
- Small Asset Sets
  - <10 assets per backtest
  - Simple filtering logic
  - Minimal grouping overhead
Configuration
Basic Configuration
```python
from rustybt.optimization.config import OptimizationConfig

# Default configuration (recommended for most users)
config = OptimizationConfig()

# Custom configuration
config = OptimizationConfig(
    cache_size_gb=2.0,           # Max memory for data cache (default: 2GB)
    enable_caching=True,         # Enable/disable caching (default: True)
    lru_maxsize=128,             # LRU cache size for asset lists (default: 128)
    enable_bundle_pooling=True,  # Bundle connection pooling (default: True)
)
```
Environment Variables
Override configuration via environment variables:
```bash
# Set cache size to 4GB
export RUSTYBT_CACHE_SIZE_GB=4.0

# Disable caching for debugging
export RUSTYBT_ENABLE_CACHING=false

# Increase LRU cache size
export RUSTYBT_LRU_MAXSIZE=256
```
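This guide does not say how these variables are parsed or how they interact with values passed to OptimizationConfig. The following is a hypothetical sketch of one plausible override layer. CacheSettings and settings_from_env are illustrative names, not RustyBT API. The sketch assumes that a set variable replaces the default and that an unset variable leaves the default alone.

```python
import os
from dataclasses import dataclass, replace
from typing import Mapping, Optional


@dataclass(frozen=True)
class CacheSettings:
    """Hypothetical stand-in for the caching fields of OptimizationConfig."""

    cache_size_gb: float = 2.0
    enable_caching: bool = True
    lru_maxsize: int = 128


def _parse_bool(value: str) -> bool:
    # "1", "true", "yes", "on" (any case) mean True; anything else, e.g. "false", means False
    return value.strip().lower() in {"1", "true", "yes", "on"}


def settings_from_env(
    base: CacheSettings = CacheSettings(),
    env: Optional[Mapping[str, str]] = None,
) -> CacheSettings:
    """Return base with any RUSTYBT_* overrides from env (default: os.environ) applied."""
    env = os.environ if env is None else env
    overrides = {}
    if "RUSTYBT_CACHE_SIZE_GB" in env:
        overrides["cache_size_gb"] = float(env["RUSTYBT_CACHE_SIZE_GB"])
    if "RUSTYBT_ENABLE_CACHING" in env:
        overrides["enable_caching"] = _parse_bool(env["RUSTYBT_ENABLE_CACHING"])
    if "RUSTYBT_LRU_MAXSIZE" in env:
        overrides["lru_maxsize"] = int(env["RUSTYBT_LRU_MAXSIZE"])
    return replace(base, **overrides)


# The three exports above, passed as an explicit mapping for a reproducible example
settings = settings_from_env(env={
    "RUSTYBT_CACHE_SIZE_GB": "4.0",
    "RUSTYBT_ENABLE_CACHING": "false",
    "RUSTYBT_LRU_MAXSIZE": "256",
})
print(settings)
# CacheSettings(cache_size_gb=4.0, enable_caching=False, lru_maxsize=256)
```

Passing env explicitly keeps the function testable without mutating os.environ. The frozen dataclass means each override produces a new settings object instead of changing shared state.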
Integration with Optimization
Caching is automatically enabled when using optimization classes:
```python
from rustybt.optimization import ParallelOptimizer, GridSearchAlgorithm
from rustybt.optimization.config import OptimizationConfig

# Configure once, applies to all optimizers
config = OptimizationConfig(cache_size_gb=4.0)

# Optimization workflows automatically benefit from caching
# (MyStrategy is your own strategy class, defined elsewhere)
optimizer = ParallelOptimizer(
    strategy_class=MyStrategy,
    param_space={'sma_period': range(10, 50)},
    bundle_name='my_bundle',
    # No additional caching configuration needed!
)
results = optimizer.run()
```
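Grid searches benefit because each of the 40 sma_period values in the example above requests the same asset list from the same bundle version. Only the first request misses the cache. The sketch below models this with functools.lru_cache. load_asset_list, run_backtest and the fixed "abc123" bundle hash are hypothetical placeholders, not RustyBT functions.

```python
from functools import lru_cache


@lru_cache(maxsize=128)
def load_asset_list(bundle_name: str, bundle_hash: str) -> tuple:
    """Hypothetical stand-in for an expensive asset list extraction."""
    return tuple(f"ASSET_{i}" for i in range(50))


def run_backtest(sma_period: int, bundle_name: str, bundle_hash: str) -> int:
    """Hypothetical backtest; only the asset lookup matters here."""
    assets = load_asset_list(bundle_name, bundle_hash)  # cached after the first call
    return len(assets) * sma_period  # placeholder result


for sma_period in range(10, 50):  # the same 40-value grid as above
    run_backtest(sma_period, "my_bundle", "abc123")

info = load_asset_list.cache_info()
hit_rate = info.hits / (info.hits + info.misses)
print(f"hits={info.hits} misses={info.misses} hit rate={hit_rate:.1%}")
# hits=39 misses=1 hit rate=97.5%
```

A 97.5% hit rate already falls in the "Excellent" band for asset caches. Larger grids over the same bundle push the rate closer to 100%.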
Monitoring Cache Performance
Cache Statistics
Monitor cache effectiveness using built-in statistics:
```python
from rustybt.optimization.caching import get_asset_cache_info, get_global_data_cache

# Asset list cache statistics
asset_stats = get_asset_cache_info()
print(f"Hit rate: {asset_stats['hit_rate']:.1%}")
print(f"Cache size: {asset_stats['size']}/{asset_stats['maxsize']}")

# Data cache statistics
data_cache = get_global_data_cache()
stats = data_cache.get_stats()
print(f"Memory usage: {stats['memory_usage_mb']:.1f} MB / {stats['max_memory_mb']:.1f} MB")
print(f"Hit rate: {stats['hit_rate']:.1%}")
```
Interpreting Metrics
Asset Cache Hit Rate:
- >80%: Excellent - most asset lists cached
- 50-80%: Good - caching providing benefit
- <50%: Poor - consider increasing lru_maxsize or check bundle version stability
Data Cache Hit Rate:
- >90%: Excellent - most data accesses cached
- 70-90%: Good - typical for optimization workflows
- <70%: Review - consider a larger cache_size_gb, or review your data access patterns
Memory Usage:
- <50%: Underutilized - can reduce cache_size_gb to free memory
- 50-80%: Healthy - cache actively used
- >90%: Near limit - consider increasing cache_size_gb if hit rate is low
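The bands above can be encoded in a small helper, for example to flag poor hit rates in automated logs. This is a hypothetical sketch. The thresholds are copied from the lists above, and rates are taken as fractions (the statistics examples format them with :.1%). The guidelines do not cover memory usage between 80% and 90%, so the helper reports that range as "unrated".

```python
def rate_asset_hit_rate(rate: float) -> str:
    """Band an asset list cache hit rate given as a fraction (0.0-1.0)."""
    if rate > 0.80:
        return "excellent"
    if rate >= 0.50:
        return "good"
    return "poor"  # consider a larger lru_maxsize or check bundle version stability


def rate_data_hit_rate(rate: float) -> str:
    """Band a data cache hit rate given as a fraction (0.0-1.0)."""
    if rate > 0.90:
        return "excellent"
    if rate >= 0.70:
        return "good"
    return "review"


def rate_memory_usage(used_mb: float, max_mb: float) -> str:
    """Band data cache memory usage relative to its configured limit."""
    fraction = used_mb / max_mb
    if fraction < 0.50:
        return "underutilized"
    if fraction <= 0.80:
        return "healthy"
    if fraction > 0.90:
        return "near limit"
    return "unrated"  # 80-90% is not covered by the guidelines


print(rate_asset_hit_rate(0.975), rate_data_hit_rate(0.8), rate_memory_usage(1900, 2048))
# excellent good near limit
```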
Tuning Guidelines
Memory-Constrained Environments
If running on systems with limited RAM:
```python
config = OptimizationConfig(
    cache_size_gb=0.5,  # Reduce to 500MB
    lru_maxsize=32,     # Reduce LRU cache size
)
```
High-Performance Workloads
For large-scale optimizations with ample RAM:
```python
config = OptimizationConfig(
    cache_size_gb=8.0,  # Increase to 8GB
    lru_maxsize=512,    # Large LRU cache
)
```
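Before choosing cache_size_gb, it helps to make a back-of-envelope estimate of how much price data you expect to cache. The sketch below assumes OHLCV bars stored as five float64 columns (8 bytes each). It ignores index and object overhead, so treat the result as a lower bound. It also uses decimal gigabytes (10**9 bytes). estimate_cache_gb is a hypothetical helper.

```python
def estimate_cache_gb(n_assets: int, n_bars: int, n_columns: int = 5,
                      bytes_per_value: int = 8) -> float:
    """Lower-bound size in decimal GB of a dense numeric price panel."""
    total_bytes = n_assets * n_bars * n_columns * bytes_per_value
    return total_bytes / 1e9


# 500 assets x ~10 years of daily bars (252 trading days per year)
estimate = estimate_cache_gb(n_assets=500, n_bars=2520)
print(f"{estimate:.4f} GB")  # 0.0504 GB

# Same universe with one-minute bars (390 per regular US equity session)
minute_estimate = estimate_cache_gb(n_assets=500, n_bars=2520 * 390)
print(f"{minute_estimate:.1f} GB")  # 19.7 GB
```

Under these assumptions, daily data for a few hundred assets fits easily within the 2GB default. The minute-bar version of the same universe exceeds even the 8GB high-performance setting.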
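Debugging/Development
Disable caching to isolate issues: pass enable_caching=False to OptimizationConfig, or set RUSTYBT_ENABLE_CACHING=false in the environment.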
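Cache Invalidation
Automatic Invalidation
Caches automatically invalidate when:
- Bundle Version Changes: a SHA256 hash of the bundle metadata detects changes
- Memory Limit Reached: the data cache evicts its least recently used entries once cache_size_gb is exceeded (the asset list cache evicts when a new entry would exceed lru_maxsize entries)
- Explicit Clear: manual cache clearing (see below)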
Manual Cache Management
```python
from rustybt.optimization.caching import clear_asset_cache, get_global_data_cache

# Clear asset list cache
clear_asset_cache()

# Clear data cache
data_cache = get_global_data_cache()
data_cache.clear()
```
When to Clear Caches
- Bundle ingested/updated externally
- Switching between different analysis workflows
- Debugging unexpected behavior
- Memory pressure situations
Best Practices
1. Configure Once, Use Everywhere
Set global configuration at application startup:
```python
# config/settings.py
from rustybt.optimization.config import OptimizationConfig

# Application-wide caching configuration
OPTIMIZATION_CONFIG = OptimizationConfig(
    cache_size_gb=4.0,
    enable_caching=True,
)
```
2. Monitor in Production
Log cache statistics periodically:
```python
import logging

from rustybt.optimization.caching import get_asset_cache_info, get_global_data_cache

logger = logging.getLogger(__name__)


def log_cache_stats():
    """Log cache performance metrics."""
    asset_stats = get_asset_cache_info()
    data_stats = get_global_data_cache().get_stats()
    # Standard-library loggers take %-style arguments, not arbitrary keyword arguments
    logger.info(
        "Cache performance: asset_hit_rate=%.1f%% data_hit_rate=%.1f%% memory_mb=%.1f",
        asset_stats["hit_rate"] * 100,
        data_stats["hit_rate"] * 100,
        data_stats["memory_usage_mb"],
    )


# Call periodically during long-running optimizations
log_cache_stats()
```
3. Test With and Without Caching
Validate that caching doesn't change results:
```python
import pytest

from rustybt.optimization.config import OptimizationConfig


def run_optimization(enable_caching):
    config = OptimizationConfig(enable_caching=enable_caching)
    # ... run optimization with `config`, assigning its output to `results`
    return results


def test_caching_equivalence():
    """Verify caching doesn't affect results."""
    results_cached = run_optimization(enable_caching=True)
    results_uncached = run_optimization(enable_caching=False)
    # Results should be numerically identical
    assert results_cached.final_value == results_uncached.final_value
```
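The same property can be checked in isolation: wrapping a deterministic function in a cache must not change its output. In the sketch below, a hypothetical compute_returns toy function stands in for a real backtest. The test compares a functools.lru_cache-wrapped copy against the uncached original on both a miss and a hit.

```python
from functools import lru_cache


def compute_returns(prices: tuple) -> tuple:
    """Toy deterministic 'backtest': simple period-over-period returns."""
    return tuple((curr - prev) / prev for prev, curr in zip(prices, prices[1:]))


# Cached variant of the same function (tuple arguments are hashable)
cached_compute_returns = lru_cache(maxsize=128)(compute_returns)


def test_cache_does_not_change_results():
    prices = (100.0, 101.5, 99.8, 102.3)
    uncached = compute_returns(prices)
    first = cached_compute_returns(prices)   # miss: computed and stored
    second = cached_compute_returns(prices)  # hit: returned from the cache
    assert first == uncached
    assert second == uncached
    assert cached_compute_returns.cache_info().hits >= 1


test_cache_does_not_change_results()
```

Returning immutable values (tuples here) matters. A cache hands back the same object on every hit, so a caller that mutated a cached list would silently change the results seen by later runs.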
4. Profile Before Optimizing
Use profiling to confirm caching helps:
```python
import cProfile
import pstats

from rustybt.optimization.config import OptimizationConfig

# Profile without caching
config_no_cache = OptimizationConfig(enable_caching=False)
profiler = cProfile.Profile()
profiler.enable()
# ... run optimization
profiler.disable()
stats_no_cache = pstats.Stats(profiler)

# Profile with caching
config_cache = OptimizationConfig(enable_caching=True)
profiler = cProfile.Profile()
profiler.enable()
# ... run optimization
profiler.disable()
stats_cache = pstats.Stats(profiler)

# Compare results (total_tt is the total profiled time in seconds)
print("Without caching:", stats_no_cache.total_tt)
print("With caching:", stats_cache.total_tt)
print(f"Speedup: {stats_no_cache.total_tt / stats_cache.total_tt:.1f}x")
```
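cProfile adds overhead to every function call. That overhead can distort comparisons for code dominated by many small calls. Wall-clock timing with time.perf_counter is a simple cross-check. In this sketch, best_wall_time is a hypothetical helper, and run_uncached and run_cached are placeholder workloads standing in for your two optimization runs.

```python
import time
from typing import Callable


def best_wall_time(fn: Callable[[], object], repeats: int = 3) -> float:
    """Return the fastest of `repeats` wall-clock timings of fn(), in seconds."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return min(timings)


def run_uncached() -> None:
    # Placeholder for an optimization run with enable_caching=False
    sum(i * i for i in range(200_000))


def run_cached() -> None:
    # Placeholder for an optimization run with enable_caching=True
    sum(i * i for i in range(20_000))


t_uncached = best_wall_time(run_uncached)
t_cached = best_wall_time(run_cached)
print(f"Speedup: {t_uncached / t_cached:.1f}x")
```

Taking the minimum of several runs measures warm-cache performance for the cached variant. If first-run cost matters for your workflow, also time a single cold run.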
Common Issues
Issue: Low Hit Rate Despite Repeated Access
Cause: Bundle version changing between calls (SHA256 hash mismatch)
Solution: Verify that the bundle version is stable between calls
```python
from rustybt.optimization.cache_invalidation import get_bundle_version

version1 = get_bundle_version('my_bundle')
# ... time passes
version2 = get_bundle_version('my_bundle')

if version1['computed_hash'] != version2['computed_hash']:
    print("Bundle version changed - cache will invalidate")
```
Issue: High Memory Usage
Cause: cache_size_gb set too high, or large pre-grouped datasets
Solution: Lower cache_size_gb so the data cache's LRU eviction triggers sooner; lowering lru_maxsize also shrinks the asset list cache
Issue: Caching Slows Down Single Runs
Cause: Cache management overhead exceeds the benefit for small workloads
Solution: Disable caching for ad-hoc analysis (enable_caching=False or RUSTYBT_ENABLE_CACHING=false)
Technical Details
Cache Architecture
Asset List Cache:
- Type: @functools.lru_cache(maxsize=128), where 128 is the lru_maxsize default
- Key: (bundle_name, bundle_hash)
- Eviction: LRU (Least Recently Used)
- Thread Safety: Python GIL-protected
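Because the bundle hash is part of the key, a new bundle version produces a cache miss instead of stale data, with no explicit invalidation call. The following minimal sketch assumes a hypothetical get_assets loader decorated like the asset list cache described above. The global load_count exists only to count real loads.

```python
from functools import lru_cache

load_count = 0  # number of real (uncached) loads


@lru_cache(maxsize=128)
def get_assets(bundle_name: str, bundle_hash: str) -> tuple:
    """Hypothetical asset list loader keyed on (bundle_name, bundle_hash)."""
    global load_count
    load_count += 1
    return ("AAPL", "MSFT") if bundle_hash == "v1" else ("AAPL", "MSFT", "NVDA")


get_assets("my_bundle", "v1")  # miss: loads
get_assets("my_bundle", "v1")  # hit: served from the cache
get_assets("my_bundle", "v2")  # new hash, so a miss: loads the new version
print(load_count, get_assets.cache_info())
# 2 CacheInfo(hits=1, misses=2, maxsize=128, currsize=2)
```

In this sketch the stale v1 entry is not deleted. It stops being requested and is eventually evicted by LRU once newer keys fill the cache.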
Data Cache:
- Type: OrderedDict with custom LRU logic
- Key: Bundle hash
- Eviction: Memory-based LRU (evicts when memory limit exceeded)
- Thread Safety: threading.Lock
Cache Invalidation Strategy
- SHA256 Hash Computation: Bundle metadata → SHA256 digest
- Hash Comparison: Current hash vs. cached hash
- Automatic Eviction: If hash differs, old entry removed
- Fresh Load: New bundle data loaded and cached
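The four steps can be sketched as follows, assuming bundle metadata is available as a JSON-serializable dict. metadata_hash, get_or_load and load_bundle are hypothetical names, not RustyBT API. As a simplification, the cache here is keyed by bundle name, with the hash stored alongside the data. The data cache described above is keyed by bundle hash instead.

```python
import hashlib
import json
from typing import Any, Callable, Dict, Tuple


def metadata_hash(metadata: Dict[str, Any]) -> str:
    """SHA256 hex digest of bundle metadata; key order does not affect it."""
    canonical = json.dumps(metadata, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def get_or_load(
    cache: Dict[str, Tuple[str, Any]],
    bundle_name: str,
    metadata: Dict[str, Any],
    loader: Callable[[str], Any],
) -> Any:
    """Return cached data for bundle_name, reloading if its metadata hash changed."""
    current = metadata_hash(metadata)                # 1. hash computation
    cached = cache.get(bundle_name)
    if cached is not None and cached[0] == current:  # 2. hash comparison
        return cached[1]
    cache.pop(bundle_name, None)                     # 3. evict the stale entry, if any
    data = loader(bundle_name)                       # 4. fresh load
    cache[bundle_name] = (current, data)
    return data


loads = []


def load_bundle(name: str) -> str:
    loads.append(name)  # record every real (uncached) load
    return f"data for {name}"


cache: Dict[str, Tuple[str, Any]] = {}
meta = {"end_date": "2025-10-01", "assets": 500}
get_or_load(cache, "my_bundle", meta, load_bundle)  # miss: load
get_or_load(cache, "my_bundle", {"assets": 500, "end_date": "2025-10-01"}, load_bundle)  # same content: hit
get_or_load(cache, "my_bundle", {**meta, "end_date": "2025-10-23"}, load_bundle)  # changed: reload
print(len(loads))  # 2
```

Serializing with sort_keys=True makes the digest depend only on the metadata's content. Two dicts with the same entries in a different order therefore do not trigger a spurious reload.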
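Memory Management
Memory tracking: each cached entry records its memory_usage in bytes, and the cache keeps a running current_memory total that is compared against max_memory_bytes (the cache_size_gb limit).
LRU eviction logic:
```python
# Evict entries until total usage is back under the limit
while cache.current_memory > cache.max_memory_bytes:
    # OrderedDict preserves order; with accessed entries moved to the end,
    # the first key is the least recently used one
    oldest_key = next(iter(cache.cache))
    evicted = cache.cache.pop(oldest_key)
    cache.current_memory -= evicted.memory_usage
```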
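The sketch below combines memory tracking and eviction into a memory-bounded LRU cache built on an OrderedDict guarded by a threading.Lock, as the Data Cache description states. It is illustrative, not RustyBT's implementation. MemoryLRUCache and CacheEntry are hypothetical names. Callers supply each entry's memory_usage in bytes; for a pandas DataFrame, df.memory_usage(deep=True).sum() is one way to measure it.

```python
import threading
from collections import OrderedDict
from dataclasses import dataclass
from typing import Any, Hashable, Optional


@dataclass
class CacheEntry:
    value: Any
    memory_usage: int  # bytes attributed to this entry


class MemoryLRUCache:
    """Hypothetical memory-bounded LRU cache (illustrative only)."""

    def __init__(self, max_memory_bytes: int) -> None:
        self.max_memory_bytes = max_memory_bytes
        self.current_memory = 0
        self.cache: "OrderedDict[Hashable, CacheEntry]" = OrderedDict()
        self._lock = threading.Lock()

    def get(self, key: Hashable) -> Optional[Any]:
        with self._lock:
            entry = self.cache.get(key)
            if entry is None:
                return None
            self.cache.move_to_end(key)  # mark as most recently used
            return entry.value

    def put(self, key: Hashable, value: Any, memory_usage: int) -> None:
        with self._lock:
            old = self.cache.pop(key, None)
            if old is not None:
                self.current_memory -= old.memory_usage
            self.cache[key] = CacheEntry(value, memory_usage)  # appended at the end
            self.current_memory += memory_usage
            # Evict least recently used entries from the front until under the limit
            while self.current_memory > self.max_memory_bytes and self.cache:
                _, evicted = self.cache.popitem(last=False)
                self.current_memory -= evicted.memory_usage


cache = MemoryLRUCache(max_memory_bytes=100)
cache.put("a", "A", 40)
cache.put("b", "B", 40)
cache.get("a")           # "a" becomes the most recently used entry
cache.put("c", "C", 40)  # 120 > 100, so "b" (least recently used) is evicted
print(list(cache.cache), cache.current_memory)  # ['a', 'c'] 80
```

Holding the lock for the whole of put keeps current_memory consistent with the dict's contents. An entry larger than the entire limit ends up evicting everything, including itself.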
Performance Benchmarking
Reproduce benchmark results:
```bash
# Run performance benchmarks
pytest tests/benchmarks/test_user_code_optimizations.py -v

# Generate performance report
python scripts/benchmarks/run_complete_framework_profiling.py
```
Expected output:
- Asset caching: ~99% overhead reduction
- Data pre-grouping: ~100% of repeated-access work eliminated
- Overall speedup: 5-7x faster than the uncached baseline
Further Reading
- Epic X4: Performance Optimization PRD
- Optimization API Overview
- Grid Search Algorithm
- Walk-Forward Example
Support
For questions or issues:
- GitHub Issues: https://github.com/bmadventure/rustybt/issues
- Documentation: https://rustybt.readthedocs.io
- Community Forum: https://forum.rustybt.org

Last Updated: 2025-10-23
Version: 1.0 (Epic X4.4)