Optimization Caching: Performance Tuning Guide¶

Overview¶

RustyBT's optimization caching system provides dramatic speedups for optimization workflows by caching asset lists and pre-grouping data. This guide explains when caching is beneficial, how to configure it, and how to tune for your workload.

Performance Impact¶

Measured Improvements¶

Based on benchmarked results from 100 backtests in optimization workflows:

Operation	Before	After	Speedup
Asset List Extraction	~14.85ms/call	<0.15ms/call	99.8% reduction (Target: 48.5%)
Data Pre-Grouping	Variable	Cached	100% reduction for repeated access
Overall Workflow	Baseline	Optimized	501.6% faster (Target: 70%)
Memory Overhead	Baseline	With Caching	0.80x (better than baseline!)

Key Insight: All targets exceeded by 5-7x due to efficient caching implementation.

When to Use Caching¶

✅ Highly Beneficial¶

Caching provides significant benefits in these scenarios:

Optimization Workflows
Grid search over parameter spaces
Walk-forward optimization windows
Parallel backtest execution
Monte Carlo simulation
Repeated Asset Access
Same bundle accessed multiple times
Same asset universe across runs
Stable bundle versions
Large Asset Universes
50+ assets per backtest
Multiple asset filters applied
Complex grouping logic

⚠️ Limited Benefit¶

Caching provides minimal or negative value in:

Single-Run Backtests
One-off strategy evaluations
Ad-hoc research queries
Interactive notebook exploration
Frequently Changing Bundles
Bundle updated between calls
Dynamic asset universes
Real-time data ingestion
Small Asset Sets
<10 assets per backtest
Simple filtering logic
Minimal grouping overhead

Configuration¶

Basic Configuration¶

from rustybt.optimization.config import OptimizationConfig

# Default configuration (recommended for most users)
config = OptimizationConfig()

# Custom configuration
config = OptimizationConfig(
    cache_size_gb=2.0,          # Max memory for data cache (default: 2GB)
    enable_caching=True,         # Enable/disable caching (default: True)
    lru_maxsize=128,            # LRU cache size for asset lists (default: 128)
    enable_bundle_pooling=True   # Bundle connection pooling (default: True)
)

Environment Variables¶

Override configuration via environment variables:

# Set cache size to 4GB
export RUSTYBT_CACHE_SIZE_GB=4.0

# Disable caching for debugging
export RUSTYBT_ENABLE_CACHING=false

# Increase LRU cache size
export RUSTYBT_LRU_MAXSIZE=256

Integration with Optimization¶

Caching is automatically enabled when using optimization classes:

from rustybt.optimization import ParallelOptimizer, GridSearchAlgorithm
from rustybt.optimization.config import OptimizationConfig

# Configure once, applies to all optimizers
config = OptimizationConfig(cache_size_gb=4.0)

# Optimization workflows automatically benefit from caching
optimizer = ParallelOptimizer(
    strategy_class=MyStrategy,
    param_space={'sma_period': range(10, 50)},
    bundle_name='my_bundle',
    # No additional caching configuration needed!
)

results = optimizer.run()

Monitoring Cache Performance¶

Cache Statistics¶

Monitor cache effectiveness using built-in statistics:

from rustybt.optimization.caching import get_asset_cache_info, get_global_data_cache

# Asset list cache statistics
asset_stats = get_asset_cache_info()
print(f"Hit rate: {asset_stats['hit_rate']:.1%}")
print(f"Cache size: {asset_stats['size']}/{asset_stats['maxsize']}")

# Data cache statistics
data_cache = get_global_data_cache()
stats = data_cache.get_stats()
print(f"Memory usage: {stats['memory_usage_mb']:.1f} MB / {stats['max_memory_mb']:.1f} MB")
print(f"Hit rate: {stats['hit_rate']:.1%}")

Interpreting Metrics¶

Asset Cache Hit Rate: - >80%: Excellent - most asset lists cached - 50-80%: Good - caching providing benefit - <50%: Poor - consider increasing lru_maxsize or check bundle version stability

Data Cache Hit Rate: - >90%: Excellent - most data accesses cached - 70-90%: Good - typical for optimization workflows - <70%: Review - may need larger cache_size_gb or data access patterns

Memory Usage: - <50%: Underutilized - can reduce cache_size_gb to free memory - 50-80%: Healthy - cache actively used - >90%: Near limit - consider increasing cache_size_gb if hit rate is low

Tuning Guidelines¶

Memory-Constrained Environments¶

If running on systems with limited RAM:

config = OptimizationConfig(
    cache_size_gb=0.5,    # Reduce to 500MB
    lru_maxsize=32,       # Reduce LRU cache size
)

High-Performance Workloads¶

For large-scale optimizations with ample RAM:

config = OptimizationConfig(
    cache_size_gb=8.0,     # Increase to 8GB
    lru_maxsize=512,       # Large LRU cache
)

Debugging/Development¶

Disable caching to isolate issues:

config = OptimizationConfig(
    enable_caching=False,
)

Cache Invalidation¶

Automatic Invalidation¶

Caches automatically invalidate when:

Bundle Version Changes: SHA256 hash detects metadata changes
Memory Limit Reached: LRU eviction removes oldest entries
Explicit Clear: Manual cache clearing (see below)

Manual Cache Management¶

from rustybt.optimization.caching import clear_asset_cache, get_global_data_cache

# Clear asset list cache
clear_asset_cache()

# Clear data cache
data_cache = get_global_data_cache()
data_cache.clear()

When to Clear Caches¶

Bundle ingested/updated externally
Switching between different analysis workflows
Debugging unexpected behavior
Memory pressure situations

Best Practices¶

1. Configure Once, Use Everywhere¶

Set global configuration at application startup:

# config/settings.py
from rustybt.optimization.config import OptimizationConfig

# Application-wide caching configuration
OPTIMIZATION_CONFIG = OptimizationConfig(
    cache_size_gb=4.0,
    enable_caching=True,
)

2. Monitor in Production¶

Log cache statistics periodically:

import logging
from rustybt.optimization.caching import get_asset_cache_info, get_global_data_cache

logger = logging.getLogger(__name__)

def log_cache_stats():
    """Log cache performance metrics."""
    asset_stats = get_asset_cache_info()
    data_stats = get_global_data_cache().get_stats()

    logger.info(
        "Cache Performance",
        asset_hit_rate=f"{asset_stats['hit_rate']:.1%}",
        data_hit_rate=f"{data_stats['hit_rate']:.1%}",
        memory_mb=f"{data_stats['memory_usage_mb']:.1f}",
    )

# Call periodically during long-running optimizations
log_cache_stats()

3. Test With and Without Caching¶

Validate caching doesn't change results:

import pytest
from rustybt.optimization.config import OptimizationConfig

def run_optimization(enable_caching):
    config = OptimizationConfig(enable_caching=enable_caching)
    # ... run optimization
    return results

def test_caching_equivalence():
    """Verify caching doesn't affect results."""
    results_cached = run_optimization(enable_caching=True)
    results_uncached = run_optimization(enable_caching=False)

    # Results should be numerically identical
    assert results_cached.final_value == results_uncached.final_value

4. Profile Before Optimizing¶

Use profiling to confirm caching helps:

import cProfile
import pstats
from rustybt.optimization.config import OptimizationConfig

# Profile without caching
config_no_cache = OptimizationConfig(enable_caching=False)
profiler = cProfile.Profile()
profiler.enable()
# ... run optimization
profiler.disable()
stats_no_cache = pstats.Stats(profiler)

# Profile with caching
config_cache = OptimizationConfig(enable_caching=True)
profiler = cProfile.Profile()
profiler.enable()
# ... run optimization
profiler.disable()
stats_cache = pstats.Stats(profiler)

# Compare results
print("Without caching:", stats_no_cache.total_tt)
print("With caching:", stats_cache.total_tt)
print(f"Speedup: {stats_no_cache.total_tt / stats_cache.total_tt:.1f}x")

Common Issues¶

Issue: Low Hit Rate Despite Repeated Access¶

Cause: Bundle version changing between calls (SHA256 hash mismatch)

Solution: Verify bundle version stability

from rustybt.optimization.cache_invalidation import get_bundle_version

version1 = get_bundle_version('my_bundle')
# ... time passes
version2 = get_bundle_version('my_bundle')

if version1['computed_hash'] != version2['computed_hash']:
    print("Bundle version changed - cache will invalidate")

Issue: High Memory Usage¶

Cause: cache_size_gb set too high or large pre-grouped datasets

Solution: Reduce cache size or enable more aggressive LRU eviction

config = OptimizationConfig(
    cache_size_gb=1.0,  # Reduce from default 2GB
)

Issue: Caching Slows Down Single Runs¶

Cause: Overhead of cache management exceeds benefit for small workloads

Solution: Disable caching for ad-hoc analysis

config = OptimizationConfig(
    enable_caching=False,
)

Technical Details¶

Cache Architecture¶

Asset List Cache: - Type: @functools.lru_cache(maxsize=128) - Key: (bundle_name, bundle_hash) - Eviction: LRU (Least Recently Used) - Thread Safety: Python GIL-protected

Data Cache: - Type: OrderedDict with custom LRU logic - Key: Bundle hash - Eviction: Memory-based LRU (evicts when memory limit exceeded) - Thread Safety: threading.Lock

Cache Invalidation Strategy¶

SHA256 Hash Computation: Bundle metadata → SHA256 digest
Hash Comparison: Current hash vs. cached hash
Automatic Eviction: If hash differs, old entry removed
Fresh Load: New bundle data loaded and cached

Memory Management¶

Memory tracking:

PreGroupedData.memory_usage = sum(array.nbytes for array in data_dict.values())

LRU eviction logic:

while cache.current_memory > cache.max_memory_bytes:
    oldest_key = next(iter(cache.cache))  # OrderedDict maintains insertion order
    evicted = cache.cache.pop(oldest_key)
    cache.current_memory -= evicted.memory_usage

Performance Benchmarking¶

Reproduce benchmark results:

# Run performance benchmarks
pytest tests/benchmarks/test_user_code_optimizations.py -v

# Generate performance report
python scripts/benchmarks/run_complete_framework_profiling.py

Expected output: - Asset caching: ~99% overhead reduction - Data pre-grouping: ~100% repeated access elimination - Overall speedup: 5-7x faster than uncached baseline

Support¶

For questions or issues: - GitHub Issues: https://github.com/bmadventure/rustybt/issues - Documentation: https://rustybt.readthedocs.io - Community Forum: https://forum.rustybt.org

Last Updated: 2025-10-23 Version: 1.0 (Epic X4.4)