Optimization Storage Efficiency¶

Overview¶

When running parameter optimization with hundreds or thousands of backtests, storage efficiency becomes critical. RustyBT's entry point detection feature captures only the strategy's main file for each backtest by default, which cuts code-capture storage by roughly 83-84% in the examples below.

The Storage Problem¶

Old Behavior (Import Analysis)¶

Previously, RustyBT used import analysis to capture all imported local modules recursively:

Single backtest with 5 imported modules:
├── my_strategy.py (2 KB)
├── utils/indicators.py (3 KB)
├── utils/risk.py (2 KB)
├── models/signals.py (4 KB)
└── config/params.py (1 KB)
Total per backtest: ~12 KB

100-iteration optimization: 12 KB × 100 = 1.2 MB

1000-iteration optimization: 12 KB × 1000 = 12 MB

New Behavior (Entry Point Only)¶

With entry point detection, only the main file is captured by default:

Single backtest:
└── my_strategy.py (2 KB)
Total per backtest: ~2 KB

100-iteration optimization: 2 KB × 100 = 200 KB (83% reduction)

1000-iteration optimization: 2 KB × 1000 = 2 MB (83% reduction)

Real-World Example¶

Scenario: Grid Search Optimization¶

Strategy: RSI + Bollinger Bands momentum strategy

Parameters:
- RSI period: [10, 14, 20, 25, 30]
- Bollinger std: [1.5, 2.0, 2.5, 3.0]
- Total combinations: 5 × 4 = 20 backtests

Strategy structure:

my_strategy/
├── rsi_bb_strategy.py (main file, 3 KB)
├── indicators/
│   ├── rsi.py (4 KB)
│   └── bollinger.py (3 KB)
└── utils/
    ├── signals.py (5 KB)
    └── risk.py (4 KB)

Storage comparison:

Behavior	Per Backtest	20 Backtests	Savings
Old (import analysis)	19 KB (all files)	380 KB	-
New (entry point only)	3 KB (main file)	60 KB	84%

Storage Reduction by Optimization Scale¶

Iterations	Old Storage	New Storage	Reduction
10	190 KB	30 KB	84%
50	950 KB	150 KB	84%
100	1.9 MB	300 KB	84%
500	9.5 MB	1.5 MB	84%
1000	19 MB	3 MB	84%
5000	95 MB	15 MB	84%

Key insight: Storage grows linearly with the number of iterations under both behaviors, so the 84% reduction holds at every scale.
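The figures in the table come from multiplying the per-backtest footprint by the iteration count. A minimal sketch of that arithmetic, using the file sizes from the grid-search example above; the names `FILE_SIZES_KB` and `storage_kb` are illustrative and not part of RustyBT:

```python
# File sizes (KB) taken from the grid-search example; names are illustrative only.
FILE_SIZES_KB = {
    "rsi_bb_strategy.py": 3,       # entry point
    "indicators/rsi.py": 4,
    "indicators/bollinger.py": 3,
    "utils/signals.py": 5,
    "utils/risk.py": 4,
}
ENTRY_POINT = "rsi_bb_strategy.py"


def storage_kb(iterations: int, entry_point_only: bool) -> int:
    """Estimate captured-code storage in KB for a number of backtests."""
    if entry_point_only:
        per_backtest = FILE_SIZES_KB[ENTRY_POINT]
    else:
        per_backtest = sum(FILE_SIZES_KB.values())
    return per_backtest * iterations


for n in (10, 100, 1000):
    old = storage_kb(n, entry_point_only=False)
    new = storage_kb(n, entry_point_only=True)
    reduction = 100 * (old - new) / old
    print(f"{n:>5} iterations: {old} KB -> {new} KB ({reduction:.0f}% reduction)")
```

Because both totals scale with the same factor, the reduction percentage depends only on the ratio of the entry point's size to the size of the whole project.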

Performance Impact¶

Entry Point Detection Overhead¶

Entry point detection uses inspect.stack() for runtime introspection:

Operation	Time	Impact
Entry point detection	< 15ms	Negligible
File copy (entry point only)	~1ms	Minimal
Total overhead	< 20ms	< 0.4% of a 5-second backtest

Example: For a backtest taking 5 seconds, entry point detection adds < 20ms (< 0.4% overhead).
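To illustrate the general idea of stack-based detection, here is a minimal sketch that walks `inspect.stack()` and returns the file of the outermost frame backed by a real file. The function name `detect_entry_point` and the heuristic itself are assumptions made for illustration; RustyBT's actual detection logic may differ.

```python
import inspect
from pathlib import Path
from typing import Optional


def detect_entry_point() -> Optional[Path]:
    """Return the file of the outermost stack frame that exists on disk.

    Illustrative heuristic only, not RustyBT's implementation.
    """
    # inspect.stack() lists frames from the current call (index 0) outward,
    # so iterating in reverse visits the outermost caller first.
    for frame_info in reversed(inspect.stack()):
        filename = frame_info.filename
        # Skip pseudo-files such as "<stdin>" or "<frozen importlib._bootstrap>".
        if filename.startswith("<"):
            continue
        path = Path(filename)
        if path.is_file():
            return path.resolve()
    return None


if __name__ == "__main__":
    print(detect_entry_point())
```

When a strategy is run directly as a script, the outermost frame belongs to that script. Under a test runner or another launcher, the outermost frame belongs to the launcher instead, which is one reason a detector may need additional rules or an explicit configuration fallback.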

100-Iteration Benchmark¶

Test setup: 100 backtests with 5-file strategy

Metric	Old Behavior	New Behavior	Improvement
Total runtime	503s	501s	0.4% (negligible)
Storage consumed	1.2 MB	200 KB	83% reduction
File operations	500 copies	100 copies	80% fewer I/O

When to Use YAML Configuration¶

While entry point detection is efficient, some scenarios benefit from explicit YAML configuration:

Use YAML When:¶

Multi-file reproducibility required: Strategy depends on specific versions of utility modules

# strategy.yaml
files:
  - my_strategy.py
  - utils/indicators.py  # Specific indicator implementations
  - config/params.json   # Exact parameters used

Complex project structure: Multiple entry points or dynamic imports

# strategy.yaml
files:
  - strategies/momentum_strategy.py
  - common/base_strategy.py
  - common/data_loader.py

Regulatory/audit requirements: Need complete code snapshot for compliance
Debugging optimization failures: Want to inspect all code for a specific run

Use Default (Entry Point Only) When:¶

Large-scale optimizations: 100+ iterations where storage matters
Single-file strategies: Everything in one file
Non-critical exploratory runs: Don't need full reproducibility
CI/CD automated testing: Frequent runs with storage constraints

Monitoring Storage Usage¶

Check Storage Consumption¶

# View total backtest storage
du -sh ~/.zipline/backtests/

# View storage per backtest
du -sh ~/.zipline/backtests/*/

# Count captured files
find ~/.zipline/backtests/ -name "*.py" | wc -l

Example Output (100-iteration optimization)¶

Old behavior:

$ du -sh ~/.zipline/backtests/
1.2M    ~/.zipline/backtests/
$ find ~/.zipline/backtests/ -name "*.py" | wc -l
500  # 5 files × 100 iterations

New behavior:

$ du -sh ~/.zipline/backtests/
200K    ~/.zipline/backtests/
$ find ~/.zipline/backtests/ -name "*.py" | wc -l
100  # 1 file × 100 iterations
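The same checks can be scripted in Python, for example to log storage after each optimization run. This is a rough sketch using only the standard library; `summarize_storage` is a hypothetical helper, and it sums file sizes rather than disk blocks, so its total can differ slightly from what `du` reports.

```python
from pathlib import Path
from typing import Tuple


def summarize_storage(root: Path) -> Tuple[int, int]:
    """Return (total_bytes, py_file_count) for all files under root."""
    total_bytes = 0
    py_files = 0
    for path in root.rglob("*"):
        if path.is_file():
            total_bytes += path.stat().st_size
            if path.suffix == ".py":
                py_files += 1
    return total_bytes, py_files


if __name__ == "__main__":
    backtests = Path.home() / ".zipline" / "backtests"
    if backtests.is_dir():
        size, count = summarize_storage(backtests)
        print(f"{size / 1024:.1f} KB in total, {count} captured .py files")
```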

Best Practices for Optimization Storage¶

1. Start Small, Scale Up¶

# Development: Small grid to test code
params = {
    'rsi_period': [14, 20],  # 2 values
    'bb_std': [2.0],         # 1 value
}
# Total: 2 backtests

# Production: Full grid after validation
params = {
    'rsi_period': [10, 14, 20, 25, 30],  # 5 values
    'bb_std': [1.5, 2.0, 2.5, 3.0],      # 4 values
}
# Total: 20 backtests
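Before launching a grid, it can help to confirm how many backtests it will produce. A small sketch using `itertools.product`; the helper `expand_grid` is illustrative and not a RustyBT function:

```python
from itertools import product
from typing import Any, Dict, List


def expand_grid(params: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Expand a parameter grid into one dict per parameter combination."""
    names = list(params)
    value_lists = [params[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


development = {"rsi_period": [14, 20], "bb_std": [2.0]}
production = {
    "rsi_period": [10, 14, 20, 25, 30],
    "bb_std": [1.5, 2.0, 2.5, 3.0],
}

print(len(expand_grid(development)))  # number of development backtests
print(len(expand_grid(production)))   # number of production backtests
```

Multiplying the combination count by the per-backtest footprint gives a storage estimate before any backtest runs.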

2. Clean Up Old Backtests¶

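# Remove backtest directories older than 30 days
# (-mindepth/-maxdepth restrict matches to top-level run directories,
# so the backtests/ root itself is never a deletion candidate)
find ~/.zipline/backtests/ -mindepth 1 -maxdepth 1 -type d -mtime +30 -exec rm -rf {} +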

# Archive important runs
tar -czf important_backtest_20250121.tar.gz ~/.zipline/backtests/20250121_*/
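For a more cautious cleanup, the same age-based rule can be written in Python with a dry-run mode that lists candidates before anything is deleted. This is a sketch under the assumption that each backtest lives in its own top-level directory; `stale_backtest_dirs` and `remove_stale` are hypothetical helpers.

```python
import shutil
import time
from pathlib import Path
from typing import List


def stale_backtest_dirs(root: Path, max_age_days: float) -> List[Path]:
    """Return top-level directories under root last modified before the cutoff."""
    cutoff = time.time() - max_age_days * 86400
    return sorted(
        p for p in root.iterdir() if p.is_dir() and p.stat().st_mtime < cutoff
    )


def remove_stale(root: Path, max_age_days: float, dry_run: bool = True) -> List[Path]:
    """Delete stale directories unless dry_run is set; return the candidates."""
    candidates = stale_backtest_dirs(root, max_age_days)
    if not dry_run:
        for directory in candidates:
            shutil.rmtree(directory)
    return candidates


if __name__ == "__main__":
    backtests = Path.home() / ".zipline" / "backtests"
    if backtests.is_dir():
        for directory in remove_stale(backtests, max_age_days=30, dry_run=True):
            print(f"Would remove: {directory}")
```

Running with `dry_run=True` first makes it easy to archive important runs before switching to an actual deletion.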

3. Use Explicit YAML for Production¶

# For production optimization runs requiring full reproducibility
# Create strategy.yaml listing all dependencies

4. Monitor Disk Space¶

# Check available disk space before large optimization
df -h ~/.zipline/backtests/

# Warn when the filesystem holding the backtests is more than 80% full (Linux/macOS)
usage=$(df -P ~/.zipline/ | tail -1 | awk '{print $5}' | sed 's/%//')
if [ "$usage" -gt 80 ]; then
    echo "Warning: Disk usage above 80%"
fi
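A comparable pre-flight check can live inside the optimization script itself. The sketch below uses `shutil.disk_usage` from the standard library; the helper name and the 80% threshold are illustrative, and the percentage is computed as used / total, which can differ slightly from the figure `df` prints on filesystems with reserved blocks.

```python
import shutil
from pathlib import Path


def disk_usage_percent(path: Path) -> float:
    """Return the percentage of the filesystem containing path that is in use."""
    usage = shutil.disk_usage(path)
    return 100 * usage.used / usage.total


if __name__ == "__main__":
    # Fall back to the home directory if no backtests have been stored yet.
    target = Path.home() / ".zipline"
    if not target.exists():
        target = Path.home()
    percent = disk_usage_percent(target)
    if percent > 80:
        print(f"Warning: Disk usage above 80% ({percent:.0f}%)")
```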

Troubleshooting¶

Storage Still Growing Too Fast¶

Check: Are you using YAML configuration?

# Look for strategy.yaml files
find . -name "strategy.yaml"

Solution: Remove YAML files to use default entry point detection, or reduce file list in YAML.

Entry Point Not Detected¶

Symptom: No code files stored at all

Solution: Create explicit strategy.yaml:

# strategy.yaml (minimal)
files:
  - my_strategy.py

Optimization Crashes Due to Disk Full¶

Prevention:
1. Monitor disk space before large runs
2. Use entry point detection (default behavior)
3. Clean up old backtests regularly
4. Consider external storage for archival

Migration from Old Behavior¶

If You Have Existing Optimization Workflows¶

No changes needed if you're satisfied with storage usage.

To adopt new behavior:
1. Remove any strategy.yaml files
2. Re-run optimization
3. Verify storage reduction

Backward compatibility: 100% maintained. Existing YAML configs work exactly as before.

Code Capture Guide - Detailed code capture documentation
Optimization API Overview - Parameter optimization strategies
Installation Guide - Installing RustyBT
Optimization Caching - Performance optimization