Live Trading vs Backtesting Data Flow¶

Overview¶

RustyBT uses different data flow strategies for live trading and backtesting to optimize for real-time responsiveness and historical performance respectively. Understanding these differences is crucial for developing strategies that transition smoothly from backtest to production.

Data Flow Architecture¶

Backtesting Mode¶

┌──────────────┐
│  Algorithm   │
└──────┬───────┘
       │ get_spot_value()
       ▼
┌─────────────────┐
│  DataPortal     │
│  (use_cache=True│
└──────┬──────────┘
       │
       ▼
┌──────────────────┐
│ CachedDataSource │  ◄─── Wraps adapter automatically
└──────┬───────────┘
       │
       ├──[Fresh?]─────┐
       │               │
       ▼               ▼
   Cache Hit      Cache Miss
       │               │
       ▼               ▼
┌──────────┐    ┌──────────┐
│ Parquet  │    │ YFinance │
│ Bundle   │    │ Adapter  │
└──────────┘    └────┬─────┘
                     │
                     ▼
               [Write to Cache]

Key Characteristics: - Caching enabled by default - Data fetched once, reused across runs - Optimized for speed (10-15x faster) - Deterministic results (same data every time) - No API rate limits concern

Live Trading Mode¶

┌──────────────┐
│  Algorithm   │
└──────┬───────┘
       │ get_spot_value()
       ▼
┌─────────────────┐
│  DataPortal     │
│  (use_cache=False
└──────┬──────────┘
       │
       ▼
┌──────────────┐
│   Alpaca     │  ◄─── Direct adapter access
│   Adapter    │
└──────┬───────┘
       │
       ▼
┌──────────────┐
│  REST API /  │
│  WebSocket   │
└──────────────┘

Key Characteristics: - No caching (fresh data every request) - Real-time market data - WebSocket support for streaming - API rate limits apply - Non-deterministic (market evolves)

Code Comparison¶

Backtesting Example¶

from rustybt import TradingAlgorithm
from rustybt.data.sources import DataSourceRegistry
import pandas as pd

# Get data source
source = DataSourceRegistry.get_source("yfinance")

# Create backtest algorithm
class MyStrategy(TradingAlgorithm):
    def initialize(self):
        self.symbols = ["AAPL", "MSFT"]

    def handle_data(self, context, data):
        # This data comes from cache (fast!)
        prices = data.current(context.symbols, "close")
        # ... trading logic

# Run backtest with caching
algo = MyStrategy(
    data_source=source,
    live_trading=False,  # ◄─── Cache enabled
    start=pd.Timestamp("2023-01-01"),
    end=pd.Timestamp("2023-12-31")
)

results = algo.run()
print(f"Cache hit rate: {algo.data_portal.cache_hit_rate:.1f}%")
# Output: Cache hit rate: 87.3%

Live Trading Example¶

from rustybt import TradingAlgorithm
from rustybt.data.sources import DataSourceRegistry

# Get live data source (with API credentials)
source = DataSourceRegistry.get_source(
    "alpaca",
    api_key=os.getenv("ALPACA_API_KEY"),
    api_secret=os.getenv("ALPACA_API_SECRET"),
    paper_trading=True
)

# Create live trading algorithm
class MyStrategy(TradingAlgorithm):
    def initialize(self):
        self.symbols = ["AAPL", "MSFT"]

    def handle_data(self, context, data):
        # This data comes directly from Alpaca API (real-time!)
        prices = data.current(context.symbols, "close")
        # ... trading logic

# Run live with real-time data
algo = MyStrategy(
    data_source=source,
    live_trading=True,  # ◄─── No cache, real-time
)

algo.run()  # Runs indefinitely until stopped

When to Use Each Mode¶

Use Backtesting Mode When:¶

✅ Developing and testing strategies ✅ Optimizing parameters ✅ Validating historical performance ✅ Running Monte Carlo simulations ✅ Performance is critical (need fast iteration) ✅ You want deterministic, reproducible results

Use Live Trading Mode When:¶

✅ Deploying to production ✅ Paper trading with live data ✅ Real-time signal generation ✅ You need fresh market data ✅ Testing real-time execution latency ✅ WebSocket streaming is required

Data Freshness Considerations¶

Backtesting: Controlled Freshness¶

from rustybt.data.sources.cached_source import CachedDataSource
from rustybt.data.sources.freshness import MarketCloseFreshnessPolicy

cached_source = CachedDataSource(
    adapter=yfinance_source,
    freshness_policy=MarketCloseFreshnessPolicy(
        market_close_time="16:00",
        timezone="America/New_York"
    )
)

algo = MyStrategy(
    data_source=cached_source,
    live_trading=False
)

# Data refreshes after market close each day
# Backtest uses consistent historical data

Live Trading: Real-Time Freshness¶

# No caching - always fresh
algo = MyStrategy(
    data_source=alpaca_source,
    live_trading=True
)

# Every data.current() call fetches live data
# Real-time prices, no staleness risk

Performance Implications¶

Backtesting Performance¶

Dataset Size	Without Cache	With Cache	Speedup
Small (100 days, 10 symbols)	2.3s	0.2s	11.5x
Medium (1 year, 50 symbols)	18.7s	1.2s	15.6x
Large (5 years, 200 symbols)	127.4s	8.3s	15.3x

Key Insight: Caching provides consistent 10-15x speedup regardless of dataset size.

Live Trading Performance¶

Operation	Latency	Notes
REST API fetch	50-200ms	Depends on network
WebSocket update	5-20ms	Real-time streaming
Cache fetch	5-10ms	Not applicable in live mode

Key Insight: Live mode prioritizes freshness over speed. Use WebSocket for lowest latency.

Transitioning from Backtest to Live¶

Step 1: Validate in Backtest¶

# Test strategy with cached data
backtest_algo = MyStrategy(
    data_source=yfinance_source,
    live_trading=False,
    start=pd.Timestamp("2023-01-01"),
    end=pd.Timestamp("2023-12-31")
)

results = backtest_algo.run()
assert results["sharpe_ratio"] > 1.0, "Strategy underperforming"

Step 2: Paper Trade with Live Data¶

# Test with real-time data but simulated orders
paper_algo = MyStrategy(
    data_source=alpaca_source,
    live_trading=True,
    # Broker uses paper trading mode
)

paper_algo.run()
# Monitor for 1 week, ensure no errors

Step 3: Go Live (Production)¶

# Deploy to production with real money
live_algo = MyStrategy(
    data_source=alpaca_source,
    live_trading=True,
    # Broker uses live trading mode
)

live_algo.run()

Common Pitfalls¶

Pitfall 1: Caching in Live Mode¶

❌ Wrong: Enabling cache in live trading

algo = MyStrategy(
    data_source=yfinance_source,
    live_trading=True
)
algo.data_portal.use_cache = True  # ⚠️ Stale data risk!

✅ Correct: No cache in live mode

algo = MyStrategy(
    data_source=alpaca_source,
    live_trading=True  # Cache automatically disabled
)

Pitfall 2: Using Backtest Data Source in Live Mode¶

❌ Wrong: YFinance (delayed data) in live trading

algo = MyStrategy(
    data_source=yfinance_source,  # ⚠️ 15-minute delay!
    live_trading=True
)

✅ Correct: Use real-time data source

algo = MyStrategy(
    data_source=alpaca_source,  # ✓ Real-time
    live_trading=True
)

Pitfall 3: Ignoring API Rate Limits¶

❌ Wrong: Excessive API calls in live mode

def handle_data(self, context, data):
    for symbol in context.symbols:  # 100 symbols
        price = data.current(symbol, "close")  # 100 API calls per bar!

✅ Correct: Batch API calls

def handle_data(self, context, data):
    prices = data.current(context.symbols, "close")  # 1 API call

Pitfall 4: Time Zone Confusion¶

❌ Wrong: Assuming UTC in live mode

now = pd.Timestamp.now()  # ⚠️ Local timezone
if now.hour == 9:  # Market open?
    self.order(...)

✅ Correct: Explicit timezone handling

now = pd.Timestamp.now(tz="America/New_York")
market_open = now.replace(hour=9, minute=30)
if now >= market_open:
    self.order(...)

Testing Strategies for Both Modes¶

Unit Test: Backtest Mode¶

def test_strategy_backtest():
    """Test strategy with cached data."""
    source = YFinanceDataSource()

    algo = MyStrategy(
        data_source=source,
        live_trading=False,
        start=pd.Timestamp("2023-01-01"),
        end=pd.Timestamp("2023-12-31")
    )

    results = algo.run()

    assert results["total_return"] > 0
    assert algo.data_portal.cache_hit_rate > 80  # Cache working

Integration Test: Live Mode (Paper Trading)¶

def test_strategy_live_paper():
    """Test strategy with live data (paper trading)."""
    source = AlpacaDataSource(
        api_key=os.getenv("ALPACA_API_KEY"),
        api_secret=os.getenv("ALPACA_API_SECRET"),
        paper_trading=True  # ✓ Safe testing
    )

    algo = MyStrategy(
        data_source=source,
        live_trading=True
    )

    # Run for 5 minutes
    algo.run(duration=pd.Timedelta(minutes=5))

    assert algo.data_portal.cache_hit_rate == 0  # No cache in live mode
    assert len(algo.blotter.orders) >= 0  # Orders executed

Debugging Tips¶

Enable Detailed Logging¶

import structlog
import logging

# Configure logging
logging.basicConfig(level=logging.DEBUG)
logger = structlog.get_logger()

# Log data fetches
logger.debug("data_fetch", source="alpaca", symbols=["AAPL"], mode="live")

Compare Backtest vs Live Results¶

# Run backtest
backtest_results = run_backtest(MyStrategy, start="2023-01-01", end="2023-12-31")

# Run paper trading for 1 month
paper_results = run_live_paper(MyStrategy, duration=pd.Timedelta(days=30))

# Compare
print(f"Backtest Sharpe: {backtest_results['sharpe_ratio']:.2f}")
print(f"Paper Sharpe: {paper_results['sharpe_ratio']:.2f}")
assert abs(backtest_results['sharpe_ratio'] - paper_results['sharpe_ratio']) < 0.5

Monitor Cache Statistics¶

# After backtest
stats = algo.data_portal.data_source.get_stats()
print(f"Cache hit rate: {stats['hit_rate']}%")
print(f"Cache size: {stats['size_mb']:.2f} MB")

if stats['hit_rate'] < 80:
    print("⚠️  Low cache hit rate - consider pre-warming cache")

Best Practices¶

1. Separate Configuration Files¶

# config/backtest.yaml
data:
  source: yfinance
  cache: true
  freshness_policy: market_close

# config/live.yaml
data:
  source: alpaca
  cache: false
  streaming: true

2. Use Feature Flags¶

import os

LIVE_TRADING = os.getenv("LIVE_TRADING", "false").lower() == "true"

if LIVE_TRADING:
    source = AlpacaDataSource(...)
    use_cache = False
else:
    source = YFinanceDataSource()
    use_cache = True

algo = MyStrategy(
    data_source=source,
    live_trading=LIVE_TRADING
)

3. Gradual Rollout¶

Week 1: Backtest with last 2 years of data
Week 2: Paper trade with live data (observe only)
Week 3: Paper trade with small position sizes
Week 4: Live trade with 10% of capital
Week 5+: Scale up if metrics match backtest