Future Improvement Opportunities

Overview

This document outlines opportunities for improvement that were identified during the adversarial review but cannot be implemented without additional data or external resources. Each section describes the opportunity, the data required, and suggested implementation approaches.


1. Empirical Dispersion Parameter Calibration

Current State

The Monte Carlo simulator uses a hardcoded dispersion parameter of 1.3 for the Negative Binomial distribution. This value is reasonable based on MLB research but has not been validated against actual high school game scores.
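
As a quick illustration of what that setting implies, the sketch below draws simulated run totals for a team expected to score about 6 runs per game, assuming the Negative Binomial is parameterized by its mean with variance = mean * dispersion (the same parameterization used in the calibration code later in this section):

from scipy.stats import nbinom

def sample_runs(expected_runs, dispersion=1.3, n_games=10000, seed=0):
    """Illustrative only: draw run totals with Var = mean * dispersion."""
    variance = expected_runs * dispersion
    p = expected_runs / variance                        # = 1 / dispersion
    n = expected_runs ** 2 / (variance - expected_runs)
    return nbinom.rvs(n, p, size=n_games, random_state=seed)

scores = sample_runs(6.0)
print(scores.mean(), scores.var())  # roughly 6.0 and 7.8 (= 6.0 * 1.3)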

Opportunity

Calibrate the dispersion parameter using empirical game-level data to improve simulation accuracy.

Data Required

  • Game-level scores for Colorado high school baseball games (ideally 2+ seasons)
  • Format: Date, Home_Team, Away_Team, Home_Score, Away_Score
  • Source possibilities:
    • MaxPreps game results pages
    • CHSAA (Colorado High School Activities Association) archives
    • Local newspaper sports sections

Implementation Approach

from scipy.stats import nbinom
from scipy.optimize import minimize

def fit_dispersion(actual_scores, expected_means):
    """
    Fit optimal dispersion parameter to historical game scores.
    
    Args:
        actual_scores: Array of actual runs scored
        expected_means: Array of expected runs (from team strength model)
    
    Returns:
        Optimal dispersion value
    """
    def neg_log_likelihood(params):
        dispersion = params[0]  # scipy passes the parameter vector as an array
        if dispersion <= 1:
            return float('inf')
        total_ll = 0.0
        for score, mean in zip(actual_scores, expected_means):
            if mean <= 0:
                return float('inf')
            variance = mean * dispersion            # Var = mean * dispersion
            p = mean / variance                     # NB success probability (= 1 / dispersion)
            n = (mean ** 2) / (variance - mean)     # NB size parameter
            total_ll -= nbinom.logpmf(score, n, p)
        return total_ll

    result = minimize(neg_log_likelihood, x0=[1.3], bounds=[(1.01, 3.0)])
    return result.x[0]
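
Once game-level results are collected, a calibration run might look like the sketch below; the CSV file name is a placeholder, its columns follow the format listed above, and each team's season average runs per game stands in for the team strength model's expected means:

import pandas as pd

games = pd.read_csv('chsaa_game_results.csv')  # Date, Home_Team, Away_Team, Home_Score, Away_Score

# Reshape to one row per team-game: the runs a team scored in that game.
long = pd.concat([
    games.rename(columns={'Home_Team': 'Team', 'Home_Score': 'Runs'})[['Team', 'Runs']],
    games.rename(columns={'Away_Team': 'Team', 'Away_Score': 'Runs'})[['Team', 'Runs']],
])

# Crude stand-in for the team strength model: each team's season average runs per game.
expected = long['Team'].map(long.groupby('Team')['Runs'].mean())

best_dispersion = fit_dispersion(long['Runs'].to_numpy(), expected.to_numpy())
print(f"Calibrated dispersion: {best_dispersion:.2f}")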

Expected Impact

  • More accurate win probability estimates
  • Better calibrated confidence intervals for season projections
  • Reduced systematic bias in blowout/shutout predictions

References

  • Lindsey, G.R. “An Investigation of Strategies in Baseball.” Operations Research 11.4 (1963): 477-501.

2. Park Effects / Field Dimensions

Current State

The simulator applies a flat 10% home field advantage multiplier regardless of the specific field being played on.

Opportunity

Incorporate park factors based on field dimensions to adjust run expectations by venue.

Data Required

  • Field dimensions for each high school baseball field:
    • Left field line distance
    • Center field distance
    • Right field line distance
    • Fence height
    • Altitude (relevant for Colorado)
  • Historical scoring by venue (optional but helpful for validation)

Implementation Approach

# Park factor calculation (simplified)
def calculate_park_factor(left_field, center_field, right_field, altitude_ft):
    """
    Estimate park factor based on dimensions and altitude.
    
    Returns: Multiplier where 1.0 = neutral, >1.0 = hitter-friendly
    """
    # Baseline distances (typical HS field)
    baseline_lf, baseline_cf, baseline_rf = 320, 380, 320
    
    # Distance factor (smaller = more runs)
    distance_factor = (
        (baseline_lf / left_field) * 0.3 +
        (baseline_cf / center_field) * 0.4 +
        (baseline_rf / right_field) * 0.3
    )
    
    # Altitude factor (Denver effect: ~5% increase per 1000ft above sea level)
    altitude_factor = 1.0 + (altitude_ft / 1000) * 0.05
    
    return distance_factor * altitude_factor
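
As a hypothetical example (dimensions and altitude are illustrative, not surveyed values), a short-fenced field at roughly 5,000 feet comes out noticeably hitter-friendly:

pf = calculate_park_factor(left_field=310, center_field=370,
                           right_field=310, altitude_ft=5000)
print(f"Park factor: {pf:.2f}")  # about 1.29, i.e. roughly 29% more runs than the baseline field

In the simulator, this multiplier would scale expected runs for games at that venue, replacing or refining the flat home-field bump.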

Expected Impact

  • More accurate home/away scoring differentials
  • Better predictions for games at extreme venues (e.g., a high-altitude Fort Collins field vs. a sea-level opponent)
  • Improved player projections for extreme home fields

References

  • Keri, Jonah, ed. Baseball Between the Numbers. Basic Books, 2006. Chapter on Park Effects.

3. Strength of Schedule Adjustment

Current State

Power rankings and team strength indices do not account for opponent quality. A team with 150 projected runs against weak opponents appears equivalent to one with 150 runs against strong opponents.

Opportunity

Implement Strength of Schedule (SOS) adjustments to normalize team ratings.

Data Required

  • Complete league schedule for all teams (not just Rocky Mountain)
  • Format: Team, Opponent, Date, Home/Away
  • At minimum, conference schedules for the target league

Implementation Approach

import numpy as np

def calculate_sos_adjusted_index(team, schedule_df, strength_map):
    """
    Adjust team index based on average opponent strength.
    
    Formula: Adjusted = Raw * (Team_Avg_Opponent / League_Avg_Opponent),
    so a tougher-than-average schedule raises the rating.
    
    Expects schedule_df with 'Home' and 'Away' columns holding team names.
    """
    team_games = schedule_df[
        (schedule_df['Home'] == team) | (schedule_df['Away'] == team)
    ]
    
    opponents = []
    for _, game in team_games.iterrows():
        opp = game['Away'] if game['Home'] == team else game['Home']
        if opp in strength_map:
            opponents.append(strength_map[opp]['Off_Index'])
    
    if not opponents:
        return strength_map[team]['Off_Index']
    
    avg_opp_strength = np.mean(opponents)
    league_avg = np.mean([v['Off_Index'] for v in strength_map.values()])
    
    # Teams that faced stronger-than-average opponents are scaled up;
    # teams that padded their numbers against weak opponents are scaled down.
    adjustment = avg_opp_strength / league_avg
    return strength_map[team]['Off_Index'] * adjustment
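
A minimal usage sketch, with invented team names, schedule, and index values:

import pandas as pd

schedule_df = pd.DataFrame({
    'Home': ['Rocky Mountain', 'Opponent B', 'Rocky Mountain'],
    'Away': ['Opponent A', 'Rocky Mountain', 'Opponent C'],
})
strength_map = {
    'Rocky Mountain': {'Off_Index': 1.10},
    'Opponent A':     {'Off_Index': 1.20},
    'Opponent B':     {'Off_Index': 1.20},
    'Opponent C':     {'Off_Index': 1.10},
}

# Rocky Mountain's opponents average ~1.167 vs. a league average of 1.15,
# so its raw 1.10 index gets a small upward bump for the tougher schedule.
print(calculate_sos_adjusted_index('Rocky Mountain', schedule_df, strength_map))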

Expected Impact

  • Fairer power rankings that reward teams with difficult schedules
  • Better predictions for non-conference/playoff matchups
  • Identification of “paper tigers” (good record against weak opponents)

4. Regression to the Mean for Extreme Projections

Current State

Some players have extreme stat projections (e.g., 58.8 projected hits) that may be unreliable due to small sample sizes or outlier prior seasons.

Opportunity

Apply regression toward population means based on sample reliability.

Data Required

  • Historical variance data by stat category (already available in multipliers)
  • League-wide averages by class and position
  • Ideally: Multiple seasons of data per player to estimate true talent

Implementation Approach

def regress_projection(projection, sample_size, population_mean, typical_variance=None):
    """
    Apply Marcel-style regression to projection.
    
    Based on Tango's Marcel the Monkey system.
    
    Args:
        projection: Raw projected value
        sample_size: Player's historical sample (e.g., PA)
        population_mean: League average for this stat
        typical_variance: Expected variance in the population (unused in this
            simple version; a fuller implementation could derive the
            regression constant from it)
    
    Returns:
        Regressed projection
    """
    # Reliability grows with sample size. With a regression constant of 500,
    # a player with 500 PA is weighted 50/50 against the league average;
    # reliability only approaches 1.0 for very large samples.
    reliability = sample_size / (sample_size + 500)
    
    regressed = population_mean + reliability * (projection - population_mean)
    return regressed
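
For instance, the 58.8-hit projection mentioned above shrinks sharply if it rests on a small prior sample (the sample size and league average below are invented for illustration):

# Hypothetical numbers: 85 prior PA, league average of 25 hits per season.
regressed_hits = regress_projection(projection=58.8, sample_size=85,
                                    population_mean=25, typical_variance=None)
print(round(regressed_hits, 1))  # about 29.9, much closer to the league mean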

Expected Impact

  • More conservative (and likely more accurate) projections for breakout candidates
  • Reduced projection variance for low-PA players
  • Better handling of “one-hit wonders”

References

  • Tango, Tom. “Marcel the Monkey Forecasting System.” TangoTiger.net, 2004.
  • Silver, Nate. “PECOTA.” Baseball Prospectus methodology documentation.

5. Game-Level Validation / Backtesting

Current State

The projection system has not been validated against actual game outcomes.

Opportunity

Implement backtesting framework to measure projection accuracy.

Data Required

  • Historical game results for at least 1 full season
  • Historical rosters that were used to generate predictions (to recreate past projections)
  • Format: Date, Home_Team, Away_Team, Home_Score, Away_Score, Home_Win

Implementation Approach

import numpy as np

def backtest_season(predictions_df, actuals_df):
    """
    Compare predicted win probabilities to actual outcomes.
    
    Metrics:
    - Brier Score: Mean squared error of probability predictions
    - Calibration: Do 70% predictions win 70% of the time?
    - Log Loss: Information-theoretic accuracy measure
    """
    # Actual results must first be reshaped to one row per team-game with
    # 'Opponent' and 'Actual_Win' columns so they align with the predictions.
    merged = predictions_df.merge(actuals_df, on=['Date', 'Opponent'])
    
    # Brier Score (lower is better, 0 = perfect)
    brier = ((merged['Win_Pct'] - merged['Actual_Win']) ** 2).mean()
    
    # Log Loss (lower is better); clip probabilities to avoid log(0)
    p = merged['Win_Pct'].clip(1e-6, 1 - 1e-6)
    log_loss = -(merged['Actual_Win'] * np.log(p)
                 + (1 - merged['Actual_Win']) * np.log(1 - p)).mean()
    
    # Calibration by probability bucket (e.g., all games predicted in the 0.6-0.7 range)
    merged['Prob_Bucket'] = (merged['Win_Pct'] * 10).astype(int) / 10
    calibration = merged.groupby('Prob_Bucket').agg({
        'Win_Pct': 'mean',
        'Actual_Win': 'mean',
        'Date': 'count'
    }).rename(columns={'Date': 'N_Games'})
    
    return {
        'brier_score': brier,
        'log_loss': log_loss,
        'calibration': calibration,
        'total_games': len(merged)
    }
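
A hypothetical usage sketch, assuming predictions_df and actuals_df have been assembled in the formats described above:

results = backtest_season(predictions_df, actuals_df)
print(f"Brier score over {results['total_games']} games: {results['brier_score']:.3f}")
print(results['calibration'])
# For reference, always predicting a 50% win probability scores a Brier of 0.250,
# so anything meaningfully below that indicates real predictive signal.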

Expected Impact

  • Quantified confidence in projection accuracy
  • Identification of systematic biases (e.g., overconfident in favorites)
  • Data-driven tuning of model parameters

6. Pitcher Matchup Adjustments

Current State

Game simulations use aggregate team pitching strength without considering which specific pitcher will start.

Opportunity

Model game-level outcomes based on probable starting pitcher.

Data Required

  • Pitching rotation information (who starts which games)
  • Opponent-specific splits (how pitchers perform vs. specific teams/lineups)
  • Pitch count / workload data for fatigue modeling

Implementation Approach

def simulate_game_with_starter(my_team, opponent, my_starter, opp_starter, ...):
    """
    Adjust game simulation based on starting pitchers.
    
    Use individual pitcher's Pitching_Score rather than team aggregate.
    Apply fatigue adjustment based on recent workload.
    """
    # Get individual pitcher strength instead of team aggregate
    my_pit_index = my_starter['Pitching_Score'] / league_avg_pitcher_score
    opp_pit_index = opp_starter['Pitching_Score'] / league_avg_pitcher_score
    
    # Apply fatigue if pitcher threw recently
    days_rest = calculate_days_rest(my_starter, game_date)
    fatigue_factor = min(1.0, 0.8 + days_rest * 0.05)  # Full strength at 4+ days rest
    my_pit_index *= fatigue_factor
    
    # Continue with normal simulation...
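
The fatigue adjustment above implies a simple linear recovery curve, which can be checked directly:

for days_rest in range(6):
    fatigue_factor = min(1.0, 0.8 + days_rest * 0.05)
    print(days_rest, round(fatigue_factor, 2))  # 0 -> 0.8, 1 -> 0.85, 2 -> 0.9, 3 -> 0.95, then 1.0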

Expected Impact

  • More accurate game-level predictions
  • Better modeling of aces vs. #4 starters
  • Ability to simulate “what if” scenarios for playoff pitching decisions

7. Defensive Metrics Integration

Current State

The system focuses on batting (RC) and pitching (Game Score variant) but does not incorporate fielding.

Opportunity

Add defensive contribution to player and team valuations.

Data Required

  • Fielding statistics (currently partially available: FP, TC, PO, A, E)
  • Ideally: Position-specific data (SS errors are different from 1B errors)
  • Advanced: Range factor or UZR-equivalent (not typically available at HS level)

Implementation Approach

import numpy as np

def calculate_defensive_score(df):
    """
    Simple defensive value based on available stats.
    
    Weights errors negatively and rewards volume of chances; a position-aware
    version would weight SS/C chances more heavily than 1B chances.
    """
    # Fielding percentage component (0 for a league-typical .950, positive for better)
    fp_component = (df['FP'].fillna(0.95) - 0.95) * 10
    
    # Volume component (more chances = more defensive value, with diminishing returns)
    volume_component = np.log1p(df['TC'].fillna(0)) / 3
    
    # Error penalty (flat weight here; could be made position-specific)
    error_penalty = df['E'].fillna(0) * -0.5
    
    return fp_component + volume_component + error_penalty
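
A small usage sketch with made-up fielding lines (the FP, TC, and E values are invented; the column names match the stats listed above):

import pandas as pd

players = pd.DataFrame({
    'Player': ['Glove-first SS', 'Error-prone 3B'],
    'FP': [0.970, 0.890],
    'TC': [120, 80],
    'E': [4, 9],
})
players['Def_Score'] = calculate_defensive_score(players)
print(players[['Player', 'Def_Score']])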

Expected Impact

  • More complete player valuation
  • Better differentiation of “glove-first” vs. “bat-first” players
  • Improved team strength assessment for defensively-oriented squads

Summary: Data Collection Priority

Opportunity                                        Data Difficulty   Impact   Priority
Game-Level Scores (for dispersion calibration)     Medium            High     1
Complete League Schedules (for SOS)                Low               High     2
Historical Results (for backtesting)               Medium            High     3
Field Dimensions                                   Medium            Medium   4
Pitching Rotation Data                             High              Medium   5
Advanced Defensive Data                            Very High         Low      6

Recommended Next Step: Scrape game-level results from MaxPreps for the 2024 and 2025 seasons to enable both dispersion calibration (#1) and backtesting (#3).

