Future Improvement Opportunities
Overview
This document outlines opportunities for improvement that were identified during the adversarial review but cannot be implemented without additional data or external resources. Each section describes the opportunity, the data required, and suggested implementation approaches.
1. Empirical Dispersion Parameter Calibration
Current State
The Monte Carlo simulator uses a hardcoded dispersion parameter of 1.3 for the Negative Binomial distribution. This value is consistent with MLB research but has not been validated against actual high school game scores.
Opportunity
Calibrate the dispersion parameter using empirical game-level data to improve simulation accuracy.
Data Required
- Game-level scores for Colorado high school baseball games (ideally 2+ seasons)
- Format: `Date, Home_Team, Away_Team, Home_Score, Away_Score`
- Source possibilities:
  - MaxPreps game results pages
  - CHSAA (Colorado High School Activities Association) archives
  - Local newspaper sports sections
Implementation Approach
```python
from scipy.optimize import minimize
from scipy.stats import nbinom


def fit_dispersion(actual_scores, expected_means):
    """
    Fit the optimal dispersion parameter to historical game scores.

    Args:
        actual_scores: Array of actual runs scored
        expected_means: Array of expected runs (from the team strength model)

    Returns:
        Optimal dispersion value
    """
    def neg_log_likelihood(params):
        dispersion = params[0]  # minimize passes a 1-element array
        if dispersion <= 1:
            return float('inf')
        total_ll = 0.0
        for score, mean in zip(actual_scores, expected_means):
            if mean <= 0:
                return float('inf')
            variance = mean * dispersion
            # Convert (mean, variance) to scipy's (n, p) parameterization
            p = mean / variance
            n = (mean ** 2) / (variance - mean)
            total_ll -= nbinom.logpmf(score, n, p)
        return total_ll

    result = minimize(neg_log_likelihood, x0=[1.3], bounds=[(1.01, 3.0)])
    return result.x[0]
```
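A minimal usage sketch, assuming game results have already been joined with the model's expected run totals (the file name and column names below are hypothetical):

```python
import pandas as pd

# Hypothetical input: one row per team-game, with the observed score and the
# team's pre-game expected runs from the strength model
games = pd.read_csv("game_results_with_expectations.csv")

dispersion = fit_dispersion(games["Runs"].values, games["Expected_Runs"].values)
print(f"Fitted dispersion: {dispersion:.3f} (current hardcoded value: 1.3)")
```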
Expected Impact
- More accurate win probability estimates
- Better calibrated confidence intervals for season projections
- Reduced systematic bias in blowout/shutout predictions
References
- Lindsey, G.R. “An Investigation of Strategies in Baseball.” Operations Research 11.4 (1963): 477-501.
2. Park Effects / Field Dimensions
Current State
The simulator applies a flat 10% home-field-advantage multiplier regardless of the specific venue.
Opportunity
Incorporate park factors based on field dimensions to adjust run expectations by venue.
Data Required
- Field dimensions for each high school baseball field:
  - Left field line distance
  - Center field distance
  - Right field line distance
  - Fence height
  - Altitude (relevant for Colorado)
- Historical scoring by venue (optional but helpful for validation)
Implementation Approach
```python
# Park factor calculation (simplified)
def calculate_park_factor(left_field, center_field, right_field, altitude_ft):
    """
    Estimate a park factor from outfield dimensions and altitude.

    Returns:
        Multiplier where 1.0 = neutral, >1.0 = hitter-friendly
    """
    # Baseline distances in feet (typical HS field)
    baseline_lf, baseline_cf, baseline_rf = 320, 380, 320

    # Distance factor: shorter fences inflate scoring
    distance_factor = (
        (baseline_lf / left_field) * 0.3 +
        (baseline_cf / center_field) * 0.4 +
        (baseline_rf / right_field) * 0.3
    )

    # Altitude factor (Denver effect): ~5% more scoring per 1,000 ft above sea level
    altitude_factor = 1.0 + (altitude_ft / 1000) * 0.05

    return distance_factor * altitude_factor
```
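A sketch of how the factor might feed the simulator; the venue numbers and `expected_runs` value below are hypothetical:

```python
# Hypothetical venue: short left field at Denver-area altitude
park_factor = calculate_park_factor(
    left_field=310, center_field=375, right_field=325, altitude_ft=5280
)

expected_runs = 6.2  # neutral-park expected runs for this game
adjusted_runs = expected_runs * park_factor  # scoring expectation at this venue
```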
Expected Impact
- More accurate home/away scoring differentials
- Better predictions for games at extreme venues (e.g., high-altitude Fort Collins vs. a sea-level opponent)
- Improved player projection for extreme home fields
References
- Keri, Jonah, ed. Baseball Between the Numbers. Basic Books, 2006. Chapter on Park Effects.
3. Strength of Schedule Adjustment
Current State
Power rankings and team strength indices do not account for opponent quality. A team with 150 projected runs against weak opponents appears equivalent to one with 150 runs against strong opponents.
Opportunity
Implement Strength of Schedule (SOS) adjustments to normalize team ratings.
Data Required
- Complete league schedule for all teams (not just Rocky Mountain)
- Format: `Team, Opponent, Date, Home/Away`
- At minimum, conference schedules for the target league
Implementation Approach
```python
import numpy as np


def calculate_sos_adjusted_index(team, schedule_df, strength_map):
    """
    Adjust a team's index based on average opponent strength.

    Formula: Adjusted = Raw * (League_Avg_Opponent / Team_Avg_Opponent)
    """
    team_games = schedule_df[
        (schedule_df['Home'] == team) | (schedule_df['Away'] == team)
    ]

    # Collect the offensive index of every known opponent on the schedule
    opponents = []
    for _, game in team_games.iterrows():
        opp = game['Away'] if game['Home'] == team else game['Home']
        if opp in strength_map:
            opponents.append(strength_map[opp]['Off_Index'])

    if not opponents:
        return strength_map[team]['Off_Index']

    avg_opp_strength = np.mean(opponents)
    league_avg = np.mean([v['Off_Index'] for v in strength_map.values()])
    adjustment = league_avg / avg_opp_strength

    return strength_map[team]['Off_Index'] * adjustment
```
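Note that a single pass adjusts each team against *raw* opponent indices, which are themselves schedule-inflated; SOS systems therefore typically iterate until ratings stabilize. A minimal sketch of that loop (the iteration count is an arbitrary assumption):

```python
def iterate_sos(teams, schedule_df, raw_map, n_iters=10):
    """Re-adjust raw indices against progressively refined opponent ratings."""
    current = {t: dict(v) for t, v in raw_map.items()}
    for _ in range(n_iters):
        next_iter = {}
        for t in teams:
            # Adjustment implied by the latest opponent ratings...
            adjusted = calculate_sos_adjusted_index(t, schedule_df, current)
            factor = adjusted / current[t]['Off_Index']
            # ...but always applied to the RAW index to avoid compounding
            next_iter[t] = {**current[t],
                            'Off_Index': raw_map[t]['Off_Index'] * factor}
        current = next_iter
    return current
```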
Expected Impact
- Fairer power rankings that reward teams with difficult schedules
- Better predictions for non-conference/playoff matchups
- Identification of “paper tigers” (good record against weak opponents)
4. Regression to the Mean for Extreme Projections
Current State
Some players have extreme stat projections (e.g., 58.8 projected hits) that may be unreliable due to small sample sizes or outlier prior seasons.
Opportunity
Apply regression toward population means based on sample reliability.
Data Required
- Historical variance data by stat category (already available in multipliers)
- League-wide averages by class and position
- Ideally: Multiple seasons of data per player to estimate true talent
Implementation Approach
```python
def regress_projection(projection, sample_size, population_mean, typical_variance):
    """
    Apply Marcel-style regression to a projection.
    Based on Tango's Marcel the Monkey system.

    Args:
        projection: Raw projected value
        sample_size: Player's historical sample (e.g., PA)
        population_mean: League average for this stat
        typical_variance: Expected variance in the population (unused in this
            rule-of-thumb version; a full implementation would derive the
            regression constant from it)

    Returns:
        Regressed projection
    """
    # Reliability increases with sample size.
    # Rule of thumb: ~500 PA for full reliability on batting stats.
    reliability = sample_size / (sample_size + 500)
    regressed = population_mean + reliability * (projection - population_mean)
    return regressed
```
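Applied to the extreme projection cited above (the sample size and league mean are hypothetical illustration values):

```python
# A player projected for 58.8 hits off only ~80 career PA, in a league where
# a typical starter records ~25 hits (both figures hypothetical)
regressed = regress_projection(
    projection=58.8, sample_size=80, population_mean=25.0, typical_variance=None
)
# reliability = 80 / (80 + 500) ≈ 0.138, so the projection pulls heavily
# toward the mean: 25.0 + 0.138 * (58.8 - 25.0) ≈ 29.7 hits
print(round(regressed, 1))
```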
Expected Impact
- More conservative (and likely more accurate) projections for breakout candidates
- Reduced projection variance for low-PA players
- Better handling of “one-hit wonders”
References
- Tango, Tom. “Marcel the Monkey Forecasting System.” TangoTiger.net, 2004.
- Silver, Nate. “PECOTA.” Baseball Prospectus methodology documentation.
5. Game-Level Validation / Backtesting
Current State
The projection system has not been validated against actual game outcomes.
Opportunity
Implement backtesting framework to measure projection accuracy.
Data Required
- Historical game results for at least one full season
- Historical rosters from those seasons (to recreate the projections the system would have made)
- Format: `Date, Home_Team, Away_Team, Home_Score, Away_Score, Home_Win`
Implementation Approach
```python
import numpy as np


def backtest_season(predictions_df, actuals_df):
    """
    Compare predicted win probabilities to actual outcomes.

    Metrics:
        - Brier Score: Mean squared error of the probability predictions
        - Calibration: Do 70% predictions win ~70% of the time?
        - Log Loss: Information-theoretic accuracy measure
    """
    merged = predictions_df.merge(actuals_df, on=['Date', 'Opponent'])

    # Brier Score (lower is better, 0 = perfect)
    brier = ((merged['Win_Pct'] - merged['Actual_Win']) ** 2).mean()

    # Log Loss (probabilities clipped to avoid log(0))
    p = merged['Win_Pct'].clip(1e-6, 1 - 1e-6)
    log_loss = -(merged['Actual_Win'] * np.log(p)
                 + (1 - merged['Actual_Win']) * np.log(1 - p)).mean()

    # Calibration by 10% probability bucket
    merged['Prob_Bucket'] = (merged['Win_Pct'] * 10).astype(int) / 10
    calibration = merged.groupby('Prob_Bucket').agg({
        'Win_Pct': 'mean',
        'Actual_Win': 'mean',
        'Date': 'count'
    }).rename(columns={'Date': 'N_Games'})

    return {
        'brier_score': brier,
        'log_loss': log_loss,
        'calibration': calibration,
        'total_games': len(merged)
    }
```
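For interpretation: always predicting 50% yields a Brier score of 0.25, so values meaningfully below that indicate real predictive signal. A hypothetical invocation:

```python
# predictions_2024 / actuals_2024 are hypothetical DataFrames in the formats above
results = backtest_season(predictions_2024, actuals_2024)
print(f"Brier: {results['brier_score']:.3f} (0.25 = coin-flip baseline)")
print(f"Log loss: {results['log_loss']:.3f}")
print(results['calibration'])
```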
Expected Impact
- Quantified confidence in projection accuracy
- Identification of systematic biases (e.g., overconfident in favorites)
- Data-driven tuning of model parameters
6. Pitcher Matchup Adjustments
Current State
Game simulations use aggregate team pitching strength without considering which specific pitcher will start.
Opportunity
Model game-level outcomes based on probable starting pitcher.
Data Required
- Pitching rotation information (who starts which games)
- Opponent-specific splits (how pitchers perform vs. specific teams/lineups)
- Pitch count / workload data for fatigue modeling
Implementation Approach
```python
def simulate_game_with_starter(my_team, opponent, my_starter, opp_starter,
                               game_date, league_avg_pitcher_score):
    """
    Adjust a game simulation based on the starting pitchers.

    Uses the individual pitcher's Pitching_Score rather than the team
    aggregate, and applies a fatigue adjustment based on recent workload.
    """
    # Individual pitcher strength instead of team aggregate
    my_pit_index = my_starter['Pitching_Score'] / league_avg_pitcher_score
    opp_pit_index = opp_starter['Pitching_Score'] / league_avg_pitcher_score

    # Fatigue penalty if the pitcher threw recently; full strength at 4+ days rest
    days_rest = calculate_days_rest(my_starter, game_date)
    fatigue_factor = min(1.0, 0.8 + days_rest * 0.05)
    my_pit_index *= fatigue_factor

    # Continue with normal simulation...
```
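The `calculate_days_rest` helper is assumed above; a minimal sketch, assuming roster data tracks each pitcher's last appearance (the `Last_Appearance` field is hypothetical):

```python
def calculate_days_rest(pitcher, game_date, default_rest=5):
    """Days since the pitcher's last appearance; assume full rest if unknown."""
    last = pitcher.get('Last_Appearance')  # hypothetical field on the roster record
    if last is None:
        return default_rest  # no workload data: treat as fully rested
    return (game_date - last).days
```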
Expected Impact
- More accurate game-level predictions
- Better modeling of aces vs. #4 starters
- Ability to simulate “what if” scenarios for playoff pitching decisions
7. Defensive Metrics Integration
Current State
The system focuses on batting (RC) and pitching (Game Score variant) but does not incorporate fielding.
Opportunity
Add defensive contribution to player and team valuations.
Data Required
- Fielding statistics (currently partially available: FP, TC, PO, A, E)
- Ideally: Position-specific data (SS errors are different from 1B errors)
- Advanced: Range factor or UZR-equivalent (not typically available at HS level)
Implementation Approach
```python
import numpy as np


def calculate_defensive_score(df):
    """
    Simple defensive value based on available stats: rewards a high
    fielding percentage and a large volume of chances, penalizes errors.
    """
    # Fielding percentage component: 0 for a league-typical .950, positive above it
    fp_component = (df['FP'].fillna(0.95) - 0.95) * 10

    # Volume component: more chances = more defensive value, with diminishing returns
    volume_component = np.log1p(df['TC'].fillna(0)) / 3

    # Error penalty
    error_penalty = df['E'].fillna(0) * -0.5

    return fp_component + volume_component + error_penalty
```
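The position-specific weighting mentioned under Data Required could slot in as a simple multiplier on top of this score; the weights below are placeholder assumptions, not calibrated values:

```python
# Placeholder weights: up-the-middle positions carry more defensive value
POSITION_WEIGHT = {'C': 1.2, 'SS': 1.2, '2B': 1.1, 'CF': 1.1,
                   '3B': 1.0, 'RF': 0.9, 'LF': 0.9, '1B': 0.8}


def positional_defensive_score(df):
    weights = df['Pos'].map(POSITION_WEIGHT).fillna(1.0)  # 'Pos' column assumed
    return calculate_defensive_score(df) * weights
```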
Expected Impact
- More complete player valuation
- Better differentiation of “glove-first” vs. “bat-first” players
- Improved team strength assessment for defensively-oriented squads
Summary: Data Collection Priority
| Opportunity | Data Difficulty | Impact | Priority |
|---|---|---|---|
| Game-Level Scores (for dispersion calibration) | Medium | High | 1 |
| Complete League Schedules (for SOS) | Low | High | 2 |
| Historical Results (for backtesting) | Medium | High | 3 |
| Field Dimensions | Medium | Medium | 4 |
| Pitching Rotation Data | High | Medium | 5 |
| Advanced Defensive Data | Very High | Low | 6 |
Recommended Next Step: Scrape game-level results from MaxPreps for the 2024 and 2025 seasons to enable both dispersion calibration (#1) and backtesting (#3).
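A sketch of the glue once those results are scraped; the file name and columns are assumptions matching the formats listed in sections 1 and 5:

```python
import pandas as pd

games = pd.read_csv("maxpreps_results_2024_2025.csv", parse_dates=["Date"])

# Each game yields two scoring observations, one per team
home = games.rename(columns={"Home_Team": "Team", "Home_Score": "Runs"})
away = games.rename(columns={"Away_Team": "Team", "Away_Score": "Runs"})
scores = pd.concat([home[["Date", "Team", "Runs"]],
                    away[["Date", "Team", "Runs"]]])
# ...join with model-expected runs, then call fit_dispersion (#1)
# and backtest_season (#3)
```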