(Fellow stats geeks: Anyone know where I can find data sets with scores and yardage data for I-A over a number of years? Please comment!)
The heated discussion over what to make of an undefeated team who’s last in the league in both total (yardage) offense and total defense has the entire SEC blogosphere abuzz. Well, OK, not the entire blogosphere, really. And maybe not abuzz, more like slightly intrigued.
- On Bleacher Report, Tim Pollock pushes the case for Vandy-skepticism to its extreme. VSL naturally reacts with scorn.
- The Joe Cribbs Car Wash hints at recanting from its previous skepticism. In fact, Jerry seems to be swinging his opinion an awful lot based on just one game, our Ole Miss win.
- The Dead Guy continues to keep our deficiencies in mind but cogently links them to Miss State last year, a team whose 2007 season I would accept for VU in a heartbeat.
- Meanwhile, although it’s not exactly the SEC blogosphere, Phil Steele is posting some sort of stats-based projection that has VU going 3-5 in conference — with the remaining win over Auburn! (His VU and AU projections appear to be out of synch, but the VU one shows the OM game as completed whereas the AU one doesn’t appear to have the LSU game completed. Then again, LSU looks like they already beat Miss State, so this whole projection appears a bit sketchy.)
In the spirit of open-minded empiricism that I hope permeates this blog, I’m definitely not going to the extreme of Pollock to say that Vanderbilt will come crashing down. (Don’t be fooled by his pro forma, “Can the ‘Dores get three more victories? Yes. Would I bet on it? No, sir”; Pollock seems pretty convinced that our season will come crashing down, although he also seems to think that 6-6 won’t get us in a bowl.) I get that uncertainty and probability don’t sell like taking an assertive position does, even one that turns out to be wrong.
Actually, I think there are plenty of good arguments to be made that VU’s performance of the first four games actually is sustainable. I’m not yet convinced that those arguments are as good as the people making them think they are, though, so that’s what I want to talk about today.
What would convince me that a team living by lots of turnovers, return yards, and efficient red-zone performance on both ends — in other words, a team with a low offensive yards-per-point (YPP) and a high defensive YPP — was able to sustain that performance over the course of games and seasons?
- First of all, I’d like to isolate the role of turnovers. Take a data series of every team’s turnover margin for a number of years, and regress each team’s margin in year n on its margin in year n-1. Or regress the turnover margin in the nth game on the cumulative margin in games 1 through n-1. It probably makes sense to break these down into fumbles and interceptions, because it seems intuitively obvious to me that picks are more related to underlying football ability than fumbles.
- Secondly, I’d like to do something similar to what Steele did, but with regression analysis, and comparing YPP to YPP instead of YPP to the next year’s record. How well does a team’s YPP one year predict its YPP the following year, or how well does a team’s YPP through the first however many games predict its YPP in the next game?
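Here is a rough sketch of what either regression might look like in plain Python, with no stats package at all. The function names and the toy numbers are mine and purely illustrative (they are not real turnover margins), and I'm assuming the simplest possible setup: one (earlier period, later period) pair per team, fit by ordinary least squares.

```python
from statistics import mean


def yards_per_point(yards, points):
    """YPP: total yards divided by total points (lower is better on offense)."""
    return yards / points


def lag_regression(pairs):
    """Ordinary least squares of y on x for a list of (x, y) pairs.

    x is a team's stat in the earlier period (year n-1, or games 1..n-1);
    y is the same team's stat in the later period.
    Returns (slope, intercept, r_squared).
    """
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared


# Made-up (previous-year, current-year) turnover margins, for illustration only.
toy_pairs = [(10, 2), (-4, 1), (7, 5), (-8, -3), (0, -2), (3, 4)]
slope, intercept, r2 = lag_regression(toy_pairs)
print(f"slope={slope:.2f} intercept={intercept:.2f} R^2={r2:.2f}")
```

The thing to look at is the slope. A slope near 1 would mean last year's number carries over almost intact; a slope near 0 would mean it tells you next to nothing, which is what you'd expect if the stat is mostly luck. The same function works for YPP: just feed it pairs of YPP values instead of turnover margins.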
Now, my knowledge of regression analysis doesn’t go beyond rusty memories of an undergrad econometrics class 15 years ago, but I’m pretty sure you can use a spreadsheet for basic regressions, and there are probably open-source packages if you need something fancier. The harder part for me is staying passionate long enough to find the data and do the analysis! So I’m throwing this out in case someone else wants to pick up the gauntlet before I get to it.
I’m sure you’d want to control for strength of schedule. One problem often cited with college football is the prevalence of “creampuff” games which throw stats all out of whack. Steele and others solve this by only looking at conference games, which makes a lot of sense. Even though conference schedules aren’t stable year-to-year (think of Georgia, dropping Ole Miss and picking up LSU this year!) and certainly not week-to-week, a large enough sample would make all that balance out. Or you could adjust each team’s stats for the quality of its opponents, deflating yardage piled up against weak defenses.
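One possible way to do that kind of opponent adjustment, sketched under my own assumptions: scale the yards gained in each game by the ratio of the league-average yards allowed to that particular opponent's average yards allowed. The function name, the input layout, and the scaling rule are all just one choice among many, not anything Steele or anyone else has published.

```python
from statistics import mean


def opponent_adjusted_yards(games, allowed_avg):
    """Average yards per game, scaled by how much each opponent usually allows.

    games: list of (opponent, yards_gained) for one team.
    allowed_avg: dict mapping opponent -> average yards that opponent
        allows per game (hypothetical input you would compute first).
    Yardage against a defense that allows more than the league average
    is scaled down; yardage against a stingy defense is scaled up.
    """
    league_avg = mean(list(allowed_avg.values()))
    adjusted = [yards * league_avg / allowed_avg[opp] for opp, yards in games]
    return mean(adjusted)


# Made-up numbers: "Creampuff U" allows far more yards than "Tough State".
allowed = {"Creampuff U": 450, "Tough State": 300}
schedule = [("Creampuff U", 450), ("Tough State", 300)]
print(opponent_adjusted_yards(schedule, allowed))
```

In the toy schedule the 450 yards against the creampuff and the 300 yards against the tough defense come out as equally good performances, which is the whole point of the adjustment.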
I like the work Steele has done so far in this area, but (1) I get the sense he doesn’t do any regression, whereas that seems like the most standard statistical tool for answering questions of this sort, and (2) he’s only going year-to-year, and a first month like VU has had prompts questions about intra-season sustainability. If any of the more stats-minded bloggers have thoughts on how to proceed (ESPECIALLY IF YOU CAN POINT ME TO DATA SETS SO I DON’T HAVE TO TYPE IN OR PARSE THEM MYSELF!!!), please do chime in.
Other pseudo-scientific claptrap
After mentioning transitivity in my last potpourri, I went a little crazy and worked out how Vanderbilt is really the clear second-best team in the SEC.
I keep forgetting about my crude little season simulator, but it’s kind of fun to see how my win probabilities for each game turn into a W-L record. I’m not doing anything fancy here, just brute-forcing one million simulations and totaling the results.
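For the curious, a brute-force simulator of this kind takes only a few lines. This is a minimal sketch and not my actual simulator: the function name and the probabilities below are made up for illustration, and it assumes every game is an independent coin flip weighted by its win probability.

```python
import random
from collections import Counter


def simulate_seasons(win_probs, n_sims=100_000, seed=None):
    """Simulate n_sims seasons; return a Counter of win total -> count.

    win_probs: one win probability per game, each treated as independent.
    seed: optional seed so a run can be repeated exactly.
    """
    rng = random.Random(seed)
    totals = Counter()
    for _ in range(n_sims):
        # A game is a win when the random draw falls below its probability.
        wins = sum(rng.random() < p for p in win_probs)
        totals[wins] += 1
    return totals


# Hypothetical probabilities for an eight-game slate, not real predictions.
probs = [0.7, 0.5, 0.3, 0.6, 0.4, 0.5, 0.2, 0.8]
n = 100_000
results = simulate_seasons(probs, n_sims=n, seed=1)
for wins in sorted(results):
    print(f"{wins}-{len(probs) - wins}: {results[wins] / n:.1%}")
```

I've defaulted to 100,000 seasons here to keep it quick; pass `n_sims=1_000_000` for the full million. The independence assumption is the crude part, since a team that is better than you thought is better in every game at once.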