Developing A Simple Algorithm for Gridiron Digest Team Rankings and Ratings.



I love all the ranking systems out there -- Sagarin, Almanac Sports (shoutout to @Rudy), CalPreps, USA Today, MaxPreps, Play Football NFL (just for top teams), etc...

With the advent of Google Docs, it is easier than ever to collaborate on projects. Is there any interest in developing open source rankings for TGD? Here is an editable Google Sheet with six teams (this year's state champions) and six weeks of games (round robin) with fictitious, but realistic score data.

I have set up this Google Sheet using my GridironDigest email. If you are interested in being involved with this, sound off! It will be a long term project, but no better time than the present (and the off season) to talk about it.

Thoughts? I am open to using other software, but I have some experience with Google Sheets, Forms, and Scripts...

Just based on the above data, I think a good first step would be to automatically calculate:
- Total_Win_Percentage (by team)

- Home_Win_Percentage (by team)

- Away_Win_Percentage (by team)

- With this, we can develop a simple "Top 6" based solely on Win_Percentage
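As a rough illustration of the first step listed above, here is a minimal Python sketch. It is not the Google Sheet's formulas: the row layout (home team, home score, away team, away score), the team names, and the scores are all assumptions made up for the example, and counting a tie as half a win is likewise just one possible choice.

```python
from collections import defaultdict

# Hypothetical sample rows: (home_team, home_score, away_team, away_score).
# These scores are made up for illustration and are not the data in the Sheet.
GAMES = [
    ("Team A", 28, "Team B", 14),
    ("Team B", 21, "Team C", 20),
    ("Team C", 35, "Team A", 7),
    ("Team A", 17, "Team C", 10),
]


def win_percentages(games):
    """Return {team: {"total": x, "home": y, "away": z}} with values in [0, 1].

    Ties count as half a win; a team with no games in a split gets None.
    """
    wins = defaultdict(lambda: {"home": 0.0, "away": 0.0})
    played = defaultdict(lambda: {"home": 0, "away": 0})
    for home, home_pts, away, away_pts in games:
        played[home]["home"] += 1
        played[away]["away"] += 1
        if home_pts > away_pts:
            wins[home]["home"] += 1
        elif away_pts > home_pts:
            wins[away]["away"] += 1
        else:
            wins[home]["home"] += 0.5
            wins[away]["away"] += 0.5

    def pct(won, games_played):
        return won / games_played if games_played else None

    table = {}
    for team, p in played.items():
        w = wins[team]
        table[team] = {
            "total": pct(w["home"] + w["away"], p["home"] + p["away"]),
            "home": pct(w["home"], p["home"]),
            "away": pct(w["away"], p["away"]),
        }
    return table


def top_n(table, n=6):
    """Rank teams by total win percentage, best first; ties broken by name."""
    return sorted(table, key=lambda t: (-table[t]["total"], t))[:n]
```

Calling `top_n(win_percentages(GAMES))` gives the "Top 6" ordering (fewer if there are fewer teams). Breaking ties alphabetically is only a placeholder until something like strength of schedule is added.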

 

FYI: Anyone with access to the aforementioned Google Sheet link can edit. Edits are trackable and changes are reversible. Please use the comments section within Google Docs or on this thread if you make any big changes. Interested to see where this goes!


Gibson Southern only losing by a score on the road vs Cathedral?

I love GS and support them all the way……but even I don’t think Cathedral would keep it that close lol….feel like it would be like 2018 Memorial vs GS where sure GS scored 28 points…but Cathedral scores 56

Edited by DumfriesYMCA

10 hours ago, DumfriesYMCA said:

Gibson Southern only losing by a score on the road vs Cathedral?

I love GS and support them all the way……but even I don’t think Cathedral would keep it that close lol….feel like it would be like 2018 Memorial vs GS where sure GS scored 28 points…but Cathedral scores 56

It's just quasi data for the sake of developing a working rating system... Change the base score, but only if you contribute to the rating algorithm! 


54 minutes ago, hhpatriot04 said:

It's just quasi data for the sake of developing a working rating system... Change the base score, but only if you contribute to the rating algorithm! 

I’m not nearly smart enough to know where to start on an algorithm for rankings…..but I do know it should include out of state opponents if possible. 
 

it’s probably the 1 major downfall of Sagarin ratings and Sagarin gets it pretty dang close 


Both Sagarin and CalPreps have Gibson Southern as the 9th ranked Indiana HS football team All Classes.  To my knowledge, over the last 10 or so years, only one other 3A school has been able to break into the Top 10 All Classes.

Say what you want, but that tells me that GS is a very rare and special 3A team.


I didn't know there was such a thing as a simple algorithm....  

10 minutes ago, Lysander said:

Both Sagarin and CalPreps have Gibson Southern as the 9th ranked Indiana HS football team All Classes.  To my knowledge, over the last 10 or so years, only one other 3A school has been able to break into the Top 10 All Classes.

Say what you want, but that tells me that GS is a very rare and special 3A team.

No question at all - they were very special.     


42 minutes ago, Lysander said:

Both Sagarin and CalPreps have Gibson Southern as the 9th ranked Indiana HS football team All Classes.  To my knowledge, over the last 10 or so years, only one other 3A school has been able to break into the Top 10 All Classes.

Say what you want, but that tells me that GS is a very rare and special 3A team.

I just checked and maxpreps has them at #9 also. 
 

Has to be Chatard/West Laf/Memorial as the only others in contention for that.

 

I would argue that 3A is pound for pound the toughest division for Indiana football.  Every year there are a handful of good teams dishing 4A/5A/6A teams big losses…outside of Cathedral in 5A…the best 3A teams could probably contend for at least a runner up in 4A and 5A every year

 

@hhpatriot04 this makes me think about something for the algorithm….Sagarin has I believe a pretty linear setup for the differences between 1A and 6A…but that’s not how it really plays out.

Top of 6A -> LARGE GAP -> 6A -> 5A/4A pretty dang close to each other minus if Cathedral is in 5A -> Top of 3A not too far behind -> rest of 3A/Top of 2A -> Top of 1A ->rest of 2A and 1A 

I don’t know if it’s even possible to make this a thing but it’s generally how it plays out.  
 

would be really cool if someone was able to source from Rudy’s website/maxpreps and have a computer run the numbers of how teams have fared between the divisions….probably just keep it to post 6A expansion.  From there you would have an actual metric to plug into the algorithm and it would help rank teams accordingly.  
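One way "running the numbers" between divisions could look is sketched below in Python. Everything in it is an assumption for illustration: the row layout (winner's class, loser's class, margin), the sample rows, and the choice of win percentage plus average margin as the metric. None of it comes from an actual results archive.

```python
from collections import defaultdict

# Hypothetical rows: (winner_class, loser_class, margin_of_victory).
# Real rows would come from a results archive; none of these are actual games.
CROSS_CLASS_GAMES = [
    ("3A", "4A", 14),
    ("4A", "3A", 3),
    ("3A", "4A", 7),
    ("6A", "5A", 21),
]


def class_matchup_summary(games):
    """Summarize results for each pair of classes.

    Returns {(class_x, class_y): (win_pct_of_x, avg_margin_for_x, n_games)}
    with class_x sorted before class_y. The margin is positive when class_x
    outscored class_y on average.
    """
    totals = defaultdict(lambda: [0, 0, 0])  # wins for x, net margin for x, games
    for winner, loser, margin in games:
        if winner == loser:
            continue  # same-class games say nothing about the gap between classes
        x, y = sorted((winner, loser))
        row = totals[(x, y)]
        row[2] += 1
        if winner == x:
            row[0] += 1
            row[1] += margin
        else:
            row[1] -= margin
    return {pair: (w / n, net / n, n) for pair, (w, net, n) in totals.items()}
```

With enough seasons of real data, the average margins per pairing would show whether the gaps between classes are roughly linear or lumpy, which is the question raised above.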


On 12/3/2021 at 10:33 PM, Rudy said:

I use excel for everything. I'm happy to share the relevant files if someone wants to take this on. Should be a fairly "simple" task for someone who is really good with excel formulas 🙂

Rudy

28 minutes ago, southend said:

Any takers?

I'd be willing to give it a shot. I use Excel all day every day with work. By no means am I an expert, but I'm not that bad either.


On 12/6/2021 at 5:33 PM, NLCTigerFan07 said:

I'd be willing to give it a shot. I use Excel all day every day with work. By no means am I an expert, but I'm not that bad either.

Please keep everything on the Google Drive Sheet shared above. While Excel and Google Sheets share much of the same nomenclature, Google Sheets does have some differences, especially if we have to start tying it into macros and other projects. As you can see from the above link, I just created a small sample of games with six teams. It needs some work to develop the SoS argument. I have mainly been using the countif() and countifs() functions for pulling such data.
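For anyone who thinks better in code than in cell formulas, here is a hedged Python sketch of one common way to define the SoS piece: the average win percentage of a team's opponents, leaving out their games against that team. The row layout (winner, loser), the sample results, and this particular SoS definition are assumptions for illustration, not what the Sheet currently does; the counting inside `record` plays the same role a conditional count like countifs() would.

```python
# Hypothetical rows: (winner, loser). Made-up results, no ties, for illustration only.
RESULTS = [
    ("Team A", "Team B"),
    ("Team A", "Team C"),
    ("Team B", "Team C"),
    ("Team C", "Team D"),
]


def record(team, results, exclude=None):
    """Count (wins, losses) for team, skipping games against `exclude`."""
    wins = sum(1 for winner, loser in results if winner == team and loser != exclude)
    losses = sum(1 for winner, loser in results if loser == team and winner != exclude)
    return wins, losses


def strength_of_schedule(team, results):
    """Average opponents' win percentage, not counting their games against `team`.

    An opponent faced twice is counted twice. Returns None if no opponent
    has played anyone else yet.
    """
    opponents = [loser for winner, loser in results if winner == team]
    opponents += [winner for winner, loser in results if loser == team]
    pcts = []
    for opp in opponents:
        wins, losses = record(opp, results, exclude=team)
        if wins + losses:
            pcts.append(wins / (wins + losses))
    return sum(pcts) / len(pcts) if pcts else None
```

Excluding head-to-head games keeps a team from lowering its own SoS just by beating its opponents.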

In all, I think it is best to get something working on a small scale (let's say a closed conference like the SIAC), and then introduce the other 300-something teams. Unfortunately, not using something like Python/PHP and SQL means we can't leave comments in the "code." However, Google Sheets does have the ability to leave comments on individual cells. 

Good to see that there is interest and thank you @Rudy for volunteering the data so we don't have to dig it up from the Associated Press or rip it off of Harrell's site. Can I ask, what is your current input method for game nights? Source and method of input? Are you using a form or inputting directly into a table or database?


On 12/3/2021 at 3:40 AM, DumfriesYMCA said:

I just checked and maxpreps has them at #9 also. 
 

Has to be Chatard/West Laf/Memorial as the only others in contention for that.

 

I would argue that 3A is pound for pound the toughest division for Indiana football.  Every year there are a handful of good teams dishing 4A/5A/6A teams big losses…outside of Cathedral in 5A…the best 3A teams could probably contend for at least a runner up in 4A and 5A every year

 

@hhpatriot04 this makes me think about something for the algorithm….Sagarin has I believe a pretty linear setup for the differences between 1A and 6A…but that’s not how it really plays out.

Top of 6A -> LARGE GAP -> 6A -> 5A/4A pretty dang close to each other minus if Cathedral is in 5A -> Top of 3A not too far behind -> rest of 3A/Top of 2A -> Top of 1A ->rest of 2A and 1A 

I don’t know if it’s even possible to make this a thing but it’s generally how it plays out.  
 

would be really cool if someone was able to source from Rudy’s website/maxpreps and have a computer run the numbers of how teams have fared between the divisions….probably just keep it to post 6A expansion.  From there you would have an actual metric to plug into the algorithm and it would help rank teams accordingly.  

My thought on classes is they don't really matter. I'd favor a system that showed no bias directly built in by class. For single year rankings, this could create discrepancies, but if you start factoring in something like "Program Prestige" and "Recent Program Prestige," it should all work out.

This would also help to address when say you have multiple Top 4 teams in a class in the same sectional... If you factor in previous years' success, then teams like HH or GS in the same sectional or Chatard and Roncalli don't get penalized simply because of arbitrary lines and arbitrary geography. It will probably take at least 10 years of data, but I'm hoping to eventually get it to something like CalPreps where you could compare teams from different decades... What if Roncalli 2004 played Cathedral 2021... etc.


1 hour ago, hhpatriot04 said:

My thought on classes is they don't really matter. I'd favor a system that showed no bias directly built in by class. For single year rankings, this could create discrepancies, but if you start factoring in something like "Program Prestige" and "Recent Program Prestige," it should all work out.

This would also help to address when say you have multiple Top 4 teams in a class in the same sectional... If you factor in previous years' success, then teams like HH or GS in the same sectional or Chatard and Roncalli don't get penalized simply because of arbitrary lines and arbitrary geography. It will probably take at least 10 years of data, but I'm hoping to eventually get it to something like CalPreps where you could compare teams from different decades... What if Roncalli 2004 played Cathedral 2021... etc.

I definitely like your idea better. 
 

the question then is…how far back do you track?  Do you go by 4 year windows that basically just follow the senior class every year?  Do you do 3 assuming most don’t get a shot to start until their sophomore year?


11 hours ago, hhpatriot04 said:

Please keep everything on the Google Drive Sheet shared above. While Excel and Google Sheets share much of the same nomenclature, Google Sheets does have some differences, especially if we have to start tying it into macros and other projects. As you can see from the above link, I just created a small sample of games with six teams. It needs some work to develop the SoS argument. I have mainly been using the countif() and countifs() functions for pulling such data.

In all, I think it is best to get something working on a small scale (let's say a closed conference like the SIAC), and then introduce the other 300-something teams. Unfortunately, not using something like Python/PHP and SQL means we can't leave comments in the "code." However, Google Sheets does have the ability to leave comments on individual cells. 

Good to see that there is interest and thank you @Rudy for volunteering the data so we don't have to dig it up from the Associated Press or rip it off of Harrell's site. Can I ask, what is your current input method for game nights? Source and method of input? Are you using a form or inputting directly into a table or database?

I'll scan the scores from Harrell's page, usually copy and paste and add them to my Excel file using my format. I'll update the standings file too, then upload via FTP to our server. Finally I'll run a PHP script that my brother created which updates everything on the site.

Not the easiest way but done it that way for over a dozen years now lol

Dan


On 12/8/2021 at 6:31 AM, Rudy said:

I'll scan the scores from Harrell's page, usually copy and paste and add them to my Excel file using my format. I'll update the standings file too, then upload via FTP to our server. Finally I'll run a PHP script that my brother created which updates everything on the site.

Not the easiest way but done it that way for over a dozen years now lol

Dan

I've used PHP's cURL functions (curl_init(), curl_exec(), and friends) to populate my predictor ratings (you should look at using cURL in whatever language you're using - big time saver). It has always bugged me -- and it is hard to say that because he does so much -- that Harrell doesn't keep a weekly list of scores online (he of course must have this on the backend). To the best of my knowledge, he does all of his data storage and calculations offline and then uploads them to static pages. That's pretty old school, but not surprising.

 

When I was working at the H-T, I actually played with some of his regex scripts, which used an early version of JavaScript to pull scores from the Associated Press wire and format them to fit the newspaper's scoreboard pages with consistent naming. For example, "State" in college names was automatically abbreviated to "St", and cardinal directions such as North/No and South/So were replaced with N or S. He first developed those scripts at the Bloomington Herald-Times between the late 1980s and the 1990s.
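To give a feel for that kind of name cleanup, here is a small Python sketch using the standard `re` module. These are not Harrell's scripts; the rule list and the function name `normalize_name` are assumptions that only loosely follow the description above.

```python
import re

# Assumed rules, loosely following the description above: "State" -> "St",
# and spelled-out or abbreviated North/South -> "N"/"S".
RULES = [
    (re.compile(r"\bState\b"), "St"),
    (re.compile(r"\b(?:North|No)\b\.?"), "N"),
    (re.compile(r"\b(?:South|So)\b\.?"), "S"),
]


def normalize_name(name):
    """Apply each abbreviation rule in order and collapse repeated spaces."""
    for pattern, replacement in RULES:
        name = pattern.sub(replacement, name)
    return re.sub(r"\s+", " ", name).strip()
```

The word-boundary anchors (`\b`) are what keep names like "Northern Illinois" or "Notre Dame" from being mangled, since "North" and "No" only match as whole words.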
 

Well at least I don't feel so bad "lifting" his scores, as I know from experience they are more reliable than the press wires like AP. For example, if you got a bad HS score from a wire service, Harrell's site was probably 90% more likely to have the accurate score prior to print deadline.


On 12/7/2021 at 8:42 PM, DumfriesYMCA said:

I definitely like your idea better. 
 

the question then is…how far back do you track?  Do you go by 4 year windows that basically just follow the senior class every year?  Do you do 3 assuming most don’t get a shot to start until their sophomore year?

Every game ever played would be factored in (eventually), but of course the more recent ones would have more "weight" -- determining how to set that weight in a relatively non-arbitrary way is the question... Standard deviations for example are better than saying "This season = 0.4, previous season = 0.2, two years ago = 0.1, all other previous seasons = 0.3" (or something similar).
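As one possible way to avoid hand-picking a weight for every season, here is a Python sketch that uses exponential decay with a single half-life parameter. To be clear, this swaps in a different technique from the standard-deviation idea mentioned above and is only an illustration; the function names and the default half-life of 2 seasons are assumptions.

```python
def season_weights(seasons_back, half_life=2.0):
    """Normalized weights for seasons 0..seasons_back-1 years ago.

    A season counts half as much as one that is `half_life` seasons newer.
    The weights sum to 1 (an empty list is returned for zero seasons).
    """
    raw = [0.5 ** (age / half_life) for age in range(seasons_back)]
    total = sum(raw)
    return [w / total for w in raw]


def blended_rating(ratings_newest_first, half_life=2.0):
    """Weighted average of per-season ratings, newest season first."""
    weights = season_weights(len(ratings_newest_first), half_life)
    return sum(w * r for w, r in zip(weights, ratings_newest_first))
```

This still leaves one number to choose (the half-life), but that value could in principle be tuned against past results rather than set by feel.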


18 hours ago, whiteshoes said:

Simple algorithm is an oxymoron.

Touché! 

Just wanted to communicate that before we start inputting (160 games * 10 weeks + tournament games) * 15 years, we want something relatively simple and transparent (like a season of a closed conference), so we can write, test, debug, and then enrich the rankings.
