<?xml version="1.0" encoding="utf-8"?><feed xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator><link href="https://jeremywalsh.ca/feed.xml" rel="self" type="application/atom+xml" /><link href="https://jeremywalsh.ca/" rel="alternate" type="text/html" /><updated>2026-05-08T23:02:29-04:00</updated><id>https://jeremywalsh.ca/feed.xml</id><title type="html">Jeremy Walsh</title><subtitle>A place to showcase my projects and my passion for data science, and sport.</subtitle><author><name>Jeremy Walsh</name></author><entry><title type="html">Sulphur Springs 2026: Storylines to watch for</title><link href="https://jeremywalsh.ca/2026/05/08/sulphur-springs-preview.html" rel="alternate" type="text/html" title="Sulphur Springs 2026: Storylines to watch for" /><published>2026-05-08T00:00:00-04:00</published><updated>2026-05-08T00:00:00-04:00</updated><id>https://jeremywalsh.ca/2026/05/08/sulphur-springs-preview</id><content type="html" xml:base="https://jeremywalsh.ca/2026/05/08/sulphur-springs-preview.html"><![CDATA[<p>The 2026 Sulphur Springs Trail Race is right around the corner and there are a few storylines I’m following.</p>

<ul>
  <li>The men’s 50k may be the most competitive of all time.</li>
  <li>The women’s 100K and 100M records are under serious threat.</li>
  <li>A trail newcomer is poised to reset what is possible in the men’s 100M.</li>
  <li>A team competition?</li>
</ul>

<p>I’ve got a sneak peek of the start list, and I’ve done the best I can to enrich it with UTMB and ITRA indexes and past results. By combining historical finishing data, geospatial modelling, and UTMB/ITRA performance indices, we can map out not just who’s racing but how these races are likely to unfold. My hope is to build some excitement for the races and to give myself, at least, some idea of the main storylines to follow throughout the event.</p>

<hr />

<h2 id="field-depth--competitiveness">Field Depth &amp; Competitiveness</h2>

<p>How strong is this year’s field?</p>

<p>To find out, I looked up the UTMB Performance Index of every entrant and then plotted the average of the Top 3 entrants in 2026 against the average index of the Top 3 finishers from every year dating back to 2010. By focusing on the top-end of the field, we can see the true competitive ceiling of the race.</p>
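<p>As a rough illustration of the “competitive ceiling” calculation, here is a minimal Python sketch. The function names and the made-up past-year data are hypothetical and only show the shape of the calculation; the 2026 values are the men’s 50k indexes quoted later in this post.</p>

```python
from statistics import mean


def entrant_ceiling(indexes, n=3):
    """Mean of the n highest known index values on a start list."""
    known = sorted((i for i in indexes if i is not None), reverse=True)
    return mean(known[:n])


def finisher_ceiling(results, n=3):
    """Mean index of the first n finishers; results are (place, index) pairs."""
    podium = sorted(results, key=lambda r: r[0])[:n]
    return mean(idx for _, idx in podium if idx is not None)


# Men's 50k index values quoted in this post; None stands for an entrant with no index.
entrants_2026 = [814, 784, 733, 679, 644, None]
print(round(entrant_ceiling(entrants_2026)))  # 777

# Made-up finishers for a past year, purely to show the call shape.
made_up_year = [(1, 700), (2, 680), (3, None), (4, 650)]
print(finisher_ceiling(made_up_year))  # 690
```

<p>Skipping finishers without an index is an assumption of this sketch; it keeps the average from being dragged down by missing data, but it also means a podium with unknown runners is only partly represented.</p>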

<p><img src="/assets/img/sulphur-springs/historical_performance_index_trend_men.png" alt="The Men's Competitive Ceiling of Sulphur Springs" /></p>

<p><img src="/assets/img/sulphur-springs/historical_performance_index_trend_women.png" alt="The Women's Competitive Ceiling of Sulphur Springs" /></p>

<p>The Men’s 50k stands out as the marquee race this weekend. The average index of the top 3 male entrants (~777) is higher than that of any top-3 finishing trio since 2010, which is as far back as my data goes. The men’s 100M is the only other event where the entrant index has increased, which is due to an outlier we’ll discuss later. All the other events are projecting lower values than last year. As with the rest of this analysis, it’s worth noting that these are projections. The UTMB index is not a perfect measure of competition, and there are likely runners that I’m missing from this dataset. That said, my goal isn’t to make a perfect prediction, but rather to tell the story of the 2026 race based on the data available to me.</p>

<p>At the full-field level, the races blend together. But isolating the projected top 10 reveals very different competitive structures across distances.</p>

<p><img src="/assets/img/sulphur-springs/sulphur_competitiveness_men.png" alt="Men's Field Competitiveness" /></p>

<p><img src="/assets/img/sulphur-springs/sulphur_competitiveness_women.png" alt="Women's Field Competitiveness" /></p>

<p>Filter to just the projected top 10 and the picture sharpens.</p>

<p><img src="/assets/img/sulphur-springs/sulphur_competitiveness_top10.png" alt="Top 10 Only Competitiveness" /></p>

<ul>
  <li>The Men’s 50k and 100M have the highest ceilings of all the races but a wide spread within the top 10, while the 50M and 100k are more compressed, hinting at a tighter race for those spots.</li>
  <li>The Women’s 50k has the strongest top 10, yet the 100k and 100M still show outliers, a sign of how far ahead of the field the top women are in each of those races.</li>
</ul>

<hr />

<h2 id="the-mens-50k-battle">The Men’s 50k Battle</h2>

<p>The Men’s 50k field is arguably the most competitive and densely packed race of the weekend. A closer look at the entrants reveals what is essentially a rematch of 2023, except nearly everyone has levelled up since then.</p>

<table>
  <thead>
    <tr>
      <th>Athlete</th>
      <th>UTMB Index</th>
      <th>Half Marathon PB</th>
      <th>Sulphur Podiums / Finishes</th>
      <th>Sulphur 50k PB</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Dylan Pust</td>
      <td>814</td>
      <td>1:11:32</td>
      <td>2 / 2</td>
      <td>3:37:32</td>
    </tr>
    <tr>
      <td>Matthew Farquharson</td>
      <td>784</td>
      <td>1:10:54</td>
      <td>3 / 3</td>
      <td>3:43:32</td>
    </tr>
    <tr>
      <td>Jeremy Walsh</td>
      <td>733</td>
      <td>1:13:11</td>
      <td>2 / 4</td>
      <td>4:05:18</td>
    </tr>
  </tbody>
</table>

<ul>
  <li><strong>Dylan Pust</strong> — The favourite on paper with an 814 UTMB index. Dylan has the engine of a former elite triathlete, which translates to serious speed on the trails. He won the Sulphur Springs 50k in 2023 in a blistering 3:37:32, and then stepped up to the 100K in 2024 to take 2nd place in 8:29:04. Earlier this year he was 9th at the very competitive Black Canyon Ultras 50k in <a href="https://utmb.world/utmb-index/races/4984.blackcanyonultras50k.2026">3:34:02</a>, which gave him a UTMB race score of 840. He comes into the race fresh and ready to maintain his Sulphur podium streak.</li>
  <li><strong>Matthew Farquharson</strong> — Coming in with a 784 index, Matt is a true Sulphur Springs veteran. He has been incredibly consistent at the 50k distance here: he won the race in 2019 (3:43:32), and was 2nd in both 2018 (3:48:00) and 2023 (3:46:05), the latter directly behind Dylan, whom he hung onto for the first 30k. After finishing 10th at the historic JFK 50 Mile in late 2025 (5:53:35), he is primed for another podium battle.</li>
  <li><strong>Jeremy Walsh</strong> — Returning after winning the 20k last year in the second-fastest time in the race’s history (<a href="https://results.raceroster.com/v3/events/s9v6cbyg2c3m6kps/race/231657?filter_search=">1:18:14</a>). He last ran the 50k in 2022, when he finished 7th in 4:05:18. He most recently finished the Foxtail 50k in 3:16:28 on a much flatter course. After posting predictions publicly, anything short of a podium would be deeply embarrassing.</li>
</ul>

<h3 id="what-this-means">What this means</h3>

<p>It will be tough for anyone to improve upon the <a href="https://sulphursprings.burlingtonrunners.com/results-and-records">3:23:02 course record</a>, but this race may go out with that kind of pace.</p>

<hr />

<h2 id="course-record-threats-womens-100k-and-100m">Course Record Threats: Women’s 100K and 100M</h2>

<p>The women’s ultra distances are poised for historic performances this year, with two course records under serious threat from the returning champions.</p>

<ul>
  <li><strong>Women’s 100K</strong>: Karen Holland is returning after a phenomenal run in 2025 where she set the current women’s course record of <a href="https://sulphursprings.burlingtonrunners.com/results-and-records">9:55:14</a>. The <a href="https://fastestknowntime.com/route/bruce-trail-canada">current women’s solo FKT holder for the Bruce Trail</a> has been noticeably quiet on the racing front this year, but she knows exactly what is needed to put down a record performance.</li>
  <li><strong>Women’s 100M</strong>: Molly Hurford is back after an incredible performance in 2025 where she took 3rd place overall and won the women’s race with a blistering 17:22:53. That time narrowly missed Amanda Nelson’s <a href="https://sulphursprings.burlingtonrunners.com/results-and-records">17:18:58</a> course record from 2024. She also seemingly hasn’t raced yet this year, but is the major favourite.</li>
</ul>

<p>Both of these races will be fascinating to track as the day progresses, particularly if weather conditions are favourable for a record day.</p>

<hr />

<h2 id="sergio">Sergio</h2>

<p>The most fascinating entrant in the entire race might be Sergio Ráez Villanueva. He’s a 2:18 marathoner who has recently broadened his focus beyond road racing. Earlier this year he led the first half of the World 50k championships before falling back to 9th place in 2:54:47. He’s been callusing his legs and mind with an absurd number of races to prepare for his 100M debut at Sulphur Springs:</p>

<table>
  <thead>
    <tr>
      <th>Date</th>
      <th>Race</th>
      <th>Distance</th>
      <th>Place</th>
      <th>Time / Notes</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>April 11</td>
      <td>Caledon Crusher</td>
      <td>72km</td>
      <td>3rd</td>
      <td><a href="https://results.raceroster.com/v3/events/7zx4vycgbr4vtwpn/race/253011?filter_search=">7:14:11</a></td>
    </tr>
    <tr>
      <td>April 12</td>
      <td>Around the Bay</td>
      <td>30k</td>
      <td>5th</td>
      <td><a href="https://results.raceroster.com/v3/events/atjdngbt3v5wwxak/race/252843?filter_search=">1:43:11</a></td>
    </tr>
    <tr>
      <td>April 18</td>
      <td>Foxtail</td>
      <td>50km</td>
      <td>1st</td>
      <td><a href="https://results.raceroster.com/v3/events/b2fkqnxdb2ysnedr/race/253067?filter_search=">3:07:56</a></td>
    </tr>
    <tr>
      <td>April 19</td>
      <td>Her Majesty’s Royal Race</td>
      <td>10k</td>
      <td>1st</td>
      <td><a href="https://results.raceroster.com/v3/events/7zx4vycgbr4vtwpn/race/253011?filter_search=">32:29</a></td>
    </tr>
    <tr>
      <td>April 26</td>
      <td>Mississauga Half Marathon</td>
      <td>Half Marathon</td>
      <td>1st</td>
      <td><a href="https://results.raceroster.com/v3/events/7zx4vycgbr4vtwpn/race/253011?filter_search=">1:08:08</a></td>
    </tr>
    <tr>
      <td>May 3</td>
      <td>Vancouver Marathon</td>
      <td>Marathon</td>
      <td>4th</td>
      <td><a href="https://results.raceroster.com/v3/events/7zx4vycgbr4vtwpn/race/253011?filter_search=">2:25:39</a></td>
    </tr>
  </tbody>
</table>

<p>With ~225km raced in the span of just 23 days, Sergio has all the potential to decimate the 100M course record. It may not be a question of if he breaks it, but rather by how much.</p>

<hr />

<h2 id="the-team-competition-regional-supremacy">The Team Competition: Regional Supremacy</h2>

<p>My favourite part of cross country is the team competition. I almost think team competition is essential to sport. Trail running rarely frames itself as a team sport, but I’ve always felt a strong regional identity at races. With that in mind, I tried to create three groupings from these regions:</p>

<ul>
  <li><strong>Locals</strong> (Hamilton, Dundas and along the Guelph to Niagara corridor)</li>
  <li><strong>GTA</strong></li>
  <li><strong>Invaders</strong></li>
</ul>

<p><img src="/assets/img/sulphur-springs/ontario_squad_map.png" alt="Ontario Squad Map" /></p>

<p>The entrant split by region is reasonably close, so these groupings set up a fair competition.</p>

<p><img src="/assets/img/sulphur-springs/regional_2026_entries.png" alt="2026 Regional Entries" /></p>

<p>Then using cross-country scoring rules (lowest score wins, top 3 per gender per region count) across 2010–2025, we can see which teams historically have been the best.</p>

<h3 id="historical-dominance">Historical dominance</h3>

<p><img src="/assets/img/sulphur-springs/regional_yearly_men.png" alt="Men's Yearly Regional XC Champion" /></p>

<p><img src="/assets/img/sulphur-springs/regional_yearly_women.png" alt="Women's Yearly Regional XC Champion" /></p>

<ul>
  <li><strong>Men</strong>: Invaders dominate the 100M (8 of 10 years); Locals and GTA trade wins in the shorter distances.</li>
  <li><strong>Women</strong>: GTA controls the 50k; Locals spike in the 100M when present; Invaders increasingly dominate the longer races.</li>
</ul>

<h3 id="2026-projected-team-scoring">2026 Projected Team Scoring</h3>

<p>Using the predicted finish times from a UTMB index based model, we can simulate the cross-country scoring for each region. Top 3 finishers per region score their predicted finishing position (1st = 1 point), lowest total wins.</p>
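<p>The scoring rule can be sketched in a few lines of Python. This is an illustration rather than the exact script behind the table: the function name is hypothetical, the finishing order beyond the projected GTA 1-2-3 is made up, and excluding regions with fewer than three finishers is an assumption.</p>

```python
from collections import defaultdict


def team_scores(finish_order, scorers=3):
    """Cross-country style scores from a list of regions in finishing order.

    Each region's first `scorers` finishers score their overall place
    (1st = 1 point). Regions with fewer finishers are left out, which is
    an assumption of this sketch.
    """
    places = defaultdict(list)
    for place, region in enumerate(finish_order, start=1):
        if len(places[region]) != scorers:  # only the first three count
            places[region].append(place)
    return {r: sum(p) for r, p in places.items() if len(p) == scorers}


# Projected GTA sweep of the top three, followed by a made-up order.
order = ["GTA", "GTA", "GTA", "Locals", "Invaders",
         "Locals", "Invaders", "Locals", "Invaders"]
scores = team_scores(order)
winner = min(scores, key=scores.get)  # lowest total wins
print(scores, winner)
```

<p>A 1-2-3 sweep gives the minimum possible score of 6 points, which is why that number shows up in the table for the most lopsided projections.</p>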

<table>
  <thead>
    <tr>
      <th>Distance</th>
      <th>Men’s Winner</th>
      <th>Score</th>
      <th>Women’s Winner</th>
      <th>Score</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>50k</td>
      <td><strong>Locals</strong></td>
      <td>8 pts</td>
      <td><strong>Locals</strong></td>
      <td>10 pts</td>
    </tr>
    <tr>
      <td>50M</td>
      <td><strong>Locals</strong></td>
      <td>6 pts</td>
      <td><strong>Locals</strong></td>
      <td>10 pts</td>
    </tr>
    <tr>
      <td>100K</td>
      <td><strong>Invaders</strong></td>
      <td>11 pts</td>
      <td><strong>Invaders</strong></td>
      <td>8 pts</td>
    </tr>
    <tr>
      <td>100M</td>
      <td><strong>GTA</strong></td>
      <td>6 pts</td>
      <td><strong>Invaders</strong></td>
      <td>6 pts</td>
    </tr>
  </tbody>
</table>

<p>The historical pattern holds—but with a twist:</p>

<ul>
  <li><strong>Locals</strong> are projected to dominate the short distances for both genders, sweeping the men’s 50M, with closer battles in the women’s 50k and 50M.</li>
  <li><strong>GTA</strong> is projected to score an almost-unprecedented 6 points in the Men’s 100M—a projected 1-2-3 finish from Sergio Ráez Villanueva, Edmund Heung, and Matt Tribe.</li>
  <li><strong>Invaders</strong> continue to dominate the longer distances, led by Karen Holland, Star Hofer, and Caitlin McAuliffe (100K) and Molly Hurford, Joanne Moon, and Gesine Freund (100M).</li>
</ul>

<p><strong>Projected overall tally: Locals 4, GTA 1, Invaders 3.</strong> The Locals’ home-course advantage pays off, and they reclaim the “championship” after a long drought.</p>

<hr />

<h2 id="predicted-top-10-finishers">Predicted Top 10 Finishers</h2>

<p>To make the team prediction and to help drive the storylines, I put together some predictions for the times and placings in each ultra. As we say in the modelling world, all models are wrong, but some are useful. I hope anyone who sees themselves on this list feels inspired to beat the prediction. It will be interesting to see how far the final results land from these predictions, and who had outstanding days.</p>

<p>To remove my own biases and scale this effort, every prediction is built from three signals, blended together based on how much we trust each one:</p>

<ol>
  <li>
    <p><strong>UTMB Performance Index</strong> — A global fitness score maintained by UTMB. I’ve trained a regression model on <strong>3,324 historical index-to-finish-time pairs</strong> from Sulphur Springs (2010–2025) to translate each runner’s index into a predicted speed on this specific course. Runners without an official index are marked with <code class="language-plaintext highlighter-rouge">-</code>; those with a proxy score derived from external race benchmarks are marked with <code class="language-plaintext highlighter-rouge">*</code>.</p>
  </li>
  <li>
    <p><strong>Past Sulphur Springs results</strong> — Nothing predicts Sulphur like Sulphur. If a runner has raced this exact distance here within the past two years, that result is treated as the strongest signal. If the index suggests they should be <em>slower</em> than what they actually ran, we ignore the index entirely. If the index suggests they should be <em>faster</em>, we weight it heavily (70%) but keep their Sulphur time in the mix (30%) as insurance against a bad day. Older results are still used, but receive a 2% per-year slowdown penalty after a 3-year grace window.</p>
  </li>
  <li>
    <p><strong>Distance crossover scaling</strong> — For runners stepping up or down (e.g., a 50k runner entering the 100K), we scale their past time using <a href="https://en.wikipedia.org/wiki/Peter_Riegel">Riegel’s endurance formula</a>, which applies a power-law fatigue penalty so that predicted pace slows as the distance increases.</p>
  </li>
</ol>
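<p>The three rules above can be sketched roughly as follows. This is a simplified illustration, not the actual model: the function names are hypothetical, the 1.06 exponent is Riegel’s commonly quoted default rather than a value stated in this post, and the 2% penalty is assumed to compound per year.</p>

```python
def riegel_scale(time_s, from_km, to_km, exponent=1.06):
    """Scale a finish time to another distance with Riegel's power law."""
    return time_s * (to_km / from_km) ** exponent


def age_penalty(time_s, years_ago, grace_years=3, rate=0.02):
    """Slow an old result by `rate` per year beyond the grace window."""
    extra_years = max(0, years_ago - grace_years)
    return time_s * (1 + rate) ** extra_years


def blend(index_pred_s, sulphur_s, index_weight=0.7):
    """Combine the index-based prediction with a past Sulphur time (seconds)."""
    if sulphur_s is None:
        return index_pred_s
    if index_pred_s is None:
        return sulphur_s
    if index_pred_s >= sulphur_s:
        # Index says slower than what they actually ran: trust the result.
        return sulphur_s
    # Index says faster: weight it heavily but keep the result in the mix.
    return index_weight * index_pred_s + (1 - index_weight) * sulphur_s


def to_seconds(h, m, s):
    return h * 3600 + m * 60 + s


past = to_seconds(9, 55, 14)           # a past 100K result from this post
hypothetical_index_pred = to_seconds(9, 38, 0)  # made-up model output
print(blend(hypothetical_index_pred, past))

# A hypothetical 4:00:00 50k result scaled up to 100K.
print(riegel_scale(to_seconds(4, 0, 0), 50, 100))
```

<p>With these rules, a runner whose index predicts a slower time than their recent Sulphur result simply keeps that result as the prediction, which is why several predicted times in the tables below match a past finish exactly.</p>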

<p>For strong runners I’ve recognized on the start list who don’t have a UTMB index or a result at Sulphur Springs, I’ve assigned a proxy UTMB index based on their known results in other races. For cases like Sergio in the 100M, I’ve taken people he’s run against in other races who have a UTMB index and assigned him a score based on how closely he’s finished relative to them.</p>

<h3 id="100-mile-trail-race">100 Mile Trail Race</h3>

<h4 id="mens-field">Men’s Field</h4>
<p><em>Men’s Course Record: Paul Vanoostveen, 14:38:23 (2025)</em></p>

<table>
  <thead>
    <tr>
      <th>Rank</th>
      <th>Athlete</th>
      <th>Predicted Time</th>
      <th>UTMB/ITRA Index</th>
      <th>US Score (Proxy)</th>
      <th>Past Result</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td><strong>Sergio Ráez Villanueva</strong></td>
      <td>13:55:13</td>
      <td>838.0*</td>
      <td>-</td>
      <td>-</td>
    </tr>
    <tr>
      <td>2</td>
      <td><strong>Edmund Heung</strong></td>
      <td>16:09:06</td>
      <td>-</td>
      <td>84.69</td>
      <td>50M (2023): 2nd in 7:38:27</td>
    </tr>
    <tr>
      <td>3</td>
      <td><strong>Matt Tribe</strong></td>
      <td>16:29:57</td>
      <td>693.0</td>
      <td>79.36</td>
      <td>100km (2023): 2nd in 9:23:34</td>
    </tr>
    <tr>
      <td>4</td>
      <td><strong>Elias Kibreab</strong></td>
      <td>16:42:46</td>
      <td>700.0</td>
      <td>93.85</td>
      <td>100km (2025): 3rd in 8:47:58</td>
    </tr>
    <tr>
      <td>5</td>
      <td><strong>John Cole</strong></td>
      <td>17:37:04</td>
      <td>610.0</td>
      <td>70.84</td>
      <td>100mi (2025): 3rd in 17:37:04</td>
    </tr>
    <tr>
      <td>6</td>
      <td><strong>Ryan Flint</strong></td>
      <td>18:16:58</td>
      <td>636.0</td>
      <td>88.8</td>
      <td>100mi (2025): 4th in 18:24:26</td>
    </tr>
    <tr>
      <td>7</td>
      <td><strong>Eugenio Parra</strong></td>
      <td>18:23:55</td>
      <td>630.0</td>
      <td>94.52</td>
      <td>-</td>
    </tr>
    <tr>
      <td>8</td>
      <td><strong>Nicolas Cazelais</strong></td>
      <td>18:45:36</td>
      <td>-</td>
      <td>68.3</td>
      <td>50km (2023): 20th in 5:18:30</td>
    </tr>
    <tr>
      <td>9</td>
      <td><strong>Juan Zorrilla</strong></td>
      <td>18:59:50</td>
      <td>562.0</td>
      <td>100.0</td>
      <td>50M (2025): 1st in 7:36:46</td>
    </tr>
    <tr>
      <td>10</td>
      <td><strong>Juan Aja Aguinaco</strong></td>
      <td>20:57:12</td>
      <td>508.0</td>
      <td>-</td>
      <td>50km (2025): 25th in 5:01:21</td>
    </tr>
  </tbody>
</table>

<h4 id="womens-field">Women’s Field</h4>
<p><em>Women’s Course Record: Amanda Nelson, 17:18:58 (2024)</em></p>

<table>
  <thead>
    <tr>
      <th>Rank</th>
      <th>Athlete</th>
      <th>Predicted Time</th>
      <th>UTMB/ITRA Index</th>
      <th>US Score (Proxy)</th>
      <th>Past Result</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td><strong>Molly Hurford</strong></td>
      <td>17:22:53</td>
      <td>617.0</td>
      <td>97.56</td>
      <td>100mi (2025): 1st in 17:22:53</td>
    </tr>
    <tr>
      <td>2</td>
      <td><strong>Joanne Moon</strong></td>
      <td>22:28:48</td>
      <td>487.0</td>
      <td>87.86</td>
      <td>100mi (2025): 8th in 22:28:48</td>
    </tr>
    <tr>
      <td>3</td>
      <td><strong>Gesine Freund</strong></td>
      <td>22:59:49</td>
      <td>463.0</td>
      <td>76.39</td>
      <td>100km (2025): 10th in 12:56:32</td>
    </tr>
    <tr>
      <td>4</td>
      <td><strong>Esther Hagerman</strong></td>
      <td>23:09:44</td>
      <td>471.0</td>
      <td>82.43</td>
      <td>50km (2022): 11th in 5:23:17</td>
    </tr>
    <tr>
      <td>5</td>
      <td><strong>Bethany McRae</strong></td>
      <td>23:47:33</td>
      <td>-</td>
      <td>74.67</td>
      <td>100km (2023): 11th in 15:41:34</td>
    </tr>
    <tr>
      <td>6</td>
      <td><strong>Jessica Lee</strong></td>
      <td>23:51:39</td>
      <td>478.0</td>
      <td>68.52</td>
      <td>-</td>
    </tr>
    <tr>
      <td>7</td>
      <td><strong>Val Bauman</strong></td>
      <td>24:44:00</td>
      <td>434.0</td>
      <td>77.56</td>
      <td>100km (2025): 16th in 13:16:00</td>
    </tr>
    <tr>
      <td>8</td>
      <td><strong>Saskia Mattern</strong></td>
      <td>24:45:00</td>
      <td>433.0</td>
      <td>72.04</td>
      <td>100km (2025): 15th in 13:13:58</td>
    </tr>
    <tr>
      <td>9</td>
      <td><strong>Ashley Sametz</strong></td>
      <td>26:03:46</td>
      <td>443.0</td>
      <td>73.78</td>
      <td>100km (2025): 11th in 13:01:57</td>
    </tr>
    <tr>
      <td>10</td>
      <td><strong>Larissa Chankseliani</strong></td>
      <td>26:44:34</td>
      <td>430.0</td>
      <td>73.77</td>
      <td>100mi (2025): 13th in 27:41:40</td>
    </tr>
  </tbody>
</table>

<h3 id="100k-trail-race">100K Trail Race</h3>

<h4 id="mens-field-1">Men’s Field</h4>
<p><em>Men’s Course Record: Robert Brouillette, 8:07:22 (2025)</em></p>

<table>
  <thead>
    <tr>
      <th>Rank</th>
      <th>Athlete</th>
      <th>Predicted Time</th>
      <th>UTMB/ITRA Index</th>
      <th>US Score (Proxy)</th>
      <th>Past Result</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td><strong>Brian Putre</strong></td>
      <td>09:16:55</td>
      <td>617.0</td>
      <td>74.97</td>
      <td>100km (2025): 4th in 9:16:55</td>
    </tr>
    <tr>
      <td>2</td>
      <td><strong>Liam Walke</strong></td>
      <td>09:35:41</td>
      <td>635.0</td>
      <td>78.55</td>
      <td>50km (2018): 7th in 4:16:27</td>
    </tr>
    <tr>
      <td>3</td>
      <td><strong>Cullen Price</strong></td>
      <td>09:46:01</td>
      <td>599.0</td>
      <td>80.29</td>
      <td>50km (2025): 5th in 4:12:53</td>
    </tr>
    <tr>
      <td>4</td>
      <td><strong>Ryan Niclasen</strong></td>
      <td>09:47:43</td>
      <td>640.0</td>
      <td>89.81</td>
      <td>100mi (2025): 2nd in 16:47:43</td>
    </tr>
    <tr>
      <td>5</td>
      <td><strong>Danny Tresise</strong></td>
      <td>10:07:22</td>
      <td>633.0</td>
      <td>88.89</td>
      <td>100mi (2025): 6th in 19:22:12</td>
    </tr>
    <tr>
      <td>6</td>
      <td><strong>Jordan Bierema</strong></td>
      <td>10:10:12</td>
      <td>589.0</td>
      <td>89.07</td>
      <td>100km (2025): 26th in 11:40:11</td>
    </tr>
    <tr>
      <td>7</td>
      <td><strong>Perry Curiston</strong></td>
      <td>10:30:41</td>
      <td>545.0</td>
      <td>71.53</td>
      <td>100km (2025): 8th in 10:30:41</td>
    </tr>
    <tr>
      <td>8</td>
      <td><strong>James Orr</strong></td>
      <td>10:38:42</td>
      <td>570.0</td>
      <td>58.15</td>
      <td>50M (2024): 3rd in 8:14:31</td>
    </tr>
    <tr>
      <td>9</td>
      <td><strong>Lawrence Warriner</strong></td>
      <td>10:49:44</td>
      <td>529.0</td>
      <td>75.72</td>
      <td>100km (2025): 11th in 10:49:44</td>
    </tr>
    <tr>
      <td>10</td>
      <td><strong>Tyler Chacra</strong></td>
      <td>10:53:22</td>
      <td>525.0</td>
      <td>91.36</td>
      <td>100km (2025): 12th in 10:53:22</td>
    </tr>
  </tbody>
</table>

<h4 id="womens-field-1">Women’s Field</h4>
<p><em>Women’s Course Record: Karen Holland, 9:55:14 (2025)</em></p>

<table>
  <thead>
    <tr>
      <th>Rank</th>
      <th>Athlete</th>
      <th>Predicted Time</th>
      <th>UTMB/ITRA Index</th>
      <th>US Score (Proxy)</th>
      <th>Past Result</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td><strong>Karen Holland</strong></td>
      <td>09:43:09</td>
      <td>637.0</td>
      <td>88.29</td>
      <td>100km (2025): 1st in 9:55:14</td>
    </tr>
    <tr>
      <td>2</td>
      <td><strong>Star Hofer</strong></td>
      <td>11:43:05</td>
      <td>528.0</td>
      <td>78.57</td>
      <td>100km (2025): 7th in 11:59:56</td>
    </tr>
    <tr>
      <td>3</td>
      <td><strong>Caitlin McAuliffe</strong></td>
      <td>12:11:23</td>
      <td>-</td>
      <td>75.48</td>
      <td>50km (2022): 16th in 5:39:11</td>
    </tr>
    <tr>
      <td>4</td>
      <td><strong>Kirsten Clement</strong></td>
      <td>12:29:45</td>
      <td>469.0</td>
      <td>78.73</td>
      <td>50km (2025): 10th in 5:26:07</td>
    </tr>
    <tr>
      <td>5</td>
      <td><strong>Vicki Mayberry</strong></td>
      <td>12:58:57</td>
      <td>471.0</td>
      <td>72.0</td>
      <td>-</td>
    </tr>
    <tr>
      <td>6</td>
      <td><strong>Trina Boisvenue</strong></td>
      <td>13:13:22</td>
      <td>468.0</td>
      <td>72.72</td>
      <td>100km (2025): 19th in 13:35:31</td>
    </tr>
    <tr>
      <td>7</td>
      <td><strong>Donna Dowsett</strong></td>
      <td>13:39:45</td>
      <td>-</td>
      <td>74.24</td>
      <td>50km (2018): 18th in 5:52:31</td>
    </tr>
    <tr>
      <td>8</td>
      <td><strong>Linda Trinh</strong></td>
      <td>13:49:44</td>
      <td>430.0</td>
      <td>76.74</td>
      <td>100km (2025): 21st in 13:49:44</td>
    </tr>
    <tr>
      <td>9</td>
      <td><strong>Susan Munn</strong></td>
      <td>14:05:19</td>
      <td>408.0</td>
      <td>70.84</td>
      <td>100km (2025): 23rd in 14:05:19</td>
    </tr>
    <tr>
      <td>10</td>
      <td><strong>Niki Lanz</strong></td>
      <td>14:18:03</td>
      <td>409.0</td>
      <td>68.67</td>
      <td>50km (2025): 45th in 6:13:14</td>
    </tr>
  </tbody>
</table>

<h3 id="50-mile-trail-race">50 Mile Trail Race</h3>

<h4 id="mens-field-2">Men’s Field</h4>
<p><em>Men’s Course Record: Michael Daigeaun, 6:07:00 (2013)</em></p>

<table>
  <thead>
    <tr>
      <th>Rank</th>
      <th>Athlete</th>
      <th>Predicted Time</th>
      <th>UTMB/ITRA Index</th>
      <th>US Score (Proxy)</th>
      <th>Past Result</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td><strong>Will Gilmet</strong></td>
      <td>08:03:38</td>
      <td>583.0</td>
      <td>93.51</td>
      <td>50km (2025): 8th in 4:20:36</td>
    </tr>
    <tr>
      <td>2</td>
      <td><strong>Travis Marks</strong></td>
      <td>08:37:56</td>
      <td>524.0</td>
      <td>70.21</td>
      <td>50km (2025): 15th in 4:52:12</td>
    </tr>
    <tr>
      <td>3</td>
      <td><strong>Eric Tiedje</strong></td>
      <td>08:57:51</td>
      <td>520.0</td>
      <td>71.13</td>
      <td>-</td>
    </tr>
    <tr>
      <td>4</td>
      <td><strong>Andrew Douglas</strong></td>
      <td>08:58:58</td>
      <td>533.0</td>
      <td>91.28</td>
      <td>100km (2024): 10th in 11:25:37</td>
    </tr>
    <tr>
      <td>5</td>
      <td><strong>Brian Gauthier</strong></td>
      <td>09:13:18</td>
      <td>523.0</td>
      <td>59.84</td>
      <td>50km (2025): 58th in 5:51:31</td>
    </tr>
    <tr>
      <td>6</td>
      <td><strong>Ken Akune</strong></td>
      <td>09:27:55</td>
      <td>476.0</td>
      <td>68.38</td>
      <td>50km (2025): 33rd in 5:18:20</td>
    </tr>
    <tr>
      <td>7</td>
      <td><strong>Evan Johnston</strong></td>
      <td>09:30:17</td>
      <td>474.0</td>
      <td>63.54</td>
      <td>50km (2025): 34th in 5:19:38</td>
    </tr>
    <tr>
      <td>8</td>
      <td><strong>Isaac Herrera</strong></td>
      <td>10:22:20</td>
      <td>436.0</td>
      <td>55.59</td>
      <td>50km (2025): 59th in 5:52:01</td>
    </tr>
    <tr>
      <td>9</td>
      <td><strong>Allan Williams</strong></td>
      <td>10:26:54</td>
      <td>435.0</td>
      <td>69.57</td>
      <td>50km (2025): 93rd in 6:20:50</td>
    </tr>
    <tr>
      <td>10</td>
      <td><strong>Norm Stephen</strong></td>
      <td>10:31:24</td>
      <td>427.0</td>
      <td>70.22</td>
      <td>50M (2019): 45th in 11:07:06</td>
    </tr>
  </tbody>
</table>

<h4 id="womens-field-2">Women’s Field</h4>
<p><em>Women’s Course Record: Julie Hamulecki, 7:08:03 (2024)</em></p>

<table>
  <thead>
    <tr>
      <th>Rank</th>
      <th>Athlete</th>
      <th>Predicted Time</th>
      <th>UTMB/ITRA Index</th>
      <th>US Score (Proxy)</th>
      <th>Past Result</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td><strong>Angela Pasceri</strong></td>
      <td>09:59:56</td>
      <td>458.0</td>
      <td>70.65</td>
      <td>100km (2025): 20th in 13:39:49</td>
    </tr>
    <tr>
      <td>2</td>
      <td><strong>Jessica Shao</strong></td>
      <td>10:04:19</td>
      <td>449.0</td>
      <td>74.34</td>
      <td>50km (2025): 15th in 5:41:21</td>
    </tr>
    <tr>
      <td>3</td>
      <td><strong>Jada Carr</strong></td>
      <td>10:36:06</td>
      <td>439.0</td>
      <td>69.32</td>
      <td>-</td>
    </tr>
    <tr>
      <td>4</td>
      <td><strong>Paulina Karwowska</strong></td>
      <td>10:55:37</td>
      <td>-</td>
      <td>70.12</td>
      <td>50km (2022): 35th in 6:24:28</td>
    </tr>
    <tr>
      <td>5</td>
      <td><strong>Shirin Niroomand</strong></td>
      <td>10:59:44</td>
      <td>409.0</td>
      <td>64.3</td>
      <td>50km (2025): 41st in 6:11:06</td>
    </tr>
    <tr>
      <td>6</td>
      <td><strong>Dylan Magner</strong></td>
      <td>11:07:55</td>
      <td>405.0</td>
      <td>67.87</td>
      <td>50km (2025): 47th in 6:17:55</td>
    </tr>
    <tr>
      <td>7</td>
      <td><strong>Jisun Hahn</strong></td>
      <td>11:12:51</td>
      <td>413.0</td>
      <td>68.77</td>
      <td>50km (2025): 56th in 6:40:34</td>
    </tr>
    <tr>
      <td>8</td>
      <td><strong>Edda Oviedo</strong></td>
      <td>11:36:27</td>
      <td>401.0</td>
      <td>61.98</td>
      <td>50km (2025): 65th in 6:59:23</td>
    </tr>
    <tr>
      <td>9</td>
      <td><strong>Yi Xiao</strong></td>
      <td>12:12:44</td>
      <td>367.0</td>
      <td>61.51</td>
      <td>50km (2025): 62nd in 6:53:07</td>
    </tr>
    <tr>
      <td>10</td>
      <td><strong>Erica Swirsky</strong></td>
      <td>12:19:42</td>
      <td>376.0</td>
      <td>100.0</td>
      <td>100km (2024): 23rd in 15:34:00</td>
    </tr>
  </tbody>
</table>

<h3 id="50k-trail-race">50k Trail Race</h3>

<h4 id="mens-field-3">Men’s Field</h4>
<p><em>Men’s Course Record: Alex Forte, 3:23:02 (2025)</em></p>

<table>
  <thead>
    <tr>
      <th>Rank</th>
      <th>Athlete</th>
      <th>Predicted Time</th>
      <th>UTMB/ITRA Index</th>
      <th>US Score (Proxy)</th>
      <th>Past Result</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td><strong>Dylan Pust</strong></td>
      <td>03:29:05</td>
      <td>814.0</td>
      <td>94.78</td>
      <td>100km (2024): 2nd in 8:29:04</td>
    </tr>
    <tr>
      <td>2</td>
      <td><strong>Matthew Farquharson</strong></td>
      <td>03:37:01</td>
      <td>784.0</td>
      <td>91.68</td>
      <td>50km (2023): 2nd in 3:46:05</td>
    </tr>
    <tr>
      <td>3</td>
      <td><strong>Jeremy Walsh</strong></td>
      <td>03:52:19</td>
      <td>733.0</td>
      <td>86.19</td>
      <td>50km (2022): 7th in 4:05:18</td>
    </tr>
    <tr>
      <td>4</td>
      <td><strong>Jonathan Gray</strong></td>
      <td>03:58:24</td>
      <td>679.0*</td>
      <td>90.26</td>
      <td>-</td>
    </tr>
    <tr>
      <td>5</td>
      <td><strong>Taylor Reid</strong></td>
      <td>04:10:38</td>
      <td>644.0*</td>
      <td>44.24</td>
      <td>-</td>
    </tr>
    <tr>
      <td>6</td>
      <td><strong>Eric Ashby</strong></td>
      <td>04:31:23</td>
      <td>558.0</td>
      <td>73.64</td>
      <td>50km (2025): 11th in 4:31:23</td>
    </tr>
    <tr>
      <td>7</td>
      <td><strong>Andriy Yatsynych</strong></td>
      <td>04:31:28</td>
      <td>559.0</td>
      <td>65.65</td>
      <td>50km (2024): 10th in 4:31:28</td>
    </tr>
    <tr>
      <td>8</td>
      <td><strong>Nick Ridpath</strong></td>
      <td>04:39:19</td>
      <td>574.0</td>
      <td>81.11</td>
      <td>-</td>
    </tr>
    <tr>
      <td>9</td>
      <td><strong>Wayne Westby</strong></td>
      <td>04:40:41</td>
      <td>571.0</td>
      <td>78.05</td>
      <td>-</td>
    </tr>
    <tr>
      <td>10</td>
      <td><strong>Andrew Norman</strong></td>
      <td>04:42:33</td>
      <td>567.0</td>
      <td>61.07</td>
      <td>-</td>
    </tr>
  </tbody>
</table>

<h4 id="womens-field-3">Women’s Field</h4>
<p><em>Women’s Course Record: Christina Clark, 3:59:46 (2012)</em></p>

<table>
  <thead>
    <tr>
      <th>Rank</th>
      <th>Athlete</th>
      <th>Predicted Time</th>
      <th>UTMB/ITRA Index</th>
      <th>US Score (Proxy)</th>
      <th>Past Result</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>1</td>
      <td><strong>Bridget Cobham</strong></td>
      <td>04:55:45</td>
      <td>512.0</td>
      <td>95.39</td>
      <td>50km (2024): 2nd in 4:55:45</td>
    </tr>
    <tr>
      <td>2</td>
      <td><strong>Bethany McChesney</strong></td>
      <td>04:58:28</td>
      <td>541.0</td>
      <td>80.66</td>
      <td>-</td>
    </tr>
    <tr>
      <td>3</td>
      <td><strong>Ingrid Perugachi</strong></td>
      <td>05:05:49</td>
      <td>527.0</td>
      <td>83.21</td>
      <td>-</td>
    </tr>
    <tr>
      <td>4</td>
      <td><strong>Dorothy Apedaile</strong></td>
      <td>05:07:37</td>
      <td>516.0</td>
      <td>81.77</td>
      <td>50km (2025): 7th in 5:07:37</td>
    </tr>
    <tr>
      <td>5</td>
      <td><strong>Jamie McGill Worsley</strong></td>
      <td>05:39:23</td>
      <td>471.0</td>
      <td>75.89</td>
      <td>50km (2025): 13th in 5:39:34</td>
    </tr>
    <tr>
      <td>6</td>
      <td><strong>Leanne MacFayden</strong></td>
      <td>05:39:58</td>
      <td>470.0</td>
      <td>74.09</td>
      <td>-</td>
    </tr>
    <tr>
      <td>7</td>
      <td><strong>Tamara Robb</strong></td>
      <td>05:50:58</td>
      <td>454.0</td>
      <td>71.73</td>
      <td>-</td>
    </tr>
    <tr>
      <td>8</td>
      <td><strong>Benita Whyte</strong></td>
      <td>05:53:07</td>
      <td>451.0</td>
      <td>68.82</td>
      <td>-</td>
    </tr>
    <tr>
      <td>9</td>
      <td><strong>Maria Musselman</strong></td>
      <td>05:53:25</td>
      <td>433.0</td>
      <td>72.54</td>
      <td>50km (2025): 27th in 5:53:25</td>
    </tr>
    <tr>
      <td>10</td>
      <td><strong>Rina Atienza</strong></td>
      <td>05:54:57</td>
      <td>476.0</td>
      <td>73.08</td>
      <td>50M (2025): 1st in 9:00:58</td>
    </tr>
  </tbody>
</table>

<p>* <em>Indicates a proxy UTMB index.</em></p>

<hr />

<h2 id="the-western-states-equation">The Western States Equation</h2>

<p>Sulphur Springs is one of the few races in Canada that can earn its finishers a lottery ticket for the <a href="https://www.wser.org/">Western States 100-Mile Endurance Run</a>. To get a ticket, 100 Mile runners need to finish within 30 hours, and 100k runners within 18 hours. To become a qualifying race, Sulphur Springs needed 100 finishers in the 100M and/or 100k distances. A quick projection for 2026 puts the finisher count well over that threshold for both distances.</p>

<p><img src="/assets/img/sulphur-springs/sulphur_wser_projections.png" alt="WSER Finisher Projections" /></p>

<p>Registration has grown from 269 entrants (226 finishers) in the 100K and 169 entrants (96 finishers) in the 100M in 2025 to 315 and 218 entrants respectively for 2026. Suffice it to say the Western States qualifier status is all but guaranteed for this year.</p>
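
<p>As a rough check on those numbers, here is a minimal sketch of one way to project 2026 finishers. It assumes each distance keeps its 2025 finish rate, which is a simplification and not necessarily how the chart above was built; the function name is illustrative.</p>

```python
# 2025 entrant and finisher counts and 2026 entrant counts, as quoted in the post.
ENTRANTS_2025 = {"100K": 269, "100M": 169}
FINISHERS_2025 = {"100K": 226, "100M": 96}
ENTRANTS_2026 = {"100K": 315, "100M": 218}


def project_finishers(entrants_prev, finishers_prev, entrants_next):
    """Scale next year's entrants by last year's finish rate (an assumption)."""
    projection = {}
    for distance, entered in entrants_prev.items():
        finish_rate = finishers_prev[distance] / entered
        projection[distance] = entrants_next[distance] * finish_rate
    return projection


projected = project_finishers(ENTRANTS_2025, FINISHERS_2025, ENTRANTS_2026)
for distance, count in projected.items():
    print(f"{distance}: about {count:.0f} projected finishers")
```

<p>Under that assumption both distances land above 100 projected finishers, although weather and course conditions can move finish rates a lot from year to year.</p>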

<hr />

<h3 id="geographical-catchment-flow">Geographical Catchment Flow</h3>

<p>In a <a href="https://jeremywalsh.ca/2026/02/24/sulphur-springs-2025-data-analysis.html#a-massive-catchment-area">previous post</a> I mapped out the geographical catchment of the Western States Endurance Run qualifier races. One of the interesting findings was the recent inclusion of Sulphur Springs, which serves the massive population of Ontario. We can see that in 2026 the locals are again showing up in high numbers to get their ticket to the big dance.</p>

<p>Mapping 100K and 100M starters against the Western States North American catchment highlights the importance of this race as the only qualifier in Ontario and one of only a handful in Canada.</p>

<p><img src="/assets/img/sulphur-springs/sulphur_catchment_map.png" alt="Sulphur Springs Voronoi Catchment" /></p>

<hr />

<h2 id="the-legendary-returners">The Legendary Returners</h2>

<p>For many athletes, Sulphur Springs is a yearly tradition. By examining lifetime finishes, I’ve identified a group of “Sulphur Hall of Famers”.</p>

<p><img src="/assets/img/sulphur-springs/sulphur_returners.png" alt="Hall of Fame Returnees" /></p>

<p>I only planned on writing this post about the elites, but seeing that Ronald Gehl has run 19 ultras at Sulphur Springs over the past 30 years is amazing to me. My finger got tired from scrolling his <a href="https://ultrarunning.com/calendar/runner/view/Ronald-Gehl-c2df8892-0fd8-11ea-91a0-624db84c3c72">results</a>. He’s been awarded the <a href="https://outrace.ca/norm-patenaude-award-recipients/">Norm Patenaude Award</a> 29 times for finishing 7 or more Ontario ultras a year. At the age of 79 he’s signed up again this year in the 50 mile to continue a 4 year streak at that distance. He’s one year shy of being the oldest Sulphur finisher after Hans Maier who ran the 50k in 2019 as an 80 year old. Long after the winners are done, he’s still out there—that’s the heart of Sulphur Springs.</p>]]></content><author><name>Jeremy Walsh</name></author><summary type="html"><![CDATA[A data-driven look at the contenders and storylines for the 2026 Sulphur Springs 100 Mile trail race.]]></summary></entry><entry><title type="html">Sulphur Springs 2025 Data Analysis</title><link href="https://jeremywalsh.ca/2026/02/24/sulphur-springs-2025-data-analysis.html" rel="alternate" type="text/html" title="Sulphur Springs 2025 Data Analysis" /><published>2026-02-24T12:56:54-05:00</published><updated>2026-02-24T12:56:54-05:00</updated><id>https://jeremywalsh.ca/2026/02/24/sulphur-springs-2025-data-analysis</id><content type="html" xml:base="https://jeremywalsh.ca/2026/02/24/sulphur-springs-2025-data-analysis.html"><![CDATA[<h1 id="the-evolution-of-a-canadian-classic">The evolution of a Canadian Classic</h1>

<p>The 2025 Canadian ultramarathon season was bigger than ever, with 200+ races and over 15,000 finishers. But amidst this explosive growth, the <strong>Sulphur Springs Trail Run</strong> has managed something rare: it hasn’t just grown in size—it has fundamentally evolved in quality.</p>

<p>The Sulphur Springs trail system in Ancaster has housed this event for <a href="https://sulphursprings.burlingtonrunners.com/">33 years</a>, making it the oldest running ultramarathon in Canada. The race features a 20K loop course on mainly groomed trails with some technical elements, and a not-insignificant amount of hills. Each loop of the Sulphur Springs 20km course has 500m of elevation gain.</p>

<p>From shattering decades-old course records to serving as an essential gateway to the legendary Western States 100, here is why my local ultra is still one of the most significant events in Canada. By analyzing the 2025 results alongside decades of historical data, we can see exactly how this race has transformed from a local gathering into a national powerhouse.</p>

<hr />

<h2 id="1-the-performance-surge">1. The Performance Surge</h2>

<p>Trail and ultra running has grown steadily for over a decade, and Sulphur Springs is certainly a large part of that. Across all the <a href="https://statistik.d-u-v.org/geteventlist.php?year=2025\&amp;dist=all\&amp;country=CAN\&amp;Submit.x=18\&amp;Submit.y=5\&amp;label=\&amp;surface=all\&amp;sort=1\&amp;from=\&amp;to=">Canadian ultra results listed on DUV</a> (an ultramarathon results platform), Sulphur Springs has grown alongside Canadian ultras as a whole.</p>

<p><img src="/uploads/SulphurSprings/national_growth_trend.png" alt="" /></p>

<p>When we compare Sulphur Springs to every other race in Canada using our DUV Results Database, its scale is undeniable.</p>

<ul>
  <li><strong>#4 in Canada</strong> for Total Finishers (723).</li>
  <li><strong>#3 in Canada</strong> for Cumulative Distance (<strong>60,806 km</strong> covered).</li>
  <li><strong>#1 in Canada</strong> for 100k and 100M participation.</li>
</ul>

<p>This analysis focuses exclusively on these ultra distances (50k, 50M, 100k, and 100M) due to data accessibility, meaning the ranking doesn’t even account for the high participation in the sub-ultra events, which saw an additional 355 finishers in the 20k and 255 in the 10k.</p>

<p><img src="/uploads/SulphurSprings/participation_comparison.png?v=2" alt="" /></p>

<p>When I sum the distance covered by every finisher across all the ultra distances, Sulphur Springs moves up to the 3rd largest ultra event in Canada.</p>

<p><img src="/uploads/SulphurSprings/distance_agg_comparison.png?v=2" alt="" /></p>
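
<p>The cumulative distance metric above is just finisher counts weighted by race distance. A minimal sketch, assuming nominal race distances and using made-up finisher counts (the real per-event splits are not listed in this post):</p>

```python
MILE_KM = 1.609344  # exact length of one international mile in kilometres

# Nominal race distances in kilometres.
DISTANCE_KM = {
    "50k": 50.0,
    "50M": 50 * MILE_KM,
    "100k": 100.0,
    "100M": 100 * MILE_KM,
}


def cumulative_distance_km(finishers_by_event):
    """Sum nominal distance times finisher count over every event."""
    return sum(DISTANCE_KM[event] * count for event, count in finishers_by_event.items())


# Hypothetical finisher counts for illustration only, not the real 2025 splits.
example_counts = {"50k": 10, "50M": 5, "100k": 4, "100M": 2}
total_km = cumulative_distance_km(example_counts)
print(f"{total_km:.1f} km")
```

<p>Weighting by distance rewards events with big fields in the long races, which is why Sulphur Springs ranks higher on this measure than on raw finisher count.</p>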

<h3 id="national-ranking-men">National Ranking: Men</h3>

<p>The Sulphur Springs men’s winners established themselves as national leaders in 2025:</p>

<ul>
  <li><strong>Alex Forte</strong> clocked a <strong>3:23:02</strong> 50k, placing his time among the Top 5 fastest 50k finishes in Canada.</li>
  <li><strong>Juan Zorrilla</strong> commanded the 50-Mile race in <strong>7:36:46</strong>, securing the fastest 50M time of the year.</li>
  <li><strong>Robert Brouillette</strong> secured the 100k win in <strong>8:07:22</strong>, a time that ranked 2nd on the national leaderboard for 2025, and helped land him on Team Canada for the World 50km Championships in 2026.</li>
  <li><strong>Paul Vanoostveen</strong> ran a blistering <strong>14:38:23</strong> in the 100-Miler, establishing himself as one of the elite long-distance performers in Canada and hinting at his eventual <a href="https://runningmagazine.ca/the-scene/toronto-man-runs-nearly-1000-kilometres-for-free-burritos-and-cold-baths/">Burrito League dominance</a>.</li>
</ul>

<p><img src="/uploads/SulphurSprings/national_benchmarks_men.png" alt="" /></p>

<h3 id="national-ranking-women">National Ranking: Women</h3>

<p>The women’s field at Sulphur was equally deep, producing national-caliber performances across all distances:</p>

<ul>
  <li><strong>Tanis Bolton</strong> secured the 50k victory in <strong>4:14:07</strong>, falling just outside of the top-5.</li>
  <li><strong>Rina Atienza</strong> took the 50-Mile crown in <strong>9:00:58</strong>, securing the #2 spot in the national rankings.</li>
  <li><strong>Karen Holland</strong> won the 100k in <strong>9:55:14</strong>, for the third fastest 100k in 2025.</li>
  <li><strong>Molly Hurford</strong> <a href="https://runningmagazine.ca/trail-running/ontarios-molly-hurford-wins-sulphur-springs-100-miler/">dominated the 100-Miler</a> in <strong>17:22:53</strong>, placing her firmly at the top of the women, and faster than all but two men for 2025 across Canada.</li>
</ul>

<p><img src="/uploads/SulphurSprings/national_benchmarks_women.png" alt="" title="Women's Top 10 Times in Canada" /></p>

<p>All of these amazing times were run on a course few would pick for a fast time. It’s all gravel trail, with some technical elements and more than enough elevation gain to ward off a pure speed race. Each loop of the Sulphur Springs 20km course has 500m of elevation gain. To reflect that more accurately, let’s recalculate these times by normalizing for elevation gain with Grade Adjusted Pace.</p>

<h3 id="grade-adjusted-pace-gap">Grade Adjusted Pace (GAP)</h3>

<p>Comparing Grade Adjusted Pace (GAP) to actual pace reveals how the terrain uniquely impacts the field, and how the top performers manage their effort across the climbs and descents. GAP estimates what an athlete’s pace would be on flat ground, accounting for the extra energy required to run uphill and the reduced effort on downhills.</p>

<p>I’ve taken the GAP listed for every runner with a Strava race file. Strava uses a heart-rate equivalency model — a great walkthrough of the approach is in <a href="https://aaron-schroeder.github.io/reverse-engineering/grade-adjusted-pace.html">Aaron Schroeder’s GAP analysis</a>. Effectively everyone looks a little faster, but especially those that had a lot of elevation like at Sulphur Springs.</p>
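
<p>To make the idea concrete, here is a minimal sketch of a grade adjustment. It is not Strava’s model; it swaps in the published energy-cost-of-running polynomial from Minetti et al. (2002), and the function names and example paces are hypothetical.</p>

```python
def minetti_cost(grade):
    """Energy cost of running in J/(kg*m) at a fractional grade (Minetti et al., 2002)."""
    g = grade
    return 155.4 * g**5 - 30.4 * g**4 - 43.3 * g**3 + 46.3 * g**2 + 19.5 * g + 3.6


def grade_adjusted_pace(pace_min_per_km, grade):
    """Flat-equivalent pace: scale the actual pace by flat cost over cost at this grade."""
    return pace_min_per_km * minetti_cost(0.0) / minetti_cost(grade)


# Hypothetical example: 6:00/km on a steady 5% climb and a steady 5% descent.
uphill_gap = grade_adjusted_pace(6.0, 0.05)
downhill_gap = grade_adjusted_pace(6.0, -0.05)
print(f"uphill GAP: {uphill_gap:.2f} min/km, downhill GAP: {downhill_gap:.2f} min/km")
```

<p>A climb makes the adjusted pace faster than the actual pace and a gentle descent makes it slower. The adjustment would be applied segment by segment along the course rather than to a whole-race average, since a loop course has zero net grade.</p>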

<p><img src="/uploads/SulphurSprings/mens_gap_comparison.png" alt="" /></p>

<p>Applying the same model to the women’s field, we can see just how intense the adjusted efforts are at the front of the pack, showing that the Ancaster hills require an energy output equal to much faster paces on flat courses.</p>

<p><img src="/uploads/SulphurSprings/womens_gap_comparison.png" alt="" /></p>

<hr />

<h2 id="2-the-fastest-100-mile-in-canada">2. The fastest 100 Mile in Canada?</h2>

<p>Sulphur Springs doesn’t just produce fast times occasionally; it has consistently dominated the national leaderboard for over 15 years, particularly in the 100M distance.</p>

<p>This <strong>“Top 10 Density”</strong> analysis tracks how many of the fastest 10 times in Canada each year come from Sulphur Springs. For the men’s 100M, Sulphur Springs has placed an average of <strong>4 runners in the national Top 10</strong> each year since 2010, peaking at <strong>9 out of 10</strong> in both 2013 and 2025. For the women’s 100M, the average is nearly <strong>4 per year</strong>, reaching <strong>8 out of 10</strong> in 2025.</p>
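
<p>The density count described above can be sketched in a few lines. This assumes the results are available as (year, race, finish time in seconds) rows; the function name and the sample rows other than the 14:38:23 winning time are hypothetical.</p>

```python
def top10_density(results, race_name, top_n=10):
    """Count, per year, how many of the top_n fastest times came from race_name.

    results: iterable of (year, race, finish_seconds) tuples.
    """
    by_year = {}
    for year, race, seconds in results:
        by_year.setdefault(year, []).append((seconds, race))

    density = {}
    for year, rows in by_year.items():
        fastest = sorted(rows)[:top_n]
        density[year] = sum(1 for _, race in fastest if race == race_name)
    return density


# Hypothetical rows for illustration; 52703 s is the 14:38:23 quoted earlier.
sample = [
    (2025, "Sulphur Springs", 52703),
    (2025, "Race B", 60000),
    (2025, "Sulphur Springs", 61000),
    (2025, "Race B", 70000),
]
print(top10_density(sample, "Sulphur Springs", top_n=3))
```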

<p><img src="/uploads/SulphurSprings/sulphur_top10_density_men.png" alt="" />
<img src="/uploads/SulphurSprings/sulphur_top10_density_women.png" alt="" /></p>

<p>Taking all 100M results in Canada since 2010 (chosen due to issues getting data earlier), we see that Sulphur is consistently the top 100M in the nation.</p>

<p><img src="/uploads/SulphurSprings/historical_national_benchmarks_men.png" alt="" />
<img src="/uploads/SulphurSprings/historical_national_benchmarks_women.png" alt="" /></p>

<p>One of the reasons for this is the relatively low elevation gain compared to other 100-mile races in Canada. We can see this by comparing the elevation gain and the winning times of all the 100-mile races in Canada in 2025.</p>

<p><img src="/uploads/SulphurSprings/100m_elevation_winning_times.png" alt="" /></p>

<p>This trend validates Sulphur Springs as the definitive performance anchor for the country. If you want to run a fast Canadian ultra in spite of hills, you come to Ancaster in May.</p>

<hr />

<h2 id="3-comparing-performances-and-distances-at-sulphur">3. Comparing performances and distances at Sulphur</h2>

<p>It’s a tricky proposition to compare times from trail races. Even running in the same location, on the same weekend, it’s hard to compare a 20km to a 100M. One is 8 times as far and often takes 12 times as long to finish. There are a few metrics used in our sport to handle these comparisons, and the one that is most broadly applied for Sulphur Springs is the UTMB Index. It uses a private formula that includes features like course difficulty (distance and elevation) and the runner’s finishing time. By comparing the UTMB Index we can get an estimate of the most impressive performances at Sulphur Springs.</p>

<h3 id="the-top-10-breakdown">The Top 10 Breakdown</h3>

<p>Looking at just the most recent years when the Index was most readily applied we can rank the top scores. At a glance these tell us which events have seen the highest competition in each gender and who had the most impressive performances.</p>

<p><img src="/uploads/SulphurSprings/utmb_top_10_men.png" alt="" /></p>

<p>While the 50k has produced the best individual and group performances for the men, the 100M and 20k take more of a center stage for the women.</p>

<p><img src="/uploads/SulphurSprings/utmb_top_10_women.png" alt="" /></p>

<h3 id="course-records">Course Records</h3>

<p>This talent naturally leads to shattered records. Comparing the progression of course records over time, the impact of this new competitive era is striking, with many of the records being updated in the past few years.</p>

<p><img src="/uploads/SulphurSprings/course_records.png" alt="" /></p>

<p>I wanted to look at which of these course records was the most difficult to beat by comparing the top UTMB Index for each event. I wasn’t surprised to see Alex and Amanda on top, but as I examined the rest I noticed one of the quirks of using the Index.</p>

<p><img src="/uploads/SulphurSprings/utmb_course_records.png" alt="" /></p>

<p>The UTMB Index ranks performances based on field quality and race conditions, not solely on finish time. In two cases, the highest UTMB Index doesn’t belong to the actual course record holder: Allison Thompson holds the Women’s 20k CR (1:29:26, 2022) but Tanis Bolton achieved a higher UTMB Index (664 vs 656) in 2024, and Robert Brouillette holds the Men’s 100k CR (8:07:22, 2025) but Elias Kibreab posted a higher UTMB Index (735 vs 729) in 2023.</p>

<h3 id="how-deep-is-the-depth---race-competitiveness-index">How Deep is the Depth? - Race Competitiveness Index</h3>

<p>Using the UTMB Index, I calculated the <strong>Race Competitiveness Index (RCI)</strong> for Sulphur Springs over the years. This metric, pioneered by <a href="https://fantasy.freetrail.com/news/2026/01/realized-competition-index">Travis Loncar at Freetrail</a>, summarizes the field structure by taking the top finishers, averaging their UTMB Index scores, and subtracting one standard deviation.</p>
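
<p>As a rough illustration of that calculation, here is a minimal sketch. The cutoff of 10 finishers and the use of the population standard deviation are assumptions; the published metric may define either differently, and the scores below are made up.</p>

```python
from statistics import mean, pstdev


def race_competitiveness_index(utmb_indexes, top_n=10):
    """Mean of the top_n UTMB Index scores minus one standard deviation.

    top_n and the population standard deviation are assumptions in this sketch.
    """
    top = sorted(utmb_indexes, reverse=True)[:top_n]
    return mean(top) - pstdev(top)


# Hypothetical scores for two fields with the same winner but different depth.
deep_field = [800, 790, 785, 780, 775]
thin_field = [800, 700, 650, 600, 550]
deep_rci = race_competitiveness_index(deep_field, top_n=5)
thin_rci = race_competitiveness_index(thin_field, top_n=5)
print(f"deep field: {deep_rci:.1f}, thin field: {thin_rci:.1f}")
```

<p>Subtracting the spread penalizes a field with one standout runner and little behind them, so the deeper field scores higher even though both winners have the same index.</p>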

<p>The results show a clear upward trajectory in the 100k and 100M, while remaining relatively flat for the already highly competitive 20k and 50k. The depth of competition is increasing, meaning it requires a higher and higher UTMB Index to finish in the top 10.</p>

<p><img src="/uploads/SulphurSprings/rci_trends_regression.png" alt="" /></p>

<h3 id="finish-time-distributions">Finish Time Distributions</h3>

<p>Looking at the distribution of finish times by placement highlights the density of the competitive field.</p>

<p><img src="/uploads/SulphurSprings/finish_times_by_place_M.png" alt="" /></p>

<p><img src="/uploads/SulphurSprings/finish_times_by_place_F.png" alt="" /></p>

<p>The men’s and women’s times have gotten faster over the past 3 years. This coincides with a drop in race-day temperature, but because the RCI rose over the same period, it looks like significantly more talent is also showing up for the races.</p>

<hr />

<h2 id="4-masterclass-in-consistency-how-the-winners-won">4. Masterclass in Consistency: How the Winners Won</h2>

<p>Another way we can look at how competitive the races are is by looking at how perfect the execution had to be in them, and how much the positions changed during the race. The more competitive the races become, the more important it becomes to run as optimally as possible, with minimal pacing errors.</p>

<h3 id="the-mathematics-of-slowing-down">The Mathematics of Slowing Down</h3>

<p>To quantify this consistency, I ran a linear slope analysis on every runner’s lap paces from 2022 to 2025 to see exactly what percentage they “slowed down” from their first lap to their last.</p>
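
<p>A minimal sketch of that kind of slope analysis, assuming one pace value per lap and an ordinary least-squares line through them. The function name and sample paces are hypothetical, and the exact fitting choices behind the chart may differ.</p>

```python
def slowdown_percent(lap_paces):
    """Fit pace = intercept + slope * lap by least squares and return the fitted
    change from first to last lap as a percentage of the fitted first lap.
    """
    n = len(lap_paces)
    if n < 2:
        raise ValueError("need at least two laps")
    laps = range(n)
    mean_x = sum(laps) / n
    mean_y = sum(lap_paces) / n
    sxx = sum((x - mean_x) ** 2 for x in laps)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(laps, lap_paces))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    fitted_first = intercept
    fitted_last = intercept + slope * (n - 1)
    return 100.0 * (fitted_last - fitted_first) / fitted_first


# Hypothetical lap paces in min/km for a five-loop race.
print(f"{slowdown_percent([5.0, 5.2, 5.5, 5.9, 6.0]):.1f}%")
```

<p>Using the fitted line rather than the raw first and last laps keeps one unusually fast or slow lap from dominating the result. A negative value means the runner sped up over the race.</p>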

<p>The results highlight a massive disparity between the winners and the rest of the field:</p>

<ul>
  <li><strong>50k Winners</strong> are pacing masters, only slowing down by <strong>1% to 3%</strong> over the entire race (compared to a <strong>22% to 37%</strong> slowdown for the field average).</li>
  <li><strong>100M Winners</strong> slow down by roughly <strong>37% to 55%</strong>, but the rest of the 100-mile field degrades by a brutal <strong>65% to 77%</strong>.</li>
  <li>I even found a rare <strong>negative split</strong> in our dataset—Mike Fickel managed a -7% pace acceleration in his 2022 100k victory.</li>
</ul>

<p><img src="/uploads/SulphurSprings/pacing_slope_bars.png" alt="" /></p>

<p>Generally we see larger slowdowns as the races get longer, as the race becomes less about metabolic efficiency and more about mechanical breakdown.</p>

<h3 id="gender-differences-in-execution">Gender Differences in Execution</h3>

<p>When we compare the average slowdown of the Top 3 of both genders directly against each other, they behave remarkably similarly. Only the 50M stands out as different, possibly due to the historically weaker competition depth in that race.</p>

<p><img src="/uploads/SulphurSprings/pacing_slope_comparison.png" alt="" /></p>

<h3 id="the-carnage-factor">The Carnage Factor</h3>

<p><strong>“Carnage/Rally Charts”</strong> (an idea from <a href="https://johnsug.substack.com/">John Sugden’s substack Over the Ridgeline</a>) show how many places changed at each lap split. In the chart below tracking the overall Top 10 Finishers, lines that drop indicate runners who went out too hard and lost multiple places late in the race, while others started more conservative and moved into the Top 10 late in the race.</p>

<p><img src="/uploads/SulphurSprings/50k_2025_carnage.png" alt="" /></p>
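
<p>The data behind a chart like this is each runner’s position at every lap split. A minimal sketch, assuming lap durations in seconds and that every runner completed the same number of laps; the runner names and times are made up.</p>

```python
from itertools import accumulate


def positions_by_lap(lap_times):
    """Return {runner: [position after lap 1, after lap 2, ...]}.

    lap_times maps runner -> list of lap durations in seconds; every runner
    is assumed to have completed the same number of laps.
    """
    cumulative = {name: list(accumulate(laps)) for name, laps in lap_times.items()}
    n_laps = len(next(iter(cumulative.values())))
    positions = {name: [] for name in lap_times}
    for lap in range(n_laps):
        order = sorted(cumulative, key=lambda name: cumulative[name][lap])
        for place, name in enumerate(order, start=1):
            positions[name].append(place)
    return positions


# Hypothetical three-lap race: "Fast Start" fades while "Even Pacer" rallies.
example = {
    "Fast Start": [5000, 5600, 6400],
    "Even Pacer": [5300, 5350, 5400],
    "Steady": [5200, 5500, 5900],
}
for runner, places in positions_by_lap(example).items():
    print(runner, places)
```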

<p>The takeaway is that the race is getting competitive enough that placing well is no longer just about being the fittest; you have to be smart with your pacing plan as well.</p>

<hr />

<h2 id="5-scale--national-impact">5. Scale &amp; National Impact</h2>

<h3 id="the-2025-circuit-at-a-glance">The 2025 Circuit at a Glance</h3>

<p>Mapping the entire Canadian season makes it clear that Sulphur Springs isn’t just one race among many; it is the best option for the early season. In the chart below, marker sizes are scaled to the total number of finishers for each event, demonstrating the massive footprint Sulphur Springs holds at this time of the year.</p>

<p><img src="/uploads/SulphurSprings/canada_2025_calendar_timeline.png?v=2" alt="" /></p>

<hr />

<h2 id="6-the-western-states-gateway">6. The Western States Gateway</h2>

<p>Perhaps the most significant shift in recent years was Sulphur Springs becoming an official <strong>Western States 100 Qualifier</strong> in 2024. The impact on participation in the “long” distances (100k and 100M) was immediate and massive.</p>

<h3 id="the-qualifier-surge">The Qualifier Surge</h3>

<p>Since joining the WS circuit, the number of finishers aiming for a qualifier spot has skyrocketed, turning the event into a high-stakes arena for runners across the continent.</p>

<p><img src="/uploads/SulphurSprings/ss_long_distance_surge.png?v=4" alt="" /></p>

<p>This shift isn’t just numerical; it represents a fundamental reshaping of the event’s demographics.</p>

<p><img src="/uploads/SulphurSprings/participation_stacked.png" alt="" /></p>

<p>Prior to 2024, the 100k and 100M distances historically made up just 4.4% of the entrants, but now those premier, qualifier-eligible distances account for a striking 33.0% of the entire field.</p>

<p>The importance of getting the standard to enter Western States can be seen by the breakdown of runners above and below the qualifier time. Instead of continuing the bell curve there is a steep dropoff, suggesting that those who were close to missing the standard may have either sped up and snuck under, or chose not to finish. The charts below differentiate the years Sulphur Springs was actually a WS Qualifying race (2024 and 2025) versus the historical finish times mapped against the same hypothetical cutoffs:</p>

<p><img src="/uploads/SulphurSprings/ws_qualifying_100k.png?v=3" alt="" /></p>

<h3 id="a-massive-catchment-area">A Massive Catchment Area</h3>

<p>Why such growth? Because Sulphur Springs serves as the primary qualifier for a massive population center. I did a catchment analysis to see the closest population centers to each Western States qualifying race in North America, and it shows that for millions of people in Ontario, Quebec, and the Northeast US, Sulphur is the closest and most efficient path to Western States.</p>

<p>This analysis was done by retrieving the exact coordinates for the largest population centers in Canada and the US, computing the closest geographical spherical distance (“as the crow flies”) to any Western States qualifying race, and generating a Voronoi diagram based on those closest-race designations.</p>
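
<p>The nearest-race step can be sketched with the haversine formula. Labelling every population centre with its closest race is what produces the Voronoi-style regions. The coordinates below are approximate, and the second race is a made-up placeholder rather than a real qualifier.</p>

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_race(city, races):
    """Return the name of the race closest to city, a (lat, lon) pair."""
    return min(races, key=lambda name: haversine_km(*city, *races[name]))


# Approximate coordinates; "Hypothetical Western Qualifier" is a made-up race.
races = {
    "Sulphur Springs": (43.22, -79.98),
    "Hypothetical Western Qualifier": (39.0, -120.0),
}
toronto = (43.65, -79.38)
print(nearest_race(toronto, races))
```

<p>Straight-line distance ignores borders, flights, and driving routes, which is one reason the model is only an approximation of where runners will actually choose to qualify.</p>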

<p><img src="/uploads/SulphurSprings/race_locations_voronoi.png" alt="" /></p>

<p>As we will see in the next section, although the model is imperfect, 89% of the finishers at the Sulphur Springs ultra distances in 2025 came from this catchment.</p>

<hr />

<h2 id="7-the-pull-of-sulphur-springs-where-do-runners-come-from">7. The Pull of Sulphur Springs: Where Do Runners Come From?</h2>

<p>We know Sulphur Springs serves as a massively important Western States qualifier, but who exactly is showing up to the start line?</p>

<p>By geocoding the participant list, I’ve mapped the actual catchment area of the race. The data reveals that while it is an Ontario institution, its pull reaches across the eastern seaboard of the United States and deep into Quebec.</p>

<p><img src="/uploads/SulphurSprings/sulphur_springs_participants.png" alt="" />
<img src="/uploads/SulphurSprings/sulphur_springs_distance_hist.png?v=3" alt="" /></p>

<p>However, if we look at Geographical Diversity normalized by the sheer size of the event (Unique Cities per 100 Finishers), Sulphur Springs reveals its true nature: it is a massive, hyper-local race. Because it sits right on the doorstep of the Greater Toronto Area, its starting corral is heavily saturated by the local population.</p>

<p><img src="/uploads/SulphurSprings/geographical_diversity_normalized.png?v=3" alt="" /></p>

<p>This concentration of local talent has already driven the course records down to highly competitive times year after year. However, as the race continues to build its national prestige as a Western States qualifier—and begins drawing in elite talent from across the country and the US on the same scale as destination races like QMT or UTHC—this race is poised to become absurdly competitive at the front of the pack.</p>

<hr />

<h2 id="conclusion">Conclusion</h2>

<p>Sulphur Springs 2025 wasn’t just another race; it was a demonstration of what happens when a historic event meets elite performance. It is a juggernaut on the Canadian ultra scene, and the data proves it is only getting faster.</p>

<p>Will 2026 see the 100k course records fall again? The starting gun goes off in May. If you want to test yourself against one of the deepest fields in Canada, it’s time to start training for the Ancaster hills.</p>

<p><img src="/uploads/SulphurSprings/FPX18657.jpg" alt="" /></p>]]></content><author><name>Jeremy Walsh</name></author><summary type="html"><![CDATA[An analysis of the 2025 Sulphur Springs results and how it compares to other Canadian ultras.]]></summary></entry><entry><title type="html">Who of my friends is the best Go-Karter?</title><link href="https://jeremywalsh.ca/2022/06/19/who-of-my-friends-is-the-best-go-karter.html" rel="alternate" type="text/html" title="Who of my friends is the best Go-Karter?" /><published>2022-06-19T00:00:00-04:00</published><updated>2022-06-19T00:00:00-04:00</updated><id>https://jeremywalsh.ca/2022/06/19/who-of-my-friends-is-the-best-go-karter</id><content type="html" xml:base="https://jeremywalsh.ca/2022/06/19/who-of-my-friends-is-the-best-go-karter.html"><![CDATA[<p>My friends and I recently went Go-Karting, and after four 16 minute races we wanted to know who among us was the overall best driver. One method of doing that would be to look at the place we all came in at each race, assign a point value for each place and sum the points. You could do the same method by taking only the fastest lap from each driver every race. Importantly, neither of these methods take the actual go karts into account. We were able to drive in different karts for each race and had the feeling that some karts were better than others. Some karts skidded more easily, and some felt like that had better engines.</p>

<p>Here is how I went about answering the question of who was the best driver.</p>

<h3 id="data-preparation">Data Preparation</h3>

<p>After each race we were handed the printout of our lap splits.</p>

<p><img src="/uploads/img_3186.JPG" alt="My nickname here was an effort to rile up the bachelor this party was for." /></p>

<p>We had 7 drivers in our group over 4 races, so I had 28 sheets of lap splits to ingest. I started reading the lap times in with OpenCV, but then pivoted to Microsoft tools to speed up this one-off process.</p>

<p>Once I had all the pictures converted to CSVs I realized I was missing the kart numbers. Thankfully the people at the <a href="https://www.hamiltonindoorgokarts.com/">Go Kart facility</a> were able to send me all our kart numbers, and a few of the lap time pictures I was missing.</p>

<h3 id="race-result-scoring">Race Result Scoring</h3>

<h4 id="driver-scores">Driver Scores</h4>

<p><img src="/uploads/points-summary.png" alt="" /></p>

<p>Taking the placings from the driver that went the furthest in the fastest time and then applying Mario Kart or F1 scoring gets similar results. The one difference is the first and second drivers are reversed when using F1 Sprint points.</p>
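
<p>For illustration, here is a minimal sketch of points-based scoring. The tables assume Mario Kart 8 points and the F1 Sprint points used since 2022, truncated to 7 drivers; which Mario Kart table the chart above used is not stated, and the placings are made up.</p>

```python
# Points by finishing place (index 0 = first place), truncated to 7 drivers.
MARIO_KART_8 = [15, 12, 10, 9, 8, 7, 6]
F1_SPRINT = [8, 7, 6, 5, 4, 3, 2]


def total_points(placings, table):
    """Sum points per driver; placings maps driver -> list of 1-based places."""
    return {
        driver: sum(table[place - 1] for place in places)
        for driver, places in placings.items()
    }


# Hypothetical placings for two of the drivers over four races, not the real results.
placings = {"Driver A": [1, 1, 3, 4], "Driver B": [2, 2, 2, 2]}
mario_totals = total_points(placings, MARIO_KART_8)
sprint_totals = total_points(placings, F1_SPRINT)
print(mario_totals)
print(sprint_totals)
```

<p>In this made-up case the two systems disagree on the leader: the larger bonus for a win in the Mario Kart table favours the driver with two victories, while the flatter Sprint table favours the consistent runner-up.</p>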

<p>Assigning points for fastest lap in each race sorts the drivers differently.</p>

<p><img src="/uploads/fastest-lap-points.png" alt="" /></p>

<p>If we wanted to know who the fastest driver was, we may use this ranking, but it misses the racing portion that the finish position can give us.</p>

<p>All of this is assuming our karts were all equal. The feeling we had was that each kart was a little different, but was there a difference in the times for each kart?</p>

<h4 id="kart-scoring">Kart Scoring</h4>

<p>The karts were randomly assigned by the staff before each race, and most of us got to use a unique kart for each race, with most karts doing 3-4 races.</p>

<p><img src="/uploads/kart-numbers.png" alt="" /></p>

<p>To determine which karts were fastest I’m just going to look at the fastest lap each kart had.</p>

<p><img src="/uploads/kart-fastest-lap.png" alt="" /></p>

<p>The difference between the fastest kart and slowest was 1.6 seconds, which is 12.5% slower.</p>

<p>Though this seems like a good case we can’t jump to any conclusions yet. The 4 fastest karts were the 4 karts that the top placing driver had. Is that due to the karts being fast, or the driver, or both?</p>

<h3 id="multivariate-regression">Multivariate Regression</h3>

<p>One way to examine who the best driver was is to do a regression on the kart numbers, drivers, times, and whatever other relevant information we can get our hands on and examine the feature importance. A model that accurately fits our data can help to tease apart the relationships.</p>

<p>The dataset I used for the regression analysis included:</p>

<ul>
  <li><strong>Race Number</strong>
    <ul>
      <li>None of us are seasoned go-karters so we improved with each race, especially after the first.</li>
    </ul>
  </li>
  <li><strong>Start Position</strong>
    <ul>
      <li>
        <p>On the small track we raced it was very difficult to pass, so starting first was expected to be a large advantage. This can be seen in each race, like here in race 3 where the leaders that got out cleanly went unchallenged, but the highly defensive driver Pat slowed down drivers behind him, even as they came up to lap him.</p>

        <p><img src="/uploads/gap-to-current-first-place-in-race3.png" alt="" /></p>
      </li>
    </ul>
  </li>
  <li><strong>Drivers</strong>
    <ul>
      <li>Our names one-hot-encoded for interpretation by the algorithm.</li>
    </ul>
  </li>
  <li><strong>Kart Numbers</strong>
    <ul>
      <li>The kart numbers one-hot-encoded so the algorithm treats each kart as its own category instead of reading the numbers as an ordered quantity.</li>
    </ul>
  </li>
  <li><strong>Filtered Mean Lap</strong>
    <ul>
      <li>This is the dependent variable that I’m trying to predict from the data. It is the mean lap time after removing the crash laps that were slower than 40s. It was chosen so as to limit the effect of these outlier laps when we had to stop for a kart to be turned around.</li>
    </ul>
  </li>
</ul>
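
<p>As a rough illustration of the filtered mean lap described above, here is a minimal sketch. The function name and the example lap times are made up for this post and are not taken from my repository; only the 40s cutoff comes from the analysis.</p>

<pre><code class="language-python">from statistics import mean

CRASH_THRESHOLD_S = 40.0  # laps slower than this are treated as crash laps


def filtered_mean_lap(lap_times, threshold=CRASH_THRESHOLD_S):
    """Mean lap time after dropping laps slower than the threshold."""
    clean = [t for t in lap_times if threshold >= t]
    if not clean:
        raise ValueError("no laps left after filtering")
    return mean(clean)


# Hypothetical laps for one driver in one race; the 55.0 is a crash lap.
laps = [14.0, 13.5, 55.0, 14.5]
print(filtered_mean_lap(laps))  # mean of 14.0, 13.5 and 14.5
</code></pre>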

<p>Running a ridge regression against all the data led to an accuracy score of 85%. The high accuracy is partially because the model is overfit and would likely do a poor job predicting future scores. Things like Ryan’s laps not all being sensed in our first race, or the randomness of our first time racing, would be corrected over more races. Given the small amount of data available, however, the coefficients seen below can still give us some insight.</p>

<p><img src="/uploads/regression-coefficients-for-filtered-mean-lap.png" alt="" /></p>

<p>The features that had the most influence on the filtered mean lap time have the highest magnitude coefficients. A positive coefficient means that feature is associated with a slower lap time, while a negative coefficient means that feature is associated with a faster lap time.</p>
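
<p>To make the setup concrete, here is a small sketch of a ridge regression on one-hot-encoded drivers and karts. It is not the code from my repository: it uses the closed-form ridge solution in NumPy, the rows are invented, and the helper names (<code>one_hot</code>, <code>ridge_fit</code>) and the <code>alpha</code> value are assumptions chosen for illustration.</p>

<pre><code class="language-python">import numpy as np


def one_hot(labels):
    """Return the sorted categories and a 0/1 column per category."""
    categories = sorted(set(labels))
    columns = np.array([[1.0 if label == c else 0.0 for c in categories]
                        for label in labels])
    return categories, columns


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression on centred data; returns (coefs, intercept)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    coefs = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return coefs, y_mean - x_mean @ coefs


# Hypothetical rows: (race number, start position, driver, kart, filtered mean lap).
rows = [
    (1, 1, "A", 3, 15.2), (1, 2, "B", 8, 15.0), (1, 3, "C", 5, 15.9),
    (2, 1, "B", 5, 14.9), (2, 2, "C", 3, 15.4), (2, 3, "A", 8, 14.6),
    (3, 1, "C", 8, 14.8), (3, 2, "A", 5, 14.9), (3, 3, "B", 3, 14.7),
]
drivers, driver_cols = one_hot([r[2] for r in rows])
karts, kart_cols = one_hot([r[3] for r in rows])
numeric = np.array([[r[0], r[1]] for r in rows], dtype=float)
X = np.hstack([numeric, driver_cols, kart_cols])
y = np.array([r[4] for r in rows])

coefs, intercept = ridge_fit(X, y, alpha=1.0)
names = (["race", "start"] + ["driver_" + d for d in drivers]
         + ["kart_" + str(k) for k in karts])
# Largest magnitude first; negative means a faster predicted lap.
for name, coef in sorted(zip(names, coefs), key=lambda pair: abs(pair[1]), reverse=True):
    print(f"{name:10s} {coef:+.3f}")
</code></pre>

<p>The penalty is what lets the model cope with having more dummy columns than the data can pin down exactly. Note that the race number and start position are on a different scale than the 0/1 columns, so their coefficients are not directly comparable in magnitude without scaling.</p>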

<p>Based on this dataset the best Go-Karter was Pat, and if he were put in Kart #8 in a later race of the day, we’d expect the combination to dominate.</p>

<h3 id="conclusion">Conclusion</h3>

<p>This analysis suggests that the karts have a meaningful impact on lap times. It also suggests that, despite scoring well, I am not the fastest driver. The model thinks that Pat was able to get more out of slower karts. Personally, I disagree with this conclusion, and will welcome the challenge to get more data in further racing.</p>

<p>The code I wrote for this analysis <a href="https://github.com/thereiswaldo/GoKarts">is posted on my github</a>.</p>]]></content><author><name>Jeremy Walsh</name></author><summary type="html"><![CDATA[My friends and I recently went Go-Karting, and after four 16 minute races we wanted to know who among us was the overall best driver. One method of doing that would be to look at the place we all came in at each race, assign a point value for each place and sum the points. You could do the same method by taking only the fastest lap from each driver every race. Importantly, neither of these methods take the actual go karts into account. We were able to drive in different karts for each race and had the feeling that some karts were better than others. Some karts skidded more easily, and some felt like that had better engines.]]></summary></entry><entry><title type="html">Pacing Strategy for Running Around the Bay</title><link href="https://jeremywalsh.ca/2022/03/12/pacing-strategy-for-running-around-the-bay.html" rel="alternate" type="text/html" title="Pacing Strategy for Running Around the Bay" /><published>2022-03-12T00:00:00-05:00</published><updated>2022-03-12T00:00:00-05:00</updated><id>https://jeremywalsh.ca/2022/03/12/pacing-strategy-for-running-around-the-bay</id><content type="html" xml:base="https://jeremywalsh.ca/2022/03/12/pacing-strategy-for-running-around-the-bay.html"><![CDATA[<p>A friend was wondering how to pace themselves at the Around the Bay 30km road race on March 27th. The race has a unique elevation profile, with several significant hills in the last 10km.</p>

<p><img src="/uploads/around-the-bay-course-profile.png" alt="" /></p>

<p>For a runner hoping to meet a specific time, it raises the question of whether they should start the race faster than their goal race pace. A faster starting pace in anticipation of slowing down on the hills near the end of the race makes sense, but how much faster? Conventional running wisdom is to run “even splits”, which means running the same pace for the entirety of a race. As I’m also going to be running the race this year, I want to know what the optimal pacing strategy for the Around the Bay course is. I’ll do that by comparing how well popular heuristics and models align with historical results at Around the Bay. First let’s review the methods that will be compared.</p>

<h3 id="pacing-heuristics">Pacing Heuristics</h3>

<h4 id="even-paceeven-effort">Even Pace/Even Effort</h4>

<p>The simplest of the pacing strategies are even pace and even effort. Even pacing simply means running the same pace throughout the race. It’s a good strategy for a flat course, but can be near impossible to replicate on a hilly course. The alternative is even effort, where the exertion level is kept constant, so pace slows on uphills and quickens on downhills. This is a good simple heuristic, but can be difficult to follow in a race. Unless one trains diligently by heart rate or power, it’s difficult to find and maintain an effort throughout a race. Thankfully, there are some guidelines for running at even effort.</p>

<p>First there are the rules developed from two of the most widely quoted studies on the subject.</p>

<h4 id="jack-daniels">Jack Daniels</h4>

<blockquote>
  <p>+12-15s/mile/% gradient incline</p>

  <p>-8s/mile/% gradient decline</p>
</blockquote>

<p>The legendary coach Jack Daniels is heavily quoted for a <a href="https://www.letsrun.com/forum/flat_read.php?thread=197366">comment he (jtupper) made online</a> referencing a study performed on the subject. The post suggested adding 12 to 15 seconds per mile for each percent of uphill gradient. The study also suggested that a downhill subtracts 8 seconds per mile for each percent of downhill gradient.</p>

<h4 id="john-kellogg">John Kellogg</h4>

<blockquote>
  <p>+9.2s/mile/% gradient incline</p>
</blockquote>

<p>John Kellogg made similar online posts, which have been <a href="https://docs.google.com/file/d/0B_zzkn1-wR0dRFNLT0tXTVlUN3FyZGpiVWRBNld0dw/edit?resourcekey=0-4GUJ056H30C6KtvbjGxmCA">summarized in this document</a>. His number was a more conservative 9.2 seconds added per mile for each percent of uphill gradient, which he further generalized to 1.74s for each 10ft of elevation gain regardless of distance covered. John never specified the effect of downhills, so we won’t use his numbers in our analysis.</p>
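
<p>The step from 9.2 seconds per mile per percent to 1.74 seconds per 10ft is just a unit conversion, which this short check works through. The variable names are mine.</p>

<pre><code class="language-python">FEET_PER_MILE = 5280

seconds_per_mile_per_percent = 9.2
# A 1% gradient held for one mile climbs 1% of 5280 ft.
feet_gained = FEET_PER_MILE * 0.01
seconds_per_10ft = seconds_per_mile_per_percent / feet_gained * 10
print(round(seconds_per_10ft, 2))  # 1.74
</code></pre>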

<p>Jack and John both expressed that their formulas are merely guidelines that seemed accurate on average for the people they studied. Since the studies were on runners of relatively similar ability to myself they should have some good information for me. Their guidelines don’t scale with pace, so they won’t be as accurate for runners much slower than 4min/km. For a more general tool we need a model.</p>

<h3 id="models">Models</h3>

<h4 id="grade-adjusted-pace">Grade Adjusted Pace</h4>

<p>Strava has a model they apply for all subscribers on their platform called Grade Adjusted Pace. I’ll discuss it further in an upcoming post about cross country courses. <a href="https://medium.com/strava-engineering/an-improved-gap-model-8b07ae8886c3">This blog post</a> details the model and compares it with the results of a <a href="https://pubmed.ncbi.nlm.nih.gov/12183501/">2002 paper</a> on the matter. For our comparison we will use both the Strava and Minetti curves. As shown in the diagram, they use Equal Energy Cost and Equal Heartrate as proxies for even effort.</p>

<p><img src="https://miro.medium.com/max/1400/1*_TwofsNS872wbUS12ykKPQ.png" alt="" /></p>
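
<p>For a sense of how an equal energy cost curve turns into a pace adjustment, here is a sketch using the fifth-order polynomial commonly quoted from the Minetti paper. I am reproducing the coefficients from memory of that fit, so check them against the paper before relying on them. The assumption that even effort means constant energy per second is mine for this example, and the function names are hypothetical.</p>

<pre><code class="language-python"># Coefficients of the polynomial fit commonly quoted from Minetti et al. (2002),
# highest power first; i is the gradient as a fraction (0.05 = 5% uphill).
MINETTI_COEFFS = (155.4, -30.4, -43.3, 46.3, 19.5, 3.6)


def energy_cost(i):
    """Metabolic cost of running in J per kg per metre at gradient i (Horner's rule)."""
    cost = 0.0
    for c in MINETTI_COEFFS:
        cost = cost * i + c
    return cost


def equal_energy_pace_factor(i):
    """Pace multiplier relative to flat ground if energy per second is held constant."""
    return energy_cost(i) / energy_cost(0.0)


for grade in (-0.05, 0.0, 0.05):
    print(f"{grade:+.0%}: {equal_energy_pace_factor(grade):.3f}")
</code></pre>

<p>Holding energy per second constant means speed scales with the inverse of the cost per metre, so pace scales with the cost itself. That is why the ratio to the flat cost can be read directly as a pace multiplier.</p>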

<h4 id="normalized-graded-pace">Normalized Graded Pace</h4>

<p>TrainingPeaks has a similar metric called <a href="https://www.trainingpeaks.com/learn/articles/what-is-normalized-graded-pace/#:\~:text=Normalized%20Graded%20Pace%20(NGP)%20is%20the%20adjusted%20pace%20reported%20from,of%20running%20on%20varied%20terrain.">Normalized Graded Pace</a>, but there is less information on how it is calculated so I won’t be investigating it here.</p>

<h2 id="data-mining">Data Mining</h2>

<h4 id="results">Results</h4>

<p>Most of the data for this analysis is scraped from results posted on <a href="https://www.sportstats.ca/" title="https://www.sportstats.ca/">https://www.sportstats.ca/</a>. For the races from 2016-2019 the 10km, 15km, 20km, and finish splits for each athlete are available. Splits from earlier years are accessible, but more difficult to obtain, and the 4 years captured have proven sufficient for our purposes.</p>

<h4 id="gps">GPS</h4>

<p>To get the course elevation profile I’ve used the GPS recordings from my own previous runs on the course. The GPS files are parsed into a pandas dataframe for easy calculations and plotting.</p>

<h2 id="data-exploration">Data Exploration</h2>

<p>The results for the four years from 2016-19 were all combined for analysis. The split time distributions show the predictable widening of the distribution as the race spreads out over time, and a small deviation from normal as athletes who went out too fast cause a long tail to the final distributions.</p>

<p><img src="/uploads/2019-around-the-bay-split-times.png" alt="" /></p>

<p>The raw data only has the running time for each athlete at each distance (10km, 15km, 20km, 30km), but we want to look at the pace they ran for each of the segments (0-10km, 10-15km, 15-20km, 20-30km). Calculating these gives us these averages.</p>

<p><img src="/uploads/2016-19-average-pace-per-segment.png" alt="" /></p>
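
<p>The conversion from cumulative split times to segment paces is simple differencing. Here is a minimal sketch with a made-up runner; the function names and the splits are illustrative only and are not taken from the results.</p>

<pre><code class="language-python">def parse_time(text):
    """Convert 'H:MM:SS' to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def segment_paces(splits):
    """Turn cumulative (distance_km, time_s) splits into pace in s/km per segment."""
    paces = []
    prev_km, prev_s = 0.0, 0.0
    for km, total_s in splits:
        paces.append((total_s - prev_s) / (km - prev_km))
        prev_km, prev_s = km, total_s
    return paces


# Hypothetical runner finishing in 2:20:00.
splits = [(10, parse_time("0:45:00")), (15, parse_time("1:08:00")),
          (20, parse_time("1:31:30")), (30, parse_time("2:20:00"))]
print(segment_paces(splits))  # [270.0, 276.0, 282.0, 291.0]
</code></pre>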

<p>As we expected, the final 10km of the race is the slowest (largest pace). On average the first 10km is 40s/km faster than the last. A slow finish is fairly common in road races, as most runners are optimistic with their pacing strategy. Since I am going to plan for success, I’ll have to remove some of these outliers.</p>

<h2 id="data-filtering">Data Filtering</h2>

<p>Keeping my filtering as simple as possible, I’ve selected two comparison criteria. The first is to only look at runners that have previously run the race in my dataset. The thought is that these multiple finishers will know the course well and how they should pace themselves. Dropping each runner’s first performance in the dataset leaves a wide distribution of times and keeps 4622 performances (26% of the 17440 in the dataset).</p>

<p><img src="/uploads/multiple-finishers-30km-time-distribution.png" alt="" /></p>

<p>The other, more relevant filter is to keep only the “elite” athletes who ran faster than two hours. Though the definition of elite is up for debate, this is the top 1.6% of all finishers, and leaves just 279 performances. This is large enough for me to feel confident in the aggregated results while maximizing the share of runners that ran close to an optimal pacing strategy. My assumption is that on average elite runners maintain an even effort.</p>

<p><img src="/uploads/elites-30km-time-distribution.png" alt="" /></p>

<p>Being the leading end of the distribution, most of the times are near the two hour mark.</p>

<h2 id="model-evaluation">Model Evaluation</h2>

<h4 id="race-result-analysis">Race Result Analysis</h4>

<p>With these filters in place we can take each runner’s pace for each segment and divide that by their average pace for the entire race to get a percentage.</p>

<p><img src="/uploads/around-the-bay-pacing-form-2016-19.png" alt="" /></p>
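
<p>As a sketch of that normalization, the snippet below expresses each segment pace as a percentage of the whole-race average pace. The runner is hypothetical and the function name is mine.</p>

<pre><code class="language-python">def pace_percentages(segment_paces_s_per_km, segment_lengths_km):
    """Express each segment pace as a percentage of the whole-race average pace."""
    total_time = sum(p * d for p, d in zip(segment_paces_s_per_km, segment_lengths_km))
    average_pace = total_time / sum(segment_lengths_km)
    return [100 * p / average_pace for p in segment_paces_s_per_km]


# Hypothetical segment paces in s/km for 0-10km, 10-15km, 15-20km and 20-30km.
paces = [270.0, 276.0, 282.0, 291.0]
lengths = [10, 5, 5, 10]
print([round(p, 1) for p in pace_percentages(paces, lengths)])  # [96.4, 98.6, 100.7, 103.9]
</code></pre>

<p>A value below 100% means the segment was run faster than the runner’s own average, which makes runners of very different ability comparable on the same axis.</p>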

<p>The different filters are plotted along the course profile and compared with the even pace heuristic (constant 100% pace). Each group ran faster (pace &lt;100%) for the first 20km of the race, and slower for the last 10km. This feels right since the first two-thirds of the race are downhill or flat, and the last third is very hilly. Combining all the runners highlights the suboptimal pacing of the average Around the Bay runner, as they tend to go out significantly faster than they finish. On average the runners don’t save enough energy for the hills at the end of the race. To back this up, if we look at the correlation between the paces, we can see that the best predictor of a runner’s pace in the last 10km of the race is what they ran in the first 10km.</p>

<p><img src="/uploads/all-pacing-correlation.png" alt="" /></p>

<p>A correlation of -0.89 is very strong, and means that on average the athletes that start too fast finish the slowest.</p>

<p>The multi-finishers tend to run closer to even splits, while the elite runners only have 3% variation in pace. However, the elite filter is not perfect, as there are several outliers that seem to start out too fast (10% faster than they average!). This shifts our averages slightly to faster starts and slower finishes, but not significantly, so we’ll leave them in.</p>

<p><img src="/uploads/elite-pace-percentage-boxplot.png" alt="" /></p>

<p><img src="/uploads/elites-pacing-correlation.png" alt="" /></p>

<p>All of this is informative, but hasn’t completely answered the question about what pacing strategy is optimal. I want to trust the average elite pacing, but it could be that the outliers are racing optimally. In an attempt to validate their performance we’ll include the other pacing models.</p>

<h4 id="grade-adjusted-pace-model">Grade Adjusted Pace Model</h4>

<p>To include the Grade Adjusted Pace models I calculated the gradient throughout the course and applied each model. I then averaged the percentage pace over each segment.</p>

<p>To use the Jack Daniels equation I converted the time change in each segment to a percentage by comparing against my goal pace (3:36 min/km). With slower goal paces the effect predicted by this formula lessens, because it adds a constant time regardless of pace, so seen as a percentage it regresses toward even pacing.</p>
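
<p>Here is a rough sketch of that conversion for a single gradient. It assumes the lower 12s/mile uphill figure and the 8s/mile downhill figure from the guideline; the function name and the 2% example gradient are mine.</p>

<pre><code class="language-python">KM_PER_MILE = 1.609344


def daniels_pace_percent(grade_percent, goal_pace_s_per_km,
                         uphill_s_per_mile=12.0, downhill_s_per_mile=8.0):
    """Pace as a percentage of goal pace for a given gradient in percent."""
    if grade_percent >= 0:
        delta_per_mile = uphill_s_per_mile * grade_percent
    else:
        delta_per_mile = downhill_s_per_mile * grade_percent  # negative, so faster
    delta_per_km = delta_per_mile / KM_PER_MILE
    return 100 * (goal_pace_s_per_km + delta_per_km) / goal_pace_s_per_km


goal = 3 * 60 + 36  # 3:36 min/km in seconds
print(round(daniels_pace_percent(2.0, goal), 1))      # 106.9
print(round(daniels_pace_percent(2.0, 2 * goal), 1))  # 103.5
</code></pre>

<p>The second line doubles the goal pace to show the effect shrinking as a percentage for a slower runner, since the seconds added per kilometre stay the same.</p>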

<p><img src="/uploads/around-the-bay-pacing-comparison.png" alt="" /></p>

<p>Interestingly, the closest model to the elite performance was actually the Jack Daniels formula. I think this is because the Jack Daniels study was done on elite athletes, while the Strava and Minetti studies were more for the average runner. For me this comparison validates that the elite athletes have followed a good pacing strategy, as they are within 1% of each pacing model.</p>

<h2 id="prescription">Prescription</h2>

<p>The simplified takeaway from this is that runners at Around the Bay should average the first half of the race at around 1% faster than their goal pace. This sets them up to run an even effort for the duration of the race, providing some banked time to lose in the hills. For me that means starting two to three seconds per kilometer faster than my goal pace. Speaking from experience, the last 10 kilometers is no longer about holding yourself back with proper pacing. Instead you are pushing yourself to the limit as you fight towards the finish line. Slowing down on this section is to be expected, and maintaining a pace 2-3% slower than goal pace in this section is something to be extremely happy with.</p>]]></content><author><name>Jeremy Walsh</name></author><summary type="html"><![CDATA[A friend was wondering how to pace themselves at the Around the Bay 30km road race on March 27th. The race has a unique elevation profile, with several significant hills in the last 10km.]]></summary></entry><entry><title type="html">Canadian Olympic Athletics</title><link href="https://jeremywalsh.ca/2021/08/13/canadian-olympic-athletics.html" rel="alternate" type="text/html" title="Canadian Olympic Athletics" /><published>2021-08-13T00:00:00-04:00</published><updated>2021-08-13T00:00:00-04:00</updated><id>https://jeremywalsh.ca/2021/08/13/canadian-olympic-athletics</id><content type="html" xml:base="https://jeremywalsh.ca/2021/08/13/canadian-olympic-athletics.html"><![CDATA[<p>Canada won 24 medals at the Tokyo 2020 Olympic games, with 6 coming from Athletics. The 6 medals equals the total medals from the Athletics team in 2016, and considering the two golds vs 2016’s single gold medal, is the best performance from Canadian Athletics since the 1932 Los Angeles games. 
The 6 medals put Canadian Athletics in <a href="https://www.worldathletics.org/competitions/olympic-games/the-xxxii-olympic-games-athletics-7132391/medaltable">8th place</a>, while the team’s 15 top-8 placings put <a href="https://www.worldathletics.org/competitions/olympic-games/the-xxxii-olympic-games-athletics-7132391/placingtable">Canada in 5th place,</a> the best in modern games history. With such an impressive team performance, I wanted to highlight the medal winners.</p>

<h3 id="4x100m-bronze">4x100m Bronze</h3>

<p>The team of Aaron Brown, Jerome Blake, Brendon Rodney, and Andre De Grasse brought home the final Olympic medal of the athletics campaign with a bronze. The time of 37.70 is the <a href="https://athletics.ca/rankings-records/rankings/?y=0&amp;season=Outdoor&amp;area=National&amp;age_group_id=&amp;category=Relays&amp;event_id=136&amp;track_wind=No&amp;best_by_athlete=Yes">3rd best all time</a> from a Canadian team, behind only the bronze medal performance at the Rio 2016 games and the gold from the Atlanta 1996 games.</p>

<p>Great Britain’s silver medal winning time is the fastest silver medal time in Olympic history, pointing not just to the progression of sprint times, but to the depth of competition in the event.</p>

<p><img src="/uploads/olympic-4x100m-medal-winning-times.png" alt="Olympic 4X100m Medal Winning Times Scatter Plot" title="Olympic 4X100m Medal Winning Times" /></p>

<p>With <a href="https://www.cbc.ca/sports/olympics/summer/trackandfield/chinjindu-ujah-tests-positive-4-100m-tokyo-olympics-1.6138959">Great Britain’s silver medal winning team potentially being disqualified</a>, Canada could be upgraded to silver.</p>

<h3 id="andre-de-grasse-trifecta">Andre De Grasse Trifecta</h3>

<p>Andre De Grasse cemented himself as a superhuman sprinter and one of the greatest Olympians in Canadian history. His three medals in Tokyo 2020, matching his three from Rio 2016, put him in some illustrious company.<img src="/uploads/top-canadian-olympic-medalists.png" alt="Top Canadian Olympic Medalists By Medal" title="Top Canadian Olympic Medalists" /></p>

<p>With Penny Oleksiak and Andre De Grasse likely competing again in three years at the 2024 games, there is the potential for the top end of this graph to change again soon.</p>

<h5 id="100m">100m</h5>

<p>Andre’s 100m bronze medal keeps Canada as the country with the <a href="https://en.wikipedia.org/wiki/100_metres_at_the_Olympics#Medals_by_country">4th most medals</a> in the event. It is Canada’s 7th medal in the event, making it the most successful athletics event for Canada of all time.</p>

<p><img src="/uploads/canadianmedalstreemap.png" alt="Number of Canadian Medals in Each Event broken out by category and coloured for gender" title="Number of Canadian Medals in Each Event" /></p>

<h5 id="200m">200m</h5>

<p>De Grasse’s best event is the 200m, where he won gold and improved upon his own national record. With his incredible top speed, his average speed over the 200m was faster than over the 100m by 0.3 km/h.</p>

<p><img src="/uploads/speed-difference-in-de-grasse-s-races.png" alt="" /></p>
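
<p>The speed comparison is a quick calculation. The sketch below uses his finishing times from the two Tokyo finals, 9.89s for the 100m and 19.62s for the 200m, as I have them recorded; they are worth checking against the official results. The function name is mine.</p>

<pre><code class="language-python">def average_speed_kmh(distance_m, time_s):
    """Average speed in km/h over a race of the given distance and time."""
    return distance_m / time_s * 3.6


speed_100 = average_speed_kmh(100, 9.89)
speed_200 = average_speed_kmh(200, 19.62)
print(round(speed_100, 1), round(speed_200, 1))  # 36.4 36.7
print(round(speed_200 - speed_100, 1))           # 0.3
</code></pre>

<p>The 200m comes out ahead because the slow acceleration phase out of the blocks is spread over twice the distance.</p>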

<h3 id="mo-speed">Mo Speed</h3>

<p>Mohammed Ahmed has been the standout male distance runner in Canada for several years now, with Canadian records from 3000m to 10000m. At this Olympics he achieved his first Olympic medal with a bronze in the 5000m. Couple that with a gutsy 6th place in the 10000m, and Mohammed Ahmed is undeniably one of the greatest Canadian runners of all time.</p>

<p>Ahmed has steadily improved upon his 5000m time leading up to this Olympics, with only 3 years since 2009 that haven’t been improvements on the year before.</p>

<p><img src="/uploads/mohammed-ahemd-s-5000m-progression.png" alt="" /></p>

<h3 id="evan-dunfee-walks-away-with-bronze">Evan Dunfee Walks away with Bronze</h3>

<p>After a <a href="https://globalnews.ca/news/2892133/rio-2016-canadas-evan-dunfee-finishes-fourth-in-50-kilometre-race-walk-after-a-collision-near-finish-line/">contentious fourth place at the 2016 Rio Olympic games</a>, Evan Dunfee claimed the elusive bronze medal with a momentous final kick. Tokyo hosted the final 50km race walk at the Olympics, as the event ends its walk through Olympic history that started in 1932.</p>

<p>Dunfee is the Canadian record holder in the 5000m (18:39.08), 10000m (38:39.72), and 50km (3:41:38) race walks. His marathon split during his 50km record is approximately 3:09:30, which is a <a href="https://www.baa.org/races/boston-marathon/qualify">Boston Qualifier for men above the age of 40</a>.</p>

<p>When Dunfee broke the Canadian record in 2015, it had been standing since 1981. Dunfee has since bested his own record, and is clearly a standout in the distance when compared to the athletes that fill out the top 10.</p>

<p><img src="/uploads/top-10-canadian-50km-race-walkers.png" alt="" title="Top 10 Canadian 50km Race Walkers" /></p>

<h3 id="damien-warner---instagram-worthy">Damian Warner - Instagram Worthy</h3>

<p>Damian Warner was crowned the unofficial greatest athlete with his win in the Decathlon. Each individual performance is outstanding in its own right, from a 100m that would have placed him in the 100m final, to a long jump that would have been good enough for bronze. More analysis is likely warranted on the results themselves, but for now I wanted to look at his Instagram following during the games.</p>

<p>Instagram is a good measure of an athlete’s marketability, which has a large impact on sponsorship opportunities. Damian saw one of the largest upticks of any Canadian at the games, going from 12930 followers before his events started on August 4th to 23172 followers the day after he won gold. Interestingly, he only received a couple thousand more followers after being named the flagbearer on August 7th.</p>

<p><img src="/uploads/damien-warner-s-instagram-following-during-the-olympics.png" alt="" /></p>]]></content><author><name>Jeremy Walsh</name></author><summary type="html"><![CDATA[Canada won 24 medals at the Tokyo 2020 Olympic games, with 6 coming from Athletics. The 6 medals equals the total medals from the Athletics team in 2016, and considering the two golds vs 2016’s single gold medal, is the best performance from Canadian Athletics since the 1932 Los Angeles games. The 6 medals put Canadian Athletics in 8th place, while the team’s 15 top-8 placings put Canada in 5th place, the best in modern games history. With such an impressive team performance, I wanted to highlight the medal winners]]></summary></entry><entry><title type="html">Predicting Canadian University Cross Country Performances - Introduction</title><link href="https://jeremywalsh.ca/2021/06/05/predicting-canadian-university-cross-country-performances.html" rel="alternate" type="text/html" title="Predicting Canadian University Cross Country Performances - Introduction" /><published>2021-06-05T00:00:00-04:00</published><updated>2021-06-05T00:00:00-04:00</updated><id>https://jeremywalsh.ca/2021/06/05/predicting-canadian-university-cross-country-performances</id><content type="html" xml:base="https://jeremywalsh.ca/2021/06/05/predicting-canadian-university-cross-country-performances.html"><![CDATA[<p>A project of mine that keeps coming back to me is predicting USport/CIS Cross Country results. During my time at McMaster University I competed on the cross country team and like many of my competitors, enjoyed guessing how the championship races would play out. Most of these predictions are made at the team level, guessing which university will win the team title, and how the other schools will finish. 
I was involved in many different predictions, but the one I spent the most time on was a <a href="https://web.archive.org/web/20131105103135/http://www.trackie.com/track-and-field/Forum/cis-and-conference-individual-team-predictor/9714/1/" title="CIS and Conference Individual Team Predictor">full individual prediction of every athlete</a>. The thought process was that in order to better predict each team’s performance, I would look at how the individuals should perform against each other, build out a full simulated race, and then aggregate the individual results to the team level. This worked pretty well, but it relied on some guesswork and some broad assumptions.</p>

<p>Seven years down the road, my data science skills have improved and I wanted to revisit this problem. Predicting cross country results is difficult for many reasons, but before we talk about the roadblocks in our way I’ll give a brief overview of the relevant rules of the sport.</p>

<h2 id="what-you-need-to-know-about-canadian-cross-country">What You Need To Know About Canadian Cross Country</h2>

<p>Each September to November (barring global pandemics), Canadian University Cross Country teams participate in meets culminating in a regional and then Canadian championships (USports/CIS). Strictly speaking, there isn’t anything preventing schools from competing in the championship meets, but most schools only send teams that meet some performance criteria. The placing for each team is determined by the score of the top five individuals. Seven runners are allowed to start in the championship meets, and the sixth and seventh runners are considered displacers. Each athlete’s score is based on their overall place in the race, and the lowest total team score wins (further rules for scoring can be found <a href="https://usports.ca/uploads/hq/Playing_Regs/2020-21/200721_Playing_Regulations_Cross_Country_%28W%26M%29_ENG.pdf">here</a>). The best possible score would be if one team takes the first five positions, resulting in a score of 15 (1+2+3+4+5).</p>
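
<p>The scoring described above can be sketched in a few lines. This is a simplified version under my own assumptions: it scores every finisher by overall place, ignores ties, and leaves out the finer points of the regulations such as how runners from incomplete teams are handled. The function name and the example race are made up.</p>

<pre><code class="language-python">from collections import defaultdict

SCORERS = 5  # only a team's first five finishers count toward its score


def team_scores(finish_order):
    """finish_order lists each finisher's team, first place first.

    Returns {team: score} for teams with at least five finishers,
    using overall place as each runner's score.
    """
    places = defaultdict(list)
    for place, team in enumerate(finish_order, start=1):
        places[team].append(place)
    return {team: sum(p[:SCORERS]) for team, p in places.items()
            if len(p) >= SCORERS}


# Team A sweeps the first six places; its sixth runner does not score,
# but pushes every Team B runner back one place (a displacer).
race = ["A"] * 6 + ["B"] * 5
print(team_scores(race))  # {'A': 15, 'B': 45}
</code></pre>

<p>Without the displacer, Team B would have scored 40 (6+7+8+9+10), which is why the sixth and seventh runners still matter.</p>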

<p><img src="/uploads/cis2012wxc.gif" alt="" title="CIS Women's XC Championship Start 2012" /></p>

<h2 id="why-is-this-worth-doing">Why is this worth doing?</h2>

<p>This is usually a difficult question to ask of any data science project, but like everything I will post on this site, I find this problem fun and interesting. More broadly though I do think this analysis has value in the running community.</p>

<p>The main reason that is currently motivating me to revisit this project is enhancing the spectator experience. I have family members that would come to my races in university, and they would tell me that having the “insider” knowledge I could provide them about who was expected to win, how our team should perform, and some backstory on the favourites greatly enhanced their viewing experience. For a sport that is not considered very spectator friendly, we should be doing everything we can to educate our fans in an effort to improve their engagement.</p>

<p>Another benefit of doing these predictions is the aid they could provide to coaches and athletes. A prediction for a race could be used to give athletes a goal pace to train and race at, as well as show them competitors they should plan to keep up with. Coaches could project how individuals would perform when considering who to enter in the meets, and could analyze the development of their athletes over the years.</p>

<p>A potential side-benefit of this work is the results database that needs to be constructed. There currently isn’t a good place to find Canadian cross country race results. There are websites that host PDF results, but there isn’t an easy way to search those results for an athlete or team. Having one large database would be helpful for statistics and further analysis.</p>

<h2 id="difficulty-in-predicting-results">Difficulty in Predicting Results</h2>

<p>For many reasons, predicting performances in cross country is very difficult. This is, in my opinion, one of the things that makes guessing how teams, or even individual athletes, will perform on any given day so interesting. The main difficulties I’ve discovered are detailed here.</p>

<h5 id="1-courses-are-all-different"><strong>1. Courses are all different</strong></h5>

<p>Each course has its own elevation profile, number of turns, and terrain. One weekend a team might be climbing big hills on muddy trails with many hairpin turns, and the next race could be a flat hard-packed grass loop. These factors have a huge impact not just on the average time, but also on individual results, as some runners perform comparatively better on hills or in mud. To complicate this further, courses can change from year to year.</p>

<h5 id="2-race-distances-vary"><strong>2. Race distances vary</strong></h5>

<p>Over the course of the season, races could be from 5km to 10km in length, and are not expected to be precise. The <a href="https://usports.ca/uploads/hq/Playing_Regs/2020-21/200721_Playing_Regulations_Cross_Country_%28W%26M%29_ENG.pdf" title="USports XC Regulations">current regulations</a> for the national meet in USports require the length of the course to be within 25m of the 8000m nominal length. The early races in the season have no such requirement, so even the listed distances could be incorrect.</p>

<p>There are many online calculators for scaling running times from one distance to another (examples <a href="">1</a>, <a href="https://runsmartproject.com/calculator/">2</a>, <a href="https://lukehumphreyrunning.com/hmmcalculator/race_equivalency_calculator.php">3</a>). They work well in the aggregate, but at the individual level, some athletes are better tuned to the shorter distances, while others are better long distance runners.</p>

<h5 id="3-weather"><strong>3. Weather</strong></h5>

<p>As the races are run in the great outdoors of the Canadian fall, weather of all kinds impacts the runners. Extreme heat, cold, wind, rain, snow and sleet all have a unique effect on the course and the physiology of the runners.</p>

<h5 id="4-competitive-levelnumber-of-runners"><strong>4. Competitive level/number of runners</strong></h5>

<p>Some meets have very few teams participating while others can have more than 20. Having more runners in the field increases the competitiveness and provides more running companions. Home meets and championships are often placed as a high priority, with coaches adjusting preparation to have optimal performance, while smaller meets might be used more for a workout.</p>

<h5 id="5-time-in-season"><strong>5. Time in season</strong></h5>

<p>Though the season only lasts about 10 weeks, training plans have teams slowly improving throughout the season as they aim to perform their very best at the national championship in November. In other words, we’d expect the same athlete, on the same course, in the same weather, with the same competitors, to perform better on average in November than in September.</p>

<h5 id="6-events-in-a-season"><strong>6. Events in a season</strong></h5>

<p>Most teams compete about every two weeks. Typically they run three or four in-season races, then the regional meet, and finally the Canadian championship. Including the regional meet that is at best five data points we have for each runner.</p>

<h5 id="7-regions-are-separated-by-vast-distances"><strong>7. Regions are separated by vast distances</strong></h5>

<p>Canada is a big country and budgetary constraints mean most teams stay within their region. Eastern and western teams don’t meet until nationals. With little head-to-head competition, it is difficult to compare performances.</p>

<h5 id="8-injuries-and-sickness"><strong>8. Injuries and sickness</strong></h5>

<p>The combination of cool weather, the start of a school year with interactions with hundreds of other students, the stress of school work, and a heavy training load leaves athletes ripe for injury and illness. Some athletes miss races, and others underperform.</p>

<h5 id="9-bad-days"><strong>9. Bad days</strong></h5>

<p>Similar to, though not necessarily the same as, injuries and sickness are the days when athletes simply underperform. For tangible or intangible reasons, sometimes things just don’t go according to plan.</p>

<h5 id="10-athlete-individuality"><strong>10. Athlete Individuality</strong></h5>

<p>As I’ve alluded to, each athlete is unique. Though we may have records of athletes’ height and weight, we can’t directly measure their preference for long hilly races, or afternoon races instead of morning ones.</p>

<h5 id="11-athlete-age"><strong>11. Athlete Age</strong></h5>

<p>University is often a period of dramatic athletic development. An athlete in their fifth year generally runs faster than they did in their first year. Most athletes are of the same age range, though PhD students can be significantly older.</p>

<h5 id="12-data"><strong>12. Data</strong></h5>

<p>The data itself is another monumental hurdle in this process. Most meets have a record of the performances stored in a PDF, and most of those have been aggregated on trackie.ca. Unless they are from the same timing company, they come in varying formats, with varying degrees of information. Some may give splits, the weather, and the age of the runners, but most just list the place, time, runner name and team. Compiling every race in the history of Canadian university cross country may be impossible at this point, and even putting the last 10 years into a database is no easy feat.</p>

<p>These issues are daunting, and at the very least, bad days will always show up as an error source in our model. That said, for years anecdotal predictions have been made, and are still relatively accurate.</p>

<h2 id="prediction-evaluation">Prediction Evaluation</h2>

<p>When we reach the end of this blog series I want to be able to prove that I’ve built a good model. One of the evaluation metrics I would like to hit is outperforming the historical pundit prediction rate. I’ll do a longer breakdown in a future post, but a quick scan of previous predictions from coach polls and community rankings shows that the predicted and actual top 10 team placings usually have a correlation coefficient of around 0.85. We can see this in the 2019 USports Women’s Coaches Poll below. Most of the predictions were close, but the coaches greatly overestimated the McGill team.</p>

<p><img src="/uploads/2019-usports-women-s-cross-country-coaches-poll-accuracy.png" alt="" title="2019 USports Women's Cross Country Coaches Poll Accuracy" /></p>

<p>It would stand to reason that the knowledge pollsters use could all be compiled into a database, and then a data-driven approach could produce a more accurate result. I would consider this project a success if the final model produces top 10 team placing predictions at an equivalent or higher accuracy than the polls.</p>
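<p>To make the target concrete, here is a minimal sketch of how a poll could be scored against the real finishing order. The placings below are made up for illustration, and I’m assuming a plain Pearson correlation on the placings themselves; the polls I scanned may have been scored differently.</p>

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical poll: predicted placing for ten teams and where they finished.
predicted = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
actual = [1, 3, 2, 4, 8, 5, 6, 7, 10, 9]

score = pearson(predicted, actual)
print(round(score, 3))
```

<p>A model that beats the polls would need to score higher than the pollsters do on the same set of races.</p>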

<h2 id="first-step---course-analysis">First Step - Course Analysis</h2>

<p>My current plan is to essentially walk through each problem identified above and build out a solution for each. With each problem addressed I want to combine the solutions into one model, evaluate its accuracy, and iterate where needed. I hope to post every few weeks with a good-enough solution to each problem, starting with the comparison of each cross country race course.</p>]]></content><author><name>Jeremy Walsh</name></author><summary type="html"><![CDATA[A project of mine that keeps coming back to me is predicting USport/CIS Cross Country results. During my time at McMaster University I competed on the cross country team and like many of my competitors, enjoyed guessing how the championship races would play out. Most of these predictions are made at the team level, guessing which university will win the team title, and how the other schools will finish. I was involved in many different predictions, but the one I spent the most time on was a full individual prediction of every athlete. The thought process was that in order to better predict each team’s performance, I would look at how the individuals should perform against each other, build out a full simulated race, and then aggregate the individual results to the team level. 
This worked pretty well, but it relied on some guesswork and some broad assumptions.]]></summary></entry><entry><title type="html">Regression Analysis for Items and Gold in League of Legends</title><link href="https://jeremywalsh.ca/2021/05/08/regression-analysis-for-items-and-gold-in-league-of-legends.html" rel="alternate" type="text/html" title="Regression Analysis for Items and Gold in League of Legends" /><published>2021-05-08T20:00:00-04:00</published><updated>2021-05-08T20:00:00-04:00</updated><id>https://jeremywalsh.ca/2021/05/08/regression-analysis-for-items-and-gold-in-league-of-legends</id><content type="html" xml:base="https://jeremywalsh.ca/2021/05/08/regression-analysis-for-items-and-gold-in-league-of-legends.html"><![CDATA[<p>Continuing my analysis on League of Legends (<a href="http://jeremywalsh.ca/2021/04/15/league-of-legends-analyzing-champion-basic-stats.html">first post here</a>), I want to talk about items in the game. As a new player one of the things that I feel isn’t well communicated is the importance of getting gold to build items. Items have many effects in the game, but here I’m going to focus on how they alter the base stats of the champions.</p>

<p>If you want to follow along the code is posted (<a href="https://github.com/thereiswaldo/LoL-Champ-Analysis/blob/main/LoL%20Items.ipynb">here</a>).</p>

<h2 id="how-much-do-items-improve-base-stats">How much do items improve base stats?</h2>

<h3 id="tank">Tank</h3>

<p>As discussed in my <a href="http://jeremywalsh.ca/2021/04/15/league-of-legends-analyzing-champion-basic-stats.html">previous post,</a> there are various classes that each champion is sorted into. The simplest, which we will look at first, is the Tank. Tanks are generally designed to be your front liners that jump into battle first and absorb as much of the enemy’s damage as possible to save their squishy teammates. A few minutes into the game a tank might go back to the shop to buy some items. The cheapest item that provides health is called a Ruby Crystal. For 400 gold it boosts the owner’s health (HP) by 150. If we take all the champions in the game that have Tank as their primary or secondary class and average their HP at level 3 plus this cheap item we get this breakdown:</p>

<p><img src="/uploads/tank-build-level-3-hp.png" alt="LoL Tank Level 3 HP Pie Chart" title="Tank Champion First Item HP Breakdown" /></p>

<p>17% of their HP just from a relatively cheap item. Certainly a noticeable increase, and one that could alone have significant impact on the next few minutes of game play.</p>
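<p>The share shown in the chart is just the item’s HP divided by the total. As a rough sketch, the 150 HP comes from the Ruby Crystal above, while the average level 3 tank HP below is a hypothetical round number I picked because it lands near 17%, not a value read from the data.</p>

```python
def item_share(base_hp, item_hp):
    """Fraction of total HP that comes from the item."""
    return item_hp / (base_hp + item_hp)

RUBY_CRYSTAL_HP = 150        # from the item description
assumed_avg_tank_hp = 730    # hypothetical level 3 average, for illustration only

share = item_share(assumed_avg_tank_hp, RUBY_CRYSTAL_HP)
print(f"{share:.0%} of HP comes from the item")
```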

<p>As the game progresses, gold is accumulated and so are levels. At the end of a long game we can look at the average HP of a tank at the level cap (18) alongside a full item build. League of Legends allows you to equip 6 items. One of them is typically a pair of boots to increase move speed, so we will consider 5 typically purchased items and look at the breakdown again.</p>

<p><img src="/uploads/tank-build-level-18-hp.png" alt="LoL Tank Level 18 HP Pie Chart" title="Tank Champion Level 18 HP Breakdown" /></p>

<p>We can see nearly half of the average level 18 tank’s HP is now coming directly from items. Knight’s Vow, with its 400 HP increase, is the most significant item in this build.</p>

<h3 id="marksman">Marksman</h3>

<p>If we look at the attack damage that the Marksman class does we can see a similar trend of importance of items.</p>

<p><img src="/uploads/marksman-build-level-3.png" alt="" /></p>

<p><img src="/uploads/marksman-build-level-18.png" alt="LoL Marksman Level 18 Attack Damage Pie Chart" title="Marksman Champion Level 18 Attack Damage Breakdown" /></p>

<p>68% of the attack damage the average max-level Marksman does can be from items. Ignoring the other effects these weapons do, the stat boost from items is massive. A champion with more items is more powerful.</p>

<h3 id="how-to-acquire-items">How to Acquire Items</h3>

<p>Items are clearly an important part of a champion’s stats, but the real question is how much they impact the game. To try to understand that we first need to understand how items are purchased in game.</p>

<p>As a player you gain gold for killing enemy minions, champions, buildings, or neutral monsters. There is also a mechanism for passive gold generation and assisting in kills. This gold has the singular purpose of being used to buy items. This leads to a snowball effect in the game. The more minions killed in the early game, the more early game gold that can be spent on early stat-boosting items to help get more minion kills and eventually more champion and objective kills.</p>

<h1 id="regression-analysis">Regression Analysis</h1>

<p>Since I want to help my friends and me improve at the game, I’m going to look at the data from each of our games and see how predictive early game gold is in determining the features we care about, namely champion kills and deaths.</p>

<h2 id="data-mining-and-cleaning">Data Mining and Cleaning</h2>

<p>I queried the Riot API for the 32 games I’ve played, and pulled out the relevant information the game captures. I took the amount of damage dealt and number of kills and divided them by the number of minutes the game took to get a comparable feature. The recorded data includes 4 features that aggregate over 10 minute intervals. We’ll use the 0-10 minute interval as our early game indicator and the start of our gold-to-item snowball. The 4 features are:</p>

<p>CS ~ Stands for “Creep Score”, and is the measure of the number of enemy and neutral minions killed.</p>

<p>Damage ~ Amount of damage dealt to minions, champions, buildings and objectives</p>

<p>Gold ~ Amount of gold gained from all sources, <a href="https://leagueoflegends.fandom.com/wiki/Gold">more info here</a></p>

<p>XP ~ “Experience”, which is gained from <a href="https://leagueoflegends.fandom.com/wiki/Experience_(champion)">numerous things</a> and allows champions to level up at specific thresholds.</p>

<p>To improve our model accuracy and give us more information I’ve also added the role and lane that the API classified for each champion. With these categorical entries I used one-hot encoding to analyze them. Due to the unnatural tactics of new players, the game struggles to classify in some instances and classifies the Lane as None. For the Jungle lane a Role is not identified and instead left as None.</p>
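<p>As a rough illustration of the one-hot step, the sketch below turns a lane column into 0/1 columns and keeps the unclassified case as its own “None” category. The lane labels are hypothetical stand-ins, not necessarily the exact strings the API returns.</p>

```python
def one_hot(values, prefix):
    """Return one dict per value with a 0/1 column for every category seen."""
    categories = sorted({str(v) for v in values})
    return [
        {f"{prefix}_{cat}": int(str(v) == cat) for cat in categories}
        for v in values
    ]

# Hypothetical lane labels; None marks a game the API could not classify.
lanes = ["TOP", "JUNGLE", None, "BOTTOM", "TOP"]
encoded = one_hot(lanes, "lane")
print(encoded[2])
```

<p>Each row ends up with exactly one column set to 1, which is what lets a linear model give every lane or role its own coefficient.</p>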

<p>With our dataset now ready I randomly pulled out 20% of the champions across all the games and trained a linear regression on the other 80%. To allow for easy interpretation I took the trained coefficients from the regression and multiplied them by the values in their respective column of the dataset. This is a way to show the importance of each feature. Since this is a linear regression model, we can interpret a larger absolute value as having a larger impact on determining the goal feature. Imagine we had just two features. We could write the regression as:</p>

<p>y = m<sub>1</sub>x<sub>1</sub> + m<sub>2</sub>x<sub>2</sub>  + c</p>

<p>If m<sub>1</sub>x<sub>1</sub> is larger in absolute value than m<sub>2</sub>x<sub>2</sub>, it will have a larger impact on y. To scale this up for our purposes we consider 14 features instead of two, and allow x to represent the array of 256 entries of that feature in the training set. To visualize this we take the resulting dataframe and make a boxplot.</p>
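<p>Here is a small sketch of that coefficient-times-value step for the two-feature case. The coefficients, feature names and rows are all hypothetical; they only show the shape of the calculation, not the fitted model.</p>

```python
# Hypothetical fitted coefficients (m) for two features; not values from my model.
coefficients = {"gold_per_min_0_10": 0.004, "damage_per_min_0_10": -0.001}

# Hypothetical training rows: one dict of feature values (x) per champion in a game.
rows = [
    {"gold_per_min_0_10": 250.0, "damage_per_min_0_10": 300.0},
    {"gold_per_min_0_10": 310.0, "damage_per_min_0_10": 420.0},
    {"gold_per_min_0_10": 190.0, "damage_per_min_0_10": 150.0},
]

# m_i * x_i for every row: each feature's contribution to the prediction.
contributions = {
    name: [m * row[name] for row in rows]
    for name, m in coefficients.items()
}

for name, values in contributions.items():
    largest = max(abs(v) for v in values)
    print(name, [round(v, 3) for v in values], "max |m*x| =", round(largest, 3))
```

<p>Each list of contributions is what gets drawn as one box in the boxplot.</p>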

<p><img src="/uploads/kills-per-minute-importance-boxplot.png" alt="" /></p>

<p>The box of each feature extends to the Q1 and Q3 quartile values of the data with a line in between for the median. The “whiskers” further show the range of the data by extending out to the farthest data point within 1.5*(Q3-Q1). The dots seen above and below the whiskers are outliers.</p>
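<p>For anyone who wants to check those numbers by hand, this is a minimal sketch of the box, whisker and outlier calculation. The data is made up, and I’m assuming linearly interpolated quartiles, which plotting libraries may or may not use by default.</p>

```python
from statistics import quantiles

def box_stats(data):
    """Quartiles, whisker ends, and outliers under the 1.5 * IQR rule."""
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [x for x in data if x >= low_fence and high_fence >= x]
    outliers = [x for x in data if x not in inside]
    return {
        "q1": q1, "median": median, "q3": q3,
        # Whiskers stop at the farthest data points still inside the fences.
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }

# Hypothetical contribution values for one feature.
stats = box_stats([1, 2, 3, 4, 5, 6, 7, 8, 30])
print(stats)
```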

<p>With this simple visualization we can see that the most important feature here in predicting the number of kills is the early game gold. The gold per minute for the 0-10 minute interval has a much higher absolute value than anything else. This backs up our snowball item theory. The gold generated in the early game can be used for large item buffs to kill the opponent.</p>

<p>If we repeat the same regression for the deaths per minute of the game we see a more convoluted result.</p>

<p><img src="/uploads/deaths-per-minute-importance-boxplot.png" alt="" /></p>

<p>Here we see that having more gold and more experience in the first 10 minutes leads to fewer deaths, but doing more damage results in more deaths. This is further backed up by looking at the correlation heatmap for the features across the whole dataset.</p>

<p><img src="/uploads/feature-correlation.png" alt="" /></p>

<p>The link between high early game damage and more deaths is evident here as well, and the heatmap shows that early gold and early damage are not correlated. Skirmishing with opponents in the early game hurts more than it helps, likely due to dying unnecessarily when you could be bringing in more gold.</p>

<p>The role and lane features have some interesting patterns, but since they aren’t as significant I’ll avoid discussing them here.</p>

<h2 id="all-models-are-wrong-">All models are wrong …</h2>

<p>Returning to the reason we trained on only 80% of the data, we can use the other 20% to get a gauge of how well these models generalize. If the trained features don’t have a high accuracy on the test dataset, then they may not be good indicators for us to learn from.</p>

<p>The training accuracy score (R<sup>2</sup>) for the Kills per Minute model was 59% while the test accuracy score was 51%. The Deaths per Minute scores were lower at 47% and 25% respectively.</p>

<p><img src="/uploads/kills-predicted-vs-actual.png" alt="" /></p>

<p><img src="/uploads/deaths-predicted-vs-actual.png" alt="" /></p>

<p>These scores indicate a fairly poor fit, but since we are not using these models to make highly accurate predictions, it doesn’t matter that much to us. The graphs above show how the models predict on the test data. The low R<sup>2</sup> shows up as points scattered far from the red diagonal line, but both sets of points still broadly follow it.</p>

<h2 id="-but-some-are-useful">… but some are useful</h2>

<p>What we actually care about is identifying patterns in the data. The parameters discussed above (gold/min etc.) are statistically significant in the regression model and the correlation matrix backs up the analysis we did. Though we shouldn’t use this analysis to say that in another one of my games, having X gold per minute in the early game will net Y kills at the end of the game, we can still derive insights from the data.</p>

<h1 id="next-steps">Next Steps</h1>

<p>The next thing I’m interested in looking at for League of Legends is generating a new champion from the text and images of the current champions. If that is interesting enough I’ll make my next post about it.</p>]]></content><author><name>Jeremy Walsh</name></author><summary type="html"><![CDATA[Continuing my analysis on League of Legends (first post here), I want to talk about items in the game. As a new player one of the things that I feel isn’t well communicated is the importance of getting gold to build items. Items have many effects in the game, but here I’m going to focus on how they alter the base stats of the champions.]]></summary></entry><entry><title type="html">League of Legends - Analyzing Champion Base Stats</title><link href="https://jeremywalsh.ca/2021/04/15/league-of-legends-analyzing-champion-basic-stats.html" rel="alternate" type="text/html" title="League of Legends - Analyzing Champion Base Stats" /><published>2021-04-15T00:00:00-04:00</published><updated>2021-04-15T00:00:00-04:00</updated><id>https://jeremywalsh.ca/2021/04/15/league-of-legends-analyzing-champion-basic-stats</id><content type="html" xml:base="https://jeremywalsh.ca/2021/04/15/league-of-legends-analyzing-champion-basic-stats.html"><![CDATA[<p>My friends and I recently started playing League of Legends. It’s a 5v5 MOBA (Multiplayer Online Battle Arena) game where you play one of 155 champions and work together with your team to destroy the enemy’s base. As my friends and I are new to the game, I thought I would take a look at some of the data the API makes available in order to help understand the game better. First up is making sense of the base statistics and categories of the champions. (If you want to follow along I have a <a href="https://github.com/thereiswaldo/LoL-Champ-Analysis">Jupyter notebook on my GitHub you can use</a>).</p>

<h2 id="data-understanding-and-exploration">Data Understanding and Exploration</h2>

<p>In general each champion has their health (hp), their magic resource (mana/mp), armour, chance for critical hits (crit), attack damage, and attack speed. All of these parameters have a partner stat for the amount they increase each time the character levels up. Additionally there is move speed and attack range which remain constant as the characters level.</p>

<p><img src="/uploads/histogram.png" alt="League of Legends Champion Stat Histogram" title="Champion Stat Histogram" /></p>

<p>As we can see from the histogram of each of the basic stats for all the champions, most follow a normal distribution, but a game with this many characters has several outliers that ignore any general rules.</p>

<p>League of Legends uses 6 classes to identify different champions: Mage, Support, Marksman, Fighter, Tank, and Assassin. For new players it can be frustrating to die easily and often. Some classes are inherently more forgiving than others, and my assumption as a new player is that Tanks would have the highest health and armour, and that they would gain the most in these stats as they level.</p>

<p><img src="/uploads/hp-and-armour-lvl-1.png" alt="" /></p>

<p><img src="/uploads/hp-and-armour-lvl-18.png" alt="" /></p>

<p>Plotting the health and armour at level 1 and the max level 18 for each primary class doesn’t clearly validate this theory. It is important to note that I’m only using the primary class. 117 of the 155 champions have a secondary class listed that we are ignoring here for simplicity. Due to the complexity of the game our goal isn’t to fully understand every detail of the champions, just enough that at a glance I can look at a champion’s primary class and safely make some assumptions about their base stats.</p>

<p>Another way to analyze how “tanky” champions are is to calculate the average effective hp for each class from their hp and armour values.</p>

<p><img src="/uploads/average-effective-hp-by-primary-class.png" alt="" /></p>

<p>Put simply, effective health is the amount of raw burst physical or magical damage a champion can receive before dying. For physical damage this is calculated using the defending champion’s armour and health. Similarly, magic damage is mitigated by magic block and health. <a href="https://leagueoflegends.fandom.com/wiki/Health#Effective_health">(For more on effective health you can look here).</a> Taking the mean of all champions in each class gives us the above graph, with Tanks and Fighters at the top and squishy Mages at the bottom. Notably, Mages, who deal the most magic damage, also have the best natural defences to magic (magic block) despite a low health pool.</p>
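<p>As I understand the formula on the linked wiki page, every point of armour (or magic block) means roughly 1% more raw damage is needed to kill the champion. The sketch below assumes that formula and non-negative resistances, and the two champions are hypothetical rather than rows from the dataset.</p>

```python
def effective_hp(hp, resistance):
    """Effective health against one damage type, for non-negative resistance."""
    return hp * (100 + resistance) / 100

# Hypothetical level 1 stats, not values from the dataset.
tanky = effective_hp(600, 40)     # more health and more armour
squishy = effective_hp(500, 20)   # less of both

print(tanky, squishy)
```

<p>Averaging this value over the champions in each primary class is the kind of calculation behind the chart above.</p>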

<p>How hard champions are to kill is interesting, but more exciting is how much damage they do. We have attack damage and attack speed in the dataset, and when we multiply them together we get damage per second. That is the damage per second the champion deals using only its right-click auto-attack at level 1. We can colour the average damage per second by the average attack range for each class.</p>

<p><img src="/uploads/dps-and-range-by-primary-class.png" alt="" /></p>

<p>We can see a clear divide here between what appear to be, on average, melee classes (Fighter, Tank, Assassin) and ranged classes (Marksman, Mage, Support). The melee classes dish out a higher damage per second (dps), but they can’t do so from a safe distance like their counterparts.</p>
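<p>A minimal sketch of that damage-per-second calculation, grouped by primary class, might look like the following. The four champions and the field names are hypothetical and only there to show the multiplication and the averaging.</p>

```python
from statistics import mean

# Hypothetical level 1 stats; names and numbers are illustrative only.
champions = [
    {"cls": "Fighter", "attackdamage": 64, "attackspeed": 0.65, "attackrange": 175},
    {"cls": "Fighter", "attackdamage": 60, "attackspeed": 0.70, "attackrange": 125},
    {"cls": "Marksman", "attackdamage": 58, "attackspeed": 0.62, "attackrange": 550},
    {"cls": "Marksman", "attackdamage": 52, "attackspeed": 0.66, "attackrange": 525},
]

by_class = {}
for champ in champions:
    # Auto-attack damage per second = damage per hit * hits per second.
    dps = champ["attackdamage"] * champ["attackspeed"]
    group = by_class.setdefault(champ["cls"], {"dps": [], "range": []})
    group["dps"].append(dps)
    group["range"].append(champ["attackrange"])

summary = {
    cls: {"avg_dps": mean(v["dps"]), "avg_range": mean(v["range"])}
    for cls, v in by_class.items()
}
print(summary)
```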

<p>With this first pass over the data, it seems the base stats do a decent job telling a story of what the champion will be capable of. Tanks and Fighters are melee classes that take more damage to kill versus Mages that do good damage from range, but have a lower effective hp.</p>

<h2 id="logistic-regression">Logistic Regression</h2>

<p>Since the base stats so far seem to do a good job classifying each character, I wanted to see how accurately we could classify each champion using only their base statistics. To do this we’ll take 70% of champions with all the features we’ve used so far, and train a logistic regression model. In the training process each champion’s stats are compared and the stats that most readily predict their class are given a high weight. After training we can take the randomly assigned 30% of champions we didn’t train on, and apply these weights from our logistic regression model to see how accurately we can classify champions based only on their base stats.</p>
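<p>The random 70/30 split can be sketched in a few lines. This is a plain-Python illustration with a hypothetical helper and placeholder champion names; in practice a library helper would usually do the same job.</p>

```python
import random

def train_test_split(items, test_fraction=0.3, seed=42):
    """Shuffle a copy of items and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

# Placeholder names standing in for the 155 champions.
champions = [f"champion_{i}" for i in range(155)]
train, test = train_test_split(champions)
print(len(train), len(test))
```

<p>Fixing the seed keeps the split reproducible, so the test champions stay unseen every time the model is retrained.</p>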

<p>Though the data is mostly ready for immediate analysis, one of the columns (partype) is categorical and tells us what resource, if any, the champion uses for its abilities. Most champions use mana, but others may use things like energy or fury. To handle this column we used one-hot encoding to convert these strings to zeroes and ones that we could use in our logistic regression.</p>

<p><img src="/uploads/class-prediction-confusion-matrix-heatmap.png" alt="" /></p>

<p>We can see from the confusion matrix of our logistic regression test results that we do a decent job predicting the Fighters, Mages, Marksmen and Supports, but haven’t figured out Assassins and Tanks. If we were 100% accurate with this model we would see zeroes in every cell except for the main diagonal. Printing out the accuracy score we get 66%, and the following classification report.</p>

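<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>from sklearn.metrics import accuracy_score, classification_report

# y_test holds the true classes and preds the predicted classes for the test set.
# Print the accuracy score of the logistic regression as a percentage.
print('Accuracy Score:', round(accuracy_score(y_test, preds) * 100, 2), '%')

# Create and print the per-class precision and recall report.
class_report = classification_report(y_test, preds)
print(class_report)
</code></pre></div></div>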

<p><img src="/uploads/classification-report.png" alt="" /></p>

<p>As we saw from the confusion matrix, we can predict a few classes with good precision and recall, but on average we just aren’t there with this method. One of the advantages of using a logistic regression classifier is its interpretability, but with accuracy this low it doesn’t tell us much. Without going into details, the takeaway is that Marksmen and Mages have distinctive base stats (attack range, mana, hp), whereas Assassins and Tanks vary heavily. This is partially due to the high variety amongst the champions, the huge impact abilities/spells have on the game, and the fact that most champions also have a secondary class.</p>
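<p>To show where those numbers come from, here is a small sketch that counts the cells of a confusion matrix and derives accuracy, precision and recall from them. The labels are made up, not my actual test results.</p>

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Count (actual, predicted) label pairs; matching pairs are the diagonal."""
    return Counter(zip(actual, predicted))

def precision_recall(counts, label):
    """Precision and recall for one class, 0.0 when the class never appears."""
    true_pos = counts[(label, label)]
    predicted_as = sum(n for (a, p), n in counts.items() if p == label)
    actually = sum(n for (a, p), n in counts.items() if a == label)
    precision = true_pos / predicted_as if predicted_as else 0.0
    recall = true_pos / actually if actually else 0.0
    return precision, recall

# Hypothetical test labels, not the results from the post.
actual = ["Mage", "Mage", "Tank", "Tank", "Fighter", "Fighter", "Marksman", "Assassin"]
predicted = ["Mage", "Mage", "Fighter", "Tank", "Fighter", "Fighter", "Marksman", "Fighter"]

counts = confusion_counts(actual, predicted)
accuracy = sum(n for (a, p), n in counts.items() if a == p) / len(actual)

print("accuracy:", accuracy)
for label in sorted(set(actual)):
    print(label, precision_recall(counts, label))
```

<p>In this toy example the Fighter class soaks up the misclassified Tank and Assassin, which lowers its precision even though its recall is perfect.</p>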

<p>If the classes themselves aren’t always the best at-a-glance indicator of what a champion might perform like, what other method could we use to get this?</p>

<h2 id="principle-component-analysis">Principal Component Analysis</h2>

<p>We could use a Principal Component Analysis (PCA) and see what groups of champions appear. For a detailed explanation of PCA I would <a href="https://www.youtube.com/watch?v=fkf4IBRSeEc">highly recommend this YouTube video.</a> PCA is used here to extract the most important features in the dataset and boil them down to just 2 general parameters.</p>
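<p>For the curious, below is a from-scratch sketch of reducing a few stats to 2 components. It standardizes the columns, then finds the top two directions of the covariance matrix with power iteration, which is a simple stand-in I’m using for illustration rather than what a library such as scikit-learn does internally. The six rows of hp, armour, attack range and mana are hypothetical.</p>

```python
from math import sqrt

def standardize(rows):
    """Scale every column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [sqrt(sum((x - m) ** 2 for x in c) / len(c)) for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]

def covariance(rows):
    """Covariance matrix of already centred rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / n for j in range(d)] for i in range(d)]

def top_component(matrix, iterations=500):
    """Power iteration: dominant eigenvector and eigenvalue of a symmetric matrix."""
    d = len(matrix)
    vec = [1.0 + i for i in range(d)]  # arbitrary non-uniform starting direction
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    value = sum(vec[i] * matrix[i][j] * vec[j] for i in range(d) for j in range(d))
    return vec, value

def pca_2d(rows):
    """Project rows onto their first two principal components."""
    data = standardize(rows)
    cov = covariance(data)
    pc1, var1 = top_component(cov)
    # Deflate: remove the first component so the next run finds the second.
    d = len(cov)
    deflated = [[cov[i][j] - var1 * pc1[i] * pc1[j] for j in range(d)] for i in range(d)]
    pc2, _ = top_component(deflated)
    return [[sum(x * w for x, w in zip(row, pc)) for pc in (pc1, pc2)] for row in data]

# Hypothetical champions: hp, armour, attack range, mana.
rows = [
    [620, 38, 125, 0], [600, 36, 175, 0],
    [580, 33, 125, 300], [570, 35, 150, 280],
    [520, 22, 550, 400], [500, 20, 525, 420],
]
projected = pca_2d(rows)
for point in projected:
    print([round(v, 2) for v in point])
```

<p>Each champion ends up as a single point with two coordinates, which is what makes a scatter plot of the whole roster possible.</p>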

<p><img src="/uploads/champion-stat-principle-component-analysis.png" alt="" /></p>

<p>We can see 3 somewhat distinct groups here. I don’t know enough of the champions to understand what the groupings are, so we want to look at what features were important in getting us here. If we plot the relative importance of each feature on the same graph, we get this:</p>

<p><img src="/uploads/feature-importance-principle-component-analysis.png" alt="" /></p>

<p>It seems the simplest explanation for the groups is that those in the top right don’t use mana, those in the bottom middle are melee champions that use mana, and those on the left are mana-using ranged champions.</p>

<p>When trying to group champions it looks like whether or not they use mana is important. From my limited experience the mana-users all follow similar rulesets for their abilities/spells while the non-mana users can be very different.</p>

<h2 id="next-steps">Next Steps</h2>

<p>Since base stats only provide us a glimpse at what each champion is capable of, my next post is going to look at how each player can customize their champion with items in game, and the underlying importance of gold.</p>]]></content><author><name>Jeremy Walsh</name></author><summary type="html"><![CDATA[My friends and I recently started playing League of Legends. It’s a 5v5 MOBA (Multiplayer Online Battle Arena) game where you play one of 155 champions and work together with your team to destroy the enemy’s base. As my friends and I are new to the game, I thought I would take a look at some of the data the API makes available in order to help understand the game better. First up is making sense of the base statistics and categories of the champions. (If you want to follow along I have a Jupyter notebook on my GitHub you can use).]]></summary></entry></feed>