For those of you who’ve been following this thread of research into called balls and strikes in NPB from 2009 to 2022, I’ve got a conclusion for you: The chance of the Yomiuri Giants, in the nine seasons from 2009 to 2017, doing as well as they did getting called strikes in 1-0 counts on talent alone is next to zero.
I started this investigation with the observation that the Giants pitchers got an abnormally high percentage of called strikes in some counts between 2009 and 2022. These results came from a data set received from ScoutDragon.com’s incomparable Michael Westbay.
Upon further investigation, it became clear that 2018 was a subtle watershed in NPB.
From that point on, several teams diverged from their previous called-strike results relative to the other teams in their leagues. That year, 2018, was when 11 of the 12 teams, having successfully installed Trackman pitch-tracking systems, began sharing that data with NPB for the purposes of “umpire development.”
It would be overly cynical, even for me, to attribute much of the shift to umpires suddenly becoming slightly more diligent. Teams and players are always changing: much of the shift was likely due to changes in teams’ talent bases and approaches, while some of it was likely just random noise.
It was, however, obvious from the start that it would have been impossible for an ordinary, average team to achieve anything close to what Yomiuri did from 2009 to 2017.
Former ump Osamu Ino attributed the Giants’ extreme success in getting called strikes to the extremely high quality of their pitching staffs. When he said that, however, I had no way to measure how likely it would have been for a team to be nearly always the best in the league at getting called strikes on talent alone.
To see if Ino’s assertion was reasonable, I created a program that constructed normally distributed leagues. Of course, not all teams have equal access to talent, particularly since some, like SoftBank, are really good at developing it, or like Yomiuri, are really good at maintaining a system that increases their access to amateur talent at the expense of other clubs.
Still, if you take a collection of teams and throw them into a six-team league, their results in any area will be roughly normally distributed. The model I eventually settled on assigns each team an annual chance of getting called strikes based on its ability to actually get called strikes relative to the league from 2009 to 2017.
In this model, teams equal or surpass their actual weighted average of called strikes relative to the league about half the time; I’ll explain this in detail later. This model gives the Giants the league’s highest chance of getting called strikes 93.8 percent of the time and a 5.9 percent chance of being second best in any given year. It’s not 100 percent, but it’s close.
I ran through 10,000 sets of nine simulated seasons. Each year, I recorded how well each team did relative to the league in terms of its Z score, the number of standard deviations above or below the league mean, and took the average of those over nine seasons for each of the counts in the study: 0-0, 1-0, 2-0, 0-1, 1-1, and 2-1. I then checked how often each team matched its historic weighted average Z score for all counts. This turned out to be close to 50 percent for all teams, which was the goal.
In 10,000 sets of nine simulated seasons, the Yomiuri Giants hit or surpassed their nine-year average of Z scores in all six counts a grand total of seven times. The Hanshin Tigers did it three times, suggesting that the Giants’ modeled chance of being No. 1 might even have been a bit too high, though not high enough to remotely match what they did in competition.
The next-worst result was for the Yakult Swallows, who did it 129 times in 10,000 trials. Each of these teams had one outlier count in which their actual results seem to have been so good that it’s not obvious they were achieved through talent and luck alone.
For the Giants it was 1-0; for the Tigers it was 0-1, which turns out to be the CL’s biggest outlier; and for the Swallows it was 2-1, although that count is nowhere in the same realm of improbability as the two counts where Yomiuri and Hanshin really excelled.
The table below shows each team’s historic nine-year Z score average in their outlier count, and how often the model was able to match those in 10,000 tries:
[Table: each team’s average Z score in its outlier count, and how often the model matched it in 10,000 tries]
Last week I found 12 players who, during the period I had data for, pitched both with the Giants and with other teams. The weighted average of the 12 pitchers’ called-strike percentages was six percent higher with Yomiuri than with other teams.
Going forward, I’ll need to repeat that process for Tigers pitchers who also played for other teams and see whether they had better results in 0-1 counts with Hanshin, and whether Swallows pitchers got better 2-1 called-strike rates than they did with other teams.
I’ve poked around some with home-road splits, but my database needs a little housecleaning before I can go back and check whether these three team-count pairs were affected by whether the teams were at home or not.
I’ve already looked into how teams’ called-strike chances increased or decreased starting in 2018, and adapting this model to those figures could provide insight into how teams have changed. My inclination is that a model based on the 2018 to 2022 data would fail badly if it assigned Yomiuri the same chance to lead the league in called-strike ability that this model uses.
The boring stuff
The study was designed to create six-team leagues taken from a randomly generated table of values with a mean of .5 and a standard deviation of .014 that produces figures similar to the actual distribution of teams’ called strike rates on first pitches.
Of course, the mean and standard deviation vary with each league, season and count, but this helps in creating a model where each team’s ability to get called strikes is placed within a normal distribution, as it would be in real life. I used a pool of 6,000 values that were then sorted and divided into five pools: the 1,000 highest values, the next 1,000, the middle 2,000, the next 1,000 and finally the 1,000 lowest values.
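A minimal sketch of that pool construction, as I understand the description (this is my reconstruction, not the actual program; the seed is arbitrary):

```python
import random

# Build a pool of 6,000 called-strike "abilities" resembling the actual
# distribution of first-pitch called-strike rates: mean .5, SD .014.
random.seed(2009)
pool = sorted((random.gauss(0.5, 0.014) for _ in range(6000)), reverse=True)

# Split the sorted pool into five tiers, best to worst.
tiers = [
    pool[:1000],      # 1,000 highest values
    pool[1000:2000],  # next 1,000
    pool[2000:4000],  # middle 2,000
    pool[4000:5000],  # next 1,000
    pool[5000:],      # 1,000 lowest values
]
```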
To assign values to individual teams I created an array of 533 values made up of the letters ‘g’, ‘t’, ‘c’, ‘d’, ‘b’, and ‘s’, one letter per team. The array was shuffled, one value was selected at random, that team was assigned the next highest available called-strike chance, and then that team’s remaining characters were deleted from the array.
So if the Giants’ ‘g’ was drawn first, the remaining 499 ‘g’s were deleted, the array reshuffled and the next value selected from the remaining 33 values, of which nine would be a ‘c’, eight would be ‘t’, seven would be ‘b’, six would be ‘d’ and three ‘s.’
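The ticket counts in that draw come straight from the description above (500 ‘g’, 9 ‘c’, 8 ‘t’, 7 ‘b’, 6 ‘d’, 3 ‘s’), but the function itself is my own sketch of the procedure:

```python
import random

# Ticket counts per team: 533 tickets in all.
TICKETS = {'g': 500, 'c': 9, 't': 8, 'b': 7, 'd': 6, 's': 3}

def draw_order(tickets, rng=random):
    """Shuffle the ticket array, draw one ticket at a time, and delete
    all of the drawn team's remaining tickets after each draw.
    Returns the order in which teams get the next-best ability."""
    array = [team for team, n in tickets.items() for _ in range(n)]
    order = []
    while array:
        rng.shuffle(array)
        pick = array[0]
        order.append(pick)
        array = [t for t in array if t != pick]
    return order
```

With 500 of 533 tickets, ‘g’ wins the first draw roughly 93.8 percent of the time, which is where the headline figure for the Giants comes from.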
These figures were settled on as the closest I could get to having each team generate weighted average Z scores similar to its actual ones about 50 percent of the time.
I used the average number of called strikes each year in the six counts and then counted how many ‘called strikes’ each team would get in that many trials given its assigned ‘ability.’
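In code, each season-count becomes a run of independent yes/no trials at the team’s assigned ability. A sketch, with an invented trial count for illustration (the real per-count figures are in the table below):

```python
import random

def simulate_called_strikes(ability, trials, rng=random):
    """Count the called strikes a team with the given per-pitch
    called-strike chance gets over `trials` opportunities."""
    return sum(rng.random() < ability for _ in range(trials))

# A .50-ability team over 1,500 first pitches (illustrative numbers)
rng = random.Random(42)
strikes = simulate_called_strikes(0.50, 1500, rng)
rate = strikes / 1500
```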
[Table: times each count was tested each season]
The success rates for each league were used to generate a mean and a standard deviation so that each team would get a Z score, (team pct - league average) / standard deviation, for each count for each season.
After nine seasons, the average of each team’s nine Z scores for each count was compared to that team’s historic average to see whether it matched or exceeded it.
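The Z-score step and the match test above might look like this (the season values here are invented for illustration):

```python
from statistics import mean, pstdev

def z_scores(team_pcts):
    """Z score per team: (team pct - league average) / standard deviation."""
    mu = mean(team_pcts.values())
    sd = pstdev(team_pcts.values())
    return {team: (pct - mu) / sd for team, pct in team_pcts.items()}

def matched_historic(nine_season_zs, historic_avg):
    """True if the nine-season average Z meets or exceeds the target."""
    return mean(nine_season_zs) >= historic_avg

# One simulated season of called-strike rates (invented values)
season = {'g': .52, 't': .505, 'c': .50, 'd': .495, 'b': .50, 's': .48}
zs = z_scores(season)
```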
In 10,000 sets of nine simulated seasons, all but those three team-count pairs were matched over 1,000 times, and after the Swallows’ 129 six-count matches, the remaining three teams hit all of their six-count targets in a nine-season sample over 300 times.
I’m not pretending to be that knowledgeable about these kinds of studies, so if anyone has advice on ways to think about or execute them, I’d be more than happy to listen.