It’s good that research was done to inform the design decisions, but the process could be misdirected if too much weight was placed on participants’ self-reported preferences, and the number of people tested seems unusual for either goal.
Having done a fair amount of user testing myself, I’ve found that stated preference can have very little to do with measurable performance indicators. For spotting usability problems across rounds, you generally only need a few cohorts of five; but for detecting statistically significant trends, you need far more than 600 participants when your audience numbers in the millions. For both kinds of data, the participants’ backgrounds are critical to consider when the product serves a global audience. I hope additional testing and monitoring of the live changes have continued since these updates shipped, so that the two studies are not the sole ongoing basis for them.
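As a back-of-envelope check on those numbers (this is a standard textbook calculation, not anything drawn from the studies in question), Cochran’s sample-size formula for estimating a proportion shows why the raw population size barely matters once the audience is in the millions; the margin of error you want to resolve is what drives the participant count:

```python
import math

def sample_size(margin, z=1.96, p=0.5, population=None):
    """Cochran's formula for estimating a proportion at a given
    margin of error, with an optional finite-population correction.
    z=1.96 corresponds to 95% confidence; p=0.5 is the worst case."""
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    if population is not None:
        # Finite-population correction; negligible for large populations.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)

# A ±5% margin needs only a few hundred people, but resolving
# trends at ±3% or ±2% already pushes well past 600 -- even with
# a multi-million-person audience.
for margin in (0.05, 0.03, 0.02):
    print(f"±{margin:.0%}: n = {sample_size(margin, population=5_000_000)}")
```

The takeaway matches the comment above: roughly 600 participants is nowhere near enough to read fine-grained trends for a mass audience, yet it is far more than the handful needed per round for problem-spotting.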