"We'd need to know that the sequencing and length of the calibrations was comparable with the active runs."
You made that up. It is an excuse to ignore the data. If you applied that standard, it would be an excuse to ignore every experiment on record, because no one ever calibrates for as long as an active experiment continues.
It is a logical consequence. If the active runs show occasional, intermittent spikes, you need enough calibration time to be sure that whatever causes them has had a chance to appear in calibration too, assuming the cause is the same for active and calibration runs. Sure, if the anomalies are much more common in active runs, you need less calibration time to detect the difference. But if you see these spikes just once or twice in, say, 10 active runs, you need at least that much calibration time to know whether the spikes are specific to active runs or just happen sometimes anyway.
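To put rough numbers on that, here is a minimal sketch. It assumes the spikes arrive at random at a constant rate (a Poisson process) that is the same in active and calibration runs, and it takes the rate straight from the active-run count, which is itself a shaky estimate when there are only one or two spikes. The function names and the 5% threshold are my own choices for illustration, not anything from the experiments under discussion.

```python
import math

def prob_clean_calibration(spikes_seen, active_time, cal_time):
    """Chance that calibration shows NO spikes even though the spike
    source is present in both modes (assumed Poisson, same rate)."""
    rate = spikes_seen / active_time      # spikes per unit of run time
    return math.exp(-rate * cal_time)     # Poisson P(0 events in cal_time)

def cal_time_needed(spikes_seen, active_time, miss_prob=0.05):
    """Calibration time needed so that a clean calibration would occur
    with probability at most miss_prob if the spikes were not
    specific to active runs (same Poisson assumption)."""
    if spikes_seen <= 0:
        raise ValueError("need at least one observed spike to estimate a rate")
    rate = spikes_seen / active_time
    return -math.log(miss_prob) / rate

if __name__ == "__main__":
    # Hypothetical numbers: 2 spikes seen in 10 active runs.
    for cal_runs in (2, 10, 15):
        p = prob_clean_calibration(2, 10, cal_runs)
        print(f"{cal_runs} cal runs: P(no spike in cal) = {p:.2f}")
    print(f"cal runs needed for 5% miss chance: {cal_time_needed(2, 10):.1f}")
```

Under those assumptions, with 2 spikes in 10 active runs, a calibration only 2 runs long would come up clean about 67% of the time even if the spikes had nothing to do with active running. An equally long calibration (10 runs) would still come up clean about 14% of the time, and with just 1 spike in 10 runs that rises to about 37%. So "at least that much calibration time" is the lenient version of the requirement.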
None of this is rocket science, and only someone determined to see error in what I post would object.