Hello THH
What you are describing is simply correct statistical experimental design, although it is stated in an unusual or unorthodox way.
Yes, we do uncertainty budgets for every reading, and we calculate the statistical power required to reject the null hypothesis at a given confidence level and hypothetical effect size.
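To illustrate the kind of power calculation I mean, here is a minimal sketch using the normal approximation. The effect size, per-run sigma, run count, and 3-sigma criterion below are illustrative placeholders, not our actual budget numbers.

```python
import math

def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def power(effect_w, sigma_w, n_runs, z_alpha=3.0):
    """Power to detect an effect of effect_w watts at z_alpha sigmas,
    given per-run measurement noise sigma_w and n_runs repeats."""
    se = sigma_w / math.sqrt(n_runs)   # standard error of the mean
    return norm_cdf(effect_w / se - z_alpha)

# e.g. a 100 W effect, 5 W per-run sigma, 5 runs, 3-sigma criterion
p = power(100.0, 5.0, 5, z_alpha=3.0)
```

With numbers of this magnitude the power is essentially 1, which is why a large effect against a small uncertainty budget is so hard to dismiss.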
That’s exactly what Google proposed in their Nature Perspectives paper and exactly what we intend to do. That’s just normal science, the way everyone is trained to do it.
Blinding can have its utility in biological experiments, where the placebo effect is something real that has to be considered. If I’m dealing with measurements logged to a real-time data logger, it’s rather meaningless in my opinion.
As for input measurement error, it’s typically less than 1 W. In flow calorimeters, output is calculated from a delta-T measurement (errors add) and a mass-flow reading, which can in turn be based on velocity-profile, diameter, and density measurements. The two labs we intend to cooperate with use either airflow or water-flow calorimeters. Uncertainties in any of these specific measurements or assumptions add up, and the more measurements you chain together, the higher the combined uncertainty.
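The way those individual uncertainties combine can be sketched as standard quadrature propagation for P = mdot * cp * dT. The relative uncertainties below are made-up placeholders, not either lab's actual figures.

```python
import math

def relative_power_uncertainty(rel_mdot, rel_cp, rel_dt):
    """Combine independent relative uncertainties in quadrature
    for a product-form quantity like P = mdot * cp * dT."""
    return math.sqrt(rel_mdot**2 + rel_cp**2 + rel_dt**2)

# e.g. 2% on mass flow, 0.5% on specific heat, 3% on delta-T
u = relative_power_uncertainty(0.02, 0.005, 0.03)
```

The combined relative uncertainty is dominated by the largest single term, which is why every extra measurement in the chain matters.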
Each lab is highly experienced and credible in calorimetry, using different methods, instrumentation, and personnel. If our total uncertainty is plus or minus 5 W at 3 sigma and we detect a 100 W effect, I don’t think any peer reviewer will be able to claim systematic error, especially with multiple calibration runs and multiple active runs chosen randomly and repeated 4-6 times.
The same randomization method (choosing a calibration or active reactor) over 4-6 runs on at least two different calorimeters will likely bring us over 5 sigma of statistical significance.
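A quick back-of-envelope check of that claim, taking the stated ±5 W at 3 sigma (so a one-run sigma of about 1.67 W) and the claimed 100 W effect at face value:

```python
import math

sigma_run = 5.0 / 3.0   # 3-sigma bound of 5 W -> 1-sigma of ~1.67 W
effect = 100.0          # detected excess heat in watts (from the post)

z_single = effect / sigma_run    # significance of a single active run
n_runs = 5                       # middle of the stated 4-6 run range
z_combined = effect / (sigma_run / math.sqrt(n_runs))  # mean of n runs
```

Even a single run clears 5 sigma by a wide margin under these assumptions, and averaging repeated runs only tightens the result.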
The reason I also chose the incubator method is simply its simplicity: a single measurement reduces the chances for error. The only errors could come from RTD resistance, thermocouple voltage, or insufficient mixing. Since we are using Class A thermocouples, the maximum error is 5 K at 1000 K, or 0.5%. Mixing can be confirmed by multiple probes at different physical locations in the incubator.
If a 200 W calibration input gives us 1000 K, and we can achieve 1000 K with only 100 W of input, then the maximum input is 101 W and, at maximum temperature error, the reading could be 995 K. Would you still claim systematic error? Alternatively, one could fix the input power and measure the equilibrium temperature with and without an active reactor.
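To put a worst-case number on that example, here is a sketch of the excess-heat lower bound. It assumes, purely for illustration, a linear calibration curve through the (1000 K, 200 W) calibration point; the real calibration curve would come from the calibration runs themselves.

```python
def cal_power(temp_k, t_cal=1000.0, p_cal=200.0):
    """Hypothetical linear calibration: input power needed to hold temp_k.
    Real data would replace this with the measured calibration curve."""
    return p_cal * (temp_k / t_cal)

p_in_max = 101.0   # 100 W input plus the ~1 W input measurement error
t_min = 995.0      # 1000 K reading minus the 5 K thermocouple error

# Worst-case lower bound on the excess heat in watts
excess_min = cal_power(t_min) - p_in_max
```

Even stacking both errors against the result, the bound stays near 98 W of unexplained heat under this assumed calibration.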
After all the above is said and done, I’m not sure how any serious reviewer could claim systematic error. The incubator calorimeter was specifically designed to eliminate all types of error to the extent possible.
If I programmed the system to choose a random input, equilibrate, measure, and then randomly choose a new input, and we could detect each random input to within a few watts, would you still feel that we could not properly detect additional heat when the active reactor is placed in the system?
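That protocol can be simulated as a toy loop: draw a random setpoint, add Gaussian measurement noise, and check that the recovered input stays within a few watts. The noise level and power range below are illustrative assumptions, not measured values.

```python
import random

random.seed(42)  # fixed seed so the toy run is reproducible

def run_protocol(n_steps=20, sigma_w=1.67):
    """Simulate n_steps random-input cycles and return the worst
    absolute recovery error, given Gaussian noise sigma_w in watts."""
    errors = []
    for _ in range(n_steps):
        true_input = random.uniform(50.0, 200.0)            # random setpoint
        measured = true_input + random.gauss(0.0, sigma_w)  # logged reading
        errors.append(abs(measured - true_input))
    return max(errors)

worst = run_protocol()
```

If every blind random input is recovered within a few watts, a 100 W step from an active reactor would be unmistakable in the same log.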
Would not the sudden doubling of apparent input power be enough to convince you in such a system?