Effective experimentation

Like a lot of people in coffee I tend to do little experiments from time to time. Side by side brews of different grinds, or different espresso recipes. I take the results of these experiments and incorporate them into my understanding of coffee, which is still pretty fragmented (to say the least).

For every complex problem there is an answer that is clear, simple, and wrong.
– H.L. Mencken

As an industry we tend to set up simple A/B experiments, or sometimes something a little more complicated. We usually think we’re testing one variable (but with coffee brewing I am not sure if this is often the case) and we get results. Most of the time we get more correlation than we do causation. Just about every one of our experiments could be improved by a better understanding of method. This is where I fall down too.

There are, fortunately, some very smart people who read this blog from time to time and I hope they weigh in here. I have a little theory and I’d like to propose some experiments, but I’d be interested to get some input on how to do them and to see if people would also like to join in and share the data. (I know a small data pool doesn’t invalidate an experiment, but if we’re looking for statistically significant results then a larger data pool would be good to prove or disprove the theory.)


Brew temperature in espresso can be used to help negate the damaging effects of hot coffee grinds resulting from a grinder under heavy use.


The coffee bed absorbs heat from the brew water during the extraction process. The amount of heat lost here would depend on three factors: the amount of coffee, the initial temperature of the coffee and the flow rate of water through the coffee. [1]

We know brew temperature has an effect on an espresso’s cup quality. [2] If the grounds are already warm they are going to absorb less heat from the brew water, so reducing the brew temperature to compensate may improve the espresso.
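
As a rough illustration of the heat-absorption argument above, the sketch below treats the shot as a simple mixing problem: brew water and grounds settle at a shared temperature and nothing else (group head, portafilter, air) takes part. The specific heat used for ground coffee, the 18 g dose, the 40 g of water and the 93°C brew temperature are all assumptions chosen for illustration, not measurements, and flow rate is ignored entirely.

```python
# Rough energy balance: how much could the coffee bed cool the brew water?
WATER_SPECIFIC_HEAT = 4.18   # J/(g*K), standard value for liquid water
COFFEE_SPECIFIC_HEAT = 1.6   # J/(g*K), ASSUMED placeholder for ground coffee


def equilibrium_temp(water_g, water_temp_c, coffee_g, coffee_temp_c,
                     coffee_specific_heat=COFFEE_SPECIFIC_HEAT):
    """Temperature at which water and grounds settle if no heat escapes."""
    water_capacity = water_g * WATER_SPECIFIC_HEAT
    coffee_capacity = coffee_g * coffee_specific_heat
    total_heat = water_capacity * water_temp_c + coffee_capacity * coffee_temp_c
    return total_heat / (water_capacity + coffee_capacity)


if __name__ == "__main__":
    # Hypothetical dose, water mass and brew temperature, for illustration only.
    for grounds_temp in (20, 30, 40):
        settled = equilibrium_temp(water_g=40, water_temp_c=93,
                                   coffee_g=18, coffee_temp_c=grounds_temp)
        print(f"grounds at {grounds_temp} C -> mix settles near {settled:.1f} C")
```

Under these assumptions the settled temperature rises with the starting temperature of the grounds. How large that shift is on a real machine is exactly what would have to be measured.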

Experiment 1:

The first experiment should be identifying whether hot grinds are as bad as we think.  I propose a hot vs. cold grinder experiment.  Shots to be pulled on a machine that can deliver the same pressure and temperature profile on both groups.  Shots should be pulled to the same spec in terms of weight of dose in, weight of dose out and brew time.  In order to be compared shots from both the hot and cold grinder must fall within acceptable tolerances of each other (open to suggestion here but I would say 0.1g, 0.2g and 1s respectively).
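
The tolerance check described above could be written down as follows. This is a minimal sketch: the `Shot` record and the `comparable` helper are hypothetical names, and the tolerances are the suggested 0.1g (dose in), 0.2g (dose out) and 1s (brew time), which are open to revision.

```python
from dataclasses import dataclass

# Suggested tolerances from the post; adjust if the group agrees on others.
DOSE_TOL_G = 0.1
YIELD_TOL_G = 0.2
TIME_TOL_S = 1.0


@dataclass
class Shot:
    dose_g: float    # weight of dry coffee in
    yield_g: float   # weight of liquid out
    time_s: float    # brew time


def within(a, b, tolerance):
    # Round away floating-point noise so that 18.0 vs 18.1 counts as 0.1 apart.
    return round(abs(a - b), 6) <= tolerance


def comparable(hot, cold):
    """True if a hot-grinder shot and a cold-grinder shot may be compared."""
    return (within(hot.dose_g, cold.dose_g, DOSE_TOL_G)
            and within(hot.yield_g, cold.yield_g, YIELD_TOL_G)
            and within(hot.time_s, cold.time_s, TIME_TOL_S))
```

Any pair of shots that fails the check would be discarded before tasting, so the panel only ever compares shots pulled to the same spec.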

They should be tasted blind by a panel of tasters.  I’d be interested to know how many times this would need to be run in order to get some significant data?

Assuming that a hot grinder produces consistently worse espresso we could move on to the next experiment.

Experiment 2:

The goal here would be to measure the impact of coffee cake temperature on brew temperature.  My idea would be to use a naked portafilter and measure temperature of the liquid as it exits the basket.  If we know our brew temperature (and our machine is stable) then we should see a variation in loss here that correlates to coffee grounds temperatures.

Ideally coffee beds with temps from 20°C up to 40°C would be measured.  Would using a simple IR thermometer be sufficiently accurate to get a reading on the coffee bed before brewing?  A consistent flow rate and dose would be extremely important here so the same brewing specs as Experiment 1 would be observed.
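
One simple way to look at the readings from this experiment would be a straight-line fit of exit temperature against coffee bed temperature. The sketch below is ordinary least squares written out by hand. The function name is hypothetical, and a straight-line relationship is an assumption the data may or may not support.

```python
def fit_line(bed_temps_c, exit_temps_c):
    """Least-squares slope and intercept of exit temperature vs bed temperature."""
    n = len(bed_temps_c)
    if n < 2 or n != len(exit_temps_c):
        raise ValueError("need at least two paired readings")
    mean_x = sum(bed_temps_c) / n
    mean_y = sum(exit_temps_c) / n
    sxx = sum((x - mean_x) ** 2 for x in bed_temps_c)
    if sxx == 0:
        raise ValueError("bed temperatures must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(bed_temps_c, exit_temps_c))
    slope = sxy / sxx                      # degrees of exit temp per degree of bed temp
    intercept = mean_y - slope * mean_x    # predicted exit temp for a 0°C bed
    return slope, intercept
```

The slope would estimate how many degrees the exit temperature moves for each degree of change in the grounds, which is the number Experiment 3 would need in order to choose the lower brew temperature.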

Experiment 3:

A machine with individual brew boilers would be necessary here.  Cool coffee would be brewed in one group at a standard temp, and hotter grounds would be brewed in the other group at a lower temperature so that the exit temperature of liquid from both baskets was the same.  Again, espressos tested would need to be brewed to exacting specifications.

These would then be tasted blind by a panel to see if any preference is observed.


How could these be improved?  Is it worth testing?  Any good suggestions for introduction to the necessary statistics?  If this seems viable would other people be up for joining in and sharing the data?

  1. We’ve all had painfully hot espressos that had a lot more to do with being a fast extraction than with the machine’s brew temperature.
  2. On a personal note, my belief is that about 1°C is the minimum that people can distinguish with temperature as the only variable. The latter part of the sentence does make things very difficult, I know. Couple that with the range of accuracy of most probes: although they may read down to 0.1°F, that doesn’t mean they are absolutely accurate to that degree.

29 Comments Effective experimentation

  1. Tumi Ferrer

    I’m just curious about how long it takes for hot coffee to cool down until it’s the same temperature as cool coffee. I know it would mean a separate experiment but I think it’s a variable one needs to think about concerning consistency in making espresso for the experiments: taking exactly the same time distributing, tamping etc.

    I’m excited to see results; I’ll definitely try and do some of the experiments for sharing.

  3. James

    It does seem worth doing – though I wonder if you also need a control? I’m not a scientist but something like an ABX test seems important so as to rule out simple minuscule variations within groupheads, tamping pressure, and technique. This would show that the difference in coffee bed temp. is more or less significant than shot to shot variation, depending on the results of course.

  4. Tuli

    It appears to be a pretty solid experiment. My own hypothesis has been that hot grinders ‘cook’ the grinds, which either accelerates the staling process or just affects the chemical composition of the coffee in one way or another. Am looking forward to the results of experiment 3 (if we get past step 1!) to see what is indeed the culprit.

  5. John Stubberud

    This is interesting: a regional or national barista championship would seem a fitting event and a unique opportunity to pull this off? Not in the midst of a contest, but as most of these arrangements are conducted over at least three days, it should be possible to get permission to use the venue and machines at some point? You would have espresso machines and grinders calibrated pretty much equally, a most willing line of experienced baristi and technical freaks, and for sure a test panel large enough to consider your gathered data to be of value :)
    You would just have to choose among events, and conduct this where the preferred machines are available?
    I’d love to be in a test panel :)

  6. Sean Milnes

    in an experiment such as this it seems that the equipment used to test is really important and ought to be the same for all testing, else you may need to create some sort of range for different machines. i would think that a machine with a heat exchanger would be ideal as well as a grinder with an auto doser. variables will be much easier to control.

    it’s rather interesting though. the implications for a fast paced shop could be fairly significant. you may be able to create a comprehensive user’s guide for some of the more commonly used machines! the same could be said for different s.o.’s and blends. individual brew parameters, down to the brew temp!

    very excited to see results…

  7. John Piquet

    Could you do the same experiment for brewed coffee?

    Using same water temp, dose, method… start with say… ten side by side comparisons, with the hot grinder vs cold grinder and do blind tasting, taking notes throughout the cup. Maybe there will be a change at a certain point in the cup in comparison to the other… maybe there won’t.

    Granted, several brew methods have a declining temperature profile, and you are talking about espresso, which IS different. But it may still give us an answer to the core question of “Do heated grounds affect final cup flavor?”

  8. Brady Butler

    This is an intriguing idea.

    After reading this a few times and mulling for a bit, it seems like the hypothesis you’re actually testing is that the hot-grinder “problem” is not caused by irreversible heat related damage, but by inadvertent brewing temperature variation. Is that right?

    I suspect that the tricky part will be ensuring consistency in the puck starting temperature, as the grounds will begin cooling immediately. Accurately determining puck center temperature might be tough as well.

    Something I wonder about is whether the brew temperature curves will match up with the reduced temperature. Do you think that you might end up matching the peak brewing temperature, only to find that the beginning portion of the profile might be dramatically different?

    Good luck with this!

  9. Nate

    Great post! Methodology is one question I’ve struggled with all year, unsuccessfully in many cases.

    In answer to your question about statistically significant repetition, the rule of thumb in my statistics class was 30 repetitions to hit the significance barrier, but the more the better. I also agree that a control would be essential to the process.

  10. Chris

    James, one observation: This is the kind of experiment where, inevitably, there is a strong bias to the order in which the tasters taste things. Sensory scientists would typically control this with a difference test (sometimes called a triangle test). You would need to pull two shots in one way (let’s call them A), and one shot the other way (call it B), and ask the taster to tell you which two are the same and which one is different.

    If you do this, you need to randomize whether a taster gets two B shots and one A or two A shots and one B. You also want different tasters to taste them in different orders. There are six different serving combinations you should use: AAB, ABA, BAA, ABB, BAB, and BBA. Mixing up the serving order also minimizes the natural biases we have to the order in which we eat or drink something.
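
    A minimal sketch of how the serving orders for such a triangle test might be handed out, assuming each taster gets one triad and that the six orders (two of one sample, one of the other) should be used about equally often. The function names are hypothetical.

```python
import random

# The six possible triads: two of one sample and one of the other.
SERVING_ORDERS = ["AAB", "ABA", "BAA", "ABB", "BAB", "BBA"]


def assign_orders(tasters, seed=None):
    """Give each taster a serving order, cycling through shuffled blocks of six."""
    rng = random.Random(seed)
    orders = []
    while len(orders) < len(tasters):
        block = SERVING_ORDERS[:]
        rng.shuffle(block)       # randomise within each block of six
        orders.extend(block)
    return dict(zip(tasters, orders))   # zip drops any unused surplus orders


def odd_position(order):
    """1-based position of the odd sample in a triad such as 'ABA'."""
    for index, sample in enumerate(order):
        if order.count(sample) == 1:
            return index + 1
    raise ValueError("not a valid triad")
```

    With a multiple of six tasters every order is used the same number of times. A taster scores a correct answer when the cup they pick as different matches `odd_position` for their triad.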

    Having done this for foods, it’s always surprising to me how often you find that you can’t really tell the difference between A and B when using a proper sensory test, compared to a side by side that would have you swearing up and down that you can tell the difference between A and B. Even doing this a few times often gives you a hint as to whether something is different or not.

    But if you really want to be certain, then you care about the statistical significance. In turn, this depends entirely on how much confidence you want to have that a perceived difference is real, and on how big the difference really is (or is not). In terms of confidence, do you want to be 50%, 80%, 90%, 95%, 99%, 99.9% sure? I’d say between 50% and 80% sure is reasonable, if for no other reason than that it means doing the experiment the least number of times. As for how big the difference is, the number of times you have to do the test depends on how many tasters are guessing right or wrong, which gives you an indirect idea of how big the difference in taste is or is not. In simpler terms: if everyone is getting it right or wrong, you can stop sooner. Finally, you also need to decide in which direction you would prefer to be biased. The more you want to guard against a false positive (assuming there is a difference when really there is not), the more likely you are to make the opposite mistake, the false negative, of assuming there is not a difference when really there is.

    Following all of this? If not, here is a slightly more straightforward answer: If the differences between one way and the other seem small, you will need a lot of tests, possibly hundreds or even thousands, for even 80% confidence. If the differences seem obvious, you might get away with doing the experiment as few as 28 times to give you better than 50% certainty that there is (or is not) a difference, and around 70 times for 90% certainty.
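
    One common way to put a number on a triangle test result is a one-sided exact binomial test, where a taster who cannot tell the shots apart picks the odd one by luck one time in three. The sketch below is an illustration of that approach and is not necessarily the calculation behind the figures quoted above. The function names are hypothetical.

```python
from math import comb


def triangle_p_value(correct, trials, chance=1 / 3):
    """Probability of at least `correct` right answers by guessing alone."""
    return sum(comb(trials, k) * chance ** k * (1 - chance) ** (trials - k)
               for k in range(correct, trials + 1))


def min_correct_needed(trials, alpha=0.05, chance=1 / 3):
    """Smallest number of correct answers with a p-value at or below alpha.

    Returns None when even a perfect score would not be convincing.
    """
    for correct in range(trials + 1):
        if triangle_p_value(correct, trials, chance) <= alpha:
            return correct
    return None
```

    For example, three correct answers out of three trials gives a p-value of 1/27 (about 0.037), while two trials can never reach the 0.05 level because even two out of two happens by luck one time in nine.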

    Clearly, doing all of this presents some logistical challenges.

  11. Ole


    from my understanding, the heating of the grounds in the grinder itself does some damage to the taste that you couldn’t revert by brewing it colder. If that is so, your two starting materials for the test (apart from their temperature) will be different – and should deliver different espressos even when brought to the same temperature before brewing.

    If your goal is to detect differences in the cup caused by different grounds temperatures, you should ensure stable conditions there (Brady Butler already pointed out this issue). The grounds will cool down to room temperature pretty fast. To make the test more stable and the results more discernable, an idea would be to cool one portion down to 10° in the fridge and heat the other to 60° in the oven. But alas, this would not deliver real-world results.


  12. Brady Butler

    “from my understanding, the heating of the grounds in the grinder itself does some damage to the taste that you couldn’t revert by brewing it colder.”

    Ole, that was my understanding as well, up until about 2 weeks ago. However that may not really be true… at the very least it should not be treated as a given.

  13. Ole

    Brady, what happened to shake this “foundation”?

    But anyway, before James’ experiment could start properly, this point should be addressed. Or did someone already do the research on that?

  14. Eric

    I have two concerns and one suggestion.

    The first concern is, even if your hypothesis was correct and a cooler brew temperature negates the damaging effects of hot grinds, you would need to know the temperature distribution of grinds leaving the grinder as it warms up. You would need to know what “correct” brew temperatures correspond to the changing grinds temperatures. Presumably, as usual, this would be different for a given bean at a given point in the roast at a given time of day under given ambient temperature/ humidity conditions.

    Second, heat is only absorbed by the coffee bed until thermal equilibrium is reached between the group head, brew water, coffee bed, and portafilter. I would imagine this to be less than 10 seconds (experiment??), which means that the vast majority of a 25 second shot takes place at thermal equilibrium. The temperature of thermal equilibrium is a function of the machine, not a function of grinds characteristics. Any effects of grinds temperature are done in the first 5-10 seconds of the shot, and the rest of the time is spent simply brewing at a reduced temperature. That is not to say that the final product is independent of grinds temperature, but rather any characteristics of hot grinds that result in a final product different than that of normal temperature grinds are intrinsic to the grinds themselves and not correctable by the brewing system. (This point is a bit of speculation on my part. I believe that what you have proposed in “Experiment 3” would do pretty well in addressing it.)

    A useful question to understand the answer to before attempting any of this is, “What are the damaging effects of hot grinds?”

    I would contend that optimizing your brew temperature to a room temperature grinder is always your best bet. It will give your customer that comes in during normal operating conditions an optimum beverage, while giving the hordes during a rush a very good one…and, of course, not stress out your already stressed baristas with another variable to control.

    And one suggestion. In “Experiment 2”, I would not think that a naked portafilter would be the best method to measure temperature. You would be measuring the temperature of a narrow stream in open air that is most likely moving. To demonstrate the concern here, try your set up with a temperature probe under a naked portafilter, while holding another temperature probe in the volume of liquid held directly under the stream. Do they read the same? Is the volume of liquid at least cooler than the stream? (I have observed this discrepancy when measuring the temperature at the spout of a Takahiro kettle.) If not, I would try sticking a flexible temperature probe up one spout of the portafilter, which would help insulate the stream.

  15. Phil Mackay

    I’d also be interested to know what happened to shake that belief Brady..

    The fact that hot grinders do damage or “stale” the coffee is kind of one of those stories in coffee that maybe we all take for granted rather than questioning the underlying reasoning.

    I would think that if hot grinders do in fact stale the coffee, then reducing the brew temp is more likely to make the shots taste worse.
    I’m sure we’ll get some answers as to what effect the grinder heat really does have from these experiments, even if it only creates a whole new set of questions.

  16. James Hoffmann

    “Second, heat is only absorbed by the coffee bed until thermal equilibrium is reached between the group head, brew water, coffee bed, and portafilter. I would imagine this to be less than 10 seconds (experiment??), which means that the vast majority of a 25 second shot takes place at thermal equilibrium. ”

    From brief experimentation this time is closer to 18-25s, depending on puck depth.

    “If not, I would try sticking a flexible temperature probe up one spout of the portafilter, which would help insulate the stream.”

    I like this idea a lot, and will try it!

  17. James Hoffmann

    Chris – thank you so much for the comment, this is awesome stuff!

    As for the challenges – I am sure you’re right, and it is a concern for me. I think I need to rush a little less and make a bit more time for smaller experiments to end up with something worthwhile.

  18. James Hoffmann

    I think this would be an excellent experiment to try – and one that would be reasonably simple to set up with a constant water source and 10 cupping bowls, staggered in brew time to allow temp testing at consistent stages.

    I will have a go at putting this together.

  19. James Hoffmann

    Why an HX machine? Surely a flat line profile from a dual boiler, rather than a curve, would be easier to work with? (Of course some HX machines do flat line too – before I get into trouble for inaccuracy!)

  20. James Hoffmann

    My concern about cooling coffee is the difficulty in factoring in how the loss of CO2 from hot grounds that have since cooled affects the way the coffee brews. We are reasonably sure that more CO2 in the grounds makes it harder to extract the coffee – so this would probably influence tasting of the shots.

  21. Brendon

    Illy’s Espresso Coffee: The Science of Quality has a short section (5.5, p227 in 2nd Edition) on physico-chemical changes that result from grinding, including heating from grinding. The main concerns are:
    1. Accelerated loss of CO and CO2 due to the increase in internal pressure of the gas-containing cells (as they heat up), and of the volatile aromatics which are carried with these gasses.
    2. Decreased viscosity of the lipids in the ground coffee, which allows these semi-viscous fluids to seep out of the cells and coat the outside of the fractured particles (grounds). This allows the coffee particles to become more cohesive and, with compaction into the portafilter, may influence the hydraulic resistance of the cake.

  22. James Hoffmann

    “may influence the hydraulic resistance of the cake”

    This seems a little bit of a cop out to me. It does or it doesn’t. And if it does – then how? They know so much, I wonder why this was left so open…

  23. Brendon

    The problem is that it is difficult to say things with certainty in scientific literature. If they were to say “it does”, that requires proving causation. That is, proving that an increase in the hydraulic resistance of the cake is primarily caused by the increased fluidity (decreased viscosity) of the lipids in the roasted coffee, and not by another factor that is brought about by a hot grinder. A conclusive statement about the causation would require proving that other influences (such as loss of CO2 or alterations in fragmentation affecting the size distribution of the grinds) are not the primary factors.

    Anyway, from the literature I’ve read on espresso, I would conclude that not that much representative research is done on *quality* espresso preparation. Most papers I’ve read use pods in their experiments, and I distinctly recall a paper that used a Krups consumer machine for preparation. I strongly believe that there needs to be a better exchange of equipment and know-how between researchers and coffee professionals.

  24. Ole

    I made a little test here, completely unscientific, and it gave results everybody would expect. I ground two portions of coffee and let them sit for 20 minutes, one at room temperature, the other on the cup warmer of the espresso machine. The heated grounds were a bit clumpier when stirred, but not a big deal. The shot ran a bit slower too with the warmed grounds, but no big deal either. Both shots tasted worse than a freshly ground shot, but the warmed grounds produced a shot with even more stale aroma – and a burned flavour on top.

    So, this was completely exaggerating the conditions, and it was only one try, but I will not stick to warming my grounds…

  25. Bill

    You could remove the human tasting element to begin with, and focus on looking for a statistically significant measurable difference (refractometer, extracted weight/time, whatever you think is most likely to demonstrate the difference). A second series repeating the experiment, and introducing a human ‘sensory’ component, will then tell you if there is a tasteable difference, and assuming you have experimented with a variety of grind temperatures, you could then also establish the level of “minimal clinically important difference”.

    Chris quite rightly suggests undertaking a power calculation, but you will have some difficulty calculating this without knowing the minimal clinically important difference, and having no previous experiments to base your experiment on. A post hoc power calculation may be the idea or at least a starting point.
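
    To show why the power calculation hinges on the size of the difference, here is a sketch for a triangle test under the usual guessing model: a fraction of tasters genuinely detect the difference and always answer correctly, and the rest guess and are right one time in three. That fraction is the unknown quantity and has to be assumed. The function names are hypothetical.

```python
from math import comb


def upper_tail(successes, trials, p):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(comb(trials, k) * p ** k * (1 - p) ** (trials - k)
               for k in range(successes, trials + 1))


def critical_count(trials, alpha=0.05, chance=1 / 3):
    """Smallest correct count that is significant at alpha, or None."""
    tail, best = 0.0, None
    for count in range(trials, -1, -1):   # build the tail from the top down
        tail += comb(trials, count) * chance ** count * (1 - chance) ** (trials - count)
        if tail > alpha:
            break
        best = count
    return best


def power(trials, discriminators, alpha=0.05, chance=1 / 3):
    """Chance of a significant result if `discriminators` (0..1) can truly tell."""
    p_correct = discriminators + (1 - discriminators) * chance
    needed = critical_count(trials, alpha, chance)
    return 0.0 if needed is None else upper_tail(needed, trials, p_correct)


def trials_needed(discriminators, target_power=0.8, alpha=0.05, max_trials=300):
    """First trial count reaching the target power (power is not strictly
    increasing in the trial count, so treat the answer as approximate)."""
    for n in range(1, max_trials + 1):
        if power(n, discriminators, alpha) >= target_power:
            return n
    return None
```

    Calling `trials_needed` with a few different guesses for the fraction of discriminators gives a feel for how quickly the required number of tastings grows as the assumed difference shrinks.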

    I would keep the experiment isolated to a single machine with separate boilers set up identically and compare the same coffee from two of the same grinders, set up identically, with the exception of workload.

    I would also get an independent person, blinded to which grinder is overworked, to take the measures, separate from the person pulling the shots.

    The experiment above would provide the evidence that hot grinds impact cup quality (unless this has already been demonstrated somewhere), and move it beyond an assumption. Once this aspect is supported, you can then specifically look at repeating the experiment, with the workloaded grinder, and alter water temperature as the independent variable, and so on…

  26. Brady Butler

    Greetings all. Not to live in the past, but did anybody ever do any of this? I know I didn’t and was curious.
