Post equation I know is wrong
I shipped a “wrong” equation once — and I would do it again in a heartbeat
There is a line of code in one of the devices I worked on that I cannot mathematically justify. I knew it was wrong while I was writing it. I would do the exact same thing in the exact same circumstance again, and I can tell you why. Working out the reasoning behind that decision taught me more than any of the parts of the project that actually worked properly.
Some context. A few classmates and I built a portable water-quality tester that uses a firefly enzyme called luciferase. Luciferase glows in proportion to the biological activity in a water sample, which is a very neat property to have, except that luciferase is also fussy about temperature. Cold makes it sluggish. Heat makes it fall apart. The brightness of the glow ends up depending not just on how much life is in the water, but on how warm the water happens to be. My job was to deal with that. Measure the temperature, work out how much activity the luciferase had lost, and adjust the reading back to a standard.
To do this, I needed a way to express any given temperature as a "percent of peak activity." The published data I had was only available at specific temperatures, but a real sample could be at anything in between, so I needed a continuous function. I decided to use a quartic regression. Below is the equation I actually shipped.
Substitute a temperature in, get an estimated activity percent out. Not mathematically correct, but functional.
Why it’s wrong
There is no fourth-degree function in a biological sense. Nothing in enzyme kinetics is fourth-degree. I picked a quartic because it happened to fit the data nicely between 5 and 40 °C, which is the entire scientific reasoning behind the choice.
The real curve is the result of a balance between two competing things. The reaction rate goes up with temperature, while the protein itself starts denaturing as it gets hotter. A proper model would combine both processes and would have parameters with actual physical meaning, like denaturation rates and activation energies. My parameters mean nothing. They are the numbers Desmos handed back.
This sounds like a cosmetic issue, but it has teeth. My polynomial is only reliable between 5 and 40 °C. Outside that range, a quartic is free to go wherever it wants. It can shoot off to infinity, while in reality the enzyme would have given up entirely. My function will happily hand back a confidently wrong number. This is the worst kind of failure. Not a crash, not an error, just a number that looks fine and is actually completely false.
So why keep it?
The question I had to answer was not "what is the most correct model of this enzyme?" The question I had to answer was "what is the most correct model of this enzyme for a cheap device that has to run on an Arduino in the field and never mislead anyone?" Those are very different questions. A lot of engineering, I think, is about quietly answering the second one while everyone assumes you've been answering the first.
A research instrument needs the mechanistic model, because accuracy is the whole point and the extra complexity is worth it. A two-pound microcontroller in a portable water tester is a very different thing. Within the range of temperatures water can realistically reach, the quartic and the "proper" enzyme kinetics model would probably give nearly identical answers. The extra rigour would buy me accuracy I cannot actually use, in exchange for more complex code running on hardware I barely trust to begin with.
What justifies the decision isn't the polynomial. It's the fact that I bounded the polynomial honestly. There's a fail-safe. If the temperature or pH reads an unreasonable value, the device throws an error rather than guessing. The quartic is not allowed to operate in a region where it would be lying. That guardrail, not the polynomial itself, is the part of the design I would defend the hardest.
What this changed about how I think
I used to think the most complex solution was always the best one. I thought of approximations as a more rigorous answer that just hadn't been figured out yet. I don't really think that anymore. An approximation that is understood and bounded can be a better engineering choice than a precise solution pointed at the wrong problem.
I don't want this to become a crutch or an excuse. The honest position isn't "approximations are acceptable." The honest position is: I can tell you exactly how my model fails, where it fails, and what I have in place to make sure it doesn't fail dangerously. Not knowing how your model fails is dangerous. Refusing to figure it out is worse.
If I come back to this project, the first thing I would do is not redesign the device. The first thing I would do is run the experiment. Swap in a proper denaturation-based model and see how much accuracy I actually gain across realistic water temperatures. I suspect the answer would be "not much," which would justify the original decision. But I don't know yet. And really, the point isn't whether I turn out to be right. The point is that the trade-off needs to be tested, not just assumed.


