
Refactor! Refactor! Refactor!

Jonathan Eunice
Apr 24, 2015
<p>I worked on a project today that had a 47-line function. It was complex, needing a convoluted dance of preprocessing, main logic, and what-happens-at-the-end cleanup code. When I stopped work, three refactoring cycles later, it was 20 lines of clean, concise, easily understood code. I understand what's going on much better, and I trust much more that it does what it should. (Don't worry, that's not just a feeling: the refactored code passes the same test suite.)</p> <p>A few notes about this process:</p> <ol> <li> <p>It's iterative. It doesn't all happen in one fell swoop. As with any attic or garage, you often have to clean up one bit of code to see what else is going on, and what else can be cleaned up.</p> </li> <li> <p>Line counts are deceptive. The 20 lines of code? It's a lie. The count is accurate, but among those 20 are two <code>import</code> statements bringing in modules that together account for another 134 lines. Those modules weren't just pulled in from "out there" (a repository, a standard library, etc.). No, those are lines I wrote. So shouldn't the comparison be 47 in vs. 154 out? Isn't this a net loss?</p> <p>Not really. There are only about 50 lines of primary code and comments in the two modules. Clean, simple, comprehensible lines. They're also general and reusable. Taking out the generous comments, test cases, and utility functions, the core of the extra modules accounts for about 20 lines.</p> <p>So I could inline that code and make a less aggressive claim, of 47 lines cut to 40. But it doesn't <em>feel</em> like 40, or 154, or whatever aggregate count you'd like to use. It feels like 20. In the main program logic, there are just 20 lines. Everything else is nicely modularized and encapsulated. Whatever the line count, I get the cognitive and quality advantages inherent in a 47-to-20 cleanup.</p> </li> <li> <p>Test cases? You betcha! Those support modules have their own isolated test cases. 
Tests that help improve and verify the correctness of that code. Even if I had extensively tested the original 47 lines, there is no practical way that I would have carefully tested the intent of the logic I teased out into separate modules. Yet as separate modules, testing them was natural, direct, and focused. So was adding comments to explain what was going on, something that generally would not happen in the middle of a long, complex stream of "business logic."</p> <p>I was even able to run comparative benchmarks on several alternative implementations. One of those benchmarks revealed that a standard Python recipe creates a temporary list, one that with my data is 200,000 items long. Eliminating it sped up the longest operation in the program by about 3.5x. Production runs might use ~1M item groups. The performance bump should be even more dramatic there; it might even save us from a resource exhaustion (out of memory) program failure. Thus modularity has given us testedness, quality, clarity, performance, reliability, and other advantages not even hinted at in the line count differential.</p> </li> <li> <p>I'm very happy with the results. I have much cleaner code. It runs much faster. It will be more robust, more maintainable, and more extensible. And I'll be able to reuse the new modules in other situations. This didn't come for free, of course. It took an extra project day to do the rework. From this vantage, that was a good investment. Refactoring often is. "Writing is rewriting."</p> </li> </ol> <p>I love making code better. If you need a mentor to help you evaluate and refactor yours, I am here!</p>
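<p>To make the temporary-list point from the third note concrete, here is a minimal sketch. It is not the code from the project: the post doesn't say which recipe was involved, so the chunking helpers below (<code>chunks_eager</code>, <code>chunks_lazy</code>) and the measuring function (<code>peak_bytes</code>) are hypothetical stand-ins. The eager version copies the whole input into a list before slicing it; the lazy version only ever holds one chunk.</p>

```python
from itertools import islice
import tracemalloc


def chunks_eager(iterable, size):
    """Hypothetical stand-in for a recipe that builds a temporary list.

    The whole input is copied into `items` before any chunk is produced.
    """
    items = list(iterable)  # the temporary list: one slot per input item
    return [items[i:i + size] for i in range(0, len(items), size)]


def chunks_lazy(iterable, size):
    """Yield chunks one at a time, so only one chunk is alive at once."""
    iterator = iter(iterable)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def peak_bytes(chunker, n, size):
    """Return peak traced memory while consuming chunker(range(n), size)."""
    tracemalloc.start()
    total = sum(len(chunk) for chunk in chunker(range(n), size))
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    assert total == n  # both versions must see every item exactly once
    return peak


if __name__ == "__main__":
    # 200,000 matches the list length mentioned in the post; the chunk
    # size of 1,000 is an arbitrary choice for this illustration.
    for chunker in (chunks_eager, chunks_lazy):
        print(chunker.__name__, peak_bytes(chunker, 200_000, 1_000))
```

<p>Because each helper is a small, separate function, comparing them takes only a few lines, which is the kind of focused test and benchmark that would be awkward to run against logic buried in a 47-line function. Actual savings depend on the data and on what consumes the chunks.</p>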