Tuesday, November 22, 2011

How to Refactor Your Code

Red-Green-RefactorIf you develop your products using TDD or any of its derivatives (ATDD, BDD, etc), you are following the RedGreen → Refactor loop. If you are just getting started with TDD, you might find that while the red and green (write a failing test, and then make the test pass) are simple enough to understand and to demonstrate, the refactor phase is much less clear, and I’ve found few sources that explain it properly. In this post I will attempt to thoroughly explain what I do, and what I consider to be a best practice.

Refactor to Remove Redundancy

I usually use the refactor phase to reconstruct my code by removing duplications in code. I do this mostly in my production code (the code that will ship to my customer). I often leave duplications in my test code (the code that tests the production code), because for my tests I value readability over adhering to the DRY principle.

You should always start with your failing test (or unsatisfied specification, if you're into BDD). Production classes and methods that you pull from your tests should be public.

  • Note: If you're developing in .NET, you may wish to consider using the internal keyword (and adding the InternalsVisibleTo assembly attribute to your production assembly, so that your test project can use the code) instead. Then you would make it public only if another production assembly depends on it.

When you reach the refactoring phase (after making at test pass, or before writing a failing one), you should look at the code and walk through the following steps:

  1. Any substantial or complex code that you need in two (or more) methods should be extracted into a private method, and called from both (all) of the original locations.
    • Note: Any method that you create as part of the refactoring phase of your TDD should be made private.
  2. Make helper methods protected only when a derived production class needs them.
  3. Any method created in the refactoring phase (now private or protected), that you need in another production class should be extracted to a helper class and made internal (or public in any language that doesn’t support the concept).
  4. If the helper class is needed in another assembly, then:
    • If the other assembly already depends on the helper class' assembly, make the helper class public (if it isn't already).
    • If the other assembly does not depend on the helper class' assembly, extract the helper class to a helper assembly (the best fit, or a new one) and make it public. Be sure to make the original assembly reference the new helper assembly, if it doesn't already.

Refactor for Readability

There are no real steadfast rules for this. What I usually do is go over the code I just wrote (or modified) and look for any violations of team coding standards, or poorly named identifiers (classes, methods, variables, etc.). I usually don’t have very many of those, as I tend to name them properly (which is to say, to my own satisfaction, but then again I’m often considered the strictest coder on my teams).

One trick I do use, is to go over the methods and see if I it reads easily. By easily I mean is it fluent? I don’t go all-out trying to develop a fluent API, but just see if the classes, methods and variables lend themselves towards some semblance of the English language. The following is an example of a block of “fluent” code:

Code Snippet
  1. var user = new User(usernameInput.Text, passwordInput.Text);   
  2.  
  3. if (userManagementService.Authenticates(user))   
  4. {   
  5.     Navigate.To(requestedWebPage);   
  6. }

Fluency isn’t a must-have, but I find it better than commenting. Comments can lie. Code doesn’t.

Refactor for Complexity Reduction

Cyclomatic Complexity is a code metric that measures how many distinct paths exist in a program’s code. A high value is more difficult to read and maintain. I find that sometimes reducing complexity makes it easier to maintain the code. I therefore advise on doing this kind of refactoring especially when you need to improve your understanding of the codebase, rather than after making every test pass.

Here’s a list of techniques that can reduce the complexity of your code:

  • Invert conditional statements – If your code is nested in if statements, where on one branch you do something and on the other you do nothing, you might invert the condition (checking whether the condition does not occur), and then return (exit the method) if the if-statement evaluated to true. The following block of code will run if the statement evaluates to false (what would originally have run if it did evaluate positively).
  • Extract blocks of code into private methods. The complexity is moved to a different method

 

Refactor for Extensibility

This refactoring technique ties in with complexity reduction, and requires the use of some design patterns. I usually use the Strategy Pattern for this kind of refactoring. The idea is to remove multi-branch conditionals (such as chained if statements or switch-case statements) and extract the different cases into concrete strategy classes.

There are several benefit to this refactoring technique:

  1. New cases can be added without modifying (or with minimal modification to) existing code. This reduces the chance that existing code may break
  2. Cyclomatic complexity is reduced
  3. The strategies can be tested separately if you feel the need to (remember that they are originally covered by the original tests).

Other Refactorings

There are many more reasons to refactor code, but the above cover most of the cases I come across. If you are interested in reading more on refactoring software, I suggest reading the following books:

Hope this helps.

Assaf.