Thursday, December 06, 2007

Code Duplication Woes

Code duplication is EVIL! We all know this, right?

First, let's define duplicate code. I will take the definition straight from Wikipedia:

Duplicate code is a computer programming term for a sequence of source code that occurs more than once, either within a program or across different programs owned or maintained by the same entity. Duplicate code is generally considered undesirable for a number of reasons. A minimum requirement is usually applied to the quantity of code that must appear in a sequence for it to be considered duplicate rather than coincidentally similar. Sequences of duplicate code are sometimes known as clones.

The following are some of the ways in which two code sequences can be duplicates of each other:

  • character for character identical
  • character for character identical with white space characters and comments being ignored
  • token for token identical
  • functionally identical
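
To make the first three levels concrete, here is a rough sketch in Python that compares two snippets character for character and then token for token, with comments and layout ignored. The function names are my own invention for illustration, and it only understands Python source. The fourth level, functionally identical, is not something a simple comparison like this can decide.

```python
import io
import tokenize

# Token types that only carry comments or line layout (assumption: for this
# sketch we treat them as noise and drop them before comparing).
_IGNORED = {
    tokenize.COMMENT,
    tokenize.NL,
    tokenize.NEWLINE,
    tokenize.ENDMARKER,
}


def normalized_tokens(source):
    """Return the tokens of a Python snippet with comments and spacing removed."""
    result = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _IGNORED:
            continue
        if tok.type in (tokenize.INDENT, tokenize.DEDENT):
            # Keep the block structure, but not the exact indentation width.
            result.append(tokenize.tok_name[tok.type])
        else:
            result.append(tok.string)
    return result


def same_characters(a, b):
    """Level 1: character for character identical."""
    return a == b


def same_tokens(a, b):
    """Levels 2 and 3: identical once whitespace and comments are ignored."""
    return normalized_tokens(a) == normalized_tokens(b)


first = "total = price * qty  # compute the total\n"
second = "total=price*qty\n"
print(same_characters(first, second))  # the text differs
print(same_tokens(first, second))      # the tokens do not
```

Real clone detectors also apply a minimum length, as the definition above mentions, so that a shared one-liner does not count as a clone.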

So why is it bad? For many reasons. First, it drives the cost of maintenance way up: instead of changing the code once, you have to search the code for every other area that needs the same change. Second, it makes the code fatter, and more code is not easier to understand. Third, more code takes longer to write.
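
Here is a small sketch of the maintenance point, in Python. The discount rule and every name in it are hypothetical, made up purely for illustration. The rule lives in one function, so a change to it happens in exactly one place instead of being hunted down in every caller.

```python
def discounted_price(price, percent_off):
    """The one place where the (hypothetical) discount rule lives."""
    return round(price * (1 - percent_off / 100), 2)


def cart_total(prices, percent_off):
    """Reuses the rule instead of repeating the arithmetic inline."""
    return round(sum(discounted_price(p, percent_off) for p in prices), 2)


def receipt_line(name, price, percent_off):
    """Reuses the same rule, so the receipt cannot drift from the cart total."""
    return f"{name}: {discounted_price(price, percent_off):.2f}"
```

If the arithmetic had been pasted into both cart_total and receipt_line, changing the rule would mean finding and fixing both copies, which is exactly the search described above.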

Let me give you an example I came across in the last few weeks. I have been doing some work patching a website. The error occurred after two new products were added to the database, whenever a customer purchased one of the new products. So I had to debug the issue. Basically, the developer who wrote the failing web page had decided to create two static arrays. One array held all the product prices, and the other held the product descriptions. He put each item in the array at the index corresponding to the database product key. That way, when the page was passed the product id, he could look in these arrays to get the info out. Well, what happens if you forget to update these arrays when adding new products? You get index exceptions and pages crashing. I have no idea why the developer did this. I mean, you have the product id, so why not just get the details from the database? This poor design and duplication cost real-world dollars.
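
To show the difference, here is a minimal sketch in Python using an in-memory SQLite database. This is not the site's actual code: the table layout, the product names, and the function names are all assumptions invented for the example. The first lookup mirrors the static-array approach, and the second asks the database, which is the single source of truth.

```python
import sqlite3


def make_demo_db():
    """Build a throwaway in-memory products table (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products ("
        "id INTEGER PRIMARY KEY, description TEXT NOT NULL, price REAL NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO products (id, description, price) VALUES (?, ?, ?)",
        [(0, "Widget", 9.99), (1, "Gadget", 19.99)],
    )
    return conn


# Duplicated data: a hand-maintained copy of the table, indexed by product id.
PRICES = [9.99, 19.99]
DESCRIPTIONS = ["Widget", "Gadget"]


def lookup_from_arrays(product_id):
    """Breaks with IndexError as soon as the table grows and the arrays do not."""
    return DESCRIPTIONS[product_id], PRICES[product_id]


def lookup_from_db(conn, product_id):
    """Reads the details from the database every time, so nothing can go stale."""
    row = conn.execute(
        "SELECT description, price FROM products WHERE id = ?", (product_id,)
    ).fetchone()
    if row is None:
        raise KeyError(f"no product with id {product_id}")
    return row
```

Insert a third product into the table and lookup_from_db finds it right away, while lookup_from_arrays raises IndexError until somebody remembers to edit both arrays.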

Please never duplicate code, data, or logic in a software application. It is never good practice. There are also tools you can get that search for code duplication. The one that comes to mind is Simian. I have never used it, but I might start.

