Code Metrics: Lines of Code (LoC)
I am starting a series of blog posts about code metrics. I plan to introduce different metrics, explain what they mean, and present tools you can use to measure them. Where possible, I will also show how to put each metric to use. The first metric is the simplest one: Lines of Code (LoC).
Purpose
Lines of Code shows how many lines of source code there are in your application, namespace, class or method. LoC can be used to:
- check the size of code units. If a method is longer than 20 lines of code, it may be too complex and hard to understand. You can also find large classes that should be split into smaller ones.
- estimate the size of a project. LoC is a useful estimation characteristic only under certain conditions, and only until you find a better estimation method (be quick about finding one). You can use the LoC of one application to estimate another if the two are similar in logic, requirements and functionality. Even then, be very careful with this metric.
My suggestion is to use LoC to monitor the size of your code units. For software estimation you can probably find better methods.
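To make the size check concrete, here is a minimal sketch in Python. It is not one of the tools this series covers; it only illustrates the idea. It assumes Python 3.8 or later, counts physical lines (blank lines and comments inside a function are included), and the names `MAX_LINES` and `long_functions` are made up for this example.

```python
import ast

MAX_LINES = 20  # threshold taken from the text above; adjust to taste


def long_functions(source: str, max_lines: int = MAX_LINES):
    """Return (name, physical line count) for functions longer than max_lines."""
    tree = ast.parse(source)
    offenders = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # lineno is the "def" line, end_lineno the last line of the body
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                offenders.append((node.name, length))
    return offenders


if __name__ == "__main__":
    # A two-line function and a 26-line function (the def line plus 25 statements).
    sample = "def short():\n    return 1\n\ndef long():\n" + "    x = 0\n" * 25
    print(long_functions(sample))
```

A report like this is a prompt to look at the method, not a verdict: a long method that is easy to read may be fine as it is.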
LoC is not a linear estimation characteristic
There is a very good book about software estimation: Software Estimation: Demystifying the Black Art by Steve McConnell. Before using LoC as a silver bullet, I suggest you come back down to earth and read this book.
Jeff Atwood wrote a very good post about LoC, Diseconomies of Scale and Lines of Code. He cites Steve McConnell:
[Using software industry productivity averages], the 10,000 LOC system would require 13.5 staff months. If effort increased linearly, a 100,000 LOC system would require 135 staff months. But it actually requires 170 staff months.
To get a better idea of the difference between the linear and the real estimate, take a look at the following chart.
I think an error of 35 staff months is a pretty horrible experience for a budget, isn’t it?
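The arithmetic behind that gap can be checked in a few lines of Python. The four input numbers come from the quote above. The power-law form `effort = a * size ** b` is my own assumption for illustration, not a formula taken from the book; two data points are just enough to solve for `b`.

```python
import math

small_loc, small_effort = 10_000, 13.5    # staff months, from the quote
large_loc, large_effort = 100_000, 170.0  # staff months, from the quote

# Linear scaling: ten times the code, ten times the effort.
linear_effort = small_effort * (large_loc / small_loc)
gap = large_effort - linear_effort

# Assumption: effort = a * size ** b. The two data points determine b.
exponent = math.log(large_effort / small_effort) / math.log(large_loc / small_loc)

print(f"linear estimate:  {linear_effort:.1f} staff months")  # 135.0
print(f"gap:              {gap:.1f} staff months")            # 35.0
print(f"implied exponent: {exponent:.2f}")                    # 1.10
```

An exponent above 1 is what "diseconomies of scale" means here: every tenfold increase in size costs more than ten times the effort.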
Productivity cannot be measured by LoC
One of the classic mistakes is using LoC to measure programmer productivity. It is nonsense. One complex algorithm may take about 100 lines of code, but the time it takes to make it work may equal the time needed for a system of 10,000 lines of code or even more. There is a huge difference in complexity. For example, writing an ASP.NET MVC application is pretty easy and straightforward compared to the algorithm I mentioned.
Also, how can a programmer who wrote 1,000 lines of code be more effective than a programmer who wrote 20 lines and achieved the same or even better functionality? I see one more danger here: why should programmers write efficient, easy-to-maintain code if their work is rewarded when they write much less efficient and far longer code? Measuring productivity this way makes strong professionals look like dead weight in the team. Do you really want to disrespect or even lose your main workhorses?
Types of LoC
There are two LoC metrics, and they differ in how they are measured:
- logical LoC counts only lines of executed code; definitions, namespace imports etc. are not considered executed code. As Patrick Smacchia points out in his blog post How do you count your number of Lines Of Code (LOC)?, logical LoC is the better characteristic because it does not depend on coding style and language.
- physical LoC counts all lines of code and is measured by parsing the source files. Take a look at Hackles to get a better idea of physical LoC.
It turns out that logical LoC, however you measure it, is far better than physical LoC because it contains less noise.
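The difference between the two counts is easy to see in a small experiment. The sketch below is only a rough approximation written for this post, not how NDepend or Visual Studio compute the metric. It assumes Python source, treats every statement that is not an import or a function/class definition as "executed code", and would also count docstrings as statements. The names `physical_loc`, `logical_loc` and `NON_EXECUTED` are made up for this example.

```python
import ast

# Statement types treated as "not executed code" in this sketch (an assumption).
NON_EXECUTED = (ast.Import, ast.ImportFrom, ast.FunctionDef,
                ast.AsyncFunctionDef, ast.ClassDef)


def physical_loc(source: str) -> int:
    """Every line in the file, including blank lines and comments."""
    return len(source.splitlines())


def logical_loc(source: str) -> int:
    """Rough logical count: statements that are neither imports nor definitions."""
    tree = ast.parse(source)
    return sum(
        1
        for node in ast.walk(tree)
        if isinstance(node, ast.stmt) and not isinstance(node, NON_EXECUTED)
    )


if __name__ == "__main__":
    sample = (
        "import os\n"
        "\n"
        "# compute a total\n"
        "def total(values):\n"
        "    result = 0\n"
        "    for v in values:\n"
        "        result += v\n"
        "    return result\n"
    )
    # Same code with a blank line after every line: a pure style change.
    spread = sample.replace("\n", "\n\n")
    print(physical_loc(sample), logical_loc(sample))  # 8 4
    print(physical_loc(spread), logical_loc(spread))  # 16 4
```

Doubling the blank lines doubles the physical count while the logical count stays the same, which is the kind of noise the logical metric filters out.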
Conclusion
LoC is a good metric for measuring the size of code units. It can also be used as an estimation metric, but only within very narrow limits. It is something you can use when you start estimating, but you should drop it as soon as you find a more exact estimation method. You cannot use LoC to measure project progress or programmer productivity, so don’t even think about it. :)
In my next posts I will show how to measure code metrics using Visual Studio Code Analysis tools and NDepend.
Sure I can, Dave. :) This is not the only posting about LoC and I don’t want my postings to be too long.
nicely written Gunnar.
I don’t see how the NCSS and/or NCSL acronyms could be overlooked in an article about counting code lines.
As you have mentioned, the better way is to use it to measure the length of your methods, functions, etc. to check the complexity of that method or function. However, as Bill Gates has mentioned, measuring programming progress by lines of code is like measuring aircraft building progress by weight.
Thank you for this post. I’ll be following your series on code metrics.
Idunno.
There are other metrics to measure code quality. As far as I know, there is an industry average of 10 to 12 KSLOCs per year for a programmer, which has proved to be highly consistent across a large range of languages and project types – I think Steve McConnell mentions this in the book you mentioned.
Precisely because of the stability of productivity in terms of SLOCs I think it is an excellent measure of programmer productivity. All things being equal (I mean code quality, quality of design, bug density and so on), it is obvious that the programmer cranking out more SLOCs a year is more productive. IMO this metric is usable the other way round too: if a programmer or a team is cranking out significantly more than 1000 SLOCs per month per man, there are probably quality or technology issues there, or at least counting issues.
To talk in terms of airplane building and weight: obviously, a plane weighing more, but being built similarly well with one weighing less will be smaller. In fact, in airplane building, economies of scale do exist, and of two planes built using the same technology, probably the heavier one is more fuel efficient and can carry more weight for less money. Only, technologies have limits, and you can’t go beyond a certain size using a specific technology. This comparison translated to software building would mean that a larger code base is probably doing more than a smaller one, and as such should earn more money for the company proportionally to size, if both rely on the same platform/libraries and were written using the same language.
IMO, the problem with using SLOCs as a productivity metric is comparing apples to apples. Of course, if you compare a sloppy codebase of 100000 SLOCs, heavily using copy/paste reuse, with a well written, well designed codebase of only 30000 SLOCs, and both do the same thing, you definitely can’t say the programmers who wrote the 100000 SLOCs application were more productive, but you definitely can say they worked more, while being less productive.
Which is why I think SLOCs is an important metric, which, when used properly, can help you a lot in keeping track of software development efforts.
Thanks for the good comment! One little thought from McConnell: which programmer is more productive, the one who gets a high SLOC count building five web forms per week, or the one who works on a complex algorithm and has 50 lines of code written at the end of the week? In the first case tools help the programmer get more lines (for example, the Windows Forms designer generates a hell of a lot of code). In the second case the programmer writes a lot of code, but most of it gets deleted along the way.
I think SLOCs are a good metric if you know how to apply them in the context of your team. But for sure it should not be the only metric when judging the effectiveness of your team. The number of defects is also a good metric, although it is not directly measurable through source code analysis.