OK, all. Prepare for a dense article, because I’m about to summarize everything you should know about software measurement definitions. Most of this content comes from ISO/IEC/IEEE 15939:2017, “Systems and software engineering — Measurement process”. But truth be told, most of that standard came from Practical Software Measurement (PSM) and the Goal-Question-Metric (GQM) approach, so I guess it’s all good. :-)
By the way, the Practical Software Measurement (PSM) book is worth a look if you are interested.
This standard covers the software measurement process and software measurement definitions. However, it doesn’t say much about which measures you should choose for your organization; I think practices such as KPIs and OKRs cover that very well, and I intend to write about them soon. For now, I want to talk about how to define a measure, because many organizations end up measuring inaccurately due to poor measure definitions. To be clear, when I talk about accuracy, I don’t mean being extremely precise; I mean having a good idea of how imprecise your numbers are.
Another common mistake I see is organizations measuring different things at different moments while believing they are measuring the same thing (like comparing story-point velocities of different teams, or measuring the performance of a process that sometimes skips significant activities).
Without further ado, here is a summary of what I consider the most important parts of this excellent standard, focusing on what helps when defining and documenting measures.
- Derived measure: measure that is defined as a function of two or more values of base measures.
- Base measure: measure defined in terms of an attribute and the method for quantifying it.
- Indicator: measure that provides an estimate or evaluation of specified attributes derived from a model with respect to defined information needs.
- Measurement method: logical sequence of operations, described generically, used in quantifying an attribute with respect to a specified scale. The type of measurement method depends on the nature of the operations used to quantify an attribute. Two types can be distinguished:
- subjective: quantification involving human judgment; and
- objective: quantification based on numerical rules.
- Scale: ordered set of values, continuous or discrete, or a set of categories to which the attribute is mapped.
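To make these definitions concrete, here is a minimal sketch in Python (my own illustration, not part of the standard) of how two base measures combine into a derived measure via a measurement function. The defect-density example and all names are hypothetical:

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class BaseMeasure:
    """A measure of a single attribute, quantified by one method."""
    attribute: str   # e.g. "source code"
    method: str      # e.g. "count non-blank, non-comment lines"
    unit: str        # e.g. "KLOC"
    value: float

@dataclass
class DerivedMeasure:
    """A measure defined as a function of two or more base-measure values."""
    name: str
    function: Callable[..., float]
    inputs: Tuple[BaseMeasure, ...]

    @property
    def value(self) -> float:
        return self.function(*(m.value for m in self.inputs))

# Hypothetical example: defect density from two base measures
defects = BaseMeasure("defects", "count confirmed defect reports", "defect", 12)
size = BaseMeasure("source code", "count non-blank, non-comment lines", "KLOC", 4.0)
density = DerivedMeasure("defect density", lambda d, s: d / s, (defects, size))

print(density.value)  # 3.0 (defects per KLOC)
```

Note how the base measures carry the method and unit with them; that is exactly the information a measure definition document should pin down so two teams don’t quietly count different things.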
Note that ISO doesn’t use the term “metric”. Each source material uses metric and measure with different meanings, so you must pay attention to the definitions. Always try to map to the concepts of base and derived measures here, and it will be much easier to understand other texts.
The type of scale depends on the nature of the relationship between values on the scale and the measurement method. For example, subjective measurement methods usually support only ordinal or nominal scales. Four types of scale are commonly defined:
- Nominal: the measurement values are categorical. For example, the classification of defects by their type does not imply order among the categories.
- Ordinal/Likert: the measurement values are rankings. For example, the assignment of defects to a severity level is a ranking. Many people try to do math with this scale by assigning numbers to the values.
You have probably seen this with “ratings”. Imagine you are researching how much people like a particular wine. They taste it and rate it as “Strongly dislike”, “Dislike”, “Indifferent”, “Like”, or “Strongly like”. Then some magician turns these into numbers from 1 to 5 and calculates an average, concluding that people liked it 3.5 out of 5. Well, first of all, there is nothing to say that the distance between any two of these values is the same as between any other two; that property belongs to another scale type (Interval). Second, having most people rate your product around the average is very different from having a group of lovers and a group of haters.
- Interval: the measurement values have equal distances corresponding to equal quantities of the attribute. For example, cyclomatic complexity has a minimum value of one, but each increment represents an additional path. The value of zero is not possible.
- Ratio: the measurement values have equal distances corresponding to equal quantities of the attribute where the value of zero corresponds to none of the attribute. For example, the size in terms of the number of requirements is a ratio scale because the value of zero corresponds to no requirements and each additional requirement defined represents an equal incremental quantity.
I’m adding this here as a guide of what you can do with each type of scale:
| Operation | Nominal | Ordinal | Interval | Ratio |
|---|---|---|---|---|
| Addition and subtraction | | | X | X |
| Division and multiplication | | | | X |
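The wine-rating pitfall described above is easy to demonstrate. This small sketch (my own illustration, with invented responses) shows two very different response distributions that produce exactly the same mean, which is why averaging ordinal values hides what matters:

```python
from statistics import mean
from collections import Counter

# Ordinal responses mapped to 1..5 ("Strongly dislike" .. "Strongly like")
lukewarm = [3, 3, 3, 3, 3, 4, 3, 3, 3, 2]   # everyone near the middle
polarized = [1, 1, 1, 5, 5, 5, 5, 1, 3, 3]  # lovers and haters

print(mean(lukewarm), mean(polarized))  # 3.0 and 3.0 -- identical means
print(Counter(lukewarm))                # distribution concentrated at 3
print(Counter(polarized))               # two opposite camps
```

Looking at the frequency of each category (a legitimate operation on an ordinal scale) reveals the polarization that the mean erases.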
Criteria for selecting measures
Many different combinations of base measures, derived measures, and indicators may be selected to address a specific information need. You can measure how much of the software was built using story points, use case points, function points, lines of code, etc. This is just one example of how many different measures you can employ for one single information need. When deciding which one you want to employ, you should consider some criteria such as:
- relevance to the prioritized information needs;
- feasibility of collecting the data in the organizational unit;
- ease of data collection;
- extent of intrusion and disruption of staff activities;
- availability of appropriate tools;
- protection of privacy;
- potential resistance from data provider(s);
- number of potentially relevant indicators supported by the base measure;
- evidence (internal or external to the organizational unit) as to the measure’s fitness for purpose or information need, and its utility.
The costs of collecting, managing, and analyzing the data at all levels should also be considered. Costs include the following:
- Measures utilization costs: associated with each measure are the costs of collecting data, automating the calculation of the measure values (when possible), analyzing the data, interpreting the analysis results, and communicating the information products;
- Process Change Costs: the set of measures may imply a change in the development process, for example, through the need for new data acquisition;
- Special Equipment: system, hardware, or software tools may have to be located, evaluated, purchased, adapted or developed to implement the measures; and
- Training: the quality management/control organization or the entire development team may need training in the use of the measures and data collection procedures. If the implementation of measures causes changes in the development process, the changes need to be communicated to the staff.
I know I said I was going to focus on defining the measures, not actually measuring as a process or method. However, how you measure will impact what you are measuring and vice-versa. As I said before, there are many alternatives to fulfill an information need. It’s important to understand which measures will make it easier to ensure some key aspects of your measurement process.
Accuracy of a measurement procedure
Accuracy is the extent to which the procedure implementing a base measure conforms to the intended measurement method. An accurate procedure produces results similar to the true (or intended) value of the base measure.
Measurement procedures implement the measurement methods described for base measures. These procedures may produce results different from what was intended due to problems such as a systematic error in the procedure, random error inherent in the underlying measurement method, and poor execution of the procedure.
The actual human procedure or automated implementation of a base measure may depart from the measure’s definition. For example, a static analysis tool may implement a counting algorithm differently from how it was originally described in the literature. Discrepancies also may be due to ambiguous definitions of measurement methods, scales, units, etc. Even good measurement procedures may be inconsistently applied, resulting in the loss of data or the introduction of erroneous data.
Subjective methods depend on human interpretation. The formulation of questionnaire items, for example, may leave respondents uncertain about the question and even bias the responses. Clear and concise instructions help to increase the accuracy of surveys.
Accuracy can be enhanced by ensuring that, for example:
- the extent of missing data is within specified thresholds;
- the number of flagged inconsistencies in data entry is within specified thresholds;
- the number of missed measurement opportunities is within specified thresholds (e.g., the number of inspections for which no data were collected);
- all base measures are well‐defined and those definitions are communicated to data providers. Poorly defined measures tend to yield inaccurate data. The repeatability and reproducibility of the underlying measurement method (see below) may also limit the accuracy achievable by a measurement procedure.
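Checks like the ones above can be automated as simple validations run before the data is analyzed. Here is a sketch (field names and the 5% threshold are my own assumptions, not from the standard) that flags a dataset whose missing-data rate exceeds an agreed threshold:

```python
def accuracy_checks(records, required_fields, max_missing_ratio=0.05):
    """Flag a dataset whose missing-data rate exceeds the agreed threshold."""
    missing = sum(
        1 for r in records for f in required_fields if r.get(f) is None
    )
    total = len(records) * len(required_fields)
    ratio = missing / total if total else 0.0
    return {"missing_ratio": ratio, "ok": ratio <= max_missing_ratio}

# Hypothetical inspection data with one missed measurement
records = [
    {"inspection_id": 1, "defects_found": 4},
    {"inspection_id": 2, "defects_found": None},
]
result = accuracy_checks(records, ["inspection_id", "defects_found"])
print(result)  # {'missing_ratio': 0.25, 'ok': False}
```

The same pattern extends to inconsistency counts and missed measurement opportunities: define the threshold in the measure’s documentation and reject (or flag) data that breaches it.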
Repeatability of a measurement method
Repeatability is the degree to which the repeated use of the base measure in the same organizational unit following the same measurement method under the same conditions (e.g., tools, individuals performing the measurement) produces results that can be accepted as being identical. Subjective measurement methods tend to experience lower repeatability than objective methods. Random measurement error reduces repeatability.
Reproducibility of a measurement method
Reproducibility is the degree to which the repeated use of the base measure in the same organizational unit following the same measurement method under different conditions (e.g., tools, individuals performing the measurement) produces results that can be accepted as being identical. Subjective measurement methods tend to experience lower reproducibility than objective methods. Random measurement error reduces reproducibility.
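One common way to quantify both properties is the spread of repeated results relative to their mean (the coefficient of variation): measure the same artifact several times under the same or different conditions and compare the spreads. This is my own sketch with invented numbers, not a procedure from the standard:

```python
from statistics import mean, stdev

def coefficient_of_variation(results):
    """Relative spread of repeated measurements; lower means more consistent."""
    return stdev(results) / mean(results)

# Same analyst, same tool, same artifact (assessing repeatability)
same_conditions = [412, 410, 411, 413]
# Different analysts/tools on the same artifact (assessing reproducibility)
different_conditions = [412, 395, 430, 404]

print(round(coefficient_of_variation(same_conditions), 4))
print(round(coefficient_of_variation(different_conditions), 4))
```

As you would expect, varying the conditions typically widens the spread, so reproducibility is usually the harder property to achieve, especially for subjective methods.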
Establishing a measure
If a picture speaks a thousand words, an example may speak a couple of hundred at the very least. Here is a good example of how to document a measure, adapted from ISO/IEC/IEEE 15939:
| Field | Value |
|---|---|
| Information Need | Estimate productivity of future project |
| Measurable Concept | Project productivity |
| Type of Measurement Method | |
| Type of Scale | |
| Unit of Measurement | |
| Derived Measure | Project X Productivity |
| Measurement Function | Divide Project X Requirements Implemented by Project X Days of Effort |
| Model | Compute mean and standard deviation of all project productivity values. |
| Decision Criteria | Computed confidence limits based on the standard deviation indicate the likelihood that an actual result close to the average productivity will be achieved. Very wide confidence limits suggest a potentially large departure and the need for contingency planning to deal with this outcome. |
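The table above maps directly to a small computation. In this sketch (the project data is invented for illustration), the derived measure is requirements implemented divided by days of effort, and the model takes the mean and standard deviation across projects to form confidence limits, just as the Decision Criteria row describes:

```python
from statistics import mean, stdev

# (requirements implemented, days of effort) per completed project -- invented data
projects = [(120, 60), (90, 50), (150, 70), (80, 44)]

# Derived measure: Project X Productivity = requirements / days
productivities = [req / days for req, days in projects]

# Model: mean and standard deviation of all project productivity values
avg, sd = mean(productivities), stdev(productivities)

# Decision criteria: e.g. approximate confidence limits at +/- 2 standard deviations
low, high = avg - 2 * sd, avg + 2 * sd
print(f"mean={avg:.2f} req/day, limits=({low:.2f}, {high:.2f})")
```

If the resulting limits are very wide, the estimate for the future project carries a lot of uncertainty, which is exactly the contingency-planning signal the decision criteria call out.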
As you can see, choosing which measure to employ and documenting it may be pretty straightforward once you know the aspects that need to be considered and the attributes that you need to define. This will help tremendously with interpreting it correctly and ensuring consistency over time across different organizational units, departments, and teams. I’d say this is the easy part of software measurement. Deciding WHAT to measure is the big problem.
I’ve recently written an article about unexpected negative outcomes of measurement that you will want to take a look at. It is intended to help you avoid pitfalls (the “what you should not measure” side) but does not provide many insights into what you should measure. Stay tuned for more articles on this matter, as I intend to write about software measurement and organizational alignment soon.
If you like this post, please share it (you can use the buttons at the end of this post). It helps me a lot and keeps me motivated to write more. Also, subscribe to get notified of new posts when they come out.