Prepared by the
U.S. DEPARTMENT OF COMMERCE
Technology Administration
National Institute of Standards and Technology
Computer Systems Laboratory
Gaithersburg, MD 20899
For the
Department of Defense
Ballistic Missile Defense Organization
NISTIR 5459
May 1994
This report is preliminary. Additional research is needed to ensure the completeness of the quality characteristics and supporting metrics, and to provide guidance on using the metrics.
Reusable Software; Quality Characteristics; Software Reliability; Completeness; Correctness; Software Metrics
The Software Producibility Manufacturing Operations Development and Integration Laboratory (Software Producibility MODIL) was established at the National Institute of Standards and Technology (NIST) in 1992. The Software Producibility MODIL was one of four MODILs instituted at national laboratories by the U.S. Department of Defense/Ballistic Missile Defense Organization (BMDO).
The initial focus of the Software Producibility MODIL was software reuse. When software is considered for reuse, especially in high-integrity applications, the quality of the software is of paramount importance.
This report on quality characteristics and metrics for reusable software is preliminary. It identifies a set of quality characteristics that are most commonly referenced in technical literature and standards. These quality characteristics are common to all software products (e.g., requirements documentation, design documentation, code, and test documentation). Different metrics for each product may be used to assess the degree to which the product exhibits each quality characteristic. This report provides some explanation of the value of each metric for determining the reusability of the software. However, more research is needed to ensure the completeness of the quality characteristics and associated metrics.
This preliminary study alone does not provide sufficient measurement information for determining the reusability of software. For example, the value of each characteristic and each metric for each product should be correlated to a reusability index. The reusability index, which no one has yet defined, should also include process metrics (e.g., effort). Most of the metrics have been used for several years and are well understood relative to structured design methods, but most have not yet been applied to software developed with object-oriented technology. Another important topic for reusing software in high-integrity applications is knowledge about the software relative to its specific domain and its application within that domain, together with the type of information that must be present for analysts to decide how the software must change to meet new requirements.
Additional research topics include, but are not limited to, the following:
The purpose of this report is to summarize a set of metrics that are useful in measuring certain quality characteristics for software. In particular, these characteristics are applicable in assessing the reusability of software products. The quality characteristics are defined first. For each software product (requirements, design, code, test documentation), several metrics are given that will help to qualify the quality characteristics for that product.
A definition of software quality characteristics as given in [ISO9126] is "A set of attributes of a software product by which its quality is described and evaluated." The set of attributes includes functionality, reliability, usability, efficiency, maintainability, and portability. Several other documents ([SAC], [GPALS]) include other characteristics (e.g. adaptability, complexity) that apply to assessing the reusability of software products.
The metrics listed in this report are product metrics, and therefore, are meant to be applied to software products. The products are requirements documentation, design documentation, code, and test documentation. Because the focus on quality for software in the past has been on code, there are many more code metrics listed in this report.
A complete measure of reusability should also include process metrics applied to the development process. This report does not include process metrics.
One document, [SAC2], gives three criteria that should be used to evaluate the metrics themselves. These criteria are:
In this report, the attributes of interest are the quality characteristics of the software product. The metrics and characteristics can then be used to assess the reusability of the product, and that is the overall attribute of interest.
Most of the metrics in this report have been used for several years and are well understood relative to structured design methods, but most have not yet been applied to software developed with object-oriented technology. Another important topic for reusing software in high-integrity applications is knowledge about the software relative to its specific domain and its application within that domain, together with the type of information that must be present for analysts to decide how the software must change to meet new requirements. Some work on this topic has been conducted by NIST and may be found in [NIST5309].
Section 2 of this report provides definitions for each quality characteristic. Section 3 lists suggested metrics for each quality characteristic which pertain to a specific software product. Section 4 provides a summary of this report.
The definitions in sections 2.1 and 2.2 are stated as they appear in the referenced documents and may be associated with software reuse. Several quality characteristics have more than one definition.
This report expresses no preference regarding the source of the definitions. The definition used for each characteristic in section 3 of this report is an aggregate meaning taken from the various sources, combined with the intuitive definition.
2.1 Definitions of Software Quality Characteristics
One standard, [ISO9126], has identified several software quality characteristics: portability, efficiency, reliability, functionality, usability, and maintainability. In this section, completeness and correctness are equivalent to functionality as described in [ISO9126]. Understandability is substituted for usability in this section because the quality characteristics used in this report are meant to be applied to individual software products and not only to the final software system.
The other quality characteristics in the following list have been extracted from several documents, including [AFSCP800-14] and [CONTE]. There is no widely accepted standard set of quality characteristics for assessing software. The list below is an aggregation of the most common quality characteristics.
The definitions in this section are definitions of terms used throughout this report. Several of the terms defined here are characteristics of software products that can be measured (e.g., cohesion and complexity). These characteristics are used in sections 3.1 through 3.5 to further define the quality characteristics from section 2.1.
This section suggests metrics for each quality characteristic associated with software products (requirements documentation, design documentation, code, test documentation). Each product has several quality characteristics, along with metrics to be used in assessing the quality characteristic for the product.
The analyst must identify the objectives to be achieved from collecting and analyzing metric data. It must be possible to collect the data, and methods should exist for analyzing the data. The analyst must present the results such that recommendations can be made concerning the reusability of the software. Optimum values for all metrics cannot be achieved simultaneously. Selection criteria for metrics therefore include tradeoffs between the quality characteristics and their priorities. For example, efficiency and understandability often conflict: efficient code may need to be written in assembly language, which reduces the understandability (and hence the maintainability) of the software.
Another selection criterion is the ability to collect the required data for the metric. The simplicity of data collection is one reason lines-of-code remains a popular metric despite its deficiencies. Many other metrics have data collection requirements that can be automated, and therefore, may be less costly to implement. Examples include defect counts and some complexity metrics. Other metrics require more human involvement, such as requirements tracing, function point analysis, and test coverage.
Section 3.1 identifies the metrics for quality characteristics that may be applied to all software products, with minor adjustment. Sections 3.2 through 3.5 identify metrics for the quality characteristics of products of requirements, design, code, and test.
The metrics for the quality characteristics in this section can be applied to all software products. Most of these metrics are primitive in the sense that they are simple counts of problem report values. Cause and effect graphing and RELY
cause and effect graphing [IEEE982.1]
number of problem reports per phase, priority, category, or cause [NIST500-209]
number of reported problems per time period [NIST500-209]
number of open real problems per time period [NIST500-209]
number of closed real problems per time period [NIST500-209]
number of unevaluated problem reports [NIST500-209]
age of open real problem reports [NIST500-209]
age of unevaluated problem reports [NIST500-209]
age of real closed problem reports [NIST500-209]
rate of error discovery [NIST500-209]
RELY - Required Software Reliability [IEEE982.1]
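The problem report counts and ages listed above can be derived from a simple log of reports. The following is a minimal sketch, not a method taken from [NIST500-209]: the `ProblemReport` record, its field names, and the sample dates are all hypothetical, and a "real" problem is assumed to mean a report that has been evaluated and confirmed.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class ProblemReport:
    # Hypothetical record layout; field names are illustrative only.
    opened: date
    closed: Optional[date] = None  # None while the report is still open
    evaluated: bool = True         # False until the report has been assessed
    real: bool = True              # False if evaluation found no actual problem


def open_real_problems(reports: List[ProblemReport], as_of: date) -> List[ProblemReport]:
    """Evaluated, confirmed reports opened on or before as_of and not yet closed."""
    return [r for r in reports
            if r.evaluated and r.real and r.opened <= as_of
            and (r.closed is None or r.closed > as_of)]


def unevaluated_reports(reports: List[ProblemReport], as_of: date) -> List[ProblemReport]:
    """Reports opened on or before as_of that nobody has evaluated yet."""
    return [r for r in reports if not r.evaluated and r.opened <= as_of]


def mean_age_days(reports: List[ProblemReport], as_of: date) -> float:
    """Average number of days between opening and as_of (0.0 for an empty list)."""
    if not reports:
        return 0.0
    return sum((as_of - r.opened).days for r in reports) / len(reports)


# Invented sample data for illustration.
reports = [
    ProblemReport(date(1994, 1, 3), closed=date(1994, 1, 20)),
    ProblemReport(date(1994, 1, 10)),
    ProblemReport(date(1994, 2, 1), evaluated=False),
    ProblemReport(date(1994, 2, 14)),
]
as_of = date(1994, 3, 1)

open_now = open_real_problems(reports, as_of)
pending = unevaluated_reports(reports, as_of)
print(len(open_now), mean_age_days(open_now, as_of))
print(len(pending), mean_age_days(pending, as_of))
```

Grouping the same records by phase, priority, category, or cause would give the per-category counts; those fields are omitted here to keep the sketch short.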
Because requirements documentation is usually written in human readable format, the metrics that have been defined for requirements typically require manual analysis. The data needed for many of the metrics must be gathered by hand (i.e. counting requirements). Many of the metrics are subjective in nature, and it is up to the analyst to decide what is a "good" value. For example, the analyst must determine the acceptable readability level of the documents.
Many of the metrics for quality characteristics in this section can be used for other products. The readability metrics may be applied to preliminary and software system design documents, for example.
completeness metric [IEEE982.1] [AFSCP800-14]
requirements traceability [IEEE982.1]
TM = (R1 / R2) x 100%
deviation between planned number of System/Segment Design Document (SSDD) software requirements to be documented in the Software Requirements Specification (SRS) and actual number of SSDD software requirements completely documented in the SRS [AFP800-48]
Document Relationships [NSWC2]
number of discrepancies as a result of each review [STEP]
number of conflicting requirements [IEEE982.1]
requirements compliance [IEEE982.1]
requirement errors reported / total number of requirements [GPALS2]
requirement errors corrected / total number of requirements [GPALS2]
number of requirements faults and structural design faults detected during detailed design [NIST500-209]
size of the application domain [STARS]
size metrics
readability metrics
complexity
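The requirements traceability measure and the two [GPALS2] error ratios in the list above are simple quotients. The sketch below shows one way to compute them. The report does not define R1 and R2 at this point, so the reading used here (R1 as the number of requirements met by the product under review, R2 as the number of original requirements) is an assumption, as are the function names and sample counts.

```python
def traceability_measure(r1: int, r2: int) -> float:
    """TM = (R1 / R2) x 100%.

    Assumed reading: r1 = requirements met by the product under review,
    r2 = number of original requirements.
    """
    if r2 <= 0:
        raise ValueError("R2 must be a positive number of requirements")
    return 100.0 * r1 / r2


def per_requirement_ratio(error_count: int, total_requirements: int) -> float:
    """Requirement errors (reported or corrected) / total number of requirements."""
    if total_requirements <= 0:
        raise ValueError("total number of requirements must be positive")
    return error_count / total_requirements


# Invented sample counts for illustration.
tm = traceability_measure(45, 50)
reported = per_requirement_ratio(8, 50)
corrected = per_requirement_ratio(6, 50)
print(tm, reported, corrected)
```

A TM below 100% would point at requirements that are not yet traced into the product; what counts as an acceptable value is left to the analyst.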
Software design documentation is often divided into three activities: functional allocation, software system design, and unit design. These software design activities occur in three chronological phases: preliminary design, detailed design, and unit design [CARD].
The first phase is preliminary design. Designers collect related requirements into functional groups and identify dependencies among functions. The preliminary design may be represented by data flow diagrams, high-level structure charts, or a simple list of requirements by subsystems [CARD]. The choice of metrics depends on the representation used. Requirements traceability can be applied to any representation to verify that requirements are being met. Data flow complexity can be applied to data flow diagrams to evaluate the understandability of the diagrams.
Next comes detailed design, where the overall architecture of the software system is defined. This step allocates data and functions to individual units or design parts. Internal interfaces must also be specified at this stage [CARD]. A structure chart is commonly used to represent the system design. Some of the metrics applied to this type of design are data or information flow complexity, external (D
The final design phase is unit design. In this phase, algorithms and data structures are defined. Application and implementation specific information is added to the design. The design itself is often represented as pseudo-code and module prologues [CARD].
Nearly all of the metrics specified below can be applied to unit design documents. Some metrics specific to unit designs are internal (D
The unit design phase is the first indication of the reusability of individual modules. For example, the complexity of a module can be determined by its design, before any coding is done. Many of the same metrics that are typically applied to code can be applied to unit designs. Modularity can often be assessed at the unit design level. See [CARD] for references to studies on heuristics for achieving modularity. These heuristics are small modules, limited data coupling, medium span of control, and singleness of purpose. Modularity is a good indicator of the reusability of an individual software module.
requirements traceability [IEEE982.1] (Applied to all designs)
deviation between planned number of Software Requirements Specification (SRS) requirements to be documented as Computer Software Components (CSC) in the Software Design Document (SDD) and actual number of SRS requirements completely documented as CSCs in the SDD [AFP800-48] (Applied to system design)
defect density [IEEE982.1] [NIST500-209] (Applied to unit design)
number of structural (architectural) design faults detected during detailed design [NIST500-209] (Applied during unit design)
number of design faults associated with each module [NIST500-209] (Applied to system and unit design)
number of integration test cases planned/executed involving each module [NIST500-209] (Applied to system and unit design)
number of black box test cases planned/executed per module [NIST500-209] (Applied to system and unit design)
number of design errors reported / total number of units [GPALS2] (Applied to all designs)
size of application domain [STARS] (Applied to all designs)
target CPU usage as percent of capacity [STEP]
target I/O usage as percent of capacity [STEP]
target upper bound storage usage [STEP]
percent actual of target upper bound storage usage [STEP]
target upper bound RAM usage [STEP]
percent actual of target upper bound RAM usage [STEP]
cohesion metric [NIST500-209] [SQE] [CARD] (Applied to unit design)
coupling [NIST500-209] [SQE] [CARD] (Applied to system and unit designs)
number of features that are language-specific (Applied to unit design)
number of features that are operating system specific (Applied to unit design)
cumulative failure profile [IEEE982.1]
Design Structure Metric [IEEE982.1]
size metrics
complexity metrics
readability metrics
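The [STEP] resource metrics in the list above compare an actual figure with a target upper bound. A minimal sketch of that comparison follows. The function name, the resource names, and the kilobyte figures are hypothetical; the report gives no threshold, so treating anything over 100% as over budget is an assumption.

```python
def percent_of_target(actual: float, target_upper_bound: float) -> float:
    """Actual usage expressed as a percentage of the target upper bound."""
    if target_upper_bound <= 0:
        raise ValueError("target upper bound must be positive")
    return 100.0 * actual / target_upper_bound


# Invented (actual, target upper bound) pairs for illustration.
budgets = {
    "storage_kb": (384, 512),
    "ram_kb": (140, 128),
}

usage = {name: percent_of_target(actual, target)
         for name, (actual, target) in budgets.items()}

# Assumption: anything above 100% of its target is flagged for review.
over_budget = sorted(name for name, pct in usage.items() if pct > 100.0)
print(usage, over_budget)
```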
Much of the research into metrics has focused on code metrics. Hence, there are many different kinds of code metrics, and many variations on common metrics. In terms of reusability, the useful metrics are the product metrics used to measure the size, complexity, and readability of the source program. For a component to be reusable, software engineers must be able to understand it. The component should also encapsulate as much implementation detail as possible. Well-defined, simple interfaces are desirable.
In assessing existing components for reusability, it is useful to examine the history of the component in actual use. Fault density, code-related problem counts, defect density, and efficiency are some of the metrics used for this assessment. The longer a component has been in actual use, the higher the confidence in the component's correctness, assuming low fault and defect counts. The testability of the component is also critical when reusing the software. A well-defined set of test cases aids in quickly assessing the component's use in a new environment. The testability of a component is defined in part by its complexity, as well as its size.
There are many methods used to calculate lines-of-code. Two documents, [IEEE1045] and [SEI], give methods which are used to ensure consistent counting of lines-of-code.
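Counts only stay comparable when the counting rules are fixed in advance. The sketch below illustrates this with one simple, explicitly stated convention. It is not the method of [IEEE1045] or [SEI]: the rules here (a line is blank, a whole-line comment, or code, and a line with code plus a trailing comment counts as code) and the `count_lines` name are assumptions chosen for illustration.

```python
from typing import Dict


def count_lines(source: str, comment_prefix: str = "#") -> Dict[str, int]:
    """Classify each physical line as blank, comment, or code.

    Assumed convention: a line holding code followed by a trailing comment
    is counted as code; only whole-line comments are counted as comments.
    """
    counts = {"total": 0, "blank": 0, "comment": 0, "code": 0}
    for line in source.splitlines():
        counts["total"] += 1
        stripped = line.strip()
        if not stripped:
            counts["blank"] += 1
        elif stripped.startswith(comment_prefix):
            counts["comment"] += 1
        else:
            counts["code"] += 1
    return counts


# Invented sample source for illustration.
sample = (
    "# add two numbers\n"
    "\n"
    "def add(a, b):\n"
    "    return a + b  # inline comment\n"
)
loc = count_lines(sample)
print(loc)
```

Changing any one rule (for example, counting trailing comments separately) changes the totals, which is why a documented counting method matters more than the particular rules chosen.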
number of ambiguous references [SAC]
number of improper data references [SAC]
percentage of defined functions used [SAC]
percentage of referenced functions defined [SAC]
percentage of conditional processing defined [SAC]
fault density [IEEE982.1]
number of code-related problems/errors reported [CONTE]
number of code-related problems fixed [CONTE]
number of program changes per time period [CONTE]
number of changed lines of code per time period [CONTE]
number of coding errors / total number of units [GPALS2]
defect density [IEEE982.1] [NIST500-209]
defect indices [IEEE982.1]
non-loop dependent statement in loops: (number of modules with non-loop dependent statement in loops) / (total number of modules) [SAC]
compound expression evaluation: (number of modules with repeated compound expression evaluation) / (total number of modules) [SAC]
total number of memory overlays [SAC]
amount of non-functional executable code: (number of modules with non-functional executable code) / (total number of modules) [SAC]
coding of decision statements: (number of modules with inefficient decision coding) / (total number of modules) [SAC]
data grouping: (number of modules with inefficient data grouping) / (total number of modules) [SAC]
initialization of variables: (number of modules with variables not initialized when declared) / (total number of modules) [SAC]
target CPU usage as percent of capacity [STEP]
actual CPU usage as percent of capacity [STEP]
projected CPU usage as percent of capacity [STEP]
target I/O usage as percent of capacity [STEP]
actual I/O usage as percent of capacity [STEP]
projected I/O usage as percent of capacity [STEP]
duplicate global data definitions: (number of modules with duplicated data definitions) / (total number of modules) [SAC]
duplicate code: (number of modules with duplicated code) / (total number of modules) [SAC]
software requirements allocation [SAC]
dynamic memory management [SAC]
storage optimizer [SAC]
target upper bound storage usage [STEP]
actual storage usage [STEP]
percent actual of target upper bound storage usage [STEP]
projected storage usage [STEP]
target upper bound RAM usage [STEP]
actual RAM usage [STEP]
percent actual of target upper bound RAM usage [STEP]
projected RAM usage [STEP]
expandability
multiple usage metric: (number of modules referenced by more than one module) / (total number of modules) [SAC]
mixed function metric: (number of modules that mix functions) / (total number of modules) [SAC]
data volume metric: (number of modules that are data volume limited) / (total number of modules) [SAC]
data value metric: (number of modules that are data value limited) / (total number of modules) [SAC]
redefinition of constants metric: (number of constants that are redefined) / (total number of constants) [SAC]
complexity
size metrics
effort to fix bugs
cohesion [NIST500-209] [SQE] [SPC]
coupling [NIST500-209] [SQE] [SPC]
number of entries/exits per module [IEEE982.1] [NIST500-209]
software independence
hardware independence
reliability models
testability
size metrics
traceability metrics
complexity metrics
readability metrics
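Many of the [SAC] code metrics listed above share one form: the number of modules with some property divided by the total number of modules. The sketch below computes two of them from a list of module records. The record layout (`referenced_by`, `uninitialized_vars`), the helper name `module_ratio`, and the sample data are hypothetical; how each property is detected in real code is outside the scope of this example.

```python
from typing import Callable, Dict, List

Module = Dict[str, object]


def module_ratio(modules: List[Module], has_property: Callable[[Module], bool]) -> float:
    """(number of modules with the property) / (total number of modules)."""
    if not modules:
        raise ValueError("at least one module is required")
    flagged = sum(1 for m in modules if has_property(m))
    return flagged / len(modules)


# Invented module records for illustration.
modules = [
    {"name": "parse", "referenced_by": ["main", "batch"], "uninitialized_vars": 0},
    {"name": "report", "referenced_by": ["main"], "uninitialized_vars": 2},
    {"name": "util", "referenced_by": ["parse", "report", "main"], "uninitialized_vars": 0},
    {"name": "main", "referenced_by": [], "uninitialized_vars": 0},
]

# Multiple usage metric: modules referenced by more than one module.
multiple_usage = module_ratio(modules, lambda m: len(m["referenced_by"]) > 1)

# Initialization of variables: modules with variables not initialized when declared.
uninitialized = module_ratio(modules, lambda m: m["uninitialized_vars"] > 0)

print(multiple_usage, uninitialized)
```

The redefinition of constants metric has the same shape but uses constants rather than modules as the unit, so the same helper could be applied to a list of constant records.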
Software testing metrics are used to assess the adequacy of the test procedures and test data in verifying the software code. In order to gain confidence in a software component's reusability, a comprehensive set of test cases is necessary. A direct relationship between test cases and components is necessary in order for the component to be adequately tested in a new environment. Component test cases should be traceable to the components, and should be maintained as the components are changed. Also, component test cases should be delivered with the components.
System test cases should be linked to requirements specifications, and ideally, to the domain of interest. In order for system test cases to be reusable, there must be a tie-in to a specific requirements area, and therefore, a specific application domain. System test plans may be extracted from several subdomains and regrouped to test subdomains in a new domain area.
See [MCCABE] for details on using cyclomatic complexity to "measure the completeness of the testing that a programmer must satisfy." Specifically, branch coverage and path coverage are verified for completeness. McCabe's technique can be used to develop a set of test cases that test every outcome of each decision and execute a minimal number of distinct paths.
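Cyclomatic complexity is computed from the control flow graph as v(G) = e - n + 2p, where e is the number of edges, n the number of nodes, and p the number of connected components. The value is also the number of linearly independent paths through the graph, which is what ties it to test case selection. The sketch below applies the formula to a small graph; the adjacency-dictionary representation and the example program shape (an if/else followed by a loop) are assumptions made for illustration.

```python
from typing import Dict, Hashable, List


def cyclomatic_complexity(cfg: Dict[Hashable, List[Hashable]], components: int = 1) -> int:
    """v(G) = e - n + 2p for a control flow graph given as node -> successors."""
    nodes = set(cfg)
    for successors in cfg.values():
        nodes.update(successors)  # include nodes that appear only as targets
    edges = sum(len(successors) for successors in cfg.values())
    return edges - len(nodes) + 2 * components


# Invented graph: an if/else (node 1) followed by a loop test (node 4).
cfg = {
    1: [2, 3],  # decision: then-branch or else-branch
    2: [4],
    3: [4],
    4: [5, 6],  # loop test: enter the body or exit
    5: [4],     # loop body returns to the test
    6: [],      # exit node
}
v = cyclomatic_complexity(cfg)
print(v)
```

The graph has seven edges and six nodes, giving v(G) = 3, which matches the two decisions plus one.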
test coverage [IEEE982.1]
Test Sufficiency Indicator [AFSCP800-14] [IEEE982.1]
coverage metrics
Data flow metrics [WEYUKER]
Definition: variable is given a new value
P-use: variable is used in the predicate portion of a decision statement
C-use: all other variable uses, including variable occurrences in the right-hand side of an assignment statement, or an output statement
all-definitions: test data must cause the traversal of at least one subpath from each variable definition to some p-use or some c-use of that definition
all-c-uses: test data must cause the traversal of at least one path from each variable definition to every c-use of that definition
all-p-uses: test data must cause the traversal of at least one path from each variable definition to every p-use of that definition
all-uses: test data must cause the traversal of at least one subpath from each variable definition to every p-use and every c-use of that definition
all-du-paths: test data must cause the traversal of every simple subpath from each variable definition to every p-use and c-use of that definition
Percent of all-definitions covered by test scenarios
Percent of all-c-uses covered by test-scenarios
Percent of all-p-uses covered by test scenarios
Percent of all-uses covered by test scenarios
Percent of all-du-paths covered by test scenarios
percentage of defects uncovered in testing: (number of defects located by testing) / (total number of system defects) [PERRY]
execution time of test cases
(number of tests required) / (number of system errors) [PERRY]
(number of defects uncovered) / (size of system) [PERRY]
size metrics
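The data flow coverage percentages listed above can be estimated by checking which definition-use pairs are exercised by the paths that the test scenarios execute. The sketch below is a simplified illustration, not the procedure of [WEYUKER]: uses are attached to nodes (p-uses are more precisely associated with the branches leaving a predicate), and the node numbering, the `defs` table, and the sample pairs are invented. A pair counts as covered when some executed path reaches the use from the definition without an intervening redefinition of the variable.

```python
from typing import Dict, List, Optional, Set, Tuple

# (variable, definition node, use node, kind) where kind is "p" or "c"
Association = Tuple[str, int, int, str]


def covers(path: List[int], var: str, d: int, u: int, defs: Dict[int, Set[str]]) -> bool:
    """True if path goes from node d to node u with no redefinition of var between."""
    for i, node in enumerate(path):
        if node != d:
            continue
        for nxt in path[i + 1:]:
            if nxt == u:
                return True      # the use is reached before any redefinition
            if var in defs.get(nxt, set()):
                break            # var redefined: this subpath is not def-clear
    return False


def percent_covered(associations: List[Association], paths: List[List[int]],
                    defs: Dict[int, Set[str]], kind: Optional[str] = None) -> float:
    """Percent of def-use pairs (optionally only p-uses or c-uses) covered by paths."""
    wanted = [a for a in associations if kind is None or a[3] == kind]
    if not wanted:
        return 100.0
    hit = sum(1 for var, d, u, _ in wanted
              if any(covers(p, var, d, u, defs) for p in paths))
    return 100.0 * hit / len(wanted)


# Invented program: 1: x = input; 2: if x > 0; 3: y = x * 2; 4: y = 0 (else); 5: print(y)
defs = {1: {"x"}, 3: {"y"}, 4: {"y"}}
associations = [("x", 1, 2, "p"), ("x", 1, 3, "c"), ("y", 3, 5, "c"), ("y", 4, 5, "c")]

one_test = [[1, 2, 3, 5]]
two_tests = [[1, 2, 3, 5], [1, 2, 4, 5]]

print(percent_covered(associations, one_test, defs))
print(percent_covered(associations, one_test, defs, kind="p"))
print(percent_covered(associations, two_tests, defs))
```

With only the first test, the pair for the else-branch definition of y is never exercised; adding the second test covers every pair.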
The initial focus of the Software Producibility MODIL was software reuse. When software is considered for reuse, especially in high-integrity applications, the quality of the software is of paramount importance. While organizations must consider many variables to decide if software is fit for reuse, one important variable is the quality of the software. The term "quality" has many different meanings, and even the characteristics commonly used to define quality have different meanings. Yet organizations must be able to assess the quality of existing software in terms of its completeness for a new use, its correctness, its maintainability, and other characteristics that affect how much work will be necessary to adapt the existing software for another application. Organizations that are developing new software intended for reuse must also be able to assess whether the software meets reusability criteria.
This report on quality characteristics and metrics for reusable software is preliminary. It identifies a set of quality characteristics that are most commonly referenced in technical literature and standards; these are completeness, correctness, generality, understandability, efficiency, modularity, portability, reliability, adaptability, and maintainability.
The metrics listed in this report help to define the quality attributes for software products. This report does not address any of the process metrics that should also be considered in assessing reusability. The products associated with metrics in this report are requirements documentation, design documentation, code listings and test documentation. Because the focus on quality for software in the past has been on code, there are many more code metrics listed in this report.
Different metrics for each product may be used to assess the degree to which the product contains each quality characteristic. This report provides some explanation of the value of each metric for determining the reusability of the software. However, more research is needed to ensure the completeness of the quality characteristics and associated metrics.
This preliminary study alone does not provide sufficient measurement information for determining the reusability of software. For example, the value of each characteristic and each metric for each product should be correlated to a reusability index. The reusability index, which no one has yet defined, should also include process metrics (e.g., effort). Most of the metrics have been used for several years and are well understood relative to structured design methods, but most have not yet been applied to software developed with object-oriented technology. Another important topic for reusing software in high-integrity applications is knowledge about the software relative to its specific domain and its application within that domain, together with the type of information that must be present for analysts to decide how the software must change to meet new requirements.
Additional research issues for measuring the reusability of software include, but are not limited to, the following: