There are fifteen identifiable components to system testing. These tests should be conducted as independently from the development group as is possible.

Notes taken from Glenford Myers's The Art of Software Testing.

System testing is both the most misunderstood and most difficult testing process. System testing is not a process of testing the functions of the complete system or program, because this would be redundant with the process of function testing. As shown in Figure 1, system testing has a particular purpose: to compare the system or program to its original objectives. Given this purpose, two implications are:
  • System testing is not limited to "systems." If the product is a program, system testing is the process of attempting to demonstrate how the program does not meet its objectives.
  • System testing, by definition, is impossible if the project has not produced a written set of measurable objectives for its product.

In looking for discrepancies between the program and its objectives, much of the focus is on translation errors made during the process of designing the external specification. This makes the system test a vital test process, because in terms of the product of the number of errors made and the severity of those errors, this step in the development cycle is usually the most error-prone step. It also implies that, unlike the function test, the external specification cannot be used as the basis for deriving the system test cases, since this would subvert the purpose of the system test. On the other hand, the objectives document cannot be used, by itself, to formulate test cases, since it does not, by definition, contain precise descriptions of the program's external interfaces. This dilemma is solved by using the program's user documentation or publications. System test cases are designed by analyzing the objectives and then formulated by analyzing the user documentation. This has the useful side effect of not only comparing the program to its objectives, but also comparing the program to the user documentation and comparing the user documentation to the objectives, as shown in the figure:

Figure 1: The system test.

The reason that system testing is the most difficult testing process is that the leftmost arrow in Figure 1, comparing the program to its objectives, is the central purpose of the system test, but there are no known test-case-design methodologies. The reason for this is that objectives state what a program should do, and how well the program should do it, but do not state the representation of the program's functions. For instance, the objectives for a DISPLAY command might have read as follows:

A command will be provided to view, from a terminal, the contents of main-storage locations. Its syntax should be consistent with the syntax of all other system commands. The user should be able to specify a range of locations, either via an address range or via an address and a count. Sensible defaults should be provided for command operands.

Output should be displayed as multiple lines of multiple words (in hexadecimal), with spacing between the words. Each line should contain the address of the first word of that line. The command is a "trivial" command, meaning that, under reasonable system loads, it should begin displaying output within two seconds, and there should be no observable delay between output lines. A programming error in the command processor should, at the worst, cause the command to fail; the system and the user's session must not be affected. The command processor should have no more than one user-detected error after the system is put into production.

Given this statement of objectives, there is no identifiable methodology that can be applied to it to yield a set of test cases, other than the vague, but useful, guideline of writing test cases to attempt to show that the program is inconsistent with each sentence in the objectives statement. Hence a different approach to test-case design is taken here; rather than describing a methodology, distinct categories of system test cases are discussed. Because of the absence of a methodology, system testing requires a substantial amount of creativity; in fact, the design of good system test cases requires more creativity, intelligence, and experience than that required to design the system or program.

The 15 categories of test cases are discussed below. It is not claimed that all 15 categories will be applicable to every program, but, to avoid overlooking something, all 15 categories should be explored when designing test cases.

Facility Testing

The most obvious type of system testing is the determination of whether each facility (or function, but the word "function" is not used here to avoid confusing this with function testing) mentioned in the objectives was actually implemented. The procedure is to scan the objectives sentence by sentence and, when the sentence specifies a what (e.g., "syntax should be consistent...," "user should be able to specify a range of locations..."), determine if the program satisfies the "what." This type of testing can often be performed without the use of a computer; a mental comparison of the objectives with the user documentation is sometimes sufficient.

Volume Testing

A second type of system testing is subjecting the program to heavy volumes of data. For instance, a compiler would be fed an absurdly large source program to compile. A linkage editor might be fed a program containing thousands of modules. An electronic-circuit simulator would be given a circuit containing thousands of components. An operating system's job queue would be filled to capacity. If a program is supposed to handle files spanning multiple volumes (e.g., tape reels), enough data are created to cause the program to switch from one volume to another. In other words, the purpose of volume testing is to show that the program cannot handle the volume of data specified in its objectives.

Since volume testing is obviously expensive, in terms of both machine and people time, one must not go overboard. However, every program must be exposed to at least a few volume tests.
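
A volume test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `count_words` is a hypothetical stand-in for the program under test, and the default of 200,000 input lines is an assumed volume objective, not a figure from the text.

```python
import io


def count_words(stream):
    """Hypothetical program under test: count whitespace-separated words."""
    total = 0
    for line in stream:
        total += len(line.split())
    return total


def make_large_input(lines, words_per_line=10):
    """Build an in-memory input much larger than everyday use would produce."""
    line = " ".join(["word"] * words_per_line) + "\n"
    return io.StringIO(line * lines)


def volume_test(lines=200_000, words_per_line=10):
    """Return True if the program copes with the assumed volume objective."""
    expected = lines * words_per_line
    return count_words(make_large_input(lines, words_per_line)) == expected
```

A result of False, a crash, or an unacceptable run time from `volume_test()` would each count as a discrepancy between the program and its volume objective.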

Stress Testing

Stress testing involves subjecting the program to heavy loads or stresses. This should not be confused with volume testing; a heavy stress is a peak volume of data encountered over a short span of time. An analogy is an appraisal of a typist. A volume test is the determination of whether the typist can cope with a draft of a large report; a stress test is the determination of whether the typist can type at a rate of 50 words per minute.

Because stress testing involves an element of time, it is not applicable to many programs, for example, a compiler or a batch-processing payroll program. It is applicable, however, to programs that operate under varying loads, or interactive, real-time, and process-control programs. If an air-traffic-control system is supposed to keep track of up to 200 planes in its sector, it is stress tested by simulating the existence of 200 planes. Since there is nothing to physically keep a 201st plane from entering the sector, a further stress test would explore the system's reaction to this unexpected plane. An additional stress test might simulate the simultaneous entry of a large number of planes into the sector.

If an operating system is supposed to support a maximum of 15 multiprogrammed jobs, the system is stressed by attempting to run 15 jobs simultaneously. If a time-sharing system supports up to 64 terminals, subject the system to the extreme pressure of 64 terminal users trying to sign onto the system simultaneously. (This is not a "never will occur" situation; it occurs in real life when such a system crashes during operation and is immediately brought back on the air by its operator.) Stress a pilot-training aircraft simulator by determining the system's reaction to the trainee's forcing the rudder left, pulling back on the throttle, lowering the flaps, lifting the nose, lowering the landing gear, turning on the landing lights, and banking left, all at the same time. (Such a test case might require a four-handed pilot or, more realistically, two test specialists in the cockpit.) A process-control system might be stress tested by causing all of the monitored processes to generate signals simultaneously. A telephone-switching system is subjected to stress tests by routing to it a large number of simultaneous phone calls.

Although many stress tests do represent conditions that the program will likely experience during its operational use, other stress tests may truly represent "never will occur" situations, but this does not imply that these tests are not useful. If errors are detected by these "impossible" conditions, the test is valuable, because it is likely that the same errors might also occur in realistic, less-stressful, situations.
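
The simultaneous sign-on case can be sketched with threads. Everything below is an assumption made for illustration: `TimeSharingSystem` is a hypothetical stand-in for the system under test, and a `threading.Barrier` is one possible way to release all simulated users at the same instant.

```python
import threading


class TimeSharingSystem:
    """Hypothetical system under test with a 64-terminal limit."""

    MAX_TERMINALS = 64

    def __init__(self):
        self._lock = threading.Lock()
        self.sessions = set()

    def sign_on(self, terminal_id):
        with self._lock:
            if len(self.sessions) >= self.MAX_TERMINALS:
                return False  # refuse the extra user instead of crashing
            self.sessions.add(terminal_id)
            return True


def stress_sign_on(system, terminals):
    """Have `terminals` users sign on together; return one result per user."""
    barrier = threading.Barrier(terminals)
    results = [None] * terminals

    def user(i):
        barrier.wait()  # hold every thread here, then release them at once
        try:
            results[i] = system.sign_on(i)
        except Exception as exc:  # a crash is what the test hopes to expose
            results[i] = exc

    threads = [threading.Thread(target=user, args=(i,)) for i in range(terminals)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Running `stress_sign_on(TimeSharingSystem(), 64)` exercises the stated maximum, while passing 65 explores the reaction to one user more than the system is supposed to support, in the same spirit as the 201st plane.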

A checklist of QA Management stress testing tasks is useful.

Usability Testing

Another important category of system test cases is an attempt to find human-factor, or usability, problems. Unfortunately, since the computing industry has placed insufficient attention on studying and defining good human-factor considerations of programming systems, an analysis of human factors is still a highly subjective matter. The following is a list illustrating the kinds of considerations that might be tested.
  • Has each user interface been tailored to the intelligence, educational background, and environmental pressures of the end user?
  • Are the outputs of the program meaningful, nonabusive, devoid of "computer gibberish," and so on?
  • Are the error diagnostics (e.g., error messages) straightforward, or does one need a Ph.D. in computer science to comprehend them? For instance, does the program produce such messages as "IEK022A OPEN ERROR ON FILE 'SYSIN' ABEND CODE= 102"?
  • Does the total set of user interfaces exhibit considerable "conceptual integrity", an underlying consistency and uniformity of syntax, conventions, semantics, format, style, and abbreviations?
  • Where accuracy is vital (e.g., in an online banking system), is sufficient redundancy present in the input (e.g., account number and customer name)?
  • Does the system contain an excessive number of options, or options that are unlikely to be used?
  • Does the system return some type of immediate acknowledgment to all inputs?
  • Is the program easy to use? For instance, does entering a command into a time-sharing system require repeated shifts between upper- and lower-case characters?

Security Testing

Because of society's increasing concern about privacy, many programs have specific security objectives. Security testing is the process of attempting to devise test cases that subvert the program's security checks. For instance, one tries to formulate test cases that subvert an operating system's memory-protection mechanism. One tries to subvert a data-base-management system's data-security mechanisms. One way to devise such test cases is to study known security problems in similar systems and generate test cases that attempt to demonstrate similar problems in the system at hand. For instance, descriptions exist of known security holes in operating systems.

Performance Testing

Many programs have specific performance or efficiency objectives, stating such properties as response times and throughput rates under certain workload and configuration conditions. Again, since the purpose of a system test is to demonstrate that the program does not meet its objectives, test cases must be devised that attempt to show that the program does not satisfy its performance objectives.
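
A response-time check might look like the following sketch. The two-second figure is taken from the DISPLAY objectives quoted earlier; the `display` function is a hypothetical stand-in for the command, and the sketch measures time to completion, which is only an upper bound on the time to begin displaying output.

```python
import time

RESPONSE_OBJECTIVE_SECONDS = 2.0  # from the DISPLAY objectives quoted earlier


def display(start, count):
    """Hypothetical command under test: one address and one word per line."""
    return [f"{start + i * 4:08X}  {0:08X}" for i in range(count)]


def worst_elapsed(command, *args, repeats=5):
    """Return the slowest of several timed runs, in seconds."""
    worst = 0.0
    for _ in range(repeats):
        began = time.perf_counter()
        command(*args)
        worst = max(worst, time.perf_counter() - began)
    return worst


def violates_objective(command, *args):
    """True if any timed run exceeded the response-time objective."""
    return worst_elapsed(command, *args) > RESPONSE_OBJECTIVE_SECONDS
```

Taking the worst run rather than the average fits the aim of the system test, which is to find a case where the objective is missed.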

Storage Testing

Similarly, programs occasionally have storage objectives, stating, for instance, the amounts of main and secondary storage used by the program and the sizes of required temporary or spill files. Test cases should be devised to show that these storage objectives have not been met.
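
For a program written in Python, main-storage use can be approximated with the standard `tracemalloc` module, as in the sketch below. The 64 MiB limit and the `build_table` function are assumptions made for illustration, and `tracemalloc` sees only memory allocated through Python's allocator, so the figure is a lower bound on real usage.

```python
import tracemalloc

STORAGE_OBJECTIVE_BYTES = 64 * 1024 * 1024  # assumed objective: 64 MiB


def peak_memory_bytes(func, *args):
    """Run func and return the peak Python heap size traced while it ran."""
    tracemalloc.start()
    try:
        func(*args)
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return peak


def build_table(n):
    """Hypothetical program under test: hold n integers in memory."""
    return list(range(n))


def exceeds_storage_objective(func, *args):
    """True if the traced peak went over the assumed storage objective."""
    return peak_memory_bytes(func, *args) > STORAGE_OBJECTIVE_BYTES
```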

Configuration Testing

Such programs as operating systems, data-base management systems, and message-switching programs support a variety of hardware configurations (e.g., types and number of I/O devices and communication lines, different memory sizes). Often the number of possible configurations is too large to attempt to test the program with each one, but, at the least, the program should be tested with each type of hardware device and with the minimum and maximum configuration. If the program itself can be configured (e.g., components of the program can be omitted or placed in separate processors), each possible configuration of the program should be tested.
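
The gap between "every configuration" and "at the least" can be made concrete with a small sketch. The device names, the memory sizes, and the choice of the first listed device as the minimum configuration are all assumptions made for illustration.

```python
def exhaustive_count(device_types, memory_sizes):
    """Number of configurations if every non-empty device subset is tested."""
    return (2 ** len(device_types) - 1) * len(memory_sizes)


def select_configurations(device_types, min_memory_kb, max_memory_kb):
    """Pick the minimum, the maximum, and one configuration per device type."""
    configs = [
        # Assumption: the first listed device is the only mandatory one.
        {"name": "minimum", "memory_kb": min_memory_kb,
         "devices": list(device_types[:1])},
        {"name": "maximum", "memory_kb": max_memory_kb,
         "devices": list(device_types)},
    ]
    for device in device_types:
        configs.append({"name": f"only-{device}", "memory_kb": min_memory_kb,
                        "devices": [device]})
    return configs
```

With four device types and two memory sizes, exhaustive testing would need 30 configurations, while this selection needs 6 and still touches every device type.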

Compatibility/Conversion Testing

Most programs that are developed are not completely new; they are often replacements for some deficient system, either a data-processing or manual system. As such, programs often have specific objectives concerning their compatibility with, and conversion procedures from, the existing system. Again, in testing the program to these objectives, the orientation of the test cases is to demonstrate that the compatibility objectives have not been met and that the conversion procedures do not work.

Installability Testing

Some types of software systems have complicated procedures for installing the system (e.g., the system generation, or "sysgen," process in IBM's operating systems). The testing of these installation procedures is part of the system-testing process.

Reliability Testing

Of course, the goal of all types of testing is the improvement of the eventual reliability of the program, but, if the program's objectives contain specific statements about reliability, specific reliability tests might be devised. Testing reliability objectives can be difficult. For instance, the Bell System's TSPS switching system has a down-time objective of 2 hours or less per 40 years of operation; there is no known way that one can test this objective given a test period of months or even a few years. However, if the program has mean-time-to-failure objectives (e.g., MTTF = 20 hours) or operational-error objectives (e.g., the program should experience no more than 12 unique errors after it is placed into production), there is a set of mathematical models that allow one to estimate the validity of such objectives.
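
Some simple arithmetic shows why the down-time objective resists direct testing. The sketch below is not one of the mathematical models the text refers to; it assumes, for illustration only, that permitted down time is spread evenly over the objective period, and it uses the plainest possible point estimate of MTTF.

```python
def allowed_downtime_hours(objective_hours, objective_years, test_years):
    """Down time a test period is entitled to, if it is spread evenly."""
    return objective_hours * test_years / objective_years


def observed_mttf(operating_hours, failures):
    """Point estimate of mean time to failure: hours run per failure seen."""
    if failures == 0:
        raise ValueError("no failures observed; this estimate is undefined")
    return operating_hours / failures
```

Under that assumption, a six-month test of a system allowed 2 hours of down time per 40 years is entitled to 0.025 hours, or 1.5 minutes, which is far too little to measure with any confidence. By contrast, 10 failures in 200 hours of operation gives an estimated MTTF of 20 hours, which can be compared directly with an objective such as MTTF = 20 hours.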

Recovery Testing

Such programs as operating systems, data-base management systems, and teleprocessing programs often have recovery objectives, stating how the system is to recover from programming errors, hardware failures, and data errors. One objective of the system test is to show that these recovery functions do not work correctly. Programming errors can be purposely injected into an operating system to determine if it can recover from them. Hardware failures (e.g., memory parity errors, I/O device errors) can be simulated. Data errors (e.g., noise on a communications line, an invalid pointer in a data base) can be purposely created or simulated to analyze the system's reaction.
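
Simulating an I/O device error can be as simple as the following sketch. Both `FlakyDevice` and `read_with_recovery` are hypothetical names invented for illustration; the retry policy stands in for whatever recovery function the real system claims to provide.

```python
class FlakyDevice:
    """Hypothetical I/O device that fails on chosen calls (injected faults)."""

    def __init__(self, fail_on):
        self.fail_on = set(fail_on)  # 1-based call numbers that should fail
        self.calls = 0

    def read(self):
        self.calls += 1
        if self.calls in self.fail_on:
            raise IOError("injected device error")
        return b"data"


def read_with_recovery(device, retries=3):
    """Hypothetical recovery function under test: retry a failed read."""
    for _ in range(retries):
        try:
            return device.read()
        except IOError:
            continue  # recover by trying again
    return None  # recovery exhausted; the caller must handle the failure
```

Injecting failures on calls 1 and 2 checks that recovery succeeds on the third attempt; injecting them on calls 1, 2, and 3 probes what happens when the recovery function itself runs out of options.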

Serviceability Testing

The program may also have objectives for its serviceability or maintainability characteristics. All objectives of this sort must be tested. Such objectives might define the service aids to be provided with the system (e.g., storage-dump programs, diagnostic programs), the mean time to debug an apparent problem, the maintenance procedures, and the quality of internal-logic documentation.

Documentation Testing

As was illustrated in Figure 1, the system test is also concerned with the accuracy of the user documentation. The principal way of accomplishing this is the use of the user documentation to determine the representation of the prior system test cases (e.g., once a particular stress test is devised, the user documentation is used as a guide for writing the actual test case). Also, the user documentation should be the subject of an inspection, checking it for accuracy and clarity. Any examples illustrated in the documentation should be encoded into test cases and fed to the program.
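
Python's standard `doctest` module is one modern way to feed documented examples to a program. In the sketch below the docstring plays the role of the user documentation; `format_line` is a hypothetical function loosely modeled on the DISPLAY output described earlier, and its exact output format is an assumption.

```python
import doctest


def format_line(address, words):
    """Format one output line: the address, then each word in hexadecimal.

    >>> format_line(0x1000, [1, 255])
    '00001000  00000001 000000FF'
    """
    return f"{address:08X}  " + " ".join(f"{w:08X}" for w in words)


def run_documented_examples(func):
    """Run every example in func's docstring; return (failed, attempted)."""
    parser = doctest.DocTestParser()
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted
```

A nonzero failure count means the documentation and the program disagree; deciding which of the two is wrong still requires going back to the objectives.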

Procedure Testing

Finally, many programs are parts of larger, not completely automated, systems involving procedures performed by people. Any prescribed human procedures, such as procedures to be followed by the system operator, data-base administrator, or terminal user, should be tested during the system test.

Performing the System Test

One of the most vital considerations in implementing the system test is the determination of who should do it. To answer this in a negative way, (1) a system test should not be performed by programmers; and (2) of all the testing phases, this is the one that should definitely not be performed by the organization responsible for developing the program.

The first point stems from the fact that a person performing a system test must be capable of thinking like an end user of the program, which implies a thorough understanding of the attitudes and environment of the end user and of how the program will be used. Obviously then, if feasible, a good candidate is one or more end users. However, because the typical end user will not have the ability or expertise to perform many of the categories of tests described earlier, an ideal system-test team might be composed of a few professional system-test experts (people who spend their lives performing system tests), a representative end user or two, a human-factors engineer, and the key original analysts or designers of the program. Including the original designers does not violate the earlier principle recommending against testing one's own program, since the program has probably passed through many hands since they conceived it. Hence, the original designers do not have the troublesome psychological ties to the program that motivated this principle.

The second point stems from the fact that a system test is an "anything goes, no holds barred" activity. Again, the development organization has psychological ties to the program that are counter to this type of activity. Also, most development organizations are most interested in having the system test proceed as "smoothly" as possible and on schedule, and are not truly motivated to demonstrate that the program does not meet its objectives. At the least, the system test should be performed by an independent group of people, with few, if any, organizational ties to the development organization. Perhaps the most economical way of conducting a system test (economical in terms of finding the most errors with a given amount of money, or spending less money to find the same number of errors) is to subcontract it to a separate company.