Classifying generated white-box tests: an exploratory study
Title | Classifying generated white-box tests: an exploratory study |
Publication Type | Journal Article |
Year of Publication | 2019 |
Authors | Honfi, D., and Micskei, Z. |
Journal | Software Quality Journal |
Start Page | 1 |
Pagination | 42 |
Keywords | Software testing, White-box test generation, Empirical study, Test classification |
Abstract | White-box test generation analyzes the code of the system under test, selects relevant test inputs, and captures the observed behavior of the system as expected values in the tests. However, if there is a fault in the implementation, this fault could get encoded in the assertions (expectations) of the tests. The fault is only recognized if the developer who is using test generation is also aware of the real expected behavior. Otherwise, the fault remains silent both in the test and in the implementation. A common assumption is that developers using white-box test generation techniques need to inspect the generated tests and their assertions, and to validate whether the tests encode any fault or represent the real expected behavior. Our goal is to provide insights about how well developers perform in this classification task. We designed an exploratory study to investigate the performance of developers. We also conducted an internal replication to increase the validity of the results. The two studies were carried out in a laboratory setting with 106 graduate students altogether. The tests were generated in four open-source projects. The results were analyzed quantitatively (binary classification metrics and timing measurements) and qualitatively (by observing and coding the activities of participants from screen captures and detailed logs). The results showed that participants tend to misclassify both tests encoding expected behavior and tests encoding faulty behavior (with a median misclassification rate of 20%). The time required to classify one test varied widely, with an average of 2 min. This classification task is an essential step in white-box test generation that notably affects the real fault detection capability of such tools. We recommend a conceptual framework to describe the classification task and suggest taking this problem into account when using or evaluating white-box test generators. |
DOI | 10.1007/s11219-019-09446-5 |
Refereed Designation | Refereed |