Supporting study selection of systematic literature reviews in software engineering with text mining

by Q (Qianhui) Zhong

Institution: University of Oulu
Year: 2017
Keywords: Information Processing Science
Posted: 02/01/2018
Record ID: 2154509
Full text PDF: http://urn.fi/URN:NBN:fi:oulu-201704121478


Abstract

Since the Systematic Literature Review (SLR) was introduced to the Software Engineering (SE) field in 2003, it has gained considerable attention and has become a popular method for collecting scientific evidence. However, an SLR is a very time-consuming process, especially the study selection procedure. This study focuses on how to support the study selection process, using the design science research methodology. An SLR was conducted to reveal the existing evidence on supporting the SLR process. The review results indicated that this evidence can be classified into three categories: tools, information visualization techniques, and text mining techniques. However, in the SE field all of these methods and tools are still at an early stage. Either their effectiveness is limited or the proposed tools and techniques are too difficult to use. Thus, SLRs in SE, and the study selection procedure in particular, are in strong need of support. Based on the SLR results, an iterative framework was proposed from a new perspective: refining the selection criteria to improve the selection results. Prior research has indicated that an inadequate SLR protocol can reduce SLR quality. Most previous work automates or semi-automates the selection process with various techniques. In contrast, the new framework focuses on refining the inclusion/exclusion criteria by extracting valuable terms from candidate papers. Text mining was chosen as the technique for this extraction. To evaluate the effectiveness of the proposed framework, two SLR study selection experiments were conducted, and the results were evaluated with precision, recall and F score. The first experiment used the SLR conducted in this thesis, while the second used another large existing SLR.
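The abstract does not specify which text mining technique the framework uses to extract terms from candidate papers. As a minimal sketch, assume that a pilot set of candidate papers (titles or abstracts) has already been screened as included or excluded. The Python below ranks terms by a smoothed log-ratio of their document frequencies in the two groups. High-scoring terms are candidates for sharpening the inclusion criteria, and low-scoring terms are candidates for the exclusion criteria. The function names, the stop-word list and the scoring rule are hypothetical illustrations, not the thesis's method.

```python
import math
import re
from collections import Counter

# Hypothetical minimal stop-word list (an assumption; real pipelines use fuller lists).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to",
              "with", "is", "are", "we", "this", "that"}


def tokenize(text: str) -> set[str]:
    """Lowercase, keep alphabetic tokens longer than two characters, drop stop words.
    A set is returned, so each term counts at most once per document."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return {t for t in tokens if len(t) > 2 and t not in STOP_WORDS}


def document_frequencies(docs: list[str]) -> Counter:
    """Count, for each term, the number of documents it appears in."""
    df: Counter = Counter()
    for doc in docs:
        df.update(tokenize(doc))
    return df


def suggest_criteria_terms(included: list[str], excluded: list[str], top_k: int = 5):
    """Hypothetical helper: rank terms by the smoothed log-ratio of their document
    frequency among included vs. excluded pilot papers.
    Returns (inclusion_terms, exclusion_terms) as lists of (term, score)."""
    df_in = document_frequencies(included)
    df_ex = document_frequencies(excluded)
    n_in, n_ex = len(included), len(excluded)
    scores = {}
    for term in set(df_in) | set(df_ex):
        # Add-one (Laplace) smoothing keeps the ratio finite for terms seen in one group only.
        p_in = (df_in[term] + 1) / (n_in + 2)
        p_ex = (df_ex[term] + 1) / (n_ex + 2)
        scores[term] = math.log(p_in / p_ex)
    # Sort by descending score, breaking ties alphabetically for reproducible output.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    inclusion_terms = [(t, s) for t, s in ranked if s > 0][:top_k]
    exclusion_terms = [(t, s) for t, s in reversed(ranked) if s < 0][:top_k]
    return inclusion_terms, exclusion_terms


if __name__ == "__main__":
    # Hypothetical pilot-screening data, for illustration only.
    included = ["Tool support for study selection in systematic literature reviews",
                "Text mining to support citation screening in systematic reviews"]
    excluded = ["A survey of agile practices in industry",
                "Agile teams and industry surveys"]
    inc, exc = suggest_criteria_terms(included, excluded, top_k=3)
    print("inclusion:", [t for t, _ in inc])
    print("exclusion:", [t for t, _ in exc])
```

In this toy example, terms shared by both included papers ("reviews", "support", "systematic") rise to the top, and "agile" and "industry" surface as exclusion candidates. A reviewer would still judge each suggested term before adding it to the protocol. The iterative framework described in the abstract keeps that human refinement step in the loop.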
The results from both experiments indicate that the proposed framework performs better than the traditional method, achieving a higher F score, which combines recall and precision. In conclusion, this thesis presents a new, effective method to support the SLR study selection process, one that aims to refine the selection criteria before the reviewing work is carried out. Further research could focus on improving the information extraction, or on combining the proposed iterative framework with other study selection support methods.
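Precision, recall and F score compare the set of papers a screening process selects with a reference set of truly relevant papers. Let TP be the number of selected papers that are relevant. Then precision = TP / |selected|, recall = TP / |relevant|, and F_beta = (1 + beta^2) * precision * recall / (beta^2 * precision + recall). The abstract does not say which F variant the experiments used, so the sketch below assumes the balanced F1 score (beta = 1) by default. The function name and the paper IDs are hypothetical.

```python
def selection_metrics(selected: set[str], relevant: set[str], beta: float = 1.0):
    """Precision, recall and F-beta of a study selection against a reference set.

    selected: papers the screening process included.
    relevant: papers in the reference (gold-standard) inclusion set.
    beta=1.0 gives the balanced F1 score (an assumption; the thesis abstract
    does not state which F variant was used)."""
    true_positives = len(selected & relevant)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision == 0.0 and recall == 0.0:
        # Avoid 0/0: with no true positives the F score is defined as 0 here.
        return precision, recall, 0.0
    b2 = beta ** 2
    f_score = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f_score


if __name__ == "__main__":
    # Hypothetical paper IDs, for illustration only.
    selected = {"P1", "P2", "P3", "P4"}
    relevant = {"P1", "P2", "P5"}
    p, r, f = selection_metrics(selected, relevant)
    print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

In this example two of the four selected papers are relevant, so precision is 0.5. Two of the three relevant papers were found, so recall is 2/3. F1 is therefore 4/7 (about 0.571). Because F1 is a harmonic mean, a method cannot reach a high score by maximizing only one of the two measures. This property makes it a reasonable single summary when comparing selection approaches.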