First International Summer School on Search-Based Software Engineering

Javier Dolado

Data Analysis in Software Engineering with R

The purpose of the talk is to give hands-on experience of data analysis procedures in software engineering. We will describe the basic steps followed when analysing data in the software engineering field, starting with the sources of data and the preliminary activities of exploratory data analysis. Two public datasets will be used throughout the talk. Problems related to the normality of the data will be discussed. Several regression models will be built and then assessed using the validation approach. The usual measures for evaluating models in software engineering will be described, together with the problems associated with them. Additionally, machine learning methods will be applied to the datasets. We will describe the non-parametric bootstrap resampling method for dealing with non-normal observations. Finally, we will comment on the concepts of hypothesis testing. The slides will be complemented with running R code for the data analysis models and methods.

Outline:

Setting the environment
The purpose of analysing data in software engineering
Getting data
Exploratory Data Analysis
Model Building for Prediction
- Linear Regression
- Genetic Programming for Symbolic Regression
Model Evaluation
- Descriptive Statistics
- Standardised Accuracy and related measures
Confidence Intervals
- Bootstrap
Classical Hypothesis Testing
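
As a rough illustration of the non-parametric bootstrap listed in the outline above, the following minimal sketch computes a percentile confidence interval for the median of a small, right-skewed sample. It is not taken from the talk, which uses R; it is written in Python with the standard library only, and the function name bootstrap_ci, its parameters, and the effort values are all hypothetical.

```python
import random
import statistics


def bootstrap_ci(sample, stat=statistics.median, n_resamples=2000,
                 alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for stat(sample).

    Hypothetical helper for illustration; not part of the talk's material.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    n = len(sample)
    # Resample with replacement, recompute the statistic each time,
    # and sort the resulting estimates.
    estimates = sorted(
        stat(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Take the empirical alpha/2 and 1 - alpha/2 quantiles.
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = int((1 - alpha / 2) * n_resamples) - 1
    return estimates[lo_idx], estimates[hi_idx]


# Made-up, right-skewed effort values (person-hours); not from a real dataset.
efforts = [12, 15, 18, 20, 22, 25, 30, 41, 58, 95, 160, 310]
low, high = bootstrap_ci(efforts)
print(f"95% bootstrap CI for the median: [{low}, {high}]")
```

The percentile method makes no normality assumption about the observations, which is why it suits skewed data of this kind. Other bootstrap interval constructions exist and may behave better on very small samples.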

José Antonio Parejo

The EXEMPLAR and STATService Tools for Experimental Software Engineering

A proper analysis of the data and the replicability of the experiments are two key elements that largely determine the quality of any empirical study. Since most SBSE research is based on experimentation, these factors are of great importance in this area. There are currently many tools for data visualization and analysis, from desktop tools such as SPSS or R to cloud platforms like Tableau. However, these tools are general-purpose and have a steep learning curve, and they leave users with the burden of applying the methodologies correctly, with little help or guidance. This session will present two web tools: STATService and Exemplar.

The goal of STATService is to help users apply statistical hypothesis tests. The objective of Exemplar is to provide a platform for publishing and tracking experimental materials, and to support the generation of experiment descriptions with the detail required to enable replication by other researchers.
