1. To prepare Train/Test sets for defect prediction, do the following:

   (i)   Run prepareInputForDefectPrediction.m
   (ii)  Run mergeTrainOrTestSets.m
   (iii) Run shuffleData.m


   1.1) Settings for each dataset are as follows:

	1.1.1) Dataset: GSM Company (first dataset)
               
	       1.1.1.1) In file prepareInputForDefectPrediction.m, the input parameters for the GSM Company must be set as follows:
	       
	       		-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
			companyName = 'Turkcell';
	       	
			department = 'CSI';
		
			project = 'CSI_Project';

			% For variables "inputFolder" and "outputFolder", you can replace 'D:\' with the location where you extract the Data folder		
			inputFolder =  strcat('D:\Data\',companyName, '\', department, '\', project,'\input_files\');		
			outputFolder = strcat('D:\Data\',companyName, '\', department, '\', project, '\output_files\');
		
			% For variable "outputDataFolder", you can replace 'D:\' with the target location where you would like to place the Train/Test sets
			outputDataFolder = strcat('D:\', companyName,'\', department,'\', project);

			versions = struct('versionNo','2.59','devStartDate', '08.06.2009', 'devEndDate', '18.06.2009', 'codeFreezeStartDate','19.06.2009','productionReleaseDate', '02.07.2009');
	
			versionNumbers   = {'2.59', '2.60', '2.61', '2.62'};
	
			devStartDates = {'08.06.2009', '19.06.2009', '03.07.2009', '17.07.2009'};

			devEndDates   = {'18.06.2009', '02.07.2009', '16.07.2009', '30.07.2009'};

			codeFreezeStartDates  = {'19.06.2009', '03.07.2009', '17.07.2009', '31.07.2009'};

			productionReleaseDates    = {'02.07.2009', '16.07.2009', '28.07.2009','13.07.2009'};

			numDevelopers = 10;  		% Total number of developers with known confirmation bias metrics
		
			numContConfBiasMetrics = 59; 	% Total number of confirmation bias metrics which take continuous values

	 		numCategConfBiasMetrics = 17;	% Total number of confirmation bias metrics which take categorical values

			numStaticCodeMetrics = 20;   	% Total number of static code metrics 

			metricTypes = {'StaticCode_Metrics', 'ConfBias_Metrics', 'Churn_Metrics', 'StaticCode_and_ConfBias_Metrics', 'StaticCode_and_Churn_Metrics', 'ConfBias_and_Churn_Metrics', 'StaticCode_ConfBias_and_Churn'};

			fileHeaderMode = 'Off'; % in order to merge all trainsets from all versions, file header mode must be 'Off'

			preprocessingTypes = {'NoPreprocessing', 'LogFilter'};

			INPUT_FILENAME_ChurnData = strcat(inputFolder,'ChurnData.csv');

			INPUT_FILENAME_DeveloperVsConfBiasMetrics_continuous = strcat(inputFolder,'ConfBias_Metrics_Continuous.xls');

			INPUT_FILENAME_DeveloperVsConfBiasMetrics_categorical = strcat(inputFolder,'ConfBias_Metrics_Categorical.xls');

			INPUT_FILENAME_AttributeHeaders = strcat(inputFolder,'File_Headers.xls');

			OUTPUT_FILENAME_fileCommitHistory = strcat(outputFolder,'File_Commit_History.xls');



        	        -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
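
The date lists above are entered by hand, so a quick consistency check before running prepareInputForDefectPrediction.m can be worthwhile. The following is a minimal sketch, not part of the original scripts. It assumes that, for each version, the four dates should be in chronological order (development start, development end, code freeze start, production release); that ordering is an assumption based on the variable names. The names fmt, t and inOrder are illustrative only.

```matlab
% Sanity check (illustrative, not part of the original scripts).
% Assumption: the four dates of each version should be in chronological order.
versionNumbers         = {'2.59', '2.60', '2.61', '2.62'};
devStartDates          = {'08.06.2009', '19.06.2009', '03.07.2009', '17.07.2009'};
devEndDates            = {'18.06.2009', '02.07.2009', '16.07.2009', '30.07.2009'};
codeFreezeStartDates   = {'19.06.2009', '03.07.2009', '17.07.2009', '31.07.2009'};
productionReleaseDates = {'02.07.2009', '16.07.2009', '28.07.2009', '13.07.2009'};

fmt = 'dd.mm.yyyy';   % day.month.year, as used in the settings above
inOrder = true(1, numel(versionNumbers));
for v = 1:numel(versionNumbers)
    % Convert the four dates of version v to serial date numbers
    t = [datenum(devStartDates{v}, fmt), ...
         datenum(devEndDates{v}, fmt), ...
         datenum(codeFreezeStartDates{v}, fmt), ...
         datenum(productionReleaseDates{v}, fmt)];
    inOrder(v) = issorted(t);   % true if the dates never go backwards
    if ~inOrder(v)
        fprintf('Version %s: dates are not in chronological order\n', versionNumbers{v});
    end
end
```

With the values exactly as listed above, the check reports version 2.62, because its production release date (13.07.2009) is earlier than its code freeze start date (31.07.2009). Confirm that date against the project records before relying on it.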


	     1.1.1.2) In file mergeTrainOrTestSets.m, the input parameters for the GSM Company must be set as follows:
	        	
			-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
			company = 'Turkcell';
			
			department = 'CSI';
			
			project = 'CSI_Project';

			preProcessingTypes = {'NoPreprocessing', 'LogFiltered'};
			
			metricTypes = {'StaticCode_Metrics', 'ConfBias_Metrics', 'Churn_Metrics', 'StaticCode_and_ConfBias_Metrics', 'StaticCode_and_Churn_Metrics', 'ConfBias_and_Churn_Metrics', 'StaticCode_ConfBias_and_Churn'};
			
			versionNumbers   = {'2.59','2.60','2.61','2.62'};

			numVersions = size(versionNumbers,2);

			numMetrics = [20, 134, 9, 153, 28, 143, 162];

			rootFolderName = strcat('D:\',company, '\', department, '\', project, '\Train_Test_Set\');

                        ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- 
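
Because metricTypes and numMetrics are two separate lists, it is easy for them to drift out of step when one is edited. The sketch below is a small guard that could be run after setting the parameters. It assumes that entry k of numMetrics is the number of metrics for entry k of metricTypes; the ordering suggests this, but it is not stated explicitly above.

```matlab
% Illustrative guard, not part of mergeTrainOrTestSets.m.
% Assumption: numMetrics(k) is the metric count for metricTypes{k}.
metricTypes = {'StaticCode_Metrics', 'ConfBias_Metrics', 'Churn_Metrics', ...
               'StaticCode_and_ConfBias_Metrics', 'StaticCode_and_Churn_Metrics', ...
               'ConfBias_and_Churn_Metrics', 'StaticCode_ConfBias_and_Churn'};
numMetrics  = [20, 134, 9, 153, 28, 143, 162];

% One metric count is expected per metric type.
assert(numel(metricTypes) == numel(numMetrics), ...
       'metricTypes has %d entries but numMetrics has %d.', ...
       numel(metricTypes), numel(numMetrics));

% Print the pairing so that a misaligned entry is easy to spot.
for k = 1:numel(metricTypes)
    fprintf('%-35s %4d\n', metricTypes{k}, numMetrics(k));
end
```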
		
	    1.1.1.3) In file shuffleData.m, the input parameters for the GSM Company must be set as follows:

			-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------			
			companyName = 'Turkcell';

			department = 'CSI';

			project ='CSI_Project';

			versionSet = '2.59-2.62';
			-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------



	1.1.2) Dataset: ISV Company (second dataset)

	      1.1.2.1) In file prepareInputForDefectPrediction.m, the input parameters for the ISV (Independent Software Vendor) must be set as follows:
                        
                        -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
			companyName = 'Logo';
	
			department = 'ERP';

			project = 'ERP_Project';

			% For variables "inputFolder" and "outputFolder", you can replace 'D:\' with the location where you extract the Data folder
			inputFolder =  strcat('D:\Data\',companyName, '\', department, '\', project,'\input_files\');
			outputFolder = strcat('D:\Data\',companyName, '\', department, '\', project, '\output_files\');


			% For variable "outputDataFolder", you can replace 'D:\' with the target location where you would like to place the Train/Test sets
			outputDataFolder = strcat('D:\', companyName,'\', department,'\', project);

			versions = struct('versionNo','v1','devStartDate', '09.09.2007', 'devEndDate', '28.02.2011', 'codeFreezeStartDate','29.02.2011','productionReleaseDate', '29.03.2011');

			versionNumbers   = {'v1'};

                        ------------------------------------------------------------------------------------------------------------------------------------------------------------------------

	     1.1.2.2) The ERP software of the ISV has only one version, so mergeTrainOrTestSets.m is not executed for this dataset.


  	     1.1.2.3) In file shuffleData.m, the input parameters for the ISV must be set as follows:

			-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------			
			companyName = 'Logo';

			department = 'ERP';

			project ='ERP_Project';

			versionSet = 'v1';
			-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------



2. Settings for source code defectPrediction.m are as follows:

   2.1) Dataset: GSM Company (first dataset)	

	---------------------------------------------------------------------------------------------------------------------------------------------------------------------------	
	companyName = 'Turkcell';

	department = 'CSI';

	project = 'CSI_Project';

	versionSet = '2.59-2.62';

	numBins = 10; % total number of folds used in cross validation

	numShuffle = 10; % total number of shuffles for numBins-fold cross validation, where numBins = 10

	numSampleShuffle = 10; % total number of shuffles for undersampling

	samplingOption = 'nosampling';       

	featureWeightOption = 'none'; % featureWeightOptions are 'InfoGain', 'none', or 'GroupDefectInfo'

	inputFolder = strcat('D:\MATLAB_CODES\churn_confBias_data\',companyName, '\', department, '\', project, '\inputFiles\');

	preProcessingTypes = {'NoPreprocessing', 'LogFiltered'};  % preProcessingTypes are 'NoPreprocessing', 'LogFiltered', 'Standardized'

	metricTypes = {'StaticCode_Metrics', 'ConfBias_Metrics', 'Churn_Metrics', 'StaticCode_and_ConfBias_Metrics', 'StaticCode_and_Churn_Metrics', 'ConfBias_and_Churn_Metrics', 'StaticCode_ConfBias_and_Churn'};

	numMetrics =  [19,40, 9, 59, 28, 49, 68]; % Exclude cyclomatic density from static code metrics

       ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
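
To make the roles of numBins and numShuffle concrete, the following is a minimal sketch of repeated, shuffled k-fold partitioning. It is not the code in defectPrediction.m, which may partition the data differently. The sample count numSamples and the names foldOf, testIdx and trainIdx are hypothetical.

```matlab
% Illustrative sketch of shuffled numBins-fold cross validation.
numBins    = 10;    % folds per cross-validation run (from the settings above)
numShuffle = 10;    % number of reshuffles (from the settings above)
numSamples = 95;    % hypothetical number of rows in the data set

rng(1);             % fixed seed so the sketch is repeatable
foldOf = zeros(numShuffle, numSamples);   % foldOf(s, i) = fold of sample i in shuffle s
for s = 1:numShuffle
    order = randperm(numSamples);         % new random order for every shuffle
    % Deal the shuffled samples into folds 1..numBins in round-robin fashion
    foldOf(s, order) = mod(0:numSamples-1, numBins) + 1;
end

% Example: indices of the test and training rows for shuffle 1, fold 3
testIdx  = find(foldOf(1, :) == 3);
trainIdx = find(foldOf(1, :) ~= 3);
```

Each shuffle yields numBins train/test splits, so the settings above correspond to numShuffle * numBins = 100 splits in this sketch.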


   2.2) Dataset: ISV Company (second dataset)	

	-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------	
	companyName = 'Logo';

	department = 'ERP';

	project = 'ERP_Project';

	versionSet = 'v1';

	numBins = 10; % total number of folds used in cross validation

	numShuffle = 10; % total number of shuffles for numBins-fold cross validation, where numBins = 10

	numSampleShuffle = 10; % total number of shuffles for undersampling

	samplingOption = 'nosampling';       

	featureWeightOption = 'none'; % featureWeightOptions are 'InfoGain', 'none', or 'GroupDefectInfo'

	inputFolder = strcat('D:\MATLAB_CODES\churn_confBias_data\',companyName, '\', department, '\', project, '\inputFiles\');

	preProcessingTypes = {'NoPreprocessing', 'LogFiltered'};  % preProcessingTypes are 'NoPreprocessing', 'LogFiltered', 'Standardized'

	metricTypes = {'StaticCode_Metrics', 'ConfBias_Metrics', 'Churn_Metrics', 'StaticCode_and_ConfBias_Metrics', 'StaticCode_and_Churn_Metrics', 'ConfBias_and_Churn_Metrics', 'StaticCode_ConfBias_and_Churn'};

	numMetrics =  [19,40, 9, 59, 28, 49, 68]; % Exclude cyclomatic density from static code metrics

        ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


3. Settings for source code createTrainSetWithMissingData.m are as follows:			 

   3.1) Dataset: GSM Company (first dataset)			


        ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------	
	versionSet = '2.59-2.62';
	
	company = 'Turkcell';

	preProcessingType = 'Standardized';
	
	iterationNo = 1;  % This code must be executed for each iteration ranging from 1 to 10

	folderName = strcat('D:\', company, '\Train_Test_Set\',preProcessingType,'\', versionSet,'\Iteration', num2str(iterationNo), '\');

	fileName = strcat('TrainSet_ConfBias_Metrics_', preProcessingType,'_',versionSet,'.csv');
	
	inputFileName = strcat(folderName, fileName);

	numDevelopers = 10;

	numMetrics = 84; 

	format = createFormat(numMetrics+1);

	srcFolder = 'D:\MATLAB_CODES\churn_confBias_data\';

	missingSrcFolder = strcat('D:\', company, '\Train_Test_Set_MissingData\');
        ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
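
Since the script has to be run once for each value of iterationNo from 1 to 10, it can help to list the ten input files up front and confirm that they exist. The sketch below only builds the file names with the same strcat calls as above; it does not run createTrainSetWithMissingData.m. The cell array inputFileNames is a hypothetical helper variable.

```matlab
% Illustrative only: list the input file read in each of the 10 iterations.
versionSet        = '2.59-2.62';
company           = 'Turkcell';
preProcessingType = 'Standardized';

inputFileNames = cell(1, 10);   % hypothetical helper, one entry per iteration
for iterationNo = 1:10
    folderName = strcat('D:\', company, '\Train_Test_Set\', preProcessingType, '\', ...
                        versionSet, '\Iteration', num2str(iterationNo), '\');
    fileName   = strcat('TrainSet_ConfBias_Metrics_', preProcessingType, '_', versionSet, '.csv');
    inputFileNames{iterationNo} = strcat(folderName, fileName);
end

disp(inputFileNames{1});
```

For iteration 1 this prints D:\Turkcell\Train_Test_Set\Standardized\2.59-2.62\Iteration1\TrainSet_ConfBias_Metrics_Standardized_2.59-2.62.csv.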


   3.2) Dataset: ISV Company (second dataset)

        ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------	
	versionSet = 'v1';
	
	company = 'Logo';

	preProcessingType = 'Standardized';
	
	iterationNo = 1;  % This code must be executed for each iteration ranging from 1 to 10

	folderName = strcat('D:\', company, '\Train_Test_Set\',preProcessingType,'\', versionSet,'\Iteration', num2str(iterationNo), '\');

	fileName = strcat('TrainSet_ConfBias_Metrics_', preProcessingType,'_',versionSet,'.csv');

	inputFileName = strcat(folderName, fileName);

	numDevelopers = 6;

	numMetrics = 84; 

	format = createFormat(numMetrics+1);

	srcFolder = 'D:\MATLAB_CODES\churn_confBias_data\';

	missingSrcFolder = strcat('D:\', company, '\Train_Test_Set_MissingData\');
        ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------


4. Settings for source code completeMissingData.m are as follows:			 

   4.1) Dataset: GSM Company (first dataset)

        ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------	
	numDevelopers = 10;

	company = 'Turkcell';
        ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------	

   4.2) Dataset: ISV Company (second dataset)

        ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------	
	numDevelopers = 6;

	company = 'Logo';
        ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------	