------------------------------------------------------------------
Bogazici University
Master's Thesis
Dennis Moskov
Ref. No. 10110944
Title: KNOWLEDGE EXTRACTION FROM PUBLISHED PAPERS IN LITERATURE FOR THE CATALYTIC METHANOL PRODUCTION FROM SYNTHESIS GAS USING DATA MINING TOOLS
Year: 2016

-------------------------------------------------------------------

Files on the disk:

10110944.pdf - Master's Thesis
README.txt - This readme file
Methanol Database.xlsx - Methanol database

Source code folder (contains all source code files)

	Change classes.txt - Source code to change variable classes
	Clustering Hierarchical.txt - Source code for hierarchical clustering
	Clustering PAMK.txt - Source code for partitioning around medoids clustering
	CV by article.txt - Source code to separate data by unique articles and perform cross-validation
	Dissimilarity Matrix Gower.txt - Source code to build a Gower dissimilarity matrix
	Dissimilarity Matrix Random Forest.txt - Source code to build a dissimilarity matrix with random forest
	randomForest find best m (regression).txt - Source code to find the best number (m) of randomly selected variables for random forest
	randomForest partial dependence.txt - Source code for partial dependence plots for random forest
	Standardize.txt - Source code to standardize variables
	Plit by criteria.txt - Source code for subsetting the database
	CT fitted.txt - Source code for prediction of training sets with classification tree
	CT predicted.txt - Source code for prediction of test sets with classification tree
	MLR fitted.txt - Source code for prediction of training sets with multiple linear regression
	MLR predicted.txt - Source code for prediction of test sets with multiple linear regression
	RF fitted.txt - Source code for prediction of training sets with random forest
	RF predicted.txt - Source code for prediction of test sets with random forest
	RT fitted.txt - Source code for prediction of training sets with regression tree
	RT predicted.txt - Source code for prediction of test sets with regression tree
	CT reduced fitted.txt - Source code for prediction of training sets with classification tree after variable selection
	CT reduced predicted.txt - Source code for prediction of test sets with classification tree after variable selection
	MLR reduced fitted.txt - Source code for prediction of training sets with multiple linear regression after variable selection
	MLR reduced predicted.txt - Source code for prediction of test sets with multiple linear regression after variable selection
	RF reduced fitted.txt - Source code for prediction of training sets with random forest after variable selection
	RF reduced predicted.txt - Source code for prediction of test sets with random forest after variable selection
	RT reduced fitted.txt - Source code for prediction of training sets with regression tree after variable selection
	RT reduced predicted.txt - Source code for prediction of test sets with regression tree after variable selection



Input folder (contains all input files)

	MeOH conti stand.csv - standardized continuous database
	MeOH conti.csv - continuous database
	MeOH disc stand.csv - standardized discrete database
	MeOH disc.csv - discrete database
	clustered by hierarchical.csv - hierarchical clustered database
	clustered by PAMK.csv - partitioning around medoids clustered database



Output folder (contains the output files of multiple linear regression as sample output; the other outputs are similar and can be computed with the source code and input files)
 
 multiple linear regression folder

   fitted folder (outputs of training sets)

     complete folder (full database)

	Conversion fittedVSobs full.png - Fitted vs. observed conversion plot
	Conversion p-values.csv - Variable p-values for conversion
	Conversion regression_values.csv - Goodness-of-fit measures for conversion
	Conversion results.csv - Residuals of responses for fitted conversion
	Selectivity fittedVSobs full.png - Fitted vs. observed selectivity plot
	Selectivity p-values.csv - Variable p-values for selectivity
	Selectivity regression_values.csv - Goodness-of-fit measures for selectivity
	Selectivity results.csv - Residuals of responses for fitted selectivity
	Yield fittedVSobs full.png - Fitted vs. observed yield plot
	Yield p-values.csv - Variable p-values for yield
	Yield regression_values.csv - Goodness-of-fit measures for yield
	Yield results.csv - Residuals of responses for fitted yield

     reduced folder (database after variable selection)

	Conversion fittedVSobs full.png - Fitted vs. observed conversion plot
	Conversion regression_values.csv - Goodness-of-fit measures for conversion
	Conversion results.csv - Residuals of responses for fitted conversion
	Selectivity fittedVSobs full.png - Fitted vs. observed selectivity plot
	Selectivity regression_values.csv - Goodness-of-fit measures for selectivity
	Selectivity results.csv - Residuals of responses for fitted selectivity
	Yield fittedVSobs full.png - Fitted vs. observed yield plot
	Yield regression_values.csv - Goodness-of-fit measures for yield
	Yield results.csv - Residuals of responses for fitted yield

     Hierarchical folder (clustered database)

	Conversion fittedVSobs full.png - Fitted vs. observed conversion plot
	Conversion regression_values.csv - Goodness-of-fit measures for conversion
	Conversion results.csv - Residuals of responses for fitted conversion
	Selectivity fittedVSobs full.png - Fitted vs. observed selectivity plot
	Selectivity regression_values.csv - Goodness-of-fit measures for selectivity
	Selectivity results.csv - Residuals of responses for fitted selectivity
	Yield fittedVSobs full.png - Fitted vs. observed yield plot
	Yield regression_values.csv - Goodness-of-fit measures for yield
	Yield results.csv - Residuals of responses for fitted yield

     partitioning around medoids folder (clustered database)

	Conversion fittedVSobs full.png - Fitted vs. observed conversion plot
	Conversion regression_values.csv - Goodness-of-fit measures for conversion
	Conversion results.csv - Residuals of responses for fitted conversion
	Selectivity fittedVSobs full.png - Fitted vs. observed selectivity plot
	Selectivity regression_values.csv - Goodness-of-fit measures for selectivity
	Selectivity results.csv - Residuals of responses for fitted selectivity
	Yield fittedVSobs full.png - Fitted vs. observed yield plot
	Yield regression_values.csv - Goodness-of-fit measures for yield
	Yield results.csv - Residuals of responses for fitted yield

   predicted folder (outputs of test sets)

     complete folder (full database)

	Conversion predVSobs full.png - Predicted vs. observed conversion plot
	Conversion prediction_values.csv - Prediction measures for conversion
	Conversion results.csv - Residuals of responses for predicted conversion
	Selectivity predVSobs full.png - Predicted vs. observed selectivity plot
	Selectivity prediction_values.csv - Prediction measures for selectivity
	Selectivity results.csv - Residuals of responses for predicted selectivity
	Yield predVSobs full.png - Predicted vs. observed yield plot
	Yield prediction_values.csv - Prediction measures for yield
	Yield results.csv - Residuals of responses for predicted yield

     reduced folder (database after variable selection)

	Conversion predVSobs full.png - Predicted vs. observed conversion plot
	Conversion prediction_values.csv - Prediction measures for conversion
	Conversion results.csv - Residuals of responses for predicted conversion
	Selectivity predVSobs full.png - Predicted vs. observed selectivity plot
	Selectivity prediction_values.csv - Prediction measures for selectivity
	Selectivity results.csv - Residuals of responses for predicted selectivity
	Yield predVSobs full.png - Predicted vs. observed yield plot
	Yield prediction_values.csv - Prediction measures for yield
	Yield results.csv - Residuals of responses for predicted yield

     Hierarchical folder (clustered database)

	Conversion predVSobs full.png - Predicted vs. observed conversion plot
	Conversion prediction_values.csv - Prediction measures for conversion
	Conversion results.csv - Residuals of responses for predicted conversion
	Selectivity predVSobs full.png - Predicted vs. observed selectivity plot
	Selectivity prediction_values.csv - Prediction measures for selectivity
	Selectivity results.csv - Residuals of responses for predicted selectivity
	Yield predVSobs full.png - Predicted vs. observed yield plot
	Yield prediction_values.csv - Prediction measures for yield
	Yield results.csv - Residuals of responses for predicted yield

     partitioning around medoids folder (clustered database)

	Conversion predVSobs full.png - Predicted vs. observed conversion plot
	Conversion prediction_values.csv - Prediction measures for conversion
	Conversion results.csv - Residuals of responses for predicted conversion
	Selectivity predVSobs full.png - Predicted vs. observed selectivity plot
	Selectivity prediction_values.csv - Prediction measures for selectivity
	Selectivity results.csv - Residuals of responses for predicted selectivity
	Yield predVSobs full.png - Predicted vs. observed yield plot
	Yield prediction_values.csv - Prediction measures for yield
	Yield results.csv - Residuals of responses for predicted yield

-----------------------------------------------------------------------------------------

Hardware requirements:

This thesis was conducted with the following hardware:

	Processor:         Intel(R) Core(TM) i5-4570 CPU @ 3.20 GHz, x64-based
	RAM: 	           8.00 GB
	Graphics card:     NVIDIA GeForce GT 630
	Interface devices: Mouse and keyboard
	
-------------------------------------------------------------------------------------------

Software requirements:

This thesis was conducted with the following software:

	Operating system:       		Microsoft Windows 8.1 Pro, 64 bit
	Programming software:   		R x64 3.2.3
	Programming packages (R libraries): 	rpart, rattle, rpart.plot, randomForest, MASS, cluster, fpc
	Office software:        		Microsoft Office 2013 (Word, Excel)
	Other software:         		Engauge Digitizer 4.1
						WebPlotDigitizer 3.8

--------------------------------------------------------------------------------------------------








