Machine learning analysis of data collected from published literature on photocatalytic reforming of glycerol

Date

2023

Publisher

Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2023.

Abstract

The aim of this thesis is to extract knowledge from data collected from the published literature on photocatalytic reforming of glycerol. A total of 791 data points were collected from 93 articles. The data were cleaned, organized, and prepared for machine learning. Random forest and artificial neural network (ANN) models were then built for band gap and for hydrogen production rate, and cross-validation was applied to all models to limit overfitting. For the hydrogen production rate model, missing band gap values were filled with the predictions of the band gap ANN. Random forest feature importance was used to identify the variables with the greatest effect on each output. For band gap, the most important variables were the weight percent of cocatalyst, the percent of semiconductor, and the calcination temperature and duration. For hydrogen production rate, the most significant variables were photocatalyst load, band gap, glycerol concentration, weight percent of cocatalyst, and pH. For random forest, the best model was selected by varying the test/train split and the k value in k-fold cross-validation across different numbers of trees and numbers of samples in a leaf node. For the band gap model, a 0.25 test/train split and 4-fold cross-validation with 41 trees and 1 sample per leaf node gave the best model, with an RMSE (root mean square error) of 0.234 and an R-squared of 0.73. For the hydrogen production rate model, a 0.25 test/train split and 5-fold cross-validation with 81 trees and 2 samples per leaf node gave the best model, with an RMSE of 1.09 × 10^4 and an R-squared of 0.71. For the ANN, the test/train split ratio, the k value for k-fold cross-validation, the number of neurons, and the activation function were varied to find the best model. For band gap, 52 neurons and the ReLU activation function gave the best model, with an RMSE of 0.282 and an R-squared of 0.70, using a 0.3 test/train split and 4-fold cross-validation. For hydrogen production rate, a 0.25 test/train split ratio, 7-fold cross-validation, 63 neurons, and the ReLU activation function gave the best model, with an RMSE of 1.47 × 10^4 and an R-squared of 0.60.
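
The two metrics quoted in the abstract, RMSE and R-squared, and the k-fold procedure used for model selection can be illustrated with a short Python sketch. This is not the code used in the thesis: every function name here (`rmse`, `r_squared`, `k_fold_indices`, `cross_validated_rmse`, `fit_mean_baseline`) is hypothetical, the fold assignment is one simple choice among several, and the mean-predicting baseline only stands in for the random forest or ANN regressor that would be plugged in.

```python
import math
import random
from statistics import mean


def rmse(y_true, y_pred):
    """Root mean square error, in the units of the target variable."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the indices 0..n_samples-1 and deal them into k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validated_rmse(fit, X, y, k, seed=0):
    """Average validation RMSE over k folds.

    `fit(X_train, y_train)` is a hypothetical callable that returns a
    `predict(X)` function, so any regressor can be wrapped this way.
    """
    scores = []
    for held_out in k_fold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        y_pred = predict([X[i] for i in held_out])
        scores.append(rmse([y[i] for i in held_out], y_pred))
    return mean(scores)


def fit_mean_baseline(X_train, y_train):
    """Placeholder model (an assumption): always predicts the training mean."""
    y_bar = mean(y_train)
    return lambda X: [y_bar for _ in X]


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not data from the thesis.
    X = [[float(i)] for i in range(12)]
    y = [2.0 + 0.1 * i for i in range(12)]
    print(cross_validated_rmse(fit_mean_baseline, X, y, k=4))
```

Repeating `cross_validated_rmse` over a grid of settings (for example, number of trees and samples per leaf for a random forest, or number of neurons for an ANN) and keeping the setting with the lowest score mirrors the kind of search the abstract describes, under the assumptions stated above.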
