Machine learning analysis of photocatalytic CO2 reduction on perovskite materials
Date
2023
Publisher
Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2023.
Abstract
The purpose of this study is to construct a database of experimental studies on CO2 reduction over perovskite materials from published articles, and then to extract information from this dataset to predict CO2 production yields and the bandgap of the perovskites using machine learning methods: decision tree (DT), random forest (RF), gradient boosting (XGBoost), association rule mining (ARM), and linear regression (LR). Relevant articles were identified through Web of Science, and 61 were selected for data extraction. In total, 309 samples with 29 features (14 numerical and 15 categorical) were collected; the features included perovskite properties such as bandgap and elemental information, and experimental conditions such as reaction temperature and reaction phase. Before the machine learning methods were applied, the dataset was pre-processed for cleaning and organization. Missing bandgap values were predicted from the available data by linear regression. Biased features and features with a high proportion of missing values were eliminated, while missing values in the remaining features were filled with the mode or mean of the dataset. The ML methods were applied to two separate datasets, one for gas-phase and one for liquid-phase reactions: 133 of the 309 samples, with 30 features, formed the gas-phase dataset, while the remaining 176 samples, with 29 features, formed the liquid-phase dataset. Seventeen missing bandgap values were predicted using linear regression, with an R-squared of 0.75 and an RMSE of 0.36 on the validation set. With DT, the test-set accuracy was 0.76 for the gas phase and 0.84 for the liquid phase. In the RF predictions, R-squared and RMSE on the test set were 0.64 and 24.5, respectively, for the gas phase, and 0.49 and 221.0 for the liquid phase. Bandgap was the most important feature for the gas phase, while the cocatalyst was the most important feature for the liquid phase.
Finally, with XGBoost, R-squared and RMSE on the test set were 0.65 and 14.75, respectively, for the gas phase, and 0.79 and 145.6 for the liquid phase.
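For readers unfamiliar with the pre-processing and the metrics named in the abstract, the following is a minimal sketch, not the thesis code. It assumes that mean filling is used for numerical features and mode filling for categorical ones, and that R-squared and RMSE follow their standard definitions. The function names (`impute`, `r_squared`, `rmse`) and all sample values are hypothetical and chosen only for illustration.

```python
import math
from statistics import mean, mode


def impute(column):
    """Fill None entries with the mean (numeric column) or the mode (categorical column).

    Assumes the column holds at least one non-missing value.
    """
    present = [v for v in column if v is not None]
    if all(isinstance(v, (int, float)) for v in present):
        fill = mean(present)   # numerical feature: use the mean
    else:
        fill = mode(present)   # categorical feature: use the most frequent value
    return [fill if v is None else v for v in column]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(mean(squared_errors))


# Hypothetical toy values, not taken from the thesis dataset.
bandgap = impute([2.0, None, 3.0])               # missing entry filled with the mean
cocatalyst = impute(["Pt", None, "Pt", "Au"])    # missing entry filled with the mode
score = r_squared([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
error = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

Because RMSE carries the units of the predicted quantity, the RMSE values reported for yield and for bandgap are not directly comparable with each other, whereas R-squared is dimensionless.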