Transfer learning for continuous control

dc.contributor: Graduate Program in Software Engineering.
dc.contributor.advisor: Akın, H. Levent.
dc.contributor.author: Ada, Suzan Ece.
dc.date.accessioned: 2023-03-16T13:45:29Z
dc.date.available: 2023-03-16T13:45:29Z
dc.date.issued: 2019.
dc.description.abstract: Agents trained with deep reinforcement learning algorithms are capable of performing highly complex tasks, including locomotion in continuous environments. To attain human-level performance, the next step of research should be to investigate the ability to transfer the learning acquired in one task to unknown tasks. Concerns about generalization and overfitting in deep reinforcement learning are not usually addressed in current transfer learning research. This issue results in simplistic benchmarks and inaccurate algorithm comparisons due to rudimentary assessments. In this thesis, we propose novel regularization techniques exclusive to policy gradient algorithms for continuous control through the application of sample elimination and early stopping. By discarding samples that lead to overfitting via strict clipping, we generate robust policies for a humanoid with high generalization capacity. We also suggest including the training iteration among the hyperparameters in deep transfer learning problems. We recommend resorting to earlier snapshots of parameters, depending on the target task, because of overfitting to the source task. We demonstrate that a humanoid is capable of performing forward locomotion in unseen environments with different gravities and tangential frictions using strict clipping and early stopping. Furthermore, we evaluate our propositions on a delivery task, where a humanoid is required to carry a heavy box while walking, and on inter-robot transfer tasks, where the humanoid transfers its learning to taller and shorter robots. Because source task performance is not indicative of the generalization capacity of the algorithm, we propose three different transfer learning evaluation methods. We increase the generalization capacity of a state-of-the-art adversarial algorithm by introducing an entropy bonus, proposing different critic architectures, and using simpler adversaries. Finally, we evaluate the robustness of these adversarial algorithms on morphologically modified hopper environments and environments with unknown gravities according to the criteria we proposed.
dc.format.extent: 30 cm.
dc.format.pages: xvii, 111 leaves ;
dc.identifier.other: SWE 2019 A43
dc.identifier.uri: https://digitalarchive.library.bogazici.edu.tr/handle/123456789/19511
dc.publisher: Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2019.
dc.subject.lcsh: Control theory.
dc.subject.lcsh: Transfer of training.
dc.title: Transfer learning for continuous control
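
The abstract mentions sample elimination via strict clipping and the use of earlier parameter snapshots, but this record does not give the thesis's actual formulation. The following is only a minimal sketch under stated assumptions: that "strict clipping" corresponds to a small clip range in a PPO-style clipped surrogate objective, and that snapshot selection means picking the saved training iteration with the highest return on the target task. All function names, the sample values, and the returns are hypothetical.

```python
from typing import Dict, List, Sequence


def clipped_surrogate(ratios: Sequence[float],
                      advantages: Sequence[float],
                      eps: float) -> float:
    """Mean PPO-style clipped surrogate over a batch (assumed objective)."""
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped_r = max(1.0 - eps, min(1.0 + eps, r))
        # Pessimistic bound: take the smaller of the raw and clipped terms.
        total += min(r * a, clipped_r * a)
    return total / len(ratios)


def active_samples(ratios: Sequence[float],
                   advantages: Sequence[float],
                   eps: float) -> List[bool]:
    """True where clipping has NOT zeroed the sample's policy gradient."""
    mask = []
    for r, a in zip(ratios, advantages):
        # The clipped term is selected (and is constant in the policy) when
        # the ratio already moved past the range in the advantage's direction.
        eliminated = (a > 0 and r > 1.0 + eps) or (a < 0 and r < 1.0 - eps)
        mask.append(not eliminated)
    return mask


def best_snapshot(target_returns: Dict[int, float]) -> int:
    """Training iteration whose saved parameters score best on the target task."""
    return max(target_returns, key=target_returns.get)


# Hypothetical probability ratios and advantage estimates for four samples.
ratios = [0.7, 0.95, 1.05, 1.3]
advantages = [-1.0, 1.0, 1.0, 2.0]

loose = active_samples(ratios, advantages, eps=0.2)
strict = active_samples(ratios, advantages, eps=0.01)

# Hypothetical mean target-task returns for snapshots saved during training.
chosen = best_snapshot({100: 50.0, 200: 80.0, 300: 60.0})
```

In this sketch, tightening the clip range from 0.2 to 0.01 removes one more sample from the gradient, which is one way to read "sample elimination". Treating the training iteration as a hyperparameter then amounts to evaluating several saved snapshots on the target task instead of always using the final one.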

Files

Original bundle
Name: b2034987.034198.001.PDF
Size: 15.69 MB
Format: Adobe Portable Document Format
