Ph.D. Theses
Browsing Ph.D. Theses by Author "Akarun, Lale."
Now showing 1 - 10 of 10
Item Bayesian source modelling for single-channel audio separation (Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2009.) Dikmen, Onur.; Akarun, Lale.
In many audio processing tasks, such as source separation, denoising or compression, it is crucial to construct realistic and flexible models that capture the physical properties of audio signals. This can be accomplished in the Bayesian framework through the use of appropriate prior distributions. In this thesis, we describe two prior models, Gamma Markov chains (GMCs) and Gamma Markov random fields (GMRFs), to model the sparsity and the local dependency of the energies of time-frequency expansion coefficients. We build two audio models in which the variances of the source coefficients are modelled with GMCs and GMRFs, and the source coefficients are Gaussian conditioned on the variances. The application of these models is not limited to variance modelling of audio sources; they can be used in other problems where there is dependency between variables, such as Poisson observation models. In single-channel source separation using non-negative matrix factorisation (NMF), we make use of GMCs to model the dependencies in frequency templates and excitation vectors.
A GMC defines a prior distribution over the variance variables such that they are correlated along the time or frequency axis, while a GMRF describes a non-normalised joint distribution in which each variance variable depends on all adjoining variance variables. In our audio models, the actual source coefficients are independent conditional on the variances and are distributed as zero-mean Gaussians. Our construction ensures a positive coupling between the variance variables, so that the signal energy changes smoothly over both axes, capturing temporal and/or spectral continuity. The coupling strength is controlled by a set of hyperparameters. Inference on the overall model, i.e., a GMC or GMRF coupled with a Gaussian or Poisson observation model, is convenient because of the conditional conjugacy of all variables in the model, but automatic optimisation of the hyperparameters is crucial to obtain better fits. In GMCs, hyperparameter optimisation can be carried out with the Expectation-Maximisation (EM) algorithm, the E-step being approximated by the posterior distribution estimated by the inference algorithm. In this optimisation, it is important for the inference algorithm to estimate the covariances between the inferred variables, because the hyperparameter updates depend on them.
The marginal likelihood of the GMRF model is not available because of the intractable normalising constant, so the hyperparameters of a GMRF cannot be optimised by maximum likelihood estimation. Methods exist to estimate the optimal hyperparameters in such cases, such as pseudolikelihood, contrastive divergence and score matching, but only contrastive divergence is readily applicable to models with latent variables. We therefore optimised the hyperparameters of our GMRF-based audio model using contrastive divergence.
We tested the GMC- and GMRF-based audio models on denoising and single-channel source separation problems in which all the hyperparameters are jointly estimated given only the audio data. Both models provided promising results, but the signals reconstructed by the GMRF model were slightly better and more natural sounding.
Our third model makes use of Gamma and GMC prior distributions in an NMF setting for single-channel source separation. The hyperparameters are again optimised during the inference phase, and the model needs almost no other design decisions. This model performs substantially better than the previous two and is less demanding in terms of computational power. However, it is designed only for source separation; unlike the previous two models, it is not a general audio model.
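As a rough sketch of the hierarchy described above (generic notation, not the exact parameterisation used in the thesis), each time-frequency source coefficient is conditionally Gaussian given its own variance, and the variances along one axis are tied by a first-order chain:

```latex
% Illustrative sketch only: s are source coefficients, v their variances,
% nu a frequency index and tau a time index.
\begin{align}
  s_{\nu,\tau} \mid v_{\nu,\tau} &\sim \mathcal{N}\!\left(0,\, v_{\nu,\tau}\right), \\
  p(v_{\nu,1}, \dots, v_{\nu,T}) &= p(v_{\nu,1}) \prod_{\tau=2}^{T} p\!\left(v_{\nu,\tau} \mid v_{\nu,\tau-1}\right),
\end{align}
```

with each conditional chosen from the Gamma family so that a large value of the previous variance makes a large current variance likely (the positive coupling mentioned above), and with the hyperparameters of those conditionals setting how smoothly the energy may change. The GMRF variant instead ties each variance to all of its neighbours on the time-frequency grid; the actual GMC/GMRF constructions are more careful than this two-line sketch in order to obtain the conditional conjugacy the abstract refers to.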
Item Biologically motivated 3D face recognition (Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2007.) Salah, Albert Ali.; Akarun, Lale.
Face recognition has been an active area of study for both the computer vision and image processing communities, not only for biometrics but also for human-computer interaction applications. The purpose of the present work is to evaluate the existing 3D face recognition techniques and to seek biologically motivated methods to improve them. We especially look at findings in psychophysics and cognitive science for insights. We propose a biologically motivated computational model and focus on its earlier stages, whose performance is critical for the later stages. Our emphasis is on the automatic localization of facial features. We first propose a strong unsupervised learning algorithm for flexible and automatic training of Gaussian mixture models and use it in a novel feature-based algorithm for facial fiducial point localization. We also propose a novel structural correction algorithm to evaluate the quality of landmarking and to localize fiducial points under adverse conditions. We test the effects of automatic landmarking under rigid and non-rigid registration methods. For the rigid registration approach, we implement the iterative closest point (ICP) method. The most important drawback of ICP is the computational cost of registering a test scan to each scan in the gallery. By using an average face model in rigid registration, we show that this computational bottleneck can be eliminated. Following psychophysical arguments on the "other race effect", we reason that organizing faces into different gender and morphological groups will help in designing more discriminative classifiers. We test this claim by employing different average face models for dense registration. We propose a shape-based clustering approach that assigns faces into groups with nondescript gender and race. Finally, we propose a regular re-sampling step that increases both speed and accuracy significantly. These components make up a full 3D face recognition system.
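The ICP-to-average-model registration described above can be sketched as follows; this is a minimal, hypothetical implementation under our own naming and convergence choices, not code from the thesis:

```python
import numpy as np
from scipy.spatial import cKDTree

def best_rigid_transform(src, dst):
    """Closed-form least-squares rotation/translation mapping src onto dst (Kabsch/SVD)."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    H = (src - mu_s).T @ (dst - mu_d)
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:            # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = mu_d - R @ mu_s
    return R, t

def icp_to_average_face(scan, avg_face, n_iters=30, tol=1e-6):
    """Rigidly align a 3D probe scan (N x 3) to a fixed average face model (M x 3).

    Because every probe is registered to the same average model, the k-d tree
    over the model is built once, removing the per-gallery-scan registration cost.
    """
    tree = cKDTree(avg_face)
    current = scan.copy()
    prev_err = np.inf
    for _ in range(n_iters):
        dists, idx = tree.query(current)          # closest-point correspondences
        R, t = best_rigid_transform(current, avg_face[idx])
        current = current @ R.T + t
        err = dists.mean()
        if abs(prev_err - err) < tol:             # converged
            break
        prev_err = err
    return current, err
```

Because every probe scan is aligned to the same average model rather than to each gallery scan, the nearest-neighbour structure is reused across probes, which is the source of the speed-up mentioned above.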
Item Crowd-labelling for continuous-valued annotations (Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2018.) Kara, Yunus Emre.; Akarun, Lale.
As machine learning gained immense popularity across a wide variety of domains in the last decade, it has become more important than ever to have fast and inexpensive ways to annotate vast amounts of data. With the emergence of crowdsourcing services, research has gravitated toward putting "the wisdom of crowds" to use. We call the process of crowdsourcing-based label collection crowd-labeling. In this thesis, we focus on crowd consensus estimation for continuous-valued labels. Unfortunately, spammers and inattentive annotators pose a threat to the quality and trustworthiness of the consensus. We therefore develop Bayesian models that take different annotator behaviors into account, and we introduce two crowd-labeled datasets for evaluating our models.
High-quality consensus estimation requires a meticulous choice of the candidate annotator and of the sample in need of a new annotation. Due to time and budget limitations, it is beneficial to make this choice while collecting the annotations. To this end, we propose an active crowd-labeling approach for actively estimating consensus from continuous-valued crowd annotations. Our method is based on annotator models with unknown parameters, and Bayesian inference is employed to reach a consensus in the form of ordinal, binary, or continuous values. We introduce ranking functions for choosing the candidate annotator-sample pair for requesting an annotation. In addition, we propose a penalizing method for preventing annotator domination, investigate the explore-exploit trade-off when incorporating new annotators into the system, and study the effects of imposing a stopping criterion based on consensus quality. Experimental results on the benchmark datasets suggest that our method provides a budget- and time-sensitive solution to the crowd-labeling problem. Finally, we introduce a multivariate model incorporating cross-attribute correlations in multivariate annotations and present preliminary observations.
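As a toy illustration of consensus estimation from continuous-valued annotations (our own simplified alternating scheme with hypothetical names, not the Bayesian annotator models or the active-learning machinery developed in the thesis), a per-annotator noise level can be estimated jointly with the per-sample consensus, so that unreliable annotators are automatically down-weighted:

```python
import numpy as np

def estimate_consensus(annotations, n_iters=50):
    """annotations: dict {(sample_id, annotator_id): value} of continuous labels.

    EM-style alternation: the consensus of each sample is a precision-weighted
    mean of its annotations, and each annotator's noise variance is re-estimated
    from residuals. Spamming or inattentive annotators end up with large
    variances and therefore low weight in the consensus.
    """
    samples = sorted({s for s, _ in annotations})
    annotators = sorted({a for _, a in annotations})
    consensus = {s: np.mean([v for (si, _), v in annotations.items() if si == s]) for s in samples}
    variance = {a: 1.0 for a in annotators}
    for _ in range(n_iters):
        # update consensus per sample as a precision-weighted mean
        for s in samples:
            vals = [(v, 1.0 / variance[a]) for (si, a), v in annotations.items() if si == s]
            consensus[s] = sum(v * w for v, w in vals) / sum(w for _, w in vals)
        # update each annotator's residual variance (with a small floor)
        for a in annotators:
            res = [(v - consensus[s]) ** 2 for (s, ai), v in annotations.items() if ai == a]
            variance[a] = max(np.mean(res), 1e-6)
    return consensus, variance
```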
Item Generative vs. discriminative models for vision based hand gesture recognition (Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2017.) Keskin, Cem.; Akarun, Lale.
In this thesis, we focus on the problem of modelling sequential data, and particularly hand gestures. We approach the modelling problem using automata theory and the theory of formal languages, which allows us to determine the crucial aspects of hand gestures. Furthermore, we show how this approach can help us assess the capabilities of candidate models. The resulting framework can identify the shortcomings of models and set requirements that a model must satisfy to properly represent the gestures. We use this approach to examine common graphical models such as hidden Markov models (HMMs), input-output HMMs, explicit duration models, hidden conditional random fields, and hidden semi-Markov models (HSMMs). We also devise an efficient variant of HSMMs that conforms to all of the requirements set by our analysis. We further show that mixtures of left-right models are the most suitable setting for gestures. Finally, we compare all the mentioned models and report the results. In the second part of the thesis, we focus on modelling hand shape with randomized decision forests (RDFs). In particular, we extend a known body pose estimation method to hand pose and then introduce a novel RDF that directly estimates the hand shape. Furthermore, we propose a multi-layered expert network consisting of RDFs that either considerably increases the accuracy or reduces memory requirements without sacrificing accuracy.
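To make the left-right setting mentioned above concrete, the following sketch builds a Bakis-style left-right transition matrix in which each state may only persist or move forward; the state count, self-transition probability and jump width are arbitrary placeholders, not values from the thesis:

```python
import numpy as np

def left_right_transitions(n_states, p_stay=0.6, max_jump=2):
    """Left-right (Bakis) transition matrix: state i may stay or advance by up to max_jump."""
    A = np.zeros((n_states, n_states))
    for i in range(n_states):
        allowed = list(range(i, min(i + max_jump + 1, n_states)))
        for j in allowed:
            A[i, j] = p_stay if j == i else (1.0 - p_stay) / max(len(allowed) - 1, 1)
        A[i] /= A[i].sum()          # normalize each row into a proper distribution
    return A

# Example: a 5-state left-right chain for one gesture class; a mixture of such
# chains would dedicate one component to each variant of the gesture.
print(left_right_transitions(5))
```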
Item Person detection and tracking using omnidirectional cameras, and rectangle blanket problem (Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2019.) Demiröz, Barış Evrim.; Akarun, Lale.; Salah, Albert Ali.
Person detection and tracking can provide the crucial analysis needed to avoid accidents with autonomous machinery, to optimize environments for efficiency, and to assist the elderly. Omnidirectional cameras have a large field of view that allows them to cover more ground at the expense of resolution. They can decrease setup, maintenance and computational costs by reducing the number of cameras and the bandwidth required. However, computer vision methods developed for conventional cameras usually fail for omnidirectional cameras because of their different image formation geometry. In this thesis, first, a novel dataset for person tracking with omnidirectional cameras is introduced. The dataset, named BOMNI, contains 46 videos of persons moving inside a room, with the bounding boxes and identities of the persons annotated at every frame. Second, a generative Bayesian framework is developed for coupling person tracking and fall detection. The method is evaluated on the BOMNI dataset, producing 93% tracking accuracy and detecting falls within a few frames of the event. Third, a similar method for multiple person tracking is developed and evaluated on the BOMNI dataset; it reaches 86% tracking accuracy, improving on a previous approach by 18%. Fourth, a discriminative method for person detection is presented, together with a novel structure called the Radial Integral Image that speeds up the feature extraction step. This method achieves state-of-the-art detection performance on the IYTE dataset: a 4.5% miss rate at one false positive per image. Finally, the problem of representing a shape with multiple rectangles, the Rectangle Blanket Problem, is formulated as an integer program, and a branch-and-bound scheme with a novel branching rule is presented to solve it optimally. This problem is encountered in the earlier parts of this thesis, but it is a general problem that appears in the literature.
Item Real-time human hand pose estimation and tracking using depth sensors (Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2013.) Kıraç, Mustafa Furkan.; Akarun, Lale.
The human hand has become an important interaction tool in computer systems. Using the articulated hand skeleton for interaction was a challenge until the development of suitable input devices and fast computers. In this thesis, we develop model-based, faster-than-real-time methods for articulated human hand pose estimation using depth sensors. We use Randomized Decision Forest (RDF) based methods for feature extraction and inference from a single depth image. We start by implementing shape recognition using RDFs. We extend the shape recognition by considering a multitude of shapes in a single image, representing different hand regions centered around the different joints of the hand. The regions are used for joint position estimation by running a mean shift mode-finding algorithm (RDF-C). We combine the shape recognition and joint estimation methods in a hybrid structure to boost quality. RDFs used for pixel classification are not resistant to self-occlusion; we overcome this by skipping the classification and directly inferring the joint positions with regression forests. These methods assume the joints are independent, which is not realistic, so we conclude our single-image framework by incorporating the geometric constraints of the model (RDF-R+). Accuracies at a 10 mm acceptance threshold are reported for synthetic and real datasets, and comparing the RDF-C and RDF-R+ methods we report a significant increase in accuracy. We finally extend the single-image methods to the tracking of dynamic gestures. We learn the grasping motion from synthetic data by extracting a manifold, and fix RDF estimations by projecting them onto the manifold. We then track the projections using a Kalman filter.
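The per-pixel features behind the RDF stages above, in the body-pose line of work that the thesis extends to hands, compare the depth at two offsets around a reference pixel, with the offsets scaled by the reference depth so that the probe pattern is roughly invariant to the hand's distance from the camera. The sketch below is our own illustration under that assumption, with hypothetical names and a placeholder background value:

```python
import numpy as np

BACKGROUND = 10_000.0  # large depth value (mm) assigned to off-image / off-hand probes

def depth_feature(depth, x, y, u, v):
    """Depth-comparison feature f = d(p + u/d(p)) - d(p + v/d(p)) for pixel p = (x, y).

    Offsets u, v are (dx, dy) pairs defined at a nominal depth of 1 and scaled by
    1/d(p), so the probe pattern shrinks for far hands and grows for near ones.
    """
    d = depth[y, x]
    def probe(offset):
        ox = int(round(x + offset[0] / d))
        oy = int(round(y + offset[1] / d))
        if 0 <= oy < depth.shape[0] and 0 <= ox < depth.shape[1]:
            return depth[oy, ox]
        return BACKGROUND                      # out-of-image probes read as background
    return probe(u) - probe(v)

# At each tree node, a (u, v, threshold) triple splits pixels by comparing
# depth_feature(...) against the threshold; leaves store per-part class
# histograms (classification) or joint-offset votes (regression variants).
```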
Item Three dimensional face recognition (Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2006.) Gökberk, Berk.; Akarun, Lale.
In this thesis, we attack the problem of identifying humans from their three-dimensional facial characteristics. For this purpose, a complete 3D face recognition system is developed. We divide the whole system into sub-processes, categorized as follows: 1) registration, 2) representation of faces, 3) extraction of discriminative features, and 4) fusion of matchers. For each module, we evaluate the state-of-the-art methods and also propose novel ones. For the registration task, we propose to use a generic face model, which speeds up the correspondence establishment process, and we compare the benefits of rigid and non-rigid registration schemes based on such a model. In terms of face representation, we implement a diverse range of approaches such as point clouds, curvature-based descriptors, and range images, and various feature extraction methods are used to determine the discriminative facial features. We also propose local region-based representation schemes, which can be advantageous both for dimensionality reduction and for determining regions that remain invariant under several facial variations. Finally, with the realization of diverse 3D face experts, we perform an in-depth analysis of decision-level fusion algorithms. In addition to evaluating baseline fusion methods, we propose two novel fusion schemes: the first employs a confidence-aided combination approach, and the second implements a two-level serial integration method. Recognition simulations performed on the 3DRMA and FRGC databases show that: 1) generic face template-based rigid registration of faces is better than the non-rigid variant, 2) principal curvature directions and surface normals have better discriminative power, 3) representing faces with local patch descriptors can both reduce the feature dimensionality and improve the identification rate, and 4) confidence-assisted fusion rules and serial two-stage fusion schemes have the potential to improve accuracy compared to other decision-level fusion rules.
Item Three dimensional face recognition under occlusion variance (Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2013.) Alyüz, Neşe.; Akarun, Lale.
With advances in sensor technology, the three-dimensional (3D) face has become an emerging biometric modality, preferred especially in high-security applications. However, dealing with occlusions covering the facial surface is a great challenge. In this thesis, we propose a fully automatic 3D face recognition system that attacks three sequential problems: (i) registration of occluded surfaces, (ii) detection of occluded regions, and (iii) classification of occlusion-removed faces. For the alignment problem, we propose an adaptively selected model based registration scheme, in which a model is selected for an occluded face such that only the valid, non-occluded patches are used in establishing correspondences. After registration, occlusions are detected; we propose two occlusion detection approaches. The first detector uses fitness to a pixelwise statistical model of the facial surface. The second approach incorporates neighborhood information in addition to the facial model. For occlusion handling, two strategies are evaluated: (i) removal of occlusions, and (ii) restoration of missing parts. In the classification stage, a masking strategy, which we call masked projection, is proposed to enable the use of subspace analysis techniques with incomplete data. Experimental results on two databases with realistic facial occlusions, namely the Bosphorus and the UMB-DB, confirm that: (i) the proposed registration technique based on the adaptively selected model is a good alternative for achieving occlusion robustness; (ii) in occlusion detection, a statistical facial model is beneficial for making pixelwise decisions, which can be improved further by incorporating neighborhood relations to model the coherency of surfaces; (iii) restoration provides only an approximation of the surface and is not suitable for classification purposes; and (iv) masked projection is a viable approach for applying subspace techniques to incomplete data.
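A minimal sketch of the idea behind masked subspace projection as described above (our own simplified least-squares formulation with hypothetical names, not the exact formulation in the thesis): given a PCA basis learned from complete, occlusion-free faces, an occlusion-removed face with missing entries is projected using only its valid pixels:

```python
import numpy as np

def masked_projection(x, mask, mean, basis):
    """Project an incomplete face vector onto a PCA subspace using only valid pixels.

    x:     (D,) face vector; entries where mask is False are missing (occluded).
    mask:  (D,) boolean validity mask.
    mean:  (D,) mean face computed from occlusion-free training data.
    basis: (D, K) matrix of principal components (one component per column).
    Returns the K-dimensional coefficient vector obtained by least squares on the
    valid rows only, instead of the usual full projection basis.T @ (x - mean).
    """
    B = basis[mask]                       # keep only rows belonging to valid pixels
    r = (x - mean)[mask]
    coeffs, *_ = np.linalg.lstsq(B, r, rcond=None)
    return coeffs
```

The resulting coefficient vectors can then be compared with an ordinary subspace classifier, since the occluded pixels never enter the fit.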
Item Transfer learning for sign language recognition (Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2023.) Kındıroğlu, Ahmet Alp.; Akarun, Lale.
Sign languages are visual languages that use the hands, arms, and face to communicate concepts. In the last decade, sign language recognition (SLR) research has made significant progress but still requires massive amounts of data to recognize signs. Despite efforts to create large annotated sign language datasets, applications that can translate for ordinary users in daily settings are yet to be produced. Most SLR research focuses on a few popular sign languages, leaving most sign languages, especially Turkish Sign Language (TID), under-resourced for sign language technology development. This dissertation addresses several open research questions about the development of SLR technology for TID from several perspectives. We generated BosphorusSign22k, an isolated SLR dataset for TID with 22k videos, and benchmarked state-of-the-art techniques on it. We proposed aligned temporal accumulative features (ATAF) to efficiently model sign language movements as dynamic and static subunits; combined with methods using other modalities, ATAF achieves state-of-the-art performance on BosphorusSign22k. We then used regularized regression-based multi-task learning and presented task-aware canonical time warping for isolated SLR; the technique aligns and groups signs to minimize discrepancies across different sources and to emphasize class differences. Finally, we established a benchmark for cross-dataset transfer learning in isolated SLR and evaluated supervised transfer learning algorithms using a temporal graph convolution-based SLR method. Experiments with closed-set and partial-set cross-dataset transfer learning reveal a substantial improvement over combined-training and fine-tuning-based baseline techniques.
Keywords: Convolutional neural networks; Image processing - computer assisted.
Item Vision based sign language recognition: modeling and recognizing isolated signs with manual and non-manual components (Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2008.) Aran, Oya.; Akarun, Lale.
This thesis addresses the problem of vision-based sign language recognition and focuses on three main tasks in order to design improved techniques that increase the performance of sign language recognition systems. We first attack the markerless tracking problem during natural, unrestricted signing in loosely controlled environments. We propose a joint particle filter approach for tracking multiple identical objects, in our case the two hands and the face, which is robust to fast movement, interactions and occlusions. Our experiments show that the proposed approach tracks robustly through these challenging situations and, with its ability to recover quickly, is suitable for tracking long stretches of signing. Second, we attack the problem of recognizing signs that include both manual (hand gesture) and non-manual (head/body gesture) components. We investigate multi-modal fusion techniques to model their different temporal characteristics and propose a two-step sequential belief-based fusion strategy. The evaluation of the proposed approach, in comparison to other state-of-the-art fusion approaches, shows that our method models the two modalities better and achieves higher classification rates. Finally, we propose a strategy for combining generative and discriminative models to increase sign classification accuracy. We apply the Fisher kernel method and propose a multi-class classification strategy for gesture and sign sequences. The experimental results show that the classification power of discriminative models and the modelling power of generative models are effectively combined with a suitable multi-class strategy. We also present two applications, a sign language tutor and an automatic sign dictionary, developed based on the ideas and methods presented in this thesis.
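The Fisher kernel idea mentioned above maps an observation sequence to a fixed-length vector of derivatives of a generative model's log-likelihood with respect to its parameters, which a discriminative classifier then consumes. The sketch below illustrates the mapping for a diagonal-covariance Gaussian mixture, a simpler generative model than the sequence models used in the thesis, with names of our own choosing:

```python
import numpy as np
from scipy.stats import norm

def fisher_score_gmm(X, weights, means, stds):
    """Fisher score of a sample set X (T x D) under a diagonal GMM: the gradient of
    the average log-likelihood with respect to the component means, flattened into a
    fixed-length vector. A discriminative classifier (e.g. an SVM) is trained on these."""
    T, D = X.shape
    K = len(weights)
    # responsibilities gamma[t, k] via a log-sum-exp-normalized softmax
    log_comp = np.stack([
        np.log(weights[k]) + norm.logpdf(X, means[k], stds[k]).sum(axis=1)
        for k in range(K)
    ], axis=1)                                        # shape (T, K)
    log_comp -= log_comp.max(axis=1, keepdims=True)
    gamma = np.exp(log_comp)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # derivative of the log-likelihood w.r.t. each component mean, averaged over frames
    grads = [(gamma[:, k:k + 1] * (X - means[k]) / stds[k] ** 2).mean(axis=0) for k in range(K)]
    return np.concatenate(grads)                      # fixed-length vector of size K * D
```

In a sign recognition setting, the same construction is typically applied with sequence models fitted per class, and the resulting score vectors are fed to a multi-class discriminative classifier.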