Recent Submissions

  • Item
    Transfer learning for sign language recognition
    (Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2023., 2023) Kındıroğlu, Ahmet Alp.; Akarun, Lale.
    Sign languages are visual languages that use hands, arms, and faces to communicate concepts. In the last decade, sign language recognition (SLR) research has made significant progress but still requires massive amounts of data to recognize signs. Despite efforts to create large annotated sign language datasets, applications that can translate for ordinary users in daily settings are yet to be produced. Most SLR research focuses on a few popular sign languages, leaving most sign languages, especially Turkish Sign Language (TID), under-resourced for sign language technology development. This dissertation addresses several open research questions about the development of SLR technology for TID from several perspectives. We generated BosphorusSign22k, an isolated SLR dataset for TID with 22k videos, and benchmarked state-of-the-art techniques on it. We proposed aligned temporal accumulative features (ATAF) to efficiently model sign language movements as dynamic and static subunits. Combined with methods using other modalities, the method achieves state-of-the-art performance on BosphorusSign22k. We then used regularized regression-based multi-task learning and presented task-aware canonical time warping for isolated SLR. The technique aligned and grouped signs to minimize discrepancies across different sources and emphasize class differences. Finally, we established a benchmark for cross-dataset transfer learning in isolated SLR. We evaluated supervised transfer learning algorithms using a temporal graph convolution-based SLR method. Experiments with closed and partial-set cross-dataset transfer learning reveal a substantial improvement over combined training and fine-tuning-based baseline techniques. NOTE Keywords : Convolutional neural networks, Image processing- computer assisted.
  • Item
    Towards trustworthy personal assistants for privacy
    (Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2023., 2023) Aycı, Gönül.; Özgür, Arzucan.; Yolum, Pınar.
    Many software systems, such as online social networks, enable their users to share information about themselves online. However, users worry about the privacy implications of sharing content. Making privacy decisions is a tedious process, which makes managing privacy difficult. Recent approaches to help users manage their privacy involve building personal assistants that can recommend whether a user’s content is private or not. However, privacy’s ambiguous nature and difficulties in explaining assistants’ decision-making are challenges hampering users’ trust in these systems and therefore also widespread user adoption. In this dissertation we design trustworthy privacy assistants that can help tackle both challenges. We first propose a personal assistant called PURE that integrates machine learning to make predictions on whether a user would identify an image as private or not. An important characteristic of PURE is its ability to model uncertainty in its decisions explicitly. When uncertainty is high, no prediction is made and the decision is delegated to the user. By factoring in the user’s own understanding of privacy, PURE is able to personalize its recommendations. A second crucial factor in fostering trust in personal assistants is their ability to explain their decision-making processes. Our second assistant, PEAK, is capable of generating such explanations for its recommendations, using latent topics and predefined explanation categories to do so. A user study shows users find PEAK’s explanations useful and easy to understand. Additionally, privacy assistants can use the explanations to improve their own decision-making, with the incorporation of PEAK into PURE resulting in fewer uncertain images delegated to the user whilst model performance is not compromised. Overall, our work makes an important contribution towards the development of trustworthy personal assistants capable of preserving users’ privacy.
NOTE Keywords : Personal information management, Artificial intelligence, Right of privacy and its protection, Breach of confidentiality, Handling uncertainty, Explainable artificial intelligence.
  • Item
    Source separation via weakly-supervised and unsupervised deep learning
    (Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2023., 2023) Karamatlı, Ertuğ.; Say, Ahmet Celal Cem.; Kırbız, Serap.
    Source separation has been a key research area over the past several decades, and the emergence of deep learning approaches has revolutionized the field. Although supervised methods have been the pillars of this revolution, training such models often requires synthetic mixture datasets that may not represent real-world mixture signals adequately. In this thesis, we focus on single-channel source separation methods that are trained without having access to the underlying isolated source signals. This enables the training of source separation models solely on real-world mixture recordings that do not have corresponding source signals at hand. Therefore, it enables the models to be trained on a large amount of unlabeled or weakly-labeled data without additional labeling effort. We approach this problem in several different ways. First, we start with developing a decomposition-based weakly-supervised model that utilizes the class labels of the sources that are present in mixtures. We apply this weak class supervision approach to superimposed handwritten digit images using both non-negative matrix factorization (NMF) and generative adversarial networks (GANs). Second, we introduce another decomposition-based model that employs variational autoencoders (VAEs) to apply our weak class supervision approach to audio signals. Third, we introduce two purely unsupervised methods, which are trained exclusively on the mixture signals in a self-supervised fashion. The results of our experiments demonstrate that the proposed weakly-supervised and unsupervised methods are viable and mostly on par with the fully supervised baselines. We conclude that it is possible to replace supervised training with weakly-supervised and unsupervised methods in compatible real-world applications for better results.
  • Item
    Utilizing nonnegative tensor factorization methods for inference, model selection, and analysis in supervised learning
    (Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2023., 2023) Barsbey, Melih.; Özgür, Arzucan.; Cemgil, Ali Taylan.
    This thesis focuses on utilizing nonnegative tensor factorization (NTF) methods in various areas of supervised learning. We start with the introduction of a probabilistic NTF framework that can accommodate a wide range of modeling assumptions while maintaining algorithmic efficiency during inference. The flexibility provided by this framework is then utilized for inference, model selection, and analysis in various supervised learning problems. In the first of these scenarios, we use this approach to effectively model time series with nested, complex seasonalities, ensuring accuracy and interpretability. We then propose a novel method for learning to defer to an expert based on the output of a machine learning model in classification problems, and show that NTF can be utilized to extend this method to arbitrarily complex settings. Afterwards, we investigate when and why deep neural networks’ parameters become compressible, and use the aforementioned NTF framework to help analyze how these dynamics are reflected in the representation space. In addition to making independent contributions to various areas of supervised learning, our work shows that, coupled with a convenient modeling approach, NTF can be beneficial for a wide range of supervised learning problems. NOTE Keywords : Machine learning, Nonnegative tensor factorization, Graphical models, Deep learning.
  • Item
    Abstractive text summarization for morphologically rich languages
    (Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2023., 2023) Baykara, Batuhan.; Güngör, Tunga.
    The exponential growth in the number of documents available on the Web has turned finding the relevant piece of information into a challenging, tedious, and time-consuming activity. Accordingly, automatic text summarization has become an important field of study, gaining significant attention from researchers. Recent progress in deep learning shifted the research in text summarization from extractive methods towards more abstractive approaches. The research and the available resources are mostly limited to the English language, which prevents progress in other languages, especially those that differ in terms of structure and characteristics, such as the morphologically rich languages (MRLs). In this thesis, we mainly focus on abstractive text summarization on two MRLs, Turkish and Hungarian, and address their important challenges. Firstly, we tackle the resource scarcity problem by curating two large-scale datasets for Turkish (TR-News) and Hungarian (HU-News) that are aimed at text summarization but are also suitable for other tasks such as topic classification, title generation, and key phrase extraction. Then, we utilize the morphological properties of these languages and adapt them to summarization, where we show improvements upon the existing models. Later, we make use of pretrained multilingual sequence-to-sequence models and provide state-of-the-art models for abstractive text summarization and title generation tasks. Evaluation of text summarization for MRLs is very limited. Thus, we show how preprocessing can drastically influence the evaluation results through a case study in Turkish. Finally, morphosyntactic methods are proposed for text summarization evaluation and a human judgement dataset is curated. It is shown that morphosyntactic tokenization processes during evaluation increase correlation with human judgements. All the work and the curated datasets are made publicly available.
NOTE Keywords : Abstractive text summarization, Morphologically rich languages, Text summarization evaluation.
  • Item
    Multi-objective task scheduling in heterogeneous fog environments
    (Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2023., 2023) Altın, Lokman.; Gürgen, Fikret.; Topçuoğlu, Haluk Rahmi.
    The limitations of conventional cloud computing have surfaced as the internet evolves towards the Future Internet, where billions of devices connected to the global network produce enormous volumes of data. Fog computing, which brings computation and storage towards the edge of the network, has been proposed to overcome the drawbacks of cloud computing. Task scheduling in fog environments poses new challenges compared to scheduling in conventional cloud computing. Although there are a few recent works on task scheduling in fog computing, they are very limited and do not address most of the major challenges in fog computing. Various levels of heterogeneity and dynamism make the task scheduling problem more challenging for fog computing. In this thesis, we present a multi-objective task scheduling model with a total of five objectives, and we propose two multi-objective multi-rank scheduling algorithms for fog computing, the MOMRank and the LAMOMRank algorithms. The performance of the proposed strategies is assessed against well-known multi-objective metaheuristics (the NSGA-II and the SPEA2 algorithms) and a widely used algorithm from the literature (the MOHEFT algorithm) using three common multi-objective metrics. Furthermore, a set of highlighted individual metrics is also measured to address open issues in fog environments. We populated our workloads with the Pegasus workflows, whose dependent tasks produce network traffic, and the DeFog applications, which have real-time requirements. Additionally, we incorporate two task clustering schemes into the algorithms in order to improve data transmissions on interconnection networks. Results of empirical evaluations, given as performance profiles over all instances, validate the significance of our algorithms in terms of multi-objective metrics, diminishing fog cluster network and reducing latency for real-time applications.
  • Item
    Using machine learning to improve automated test generation
    (Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2022., 2022) Köroğlu, Yavuz.; Şen, Alper.
    Underestimating the value of software testing has had catastrophic results in recent history. Automated Test Generation (ATG) is an approach that aims to minimize the manual effort required for testing. This thesis aims to improve the effectiveness and performance of ATG approaches via Machine Learning (ML) based guidance, and focuses specifically on Android Graphical User Interface (GUI) testing using Reinforcement Learning (RL). We propose four solutions: the Q-learning Based Exploration (QBE), Test Case Mutation (TCM), Fully Automated Reinforcement LEArning Driven (FARLEAD), and FARLEAD2 test generators. QBE uses RL to crawl a set of applications and learns an action generation policy while exploring. Then, it uses this learned policy to either detect more unique crashes or cover more activities in new applications. TCM takes the tests QBE generates and replaces the well-behaving actions in those tests with bad-behaving ones to detect even more crashes. FARLEAD uses RL to learn how to verify a functional behavior that is given as a high-level test scenario in the form of a monitorable formal specification. FARLEAD learns by trial-and-error like QBE, but it learns app-specific patterns instead of QBE’s app-generic patterns. To the best of our knowledge, FARLEAD is the first engine fully automating the functional testing of GUI applications. Finally, FARLEAD2 improves FARLEAD with Generalized Experience Replay (GER) and the human-readable Staged Test Scenario (STS) language. Experimental results show that QBE outperforms state-of-the-art test generators in crash detection and coverage. Furthermore, executing QBE first and then switching to TCM detects even more unique crashes. FARLEAD and FARLEAD2 expand the scope of automated testing to verifying functional behavior. Overall, these test generators elevate automated GUI testing closer to replacing manual GUI testing.
  • Item
    Algorithms for learning from online human behavior and human interaction with learning algorithms
    (Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2022., 2022) Çapan, Gökhan.; Özgür, Arzucan.; Cemgil, Ali Taylan.
    In modern digital systems, algorithms that deliver personalized content shape the user experience and affect user satisfaction, hence long-term engagement with the system. What the system presents also influences the parties providing content to the system since visibility to the user is vital for reachability. Such algorithms learn to deliver personalized content using data on previous user behavior, e.g., their choices, clicks, ratings, etc., interpreted as a proxy for user preferences. In the first part of this work, we review prevalent models for learning from user feedback on content, including our contributions to the literature. As such data is ever-growing, we discuss computational aspects of learning algorithms and focus on software libraries for scalable implementations, including our contributions. The second part is on learning from user interactions with algorithmic personalization systems. Albeit helpful, human behavior is subject to cognitive biases, and data sets comprising their item choices are subject to sampling biases, posing problems to learning algorithms that rely on such data. As users interact with the system, the problem worsens—the algorithms use biased data to compose future content. Further, the algorithms self-reinforce their inaccurate beliefs on user preferences. We review some of the biases and investigate a particular one: the user’s tendency to choose from the alternatives presented by the system, putting the least effort into exploring further. To account for it, we develop a Bayesian choice model that explicitly incorporates in the inference of user preferences their limited exposure to a systematically selected subset of items by an algorithm. The model leads to an efficient online learning algorithm of user preferences through interactions.
  • Item
    Stress measurement and regulation in real-life using affective technologies
    (Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2022., 2022) Chalabianloo, Niaz.; Ersoy, Cem.
    Stress has become one of the main contributors to serious mental and physical health issues in today’s world. Existing works in the literature have used psychophysiological measures and proposed numerous mechanisms to detect stress and administer feedback to help users regulate it. Unobtrusive wearables’ popularity is increasingly growing, intertwined with digital health notions, making them efficient, inexpensive, and easily accessible affective self-help technologies. This thesis first aims to investigate and implement stress detection mechanisms in the laboratory and everyday environments using unobtrusive wearable devices. In this regard, we investigate various scenarios, such as how to design and deploy stress measurement models that can efficiently use multi-modal data coming from different types of wearables used in the laboratory and real-life settings. We also study low-cost and practical methods for emotion regulation in stressful conditions of everyday life. In the next step, a mixed-methods study is conducted. For this, signals from multiple wearables and users’ subjective opinions regarding different aspects of wearability were analyzed quantitatively and qualitatively. The next step is an in-depth study in cooperation with HCI researchers, in which we demonstrate the effects of haptic feedback on emotion regulation. As a next step for helping users choose the right device, we evaluate several wearables under completely identical conditions to compare the stress detection quality in wearables with different technologies. Finally, we utilize Explainable AI (XAI) to make our models more understandable for the end users, and in particular for the psychology and clinical experts. The results of our studies indicate that an integrated detection, notification, and intervention cycle is required to ensure a reliable system for regulating stress in daily life.
  • Item
    Deep learning-based dependency parsing for Turkish
    (Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2022., 2022) Bilgin, Şaziye Betül.; Özgür, Arzucan.; Güngör, Tunga.
    Dependency parsing is an important step for many natural language processing (NLP) systems such as question answering and machine translation. Turkish, being a morphologically rich language and having a complex grammar, is challenging for automatic processing. Limited NLP tools and resources for Turkish make the task even more challenging. Data-driven deep learning models show promising performance in dependency parsing. Yet, the amount of data to train a data-driven dependency parser directly affects performance, and deep learning-based systems require extensive data to achieve good performance. In this thesis, we focused on Turkish dependency parsing and proposed two solutions to the challenges this task poses. First, we increased the size and quality of labeled data for Turkish dependency parsing. In this respect, we created the BOUN Treebank by annotating 9,761 sentences. In addition, we re-annotated the IMST and PUD treebanks using the same annotation scheme. As a result, we presented the largest collection of Turkish treebanks with consistent annotation. Second, we developed novel state-of-the-art dependency parsing models for Turkish as well as other low-resource languages. As our first parsing approach, we introduced a hybrid dependency parser where Turkish grammar rules and morphological features of words are integrated into the deep learning model. Despite the limited training data, the hybrid parser achieved higher success than the current methods for Turkish dependency parsing. As our second parsing approach, we proposed a deep dependency parser with semi-supervised enhancement. By conducting experiments on a number of low-resource languages besides Turkish, we achieved state-of-the-art results on all datasets. We have shown that deep learning-based models can be improved not only by additional training data, but also by integrating intelligently extracted information.
  • Item
    Parallel network flow algorithms
    (Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2022., 2022) Kara, Gökçehan.; Özturan, Can.
    Network flows is an active area of research with applications in a wide variety of fields. Several network flow problems reduce to either the maximum flow problem or the minimum cost flow problem. The maximum flow problem involves finding the maximum possible amount of flow between two designated nodes on a network whose arcs have flow capacities. The minimum cost flow problem seeks a flow with the minimum cost on a network with supply and demand nodes. In this thesis, we propose two parallel algorithms, for the maximum flow and the minimum cost flow problems respectively. First, we present a shared memory parallel push-relabel algorithm for the maximum flow problem. Graph coloring is used to avoid collisions between threads for concurrent push and relabel operations. In addition, excess values of target nodes are updated using atomic instructions to prevent race conditions. The experiments show that our algorithm is competitive for wide graphs with low diameters. Second, we contribute a parallel implementation of the network simplex algorithm, which is used to solve the minimum cost flow problem. We propose finding the entering arc in parallel, as it often takes the majority of the execution time. Scanning all arcs can take quite some time, so it is common to consider only a fixed number of arcs, which is referred to as the block search pivoting rule. Arc scans can easily be done in parallel to find the best candidate, as the calculations are independent of each other. We used shared memory parallelism using OpenMP along with vectorization using AVX instructions. We also tried adjusting block sizes to increase the parallel portion of the algorithm. Our experiments show that speedups of up to 4 are possible, though they are typically lower.
  • Item
    Extended models of finite automata
    (Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2019., 2019.) Salehi, Özlem.; Say, Ahmet Celal Cem.
    Many of the numerous automaton models proposed in the literature can be regarded as a finite automaton equipped with an additional storage mechanism. In this thesis, we focus on two such models, namely the finite automata over groups and the homing vector automata. A finite automaton over a group G is a nondeterministic finite automaton equipped with a register that holds an element of the group G. The register is initialized to the identity element of the group and a computation is successful if the register is equal to the identity element at the end of the computation after being multiplied with a group element at every step. We investigate the language recognition power of finite automata over integer and rational matrix groups and reveal new relationships between the language classes corresponding to these models. We examine the effect of various parameters on the language recognition power. We establish a link between the decision problems of matrix semigroups and the corresponding automata. We present some new results about valence pushdown automata and context-free valence grammars. We also propose the new homing vector automaton model, which is a finite automaton equipped with a vector that can be multiplied with a matrix at each step. The vector can be checked for equivalence to the initial vector and the acceptance criterion is ending up in an accept state with the value of the vector being equal to the initial vector. We examine the effect of various restrictions on the model by confining the matrices to a particular set and allowing the equivalence test only at the end of the computation. We define the different variants of the model and compare their language recognition power with that of the classical models.
  • Item
    Person detection and tracking using omnidirectional cameras, and rectangle blanket problem
    (Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2019., 2019.) Demiröz, Barış Evrim.; Akarun, Lale.; Salah, Albert Ali.
    Person detection and tracking can provide the crucial analysis needed to avoid accidents with autonomous machinery, optimize environments for efficiency and assist the elderly. Omnidirectional cameras have a large field of view that allows them to cover more ground at the expense of resolution. Omnidirectional cameras can decrease setup, maintenance and computational costs by reducing the number of cameras and the bandwidth required. Computer vision methods developed for conventional cameras usually fail for omnidirectional cameras due to their different image formation geometry. In this thesis, first, a novel dataset for person tracking in omnidirectional cameras is introduced. The dataset, namely BOMNI, contains 46 videos of persons moving inside a room, where the bounding boxes and the identity of the persons are annotated at every frame. Second, a generative Bayesian framework is developed for coupling person tracking and fall detection. The method is evaluated on the BOMNI dataset, producing 93% tracking accuracy and fall detection within a few frames of the event. Third, a similar method for multiple person tracking is developed and evaluated on the BOMNI dataset. The method reaches 86% tracking accuracy, improving on a previous approach by 18%. Fourth, a discriminative method for person detection is presented. A novel structure called Radial Integral Image, which speeds up the feature extraction step, is also introduced. This method achieves state-of-the-art detection performance on the IYTE dataset: 4.5% miss rate for one false positive per image. Finally, the problem of representing a shape with multiple rectangles, the Rectangle Blanket Problem, is formulated as an integer programming problem and a branch-and-bound scheme is presented along with a novel branching rule to solve it optimally. This problem is encountered in the earlier sections of this thesis, but it is a general problem that is present in the literature.
  • Item
    Utilizing weakly-supervised learning for hashtag segmentation and named entity disambiguation
    (Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2020., 2020.) Çelebi, Arda.; Özgür, Arzucan.
    Today’s high-performing machine learning algorithms learn to predict by the supervision of large amounts of human-labeled data. However, the labeling process is costly in terms of time and effort. In this thesis, we design weakly-supervised approaches, which are based on automatically labeling raw data, for two different Natural Language Processing (NLP) tasks, namely hashtag segmentation and Named Entity Disambiguation (NED). Hashtag segmentation’s aim is to identify the words in the hashtags, so as to process and understand them better. We propose a heuristic to obtain automatically segmented hashtags using a large tweet corpus and use these data to train a maximum entropy classifier. State-of-the-art accuracy is achieved for hashtag segmentation without using any manually labeled training data. The target of NED, which is the second task that we address, is to link the named entity (NE) mentions in text to their corresponding records in the Knowledge Base. We hypothesize that the types of the NE mentions may provide useful clues for their correct disambiguation. The standard approaches for identifying mention types require a type taxonomy and large amounts of mentions annotated with their types. We propose a cluster-based mention typing approach, which does not require a type taxonomy or labeled mentions. This weakly-supervised approach is based on clustering the NEs in Wikipedia by using different levels of contextual information and automatically generating data for training a mention typing model. The mention type predictions lead to significant F-score improvement when incorporated into a supervised NED model. This thesis shows that designing weakly-supervised approaches by considering the underlying characteristics of the addressed problem can be an effective strategy for NLP.
  • Item
    Distance approximations between high and multi-dimensional structures
    (Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2019., 2019.) Semerci, Murat.; Cemgil, Ali Taylan.
    In this thesis, we focus on distance approximation methods between high and multi-dimensional structures and their applications. Two novel methods using distance approximations are proposed and they are applied to anomaly detection in cyber security (Distributed Denial of Service (DDoS) attack and attacker detection) and tensor decomposition in object retrieval (image and video classification on scarce data). At first, we consider an autonomous cyber security system that consists of two components: a monitor for detection of DDoS attacks and a discriminator for detection of users in the system with malicious intents. A novel adaptive real-time change-point detection model that tracks the changes in the Mahalanobis distances between sampled feature vectors in the monitored system accounts for possible DDoS attacks. A clustering model that runs over the similarity scores of behavioral patterns between the users is used for segregating the malicious from the innocent. Secondly, we propose a discriminative tensor decomposition with large margin (LMTD), which is a distance-based model that finds the projection directions where the nearest neighbor classification accuracy is improved over the projected instances. We evaluate the cyber security system in a simulated SIP communication environment. Both the attack and attacker detection components are compared with some competitors in the literature. The tensor decomposition is applied to the image and video retrieval problem, where the data is scarce, and its performance is also compared with other decomposition methods. The experimental results are reported for both applications. It is shown that the proposed methods achieve higher accuracy rates than their competitors.
  • Item
    Bayesian methods for network traffic analysis
    (Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2019., 2019.) Kurt, Barış.; Cemgil, Ali Taylan.
    Statistical information about traffic patterns helps a service provider to characterize its network resource usage and user behavior, infer future traffic demands, detect traffic and usage anomalies, and possibly provide insights to improve the performance of the network. However, the increasingly high volume and speed of data over modern networks make collecting these statistics difficult. Moreover, smarter network attacks require sophisticated detection methods that are able to fuse many network and hardware signals. Fortunately, Bayesian statistical methods are powerful tools that can infer such information under harsh network environments. In this thesis we apply two Bayesian methods to two specific network problems. First, we use the Bayesian multiple change models to detect DDoS attacks in SIP networks by fusing the observations coming from the network traffic and the networking hardware. We show that our method is superior to classic DDoS detection methods and that using hardware signals improves the detection rate. For this work, we developed a probabilistic SIP network simulator and a monitoring system, and published them as open-source software. In our second work, we estimated network statistics from a high speed network where we can only observe a fraction of the network traffic. For this problem we develop a generic novel method called ThinNTF, based on non-negative tensor factorization. This method can work with different network sampling schemes, recovers original network statistics by detecting the periodic network traffic patterns from the sampled network data, and gives better estimates compared to the state of the art.
  • Item
    Urban cellular wireless network planning with 3D geographical grid structures
    (Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2019., 2019.) Özyurt, Murat.; Tuğcu, Tuna.
    The largest share of traffic on the Internet is generated by multimedia content, and the share of wireless mobile access to multimedia content is ever increasing. The delivery process depends on the type of the content, and on whether it requires a strictly time-sensitive live session or can be delivered on demand, irrespective of the time of request. As a general solution in IP networks, delivery takes place as multiple individual transmissions from the source to all receivers, replicating the same content at different times, or at the same time if the content is streaming live. In this thesis, we propose an advanced collaborative multicast routing model for delivering bandwidth-hungry streaming content such as IPTV to multiple users with low-quality wireless connections, with the help of other nearby users with better connection quality, utilizing the wireless mesh network topology. The model reduces the amount of data replication, which causes significant overhead in network transmissions and intermediate computations, and thereby improves the overall throughput of the wireless network. An essential issue in wireless mobile networks is where to position the base stations in the network coverage area. We also propose an alternative adaptive mixed path signal propagation estimation model for planning the base station locations in new deployment areas, based on the signal characteristics of existing networks. In doing so, we utilize digital 3D maps of urban areas and signal measurements to train the adaptive model, and exploit local similarities in cities in order to estimate the signal channel characteristics and calculate the potential base station locations for covering the new area with a specific wireless communication technology.
  • Item
    Efficient personalized learning to rank from implicit feedback for time-sensitive recommendations
    (Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2019., 2019.) Yağcı, Arif Murat.; Gürgen, S. Fikret.; Aytekin, Tevfik.
    This thesis focuses on the problems at the intersection of time-sensitive recommendations, implicit user feedback, and learning to rank. Major challenges for achieving time sensitivity are distinguished, the importance of handling implicit feedback is emphasized, and an overview of learning to rank methods is presented with an emphasis on the models that can learn from implicit feedback for time-sensitive recommendations. Subsequently, novel and improved personalized learning to rank methods are proposed to handle large-scale implicit feedback datasets and streams as well as to address the different challenges for achieving time-sensitive recommendations. These proposals comprise: (i) mining the user feedback stream for collaborative filtering and the SASCF algorithm, (ii) parallel personalized pairwise learning to rank and the PLtR family of algorithms, (iii) improving the efficiency of top-N predictions from matrix factorization models and the MMFNN meta-algorithm, (iv) learning intention in user sessions and the BRF family of algorithms, and finally (v) timely push recommendations in a cold start setting and a hybrid learning to rank approach. Theoretical as well as extensive empirical analyses of the proposed methods on real-life data show significant performance and trade-off improvements with respect to ranking accuracy, adaptivity, diversity, efficiency, and scalability.
  • Item
    Text-based machine learning methodologies for modelling drug-target interactions
    (Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2019., 2019.) Öztürk, Hakime.; Özgür, Arzucan.; Özkırımlı, Elif.
    The identification of novel interactions between proteins and drugs with computational methodologies constitutes a significant area of research. Most often, a drug can be re-purposed to target a novel protein, which enables machine learning algorithms to learn from existing interactions to predict unknown interactions. The main goal of this thesis is to model the interactions between proteins and ligands (drug candidates) using their textual representations via machine/deep learning techniques. With that aim, we introduce a novel ligand representation approach and a novel protein representation approach as well as two prediction systems for identifying the strengths of the interactions between proteins and compounds (i.e., their binding affinities). The common theme of these studies is the use of textual representations of proteins (i.e., amino-acid sequences) and compounds (i.e., SMILES). A major advantage of text-based representations is that they are experimentally easier to obtain compared to the three-dimensional (3D) representations and therefore there are more protein/ligand text-based representations available than 3D representations. Furthermore, processing text-based representations is computationally less expensive compared to processing two-dimensional (2D) and 3D representations. We hypothesize that, much like natural languages, bio-chemical sequences have their own languages and processing these languages might reveal important insights about their characteristics. The application of Natural Language Processing (NLP) based approaches in tasks such as protein family/super-family clustering and protein-ligand binding affinity prediction achieved state-of-the-art performance. These results indicate that the textual forms of proteins and ligands can be used to formulate effective solutions to address different bioinformatics and cheminformatics problems.
  • Item
    Ontology-based entity tagging and normalization in the biomedical domain
    (Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2019., 2019.) Karadeniz Erol, Zeynep İlknur.; Özgür, Arzucan.
    One of the challenges for scientists in the biomedical domain is the huge amount and the rapid growth of information buried in the text of electronic resources. Developing text mining methods to automatically extract biomedical entities from the text of these electronic resources and identifying the relations between the extracted entities is crucial for facilitating research in many areas in the biomedical domain. Two main problems, which have to be solved to accomplish this goal, are the extraction and normalization of entities, and the identification of the relations between them from a given text. In this thesis, we proposed two approaches with two different perspectives for the extraction and normalization of biomedical named entities. The first approach makes use of shallow linguistic knowledge to extract entities and normalize them through an ontology. On the other hand, the second approach makes use of word embeddings, which convey semantic information, for the normalization of the entities in a given text. The word-embedding based approach obtained the state-of-the-art results on the BioNLP Shared Task 2016 Bacteria Biotope data set. Both of the proposed methods are unsupervised and can be adapted to different domains. We also developed two applications, one of which is a pipeline, which is composed of modules based on the approaches that we proposed in this thesis, for the extraction of bacteria biotope information from scientific abstracts. The other application is developed for extracting Brucella-host interaction relevant data from the biomedical literature, whose results reveal the importance of using a wider context than a sentence for biomedical relation extraction.