BOUN Repository :: Browsing by Author "Yurdakul, Arda."

Browsing by Author "Yurdakul, Arda."

Now showing 1 - 17 of 17

A common subexpression elimination-based compression method for the constant matrix multiplication
(Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2022., 2022) Bilgili, Emre.; Yurdakul, Arda.
The execution time, resource and energy costs of deep learning applications become much more important as their popularity grows. Constant matrix multiplication has been studied for a long time and takes place in deep learning applications. Reducing the computation cost of those applications is a highly active research topic. The weights are pruned or quantized while satisfying the desired accuracy requirement. The pruned matrices are compressed into one-dimensional arrays without data loss. Matrix multiplication is performed by processing those arrays without decompression. Processing one-dimensional arrays to perform matrix multiplication is deployed on various hardware platforms that employ Central Processing Unit, Graphics Processor Unit and Field-Programmable Gate Array. The deployments can also be supported with common subexpression elimination methods to reduce the number of multiplications, additions and storage size. However, the state-of-the-art methods do not scale well for large constant matrices, as they take hours to extract common subexpressions in a 200 × 200 matrix. In this thesis, a random search-based common subexpression elimination method is constructed to reduce the run-time of the algorithm. The algorithm produces an adder tree for a 1000 × 1000 matrix in a minute. The Compressed Sparse Row format is extended to build a one-dimensional compression notation for the proposed method. Simulations for a single-core embedded system show that the latency is reduced by 80% for a given 100 × 100 matrix compared to the state-of-the-art methods. The storage size of the sparse matrices is also reduced by more than half in the experiments compared to the Compressed Sparse Row format.
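As an illustration of multiplying a compressed matrix without decompressing it, the following is a minimal sketch of the standard Compressed Sparse Row (CSR) layout and a matrix-vector product over it. It shows only the baseline CSR format that the abstract mentions, not the thesis's extended notation or its common subexpression elimination; the function names `to_csr` and `csr_matvec` and the sample matrix are assumptions made for this example.

```python
def to_csr(dense):
    """Compress a dense row-major matrix into CSR arrays.

    Returns (values, col_idx, row_ptr):
      values  - the nonzero entries, row by row
      col_idx - the column index of each nonzero entry
      row_ptr - row i occupies values[row_ptr[i]:row_ptr[i + 1]]
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x directly on the one-dimensional CSR arrays."""
    y = []
    for i in range(len(row_ptr) - 1):
        acc = 0
        # Only the stored nonzeros of row i are visited.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y.append(acc)
    return y


# Hypothetical 3 x 3 pruned weight matrix and input vector.
dense = [[0, 3, 0],
         [1, 0, 2],
         [0, 0, 0]]
x = [1, 2, 3]

values, col_idx, row_ptr = to_csr(dense)
y = csr_matvec(values, col_idx, row_ptr, x)
print(values, col_idx, row_ptr, y)
```

Only three of the nine entries are stored, and the product touches each stored entry exactly once.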
A systematic approach for register file design in FPGAs
(Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2014., 2014.) Yantır, Hasan Erdem.; Yurdakul, Arda.
For the future of computing, wide usage of heterogeneous and parallel architectures is indispensable since advances in technology scaling cannot satisfy the expected increase in performance of computational platforms anymore. FPGA is a promising platform for such computing systems due to its configurable structure. Each part of an FPGA can be configured to perform a different task that it is best suited for. Multiport and fast register files are essential for this type of data-intensive computational system. Otherwise, available computational power cannot be utilized properly. When the characteristics of processing elements are different, such a system needs a heterogeneous register file (RF) that can serve different parts of the FPGA with different characteristics in terms of running frequency, data consumption/production rate, required number of ports, data widths, address spaces and endianness. In this dissertation, we first propose a new multi-port RF design which exploits the banking and replication of BRAMs with an efficient shift register based multi-pumping (SR-MPu) approach. We also model this register file for the use of HLS tools. Finally, we propose a heterogeneous register file (HRF) architecture for FPGA-based heterogeneous systems. In this RF, word length and address spaces of the processing elements are adjustable. For power and area reduction, the design takes advantage of frequency differences between processing elements by an efficient multi-pumping system. According to the literature, this is the first study on FPGA-based heterogeneous RFs. Experimental results show that both RF architectures outperform conventional RFs.
Afronoc: an adaptive flexible network on chip router
(Thesis (M.S.)-Bogazici University. Institute for Graduate Studies in Science and Engineering, 2009., 2009.) Çoğal, Ömer.; Dündar, Günhan.; Yurdakul, Arda.
As the complexity of on-chip systems grows, scalability and re-configurability become important issues at both system and interconnection levels for SoC systems. Flexible and configurable architectures bring the advantage of reusability of the same hardware in different regular topologies such as torus, mesh, tree and in custom irregular ones. Research in NoC design points to the importance of scalability, configurability and flexibility of the routers and on-chip interconnects. This thesis describes an adaptive and flexible router design for all Network on Chip topologies, which can be changed during runtime. In flexible NoCs, table updates are carried out by a central unit, which increases complexity and area of the overall system. In AFRONOC, a reduced form of the link state routing is introduced for table updates so that the overall system can set up/change the topology by itself. Hence, the proposed adaptation algorithm makes the router stand-alone, meaning that it can adapt to the rest of the network without help of any external or central monitoring. The proposed adaptation process initializes the routing tables in a short time when compared with the reconfiguration based methods. Design-time configurability is achieved in terms of the number of channels, the number of nodes in the network, buffer size of each channel and physical data width. As a result, the router can be considered as a solution in ad-hoc NoCs for fast prototyping, which is necessary for filling the design productivity gap in NoC design. Area occupation of an example implementation with four I/O channels, eight bit data width, four bit address width on a Virtex-II pro xcvp70 device is 750 slices, which is 2 per cent of the total area of the FPGA.
An embedded RISC-V with fast modular multiplication
(Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2020., 2020.) Irmak, Ömer Faruk.; Yurdakul, Arda.
While one of the biggest enabling factors of Internet of Things growth is cheap and capable hardware, maybe the biggest concern is privacy and security. Encryption and authentication need big power budgets, which battery-operated IoT end-nodes do not have. Hardware accelerators designed for specific cryptographic operations provide little to no flexibility for future updates. Custom instruction solutions are smaller in area and provide more flexibility for new methods to be implemented. One drawback of custom instructions is that the processor has to wait for the operation to finish. Eventually, the response time of the device to real-time events gets longer. In this work, we propose a processor with an extended custom instruction for modular multiplication, which blocks the processor, typically, two cycles for any size of modular multiplication. We adopted embedded and compressed extensions of RISC-V for our proof-of-concept CPU. Our design is benchmarked on recent cryptographic algorithms in the field of elliptic-curve cryptography. Our CPU with 128-bit modular multiplication operates at 136MHz on ASIC and 81MHz on FPGA. It achieves up to 13x speed up over software implementations while reducing overall power consumption by up to 95% with 41% average area overhead over our base architecture.
Approximate processor design with RISC-V ISA
(Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2020., 2020.) Taştan, İbrahim.; Başkaya, Faik.; Yurdakul, Arda.
With the rise of the Internet of Things (IoT), low-cost resource-constrained devices have to be more capable than traditional embedded systems, which operate on stringent power budgets. To add new capabilities such as learning, power consumption planning has to be revised. Approximate computing is a promising paradigm for reducing power consumption at the expense of inaccuracy introduced to the computations. In this thesis, we propose a processor with approximate processing functionality for resource-constrained IoT devices. A microprocessor with a dual-datapath mechanism is described in C++ and synthesized with a High-Level Synthesis (HLS) tool. A standard datapath exists for the parts of applications where the calculation should be exact. Additionally, an approximate datapath, which includes approximate computing features that will be more likely to exist in the next generation, low-cost, resource-constrained, and learning IoT devices, is introduced. Coarse-grain control for setting the accuracy of approximate operations is adopted to reduce the number of control signals by grouping the bits so that they can be turned on and off simultaneously. The size of the operands of the approximate operators is dynamically adjusted at the datapath without affecting the performance. Based on these features, we propose new approximate adder and multiplier designs and integrate these blocks with a CPU, which benefits from RISC-V ISA. Targeting machine learning applications such as classification and clustering, we have demonstrated that our processor reinforced with approximate operations can save power up to 23% for ASIC implementation while at least 90% top-1 accuracy is achieved on the trained models and test datasets.
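The idea of coarse-grain accuracy control, where bits are grouped and whole groups are switched off together, can be sketched in software as follows. This is not the adder proposed in the thesis, whose design the abstract does not describe; it is a generic truncation-style approximation, and the function name `approx_add`, the group size of 4 bits and the 32-bit width are all assumptions chosen for illustration.

```python
def approx_add(a, b, groups_off, group_size=4, width=32):
    """Add two unsigned integers while ignoring the lowest bit groups.

    groups_off - number of low-order bit groups that are switched off
                 (one control value per group instead of one per bit)
    group_size - bits per group (hypothetical value)
    width      - operand width in bits (hypothetical value)
    """
    full_mask = (1 << width) - 1
    low_bits = groups_off * group_size
    # Clear the disabled low-order groups in both operands.
    keep_mask = full_mask & ~((1 << low_bits) - 1)
    return ((a & keep_mask) + (b & keep_mask)) & full_mask


exact = approx_add(1000, 2000, groups_off=0)   # all groups enabled
coarse = approx_add(1000, 2000, groups_off=1)  # lowest 4 bits disabled
print(exact, coarse, exact - coarse)
```

With no groups disabled the result is exact; disabling more groups trades accuracy for fewer active bits, which is the trade-off the abstract describes at the hardware level.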
Automatic datapath and controller generation for reconfigurable ASIP
(Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2013., 2013.) Çulha, Ender.; Cerid, Ömer.; Yurdakul, Arda.
The need for complex designs that meet the desired application specific criteria and time-to-market pressure increase the importance of High Level Synthesis (HLS) tools, which take high level behavioral representation of the desired functionality as the input and generate HDL description of hardware at RTL level for FPGA or ASIC targets. FPGAs are getting more popular than ASICs and microprocessors due to their architectural flexibility, on-site upgradability and computing power. In this thesis, an HLS tool for FPGAs is proposed. This tool has the following capabilities: (i) generation of optimized RTL which consists of datapath and its controller. To achieve this, the tool extracts the clock period of the optimized RTL by using the optimization results and the delay models of the arithmetic operators. (ii) generation of Golden RTL where there is no optimization and resource sharing on the datapath. (iii) estimation of delay and area of the generated RTL specifications by using the estimation models. This tool is integrated in RH(+) Design Automation Framework. The generated RTLs are tested in Xilinx Spartan-3 FPGA. The estimated delay and area of both the Golden RTL and Optimized RTL generated by the tool are compared with the results of Xilinx ISE tool set for different input applications. Keywords: Microprocessors, Computer aided automation, Design automation, Hardware, HDL, VHDL, Electronic circuits, FPGA, Computer softwares, Computer arithmetic
Daily life oriented indoor localization by fusion of smartphone sensors and Wi-Fi
(Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2018., 2018.) Nurdağ, Ayşe Vildan.; Yurdakul, Arda.; Arnrich, Bert.
Smartphones are leading among the fastest-growing technologies. With their numerous features, smartphones are the best assistants to users in their lives on several counts. However, a smartphone still requires an extensive configuration to assist every user efficiently and effectively. In this thesis, we are motivated to develop a system that makes a smartphone self-configure automatically depending on its place. This has been well established for outdoor environments with contributions of GPS (Global Positioning System). However, GPS does not provide accurate data in indoor environments. Hence, in this thesis, we aim to determine the exact place of a smartphone in a room by exploiting on-device sensors and Wi-Fi services. The key point of our study is that it entirely works on the smartphone. In accordance with our motivation, sensor data and Wi-Fi RSSI values were collected from fixed places via a Data Collection Application which we developed on an Android smartphone. A fusion fingerprint database was created. Five supervised machine learning algorithms were evaluated on the fingerprint database in terms of classification accuracy and process time. The best performance was obtained from the Decision Tree Classifier with a 98% accuracy rate on 20% of training samples. The predictive power of the features used was studied to specify which sensors are more meaningful for distinguishing indoor places from each other. Depending on model evaluation results, a Data Classification Application was developed on the same Android smartphone to generate a dedicated decision tree for each different room. Tests were carried out in three different rooms to show that more than 80% accuracy was achieved in finding the correct place in each room.
Design and implementation of an on-line CFA demosaicking core
(Thesis (M.S.)-Bogazici University. Institute for Graduate Studies in Science and Engineering, 2009., 2009.) Kabukcu, Gökhan.; Yurdakul, Arda.
The thesis introduces a low-cost algorithm for improving the demosaicking process in the texture areas such as one-pixel patterns. The algorithm first detects difficult texture regions. After the detection process is completed, the algorithm demosaicks the texture areas using special demosaicking operations whereas non-texture regions are restored using some of the existing demosaicking approaches. In this way, the quality of the texture areas in demosaicked images can be improved up to 70% while the computational complexity of the original demosaicking solution is increased only slightly. The new algorithm is implemented as a core in VHDL (Very High Speed Integrated Circuit Hardware Description Language). The operational verification of the VHDL implementation is performed on FPGA (Field Programmable Gate Array). The Virtex-II XC2V500 device is selected in the implementation. The core is capable of processing 1000 x 1000 pixels real-time digital video and 1000 x n pixels digital still images. The system operates at 25 MHz frequency and can process 25 images per second, which is a sufficient speed for video processing.
Development of a high level synthesis tool specialized on fir-based multirate systems
(Thesis (Ph.D.)- Bogazici University. Institute for Graduate Studies in Science and Engineering, 1999., 1999.) Yurdakul, Arda.; Dündar, Günhan.; Tansal, Sabih.
Digital Signal Processing (DSP) is the most studied area in design automation, because it has been one of the most well-established branches of electrical engineering for several years. In the last few years, it has been stimulated by the progression of multirate techniques. The key property of multirate algorithms is their computational efficiency. In this thesis, a silicon compiler is developed to reduce design time for the hardware realization of FIR-based multirate DSP algorithms. This is a brand new study, because, to our knowledge, no silicon compiler of this type exists. Although multirate algorithms contain decimators and interpolators changing the effective sample rate, the design of synchronous systems using a single-clock signal is possible with this newly developed tool. The designer can achieve this by folding nodes of similar type into a single node. Additionally, the FIR filters followed by a decimator or following an interpolator can be entered as a single node while defining a system at the input of the tool. Also, multiplications with the tap coefficients in FIR-based nodes in a fold are handled at the same time to exploit common terms so as to realize those multiplications without multipliers. As a result, the tool produces very efficient layouts in terms of area, power and clock signals. It can also determine the quantization levels of tap coefficients in FIR-based nodes and fractional parts of data bus if the system output error is specified. It also handles module selection under given power, area and delay constraints and scheduling like other well-known silicon compilers. The compiler is programmed to process bit-parallel-digit-serial architectures.
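To make the phrase "exploit common terms so as to realize those multiplications without multipliers" concrete, here is a small sketch of multiplierless constant multiplication using only shifts and additions. The tap coefficients 5 and 13 and the function name `multiply_by_taps` are hypothetical; the abstract does not give the compiler's actual term-sharing algorithm, so this shows only the general principle.

```python
def multiply_by_taps(x):
    """Multiply x by the constants 5 and 13 without a multiplier.

    5  = 4 + 1       -> 5x  = (x << 2) + x
    13 = 8 + 4 + 1   -> 13x = (x << 3) + 5x, reusing the shared term 5x
    """
    t5 = (x << 2) + x     # one shift, one addition
    t13 = (x << 3) + t5   # one more shift and addition; 5x is shared
    return t5, t13


products = multiply_by_taps(7)
print(products)
```

Computed separately, the two products would need three additions; sharing the 5x term brings that down to two, which is the kind of saving that grows with the number of taps handled together.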
Electanon : a blockchain-based, anonymous, robust and scalable ranked-choice voting protocol
(Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2022., 2022) Onur, Ceyhun.; Yurdakul, Arda.
Remote voting has become more critical in recent years, especially after the Covid-19 outbreak. Blockchain technology and its benefits like decentralization, security, and transparency have encouraged remote voting systems to use blockchains. Analysis of existing solutions reveals that anonymity, robustness, and scalability are common problems in blockchain-based election systems. In this thesis, we propose ElectAnon, a blockchain-based, ranked-choice election protocol focusing on anonymity, robustness and scalability. ElectAnon achieves anonymity via zero-knowledge proofs. Robustness is realized by removing the direct control of the authorities in the voting process. Scalability is ensured by treating each ranked-choice ballot as a permutation list, then encoded into a single integer that can be efficiently stored. The proposed protocol includes a candidate proposal system to provide an end-to-end election solution. We also discuss three different extensions in this thesis. The Multiple Elections extension provides a mechanism to use the same set of voters for multiple elections. The Merkle Forest extension minimizes the trust assumption on election authorities in exchange for a decrease in scalability. The Assisted Merkle Tree extension offers just the opposite tradeoff by increasing scalability in favor of requiring external assistance from authorities. ElectAnon is implemented using Ethereum smart contracts and a zero-knowledge gadget, Semaphore. The implementation includes two different sophisticated tallying methods, Borda Count and Tideman. Results show that ElectAnon is capable of running feasibly with up to 100,000 voters and reduces the gas consumption up to 89% compared to previous works.
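One common way to store a ranked ballot as a single integer is to rank the permutation in the factorial number system (a Lehmer code). The sketch below assumes that scheme purely for illustration; the abstract does not say which encoding ElectAnon uses, and the function names `encode_ballot` and `decode_ballot` are hypothetical.

```python
def encode_ballot(ranking):
    """Encode a ranking (a permutation of 0..n-1) as one integer.

    Each choice is replaced by its index among the candidates not yet
    ranked, and the indices are packed as mixed-radix digits.
    """
    n = len(ranking)
    remaining = list(range(n))
    code = 0
    for i, candidate in enumerate(ranking):
        idx = remaining.index(candidate)
        code = code * (n - i) + idx   # digit idx lies in [0, n - i)
        remaining.pop(idx)
    return code


def decode_ballot(code, n):
    """Recover the ranking of n candidates from its integer code."""
    digits = []
    for radix in range(1, n + 1):     # peel digits off, last one first
        digits.append(code % radix)
        code //= radix
    digits.reverse()
    remaining = list(range(n))
    return [remaining.pop(d) for d in digits]


ballot = [2, 0, 1]                    # candidate 2 first, then 0, then 1
code = encode_ballot(ballot)
restored = decode_ballot(code, len(ballot))
print(code, restored)
```

Under this scheme a ballot over n candidates maps to a value between 0 and n! - 1, so the whole ranking fits in one stored integer rather than a list of n entries.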
GAIA: a general application instruction set and architecture explorer
(Thesis (M.S.)-Bogazici University. Institute for Graduate Studies in Science and Engineering, 2008., 2008.) Soykök, Ayşe Gaye.; Yurdakul, Arda.
Embedded systems are dedicated to a task for their lifetime with no or slight modifications. These systems are necessary in a wide range of industrial areas, from the entertainment industry to cryptography and from household appliances to army equipment. The emergence of processors with customizable instruction sets and customizable architectures has enabled embedded processors to be tailored for the application they are dedicated to. Tailoring stands for improving incompetent parts of an application by modifying the processor. Development of design automation tools has been a new research area for embedded processors. They enable customization either by partial automation, which requires human assistance at varying levels, or by full automation. In this thesis, an automation tool, GAIA, that selects custom instructions (CI) and Single Instruction Multiple Data (SIMD) style processing elements (PEs) has been developed. The system achieves customization by examining the intermediate representation (IR) of an application. It is a fuzzy expert system that acts as a voting mechanism evaluating the attributes of the application components. The work of this thesis contributes to the stage between the front-end and back-end compilation, with the aim of assisting back-end compilation in the customization process.
Model-based design of a roadside unit for emergency and disaster management
(Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2020., 2020.) Hilal, Nur.; Yurdakul, Arda.
Every year, a massive number of deaths happen because of traffic accidents. In order to increase traffic victims’ survival rates, it is important to reduce the arrival time of trauma intervention teams at accident sites. Automatic incident detection provides faster incident reporting, which decreases the arrival delay of first responders. In this thesis, we propose a system where traffic incidents can be detected, verified and reported using multiple detection mechanisms and communication technologies to provide faster response and allow first responders to arrive as quickly as possible. Our proposed system contains a Roadside Unit (RSU), which is responsible for incident detection using two automatic incident detection algorithms depending on traffic parameters collected by inductive loop sensors and a communication technology called VANET. The proposed RSU also listens for and receives two incident reporting signals sent by eCall and WreckWatch solutions. This thesis provides the internal architecture of the proposed RSU using the Model-Based Engineering concept, where the RSU is modeled in Architecture Analysis and Design Language (AADL). We provide an AADL model that captures the internal structure of the RSU, its components and their interactions. We provide experiments on the model’s task execution using the gem5 simulator with different configurations. We used gem5 simulation results for scheduling properties of the AADL model in order to present scheduling tests and latency analysis. In addition, we present scheduling simulations using AADL Inspector. Test results show that our RSU model is schedulable with low processor utilization factors, and provides incident detection and reporting in under three minutes.
Real-time embedded system modeling by introducing hardware-in-the-loop concept to SystemC
(Thesis (M.S.)-Bogazici University. Institute for Graduate Studies in Science and Engineering, 2010., 2010.) Fennibay, Doğan.; Yurdakul, Arda.
As the demand for interaction of embedded systems with other systems is constantly increasing, the need to extend the model of the embedded system to include the other systems that are being interacted with is increasing, too. This results in degraded accuracy of the whole model and increased modeling effort. New modeling techniques have to be developed to reduce design effort without decreasing overall system accuracy. On the other hand, complexity and time-to-market constraints demand early simulation, verification, and architectural exploration of systems. Hence, in this dissertation, a new design concept and new methods have been proposed to apply the hardware-in-the-loop technique to the field of hardware/software co-design of industrial embedded systems using SystemC as the modeling environment. First of all, the hybrid channel has been conceptualized to clearly define the communication between real and virtual (modeled) subsystems. For real to virtual communication, novel methods have been developed for incorporating external events to the SystemC simulation. Additionally, a method has also been proposed for generating concurrent outputs from virtual to real subsystems as timely as possible. SystemC kernel has been patched for hard real-time execution and the underlying operating system has been ameliorated to guarantee an upper bound for the overall system latency. Furthermore, a mathematical model has been set up to estimate the execution performance of a given model. The performance of the proposed set of methods has been experimented on some industrial embedded systems. A stable operating frequency of 10 kHz and an I/O performance of sub-millisecond round-trip time over Ethernet have been observed. In an experiment to observe the method's performance in a real-life environment, a non-timed transaction-level model of a BACnet Broadcast Management Device (BBMD) interacting with real devices outperformed a competing real system up to 80 times in maximum response time.
Reconfigurable network-on-chip (NoC) architectures for embedded systems
(Thesis (Ph.D.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2015., 2015.) Bayar, Salih.; Yurdakul, Arda.
Communication architectures such as Point-to-Point (P2P) and shared bus are poorly scalable as the number of cores or the communication volume increases. Network-on-Chip (NoC) has been proposed to reduce power consumption and has been widely adopted by the System-on-Chip (SoC) community. Yet, NoCs occupy more area and consume more power as the size of the network increases. In this thesis, we propose a novel dynamic reconfigurable P2P (DRP2P) communication architecture for reconfigurable embedded systems, which is an alternative to the conventional NoC architectures. In DRP2P, interconnects are reconfigured on-the-fly as new communication requests arrive at the system. In embedded applications running on multi-core systems, the traffic flow is usually known. Hence, DRP2P is very suitable for embedded systems. DRP2P is inspired by both P2P interconnects and NoC architecture. If the traffic flow is known in advance, it works as fast as P2P while the reconfiguration process is done at the time of computation. Thus, the next communication scenario can be established before communication starts. Since the reconfigurable wiring area in DRP2P is proportional to the network size, it is as scalable as NoC. In order to achieve reconfiguration efficiently, we developed three different dedicated self-reconfiguration engines. The latest version of these engines is exploited in the DRP2P architecture. DRP2P gives better results than conventional NoCs if the physical placement of cores on the embedded system is done properly by utilizing mapping and routing algorithms. Hence, fast and heuristic mapping and routing algorithms are also designed in the scope of this thesis. Experimental evaluations have shown that DRP2P outperforms conventional NoCs even in the worst case scenario as the amount of data in on-chip communication increases.
RH (+): the model for high-level embedded system design on run-time reconfigurable hardware
(Thesis (M.S.)-Bogazici University. Institute for Graduate Studies in Science and Engineering, 2007., 2007.) Kurumahmut, Bayram.; Yurdakul, Arda.
The process of embedded system design on reconfigurable architectures needs smart solutions to reduce the cost of the development life-cycle and to use resources efficiently at run-time. However, the current solutions (SystemC/xtUML), which are extended from the traditional languages (C++/UML), are insufficient for that. Inefficiency occurs due to: detailed operator definition requirement, forcing the user to pay attention to low-level design problems at higher levels, complex hardware abstraction procedures, misguiding the user during mapping software to hardware, not permitting the user to define constraints at the level having software intermediate representation, and outputs lacking performance from high levels to lower levels. Therefore, the traditional methods must only be used for what they are designed, in order to benefit from them efficiently. In this thesis, we propose: (1) RH(+); a brand new high level embedded system design model for run-time reconfigurable architectures, solving the aforementioned inefficiency problems, (2) LRH(+); a brand new design language which is not extended from any traditional languages, (3) FRH(+); the framework meeting RH(+) requirements. In our work, we have the tools for developing board support package, defining miscellaneous operators, generating graphs for user interactions, profiling, resource scheduling, finding possible paths with their execution delays, and run-time emulation of reconfigurable hardware.
SIMDify: framework for application specific SIMD-processing with RISC-V scalar instruction set
(Thesis (M.S.) - Bogazici University. Institute for Graduate Studies in Science and Engineering, 2021., 2021.) Şarkışla, Mehmet Alp.; Yurdakul, Arda.
Most of the hardware accelerators communicate with the processor via custom instructions. Since custom instructions are not standardized, each accelerator requires a different compiler and user code, which can be a tedious process for the user. To reduce the user burden, we propose a parallel programming framework called SIMDify, which generates single-instruction-multiple-data (SIMD) processors that can achieve SIMD processing without using custom instructions. SIMDify takes an application machine code compiled for scalar RISC-V ISA and simulates it to determine the SIMD processing regions. Then, SIMDify configures and generates the application-specific SIMD processor that executes scalar RISC-V instructions concurrently on the SIMD datapath. The SIMD processor consists of a single master and multiple slave processing elements (PE). Slaves focus on SIMD level tasks, whereas the master is responsible for the central control. The proposed architecture is the first SIMD capable RISC-V processor designed in HLS and can operate with a faster clock frequency than the existing SISD RISC-V HLS cores. SIMDify relieves the user from using custom instructions with rigid programming models and offers a flexible solution. The processor is designed and tested in Vivado High Level Synthesis 19.2. It operates at 78 MHz on Zynq Zedboard FPGA. Master PE uses 5% and each slave uses 3.5% of FPGA resources. Test results show that execution time can be improved by 8.5x with 9 slaves and 19x with 29 slaves.
SIxD: a configurable and customizable SISD/SIMD microprocessor soft core
(Thesis (M.S.)-Bogazici University. Institute for Graduate Studies in Science and Engineering, 2006., 2006.) Sönmez, Nehir.; Yurdakul, Arda.
The demand for FPGA-based processor cores increases as more embedded systems are built on FPGA platforms. The flexible choice is the "soft" processor IP core, a processor implemented in the reconfigurable logic of the FPGA. Commercial and academic soft processors have been widely deployed, but most are synthesized implementations of legacy instruction sets that fill up large and costly FPGAs. With high performance media processing applications dominating the embedded scene, and many modern microprocessors adopting the SIMD technology, it is a fact that soft cores could also make use of array and vector processing functionality. This thesis presents the SIxD, a configurable CPU soft core designed to combine computer architecture basics to exploit instruction level parallelism with the flexibility and customizability advantages of soft cores realized on reconfigurable fabric. With run-time configuration options such as variable data space, customizable instruction set, and array processing capabilities, the SIxD is a novel soft core that can be configured to fit in as low as a forty thousand system gate FPGA, or offer higher performance array processing on bigger FPGAs.
