01/02/2019 – Ongoing
- Flavio Vella (PI), Faculty of Computer Science at Free University of Bozen-Bolzano, Bolzano, Italy.
- Marco Cianfriglia, Department of Mathematics at Roma Tre University, Rome, Italy.
- Anton Lokhmotov, Dividiti, Cambridge, UK.
- Grigori Fursin, Dividiti and cTuning foundation, Paris, France.
- Cedric Nugteren, TomTom, Amsterdam, Netherlands.
Funded by: Free University of Bozen-Bolzano
Efficient high-performance libraries often expose multiple tunable parameters to provide highly optimized routines. These can range from simple loop unroll factors or vector sizes to algorithmic choices, given that some implementations can be more suitable for certain devices by exploiting hardware characteristics such as local memories and vector units.
Traditionally, such parameters and algorithmic choices are tuned and then hard-coded for a specific architecture and for certain characteristics of the inputs. However, emerging applications such as Deep Learning, Graph Analytics, and Scientific Simulation are often data-driven, so traditional approaches are not effective across the wide range of inputs and architectures used in practice.
This project investigates a new perspective on adaptive and auto-tunable libraries for data-driven applications by applying Machine Learning techniques. Specifically, the goal is to build predictive models that accelerate one of the most ubiquitous routines: General Matrix Multiplication (GEMM). The project will study the problems related to dataset generation, the quality of the models in terms of accuracy and performance, and code generation, that is, turning a trained model into an implementation that can be plugged into a target library.
As a use case, we are going to focus on a multi-platform BLAS library (CLBlast), since it provides two different implementations of GEMM and several tunable parameters. As for the experimental setup, we plan to validate the identified approach on two different Graphics Processing Unit (GPU) architectures: a high-end NVIDIA GPU and an embedded, power-efficient ARM Mali GPU.
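To make the idea of a predictive model for GEMM concrete, the following minimal sketch picks one of two implementations from the matrix dimensions using a 1-nearest-neighbour rule. It is only an illustration under stated assumptions, not the project's method: the labels `direct` and `indirect`, the function names, and the training set are all hypothetical, and the training labels are invented placeholders rather than measurements from any device.

```python
import math

# Hypothetical training set: GEMM shape (M, N, K) -> label of the variant
# assumed to be fastest. The labels are invented placeholders for
# illustration only; they are NOT measurements from any device.
TRAINING_SET = [
    ((64, 64, 64), "direct"),
    ((128, 128, 64), "direct"),
    ((256, 256, 256), "direct"),
    ((1024, 1024, 1024), "indirect"),
    ((2048, 2048, 512), "indirect"),
    ((4096, 4096, 4096), "indirect"),
]


def features(shape):
    """Map a GEMM shape to log2-scaled features so sizes compare by ratio."""
    return tuple(math.log2(d) for d in shape)


def predict_variant(shape, training_set=TRAINING_SET):
    """Return the label of the closest training shape (1-nearest neighbour)."""
    query = features(shape)

    def distance(example):
        # Euclidean distance between the query and a training shape.
        return math.dist(query, features(example[0]))

    _, label = min(training_set, key=distance)
    return label


if __name__ == "__main__":
    for shape in [(100, 100, 100), (3000, 3000, 3000)]:
        print(shape, "->", predict_variant(shape))
```

In a real setting the training set would come from timing each variant on the target GPU, and a model this small could be emitted as a few nested conditionals inside the library's dispatch code.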
Keywords: Embedded systems, Graphics processors, Machine learning, Parallel algorithms
Smart and Parallel Graph Analytics System
01/08/2019 – 31/07/2021
- Flavio Vella, Free University of Bozen (PI)
- Ognjen Savković, Free University of Bozen (co-PI)
- Aydın Buluç, Lawrence Berkeley National Lab
- Raffaello Potestio, University of Trento
- Filippo Spiga, ARM Research
- Bruno Carpentieri, Free University of Bozen
- Guohui Xiao, Free University of Bozen
- Julien Corman, Free University of Bozen
Funded by: Free University of Bozen-Bolzano
Graphs are a flexible tool to model interactions between discrete entities in a variety of networks from different areas, including Social Network Analysis, Internet of Things, Artificial Intelligence, and Biology. Network analysis nowadays requires potentially complex computations (traversal, connectivity, centrality, or community detection) over graphs which may contain billions of edges and vertices. In theory, such procedures can be executed reasonably efficiently on massively parallel computing systems, thanks to ad-hoc and highly tuned solutions. But such solutions lack both flexibility and a higher-level interface, which is why they are rarely used by end-users, in particular by data scientists.
As a result, a number of graph analytics frameworks (GraphX, CombBLAS, Giraph, and Galois, among others) have emerged. They provide more flexible solutions, namely implementations of primitive operations that can be combined to design complex graph analytics algorithms. However, for intensive needs, designing efficient procedures in such frameworks still requires highly technical skills in graph algorithmics, beyond the background of most data scientists. On the other hand, graph databases (e.g., Neo4j or TigerGraph) provide user-friendly libraries of graph analytics algorithms, but these offer very limited optimization opportunities. Hence the need for a framework that: (i) provides a user-friendly analytics query language, improving productivity for non-experts, (ii) is flexible enough to allow (manual and automatic) optimization of sequences of operations executed over the same graph, and (iii) can be efficiently implemented on modern parallel architectures.
The goal of this project is to fill this gap. More specifically, we plan to:
(i) design a user-friendly declarative language that can express complex graph analytics procedures, and is amenable to automatic algebraic optimization (in the spirit of SQL query optimization);
(ii) select (and extend) a graph programming model (e.g., GraphBLAS) to which operators of our declarative language can be mapped;
(iii) provide efficient implementations for building blocks of this graph programming model, using emerging energy-efficient parallel computing systems (namely recent ARM architectures);
(iv) validate our framework through a use case in biophysics, enabling data scientists to analyze properties of large biomolecules represented as graphs more efficiently.
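As an illustration of the kind of building block that point (ii) refers to, the sketch below expresses breadth-first search in the linear-algebra style that GraphBLAS popularized: each step multiplies the current frontier by the adjacency matrix over a Boolean semiring and masks out vertices that were already visited. It is a plain-Python sketch under stated assumptions, with a hypothetical function name and a dictionary-based sparse matrix; it does not use the GraphBLAS API or any library from the project.

```python
def bfs_levels(adjacency, source):
    """Level-synchronous BFS written as repeated frontier-times-matrix steps.

    adjacency: dict mapping each vertex to the set of its out-neighbours,
               i.e. a sparse Boolean matrix stored row by row.
    Returns a dict mapping every reachable vertex to its BFS level.
    """
    levels = {source: 0}
    frontier = {source}
    depth = 0
    while frontier:
        depth += 1
        # Boolean semiring product: OR together the matrix rows selected
        # by the current frontier vector.
        reached = set()
        for u in frontier:
            reached |= adjacency.get(u, set())
        # Mask: keep only vertices that have not been visited yet.
        frontier = {v for v in reached if v not in levels}
        for v in frontier:
            levels[v] = depth
    return levels


if __name__ == "__main__":
    # Small example graph; vertex 4 cannot be reached from vertex 0.
    graph = {0: {1, 2}, 1: {3}, 2: {3}, 3: set(), 4: {0}}
    print(bfs_levels(graph, 0))
```

Writing the traversal as a masked matrix-vector product is what makes it a candidate for both algebraic rewriting at the language level and a tuned parallel kernel at the implementation level.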
Keywords: Graph Analytics, Knowledge Graphs, GraphBLAS, Parallel algorithms, ARM