
ARIDHI Sabeur


Distributed Frequent Subgraph Mining in the Cloud

Friday, 29 November 2013 - 9:00 a.m. - Amphi Bruno Garcia - ISIMA

Recently, graph mining approaches have become very popular, especially in domains such as bioinformatics, chemoinformatics and social networks. One of the most challenging tasks in this setting is frequent subgraph discovery. Interest in this task is driven largely by the rapidly growing size of existing graph databases. As a result, there is an urgent need for efficient and scalable approaches to frequent subgraph discovery, especially given the wide availability of cloud computing environments.
This thesis deals with distributed frequent subgraph mining in the cloud. First, we provide the required material to understand the basic notions of our two research fields, namely graph mining and cloud computing. Then, we present the contributions of this thesis.
As a first contribution, we propose a novel approach for large-scale subgraph mining using the MapReduce framework. The proposed approach provides a data partitioning technique that considers data characteristics: it uses the densities of the graphs to partition the input data. Such a partitioning technique balances the computational load across the distributed collection of machines and replaces the default arbitrary partitioning technique of MapReduce. We experimentally show that our approach significantly decreases the execution time and scales the subgraph discovery process to large graph databases.
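The idea of partitioning a graph database by density can be illustrated with a minimal sketch. It is not the algorithm of the thesis, whose details are not given in this abstract. It assumes simple undirected graphs, the usual density definition 2|E| / (|V|(|V| - 1)), and a hypothetical strategy that sorts graphs by density and deals them round-robin so that each partition receives a similar mix of sparse and dense graphs. The names `Graph`, `density` and `partition_by_density` are illustrative only.

```python
from typing import List, Tuple

# Assumed representation: (number of vertices, undirected edge list).
Graph = Tuple[int, List[Tuple[int, int]]]


def density(graph: Graph) -> float:
    """Density of a simple undirected graph: 2|E| / (|V| (|V| - 1))."""
    n, edges = graph
    if n < 2:
        return 0.0
    return 2.0 * len(edges) / (n * (n - 1))


def partition_by_density(graphs: List[Graph], k: int) -> List[List[Graph]]:
    """Sort graphs by density, then deal them round-robin into k partitions.

    Hypothetical strategy: spreading sparse and dense graphs evenly is one
    simple way to give each worker a comparable mining workload.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    ordered = sorted(graphs, key=density)
    partitions: List[List[Graph]] = [[] for _ in range(k)]
    for i, g in enumerate(ordered):
        partitions[i % k].append(g)
    return partitions


# Toy database of four small graphs (made-up data for illustration).
db: List[Graph] = [
    (3, [(0, 1), (1, 2), (0, 2)]),   # triangle
    (4, [(0, 1)]),                   # single edge on four vertices
    (4, [(0, 1), (1, 2), (2, 3)]),   # path on four vertices
    (3, [(0, 1)]),                   # single edge on three vertices
]
parts = partition_by_density(db, 2)
```

In a MapReduce setting, each resulting partition would be handed to one mapper in place of the default arbitrary split; how the actual approach forms its partitions from the densities is described in the thesis itself.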
As a second contribution, we address the multi-criteria optimization problem of tuning the thresholds related to distributed frequent subgraph mining in cloud computing environments while optimizing the global monetary cost of storing and querying data in the cloud. We define cost models for managing and mining data with a large-scale subgraph mining framework over a cloud architecture. We present an experimental validation of the proposed cost models in the case of distributed subgraph mining in the cloud.

Jury:
Anne LAURENT - Professor, LIRMM - Université de Montpellier 2, Reviewer
Takeaki UNO - Professor, National Institute of Informatics, Japan, Reviewer
Jérome DARMONT - Professor, ERIC - Université de Lyon 2, Examiner
Mohamed Mohsen GAMMOUDI - Professor, Université de Manouba, Tunisia, Examiner
Laurent D’ORAZIO - Associate Professor, LIMOS - Université Blaise Pascal - Clermont-Ferrand, Co-supervisor
Mondher MADDOURI - Professor, LIPAH - Université de Manouba - Tunisia, Thesis co-director
Engelbert MEPHU NGUIFO - Professor, LIMOS - Université Blaise Pascal - Clermont-Ferrand, Thesis director