📰 Publications#

International journals#

Demko et al. [DBV+24]

In this article, we present a new approach to tackle lattice generation for complex and heterogeneous data using the concept of convexity. We had already carried out this work, albeit intuitively, when we proposed the NᴇxᴛPʀɪᴏʀɪᴛʏCᴏɴᴄᴇᴘᴛ algorithm for generating a meet-semilattice of concepts based on suitable descriptions and strategies. Here, we revisit the essential properties of our description spaces using a stronger formalism based on the properties of closure operators.

Demko et al. [DBF+20]

In this article, we present a new data type agnostic algorithm calculating a concept lattice from heterogeneous and complex data. Our NᴇxᴛPʀɪᴏʀɪᴛʏCᴏɴᴄᴇᴘᴛ algorithm is first introduced and proved in the binary case as an extension of Bordat’s algorithm with the notion of strategies to select only some predecessors of each concept, avoiding the generation of unreasonably large lattices. The algorithm is then extended to any type of data in a generic way. It is inspired by the pattern structure theory, where data are locally described by predicates independent of their types, allowing the management of heterogeneous data.

Book chapters#

Bertet et al. [BDB+22a]

In this article, we recall the NᴇxᴛPʀɪᴏʀɪᴛʏCᴏɴᴄᴇᴘᴛ algorithm we developed to study concept lattices using first-order monadic predicates. This new approach unifies and simplifies the pattern structure theory by proposing to immerse context objects in a dedicated predicate space having the properties of an inference system. This way of managing objects and attributes (monadic predicates) connects with the concepts developed in the theory of generalized convex structures, in particular that of half-spaces. We show how this paradigm can be used for Boolean, categorized, numerical, character string and sequential data on well-known examples from the literature in order to generate lattices whose size is controlled by the user's choices.

International conferences#

Amoury et al. [ABM25]

Extracting common behaviours from a group of players in a serious game is interesting for game designers to understand how said players perform during the game, and also indicates some points for game improvement. Process mining is an interesting field for generating models based on users' traces, models that can be used to improve the studied process. However, when there is a large number of logs, the generated models are incomprehensible, which means that the logs need to be separated into different groups in order to obtain understandable models. In this article, we propose a method based on Formal Concept Analysis, the NᴇxᴛPʀɪᴏʀɪᴛʏCᴏɴᴄᴇᴘᴛ algorithm and the Galactic framework to generate concepts that represent clusters of the data based on the prefix of the trace. Traces are formalized into sequences of actions, then common prefix subsequences are computed to group players. Experimentation shows that our method successfully extracted common behaviours shared between players, as well as strictly identical behaviours. The knowledge extracted from the traces can then be used to improve the game.
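The prefix-based grouping described above can be illustrated with a minimal sketch (not the paper's implementation; function and variable names are hypothetical): traces sharing the same first few actions fall into the same cluster.

```python
def common_prefix(a, b):
    """Longest common prefix of two action sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return tuple(a[:n])

def group_by_prefix(traces, depth):
    """Cluster traces that share the same first `depth` actions."""
    clusters = {}
    for trace in traces:
        key = tuple(trace[:depth])
        clusters.setdefault(key, []).append(trace)
    return clusters

# Toy player traces (hypothetical data).
traces = [
    ["start", "move", "jump"],
    ["start", "move", "shoot"],
    ["start", "wait"],
]
clusters = group_by_prefix(traces, 2)
# Two clusters: one keyed ("start", "move"), one keyed ("start", "wait").
```

A real analysis would feed such clusters into the concept-generation step rather than stop at the grouping.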

Wannous et al. [WBV24]

We define the term semantic trajectories of moving objects as a sequence of spatio-temporal points with associated semantic information. Spatio-temporal points are directly generated by sensors that capture the position of a moving object in time.

In this paper, we work on a marine mammal trajectory case study. We consider the seals’ trajectories to understand their behavior within groups and identify their activities simultaneously in the same place. We define the activities of mobile objects in the form of rules given by the domain expert. To gain knowledge of trajectories, we use the GALACTIC platform, which is a new platform based on Formal Concept Analysis (FCA) for computing a concept lattice from heterogeneous and complex data. Data in GALACTIC are described by predicates according to their types. Here, we will use interval sequence plugins to analyse the trajectories of seals, where interval sequences are represented by a set of time intervals in which a seal performs an activity in a geographical zone. Finally, the results show the simultaneous activities of a group of seals.
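As a rough illustration of detecting simultaneous activities from interval sequences (a sketch under assumed data shapes, not the paper's method), pairs of seals sharing an activity and zone during overlapping time intervals can be found like this:

```python
def overlaps(iv1, iv2):
    """True if two closed time intervals [s, e] intersect."""
    (s1, e1), (s2, e2) = iv1, iv2
    return s1 <= e2 and s2 <= e1

def simultaneous(seals):
    """Find pairs of seals performing the same activity in the same
    geographical zone at overlapping times.  `seals` maps a seal id to
    a list of (interval, activity, zone) triples (hypothetical format)."""
    hits = []
    ids = sorted(seals)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            for iv1, act1, z1 in seals[a]:
                for iv2, act2, z2 in seals[b]:
                    if act1 == act2 and z1 == z2 and overlaps(iv1, iv2):
                        hits.append((a, b, act1, z1))
    return hits

# Toy data: s1 and s2 both forage in zoneA during the window [3, 5].
seals = {
    "s1": [((0, 5), "foraging", "zoneA"), ((6, 9), "resting", "zoneB")],
    "s2": [((3, 7), "foraging", "zoneA")],
}
hits = simultaneous(seals)
```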

Boukhetta and Trabelsi [BT23]

Modeling user interaction in information systems (IS) using Process Mining techniques is an intriguing requirement for designers looking to optimize the use of various IS functionalities and make stored resources more accessible. Discovered models can thus be used in future work to present a set of recommendations to IS users. However, the large number of generated logs, or user traces, results in complex models. To address this problem, in this paper we propose a new methodology for grouping user traces prior to modeling using Formal Concept Analysis. The clustering method relies on the GALACTIC framework to generate relevant concepts, which are then used to select a specific concept for each trace using a distance measure. Considering a trace as a sequence, the proposed method generates concepts based on maximal common subsequences. The experimental part shows that our method successfully found the original clusters on a simulated dataset.

Richard et al. [RSB+22]

The rising number of different kinds of data that can be used to describe a human trajectory (such as GPS coordinates, GSM, RFID, RSSI, etc.) puts the semantically rich trajectory in the spotlight. A semantic trajectory annotates semantic knowledge directly into raw data based on features of the studied area, such as points of interest or weather conditions. One of the challenges of mobility studies nowadays is to find the right data model to shape all those data coming from different sources into a framework flexible enough to multiply the contextual data that can be used, where contextual data are knowledge coming from external data sources (public city datasets, web pages, national weather services, etc.). Such data models are the key component of mobility studies, but oftentimes lose the computational aspect of trajectories. In this paper, we use the semantically rich trajectory as a way to analyse behavioural data enriched by contextual knowledge, as this issue has rarely been addressed in the state of the art. We study the use of formal concept analysis and pattern mining as a way to compute complex sequential patterns in a dataset of semantic trajectories using the NᴇxᴛPʀɪᴏʀɪᴛʏCᴏɴᴄᴇᴘᴛ algorithm. This kind of formal concept analysis allows an interactive analysis between individuals' paths and contextual data, resulting in a hierarchy of spatio-temporal clusters where each cluster contains a specific pattern depicting the trajectories within.

Boukhetta et al. [BDB+21]

In this paper, we are interested in temporal sequential data analysis using GALACTIC, a new framework based on Formal Concept Analysis (FCA) for calculating a concept lattice from heterogeneous and complex data. Inspired by pattern structure theory, GALACTIC mines data described by predicates and is composed of a system of plugins for an easy integration of new data characteristics and their descriptions. Here we use the GALACTIC library to analyse temporal sequential data, where each item \(x_i\) in the sequence has an associated timestamp \(t_i\), i.e. \(s=\left\langle(t_i,x_i)\right\rangle_{i\leq n}\). We introduce descriptions and strategies dedicated to temporal sequences, and new unsupervised measures. We show on some datasets that the distance constraint subsequence strategy generates good concept lattices.
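The idea of a distance-constrained subsequence can be sketched as follows (a simplified illustration under assumed conventions, not the paper's algorithm): a pattern occurs in a timestamped sequence only if consecutive matched items are close enough in time.

```python
def occurs_with_gap(pattern, seq, max_gap):
    """Check whether `pattern` occurs as a subsequence of the
    timestamped sequence `seq` = [(t_i, x_i), ...] (timestamps sorted
    ascending) with consecutive matched items at most `max_gap` time
    units apart."""
    def search(k, i, last_t):
        if k == len(pattern):
            return True  # whole pattern matched
        for j in range(i, len(seq)):
            t, x = seq[j]
            if last_t is not None and t - last_t > max_gap:
                break  # gaps only grow past this point
            if x == pattern[k] and search(k + 1, j + 1, t):
                return True
        return False
    return search(0, 0, None)

# Toy timestamped sequence: "a" and "b" are 2 apart, "b" and "c" are 8 apart.
seq = [(0, "a"), (2, "b"), (10, "c")]
```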

Boukhetta et al. [BDRB20]

In this paper, we are interested in sequence mining using GALACTIC, a new library based on Formal Concept Analysis (FCA) for generating a concept lattice from heterogeneous and complex data. Inspired by pattern structure theory, data in GALACTIC are described by predicates according to their types, and a system of plugins allows an easy integration of new characteristics, descriptions and strategies. We present new plugins to describe and analyze a sequential dataset with predicates: the KCS description (where a set of sequences is described by the set of K-Common Subsequences), the MCS description (where Maximal Common Subsequences are used to describe a sequential dataset) and the PCS description (where the Prefix Common Subsequence is used). Experimentation on both real and synthetic datasets shows the effectiveness of our plugins in terms of the number of generated predicates and concepts.
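The common-subsequence descriptions above can be illustrated, in a much simplified form, by the classic longest-common-subsequence dynamic program for two sequences (the MCS description actually works with all maximal common subsequences of a whole set; this is only a sketch of the underlying idea):

```python
def lcs(a, b):
    """One longest common subsequence of two sequences (classic DP)."""
    m, n = len(a), len(b)
    # L[i][j] = length of an LCS of a[:i] and b[:j]
    L = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            L[i + 1][j + 1] = (L[i][j] + 1 if a[i] == b[j]
                               else max(L[i][j + 1], L[i + 1][j]))
    # Backtrack to recover one optimal subsequence.
    out, i, j = [], m, n
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i -= 1; j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return out[::-1]

common = lcs("ABCBDAB", "BDCABA")  # a common subsequence of length 4
```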

National conferences#

Savarit et al. [SDB24]

In this article, we present a generic processing chain for the interactive analysis of time series. By defining temporal properties, we set up a representation based on temporal logic, called a chronogram, which can be translated into a temporal interval sequence for an explainable and heterogeneous analysis by Formal Concept Analysis using the GALACTIC tool.

Savarit et al. [SBD23]

In this paper, we are interested in the possibility of transcribing time series into temporal sequences while preserving characteristics of the original series, so as to reduce the data volume and benefit from the multi-sequential and heterogeneous analysis provided by existing sequence analysis tools. As an application example, we use data from the tide level sensors of the Port

Boukhetta et al. [BDB+22b]

In this paper, we are interested in sequence mining using GALACTIC, a new library based on Formal Concept Analysis (FCA) for generating a concept lattice from heterogeneous and complex data. Inspired by pattern structure theory, data in GALACTIC are described by predicates according to their types, and a system of plugins allows an easy integration of new characteristics, descriptions and strategies. We present new plugins to describe and analyze a sequential dataset with predicates: the KCS description (where a set of sequences is described by the set of K-Common Subsequences), the MCS description (where Maximal Common Subsequences are used to describe a sequential dataset) and the PCS description (where the Prefix Common Subsequence is used). Experimentation on both real and synthetic datasets shows the effectiveness of our plugins in terms of the number of generated predicates and concepts.

Workshops#

Demko et al. [DBR+22]

In a recent paper, we presented a new pattern discovery algorithm, NᴇxᴛPʀɪᴏʀɪᴛʏCᴏɴᴄᴇᴘᴛ, designed to take into account complex and heterogeneous data using Formal Concept Analysis. We implemented this algorithm and developed a Python 3 library whose name, GALACTIC, stands for GAlois LAttices, Concept Theory, Implicational systems and Closures. It is open to the community under a BSD-3 license, and its architecture allows the writing of plugins to take into account new datatypes. In this article, we present the architecture of our software solution, explain how to add new plugins to the core of our system by giving the UML diagram of each kind of plugin, and give some examples of plugins developed within our team.

Bertet and Demko [BD22]

Formal Concept Analysis is a mathematical formalism offering many methods that can be used in various fields of computer science. FCA highlights a lattice structure, the concept lattice, defined for binary or categorical data, where a concept is a pair composed of a maximal subset of objects together with their shared data. The whole set of concepts is naturally organized in a hierarchical graph, called the concept lattice. The fundamental theorem of FCA establishes a bijection between lattices and reduced binary datasets via concept lattice generation, based on a pair of operators called a Galois connection. A nice result establishes that the composition of the two operators of a Galois connection is a closure operator; the notion of closure operator is central in lattice theory. From the original binary formalism of FCA, different extensions to non-binary data have been proposed, by establishing that the Galois connection between a data space and a description space is maintained as long as the description space verifies the lattice property. We recently introduced a new algorithm, NᴇxᴛPʀɪᴏʀɪᴛʏCᴏɴᴄᴇᴘᴛ, capable of generating concepts from complex and heterogeneous data with a generic description providing predicates that describe a subgroup of objects. We have observed that the generic use of predicates describing subgroups corresponds to generalized convex hulls. In this talk, we present some links between our algorithm and the theory of convex structures.
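The Galois connection and its closure operator, central to the abstract above, can be sketched on a toy binary context (an illustrative example, not the GALACTIC API):

```python
def extent(context, attrs):
    """Objects possessing every attribute in `attrs`."""
    return {o for o, a in context.items() if attrs <= a}

def intent(context, objs):
    """Attributes shared by all objects in `objs`."""
    if not objs:
        return set().union(*context.values())
    return set.intersection(*(context[o] for o in objs))

# Toy binary context: objects mapped to their attribute sets.
context = {
    "duck":  {"flies", "swims"},
    "swan":  {"flies", "swims", "white"},
    "eagle": {"flies"},
}

# The composition intent∘extent is a closure operator on attribute sets:
closed = intent(context, extent(context, {"swims"}))
# {"swims"} closes to {"flies", "swims"}: every swimmer here also flies.
```

The pair `(extent(context, closed), closed)` is then a formal concept of this context.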

Bertet et al. [BAD+21]

In this tutorial, we pay tribute to the memory of Vincent Duquenne, who left us so prematurely. Vincent is one of the two men behind the canonical basis.

Boukhetta et al. [BRDB20]

In this paper, we are interested in sequential data analysis using GALACTIC, a new library based on Formal Concept Analysis (FCA) for calculating a concept lattice from heterogeneous and complex data. Inspired by pattern structure theory, data in GALACTIC are described by predicates according to their types, and a system of plugins allows an easy integration of new characteristics and new descriptions. We present new ways to analyse interval-based sequences, where items persist in time. Here we address the question of mining relevant sequential patterns, describing a set of sequences, by maximal common subsequences or shortest supersequences. Experimentation on two real sequential datasets shows the effectiveness of our plugins in terms of lattice size and running time.
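The shortest-supersequence idea mentioned above can be illustrated by the classic two-sequence dynamic program (the paper works with sets of interval-based sequences; this is only a simplified sketch of the plain-sequence case):

```python
def scs(a, b):
    """A shortest common supersequence of two sequences (classic DP)."""
    m, n = len(a), len(b)
    # D[i][j] = length of an SCS of a[:i] and b[:j]
    D = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        for j in range(n + 1):
            if i == 0:
                D[i][j] = j
            elif j == 0:
                D[i][j] = i
            elif a[i - 1] == b[j - 1]:
                D[i][j] = D[i - 1][j - 1] + 1
            else:
                D[i][j] = 1 + min(D[i - 1][j], D[i][j - 1])
    # Backtrack to build one optimal supersequence.
    out, i, j = [], m, n
    while i and j:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1]); i -= 1; j -= 1
        elif D[i - 1][j] <= D[i][j - 1]:
            out.append(a[i - 1]); i -= 1
        else:
            out.append(b[j - 1]); j -= 1
    out.extend(reversed(a[:i]))
    out.extend(reversed(b[:j]))
    return out[::-1]

super_seq = scs("AB", "AC")  # a supersequence of both, of length 3
```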

Oral communications#

Bertet and Demko [BD23]

Presentation of the GALACTIC ecosystem to a working group.

PhD theses#

Richard [Ric23]

This thesis focuses on the behavioral study of tourist activity using a generic and interactive analysis approach. The developed analytical process concerns the tourist trajectory in the city and museums as the study field. Experiments were conducted to collect movement data in the tourist city using GPS signals, thus enabling the acquisition of a movement trajectory. However, the study primarily focuses on reconstructing a visitor's trajectory in museums using indoor positioning equipment, i.e., in a constrained environment. Then, a generic multi-aspect semantic enrichment model is developed to supplement an individual's trajectory using multiple context data such as the names of neighborhoods the individual passed through in the city, museum rooms, weather outside, and indoor mobile application data. The enriched trajectories, called semantic trajectories, are then analyzed using formal concept analysis and the GALACTIC platform, which enables the analysis of complex and heterogeneous data structures as a hierarchy of subgroups of individuals sharing common behaviors. Finally, attention is paid to the RᴇᴅᴜᴄᴇᴅCᴏɴᴛᴇxᴛCᴏᴍᴘʟᴇᴛɪᴏɴ algorithm, which enables interactive navigation in a lattice of concepts, allowing the data analyst to focus on the aspects of the data they wish to explore.

Boukhetta [Bou22]

A sequence is an ordered list of elements, such as a travel trajectory or a series of product purchases in a supermarket. Sequence mining is a domain of data mining that aims at extracting frequent sequential patterns from a set of sequences, where these patterns are most often common subsequences. Support is a monotonic measure that defines the proportion of data sharing a sequential pattern. Several algorithms have been proposed for frequent sequential pattern extraction. With the evolution of computing capabilities, the task of frequent sequential pattern extraction has become faster. The difficulty then lies in the large number of extracted sequential patterns, which makes them difficult to read and therefore to interpret; we speak of a deluge of patterns. Formal Concept Analysis (FCA) is a field of data analysis for identifying relationships in a set of binary data. Pattern structures extend FCA to handle complex data such as sequences. The GALACTIC platform implements the NᴇxᴛPʀɪᴏʀɪᴛʏCᴏɴᴄᴇᴘᴛ algorithm, which proposes a pattern extraction approach for heterogeneous and complex data. It allows a generic pattern computation through specific descriptions of objects by monadic predicates. It also proposes to refine a set of objects through specific exploration strategies, which reduces the number of patterns. In this work, we are interested in the analysis of sequential data using GALACTIC. We propose several descriptions and strategies adapted to sequences. We also propose unsupervised quality measures to compare the obtained patterns. A qualitative and quantitative analysis is conducted on real and synthetic datasets to show the efficiency of our approach.