## Lectures

#### Machine Learning and Visual Knowledge Discovery

*Boris Kovalerchuk, Computer Science Department, Central Washington University, USA*

This module presents methods that combine Machine Learning and Visual Knowledge Discovery techniques to enhance both analytical and visualization methods for discovering hidden patterns in multidimensional data. The fundamental challenge for visual discovery in multidimensional data is that we cannot see n-D data with the naked eye and need visual analytics tools (“n-D glasses”). Multidimensional data are often visualized with non-reversible, lossy dimension reduction methods such as Principal Component Analysis (PCA). While these methods are very useful, they can discard information that is critical for knowledge discovery before the search for n-D patterns even begins. Hybrid methods that combine machine learning with both reversible and non-reversible visualization methods open wide new opportunities for knowledge discovery in n-D data. This module will demonstrate the theory and applications of such hybrid methods.
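
The difference between a reversible and a lossy representation can be sketched in a few lines. The example below is only an illustration under stated assumptions: parallel coordinates are used as one possible reversible representation (the abstract does not name a specific method), and keeping the first two coordinates stands in for a lossy projection such as PCA. All function names are hypothetical.

```python
def to_parallel_coordinates(point):
    """Map an n-D point to polyline vertices (axis index, value).

    Every coordinate is kept, so the mapping is reversible.
    """
    return [(axis, value) for axis, value in enumerate(point)]


def from_parallel_coordinates(polyline):
    """Recover the original n-D point from its polyline vertices."""
    return [value for _, value in sorted(polyline)]


def project_to_2d(point):
    """Lossy stand-in for a projection: keep only the first two coordinates."""
    return (point[0], point[1])


a = [1.0, 2.0, 3.0, 4.0]
b = [1.0, 2.0, 9.0, -5.0]

# The reversible representation lets us recover the point exactly.
reversible_ok = from_parallel_coordinates(to_parallel_coordinates(a)) == a
# The lossy projection maps two different 4-D points to the same 2-D point.
collapsed = project_to_2d(a) == project_to_2d(b)

print(reversible_ok, collapsed)  # prints: True True
```

Two distinct 4-D points become indistinguishable after the projection, while their polylines remain different.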

#### Hyperparameter Optimization in Machine Learning

*Razvan Andonie, Computer Science Department, Central Washington University, USA*

Nearly all algorithms used in machine learning involve two different sets of parameters: the training parameters and the meta-parameters (also known as hyperparameters). While the training parameters are learned during the training phase, the values of the hyperparameters have to be specified before the learning phase. For instance, the hyperparameters of neural networks typically specify the architecture of the network (number and type of layers, number and type of nodes, etc.). We would like to find a set of hyperparameter values that gives us the best model for our data in a reasonable amount of time. This process is called hyperparameter optimization. Lately, there has been high interest in hyperparameter optimization, especially since the rise of deep learning. In deep learning, the number of hyperparameters and the size of the training sets have increased dramatically, which has made hyperparameter optimization a challenging task. The module will present methods for hyperparameter optimization, in theory and practice.
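
As a rough illustration of the idea, random search is one of the simplest hyperparameter optimization strategies. The sketch below is not taken from the lecture: `validation_loss` is a hypothetical stand-in for "train a model with these hyperparameters and return its validation loss", and the search ranges are arbitrary assumptions.

```python
import random


def validation_loss(learning_rate, hidden_units):
    """Hypothetical objective; a real one would train and validate a model."""
    return (learning_rate - 0.01) ** 2 * 1e4 + (hidden_units - 64) ** 2 / 1e3


def random_search(n_trials, seed=0):
    """Sample hyperparameters at random and keep the best configuration."""
    rng = random.Random(seed)
    best_params, best_loss = None, float("inf")
    for _ in range(n_trials):
        params = {
            # Learning rates are sampled log-uniformly between 1e-4 and 1e-1.
            "learning_rate": 10 ** rng.uniform(-4, -1),
            "hidden_units": rng.choice([16, 32, 64, 128, 256]),
        }
        loss = validation_loss(**params)
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params, best_loss


best_params, best_loss = random_search(n_trials=50)
print(best_params, best_loss)
```

The hyperparameters are fixed before each evaluation, which mirrors the distinction above: they are chosen by the search, not learned during training.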

#### Text Analytics

*Parisa Rastin, Computer Science Laboratory of Paris Nord (LIPN), Paris 13 University, France*

Text Mining, or Text Analytics, is one of the tasks of automatic language processing. It involves processing large volumes of textual data so that their content can be analyzed without having to read it. Different types of applications have emerged, such as text segmentation and opinion mining. In this lecture, we give an introduction to text mining and its principal notions, such as bag of words (BOW) and TF/IDF. We also give a quick presentation of word embedding approaches (Word2vec and fastText) and introduce some supervised and unsupervised algorithms adapted to text. The lecture ends with a real application in marketing.

The practical work will illustrate these notions using specialized Python packages.

The module will cover the following notions:

- Introduction to text mining
- Bag of words
- Text Preprocessing
- Feature Extraction (TF/IDF)
- Word embedding (Word2vec, fastText)
- Choosing a distance metric
- Examples of supervised and unsupervised algorithms adapted to text
- Application in marketing: user profiling
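
The bag-of-words and TF/IDF notions listed above can be sketched with the Python standard library alone. This is a minimal illustration, assuming whitespace tokenization and one common TF/IDF variant (term frequency normalized by document length, multiplied by `log(N / df)`); the toy corpus is invented, and the practical work may rely on specialized packages and a different weighting.

```python
import math
from collections import Counter

docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]

# Bag of words: each document becomes a multiset of its tokens.
tokenized = [doc.lower().split() for doc in docs]
bags = [Counter(tokens) for tokens in tokenized]

# Document frequency: number of documents in which each term appears.
n_docs = len(docs)
doc_freq = Counter(term for bag in bags for term in bag)


def tf_idf(bag):
    """Weight each term by its in-document frequency and its rarity."""
    total = sum(bag.values())
    return {
        term: (count / total) * math.log(n_docs / doc_freq[term])
        for term, count in bag.items()
    }


weights = [tf_idf(bag) for bag in bags]
print(sorted(weights[0].items(), key=lambda item: -item[1]))
```

In the first document, "cat" receives a higher weight than "the": "the" occurs twice but also appears in another document, so its inverse document frequency is lower.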

#### An introduction to unsupervised learning

*Jérémie Sublime, ISEP - Ecole d'ingénieurs du numérique, Paris, France*

This lesson is an introductory class on unsupervised learning. It first shows the basic differences between supervised and unsupervised learning and introduces the key notions of similarity and clusters of similar data. It then details the main families of clustering algorithms, with their advantages and drawbacks in terms of complexity, efficiency, and the types of data they can handle. Finally, the course mentions a few of the many open problems in clustering, such as evaluating and comparing clustering results.

This class covers: density-based clustering methods, hierarchical methods, prototype-based methods, probabilistic methods, spectral clustering, and open problems in clustering.
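
As an example of the prototype-based family, k-means alternates between assigning points to their nearest prototype and moving each prototype to the mean of its points. The sketch below is illustrative only: the data are invented, and the initial centers are simply the first two points, which is a naive assumption (random or k-means++ initialization is more common, and a poor initialization can lead to a poor local optimum).

```python
def squared_distance(p, q):
    """Squared Euclidean distance, used here as the dissimilarity measure."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, initial_centers, n_iter=20):
    """Basic k-means (Lloyd's algorithm) on a list of tuples."""
    centers = list(initial_centers)
    k = len(centers)
    clusters = [[] for _ in range(k)]
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        for j, members in enumerate(clusters):
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return centers, clusters


points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 10), (10, 11), (11, 10), (11, 11),
]
centers, clusters = kmeans(points, initial_centers=points[:2])
print(centers)  # prints: [(0.5, 0.5), (10.5, 10.5)]
```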

#### Methods for dimensionality reduction and data projection including linear and non-linear approaches

This module (2h) will present the methods for dimensionality reduction and data projection including linear and non-linear approaches.

In the exploratory analysis of high-dimensional data, one of the main tasks is to form a simplified, usually visual, overview of the data sets. This can be achieved through simplified descriptions or summaries, which should make it possible to discover or identify the most relevant features or patterns. Clustering and projection are among the useful methods for this task. On the one hand, classical clustering algorithms produce a grouping of the data according to a chosen criterion. Projection methods, on the other hand, represent the data points in a lower-dimensional space in such a way that the clusters and the metric relations of the data items are preserved as faithfully as possible.

The practical lecture (3h) consists of applying these models in the R language.

The outline of the lecture is the following:

- Introduction
- Data visualization methods and their application domains
- Clustering in high-dimensional data
- Methods for dimensionality reduction (feature selection vs. feature extraction approaches)
- Linear models for dimensionality reduction:
  - Principal Component Analysis (PCA)
  - Linear Discriminant Analysis (LDA)
  - Multi-Dimensional Scaling (MDS)
- Non-linear models for dimensionality reduction:
  - Isometric feature mapping (Isomap)
  - Locally Linear Embedding (LLE)
  - Self-Organizing Map (SOM)
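
To make the linear case concrete, the first principal component of a data set can be found as the dominant eigenvector of its covariance matrix. The sketch below is a minimal illustration in Python (the practical session itself uses R), and power iteration is an assumed choice here, picked only because it needs no external library. It assumes the data are not constant, otherwise the normalization would divide by zero.

```python
import math


def first_principal_component(data, n_iter=100):
    """Return the leading principal direction and the 1-D projected scores."""
    n, d = len(data), len(data[0])
    mean = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - mean[j] for j in range(d)] for row in data]
    # Sample covariance matrix of the centered data.
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Power iteration: repeatedly apply the covariance matrix and normalize.
    v = [1.0] * d
    for _ in range(n_iter):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Project each centered point onto the principal direction.
    scores = [sum(r[j] * v[j] for j in range(d)) for r in centered]
    return v, scores


data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
direction, scores = first_principal_component(data)
print(direction, scores)
```

Because the toy points lie exactly on the line y = 2x, the recovered direction is proportional to (1, 2) and the one-dimensional scores preserve the order of the points along that line.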

#### Topological machine learning including deterministic and probabilistic approaches

*Nistor Grozavu, Computer Science Laboratory of Paris Nord (LIPN), Paris 13 University, France*

This lecture introduces topological machine learning, including deterministic and probabilistic approaches. Topological learning is one of the best-known techniques that allow clustering and visualization simultaneously. At the end of topographic learning, "similar" data are collected in clusters, which correspond to sets of similar observations.

The lecture will also discuss the use of topological approaches for combining models and for collaboration between models. The aim of collaborative clustering is to reveal the common underlying structure of data spread across multiple data sites by applying clustering techniques. Learners based on different paradigms can be combined for improved accuracy. The effectiveness of these methods will also be discussed in terms of the diversity and selection of the combined approaches.

The practical lecture consists of applying these models in the Matlab/Octave language (Python can also be used).

- Introduction: topological machine learning
- Self-Organizing Maps (SOM)
- Topological weighted clustering
- Generative Topographic Mapping (GTM)
- Relational Topological Mapping
- Ensemble Machine Learning
- Collaborative Machine Learning
- Diversity Analysis in Ensemble & Collaborative Machine Learning
- Applications
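
The Self-Organizing Map from the outline above can be sketched as follows. This is a deliberately small illustration, not the lecture's implementation: it assumes a one-dimensional chain of units, a Gaussian neighborhood, and linearly decaying learning rate and neighborhood width; the data and all parameter values are invented.

```python
import math
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def best_matching_unit(x, weights):
    """Index of the unit whose weight vector is closest to x."""
    return min(range(len(weights)), key=lambda i: squared_distance(x, weights[i]))


def train_som(data, n_units=4, n_epochs=50, seed=0):
    """Train a 1-D chain of units on a list of points."""
    rng = random.Random(seed)
    weights = [list(p) for p in rng.sample(data, n_units)]
    for epoch in range(n_epochs):
        frac = epoch / n_epochs
        lr = 0.5 * (1.0 - frac)                          # decaying learning rate
        sigma = max(0.5, (n_units / 2) * (1.0 - frac))   # shrinking neighborhood
        for x in rng.sample(data, len(data)):            # shuffled pass over data
            bmu = best_matching_unit(x, weights)
            for i in range(n_units):
                # Units close to the winner on the chain move more.
                h = math.exp(-((i - bmu) ** 2) / (2 * sigma ** 2))
                weights[i] = [w + lr * h * (a - w) for w, a in zip(weights[i], x)]
    return weights


data = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
    (1.0, 1.0), (1.1, 0.9), (0.9, 1.1),
    (2.0, 2.0), (2.1, 1.9),
]
som_weights = train_som(data)
assignments = [best_matching_unit(x, som_weights) for x in data]
print(som_weights, assignments)
```

Each observation is assigned to its best matching unit, which gives the clustering, while the position of that unit on the chain gives the low-dimensional view used for visualization.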

#### An overview of the main methods for matrix and tensor decompositions

*Basarab Matei, Computer Science Laboratory of Paris Nord (LIPN), Paris 13 University, France*

In many fields today, multiple sets of data are readily available. These might either refer to multimodal data where information about a given phenomenon is obtained through different types of acquisition techniques, or multiset data where the datasets are all of the same type but might be acquired from different subjects, at different time points, or under different conditions. Models based on matrix or tensor decompositions provide attractive solutions for fusion of both multi-modal and multiset data. These models minimize the assumptions—which is attractive as very little can be assumed about the relationship among multiple datasets—and at the same time, they can maximally exploit the interactions within and across the datasets.

This class will provide an overview of the main methods for matrix and tensor decompositions and the models that have been successfully applied for fusion of multiple datasets. An important focus is on the interrelated concepts of uniqueness, diversity, and interpretability. Diversity refers to any structural, numerical, or statistical property or assumption on the data that contributes to the identifiability of the model, which is key for interpretability, the ability to attach a physical meaning to the final decomposition. The relevance of these concepts as well as the challenges that remain are highlighted through a number of numerical and practical examples in various fields.

The practical lecture will be carried out in the Matlab/Octave language.

Syllabus:

- Basic matrix/tensor decompositions, identifiability & uniqueness
  - SVD
  - NMF decomposition
  - ICA and IVA
  - Other relevant matrix/tensor decompositions
- Models for fusion of multiset and multimodal data
  - Coupled matrix and tensor decompositions
  - Nonlinear extensions
- Performance evaluation, model comparison/selection

Examples are drawn from medical imaging, video processing, and recommender systems, among others.
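
As one concrete instance of the decompositions in the syllabus, NMF approximates a nonnegative matrix V by a product W H of two nonnegative factors. The sketch below uses the classical multiplicative update rules for the Frobenius norm. It is written in plain Python for consistency with the other examples (the practical session uses Matlab/Octave), and the matrix, rank, and iteration counts are illustrative assumptions.

```python
import random


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - W H."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a nonnegative matrix V into nonnegative W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(n_iter):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


V = [[1, 2, 3], [2, 4, 6], [1, 0, 1], [3, 2, 5]]

w_early, h_early = nmf(V, rank=2, n_iter=1)
w_final, h_final = nmf(V, rank=2, n_iter=200)
print(frobenius_error(V, w_early, h_early), frobenius_error(V, w_final, h_final))
```

The nonnegativity constraint is an example of the "diversity" discussed above: it restricts the set of admissible factorizations and often makes the factors easier to interpret, although the solution is in general not unique.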

#### Unsupervised learning with Artificial Neural Networks

*Guénaël Cabanes, Computer Science Laboratory of Paris Nord (LIPN), Paris 13 University, France*

This module introduces and develops several families of Unsupervised Neural Networks. These types of networks can be trained for different tasks, such as clustering, distribution learning, and data generation. We will start with “simple” structures, develop topographic models, and finish with unsupervised deep learning. Several examples of concrete applications will be presented.

The practical work will enable the participants to get hands-on experience with the different families of models seen in the module, using specialized Python packages.

The module will cover the following approaches:

- Neural Gas (NG)
- Growing Neural Gas (GNG)
- Self-Organizing Maps (SOM)
- Generative Topographic Mapping (GTM)
- Adaptive resonance theory (ART)
- Restricted Boltzmann machine (RBM)
- Auto-Encoders (AE)
- Generative Adversarial Network (GAN)
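
Among the “simple” structures in this list, Neural Gas is short enough to sketch directly. For each input, all units are ranked by their distance to it, and each unit moves toward the input by an amount that decays with its rank. The code below is a minimal illustration with invented data and assumed decay schedules; it is not the implementation used in the practical work.

```python
import math
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def train_neural_gas(data, n_units=4, n_epochs=30, seed=0):
    """Train Neural Gas units on a list of points."""
    rng = random.Random(seed)
    units = [list(p) for p in rng.sample(data, n_units)]
    total_steps = n_epochs * len(data)
    step = 0
    for _ in range(n_epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total_steps
            # Learning rate and neighborhood range decay exponentially.
            lr = 0.5 * (0.01 / 0.5) ** frac
            lam = (n_units / 2) * (0.1 / (n_units / 2)) ** frac
            # Rank units by distance to x before any of them is updated.
            ranked = sorted(range(n_units),
                            key=lambda i: squared_distance(x, units[i]))
            for rank, i in enumerate(ranked):
                h = math.exp(-rank / lam)  # closest unit has rank 0, so h = 1
                units[i] = [w + lr * h * (a - w) for w, a in zip(units[i], x)]
            step += 1
    return units


data = [
    (0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
    (1.0, 1.0), (1.1, 0.9), (0.9, 1.1),
    (2.0, 2.0), (2.1, 1.9),
]
units = train_neural_gas(data)
print(units)
```

Unlike a Self-Organizing Map, the neighborhood here is defined by the distance ranking in the input space rather than by a fixed grid of units.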