Marc P.


732 dollar
1 year

My experience


DataScientest.com, February 2019 - May 2019

Built a movie recommendation engine with Python and Spark. The
algorithm is based on the 20 million ratings given by 30,000 users to
100,000 movies, as well as descriptive data on movies and users.

Main steps implemented:
- Selection, cleaning, creation of new variables
- Descriptive analysis of the variables
- K-means clustering of users and movies
- Linear, Ridge, and Lasso regressions
- Collaborative filtering recommendation algorithms: alternating least squares, stochastic gradient descent
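
For illustration, the stochastic gradient descent variant of collaborative filtering can be sketched in plain NumPy as a small matrix factorization. This is a minimal sketch, not the project's code: the function names, hyperparameters, and the (user, item, rating) triple layout are assumptions.

```python
import numpy as np


def factorize_sgd(ratings, n_users, n_items, n_factors=2,
                  lr=0.02, reg=0.02, n_epochs=500, seed=0):
    """Learn user and item factors from (user, item, rating) triples.

    All hyperparameter defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, n_factors))  # user factors
    Q = rng.normal(scale=0.1, size=(n_items, n_factors))  # item factors
    for _ in range(n_epochs):
        # Visit the observed ratings in a new random order each epoch.
        for idx in rng.permutation(len(ratings)):
            u, i, r = ratings[idx]
            err = r - P[u] @ Q[i]  # prediction error on this rating
            p_old = P[u].copy()    # keep the old user vector for the item update
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_old - reg * Q[i])
    return P, Q


def rmse(ratings, P, Q):
    """Root mean squared error of the factor model on the given triples."""
    errors = [r - P[u] @ Q[i] for u, i, r in ratings]
    return float(np.sqrt(np.mean(np.square(errors))))
```

A predicted rating for user `u` and movie `i` is the dot product `P[u] @ Q[i]`. At the scale of millions of ratings, a distributed implementation such as the ALS estimator in Spark would be the more natural choice.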

Bouygues Telecom, March 2018 - July 2018

Analysis of the fiber rollout

Main steps:
- Enrichment and analysis of a database covering a fleet of 30,000
pieces of equipment and 50 variables, using Excel, VBA, and Microsoft Access.
- Production of a consistency report.
- Production of performance indicators.

EDF, April 2017 - September 2017

Design of an energy efficiency tool to identify business customers with excessive consumption.

Method implemented: statistical analysis of electricity consumption time series, modeling of load curves, and creation of a web interface (Shiny in R) to share the results.

Mission: creation of an energy efficiency tool for professionals. The tool estimates the theoretical consumption of each user and compares it with actual consumption to identify overconsumption.
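
The comparison step can be illustrated with a minimal Python sketch (the original tool was written in R). The function name and the 20% tolerance are assumptions made for the example, and the theoretical values are taken as already estimated by a separate model.

```python
import numpy as np


def flag_overconsumption(actual, theoretical, tolerance=0.20):
    """Return a boolean array marking clients whose actual consumption
    exceeds the theoretical value by more than `tolerance` (relative gap).

    The 20% default is a hypothetical threshold, not a value from the project.
    """
    actual = np.asarray(actual, dtype=float)
    theoretical = np.asarray(theoretical, dtype=float)
    relative_gap = (actual - theoretical) / theoretical
    return relative_gap > tolerance
```

For example, a client consuming 130 units against a theoretical 100 has a relative gap of 0.30 and would be flagged under the assumed tolerance.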

Tools: R and VBA.
Database of 500 clients, with about 20 variables per client (area, number of employees, geographical area, ...) and load curves of about 3,000 data points.

- Selection of useful data
- Grouping of data
- Data cleaning

In R:
- Normalization of data
- Classification with machine learning
- Proposal of models
- Testing of models
- Programming of a web interface (R Shiny) to share the results

Texas A&M Institute for Preclinical Studies (TIPS), September 2015 - December 2015

Digital design of a research unit (NITRA SOLID) in SolidWorks.
Mechanical and thermal simulations to determine the behavior of the device according to various constraints.

My stack

VBA, Teamwork, SQL, Spark, SolidWorks, Simulink, R Language, Python 3.5, Python, Oozie, NumPy, MS Office, MongoDB, Microsoft PowerPoint, Microsoft Excel, MapReduce, Machine Learning, Hadoop, Catia, C/C++, Big Data, Apache Web Server, Amazon Web Services (AWS)