Youness H.

Data Engineer

$415/day
Freelancer
3 years
Paris, France

My experience

BPCE SA, November 2017 - Present

- Setting up the BPCE SA data lake (Hortonworks)

- Development of a data ingestion framework from scratch (Java 8, Spring, Scala, Spark 2, Hive, HBase, HDFS, Git, Jenkins, XL Deploy, KMS, Ranger…):

    - Software architecture design

    - Development of the framework's components

    - Setting up the DevOps chain for continuous integration and deployment

    - Securing data throughout the processing chain (encryption before and after ingestion) and managing access to exposed data

- Check fraud detection (Azure, Databricks, Python):

    - Encrypting checks on-premise before sending them to Azure Storage

    - Ensuring the reliable transfer of millions of checks from on-premise to Azure Blob Storage

    - Decrypting the check data on Databricks, including the connection to Key Vault

    - Optimizing the loading of millions of checks into Spark DataFrames so they can be decrypted in memory

    - Implementing on-premise anonymization of sensitive data

- Administration of the Hortonworks platform services:

    - HDFS storage management

    - Management of the YARN resource manager (queues …)

    - Spark tuning

    - Integration of Ambari with LDAP

    - Configuration of high availability and cluster kerberization

    - Configuration of Ranger policies (HDFS, Hive) and KMS encryption

- Ingestion of various BPCE SA business data (risk and finance) into the data lake, following a three-layer design on Hive: RAW (raw data), ENHANCED (prepared data), and EXPOSURE (data to be exposed) (Spark, Hive, Scala, PySpark)

- Development of a non-regression testing tool that compares large positional data files (risk and finance) between different closing dates (Spark, Scala)

My stack

Big Data

Big Data, Hive, Spark

Protocols

LDAP

Software testing

Regression testing

Analysis methods and tools

Apache Maven, DevOps

Others

Continuous Integration

Environment of Development

Maven

Application servers

Zookeeper

IT Infrastructure

Git

Technologies

AWS, HDFS

Computer Tools

Microsoft Excel

Middleware

Jenkins

Languages

HQL, SQL, Scala, Java

My education and trainings

Online Certificates

Master IFI (Computer Science) - Polytech, 2016 - 2017

Development of an LTE media trace analysis tool (voice and video, Java 8) - ORANGE Labs Lannion, 2017

Improving the caching policy in Spark 1.6 (LRU eviction policy) - INRIA Sophia Antipolis, 2016 - 2017

Engineering degree in Computer Science - Institut National des Postes et Télécommunications (INPT), 2014 - 2017