Paul-Vasile M.

Data Engineer

Outsourcing
11 years of experience
Cluj, Romania

My experience


3rd Pillar Global
February 2019 - Present

Data Engineer within a product life cycle management and software product development company.
  • Developing, testing, and implementing SQL queries on AWS Redshift.
  • Extending the existing ETL solution (Jenkins -> shell scripts -> Redshift SQL queries) by adding new flows and/or modifying existing ones.
  • Creating data flows for new sources of data.
  • Developing shell automation scripts that interact with various AWS services, Jenkins, and Slack (used for alerting specific user groups when processes fail).
  • Developing and maintaining the alerting system (Slack, email) for critical processes; a minimal sketch of such an alert follows this list.
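
A minimal sketch of how such a failure alert might be posted to Slack from a shell script, assuming an incoming-webhook URL stored in an environment variable (the variable name, message format, and usage example are illustrative, not the actual implementation):

#!/bin/bash
# Hypothetical alert helper: posts a failure message to a Slack channel via an
# incoming webhook. SLACK_WEBHOOK_URL is an assumed environment variable, not the
# project's actual configuration.

alert_slack() {
  local process_name="$1"
  local message="$2"
  curl -s -X POST \
    -H 'Content-type: application/json' \
    --data "{\"text\": \"*${process_name}* failed: ${message}\"}" \
    "${SLACK_WEBHOOK_URL}"
}

# Example usage from an ETL step (illustrative):
# run_copy_to_staging.sh || alert_slack "daily-etl" "COPY to staging failed"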

Process Description and Responsibilities:
For the last two years, the project I have been involved in has been dedicated to a client from the airline industry. Data collected from the booking platforms feeds the data warehouse daily, and since many marketing campaigns are based on these daily feeds, the ETL is one of the critical processes.

The technologies used for the process:
- Jenkins - automation server that runs specific jobs on a schedule, triggering the execution of shell scripts.
- AWS S3 - service used for raw data storage; this is where the data is extracted from.
- AWS Redshift - data warehouse; data is loaded into Redshift and then transformed according to the business logic.
- Shell scripts - all the logic behind the ETL process is built on shell scripts and SQL queries; the SQL part is invoked from the shell scripts.

Data structures:
There are two different schemas for the ETL process:
- a staging schema, which holds the data exactly as it appears in the raw files.
- a production schema, which holds the same data, but with some transformations applied and different data types. From the tables perspective, both schemas have the same number of tables.
The production tables have two additional columns used to keep track of when the data was loaded into the table (a minimal DDL sketch, with hypothetical column names, is shown below).
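
As an illustration of the two-schema layout, a minimal DDL sketch for one hypothetical feed, run against Redshift through a psql-compatible client; the table and column names, including the two load-tracking columns, are invented for the example and are not the project's actual objects:

#!/bin/bash
# Hypothetical DDL for one feed. Table and column names (including load_date and
# load_timestamp) are illustrative only; REDSHIFT_CONN is an assumed connection string.
psql "$REDSHIFT_CONN" <<'SQL'
-- Staging: raw data exactly as it arrives in the daily files (everything kept as text).
CREATE TABLE staging.bookings (
    booking_id   VARCHAR(64),
    booking_date VARCHAR(32),
    amount       VARCHAR(32)
);

-- Production: same data, typed and transformed, plus two tracking columns
-- recording when each row was loaded.
CREATE TABLE production.bookings (
    booking_id     VARCHAR(64),
    booking_date   DATE,
    amount         DECIMAL(12,2),
    load_date      DATE,      -- hypothetical tracking column, populated at load time
    load_timestamp TIMESTAMP  -- hypothetical tracking column, populated at load time
);
SQL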

Process flow:
Data is sent by the client to an internal SFTP server. From there, an AWS Data Pipeline job copies the daily files (.gz format) to the S3 bucket. The process is scheduled to run every day at a specific time and is triggered by a Jenkins job that first checks whether all the needed files are available in the S3 location. If one or more files are missing, the process stops and an alert email is sent. If all files are in the S3 bucket, the process can start (a minimal sketch of this pre-check follows).
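
A minimal sketch of such a pre-check, assuming the AWS CLI is available on the Jenkins agent; the bucket, prefix, file names, and alert address are placeholders:

#!/bin/bash
# Hypothetical Jenkins pre-check: verify that all expected daily .gz files exist in S3
# before the ETL starts. Bucket, prefix, file names, and e-mail address are placeholders.
set -euo pipefail

S3_PREFIX="s3://example-bucket/daily-feed/$(date +%Y-%m-%d)"
EXPECTED_FILES=("bookings.gz" "passengers.gz" "segments.gz")

missing=()
for f in "${EXPECTED_FILES[@]}"; do
  if ! aws s3 ls "${S3_PREFIX}/${f}" > /dev/null 2>&1; then
    missing+=("$f")
  fi
done

if [ "${#missing[@]}" -gt 0 ]; then
  echo "Missing daily files: ${missing[*]}" \
    | mail -s "ETL aborted: missing daily files" data-alerts@example.com
  exit 1
fi

echo "All expected files are present; the ETL can start."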

Data is extracted from the S3 location and placed first in the staging tables.
The staging tables are truncated every day right before the COPY to staging is executed (rather than right after the load), so that the previous run's raw data remains available for checks if needed.
From staging, data is loaded into the production tables. For this step, shell scripts run SQL queries that handle the UPSERT logic for each production table; this avoids duplicates and keeps only the most recent 'version' of the data in the production tables.
Once data is available in the production tables, other shell scripts handle specific transformations so that the data is ready for downstream consumers such as marketing campaigns and client reports. A minimal sketch of the staging load and upsert step is shown below.
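
A minimal sketch of the load and upsert step for one hypothetical table, run through a psql-compatible client against Redshift; the table, key column, S3 path, and IAM role are placeholders, and the real scripts are more involved:

#!/bin/bash
# Hypothetical load + upsert for a single table. All identifiers (bookings, booking_id,
# the S3 path, and the IAM role) are placeholders, not the project's actual objects.
set -euo pipefail

psql "$REDSHIFT_CONN" <<SQL
-- Clear staging only right before the new load, so the previous raw data stays
-- available for checks until this point.
TRUNCATE TABLE staging.bookings;

-- Load today's raw .gz file from S3 into the staging table.
COPY staging.bookings
FROM 's3://example-bucket/daily-feed/$(date +%Y-%m-%d)/bookings.gz'
IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-role'
GZIP DELIMITER ',' IGNOREHEADER 1;

-- Upsert: delete the rows that are about to be replaced, then insert the new version,
-- so the production table keeps a single, most recent copy of each record.
BEGIN;

DELETE FROM production.bookings
USING staging.bookings
WHERE production.bookings.booking_id = staging.bookings.booking_id;

INSERT INTO production.bookings (booking_id, booking_date, amount, load_date, load_timestamp)
SELECT booking_id,
       CAST(booking_date AS DATE),
       CAST(amount AS DECIMAL(12,2)),
       CURRENT_DATE,
       GETDATE()
FROM staging.bookings;

COMMIT;
SQL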

Failures and action items:
- data is available but the files hold invalid data > data checks need to be executed to identify the exact columns/values causing the issue.
- data is available but files are corrupted > data provider is notified.
- data is not available as per the agreed schedule > data provider is notified.

Responsibilities:
  • Daily monitoring of the process, making sure the data goes from S3 to the production tables without any error.
  • In case of any failure, acting as soon as possible.
  • If there is a need to integrate a new file into the existing process, making sure that the flow will not be affected and that the changes are consistent with the current solution.
  • Testing any new file or data source separately from the ETL (if possible) before integrating it, running quality checks and reporting anything that can impact ETL.
  • Developing and updating the SQL logic if there is a clear business requirement.
  • In situations where an existing flow needs to be replaced with another one, making sure the new feed follows the standards already in place in terms of S3 availability, raw file formats, column types, and distinct values for columns holding different codifications. If possible, creating a backup of the data coming from the old feed, as this is helpful when comparing it with the 'new' data.
  • Documenting any new implementations to the existing ETL process.

Garmin
October 2017 - February 2019

Database Engineer Analyst within a leading worldwide provider of navigation products, specializing in GPS technology for automotive, aviation, marine, outdoor, and sport activities.
  • Identified potential sensitive assets that are part of the company's infrastructure and ran a comprehensive scan and validation process.
  • Developed an inventory of sensitive assets.
  • Created and maintained an up-to-date report of all scanned assets together with the vulnerabilities discovered.
  • Handled common database procedures such as upgrade, backup, recovery.
  • Created and optimized the existing SQL scripts in order to refine and automate regular processes.
  • Ensured security and availability of databases.
  • Identified key insights and business opportunities aligned with the client's needs.
  • Troubleshot basic issue reports, reproduced issues, evaluated technical cause and implemented software solutions.
  • Developed and maintained existing processes in order to identify and meet the client's needs.
  • Shared domain and technical expertise on other internal or cross-team projects.
  • Provided training to other peers and team members.
  • Prepared the documentation and specifications.
  • Worked according to the Scrum Agile framework.

Evalueserve
February 2014 - October 2017

Senior Business Analyst for a global professional services provider offering research, analytics, and data management services.
  • Accessed/connected to client data through provided platforms.
  • Performed analyses on extracted data and completed modules within larger assignments.
  • Analyzed and synthesized available data and brought out value-added insights to the client.
  • Identified and solved data issues related to data quality, availability and/or integration.
  • Created dashboards that use algorithms to simplify key tasks.
  • Designed and managed client applications that generate visible business impact through process automation.
  • Actively participated and contributed to team discussions on project-specific areas.

Technologies mainly used in company-related projects (experienced-level):
-Microsoft Office.
-MS Access.
  • Developed tables, queries, forms, and reports and connected them by using macros or VBA scripts.
  • Created GUIs allowing users to interact with the developed applications through graphical elements and/or visual indicators.
  • Imported and exported data to different formats.

-MS Excel.
  • Organized data manipulation by using the provided functions.
  • Displayed data as graphs, histograms and charts.
  • Used Excel's programming layer, VBA, for complex analysis and data mining, then reported the results back to the spreadsheet.
  • Created user forms with a variety of graphical elements that let users easily run periodically needed tasks in an automated way.
-MS Word.
-MS PowerPoint.
-MS Outlook.
-Visual Basic for Applications (VBA) across MS components.
  • Built functions and procedures enabling process automation that interacts with the host application.
  • Created scripts that perform advanced tasks across the MS desktop application suite and stored them so they can be called and run from modules.
  • Ran scripts for data consolidation and cleansing; imported and exported data from/to different file formats.
  • Created an environment that allows interaction with the SQL language for data manipulation.
-MS SQL.
  • Stored and retrieved data from databases.
  • Performed joins between data sets stored in different tables.
  • Created advanced queries based on multiple tables.
  • Ran manipulation and definition operations over data sets, using the corresponding DML and DDL statements.

Other adjacent software (mid-level):
-Microsoft Visual Studio 2013.
-Visual Studio for Web / Visual Studio for Windows Desktop.
  • Created and developed web applications and services / desktop applications using the Visual Basic programming language on the .NET Framework and Windows Forms.
  • Designed the user interface using code and visual designer.
  • Customized applications through VB programming language usage.
  • Built and tested standard-based Web applications using both VB and HTML/CSS.
  • Created connection bridges between IDE and RDBMS (Microsoft Access, SQL Server 2014).
  • Created and managed project databases.
  • Facilitated transactional applications that process data entry and retrieval.
-MicroStrategy Analytics Express & Analytics Enterprise.
  • Developed, updated, and enhanced dashboards based on consolidated files that gather data from different sources.
  • Created and added visualization elements that offer a better graphical understanding of the data sets.
-Teradata-Aster.
  • Set up and connected the worker node architecture.
  • Imported and exported data using different environments and formats.
  • Created bridge connections between Aster components and additional software, enabling different sets of tasks.
  • Created and modified databases; performed DML operations.
  • Developed and ran SQL statements designed for managing data using different environments (ACT, Teradata Studio).
-HTML.
  • Created static templates for an e-mail client (MS Outlook) using HTML body features such as text formatting and tables.
  • Improved the user interface design of web applications by integrating HTML within the .NET Framework using the Visual Studio IDE.

Other software, with internal training provided (beginner-level):
-SAS.
  • Created procedures that integrate SAS with the SQL language for data management and visualization.
-QlikView.
  • Created dynamic dashboards and interactive charts.
  • Uploaded data sources by using SQL statements.

Cord Blood Center
July 2011 - May 2015

Database Programmer for a blood processing center.
  • Configuring and maintaining existing application using Microsoft Office Access.
  • Implementing new enhancements to the database using Visual Basic for Applications and SQL.
  • Supporting the user group in using new features of the database.
  • Ensuring the import and export of data through the application.
  • Optimizing the application in response to marketing department requests.
  • Ensuring the integration between the database and the accounting system.
  • Monitoring the user activity.
  • Setting local security policies for groups of users.
  • Providing support and knowledge of the application to users.
  • Ensuring the database backup and security.
  • Generating reports and statistics as required using SQL.
  • Performing other responsibilities as required.

PFA Catalin Vladescu
November 2009 - June 2010

Database Developer for a freelancer.
  • Creating and developing project database using Microsoft Office Access 2007 and SQL.
  • Data management.
  • Developing and delivering application reports using SQL.
  • Monitoring database operator activity.

My education and training

Bachelor's Degree in Economic Sciences, Faculty of Economic Sciences - "Babes-Bolyai" University of Cluj-Napoca
2006 - 2009