Suraj P.

Data Scientist

95 dollars

My experience


Bank of America, November 2017 - Present

SCUBA for automated data-labelling:
* Created a tool that used Snorkel, an open-source API, to automate data labelling. It let users write labelling rules ranging from simple pattern matching to running a model. Once the rules were defined, we trained a weak-supervision label model to combine them, and the labels it generated could then be used to train a full-scale binary classification model. In addition to automated data labelling, we also added features for data augmentation.
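A minimal sketch of the rule-based labelling workflow described above, in Python. The rules, example texts and label constants are hypothetical, and the rules are combined here by simple majority vote rather than by Snorkel's learned label model, which weights each rule by its estimated accuracy.

```python
from collections import Counter
from typing import Callable, List

# Label values; ABSTAIN means a rule has no opinion on an example.
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

LabelingFunction = Callable[[str], int]


# Hypothetical rules: two keyword matches and one length heuristic.
def lf_contains_refund(text: str) -> int:
    return POSITIVE if "refund" in text.lower() else ABSTAIN


def lf_mentions_complaint(text: str) -> int:
    return POSITIVE if "complain" in text.lower() else ABSTAIN


def lf_short_message(text: str) -> int:
    return NEGATIVE if len(text.split()) < 3 else ABSTAIN


def apply_lfs(texts: List[str], lfs: List[LabelingFunction]) -> List[List[int]]:
    """Build the label matrix: one row per example, one column per rule."""
    return [[lf(text) for lf in lfs] for text in texts]


def majority_vote(row: List[int]) -> int:
    """Combine one row of rule outputs, ignoring abstentions.

    All-abstain rows and ties return ABSTAIN so they can be left out of
    the training set for the downstream classifier.
    """
    votes = Counter(v for v in row if v != ABSTAIN)
    if not votes:
        return ABSTAIN
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


if __name__ == "__main__":
    lfs = [lf_contains_refund, lf_short_message, lf_mentions_complaint]
    texts = ["I want a refund now", "ok", "Great service, thank you"]
    print([majority_vote(row) for row in apply_lfs(texts, lfs)])  # [1, 0, -1]
```

Rows that come back as ABSTAIN carry no training signal, so dropping them before training the binary classifier is a common choice.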

Churn Analysis:
* Created a churn analysis model for the HR team, provided them with a strategic retention plan, and categorized employees into risk categories. Association rule mining was also implemented to identify combinations of top manager, skill level, sub-band level, etc. under which employees were more likely to churn.
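A minimal sketch of the association-rule idea, in Python, with hypothetical employee attributes. It enumerates every attribute combination of up to two items exhaustively (not Apriori or FP-Growth) and scores each rule {attributes} → churn by support (share of all employees who match and churned) and confidence (churn rate among those who match).

```python
from itertools import combinations
from typing import FrozenSet, List, Tuple

# One employee: a set of "attribute=value" items and whether they churned.
Record = Tuple[FrozenSet[str], bool]
Rule = Tuple[FrozenSet[str], float, float]  # (antecedent, support, confidence)


def churn_rules(records: List[Record], max_len: int = 2,
                min_support: float = 0.2,
                min_confidence: float = 0.6) -> List[Rule]:
    """Score every rule {items} -> churn with up to `max_len` items.

    support    = share of all records that contain the items AND churned
    confidence = churn rate among records that contain the items
    """
    n = len(records)
    candidates = set()
    for items, _ in records:
        for k in range(1, max_len + 1):
            for combo in combinations(sorted(items), k):
                candidates.add(frozenset(combo))

    rules: List[Rule] = []
    for antecedent in candidates:
        matched = [churned for items, churned in records if antecedent <= items]
        churned_count = sum(matched)  # True counts as 1
        support = churned_count / n
        confidence = churned_count / len(matched)  # never empty: antecedent came from a record
        if support >= min_support and confidence >= min_confidence:
            rules.append((antecedent, support, confidence))

    # Strongest rules first; sorted items give a deterministic order on ties.
    rules.sort(key=lambda r: (-r[2], -r[1], sorted(r[0])))
    return rules


if __name__ == "__main__":
    # Hypothetical attributes standing in for manager, skill level and sub-band.
    records = [
        (frozenset({"manager=A", "skill=junior", "band=B1"}), True),
        (frozenset({"manager=A", "skill=junior", "band=B2"}), True),
        (frozenset({"manager=A", "skill=senior", "band=B1"}), False),
        (frozenset({"manager=B", "skill=junior", "band=B1"}), False),
        (frozenset({"manager=B", "skill=senior", "band=B2"}), False),
    ]
    for items, support, confidence in churn_rules(records):
        print(sorted(items), f"support={support:.2f}", f"confidence={confidence:.2f}")
```

Exhaustive enumeration is only practical for a handful of attributes; for larger attribute sets, Apriori-style pruning on support avoids scoring every combination.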

Bank of America Continuum Solutions, July 2014 - October 2017

* Developed Unix/Python scripts to pre-process transactional data, fetching only the significant fields from the data coming from upstream.

* Responsible for data visualization reporting the number of incidents per month per application, the number of image and non-image customers, and job execution times.

* Created a Python script to automate the QA process by checking data provision, completeness, uniqueness, accuracy and concurrency, thereby saving 720+ hours annually.

* Worked on migrating flagscape documents from Google to Elasticsearch because Google was moving to the cloud and the Bank did not want its data on the cloud. We indexed various types of documents and ingested all structured, semi-structured and unstructured data into Elasticsearch using
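A minimal sketch of two of the QA checks listed above (completeness and uniqueness) plus a basic provision check, in Python. The row format, column names and thresholds are hypothetical; accuracy and concurrency checks would need reference data or timestamps that are not described here.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical row format: column name -> value (None means missing).
Row = Dict[str, Optional[object]]


def completeness(rows: Sequence[Row], column: str) -> float:
    """Fraction of rows where `column` is present and not None or ""."""
    if not rows:
        return 0.0
    filled = sum(1 for row in rows if row.get(column) not in (None, ""))
    return filled / len(rows)


def duplicate_keys(rows: Sequence[Row], key: str) -> List[object]:
    """Key values that occur more than once, in first-duplicate order."""
    seen = set()
    dupes: Dict[object, None] = {}  # dict keeps insertion order
    for row in rows:
        value = row.get(key)
        if value in seen:
            dupes[value] = None
        seen.add(value)
    return list(dupes)


def run_qa(rows: Sequence[Row], key: str, required: Sequence[str],
           min_completeness: float = 1.0) -> List[str]:
    """Return human-readable failures; an empty list means all checks passed."""
    if not rows:
        return ["provision: no rows received"]
    failures = []
    for column in required:
        score = completeness(rows, column)
        if score < min_completeness:
            failures.append(f"completeness: {column} is {score:.0%} filled")
    for value in duplicate_keys(rows, key):
        failures.append(f"uniqueness: duplicate {key}={value}")
    return failures


if __name__ == "__main__":
    batch = [
        {"txn_id": 1, "amount": 10.0},
        {"txn_id": 2, "amount": None},
        {"txn_id": 2, "amount": 5.0},
    ]
    for failure in run_qa(batch, key="txn_id", required=["txn_id", "amount"]):
        print(failure)
```

Returning messages instead of raising on the first failure lets one run report every problem in a batch.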

My stack

Others

Analytics, Logstash, Natural Language Processing (NLP)

Analysis methods and tools

JIRA

IT Infrastructure

Unix, Git

Machine Learning

Deep Learning

Languages

SQL, Shell Scripting, Python

Databases

MySQL

Technologies

Machine Learning, Elasticsearch, HDFS

Big Data

Spark, Data Visualization, Big Data, PySpark

Other

Manager, Version Control, Data Scientist, Communication Skills, Database Management, StoreAge SVM, Analyst, industry~science, Natural Language Processing, PGP, Bachelor of Technology Electronics, User Interface, Defect Management, ANOVA, Python Programming, Dynamic Data, Hypothesis Testing, Bachelor of Technology Communication, Weak Supervision, Churn Analysis

My education and trainings

Bachelor of Technology, Electronics and Communication - Vellore Institute of Technology

- International School of Engineering, 2019