Suraj P.

Data Scientist

95 dollar

My experience


Bank of America, November 2017 - Present

SCUBA for automated data-labelling:
* Created a tool built on Snorkel, an open-source API for automated labelling. It allowed users to write rules ranging from simple pattern matching to running a model. Once the rules were defined, we trained a weak supervision deterministic label model. The labels generated from these rules could then be used to train a full-scale binary classification model. In addition to automated data labelling, we also added features to augment data.

Churn Analysis:
* Created a churn analysis model for the HR team, provided them with a strategic retention plan, and categorized employees into risk categories. Association rule mining was also implemented to identify the combinations of top manager, skill level, sub-band level, etc. under which an employee was more likely to churn.

Bank of America Continuum Solutions, July 2014 - October 2017

* Developed Unix/Python scripts to pre-process transactional data, keeping only the significant fields from the data coming from upstream.

* Responsible for data visualization for reporting the number of incidents per month per application, the number of image and non-image customers, and job execution time.

* Created a Python script to automate the QA process by checking data provision, completeness, uniqueness, accuracy and concurrency, thereby saving 720+ hours annually.

* Worked on migrating Flagscape documents from Google to Elasticsearch, as Google was moving to the cloud and the Bank did not want its data in the cloud. We indexed various types of documents and ingested all structured, semi-structured and unstructured data into Elasticsearch using

My stack



Analysis methods and tools


Big Data

PySpark, Big Data, Spark, Data Visualization


Churn Analysis, Weak Supervision, ANOVA, StoreAge SVM, Database Management, Communication Skills, Bachelor of Technology (Electronics, Communication), Hypothesis Testing, Dynamic Data, Python Programming, Data Scientist, Defect Management, Manager, User Interface, Version Control, PGP, Natural Language Processing, industry~science, Analyst

IT Infrastructure

Git, Unix

Machine Learning

Deep learning


Natural Language Processing (NLP), Analytics, Logstash


HDFS, Machine Learning, Elasticsearch


Shell Scripting, Python, SQL

My education and trainings

Bachelor of Technology, Electronics, Communication - Vellore Institute of Technology

- International School of Engineering, 2019