Used PySpark to analyze the public GWAS Catalog and compare it with the RGC Results Browser associations catalog, identifying potential gene mutations for target discovery.
Developed an ETL pipeline to automate the transformation of raw data from the GWAS Catalog and various collaborators into RGC-defined schemas and to combine it with the existing data of 90K patients.
Developed Python notebooks in Databricks that use spaCy to automate extraction of the ancestry of people sequenced for a given gene mutation in the public GWAS Catalog, facilitating continent-specific target discovery.
Developed a Django web application that uses PySpark to analyze Call Detail Records (CDR) for more than 2.5 million MTNL/BSNL subscribers. MTNL/BSNL used the analysis for marketing and promotional purposes.
Conducted training on Hadoop and Spark for new recruits in the organization. Planned, organized, and developed training materials tailored to the specific needs of different teams.
Developed and managed patch versions to upgrade the Centralized Monitoring System (CMS) application and resolve software issues.
Performed validation and verification of the CMS application, created test cases for acceptance testing, and demonstrated the application to clients.
Automated parts of the disaster recovery process for the CMS application running on Linux, including hostname-related DNS changes and NetApp SnapMirror technology for data storage.
Implemented the k-means clustering algorithm in Python on a dataset of over 5,000 images to classify skin burns as first-, second-, or third-degree.
Developed automated test scripts in Python to set up the test environment, fetch test data from the database, and generate reports per business requirements, ensuring delivery of quality software applications.