I am an experienced Data Engineer with strong proficiency in PySpark, Apache Spark, SQL, Python, and Azure services. My expertise lies in designing and optimizing ETL (Extract, Transform, Load) pipelines, creating efficient data storage solutions, and enhancing data processing services. As a certified Azure professional, I have a proven track record of delivering impactful solutions that drive business success.
I am passionate about harnessing the power of data to solve complex problems and enable data-driven decision-making. With a keen eye for detail and a dedication to excellence, I consistently strive to create innovative solutions that improve data quality, accessibility, and usability.
Developed a dynamic Databricks workflow as sole contributor, enabling trigger-based execution by backend microservices. This workflow autonomously generates Delta tables and views that backend processes rely on to create Excel workbooks containing fund information. The source data comes from transactional tables holding fund audit financial data. Built the solution from the ground up, with parallel processing capabilities and extensive configurability via JSON-controlled settings. Engineered agile pipelines that execute intricate calculations driven by microservice parameters. Optimized SQL queries, enhancing system performance by eliminating redundancies.
Designed and executed production ETL pipelines using PySpark and Hive for data extraction, decryption, reconciliation, and transformation. Developed a Databricks framework for downstream predictive analytics, enabling easy access to data as Hive tables. Enhanced PySpark notebooks to achieve a 40% reduction in runtime by parallelizing data loads. Delivered ad hoc notebooks for real-time data manipulation needs. Established a historical data ingestion workflow and incremental orchestration for daily loads.
Orchestrated Azure Data Factory (ADF) pipelines for seamless on-premises to cloud (ADLS) data migration. Designed an ADF data-wrangling-based incremental pipeline (SCD2), replacing legacy SSIS jobs. Leveraged Azure DevOps for efficient ADF pipeline deployment to higher environments. Developed, enhanced, and maintained ETL packages using SSIS, ensuring client requirements were met. Investigated and resolved data discrepancies within strict SLAs, collaborating with cross-functional teams. Worked with client stakeholders to gather requirements and craft technical specifications.