Enosh Kommula
Sr. Data Engineer
Email: [email protected]
Phone: +1 (469) 905-3226
Location: Remote, USA
Relocation: Yes
Visa: H1B

PROFESSIONAL SUMMARY:
Results-driven Senior Data Engineer with 7+ years of experience in customer data analytics, cloud computing, and business intelligence, leveraging SQL, Python, and Databricks to drive data-driven decision-making in the e-commerce, finance, and technology industries.
Expertise in cloud-native solutions using Azure (Azure Data Factory, Azure Synapse Analytics, Azure Databricks, Azure Data Lake Storage, Azure Event Hubs, Azure Stream Analytics), Google Cloud (BigQuery, Dataflow, Dataproc, Cloud Storage, Cloud Composer, Pub/Sub), and AWS (S3, Redshift, Glue, Lambda) for scalable data warehousing and real-time analytics.
Proficient in distributed data processing with Apache Spark, PySpark, Hadoop, Kafka, and Delta Lake, optimizing ETL workflows for high-volume, real-time data processing across Azure, AWS, and GCP.
Designed and implemented enterprise data lakes on Azure (ADLS, Synapse), GCP (BigQuery, GCS), and AWS (S3, Redshift), migrating on-premises workloads to the cloud, improving cost efficiency, performance, and compliance with HIPAA, PCI-DSS, and GDPR standards.
Developed and optimized real-time streaming pipelines using Azure Event Hubs, Azure Stream Analytics, Apache Kafka, Google Cloud Pub/Sub, and Spark Streaming, enabling fraud detection, underwriting analysis, and customer insights.
Built scalable data models and data warehouses (Star/Snowflake schemas) in Azure Synapse Analytics, BigQuery, Redshift, and Snowflake, supporting financial reporting, risk management, and marketing analytics.
Implemented CI/CD automation for data pipelines using Azure DevOps, Terraform, Jenkins, GitHub Actions, and Google Cloud Build, reducing deployment time and operational overhead.
Strong experience in orchestration and workflow automation using Apache Airflow (Cloud Composer), Azure Data Factory (ADF), AWS Step Functions, and Kubernetes, ensuring efficient data pipeline execution and monitoring (a minimal Airflow sketch follows this summary).
Developed interactive dashboards and BI solutions using Power BI, Tableau, Looker, and Google Data Studio, transforming raw data into actionable business insights.
Hands-on experience with database design and optimization, including MongoDB, Azure SQL Database, Cloud SQL, PostgreSQL, and MS SQL Server, ensuring high-performance and scalable storage solutions.
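For illustration, a minimal Airflow DAG of the kind referenced in the orchestration bullet above could be sketched as follows; the DAG id, task names, and schedule are hypothetical placeholders rather than production code.

```python
# Minimal Airflow DAG sketch: a daily ingest -> transform -> load sequence.
# DAG id, task names, and the callables are illustrative placeholders.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def ingest_raw(**_):
    # Placeholder: pull raw files from a landing zone (ADLS, GCS, or S3).
    print("ingesting raw data")


def transform_clean(**_):
    # Placeholder: trigger the cleansing/normalization job (Databricks, Dataflow, etc.).
    print("transforming data")


def load_warehouse(**_):
    # Placeholder: load curated output into the warehouse (Synapse, BigQuery, Redshift).
    print("loading warehouse tables")


with DAG(
    dag_id="daily_customer_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    ingest = PythonOperator(task_id="ingest_raw", python_callable=ingest_raw)
    transform = PythonOperator(task_id="transform_clean", python_callable=transform_clean)
    load = PythonOperator(task_id="load_warehouse", python_callable=load_warehouse)

    ingest >> transform >> load
```

In practice each task would invoke the relevant ADF, Databricks, or Dataflow job rather than a print statement.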

EDUCATION:
Master of Science, Computer Science, Fitchburg State University, Fitchburg, USA
Bachelor of Technology, Mechanical Engineering, S.R Engineering College, Telangana, India
TECHNICAL SKILLS:
Big Data Processing: Apache Spark, PySpark, Hadoop, Apache Beam, Kafka, Delta Lake
Cloud Platforms: Azure (Data Factory, Synapse Analytics, Databricks, Data Lake Storage, Event Hubs, Stream Analytics), AWS (S3, Glue, Redshift, Lambda), GCP (BigQuery, Dataproc, Dataflow, Cloud Composer, Cloud Storage, Cloud Functions, Cloud Pub/Sub, Cloud Spanner, Google Kubernetes Engine)
ETL & Data Warehousing: Google Cloud Dataflow, Apache Beam, Snowflake, Informatica, Hive, SQL-based solutions, Teradata, PostgreSQL, MS SQL Server, Oracle, MongoDB
Programming & Scripting: Python (Pandas, NumPy, SciPy, Matplotlib, Scikit-learn, TensorFlow, PyTorch), Scala, SQL, Bash, Java, C++
CI/CD & DevOps: Jenkins, Terraform, Git, Airflow, Docker, Kubernetes, Google Cloud Build, Cloud Deployment Manager, Cloud Source Repositories
Data Visualization & BI: Power BI, Tableau, Looker, Google Data Studio, Google Analytics
Machine Learning & AI: Databricks ML, TensorFlow, Scikit-learn, MLflow, PyTorch, Keras
Data Security & Compliance: Cloud IAM, Google Cloud DLP, Cloud Security Command Center, GDPR, HIPAA, PCI-DSS, SOX
Development Tools: IntelliJ, Eclipse, NetBeans, Hue, Microsoft Office
Operating Systems: Linux, Unix, Windows, Mac OS

PROFESSIONAL EXPERIENCE:
Senior Azure Data Engineer | Citi Bank, Connecticut | Apr 2024 - Present
Project: Enterprise Data Modernization & Predictive Analytics
Designed and implemented scalable, cloud-native data lakes on Azure (ADLS, Synapse, ADF) and Google Cloud (BigQuery, GCS, Dataflow, Dataproc) to ingest, store, and process structured and semi-structured customs and logistics data from global customs systems, shipping records, regulatory filings, and third-party trade platforms.
Integrated MongoDB Atlas for flexible document storage of semi-structured trade documents, customs declarations, and shipment metadata, enabling schema evolution and high-performance queries.
Developed robust ETL pipelines using Azure Databricks (PySpark, Spark SQL) and Apache Beam on Google Dataflow to cleanse, transform, and normalize customs datasets, including declarations, tariff codes, invoice data, and trade documentation, with MongoDB as a sink for processed JSON/BSON data (a minimal PySpark sketch follows this section).
Implemented lineage and audit trails for customs transactions using Azure Purview and MongoDB Change Streams to ensure traceability and transparency in data pipelines.
Enabled intelligent tracking and automation of customs clearance processes by integrating disparate data sources (e.g., customs brokers, port authorities, logistics providers) into a unified analytics layer, with MongoDB aggregations supporting real-time analytics.
Collaborated with compliance officers to define validation rules, identify anomalies in trade declarations, and implement business logic for regulatory checks, storing flagged cases in MongoDB for quick retrieval and investigation.
Designed real-time streaming pipelines using Azure Event Hubs, Azure Stream Analytics, and Google Cloud Pub/Sub to monitor transactional customs data for unusual patterns, with MongoDB time-series collections for efficient trend analysis.
Developed predictive models using BigQuery ML, Azure ML, and Spark ML to flag high-risk shipments, storing model outputs in MongoDB for low-latency risk scoring APIs.
Created interactive dashboards in Power BI and Looker, sourcing from MongoDB Atlas SQL Interface alongside traditional data warehouses for real-time insights.
Established data governance frameworks aligned with global trade compliance standards, leveraging Azure Purview, Google IAM, and Azure RBAC, with MongoDB field-level encryption for sensitive trade data.
Ensured compliance with GDPR and customs retention policies using MongoDB TTL indexes for automated data expiry and client-side encryption for PII protection.
Migrated legacy customs workloads from on-prem and AWS (S3, Redshift, Athena) to Azure and GCP, using MongoDB Atlas Live Migration for seamless transition of document-based datasets.
Deployed containerized ETL workloads on AKS and GKE, with MongoDB Kubernetes Operators for scalable, self-healing document storage.
Built monitoring dashboards using Azure Monitor and Cloud Logging, tracking MongoDB query performance and operational metrics alongside pipeline health.
Designed data quality pipelines with rule-based checks, validating customs data before storage in MongoDB collections with schema validation rules.
Environment: Azure Synapse Analytics, ADLS, ADF, Azure Databricks, PySpark, SQL, Apache Beam, Dataflow, Google BigQuery, GCS, MongoDB Atlas, MongoDB Change Streams, MongoDB Aggregation Framework, AWS Redshift, Athena, Azure Stream Analytics, Azure Event Hubs, Azure Functions, BigQuery ML, Azure ML, Spark ML, TensorFlow, Power BI, Looker, Google Data Studio, Azure Kubernetes Service (AKS), Google Kubernetes Engine (GKE), Terraform, Azure Purview, Google IAM, Azure RBAC, Azure Monitor, Data Quality Pipelines
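To illustrate the Databricks/PySpark cleansing pattern described in the ETL bullet above, here is a minimal sketch; the lake paths, column names, and MongoDB database/collection names are hypothetical, and the MongoDB sink assumes the MongoDB Spark Connector (v10+, which registers the "mongodb" data source) is available on the cluster.

```python
# Minimal PySpark sketch: cleanse semi-structured customs declarations and write
# curated records to Delta and, optionally, MongoDB. Paths, column names, and the
# database/collection names are hypothetical placeholders.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("customs_declarations_etl").getOrCreate()

# Read raw JSON declarations landed in the data lake (placeholder ADLS path).
raw = spark.read.json("abfss://raw@datalake.dfs.core.windows.net/customs/declarations/")

# Cleanse and normalize: trim identifiers, standardize tariff codes, cast amounts,
# and drop rows that cannot be used downstream.
curated = (
    raw.withColumn("declaration_id", F.trim(F.col("declaration_id")))
    .withColumn("tariff_code", F.upper(F.trim(F.col("tariff_code"))))
    .withColumn("declared_value", F.col("declared_value").cast("double"))
    .filter(F.col("declaration_id").isNotNull() & (F.col("declared_value") > 0))
)

# Persist the curated output as a Delta table for warehouse consumers.
curated.write.format("delta").mode("append").save(
    "abfss://curated@datalake.dfs.core.windows.net/customs/declarations/"
)

# Sink the same records to MongoDB via the MongoDB Spark Connector.
(
    curated.write.format("mongodb")
    .mode("append")
    .option("database", "customs")
    .option("collection", "declarations_curated")
    .save()
)
```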
Senior Data Analyst | Wayfair, Boston, MA | Feb 2023 - Mar 2024
Project: Cloud-Based Customer Analytics & Data Pipeline Optimization
Extracted and processed large-scale customer and transactional data from cloud databases (AWS Redshift, GCP BigQuery, Azure Synapse Analytics) and MongoDB Atlas for semi-structured behavioral data (clickstreams, session logs, product interactions).
Developed and optimized SQL and Python-based ETL pipelines using Databricks, Apache Spark, and PySpark, with MongoDB Spark Connector for efficient JSON data processing.
Designed predictive models (clustering, regression, anomaly detection) and market basket analysis, storing model outputs in MongoDB for real-time recommendation APIs.
Automated data cleansing and transformation using NumPy, SciPy, and Azure Functions, with MongoDB aggregation pipelines for on-the-fly data enrichment (a minimal aggregation sketch follows this section).
Built interactive dashboards in Power BI, Tableau, and Looker, sourcing from MongoDB Atlas SQL Interface alongside cloud warehouses for unified reporting.
Collaborated with Marketing, Product, and Supply Chain teams to analyze MongoDB-stored user engagement data, improving retention strategies.
Implemented CI/CD pipelines (Jenkins, GitHub, Terraform) with MongoDB Atlas automation for schema versioning and deployment syncs.
Managed data security and IAM roles across AWS, GCP, and Azure, extending governance to MongoDB field-level encryption for PII protection.
Led cloud migration initiatives, transitioning on-prem analytics to AWS/GCP/Azure, with MongoDB Atlas Live Migration for document-based datasets.
Environment: AWS Redshift, GCP BigQuery, Azure Synapse Analytics, MongoDB Atlas, MongoDB Aggregation Framework, ADLS, Delta Lake, Apache Spark, PySpark, SQL, Databricks, ADF, Azure Stream Analytics, Kafka, Python (NumPy, SciPy), Predictive Modeling (Regression, Clustering, Anomaly Detection), Power BI, Tableau, Looker, Airflow, Jenkins, Terraform, GitHub, Azure DevOps
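As a sketch of the MongoDB aggregation-based enrichment mentioned above, the following pymongo pipeline rolls clickstream events up to per-session engagement metrics; the connection string, collection, and field names are hypothetical placeholders.

```python
# Minimal pymongo sketch: roll clickstream events up to per-session engagement
# metrics with the aggregation framework. Connection string, collection, and
# field names are hypothetical placeholders.
from pymongo import MongoClient

# An Atlas SRV connection string would normally be supplied here.
client = MongoClient("mongodb://localhost:27017")
events = client["analytics"]["clickstream_events"]

pipeline = [
    # Keep only the event types relevant to conversion analysis (illustrative filter).
    {"$match": {"event_type": {"$in": ["product_view", "add_to_cart"]}}},
    # One output document per session, counting views and cart adds.
    {"$group": {
        "_id": "$session_id",
        "customer_id": {"$first": "$customer_id"},
        "views": {"$sum": {"$cond": [{"$eq": ["$event_type", "product_view"]}, 1, 0]}},
        "cart_adds": {"$sum": {"$cond": [{"$eq": ["$event_type", "add_to_cart"]}, 1, 0]}},
    }},
    # Derive a simple view-to-cart rate for dashboards.
    {"$addFields": {
        "cart_rate": {
            "$cond": [{"$gt": ["$views", 0]}, {"$divide": ["$cart_adds", "$views"]}, 0]
        }
    }},
    {"$sort": {"cart_adds": -1}},
    {"$limit": 100},
]

for doc in events.aggregate(pipeline):
    print(doc["_id"], doc["views"], doc["cart_adds"], round(doc["cart_rate"], 3))
```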
Data Analyst | Infosys, India | Jun 2019 - Jul 2022
Project: Enterprise Data Warehouse & Business Intelligence
Designed and implemented an enterprise data warehouse using SQL Server, enabling centralized data management and reporting for business stakeholders.
Developed optimized stored procedures, functions, and views in T-SQL, improving data retrieval efficiency for analytical workloads.
Created and maintained DTS packages for ETL processes, ensuring seamless data transfers between SQL Server and external databases.
Automated report generation and data processing through SQL Server Agent jobs, reducing manual intervention.
Designed and optimized complex SQL queries for business intelligence, enabling faster insights from large datasets.
Developed relational database models aligned with business requirements, ensuring scalability and performance.
Led CRM system enhancements, integrating customer data analytics to improve reporting accuracy.
Created custom reports and dashboards using SSRS, Excel, and Lotus Notes, empowering stakeholders with actionable insights.
Conducted User Acceptance Testing (UAT) to validate report modifications and ensure data integrity.
Wrote and optimized T-SQL scripts, triggers, and stored procedures, enhancing query performance and execution speed.
Managed indexing strategies to improve database responsiveness for analytical queries.
Environment: MSSQL Server 2000/2005, MySQL, MS Access, SSIS (DTS), T-SQL, SSRS, Excel (PivotTables, Advanced Formulas), Lotus Notes, SQL Server Enterprise Manager, SQL Server Agent
Software Engineer | Prahem Technologies, India | Apr 2017 - May 2019
Project: Large-Scale Data Processing & Log Analysis
Engineered high-performance MapReduce programs (Java) for petabyte-scale log processing, reducing processing time through algorithmic optimizations and parallel computing techniques (a streaming-style Python sketch of this pattern follows this section)
Designed efficient Hive data models with partitioning and bucketing strategies, accelerating query performance for analytical workloads
Implemented robust ETL pipelines using Sqoop for bi-directional data transfer between HDFS and RDBMS systems
Developed schema designs and access patterns for HBase to enable low-latency queries on massive datasets
Conducted comprehensive benchmarking of Hadoop ecosystem components, identifying and resolving performance bottlenecks
Optimized YARN resource allocation and HDFS block sizing to maximize cluster throughput
Configured high-availability setups for critical Hadoop/HBase deployments with failover mechanisms
Automated cluster monitoring and alerting for key performance metrics
Built custom log parsing and analysis frameworks to extract business insights from unstructured server logs
Implemented data quality checks and validation routines within the processing pipeline
Developed data transformation workflows to prepare raw logs for downstream analytics
Collaborated with analytics teams to implement efficient data access patterns
Environment: Hadoop Ecosystem (HDFS, YARN, MapReduce), Hive, HBase, Sqoop, Java 8, SQL, Linux, Shell Scripting
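The MapReduce jobs in this role were written in Java; as a language-consistent sketch of the same log-counting pattern, below is a minimal Hadoop Streaming-style mapper and reducer in Python. The log layout (HTTP status code in the ninth space-delimited field) is an assumption.

```python
#!/usr/bin/env python3
# Minimal Hadoop Streaming-style sketch of the log-analysis pattern: the mapper
# emits (status_code, 1) for each access-log line, the reducer sums the counts.
# The mapper or reducer role is selected by the first command-line argument.
# The log layout (HTTP status in field 9 of a space-delimited line) is an assumption.
import sys


def mapper():
    for line in sys.stdin:
        fields = line.split()
        if len(fields) > 8:
            status = fields[8]  # e.g. "200", "404", "500"
            print(f"{status}\t1")


def reducer():
    # Hadoop Streaming delivers mapper output sorted by key, so counts can be
    # accumulated until the key changes.
    current_key, total = None, 0
    for line in sys.stdin:
        key, _, value = line.rstrip("\n").partition("\t")
        if key != current_key:
            if current_key is not None:
                print(f"{current_key}\t{total}")
            current_key, total = key, 0
        total += int(value or 0)
    if current_key is not None:
        print(f"{current_key}\t{total}")


if __name__ == "__main__":
    mapper() if len(sys.argv) > 1 and sys.argv[1] == "map" else reducer()
```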