Enosh K - Data Engineer
[email protected]
Location: Remote, USA
Relocation: Yes
Visa: H1B
PROFESSIONAL SUMMARY:
- Results-driven Senior Data Engineer with 7+ years of experience in customer data analytics, cloud computing, and business intelligence, leveraging SQL, Python, and Databricks to drive data-driven decision-making in the e-commerce, finance, and technology industries.
- Expertise in cloud-native solutions on Azure (Data Factory, Synapse Analytics, Databricks, Data Lake Storage, Event Hubs, Stream Analytics), Google Cloud (BigQuery, Dataflow, Dataproc, Cloud Storage, Cloud Composer, Pub/Sub), and AWS (S3, Redshift, Glue, Lambda) for scalable data warehousing and real-time analytics.
- Proficient in distributed data processing with Apache Spark, PySpark, Hadoop, Kafka, and Delta Lake, optimizing ETL workflows for high-volume, real-time processing across Azure, AWS, and GCP.
- Designed and implemented enterprise data lakes on Azure (ADLS, Synapse), GCP (BigQuery, GCS), and AWS (S3, Redshift), migrating on-premises workloads to the cloud and improving cost efficiency, performance, and compliance with HIPAA, PCI-DSS, and GDPR.
- Developed and optimized real-time streaming pipelines using Azure Event Hubs, Azure Stream Analytics, Apache Kafka, Google Cloud Pub/Sub, and Spark Streaming, enabling fraud detection, underwriting analysis, and customer insights.
- Built scalable data models and data warehouses (star/snowflake schemas) in Azure Synapse Analytics, BigQuery, Redshift, and Snowflake, supporting financial reporting, risk management, and marketing analytics.
- Implemented CI/CD automation for data pipelines using Azure DevOps, Terraform, Jenkins, GitHub Actions, and Google Cloud Build, reducing deployment time and operational overhead.
- Strong experience in orchestration and workflow automation using Apache Airflow (Cloud Composer), Azure Data Factory (ADF), AWS Step Functions, and Kubernetes, ensuring efficient pipeline execution and monitoring.
- Developed interactive dashboards and BI solutions in Power BI, Tableau, Looker, and Google Data Studio, transforming raw data into actionable business insights.
- Hands-on experience with database design and optimization, including Azure SQL Database, Cloud SQL, PostgreSQL, and MS SQL Server, ensuring high-performance, scalable storage solutions.
EDUCATION:
Master of Science, Computer Science, Fitchburg State University, Fitchburg, USA
Bachelor of Technology, Mechanical Engineering, S.R. Engineering College, Telangana, India

TECHNICAL SKILLS:
Big Data Processing: Apache Spark, PySpark, Hadoop, Apache Beam, Kafka, Delta Lake
Cloud Platforms: AWS (S3, Glue, Redshift, Lambda); GCP (BigQuery, Dataproc, Dataflow, Cloud Composer, Cloud Storage, Cloud Functions, Cloud Pub/Sub, Cloud Spanner, Google Kubernetes Engine)
ETL & Data Warehousing: Google Cloud Dataflow, Apache Beam, Snowflake, Informatica, Hive, SQL-based solutions, Teradata, PostgreSQL, MS SQL Server, Oracle
Programming & Scripting: Python (Pandas, NumPy, SciPy, Matplotlib, Scikit-learn, TensorFlow, PyTorch), Scala, SQL, Bash, Java, C++
CI/CD & DevOps: Jenkins, Terraform, Git, Airflow, Docker, Kubernetes, Google Cloud Build, Cloud Deployment Manager, Cloud Source Repositories
Data Visualization & BI: Power BI, Tableau, Looker, Google Data Studio, Google Analytics
Machine Learning & AI: Databricks ML, TensorFlow, Scikit-learn, MLflow, PyTorch, Keras
Data Security & Compliance: Cloud IAM, Google Cloud DLP, Cloud Security Command Center, GDPR, HIPAA, PCI-DSS, SOX
Development Tools: IntelliJ, Eclipse, NetBeans, Hue, Microsoft Office
Operating Systems: Linux, Unix, Windows, macOS

PROFESSIONAL EXPERIENCE:

Senior Azure Data Engineer | Citi Bank, Connecticut | Apr 2024 - Present
Project: Enterprise Data Modernization & Predictive Analytics
- Designed and implemented scalable, cloud-native data lakes on Azure (ADLS, Synapse, ADF) and Google Cloud (BigQuery, GCS, Dataflow, Dataproc) to ingest, store, and process structured and semi-structured customs and logistics data from global customs systems, shipping records, regulatory filings, and third-party trade platforms.
- Integrated real-time customs and shipment tracking data through APIs and streaming pipelines to enable dynamic clearance-status monitoring and risk flagging.
- Developed robust ETL pipelines using Azure Databricks (PySpark, Spark SQL) and Apache Beam on Google Dataflow to cleanse, transform, and normalize customs datasets, including declarations, tariff codes, invoice data, and trade documentation (see the sketch after this list).
- Implemented lineage and audit trails for customs transactions to ensure traceability and transparency in data pipelines.
- Enabled intelligent tracking and automation of customs clearance processes by integrating disparate data sources (e.g., customs brokers, port authorities, logistics providers) into a unified analytics layer.
- Collaborated with compliance officers to define validation rules, identify anomalies in trade declarations, and implement business logic for regulatory checks and cross-border data validation.
- Designed real-time streaming pipelines using Azure Event Hubs, Azure Stream Analytics, and Google Cloud Pub/Sub to monitor transactional customs data for unusual patterns such as misclassified goods, duplicate entries, or value discrepancies.
- Developed predictive models using BigQuery ML, Azure ML, and Spark ML to flag high-risk shipments and potential fraud scenarios such as under-invoicing, smuggling, or improper routing.
- Created interactive dashboards and self-service analytics in Power BI and Looker to visualize customs clearance KPIs, risk scores, fraud alerts, and compliance exceptions, empowering customs teams with real-time insights and historical trends.
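For illustration, a cleansing and rule-check step of the kind described above might look like the following PySpark sketch. The path, table, and column names (customs_raw, hs_code, declared_value, declaration_id) are hypothetical placeholders, not the actual production schema.

```python
# Minimal PySpark sketch of a customs-declaration cleansing step, assuming a
# hypothetical Delta table with columns: declaration_id, hs_code,
# declared_value, declaration_date, clearance_date. Not the production schema.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.appName("customs-etl-sketch").getOrCreate()

raw = spark.read.format("delta").load("/mnt/adls/customs_raw")  # hypothetical path

cleansed = (
    raw
    # Normalize tariff codes: trim, upper-case, keep only 6-10 digit HS codes.
    .withColumn("hs_code", F.upper(F.trim(F.col("hs_code"))))
    .filter(F.col("hs_code").rlike(r"^\d{6,10}$"))
    # Rule-based checks: positive declared values, consistent dates.
    .filter((F.col("declared_value") > 0) &
            (F.col("clearance_date") >= F.col("declaration_date")))
    # Flag duplicate declarations for downstream anomaly review.
    .withColumn(
        "is_duplicate",
        F.count(F.lit(1)).over(Window.partitionBy("declaration_id")) > 1,
    )
)

cleansed.write.format("delta").mode("overwrite").save("/mnt/adls/customs_curated")
```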
Regulatory Compliance & Data Governance:
- Established data governance frameworks aligned with global trade and customs compliance standards, leveraging Azure Purview, Google IAM, and Azure RBAC for role-based access, metadata management, and regulatory auditability.
- Ensured compliance with trade data retention policies, GDPR, and customs authority reporting requirements through automated pipeline controls and encryption protocols.
- Migrated legacy customs data workloads from on-prem and AWS (S3, Redshift, Athena) to Azure and GCP using ADF, Synapse, Data Fusion, and Data Transfer Service, ensuring data consistency and high availability.
- Deployed containerized ETL workloads on AKS and GKE to handle large volumes of customs data, achieving scalable processing and fault tolerance.
- Built monitoring dashboards using Azure Monitor and Cloud Logging to track pipeline performance and detect data quality issues specific to customs declarations and documentation inconsistencies.
- Designed data quality pipelines to perform rule-based checks (e.g., HS code validation, value reconciliation, and date consistency), ensuring data accuracy before it reached downstream analytics layers.
Environment & Tools: Azure Synapse Analytics, ADLS, ADF, Azure Databricks, PySpark, SQL, Apache Beam, Dataflow, Google BigQuery, GCS, AWS Redshift, Athena, Azure Stream Analytics, Azure Event Hubs, Azure Functions, BigQuery ML, Azure ML, Spark ML, TensorFlow, Power BI, Looker, Google Data Studio, Azure Kubernetes Service (AKS), Google Kubernetes Engine (GKE), Terraform, Azure Purview, Google IAM, Azure RBAC, Azure Monitor, data quality pipelines

Senior Data Analyst | Wayfair, Boston, MA | Feb 2023 - Mar 2024
Project: Cloud-Based Customer Analytics & Data Pipeline Optimization
- Extracted and processed large-scale customer and transactional data from cloud databases (AWS Redshift, GCP BigQuery, Azure Synapse Analytics), enabling customer behavior analysis and strategic decision-making.
- Developed and optimized SQL- and Python-based ETL pipelines using Databricks, Apache Spark, and PySpark, improving data processing efficiency.
- Designed predictive models and descriptive analytics (e.g., clustering, regression, anomaly detection, market basket analysis) to provide actionable insights into customer behavior, sales performance, and retention strategies (see the segmentation sketch after this list).
- Automated data cleansing, transformation, and aggregation using NumPy, SciPy, SQLAlchemy, and Azure Functions, improving reporting accuracy.
- Created interactive dashboards and data visualizations in Power BI, Tableau, and Looker, simplifying complex data insights for non-technical stakeholders.
- Collaborated with cross-functional teams (Marketing, Product, Supply Chain) to define business questions, extract insights, and drive data-driven decision-making.
- Implemented CI/CD pipelines (Jenkins, GitHub, Terraform, Azure DevOps) to automate deployment and data pipeline orchestration.
- Managed data security, IAM roles, and access controls across AWS, GCP, and Azure, ensuring compliance with data governance standards.
- Led cloud migration initiatives, optimizing data storage and retrieval strategies while transitioning on-premises analytics to AWS, GCP, and Azure.
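As a sketch of the segmentation work referenced above, the following scikit-learn snippet clusters customers on a few illustrative features. The extract file name and feature columns are hypothetical, not the actual Wayfair feature set.

```python
# Illustrative customer-segmentation sketch with scikit-learn. The extract
# file and feature columns (order_count, avg_basket_value,
# days_since_last_order) are hypothetical stand-ins for the real feature set.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

features = ["order_count", "avg_basket_value", "days_since_last_order"]
df = pd.read_parquet("customer_features.parquet")  # hypothetical extract

# Standardize features so no single scale dominates the distance metric.
X = StandardScaler().fit_transform(df[features])

# k=5 is for illustration only; in practice k would be chosen via
# elbow or silhouette analysis.
kmeans = KMeans(n_clusters=5, n_init=10, random_state=42)
df["segment"] = kmeans.fit_predict(X)

# Profile each segment by its average feature values.
print(df.groupby("segment")[features].mean())
```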
Environment & Tools: AWS Redshift, GCP BigQuery, Azure Synapse Analytics, Azure Data Lake Storage (ADLS), Delta Lake, Apache Spark, PySpark, SQL, Databricks, Azure Data Factory (ADF), Azure Stream Analytics, Kafka, Python (NumPy, SciPy), Predictive Modeling (Regression, Clustering, Anomaly Detection), Power BI, Tableau, Looker, Airflow, Jenkins, Terraform, GitHub, Azure DevOps

Data Analyst | Infosys, India | Jun 2019 - Jul 2022
Project: Enterprise Data Warehouse & Business Intelligence
- Designed and implemented a data warehouse, enabling centralized data management and reporting.
- Developed stored procedures, functions, triggers, indexes, and views in T-SQL for SQL Server 2000/2005.
- Created and optimized DTS packages to facilitate ETL processes, transferring data between SQL Server and external databases.
- Managed job scheduling and execution in Enterprise Manager, automating report generation and data processing.
- Conducted User Acceptance Testing (UAT) to validate report modifications and data integrity.
- Designed and optimized complex SQL queries for business intelligence and reporting.
- Developed relational database models based on business requirements, ensuring scalability and performance.
- Led CRM system enhancements, managing customer data analytics and reporting.
- Created custom reports and dashboards using SSRS, Excel, and Lotus Notes, driving data-driven decision-making.
- Wrote and optimized T-SQL scripts, triggers, stored procedures, and cursors, improving query performance and execution time.
Environment: MS SQL Server 2000/2005, MySQL, Lotus Notes, MS Access, SSRS, SSIS, Excel, PowerPoint

Software Engineer | Prahem Technologies, India | Apr 2017 - May 2019
Project: Large-Scale Data Processing & Log Analysis
- Developed and optimized MapReduce programs for efficient large-scale data processing and log analysis (a PySpark rendering of this pattern follows below).
- Designed and implemented Hive tables, optimizing queries for faster data retrieval.
- Utilized Sqoop for seamless data transfer between HDFS and relational databases, improving ETL workflows.
- Benchmarked and fine-tuned Hadoop and HBase clusters to enhance performance and scalability.
- Developed Java-based MapReduce programs for log file analysis, enabling data-driven insights.
- Automated data ingestion with Sqoop, importing structured data into HDFS and Hive from multiple sources.
- Configured and optimized Hadoop/HBase clusters for internal enterprise applications, ensuring high availability and reliability.
Environment: Hadoop, MapReduce, HDFS, Hive, Sqoop, HBase, Java, SQL
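For flavor, the map/reduce log-analysis pattern from this role can be sketched in PySpark (used here for consistency with the earlier examples; the originals were Java MapReduce jobs). The input path and Apache common log format are assumptions.

```python
# Sketch of the log-analysis MapReduce pattern, rendered in PySpark for
# consistency with the earlier examples (the originals were Java MapReduce
# jobs). The input path and Apache common log format are assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("log-analysis-sketch").getOrCreate()

# Map: emit (status_code, 1) per request line; Reduce: sum counts per code.
status_counts = (
    spark.sparkContext.textFile("hdfs:///logs/access.log")
    .map(lambda line: line.split())
    .filter(lambda parts: len(parts) > 8)   # skip malformed/short lines
    .map(lambda parts: (parts[8], 1))       # field 8 = HTTP status code
    .reduceByKey(lambda a, b: a + b)
)

for status, count in status_counts.collect():
    print(status, count)
```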