Jyoshna Chennoju - Senior Data Engineer
[email protected]
Location: Chicago, Illinois, USA
Relocation: Yes
Visa:
Resume file: JYOSHNA CHENNOJU_1744820912227.docx
Jyoshna Chennoju
Senior Azure Data Engineer
Phone: +1 (312) 498-4031
Mail: [email protected]
LinkedIn: www.linkedin.com/in/jyoshna-c-979a64147

Professional Summary:
- 10+ years of experience architecting and delivering enterprise-grade Data Engineering and Data Warehousing solutions, with deep expertise in Azure, AWS, and Big Data ecosystems.
- Specialized in designing and implementing robust, scalable, and secure end-to-end ETL/ELT pipelines across Microsoft Azure and AWS platforms.
- Hands-on expertise in orchestrating complex data workflows using Azure Data Factory, Databricks, Synapse Analytics, Snowflake, AWS Glue, EMR, and Redshift, while leveraging Azure components such as Key Vault, Integration Runtime, Data Flows, Linked Services, Logic Apps, Function Apps, and Blob Storage for seamless integration and automation.
- Extensive experience in Delta Lake architecture and PySpark transformations, processing over 200 GB of daily data in Amazon EMR, with a focus on performance, scalability, and cost-efficiency.
- Migrated data from on-premises SQL databases to Azure Data Lake using ADF, and ingested data into Azure Blob Storage from applications, websites, and on-prem systems to optimize downstream analytics.
- Developed and deployed Azure Function Apps for serverless data workflows, automating tasks triggered by events such as data processing, API handling, or changes in Azure services.
- Enhanced application security using Azure Key Vault to securely manage secrets, certificates, and credentials across services.
- Skilled in leveraging Azure Data Flow for efficient, no-code/low-code ETL transformations in cloud-based data integration scenarios.
- Expert in designing scalable, cloud-native data intelligence solutions powered by Azure Synapse Analytics, supporting large-volume processing and advanced analytics.
- Leveraged Snowflake for high-performance data warehousing, optimizing data storage, compute, and analytics workloads for business intelligence use cases.
- Expertise in Snowflake, DBT, Airflow, and SQL, with a solid foundation in data modeling, metadata analysis, and pipeline observability.
- Designed real-time streaming pipelines using Amazon Kinesis (Data Streams & Firehose) and Azure Event Hubs, enabling low-latency analytics and improved data freshness.
- Proficient in real-time data streaming and ingestion using Kafka, Event Hubs, Spark Streaming, and Azure Functions, supporting event-driven architecture patterns.
- Automated infrastructure provisioning and management using Infrastructure as Code (IaC) principles, improving system scalability and reducing manual effort.
- Adept at infrastructure automation, cluster optimization, and serverless orchestration using tools such as Azure Functions, AWS Lambda, and Terraform.
- Configured and managed components within the Hadoop ecosystem, including HDFS and YARN, ensuring distributed and efficient data processing.
- Handled real-time data workflows using Spark Streaming, Oozie, and Zookeeper, enabling distributed coordination and scheduled job execution in big data pipelines.
- Implemented Sqoop to extract and load data from Oracle, MySQL, and SQL Server into HDFS, enabling smooth transitions into the Hadoop ecosystem.
- Designed and implemented staging layers in ETL pipelines to optimize data transformations before warehousing and reporting.
- Skilled in Redshift Spectrum, DynamoDB, Amazon S3, and Amazon RDS, integrating data lakes and warehouses for federated querying and Star Schema design.
- Strong programming skills in Python, Scala, and SQL, used to develop scalable data pipelines, automate transformations, and reduce manual intervention.
- Developed and maintained SSIS packages for efficient ETL processing, incorporating best practices to enhance performance and ensure data integrity.
- Proficient in T-SQL, PostgreSQL, and PL/SQL, building optimized stored procedures, triggers, and incremental loading logic to support high-throughput ETL workflows.
- Integrated PostgreSQL with ADF pipelines using JDBC and Linked Services, enabling seamless cloud ingestion and data movement workflows.
- Performed query tuning and index optimization in PostgreSQL, enhancing performance of downstream workloads by over 35%.
- Extensive use of Azure DevOps for automating CI/CD workflows and deploying secure, scalable data pipelines across development and production environments.
- Collaborated using Git/GitHub for version control of SSIS packages and data engineering projects, enabling agile, test-driven deployment practices.
- Designed and optimized complex SQL workflows and procedures in both T-SQL and PostgreSQL, enabling incremental processing, change tracking, and high-performance data pipelines.

Education:
Bachelor of Computer Science (JNTUH University), May 2013

Certifications:
Microsoft Certified: Azure Data Engineer Associate (DP-203)
Databricks Certified Data Engineer Professional

Technical Skills:
Azure Services: Azure Data Factory (ADF), Azure Synapse Analytics, Azure Databricks, Azure Data Lake Storage (Gen1 & Gen2), Azure Blob Storage, Azure SQL DB/DW, Azure Cosmos DB, Azure Functions, Azure Logic Apps, Azure Event Hubs, Azure Monitor, Azure Purview, Azure Key Vault, Azure Notification Hubs, Azure DMS, Azure Machine Learning, Azure Entra ID (AAD), Azure HDInsight
AWS Services: AWS Glue, Redshift, Redshift Spectrum, EMR, Lambda, RDS, S3, Kinesis, DynamoDB, Secrets Manager, IAM, CloudWatch, QuickSight, Step Functions, CodePipeline
Big Data Technologies: Apache Spark (Core, SQL, Streaming), Delta Lake, PySpark, Spark on Kubernetes, Hive, HQL, MapReduce, Sqoop, Flume, Kafka (Apache & Confluent), HBase, Zookeeper, NiFi, StreamSets, Oozie, YARN, HDFS
Databases and Modeling: MS SQL Server, Azure SQL DB, Oracle, PostgreSQL, MongoDB, Cosmos DB, Cassandra, IBM DB2, Star Schema, Snowflake Schema, Dimensional Modeling, Kimball, Inmon, Partitioning, Clustering, Materialized Views
Languages: Python, PySpark, SQL, T-SQL, PL/SQL, Scala, Shell Script, Java, HiveQL, Apex, Salesforce APIs (REST/SOAP), Pandas
BI & Visualization Tools: Power BI, Tableau, QlikView, Alteryx, Microsoft Fabric
DevOps & CI/CD: Git, GitHub, GitLab, Bitbucket, Jenkins, Azure DevOps, Terraform, Docker, Kubernetes, Helm, GitHub Actions, YAML, Bicep, CloudFormation
IDE & Build Tools, Design: PyCharm, Visual Studio, Eclipse, SSMS, SSIS, SSAS, SSRS, Maven, Ant
Methodologies & Workflow Orchestration: Agile, Scrum, CI/CD, Airflow, Oozie, Prefect, Azure Pipelines, ADF Pipelines, Synapse Pipelines

Work Experience

Role: Azure Data Engineer, July 2022 - Present
Client: Huntington Bank, Columbus, Ohio
Responsibilities:
- Designed and implemented robust and scalable data ingestion pipelines using Azure Data Factory (ADF) to extract data from heterogeneous sources including PostgreSQL, SQL Server, CSV, and REST APIs, standardizing it into a structured format for downstream processing and analytics.
- Orchestrated large-scale ETL workflows using ADF, Snowflake, and Azure Data Lake, enabling seamless data movement while implementing governance, validation, and lineage tracking across pipelines.
- Developed and optimized AWS Glue, Lambda, and Redshift-based data pipelines for processing high-volume financial datasets, enabling batch and near real-time analytics on customer behavior and loan transactions.
- Built and deployed scalable ELT pipelines using DBT, Snowflake, and Apache Airflow, transforming raw data into curated models and enabling seamless reporting for business stakeholders.
- Implemented Master Data Management (MDM) strategies to standardize customer, product, and account data across multiple data sources, improving data consistency and integrity across enterprise systems.
- Participated in MDM implementation for healthcare client domains, enabling accurate provider-patient mapping, location hierarchies, and taxonomy standardization.
- Ensured data completeness and accuracy in MDM systems by building exception handling, anomaly detection, and alerting workflows in ADF and Logic Apps.
- Created reusable DBT macros, custom models, and version-controlled artifacts integrated with GitHub, enhancing code reusability, pipeline modularity, and lineage tracking.
- Automated CRM data ingestion and processing workflows into ADLS Gen2, reducing manual effort and ensuring unified views for marketing, support, and product analytics teams.
- Integrated Amazon S3 with Redshift Spectrum to enable federated querying across structured and semi-structured datasets, optimizing insights delivery from hybrid data lake environments.
- Utilized Azure Logic Apps and Function Apps to automate daily jobs and alerting processes, orchestrating ADF pipelines based on event-based triggers.
- Developed RESTful APIs using Flask and Python, enabling programmatic access and ingestion of datasets from partner APIs, third-party services, and application logs into the central data platform.
- Established CI/CD practices using AWS CodePipeline, Terraform, and Azure DevOps, streamlining infrastructure provisioning, ETL deployment workflows, and release automation across Azure and AWS platforms.
- Migrated legacy on-prem Oracle workloads to Azure Synapse, optimizing data partitioning and transformation logic for cloud-native processing.
- Implemented enterprise-grade Snowflake environments on Azure, designing optimized schemas, clustering strategies, and secure data sharing between business domains.
- Configured real-time monitoring and alerting dashboards using Amazon CloudWatch to track performance of Redshift and Lambda jobs, improving SLA compliance and operational reliability.
- Ensured pipeline health and 99.99% availability by implementing advanced monitoring and alerting mechanisms using Azure Monitor and Splunk, enabling proactive resolution of pipeline failures and SLA violations.
- Applied best practices in Delta Lake architecture, file compaction, and Z-ordering within Azure Databricks to ensure ACID compliance, reduce storage costs, and improve read performance.
- Designed highly scalable data models in ADLS and Snowflake, leveraging hierarchical namespace features, partitioned storage, and external tables to support interactive analytics.
- Ingested and processed Salesforce and CRM data into Azure Data Lake, supporting dynamic customer segmentation and boosting campaign personalization metrics by 20%.
- Collaborated with business analysts to transform curated datasets into business-friendly outputs, leveraging Alteryx for data wrangling, cleansing, and lightweight automation of marketing analytics workflows.
- Developed real-time Power BI dashboards to visualize customer segmentation, loan behavior, and marketing funnel data, providing actionable insights to sales and product teams.
- Applied Lakehouse Architecture principles with Unity Catalog, centralizing data governance and enabling schema enforcement, access control, and metadata management across departments.
- Tuned performance of analytical queries by applying columnar partitioning, materialized views, caching, and SQL tuning across Snowflake and Synapse.
- Designed and enforced data lifecycle strategies including partitioning, compression, and retention policies in ADLS, significantly optimizing storage performance and reducing cloud expenditure.
- Enhanced customer data reliability through data profiling, anomaly detection, and custom data validation scripts, significantly reducing rework and manual intervention.
- Enforced strict role-based access control (RBAC) and encryption policies in Azure Data Lake, Key Vault, and Secrets Manager, ensuring regulatory compliance and a zero-trust architecture.
- Implemented regulatory-compliant data encryption and retention policies using Azure Blob Storage, ADLS, and Key Vault, aligning with SOX, CCAR, and GDPR standards while reducing storage cost by 25%.
- Conducted performance benchmarking and resource capacity planning for Azure Synapse, Databricks, and AWS Redshift workloads, improving cost-efficiency and system utilization.
- Handled high-throughput real-time streaming data from Event Hub, Kafka, and Spark Streaming, enabling timely decisions in customer experience, marketing, and fraud prevention.
- Designed and implemented DevOps automation pipelines using Terraform, GitHub, and Azure DevOps, maintaining code quality, versioning, and continuous integration across ETL jobs.
- Utilized Apache Oozie alongside Airflow and DBT to orchestrate and schedule Spark and Hive workflows, enabling seamless execution of cross-platform ETL jobs across hybrid infrastructure.
- Collaborated closely with cross-functional teams including data analysts, QA engineers, and business stakeholders to ensure platform alignment with enterprise objectives.
- Leveraged hybrid cloud infrastructure (IaaS, PaaS, SaaS) using AWS and Azure for scalable, cost-efficient deployment of critical analytics workloads.

Environment: Azure Data Factory (ADF), Azure Synapse Analytics, Databricks, Delta Lake, Azure Data Lake Storage (ADLS Gen2), SQL Database, Blob Storage, Functions, Logic Apps, Azure Monitor, Key Vault, Apache Airflow, Apache Oozie, Apache Spark, Spark SQL, PySpark, Spark Streaming, Apache Kafka, Event Hubs, Snowflake, DBT, REST APIs, Flask, PostgreSQL, Oracle, SQL Server, Power BI, Unity Catalog, Splunk, Terraform, GitHub, Azure DevOps, Python, SQL, Scala, Shell Scripting

Role: Snowflake Data Engineer, July 2020 - June 2022
Client: Elevance Health, Houston, TX
Responsibilities:
- Led the migration of legacy ETL pipelines from on-prem SQL environments to Azure Data Lake Storage Gen2, Azure SQL Database, and Azure Synapse Analytics, improving performance, scalability, and downstream analytics readiness.
- Designed and implemented scalable data integration pipelines using Azure Data Factory (ADF), leveraging activities such as Lookup, ForEach, Wait, Execute Pipeline, and Set Variable to orchestrate multi-stage workflows across cloud and on-prem systems.
- Developed robust real-time ingestion pipelines using Apache Kafka and Azure Event Hubs, integrating streaming data into Delta Lake on Azure Data Lake to support real-time analytics and operational intelligence.
- Built scalable ETL frameworks in Azure Databricks using PySpark and Spark SQL to process structured and semi-structured data from PostgreSQL, SQL Server, and Azure Blob Storage, enabling advanced analytics across business domains.
- Designed and maintained Delta Lake architectures with ACID transactions, supporting versioning, schema evolution, and time-travel features to enhance the reliability and traceability of data pipelines.
- Engineered a real-time StreamSets pipeline to capture incremental changes from PostgreSQL and publish them to Confluent Kafka topics, which were then consumed and written into ADLS Gen2, ensuring fault-tolerant and scalable ingestion.
- Utilized Azure Function Apps and Logic Apps to automate monitoring, logging, and alerts across ETL pipelines, increasing transparency and enabling proactive issue resolution.
- Designed event-driven microservices using Azure Functions, triggered by Event Hub and Blob Storage changes to enrich, validate, and transform incoming data streams in real time.
- Created 20+ T-SQL and PL/SQL stored procedures and user-defined functions (UDFs) to handle complex transformations, aggregations, and business logic as part of the Azure ELT process.
- Enabled predictive analytics by embedding MLlib-based machine learning models in Databricks pipelines, automating churn prediction and behavior segmentation across 50M+ records.
- Configured Spark Streaming applications for real-time data processing using Kafka sources, with windowing and watermarking logic to ensure accurate event-time aggregations.
- Integrated Azure IoT Edge modules to process sensor and device data locally, reducing bandwidth, lowering latency, and enhancing data security before syncing with cloud storage.
- Applied partitioning, bucketing, and map-side joins within Hive and Spark SQL to optimize performance of distributed joins and aggregation queries across large datasets.
- Built Apache Airflow DAGs to automate batch and real-time ETL workflows, coordinating Databricks notebooks, ADF triggers, and external API calls into a unified orchestration layer.
- Deployed and maintained distributed workloads using Azure HDInsight and Apache Spark, validating performance benchmarks and conducting proof-of-concepts for hybrid architectures.
- Leveraged PowerApps integration to automate workflows for internal teams, enabling self-service access to real-time data and reducing dependencies on manual data pull requests.
- Applied Azure Synapse Spark Pools to conduct big data processing and in-memory joins across 100+ GB datasets, optimizing workloads using parallelism and caching.
- Conducted advanced statistical analysis and visual storytelling using RStudio, ggplot2, and Matplotlib to detect trends, outliers, and patterns in healthcare data.
- Created custom Hive UDFs to embed dynamic policy logic and business rules directly into ETL workflows, improving flexibility and adaptability across varying product lines.
- Implemented role-based access control (RBAC) and Azure Key Vault for secure secrets management, ensuring compliance with internal data governance and HIPAA standards.
- Collaborated with offshore and onsite teams in Agile ceremonies, managing daily standups, backlog refinement, and sprint retrospectives to enhance delivery timelines and team coordination.
- Built Kafka-Spark-Hive pipelines for ingesting large log files, transforming them using Spark SQL, and analyzing system behavior patterns via Hive tables.
- Utilized Oozie workflows for job orchestration within Hadoop, enabling multi-step pipelines including ingestion, transformation, and loading into HDFS and Hive.
- Applied advanced data lineage and traceability using tools such as Azure Purview and Apache Atlas, ensuring auditability and compliance in critical data movement workflows.
- Implemented CI/CD pipelines using Azure DevOps and Terraform, automating deployment of ADF pipelines, Databricks notebooks, and infrastructure provisioning.
- Developed complex SQL logic to power downstream reporting in Power BI, applying DAX expressions and performance-optimized queries for highly responsive dashboards.
- Tuned and optimized Databricks Spark jobs by adjusting shuffle partitions, memory configurations, and caching strategies, improving pipeline throughput by 40%.
- Ensured high data quality and pipeline reliability by developing automated validation scripts in Python and integrating them into scheduled production jobs.
- Conducted index tuning and query optimization in PostgreSQL and SQL Server, resulting in performance gains across key business-critical workloads.
- Supported production environments 24/7 by implementing robust monitoring dashboards using Azure Monitor, Log Analytics, and custom Function Apps for alerting and SLA tracking.

Environment: Azure Data Factory (ADF), Azure Synapse Analytics, Databricks, Delta Lake, Azure Data Lake Storage (ADLS Gen2), SQL Database, Blob Storage, Functions, Logic Apps, Azure Monitor, Key Vault, Purview, Apache Kafka, Event Hubs, Apache Spark, Spark SQL, PySpark, Spark Streaming, Hive, HBase, Oozie, PostgreSQL, SQL Server, PL/SQL, T-SQL, RStudio, ggplot2, Matplotlib, Power BI, DAX, StreamSets, Confluent Kafka, HDInsight, PowerApps, Git, Azure DevOps, Terraform, CI/CD Pipelines, Apache Airflow, Apache Atlas, Python, Scala, Shell Scripting, REST APIs, MLlib

Role: Big Data Developer, September 2018 - June 2020
Client: American Express, Phoenix, AZ
Responsibilities:
- Designed and built end-to-end ETL pipelines using Azure Data Factory to orchestrate data movement and transformation across on-prem and cloud sources.
- Developed scalable Spark-based transformations in Azure Databricks using PySpark and Scala to process structured and semi-structured data.
- Implemented real-time ingestion using Azure Event Hubs and Azure Functions into Synapse Analytics, enabling near real-time analytics and timely insights.
- Migrated large on-prem databases to Azure SQL and Synapse using Azure DMS, ensuring zero data loss and minimal downtime.
- Developed Synapse-optimized schemas, views, and materialized queries to support complex business intelligence reporting.
- Integrated Azure Data Factory with Azure Logic Apps to trigger event-driven workflows and automate actions based on business rules.
- Used Azure Data Lake and Azure Blob Storage for raw and processed data storage, with compression, encryption, and partitioning strategies to optimize cost and performance.
- Tuned Spark configurations, caching, and partitioning in Databricks to reduce job runtime and cost by over 30%.
- Designed and implemented Delta Lake architecture in Databricks for reliable, scalable storage and ACID-compliant data lakes.
- Created and deployed Azure Functions for enrichment, validation, and preprocessing within serverless ETL pipelines.
- Built CDC-based data replication using Azure Data Factory to synchronize Synapse with operational systems, maintaining high availability.
- Enabled predictive analytics by integrating Azure Machine Learning models with Synapse workflows for advanced use cases.
- Implemented data cataloging and lineage tracking using Azure Purview and Apache Atlas, ensuring end-to-end data traceability.
- Developed OLAP dashboards and drill-down reports in Power BI, empowering business teams with self-service analytics from Synapse and Data Lake sources.
- Developed YAML-based deployment automation scripts to streamline ADF and Databricks deployment workflows, reducing human errors and improving release speed.
- Migrated and transformed RDBMS (Oracle) data into the Hadoop ecosystem using Sqoop for historical processing prior to cloud migration.
- Developed and ran Oozie workflows to orchestrate multi-step Hadoop jobs and logged execution metadata for performance tuning.
- Designed Spark-on-Kubernetes workflows for distributed high-performance analytics pipelines.
- Implemented data classification algorithms and batch analytics in Spark for customer segmentation and risk analysis use cases.
- Collaborated with data engineers to resolve JVM and Spark execution issues, improving cluster stability in Databricks.
- Utilized Git for version control and Jira for Agile project tracking and sprint planning, ensuring transparency and consistent delivery.
- Documented architecture diagrams, best practices, and lessons learned for future Azure migrations and platform improvements.

Environment: Azure Data Factory, Databricks, Synapse Analytics, Azure Functions, Azure DMS, Azure Data Lake, Azure Purview, Power BI, Event Hubs, Blob Storage, PySpark, Kafka, Hive, YAML, JIRA, Git, Spark, HDFS, Scala, Shell Scripting

Role: Data Warehouse Developer, September 2013 - October 2017
Client: Zensar Technologies, Hyderabad, India
Responsibilities:
- Experience in developing complex stored procedures, efficient triggers, and required functions, and creating indexes and indexed views for performance.
- Extensive experience in SQL Server performance monitoring and tuning.
- Expert in designing ETL data flows using SSIS, creating mappings/workflows to extract data from SQL Server, and performing data migration and transformation from Access/Excel sheets using SSIS.
- Efficient in dimensional data modeling for Data Mart design, identifying facts and dimensions, and developing fact and dimension tables using Slowly Changing Dimensions (SCD).
- Experience in error and event handling: precedence constraints, breakpoints, checkpoints, and logging.
- Experienced in building cubes and dimensions with different architectures and data sources for Business Intelligence, and in writing MDX scripting.
- Thorough knowledge of the features, structure, attributes, hierarchies, and Star and Snowflake schemas of Data Marts.
- Good working knowledge of developing SSAS cubes, aggregations, KPIs, measures, cube partitioning, and data mining models, and of deploying and processing SSAS objects.
- Proficient in NoSQL databases such as MongoDB and Cassandra for efficient handling and analysis of large-scale unstructured and semi-structured data in data warehouse environments.
- Proficient in SharePoint, leveraging expertise in managing and collaborating on documents, workflows, and team sites to enhance productivity and streamline business processes.
- Developed stored procedures and triggers to facilitate consistent data entry into the database.
- Shared data externally using Snowflake data sharing, enabling quick access to data without transferring it or developing new pipelines.
- Experience in creating ad hoc reports and reports with complex formulas, and in querying the database for Business Intelligence.
- Designed and supported dashboard components using QlikView and Power BI, enabling stakeholders to visualize KPIs and trends across financial services datasets.
- Expertise in developing parameterized, linked, drill-down, drill-through, and cascading reports, charts, graphs, dashboards, and scorecards on SSAS cubes using SSRS.
- Flexible, enthusiastic, and project-oriented team player with excellent written and verbal communication and leadership skills to develop creative solutions for challenging client needs.

Environment: MS SQL Server 2016, Visual Studio 2017/2019, SSIS, SharePoint, MS Access, Team Foundation Server, Git.