Farhana Zaman Rozony
Sr. Data Engineer
Ph: 832-271-6671
Email: [email protected]
LinkedIn:
Location: Houston, Texas, USA | Relocation: Yes | Visa: GC-EAD

SUMMARY:
Over 9 years of hands-on experience in designing, developing, and implementing enterprise-level data engineering and ETL solutions across diverse industries.
Expertise in cloud platforms, including Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP), with a strong focus on data integration and cloud-native solutions.
Proficient in ETL development using Python, PySpark, SQL, and Unix shell scripting for automating data ingestion, transformation, and loading processes.
Extensive experience with Big Data technologies, including Apache Hadoop, Apache Spark, Hive, and Cassandra, to handle large-scale data processing and analytics.
Skilled in developing scalable and fault-tolerant data pipelines to support real-time and batch processing use cases.
Strong command of workflow automation and orchestration using tools like Apache Airflow, Jenkins, and Control-M to streamline end-to-end data processes.
Hands-on experience with modern data platforms such as Snowflake, Databricks, and Delta Lake for advanced analytics and high-performance computing.
Proficient in creating interactive dashboards and data visualizations using Tableau, Power BI, and SQL-based reporting tools to enable business insights and data-driven decision-making.
Deep understanding of data warehousing, data modeling (Star/Snowflake schemas), and data governance best practices.
Strong knowledge of Agile methodologies (Scrum, Kanban) and experience working in cross-functional teams to deliver high-quality data solutions on time.
Adept at troubleshooting and optimizing data systems for performance, scalability, and reliability in fast-paced production environments.
Proven ability to collaborate with stakeholders, data scientists, analysts, and DevOps teams to translate business requirements into robust technical solutions.
Education:
Bachelor's in Computer Science from Osmania University, India.

Certifications:
Certified in Microsoft Azure Fundamentals
Certified AWS Cloud Practitioner
Certified Database Programming with SQL from Oracle Academy


SKILLSET

Big Data Ecosystems: Hadoop, MapReduce, Spark, HDFS, HBase, Pig, Hive, Sqoop, Kafka, Cloudera, Hortonworks, Oozie, NiFi, and Airflow
Spark Technologies: Spark SQL, Spark DataFrames, and RDDs
Scripting Languages: Python and shell scripting
Programming Languages: Python, Scala, SQL, PL/SQL
Cloud Technologies: Azure, AWS (EMR, EC2, S3, Glue, Athena, Redshift), GCP, Docker
Databases: Oracle, MySQL, and Microsoft SQL Server
NoSQL Technologies: HBase, MongoDB, Cassandra, DynamoDB
BI Tools: Tableau, Kibana, Power BI
Web Technologies: SOAP and REST
Other Tools: Eclipse, PyCharm, Git, ANT, Maven, Jenkins, SOAP UI, QC, Jira, Bugzilla
Methodologies: Agile/Scrum, Waterfall
Operating Systems: Windows, UNIX, Linux

Professional Experience

Gainwell, New York, NY (Hybrid)    Jan 2024 – Present
Sr. Azure Data Engineer
Responsibilities:
Designed and implemented scalable ETL pipelines in Azure Databricks using Scala, PySpark, and Spark SQL to process and analyze high-volume customer utilization data from diverse file formats.
Developed robust data ingestion solutions integrating Apache HBase with Spark, performing efficient CRUD operations, and optimizing data retrieval for distributed systems.
Automated data validation workflows in Azure Data Factory (ADF) using Python scripts, enhancing pipeline reliability and reducing manual overhead.
Configured S3-based data lakes with partitioned Parquet files, enabling scalable storage and efficient querying through Athena and Glue.
Automated data ingestion from external sources into S3 using Python and scheduled workflows.
Led migration efforts of legacy on-premises SQL Server ETL pipelines to Azure Cloud, utilizing ADF and Databricks to improve performance, scalability, and cost efficiency.
Orchestrated data workflows by migrating legacy DAGs to Azure Managed Airflow, enhancing task scheduling, monitoring, and error handling.
Engineered data pipelines across Azure Data Lake, Blob Storage, and Synapse Analytics for large-scale batch and streaming workloads.
Automated structured and semi-structured data ingestion from web APIs into Azure SQL using ADF and custom REST integration patterns.
Enabled seamless data routing between big data systems using Apache NiFi, optimizing data flow across Hadoop clusters and Azure services.
Built real-time streaming pipelines with Apache Kafka, Spark Structured Streaming, and Snowflake, enabling near real-time analytics on data ingested into HDFS.
Utilized libraries such as Pandas, Boto3, and SQLAlchemy for data manipulation, AWS service access, and database operations.
Designed IoT analytics pipelines using Azure Event Hubs, Stream Analytics, and Databricks, driving operational insights through sensor telemetry analysis.
Explored the application of Generative AI frameworks for text summarization and data quality insights in ETL pipelines.
Integrated Spark applications with scheduling tools via Zeppelin, Jupyter Notebooks, and Spark Shell, supporting development and automation needs.
Developed custom Scala UDFs and collaborated with analytics stakeholders to deliver enriched datasets tailored to reporting and ML use cases.
Implemented CI/CD pipelines using Azure DevOps (VSTS) for automated testing, packaging, and deployment across QA and production environments.
Enforced testing standards and peer-reviewed code to ensure maintainability and reliability of production-grade data pipelines.
Built and maintained analytical models in Redshift, improving data aggregation performance and supporting business intelligence dashboards.
Deployed ADF JSON templates to automate data pipeline creation and implemented Cosmos DB connectors for scalable NoSQL processing.
Enabled centralized log ingestion from Kubernetes pods using Filebeat, Logstash, and Elasticsearch, with real-time visualization in Kibana.
Designed and maintained Snowflake data warehouse objects, including schemas, tables, views, and secure shares.
Wrote comprehensive ScalaTest FunSuite unit and integration tests for Spark applications to ensure data quality and business logic validation.
Integrated Gen AI APIs with Python-based data engineering workflows for proof-of-concept automation tasks.
Created interactive dashboards and visual reports using Tableau and Python (Pandas, Matplotlib, Seaborn) to support business decision-making.
Built low-latency processing pipelines using Kafka and PySpark, supporting high-throughput use cases in production.
Utilized Git and Bitbucket for source control, implementing feature branching, pull request workflows, and code merge strategies.
Followed Agile methodologies, delivering features and enhancements in bi-weekly sprints with active participation in standups and retrospectives.
Conducted operational diagnostics in Cosmos DB using Visual Studio scripts to troubleshoot performance and connectivity issues.
Improved task automation by scripting workflows in PowerShell, reducing manual tasks and improving operational efficiency.
Deployed and managed containerized solutions using Docker, Kubernetes, Maven, and Jenkins, supporting cloud-native Azure deployments.
Designed and implemented data pipelines to process retail transaction data, including sales, inventory, and product master data, supporting analytics across pricing and replenishment domains.
Transflo, Tampa, FL    Jul 2021 – Dec 2023
Sr. Data Engineer
Responsibilities:
Designed and developed end-to-end Big Data workflows using AWS Glue, AWS Lambda, and Amazon EMR, from data ingestion to transformation and loading into Amazon S3 and Redshift.
Built scalable and reusable ETL pipelines with PySpark on EMR, performing complex data aggregations and transformations.
Developed serverless data ingestion frameworks using AWS Lambda, Amazon SQS, and SNS for real-time processing.
Integrated AWS Secrets Manager to securely manage credentials for services like Glue, Redshift, and third-party APIs.
Assisted in building dashboards using BI tools such as Tableau and Power BI to visualize KPIs and support data-driven decision-making.
Created custom data connectors to ingest data from SFTP, RDBMS, NoSQL (MongoDB, Cassandra), and REST APIs into S3 using Python and Boto3.
Designed and implemented data lake architecture on S3 with partitioned data, leveraging Parquet, ORC, and Avro formats.
Performed data wrangling, validation, and enrichment using PySpark and Hive on Amazon EMR.
Tuned performance of HiveQL, Spark SQL, and PySpark jobs for efficient processing of large-scale datasets.
Orchestrated data workflows using Apache Airflow on Amazon MWAA, enabling scheduling, monitoring, and retries of data pipelines.
Developed event-driven streaming applications using Kinesis Data Streams and Kinesis Data Analytics to process IoT and real-time log data into Amazon Redshift and S3.
Led data quality checks and reconciliation processes across retail systems like RMS, RPM, and ReSA to ensure accurate reporting and audit compliance.
Migrated legacy ETL workloads from on-premises systems (SAS, Ab Initio) to AWS-native solutions with PySpark and Glue.
Implemented data ingestion pipelines on Amazon EC2-hosted Hadoop clusters, managing Hive, Sqoop, and Spark installations.
Documented retail data flows and created knowledge transfer materials to onboard new team members and stakeholders in merchandising analytics projects.
Built custom logging and alerting solutions using CloudWatch, SNS, and Lambda for proactive pipeline monitoring.
Developed and deployed Python-based data transformation scripts in Docker containers, managed with AWS CodePipeline and Git.
Built serverless data transformation services in Node.js, deployed on Lambda, for handling high-throughput REST API traffic.
Participated in regular patching and upgrades of Azure Stack infrastructure, ensuring zero downtime and compliance with Microsoft best practices.
Collaborated with data analysts to validate reporting logic and ensure accurate data representation from Redshift and S3 sources.

Sierra Nevada, Sparks, NV    May 2019 – Jun 2021
Data Engineer
Responsibilities:
Developed scalable backend data pipelines using Scala and Apache Spark, implementing complex data aggregation logic.
Engineered and optimized ETL architectures for data ingestion and transformation, creating Source-to-Target mappings to efficiently populate a Data Warehouse.
Implemented data extraction processes from flat files and RDBMS into staging environments, ensuring seamless integration into the Data Warehouse.
Created and managed external Hive tables, enabling efficient querying and analysis of large datasets for downstream processing.
Designed stateful ETL workflows using AWS Step Functions to coordinate Lambda invocations and error handling across distributed data processing tasks.
Supported and tested MapReduce jobs for raw data processing, ensuring results were accurately stored in HDFS.
Automated data exports from Hive tables to SQL databases using Sqoop, enabling visualization and reporting for business intelligence teams.
Designed and deployed high-availability, scalable applications on AWS, utilizing services such as EC2, S3, Route 53, RDS, SNS, SQS, IAM, and CloudFormation.
Developed and optimized PySpark jobs in AWS Glue and EMR pipelines, improving processing efficiency.
Supported the ingestion of semi-structured data into DynamoDB for downstream analytics in Redshift.
Automated infrastructure management with AWS Lambda functions written in Python.
Worked with HBase and MongoDB, implementing NoSQL solutions to store and process large, unstructured datasets.
Optimized partitioning strategies in Hive to improve query performance and supported the BI team with data analysis and visualization in tools like Tableau.
Validated and implemented data flows across Hadoop and Teradata environments, ensuring data integrity through business logic checks.
Built and optimized data pipelines with Talend, leveraging Apache Spark for predictive analytics and real-time data processing.
Developed PL/SQL procedures, triggers, and packages to automate business processes and improve database efficiency.
Automated data workflows and frameworks using Python, Shell scripting, and asynchronous programming frameworks like Node.js and Twisted.
Integrated AWS Lambda with DynamoDB to enable real-time lookups and updates in serverless microservices architecture.
Managed source control using Git and automated builds with Maven.
Monitored and managed ETL workflows to track data movement and ensure seamless processing in Hadoop applications.
Performed data reconciliation between source files and extracted datasets to validate ETL accuracy.
Utilized SQL Loader for data transfer from flat files to Oracle databases, ensuring efficient and reliable loading.
Integrated data from SQL Server into data marts, views, and flat files for reporting and analysis, primarily using T-SQL.
Wrote modular JavaScript functions (ES6+) to support front-end data visualizations and dashboard enhancements in BI platforms.

Locuz, India    Aug 2016 – Dec 2018
Big Data Engineer
Responsibilities:
Designed and implemented a Data Warehouse integrated with OLTP systems to enable streamlined reporting and analytics.
Developed and maintained ETL pipelines for the PSN project, ensuring efficient data transformation and integration.
Produced detailed reports for third-party stakeholders, ensuring accurate royalty payment calculations and compliance.
Managed PowerCenter user accounts and workspaces, optimizing access and security across ETL processes.
Automated file transfers and email tasks through advanced UNIX and Windows scripting, improving operational efficiency.
Developed PL/SQL procedures and Stored Procedure Transformations to automate data workflows.
Proficient in Oracle and SQL Server, writing complex queries to extract and analyze ERP system data.
Developed Python-based Lambda functions for event-driven data ingestion, transformation, and alerting across S3 and DynamoDB.
Migrated ETL workflows from Talend to Informatica, managing development, testing, and post-production support.
Documented ETL processes for future maintainability and transparency.
Optimized ETL job performance by tuning workflows, ensuring system efficiency and minimal downtime.
Administered Talend Console, ensuring smooth operation of production jobs.
Designed Business Objects universes and developed interactive reports for business users.
Created ad-hoc reports using standalone tables, providing flexible data analysis for stakeholders.
Developed and modified Web Intelligence reports, supporting decision-making with accurate insights.
Generated Publications tailored to vendor-specific reporting needs.
Created custom SQL queries for complex reporting requirements, ensuring precise data analysis.
Collaborated with business partners to define requirements and data specifications for reporting and data warehouse solutions.
Implemented serverless compute for real-time processing of streaming data via SQS and Kinesis into Redshift.
Provided data extracts in Excel for meetings, simplifying data discussions.
Resolved data discrepancies through root cause analysis, ensuring accurate reports.
Addressed user-reported issues, troubleshooting and resolving report or data source problems.
Delivered custom ad-hoc reports to meet specific user requirements, ensuring actionable insights.
Provided troubleshooting support for critical finance reports during month-end, ensuring timely and accurate financial reporting.