Sai Lakshmi - Business Data Analyst / Data Engineer
[email protected]
Location: Bordentown, New Jersey, USA
Relocation: Yes
Visa: GC
Resume file: Sai Lakshmi Data Engineer Resume_1745351784909.docx
Sr. Business Data Analyst | Data Engineer
Name: Sai Lakshmi | Email: [email protected] | Phone: +1 (201) 609-8006

PROFESSIONAL SUMMARY:
- Business Data Analyst and aspiring Data Engineer with over 11 years of experience designing and optimizing ETL pipelines, building real-time data workflows, and developing cloud-based analytics solutions.
- Proficient in SQL, Python, Spark, Power BI, and cloud platforms (AWS, Azure). Experienced in big data ecosystems, data modeling, and HIPAA-compliant healthcare systems. Skilled in both business insights and backend data engineering, bridging gaps between strategy, analytics, and data architecture.
- Strong professional experience with emphasis on analysis, design, development, testing, maintenance, and implementation of data mapping, data validation, and requirements gathering in data warehousing environments.
- Experience in data warehousing applications using ETL tools and programming languages such as Python, R, Java, Scala, MATLAB, and SQL/PL-SQL, with Oracle and SQL Server databases and SSIS.
- Experience handling very large datasets using cloud clusters such as Amazon Web Services (AWS), MS Azure, Amazon Redshift, and Hadoop, and archiving the data.
- Performed data analysis and data profiling using complex SQL on various source systems, including Oracle and Teradata.
- Experience providing custom solutions such as eligibility criteria, match, and basic contribution calculations for major clients using Informatica, and building reports using Power BI, Tableau, Looker, and QlikView.
- Extensively used Python libraries including PySpark, pytest, PyMongo, cx_Oracle, pyexcel, Boto3, Psycopg, embedPy, NumPy, and Beautiful Soup.
- Experience in data analysis, data profiling, data migration, data integration, and validation of data across all integration points.
- Familiar with ETL (Extract, Transform, Load) processes and pipeline tools such as Apache NiFi, Apache Airflow, and dbt (Data Build Tool).
- Built predictive models using Google AutoML, H2O.ai, and TPOT, improving accuracy and supporting increased revenue.
- Familiar with GDPR and CCPA regulations and with ensuring data privacy is maintained across data workflows.
- Extensive experience with ETL and reporting tools such as SQL Server Integration Services (SSIS) and SQL Server Reporting Services (SSRS).
- Experienced in big data technologies including Apache Hadoop and Apache Spark, with expertise in data extraction and exploratory analysis.
- Implemented scikit-learn, PyTorch, Keras, and TensorFlow for machine learning, developing predictive models to forecast waste generation patterns and optimize resource allocation.
- Developed and maintained multiple batch and streaming data pipelines supporting real-time business operations and predictive analytics.
- Automated ingestion and transformation of data across cloud and on-prem systems using tools like Apache NiFi, Spark, and Kafka (see the sketch below).
- Reduced pipeline execution time by 40% through distributed processing optimizations using AWS Redshift and Spark SQL.
- Designed and developed weekly and monthly reports using MS Excel techniques (charts, graphs, pivot tables) and comparable tools (Quip, Zoho Sheet, WPS Spreadsheets), plus PowerPoint presentations.
- Experience with data analysis and SQL querying, including database integrations and data warehousing for large financial organizations.
- Created and managed over 100 user stories and backlog items using JIRA and MIRO.
- Led requirements workshops, documented BRDs and process flows, and aligned stakeholder expectations.
- Facilitated UAT efforts, test case creation, and defect triage for internal dashboards and data systems.
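A minimal, illustrative sketch (Python) of the batch ETL pattern described above; it is not from any specific engagement, and the bucket, key, and column names are hypothetical:

    import io
    import boto3
    import pandas as pd

    s3 = boto3.client("s3")

    # Extract: read a raw CSV drop from the landing bucket (names hypothetical)
    obj = s3.get_object(Bucket="raw-landing-zone", Key="extracts/2023/records.csv")
    df = pd.read_csv(io.BytesIO(obj["Body"].read()))

    # Transform: basic hygiene - dedupe and standardize types
    df = df.drop_duplicates(subset=["record_id"])
    df["event_date"] = pd.to_datetime(df["event_date"], errors="coerce")
    df = df.dropna(subset=["event_date"])

    # Load: land a curated Parquet copy for downstream analytics
    buf = io.BytesIO()
    df.to_parquet(buf, index=False)
    s3.put_object(Bucket="curated-zone", Key="extracts/2023/records.parquet",
                  Body=buf.getvalue())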
- Built interactive Power BI dashboards tied to executive KPIs across health and insurance domains.
- Improved reporting accuracy and delivery time through business process optimization and collaboration with cross-functional teams.
- Strong working experience in data cleaning, data warehousing, and data massaging using Python libraries and MySQL.
- Experience creating ad-hoc reports and data-driven subscription reports using SQL.
- Expertise in Power BI, Power BI Pro, and Power BI Mobile; expert in creating and developing Power BI dashboards.
- Experienced in RDBMSs such as Oracle, MySQL, and IBM DB2.
- Hands-on experience in complex query writing and query optimization in relational databases including Oracle, T-SQL, Teradata, and SQL Server, as well as in Python.
- Experienced in business requirements collection using Agile, Scrum, and Waterfall methods, and in software development life cycle (SDLC) testing methodologies, disciplines, tasks, resources, and scheduling.
- Extensive knowledge of data profiling using the Informatica Developer 9.x/8.6.0/8.1.1/7.x/6.x tool.
- Proficient with version control systems such as Git for collaborating on data-related projects, especially in team settings.

TECHNICAL SKILLS:
Languages: SAS, SQL, Python, R, Java, Scala, MATLAB, Shell Scripting
BI Tools: Tableau, Microsoft Power BI, PowerPivot
Business Analysis & Agile Tools: JIRA, Confluence, MIRO, Visio, MS Project, Agile/Scrum, User Stories, Backlog Grooming
Data Warehousing Tools: Talend, SSIS, SSAS, SSRS, Toad Data Modeler
Python Libraries: scikit-learn, Pandas, NumPy, SciPy, Matplotlib, Seaborn, Plotly
Data Visualization: Tableau, Microsoft Power BI
Documentation & Analysis: BRD, FRD, User Stories, Acceptance Criteria, SOPs, Flowcharts, Process Flows, UAT Test Cases
ETL: Informatica PowerCenter, SSIS, Apache NiFi, Airflow
Machine Learning: scikit-learn, TensorFlow
Statistical Techniques: A/B Testing, Regression, Hypothesis Testing
Version Control & DevOps: Git, GitHub, Bitbucket, Docker, GitHub Actions
Big Data Ecosystem: Hadoop, Spark, PySpark, Kafka, Hive, HDFS
Microsoft Tools: Microsoft Office, MS Project
Database Tools: SQL Server, MySQL, MS Excel, PostgreSQL, SQLite, MongoDB
Data Analysis: Web Scraping, Statistical Modeling, Hypothesis Testing, Predictive Modeling
Data Mining Algorithms: Decision Trees, Clustering, Random Forest, Regression
Soft Skills: Stakeholder Communication, Influencing Decision-Makers, Requirements Gathering, Process Optimization, Organized & Detail-Oriented, Cross-functional Collaboration, Agile Team Alignment

CERTIFICATIONS:
Google Data Analytics Professional Certificate - Coursera
Microsoft Certified: Power BI Data Analyst Associate

PROFESSIONAL EXPERIENCE:

Client: Southwest Airlines, Dallas, TX April 2023 - Present
Role: Business Analytics & Data Engineering Specialist
Responsibilities:
- Worked as an Agile Business Data Analyst, liaising with stakeholders and Product Owners to define and manage user stories, business requirements, and UAT planning in JIRA; facilitated backlog grooming and ensured team alignment on priorities.
- Analyzed and validated large datasets to support ad-hoc analysis, reporting, and remediation using SAS.
- Conducted data analysis across large datasets using SQL and SAS to support business decision-making and executive scorecard development.
- Processed data from tools such as Snowflake, writing complex queries in SQL and SAS using complex joins, subqueries, table creation, and aggregation, and applying DDL, DQL, and DML concepts (see the sketch below).
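A hedged sketch (Python) of the kind of complex Snowflake SQL described above, combining a CTE, a join, and a subquery; connection parameters, table names, and column names are all hypothetical:

    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account", user="analyst", password="***",
        warehouse="ANALYTICS_WH", database="OPS", schema="PUBLIC",
    )

    sql = """
    WITH daily_delays AS (               -- CTE: per-flight, per-day aggregate
        SELECT flight_id, flight_date, AVG(delay_minutes) AS avg_delay
        FROM flight_events
        GROUP BY flight_id, flight_date
    )
    SELECT f.flight_id, f.route, d.avg_delay
    FROM flights f
    JOIN daily_delays d ON d.flight_id = f.flight_id        -- join to dimension
    WHERE d.avg_delay > (SELECT AVG(avg_delay) FROM daily_delays)  -- subquery
    ORDER BY d.avg_delay DESC
    """

    cur = conn.cursor()
    try:
        for flight_id, route, avg_delay in cur.execute(sql):
            print(flight_id, route, avg_delay)
    finally:
        cur.close()
        conn.close()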
- Performed ETL (Extract, Transform, Load) using tools such as Informatica and Azure to integrate and transform disparate data sources.
- Built and maintained real-time and batch ETL pipelines using Apache NiFi, Informatica, and Airflow, integrating multiple data sources into cloud-based data lakes.
- Led requirements-gathering sessions with business stakeholders and converted them into detailed user stories and acceptance criteria.
- Regularly presented insights and solution proposals to product owners and senior management, helping influence key decisions on prioritization and delivery timelines.
- Evaluated cross-functional business processes and identified automation opportunities that improved report generation speed by 30%.
- Conducted ongoing reviews of analytics workflows and implemented optimization strategies to streamline performance metric tracking.
- Delivered recommendations through stakeholder-facing presentations, ensuring alignment on data priorities, KPIs, and business impact.
- Regularly monitored competing priorities, working with team leads to ensure timely delivery of sprint-based deliverables.
- Performed data scraping, cleaning, analysis, and interpretation, and generated meaningful reports using Python libraries such as pandas and matplotlib.
- Developed PySpark jobs for processing large-scale datasets and integrated them with Hadoop/S3 for distributed analytics.
- Leveraged cloud platforms such as AWS, Azure, and Google Cloud to support scalable data storage and analytics solutions.
- Partnered with the tech team to implement system configuration changes and updated lookup tables and backend mappings as part of iterative releases.
- Managed Agile ceremonies including sprint planning, backlog grooming, and daily standups using JIRA and Confluence.
- Generated weekly reports with visualizations using tools such as MS Excel (pivot tables and macros) and Tableau to enable business decision-making.
- Created near-real-time data ingestion workflows using Kafka and Spark Structured Streaming to support operational reporting dashboards (see the sketch below).
- Coordinated UAT testing by preparing test cases, facilitating sign-offs, and tracking defect resolution.
- Created performance dashboards using Power BI for executive scorecard reporting across departments.
- Performed data analysis and statistical analysis, and generated reports, listings, and graphs using SAS tools: SAS/Base, SAS/Macros, SAS/Graph, SAS/SQL, SAS/Connect, and SAS/Access.
- Generated various dashboards and created calculated fields in Tableau for data intelligence and analysis based on business requirements.
- Played a pivotal role in the selection and use of ML frameworks including PyTorch, TensorFlow, and scikit-learn, aligning technology choices with project requirements.
- Applied advanced analytical techniques to solve medium-to-large-scale business problems with impact on current and future business strategy.
- Mapped and optimized business processes to streamline reporting pipelines and reduce turnaround by 25%.
- Applied innovative scientific and quantitative analytical approaches to draw conclusions and make insight-to-action recommendations that answer the business objective and drive appropriate change.
- Translated recommendations into communication materials to effectively present to colleagues for peer review and to mid- to upper-level management.
- Leveraged Spark SQL and PySpark for large-scale data analysis.
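A minimal sketch (Python/PySpark) of a Kafka-to-data-lake Structured Streaming job of the kind described above; the broker address, topic name, and output paths are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("ops-ingest").getOrCreate()

    # Source: subscribe to a Kafka topic (broker and topic are hypothetical)
    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "ops-events")
        .load()
        .select(col("value").cast("string").alias("payload"), col("timestamp"))
    )

    # Sink: append micro-batches to the data lake with checkpointing
    query = (
        events.writeStream.format("parquet")
        .option("path", "s3a://ops-lake/events/")
        .option("checkpointLocation", "s3a://ops-lake/checkpoints/events/")
        .trigger(processingTime="1 minute")
        .start()
    )
    query.awaitTermination()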
- Incorporated visualization techniques to support the relevant points of the analysis and ease understanding for less technical audiences.
- Used Power BI Desktop to develop analyses across multiple data sources and visualize reports.
- Identified and gathered the relevant, high-quality data sources required to fully address the problem for the recommended strategy through testing or exploratory data analysis (EDA).
- Transformed disparate data sources and determined the appropriate data hygiene techniques to apply.
- Understood and adopted emerging technologies that can affect the application of scientific methodologies and quantitative analytical approaches to problem resolution.
- Delivered analyses and findings in a manner that conveys understanding, influences mid- to upper-level management, garners support for recommendations, drives business decisions, and shapes business strategy.

Environment: SAS, Dremio, Snowflake, Spark, Tableau, Agile, UAT Testing, UML, JIRA, Confluence, Hadoop, Kafka, Apache NiFi, Airflow, Pandas, NumPy, Seaborn, SciPy, Matplotlib, Power BI, T-SQL, MS SQL Server, MS Excel, AWS (S3, Redshift), Azure, MS Visio, ETL (SSIS, SSRS, Informatica), PySpark, Shell Scripting, Git, Data Modeling (Star Schema, Snowflake Schema), Business Process Mapping.

Client: JME Insurance, Dallas, TX Nov 2021 - Mar 2023
Role: Hybrid Agile Data Analyst / Engineer
Responsibilities:
- Collaborated with stakeholders and cross-functional teams to elicit, elaborate, and capture functional and non-functional requirements.
- Led periodic business process assessments to identify gaps in insurance claims and reporting workflows, implementing configuration-based improvements using internal tools.
- Stayed current with healthcare analytics platform advancements and led small-scale user platform updates, including table-level administration.
- Translated business requirements into technical requirements and performed data modeling accordingly.
- Performed analytical modeling, database design, data analysis, regression analysis, data integrity checks, and business analytics.
- Created data mappings between the source and target systems, and created documentation mapping source and target table columns and datatypes.
- Worked within the Life and Disability insurance domain, supporting group insurance analytics through data modeling, reporting, and executive dashboard creation.
- Collaborated with cross-functional teams to gather functional and non-functional requirements and documented detailed BRDs and FRDs.
- Applied machine learning algorithms for anomaly detection and predictive analytics, leveraging frameworks such as PyOD and XGBoost (see the sketch below).
- Conducted geographic analysis using geospatial frameworks such as GeoPandas and Folium.
- Built text analytics and generated data visualizations using R and Python, and created dashboards using tools like Tableau and Power BI.
- Strong experience migrating other databases to Snowflake.
- Collaborated with engineering teams to design and deploy cloud-native pipelines in AWS Redshift and Azure Data Lake, enabling scalable data processing.
- Applied DML and DDL skills, including subqueries, window functions, and CTEs; wrote SQL queries using insert, update, and delete statements and exported data as CSV, XML, TXT, etc.; wrote SQL queries with inner, outer, left, right, and self joins in SQL Server.
- Developed user stories and acceptance criteria and maintained backlog items using JIRA.
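A hedged sketch (Python) of PyOD-based anomaly detection of the kind mentioned above; the input file, feature names, and tuning values are hypothetical:

    import pandas as pd
    from pyod.models.iforest import IForest

    claims = pd.read_csv("claims_features.csv")
    features = claims[["claim_amount", "days_to_settle", "num_procedures"]]

    # Isolation Forest with ~2% expected outliers (tuning is data-dependent)
    detector = IForest(contamination=0.02, random_state=42)
    detector.fit(features)

    claims["is_anomaly"] = detector.labels_           # 1 = outlier, 0 = inlier
    claims["anomaly_score"] = detector.decision_scores_
    print(claims[claims["is_anomaly"] == 1].head())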
- Summarized data from samples using statistics such as the mean and standard deviation, and performed linear regression.
- Compared different WFHM DB environments and determined, resolved, and documented discrepancies.
- Involved in ETL development, creating the required mappings for data flows using SSIS.
- Performed A/B testing, hypothesis testing, and regression modeling to evaluate the impact of health initiatives.
- Generated various capacity planning reports (graphical) using Python packages such as NumPy, matplotlib, and SciPy.
- Designed ETL flows implemented in Informatica PowerCenter per the mapping sheets provided.
- Optimized data sources for the route distribution analytics dashboard to improve Power BI report runtime.
- Conducted UAT planning and execution for healthcare analytics solutions and stakeholder reports.
- Developed UML use-case diagrams, class diagrams, and other diagrams using MS Visio.
- Worked on predictive analytics use cases using Python.
- Conducted A/B testing and statistical analysis using frameworks such as SciPy stats and statsmodels to evaluate the effectiveness of marketing campaigns and product features.
- Created and maintained detailed documentation, including BRDs, SOPs, process flow diagrams, and test scripts.
- Conducted and tracked UAT sessions, working directly with cross-functional insurance stakeholders to validate new dashboards and data pipelines.
- Enforced HIPAA and GDPR compliance in healthcare data workflows by embedding data masking and validation rules within ETL layers (see the sketch after this role).
- Designed Power BI dashboards for performance tracking across population health and health plan metrics.
- Developed serverless data processing pipelines using AWS Lambda functions for data workflows, integrating with API Gateway for event-driven processing.
- Facilitated business process reviews and implemented enhancements to improve data delivery speed.
- Managed departmental reporting systems, troubleshooting daily issues and integrating existing Access databases with numerous external data sources, including SQL Server, Excel, and Access.
- Utilized Power BI and custom SQL features to create dashboards and identify correlations.
- Prepared scripts in R and shell for automation of administration tasks.
- Developed and implemented data governance frameworks and policies to ensure data quality and compliance with regulatory requirements such as GDPR and CCPA.
- Wrote several Teradata SQL queries using SQL Assistant for ad-hoc data pull requests.
- Extracted source data from Oracle tables, MS SQL Server, sequential files, and Excel sheets.
- Created data quality scripts using SQL to validate successful data loads and the quality of the data.
- Created various types of data visualizations using R and Power BI.
- Performed data analysis and data profiling using complex SQL on various source systems.
- Categorized and reported on multiple parameters using MS Excel and Power BI.
- Worked on logical and physical modeling of various data marts as well as DW/BI architecture using Teradata.
- Demonstrated the ability to manage multiple competing priorities and projects while staying highly organized and detail-focused, ensuring accuracy and timely delivery of deliverables.

Environment: SQL, Power BI, UAT, Azure Data Lake, AWS Redshift, PySpark, Git, Informatica, Docker, GitHub, JIRA, Agile, MS Excel, Tableau, Snowflake, SSIS, Python (NumPy, Matplotlib, SciPy), MS Visio, Business Process Documentation, Shell Script, Teradata, Life & Disability Insurance, Group Insurance Analytics, Hive, Regression Analysis, GDPR/HIPAA.
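A minimal sketch (Python) of the masking-and-validation pattern described in this role; the column names and the salt are hypothetical placeholders:

    import hashlib
    import pandas as pd

    PHI_COLUMNS = ["member_ssn", "member_name", "phone"]   # hypothetical PHI fields
    SALT = "rotate-per-environment"                        # placeholder secret

    def mask_value(value) -> str:
        # One-way hash: joins still work, but raw PHI never lands downstream
        return hashlib.sha256((SALT + str(value)).encode()).hexdigest()[:16]

    def mask_phi(df: pd.DataFrame) -> pd.DataFrame:
        out = df.copy()
        for column in PHI_COLUMNS:
            if column in out.columns:
                out[column] = out[column].map(mask_value)
        return out

    def validate(df: pd.DataFrame) -> pd.DataFrame:
        # Simple load-time rule: the required key must be populated
        assert df["member_id"].notna().all(), "member_id must be populated"
        return df

    clean = validate(mask_phi(pd.read_csv("claims_extract.csv")))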
Client: Zensar Technologies, India Oct 2018 - Dec 2020
Role: Business Data Analyst
Responsibilities:
- Involved in requirements gathering, analysis, design, development, testing, and production of applications using the SDLC Agile/Scrum model.
- Worked the entire data analysis project life cycle, actively involved in all phases including data cleaning, data extraction, and data visualization with large sets of structured and unstructured data; created ER diagrams and schemas.
- Engineered Spark-based solutions using PySpark to transform raw data into structured datasets optimized for analytics.
- Advanced Python knowledge, especially libraries such as NumPy, pandas, scikit-learn, TensorFlow, PyTorch, and PyCaret.
- Manipulated benefit-related data for lab work, clinic services, etc., by writing SQL queries that included inner, outer, left, right, and self joins in SQL Server, and exported the data as CSV, TXT, XML, etc.
- Summarized data from samples using statistics such as the mean and standard deviation, and performed linear regression.
- Worked with data warehousing principles such as fact tables, dimension tables, and dimensional data modeling (Star Schema and Snowflake Schema).
- Partnered with project managers and end users to gather requirements, draft user stories, and maintain sprint backlogs using Agile methodologies.
- Conducted process flow documentation and proposed automation solutions for ETL error handling and dashboard reporting delays.
- Facilitated internal requirements walkthroughs and presentations, gathering feedback and revising data solutions for better stakeholder satisfaction.
- Expertise in data manipulation, statistical analysis, and visualization using tools like dplyr, ggplot2, and Shiny.
- Created ad-hoc reports for users in Tableau by connecting various data sources; used Excel sheets, flat files, and CSV files to generate Tableau ad-hoc reports.
- Involved in defining source-to-target data mappings, business rules, and business and data definitions.
- Worked closely with stakeholders and subject matter experts to elicit and gather business data requirements.
- Used Pandas, NumPy, Seaborn, SciPy, and Matplotlib in Python to develop various machine learning algorithms, and utilized algorithms such as linear regression and multivariate regression for data analysis (see the sketch below).
- Worked with business analyst groups to ascertain their database reporting needs.
- Created a database using MongoDB and wrote several queries to extract data from it.
- Wrote scripts in Python for extracting data from HTML files.
- Worked with Hadoop ecosystem tools including Hive, HDFS, and Spark SQL to process over 1M+ records per batch.
- Connected PostgreSQL databases to Python.
- Tracked velocity, capacity, burn-down charts, and other metrics during iterations; created data flow diagrams.
- Using R, automated a process to extract data and various document types from a website, save the documents to a specified file path, and upload the documents into an Excel template.
- Performed data analysis and data profiling using SQL on various source systems including Oracle and Teradata.
- Utilized the SSIS ETL toolset to analyze legacy data for data profiling.
- Utilized Power BI reporting to create, test, and validate various visualizations, ad-hoc reports, dashboards, and KPIs.
- Designed and published visually rich, intuitively interactive Power BI and Excel workbooks and dashboards for executive decision-making.
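A hedged sketch (Python) of the multivariate linear-regression analysis referenced above; the dataset and column names are hypothetical:

    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score
    from sklearn.model_selection import train_test_split

    data = pd.read_csv("benefits_usage.csv")
    X = data[["visits", "lab_orders", "clinic_services"]]  # predictors
    y = data["monthly_cost"]                               # target

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )

    model = LinearRegression().fit(X_train, y_train)
    print("R^2 on held-out data:", r2_score(y_test, model.predict(X_test)))
    print("Coefficients:", dict(zip(X.columns, model.coef_)))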
- Evaluated legacy processes across banking dashboards and prototyped modern alternatives using Power BI and SQL.
- Orchestrated big data processing workflows using EMR clusters, leveraging frameworks like Apache Spark and Hadoop for distributed data processing.
- Developed and streamlined the CRM database and built SQL queries for data analysis of 1 million+ records.
- Generated new Market and Investment Banking reports using SSRS, increasing efficiency by 50%.
- Introduced Power BI and designed dashboards for time-based data, improving performance by 40%.
- Built ETL workflows for automated reporting of Investment Banking data using SSIS, reducing the workload by 40%.

Environment: Tableau, SQL Server, NumPy, Seaborn, SciPy, Hadoop, Spark, PySpark, Hive, HDFS, Kafka (learning), Matplotlib, Python, SDLC (gathering, analysis, design, development, testing), Agile/Scrum, Data Warehouse, MongoDB, PostgreSQL, Oracle, Teradata, Shell Scripting, Git, Data Lake, Informatica (Informatica Data Explorer and Informatica Data Quality), ETL, Data Modeling (Star Schema, Snowflake Schema), KPIs.

Company: HSBC, India Nov 2016 - Sep 2018
Role: Business Data Analyst
Responsibilities:
- Generated energy consumption reports using SSRS, showing trends over day, month, and year.
- Performed ad-hoc analysis and data extraction to resolve 20% of critical business issues.
- Contributed to group insurance reporting initiatives by developing automated dashboards and conducting trend analysis across policy and claims data.
- Well versed in the Agile Scrum development methodology, used day to day in Building Automation Systems (BAS) application development.
- Produced weekly, monthly, and quarterly insight reports on pricing trends and opportunities using Excel, Tableau, and SQL databases.
- Streamlined and automated Excel and Tableau dashboards for improved speed and utilization through Python- and SQL-based solutions.
- Familiarity with cloud-native analytics tools such as AWS QuickSight, Azure Synapse Analytics, and Google Data Studio.
- Designed creative dashboards and storylines for a fashion store's dataset using Tableau features.
- Developed SSIS packages for extract/load/transformation of source data into a DW/BI architecture and OLAP cubes, per the functional/technical design and conforming to the data mapping/transformation rules.
- Developed data cleaning strategies in Excel (multilayer fuzzy match) and SQL (automated typo detection and correction) to organize alternative datasets daily and produce consistent, high-quality reports (see the sketch below).
- Created views to facilitate easy user interface implementation, and triggers on them to facilitate consistent data entry into the database.
- Participated in executive scorecard development, tailoring dashboards to reflect KPIs across group insurance portfolios.
- Performed hands-on UAT coordination, executing test cases, tracking QA feedback, and ensuring all reporting enhancements met stakeholder needs.
- Used MS Office tools (Word, Excel, PowerPoint) to document, track, and present insights to internal teams and managers for strategy alignment.
- Involved in data analysis and data validation, extracting records from multiple databases using SQL in Oracle SQL Developer.
- Understanding of SQL-based querying engines for big data platforms such as Apache Hive, Impala, and Presto.
- Identified data sources and defined them to build data source views.
- Involved in designing ETL specification documents such as the mapping document (source to target).
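A minimal Python analogue of the fuzzy-match typo correction described above, using only the standard library; the canonical product list is hypothetical:

    import difflib

    CANONICAL_PRODUCTS = ["Group Life", "Group Disability", "Dental", "Vision"]

    def correct_typo(raw: str, cutoff: float = 0.8) -> str:
        # Map a possibly misspelled name to its closest canonical form
        matches = difflib.get_close_matches(raw, CANONICAL_PRODUCTS, n=1, cutoff=cutoff)
        return matches[0] if matches else raw  # leave unmatched values for review

    assert correct_typo("Grup Life") == "Group Life"
    assert correct_typo("Dentl") == "Dental"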
- Used ETL (SSIS) to develop jobs for extracting, cleaning, transforming, and loading data into the data warehouse.
- Created stored procedures and executed them manually before calling them in the SSIS package creation process.
- Wrote SQL test scripts to validate data for different test cases and test scenarios.
- Created SSIS packages to export and import data from CSV files, text files, and Excel spreadsheets.
- Performed data manipulation: inserting, updating, and deleting data from datasets.
- Developed various stored procedures for data retrieval from the database and generated different types of reports using SQL Server Reporting Services (SSRS).

Environment: Windows, SDLC-Agile/Scrum, SQL Server, Group Insurance Reporting, SSIS, SSAS, SSRS, ETL, PL/SQL, Tableau, Excel, CSV Files, Text Files, OLAP, Data Warehouse, SQL (inner, outer, and self joins).

Client: Sagar Soft Pvt Limited, India Feb 2013 - Oct 2016
Role: Data Analyst
Responsibilities:
- Evaluated new applications and identified system requirements.
- Visualized KPI metrics such as resource utilization, net profit margin, gross profit margin, and burn rate using Tableau.
- Worked on time series analysis using pandas to identify patterns in how asset variables change, which in turn helped project completion by 70% (see the sketch at the end of this section).
- Conducted data extraction, transformation, and loading (ETL) using tools like Apache NiFi and Talend to ingest healthcare data from disparate sources.
- Designed and implemented data models using tools like Erwin and SQL Server Management Studio to ensure efficient storage and retrieval of banking data.
- Recommended solutions to increase revenue, reduce expenses, and maximize operational efficiency, quality, and compliance.
- Identified business requirements and analytical needs from potential data sources.
- Performed SQL validation to verify data extract integrity and record counts in database tables.
- Worked with ETL developers on testing and data mapping, with awareness of data models to translate and migrate data.
- Created Requirements Traceability Matrices (RTMs) using Rational RequisitePro to ensure complete requirements coverage with reference to low-level design documents and test cases.
- Assisted the Project Manager in developing both high-level and detailed application architecture to meet user requests and business needs; assisted with setting project expectations and evaluating the impact of changes on project plans; conducted project-related presentations; and performed risk assessment, management, and mitigation.
- Collaborated with different teams to analyze, investigate, and diagnose root causes of problems, and published root cause analysis (RCA) reports.
- Skilled in advanced SQL queries and analytic functions for date calculations, cumulative distribution, and NTILE calculations.
- Used advanced Excel formulas and functions such as pivot tables, LOOKUP, IF with AND, and INDEX/MATCH for data cleaning.

Environment: SQL, ETL, Data Mapping, Tableau, NTILE, RCA, RTMs, Pivot Tables, KPI metrics.
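A hedged sketch (Python) of the pandas time-series analysis mentioned in the Sagar Soft role above; the file and column names are hypothetical:

    import pandas as pd

    usage = pd.read_csv("asset_usage.csv", parse_dates=["timestamp"])
    usage = usage.set_index("timestamp").sort_index()

    # Resample raw readings to daily granularity, then smooth with a 7-day window
    daily = usage["utilization"].resample("D").mean()
    trend = daily.rolling(window=7, min_periods=1).mean()

    # Flag the days that deviate most from the smoothed trend
    deviation = (daily - trend).abs()
    print(deviation.nlargest(5))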