Sai Lakshmi - Business Data Analyst / Data Engineer
[email protected]
Location: Bordentown, New Jersey, USA
Relocation: Yes
Visa: GC
Resume file: Sai Lakshmi Data Engineer Resume_1745351784909.docx
Sr. Business Data Analyst | Data Engineer
Name: Sai Lakshmi | Email: [email protected] | Phone: +1 (201) 609-8006

PROFESSIONAL SUMMARY:
- Business Data Analyst and aspiring Data Engineer with over 11 years of experience designing and optimizing ETL pipelines, building real-time data workflows, and developing cloud-based analytics solutions.
- Proficient in SQL, Python, Spark, Power BI, and cloud platforms (AWS, Azure). Experienced in big data ecosystems, data modeling, and HIPAA-compliant healthcare systems. Skilled in both business insights and backend data engineering, bridging gaps between strategy, analytics, and data architecture.
- Strong professional experience with emphasis on analysis, design, development, testing, maintenance, and implementation of data mapping, data validation, and requirements gathering in data warehousing environments.
- Experience in data warehousing applications using ETL tools and programming languages such as Python, R, Java, Scala, MATLAB, and SQL/PL-SQL, with Oracle and SQL Server databases and SSIS.
- Experience handling very large datasets using cloud clusters such as Amazon Web Services (AWS), MS Azure, Amazon Redshift, and Hadoop, and archiving the data.
- Performed data analysis and data profiling using complex SQL on various source systems, including Oracle and Teradata.
- Experience providing custom solutions such as eligibility criteria, match, and basic contribution calculations for major clients using Informatica, and building reports using Power BI, Tableau, Looker, and QlikView.
- Extensively used Python libraries including PySpark, pytest, PyMongo, cx_Oracle, pyexcel, Boto3, Psycopg, embedPy, NumPy, and Beautiful Soup.
- Experience in data analysis, data profiling, data migration, data integration, and validation of data across all integration points.
- Familiar with ETL (Extract, Transform, Load) processes and pipeline tools such as Apache NiFi, Apache Airflow, and dbt (Data Build Tool).
- Built predictive models using Google AutoML, H2O.ai, and TPOT, improving accuracy and supporting increased revenue.
- Familiar with GDPR and CCPA regulations and with ensuring data privacy is maintained across data workflows.
- Extensive experience with ETL and reporting tools such as SQL Server Integration Services (SSIS) and SQL Server Reporting Services (SSRS).
- Experienced in big data technologies including Apache Hadoop and Apache Spark, with expertise in data extraction and exploratory analysis.
- Implemented scikit-learn, PyTorch, Keras, and TensorFlow for machine learning, developing predictive models to forecast waste generation patterns and optimize resource allocation.
- Developed and maintained multiple batch and streaming data pipelines supporting real-time business operations and predictive analytics.
- Automated ingestion and transformation of data across cloud and on-prem systems using tools like Apache NiFi, Spark, and Kafka (see the sketch below).
- Reduced pipeline execution time by 40% through distributed processing optimizations using AWS Redshift and Spark SQL.
- Designed and developed weekly and monthly reports using MS Excel techniques (charts, graphs, pivot tables) and comparable tools (Quip, Zoho Sheet, WPS Spreadsheets), plus PowerPoint presentations.
- Experience with data analysis and SQL querying, including database integrations and data warehousing for large financial organizations.
- Created and managed over 100 user stories and backlog items using JIRA and MIRO.
- Led requirements workshops, documented BRDs and process flows, and aligned stakeholder expectations.
- Facilitated UAT efforts, test case creation, and defect triage for internal dashboards and data systems.
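A minimal, illustrative sketch (Python) of the batch ETL pattern described above; it is not from any specific engagement, and the bucket, key, and column names are hypothetical:

    import io
    import boto3
    import pandas as pd

    s3 = boto3.client("s3")

    # Extract: read a raw CSV drop from the landing bucket (names hypothetical)
    obj = s3.get_object(Bucket="raw-landing-zone", Key="extracts/2023/records.csv")
    df = pd.read_csv(io.BytesIO(obj["Body"].read()))

    # Transform: basic hygiene - dedupe and standardize types
    df = df.drop_duplicates(subset=["record_id"])
    df["event_date"] = pd.to_datetime(df["event_date"], errors="coerce")
    df = df.dropna(subset=["event_date"])

    # Load: land a curated Parquet copy for downstream analytics
    buf = io.BytesIO()
    df.to_parquet(buf, index=False)
    s3.put_object(Bucket="curated-zone", Key="extracts/2023/records.parquet",
                  Body=buf.getvalue())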
- Built interactive Power BI dashboards tied to executive KPIs across health and insurance domains.
- Improved reporting accuracy and delivery time through business process optimization and collaboration with cross-functional teams.
- Strong working experience in data cleaning, data warehousing, and data massaging using Python libraries and MySQL.
- Experience creating ad-hoc reports and data-driven subscription reports using SQL.
- Expertise in Power BI, Power BI Pro, and Power BI Mobile; expert in creating and developing Power BI dashboards.
- Experienced in RDBMSs such as Oracle, MySQL, and IBM DB2.
- Hands-on experience in complex query writing and query optimization in relational databases including Oracle, T-SQL, Teradata, and SQL Server, as well as in Python.
- Experienced in business requirements collection using Agile, Scrum, and Waterfall methods, and in software development life cycle (SDLC) testing methodologies, disciplines, tasks, resources, and scheduling.
- Extensive knowledge of data profiling using the Informatica Developer 9.x/8.6.0/8.1.1/7.x/6.x tool.
- Proficient with version control systems such as Git for collaborating on data-related projects, especially in team settings.

TECHNICAL SKILLS:
Languages: SAS, SQL, Python, R, Java, Scala, MATLAB, Shell Scripting
BI Tools: Tableau, Microsoft Power BI, PowerPivot
Business Analysis & Agile Tools: JIRA, Confluence, MIRO, Visio, MS Project, Agile/Scrum, User Stories, Backlog Grooming
Data Warehousing Tools: Talend, SSIS, SSAS, SSRS, Toad Data Modeler
Python Libraries: scikit-learn, Pandas, NumPy, SciPy, Matplotlib, Seaborn, Plotly
Data Visualization: Tableau, Microsoft Power BI
Documentation & Analysis: BRD, FRD, User Stories, Acceptance Criteria, SOPs, Flowcharts, Process Flows, UAT Test Cases
ETL: Informatica PowerCenter, SSIS, Apache NiFi, Airflow
Machine Learning: scikit-learn, TensorFlow
Statistical Techniques: A/B Testing, Regression, Hypothesis Testing
Version Control & DevOps: Git, GitHub, Bitbucket, Docker, GitHub Actions
Big Data Ecosystem: Hadoop, Spark, PySpark, Kafka, Hive, HDFS
Microsoft Tools: Microsoft Office, MS Project
Database Tools: SQL Server, MySQL, MS Excel, PostgreSQL, SQLite, MongoDB
Data Analysis: Web Scraping, Statistical Modeling, Hypothesis Testing, Predictive Modeling
Data Mining Algorithms: Decision Trees, Clustering, Random Forest, Regression
Soft Skills: Stakeholder Communication, Influencing Decision-Makers, Requirements Gathering, Process Optimization, Organized & Detail-Oriented, Cross-functional Collaboration, Agile Team Alignment

CERTIFICATIONS:
Google Data Analytics Professional Certificate - Coursera
Microsoft Certified: Power BI Data Analyst Associate

PROFESSIONAL EXPERIENCE:

Client: Southwest Airlines, Dallas, TX April 2023 - Present
Role: Business Analytics & Data Engineering Specialist
Responsibilities:
- Worked as an Agile Business Data Analyst, liaising with stakeholders and Product Owners to define and manage user stories, business requirements, and UAT planning in JIRA; facilitated backlog grooming and ensured team alignment on priorities.
- Analyzed and validated large datasets to support ad-hoc analysis, reporting, and remediation using SAS.
- Conducted data analysis across large datasets using SQL and SAS to support business decision-making and executive scorecard development.
- Processed data from tools such as Snowflake, writing complex queries in SQL and SAS using complex joins, subqueries, table creation, and aggregation, and applying DDL, DQL, and DML concepts (see the sketch below).
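A hedged sketch (Python) of the kind of complex Snowflake SQL described above, combining a CTE, a join, and a subquery; connection parameters, table names, and column names are all hypothetical:

    import snowflake.connector

    conn = snowflake.connector.connect(
        account="my_account", user="analyst", password="***",
        warehouse="ANALYTICS_WH", database="OPS", schema="PUBLIC",
    )

    sql = """
    WITH daily_delays AS (               -- CTE: per-flight, per-day aggregate
        SELECT flight_id, flight_date, AVG(delay_minutes) AS avg_delay
        FROM flight_events
        GROUP BY flight_id, flight_date
    )
    SELECT f.flight_id, f.route, d.avg_delay
    FROM flights f
    JOIN daily_delays d ON d.flight_id = f.flight_id        -- join to dimension
    WHERE d.avg_delay > (SELECT AVG(avg_delay) FROM daily_delays)  -- subquery
    ORDER BY d.avg_delay DESC
    """

    cur = conn.cursor()
    try:
        for flight_id, route, avg_delay in cur.execute(sql):
            print(flight_id, route, avg_delay)
    finally:
        cur.close()
        conn.close()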
- Performed ETL (Extract, Transform, Load) using tools such as Informatica and Azure to integrate and transform disparate data sources.
- Built and maintained real-time and batch ETL pipelines using Apache NiFi, Informatica, and Airflow, integrating multiple data sources into cloud-based data lakes.
- Led requirements-gathering sessions with business stakeholders and converted them into detailed user stories and acceptance criteria.
- Regularly presented insights and solution proposals to product owners and senior management, helping influence key decisions on prioritization and delivery timelines.
- Evaluated cross-functional business processes and identified automation opportunities that improved report generation speed by 30%.
- Conducted ongoing reviews of analytics workflows and implemented optimization strategies to streamline performance metric tracking.
- Delivered recommendations through stakeholder-facing presentations, ensuring alignment on data priorities, KPIs, and business impact.
- Regularly monitored competing priorities, working with team leads to ensure timely delivery of sprint-based deliverables.
- Performed data scraping, cleaning, analysis, and interpretation, and generated meaningful reports using Python libraries such as pandas and matplotlib.
- Developed PySpark jobs for processing large-scale datasets and integrated them with Hadoop/S3 for distributed analytics.
- Leveraged cloud platforms such as AWS, Azure, and Google Cloud to support scalable data storage and analytics solutions.
- Partnered with the tech team to implement system configuration changes and updated lookup tables and backend mappings as part of iterative releases.
- Managed Agile ceremonies including sprint planning, backlog grooming, and daily standups using JIRA and Confluence.
- Generated weekly reports with visualizations using tools such as MS Excel (pivot tables and macros) and Tableau to enable business decision-making.
- Created near-real-time data ingestion workflows using Kafka and Spark Structured Streaming to support operational reporting dashboards (see the sketch below).
- Coordinated UAT testing by preparing test cases, facilitating sign-offs, and tracking defect resolution.
- Created performance dashboards using Power BI for executive scorecard reporting across departments.
- Performed data analysis and statistical analysis, and generated reports, listings, and graphs using SAS tools: SAS/Base, SAS/Macros, SAS/Graph, SAS/SQL, SAS/Connect, and SAS/Access.
- Generated various dashboards and created calculated fields in Tableau for data intelligence and analysis based on business requirements.
- Played a pivotal role in the selection and use of ML frameworks including PyTorch, TensorFlow, and scikit-learn, aligning technology choices with project requirements.
- Applied advanced analytical techniques to solve medium-to-large-scale business problems with impact on current and future business strategy.
- Mapped and optimized business processes to streamline reporting pipelines and reduce turnaround by 25%.
- Applied innovative scientific and quantitative analytical approaches to draw conclusions and make insight-to-action recommendations that answer the business objective and drive appropriate change.
- Translated recommendations into communication materials to effectively present to colleagues for peer review and to mid- to upper-level management.
- Leveraged Spark SQL and PySpark for large-scale data analysis.
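A minimal sketch (Python/PySpark) of a Kafka-to-data-lake Structured Streaming job of the kind described above; the broker address, topic name, and output paths are hypothetical:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("ops-ingest").getOrCreate()

    # Source: subscribe to a Kafka topic (broker and topic are hypothetical)
    events = (
        spark.readStream.format("kafka")
        .option("kafka.bootstrap.servers", "broker:9092")
        .option("subscribe", "ops-events")
        .load()
        .select(col("value").cast("string").alias("payload"), col("timestamp"))
    )

    # Sink: append micro-batches to the data lake with checkpointing
    query = (
        events.writeStream.format("parquet")
        .option("path", "s3a://ops-lake/events/")
        .option("checkpointLocation", "s3a://ops-lake/checkpoints/events/")
        .trigger(processingTime="1 minute")
        .start()
    )
    query.awaitTermination()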
- Incorporated visualization techniques to support the relevant points of the analysis and ease understanding for less technical audiences.
- Used Power BI Desktop to develop analyses across multiple data sources and visualize reports.
- Identified and gathered the relevant, high-quality data sources required to fully address the problem for the recommended strategy through testing or exploratory data analysis (EDA).
- Transformed disparate data sources and determined the appropriate data hygiene techniques to apply.
- Understood and adopted emerging technologies that can affect the application of scientific methodologies and quantitative analytical approaches to problem resolution.
- Delivered analyses and findings in a manner that conveys understanding, influences mid- to upper-level management, garners support for recommendations, drives business decisions, and shapes business strategy.

Environment: SAS, Dremio, Snowflake, Spark, Tableau, Agile, UAT Testing, UML, JIRA, Confluence, Hadoop, Kafka, Apache NiFi, Airflow, Pandas, NumPy, Seaborn, SciPy, Matplotlib, Power BI, T-SQL, MS SQL Server, MS Excel, AWS (S3, Redshift), Azure, MS Visio, ETL (SSIS, SSRS, Informatica), PySpark, Shell Scripting, Git, Data Modeling (Star Schema, Snowflake Schema), Business Process Mapping.

Client: JME Insurance, Dallas, TX Nov 2021 - Mar 2023
Role: Hybrid Agile Data Analyst / Engineer
Responsibilities:
- Collaborated with stakeholders and cross-functional teams to elicit, elaborate, and capture functional and non-functional requirements.
- Led periodic business process assessments to identify gaps in insurance claims and reporting workflows, implementing configuration-based improvements using internal tools.
- Stayed current with healthcare analytics platform advancements and led small-scale user platform updates, including table-level administration.
- Translated business requirements into technical requirements and performed data modeling accordingly.
- Performed analytical modeling, database design, data analysis, regression analysis, data integrity checks, and business analytics.
- Created data mappings between the source and target systems, and created documentation mapping source and target table columns and datatypes.
- Worked within the Life and Disability insurance domain, supporting group insurance analytics through data modeling, reporting, and executive dashboard creation.
- Collaborated with cross-functional teams to gather functional and non-functional requirements and documented detailed BRDs and FRDs.
- Applied machine learning algorithms for anomaly detection and predictive analytics, leveraging frameworks such as PyOD and XGBoost (see the sketch below).
- Conducted geographic analysis using geospatial frameworks such as GeoPandas and Folium.
- Built text analytics and generated data visualizations using R and Python, and created dashboards using tools like Tableau and Power BI.
- Strong experience migrating other databases to Snowflake.
- Collaborated with engineering teams to design and deploy cloud-native pipelines in AWS Redshift and Azure Data Lake, enabling scalable data processing.
- Applied DML and DDL skills, including subqueries, window functions, and CTEs; wrote SQL queries using insert, update, and delete statements and exported data as CSV, XML, TXT, etc.; wrote SQL queries with inner, outer, left, right, and self joins in SQL Server.
- Developed user stories and acceptance criteria and maintained backlog items using JIRA.
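A hedged sketch (Python) of PyOD-based anomaly detection of the kind mentioned above; the input file, feature names, and tuning values are hypothetical:

    import pandas as pd
    from pyod.models.iforest import IForest

    claims = pd.read_csv("claims_features.csv")
    features = claims[["claim_amount", "days_to_settle", "num_procedures"]]

    # Isolation Forest with ~2% expected outliers (tuning is data-dependent)
    detector = IForest(contamination=0.02, random_state=42)
    detector.fit(features)

    claims["is_anomaly"] = detector.labels_           # 1 = outlier, 0 = inlier
    claims["anomaly_score"] = detector.decision_scores_
    print(claims[claims["is_anomaly"] == 1].head())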
- Summarized data from samples using statistics such as the mean and standard deviation, and performed linear regression.
- Compared different WFHM DB environments and determined, resolved, and documented discrepancies.
- Involved in ETL development, creating the required mappings for data flows using SSIS.
- Performed A/B testing, hypothesis testing, and regression modeling to evaluate the impact of health initiatives.
- Generated various capacity planning reports (graphical) using Python packages such as NumPy, matplotlib, and SciPy.
- Designed ETL flows implemented in Informatica PowerCenter per the mapping sheets provided.
- Optimized data sources for the route distribution analytics dashboard to improve Power BI report runtime.
- Conducted UAT planning and execution for healthcare analytics solutions and stakeholder reports.
- Developed UML use-case diagrams, class diagrams, and other diagrams using MS Visio.
- Worked on predictive analytics use cases using Python.
- Conducted A/B testing and statistical analysis using frameworks such as SciPy stats and statsmodels to evaluate the effectiveness of marketing campaigns and product features.
- Created and maintained detailed documentation, including BRDs, SOPs, process flow diagrams, and test scripts.
- Conducted and tracked UAT sessions, working directly with cross-functional insurance stakeholders to validate new dashboards and data pipelines.
- Enforced HIPAA and GDPR compliance in healthcare data workflows by embedding data masking and validation rules within ETL layers (see the sketch after this role).
- Designed Power BI dashboards for performance tracking across population health and health plan metrics.
- Developed serverless data processing pipelines using AWS Lambda functions for data workflows, integrating with API Gateway for event-driven processing.
- Facilitated business process reviews and implemented enhancements to improve data delivery speed.
- Managed departmental reporting systems, troubleshooting daily issues and integrating existing Access databases with numerous external data sources, including SQL Server, Excel, and Access.
- Utilized Power BI and custom SQL features to create dashboards and identify correlations.
- Prepared scripts in R and shell for automation of administration tasks.
- Developed and implemented data governance frameworks and policies to ensure data quality and compliance with regulatory requirements such as GDPR and CCPA.
- Wrote several Teradata SQL queries using SQL Assistant for ad-hoc data pull requests.
- Extracted source data from Oracle tables, MS SQL Server, sequential files, and Excel sheets.
- Created data quality scripts using SQL to validate successful data loads and the quality of the data.
- Created various types of data visualizations using R and Power BI.
- Performed data analysis and data profiling using complex SQL on various source systems.
- Categorized and reported on multiple parameters using MS Excel and Power BI.
- Worked on logical and physical modeling of various data marts as well as DW/BI architecture using Teradata.
- Demonstrated the ability to manage multiple competing priorities and projects while staying highly organized and detail-focused, ensuring accuracy and timely delivery of deliverables.

Environment: SQL, Power BI, UAT, Azure Data Lake, AWS Redshift, PySpark, Git, Informatica, Docker, GitHub, JIRA, Agile, MS Excel, Tableau, Snowflake, SSIS, Python (NumPy, Matplotlib, SciPy), MS Visio, Business Process Documentation, Shell Script, Teradata, Life & Disability Insurance, Group Insurance Analytics, Hive, Regression Analysis, GDPR/HIPAA.
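A minimal sketch (Python) of the masking-and-validation pattern described in this role; the column names and the salt are hypothetical placeholders:

    import hashlib
    import pandas as pd

    PHI_COLUMNS = ["member_ssn", "member_name", "phone"]   # hypothetical PHI fields
    SALT = "rotate-per-environment"                        # placeholder secret

    def mask_value(value) -> str:
        # One-way hash: joins still work, but raw PHI never lands downstream
        return hashlib.sha256((SALT + str(value)).encode()).hexdigest()[:16]

    def mask_phi(df: pd.DataFrame) -> pd.DataFrame:
        out = df.copy()
        for column in PHI_COLUMNS:
            if column in out.columns:
                out[column] = out[column].map(mask_value)
        return out

    def validate(df: pd.DataFrame) -> pd.DataFrame:
        # Simple load-time rule: the required key must be populated
        assert df["member_id"].notna().all(), "member_id must be populated"
        return df

    clean = validate(mask_phi(pd.read_csv("claims_extract.csv")))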
Client: Zensar Technologies, India Oct 2018 - Dec 2020
Role: Business Data Analyst
Responsibilities:
- Involved in requirements gathering, analysis, design, development, testing, and production of applications using the SDLC Agile/Scrum model.
- Worked the entire data analysis project life cycle, actively involved in all phases including data cleaning, data extraction, and data visualization with large sets of structured and unstructured data; created ER diagrams and schemas.
- Engineered Spark-based solutions using PySpark to transform raw data into structured datasets optimized for analytics.
- Advanced Python knowledge, especially libraries such as NumPy, pandas, scikit-learn, TensorFlow, PyTorch, and PyCaret.
- Manipulated benefit-related data for lab work, clinic services, etc., by writing SQL queries that included inner, outer, left, right, and self joins in SQL Server, and exported the data as CSV, TXT, XML, etc.
- Summarized data from samples using statistics such as the mean and standard deviation, and performed linear regression.
- Worked with data warehousing principles such as fact tables, dimension tables, and dimensional data modeling (Star Schema and Snowflake Schema).
- Partnered with project managers and end users to gather requirements, draft user stories, and maintain sprint backlogs using Agile methodologies.
- Conducted process flow documentation and proposed automation solutions for ETL error handling and dashboard reporting delays.
- Facilitated internal requirements walkthroughs and presentations, gathering feedback and revising data solutions for better stakeholder satisfaction.
- Expertise in data manipulation, statistical analysis, and visualization using tools like dplyr, ggplot2, and Shiny.
- Created ad-hoc reports for users in Tableau by connecting various data sources; used Excel sheets, flat files, and CSV files to generate Tableau ad-hoc reports.
- Involved in defining source-to-target data mappings, business rules, and business and data definitions.
- Worked closely with stakeholders and subject matter experts to elicit and gather business data requirements.
- Used Pandas, NumPy, Seaborn, SciPy, and Matplotlib in Python to develop various machine learning algorithms, and utilized algorithms such as linear regression and multivariate regression for data analysis (see the sketch below).
- Worked with business analyst groups to ascertain their database reporting needs.
- Created a database using MongoDB and wrote several queries to extract data from it.
- Wrote scripts in Python for extracting data from HTML files.
- Worked with Hadoop ecosystem tools including Hive, HDFS, and Spark SQL to process over 1M+ records per batch.
- Connected PostgreSQL databases to Python.
- Tracked velocity, capacity, burn-down charts, and other metrics during iterations; created data flow diagrams.
- Using R, automated a process to extract data and various document types from a website, save the documents to a specified file path, and upload the documents into an Excel template.
- Performed data analysis and data profiling using SQL on various source systems including Oracle and Teradata.
- Utilized the SSIS ETL toolset to analyze legacy data for data profiling.
- Utilized Power BI reporting to create, test, and validate various visualizations, ad-hoc reports, dashboards, and KPIs.
- Designed and published visually rich, intuitively interactive Power BI and Excel workbooks and dashboards for executive decision-making.
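A hedged sketch (Python) of the multivariate linear-regression analysis referenced above; the dataset and column names are hypothetical:

    import pandas as pd
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import r2_score
    from sklearn.model_selection import train_test_split

    data = pd.read_csv("benefits_usage.csv")
    X = data[["visits", "lab_orders", "clinic_services"]]  # predictors
    y = data["monthly_cost"]                               # target

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )

    model = LinearRegression().fit(X_train, y_train)
    print("R^2 on held-out data:", r2_score(y_test, model.predict(X_test)))
    print("Coefficients:", dict(zip(X.columns, model.coef_)))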
- Evaluated legacy processes across banking dashboards and prototyped modern alternatives using Power BI and SQL.
- Orchestrated big data processing workflows using EMR clusters, leveraging frameworks like Apache Spark and Hadoop for distributed data processing.
- Developed and streamlined the CRM database and built SQL queries for data analysis of 1 million+ records.
- Generated new Market and Investment Banking reports using SSRS, increasing efficiency by 50%.
- Introduced Power BI and designed dashboards for time-based data, improving performance by 40%.
- Built ETL workflows for automated reporting of Investment Banking data using SSIS, reducing the workload by 40%.

Environment: Tableau, SQL Server, NumPy, Seaborn, SciPy, Hadoop, Spark, PySpark, Hive, HDFS, Kafka (learning), Matplotlib, Python, SDLC (gathering, analysis, design, development, testing), Agile/Scrum, Data Warehouse, MongoDB, PostgreSQL, Oracle, Teradata, Shell Scripting, Git, Data Lake, Informatica (Informatica Data Explorer and Informatica Data Quality), ETL, Data Modeling (Star Schema, Snowflake Schema), KPIs.

Company: HSBC, India Nov 2016 - Sep 2018
Role: Business Data Analyst
Responsibilities:
- Generated energy consumption reports using SSRS, showing trends over day, month, and year.
- Performed ad-hoc analysis and data extraction to resolve 20% of critical business issues.
- Contributed to group insurance reporting initiatives by developing automated dashboards and conducting trend analysis across policy and claims data.
- Well versed in the Agile Scrum development methodology, used day to day in Building Automation Systems (BAS) application development.
- Produced weekly, monthly, and quarterly insight reports on pricing trends and opportunities using Excel, Tableau, and SQL databases.
- Streamlined and automated Excel and Tableau dashboards for improved speed and utilization through Python- and SQL-based solutions.
- Familiarity with cloud-native analytics tools such as AWS QuickSight, Azure Synapse Analytics, and Google Data Studio.
- Designed creative dashboards and storylines for a fashion store's dataset using Tableau features.
- Developed SSIS packages for extract/load/transformation of source data into a DW/BI architecture and OLAP cubes, per the functional/technical design and conforming to the data mapping/transformation rules.
- Developed data cleaning strategies in Excel (multilayer fuzzy match) and SQL (automated typo detection and correction) to organize alternative datasets daily and produce consistent, high-quality reports (see the sketch below).
- Created views to facilitate easy user interface implementation, and triggers on them to facilitate consistent data entry into the database.
- Participated in executive scorecard development, tailoring dashboards to reflect KPIs across group insurance portfolios.
- Performed hands-on UAT coordination, executing test cases, tracking QA feedback, and ensuring all reporting enhancements met stakeholder needs.
- Used MS Office tools (Word, Excel, PowerPoint) to document, track, and present insights to internal teams and managers for strategy alignment.
- Involved in data analysis and data validation, extracting records from multiple databases using SQL in Oracle SQL Developer.
- Understanding of SQL-based querying engines for big data platforms such as Apache Hive, Impala, and Presto.
- Identified data sources and defined them to build data source views.
- Involved in designing ETL specification documents such as the mapping document (source to target).
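A minimal Python analogue of the fuzzy-match typo correction described above, using only the standard library; the canonical product list is hypothetical:

    import difflib

    CANONICAL_PRODUCTS = ["Group Life", "Group Disability", "Dental", "Vision"]

    def correct_typo(raw: str, cutoff: float = 0.8) -> str:
        # Map a possibly misspelled name to its closest canonical form
        matches = difflib.get_close_matches(raw, CANONICAL_PRODUCTS, n=1, cutoff=cutoff)
        return matches[0] if matches else raw  # leave unmatched values for review

    assert correct_typo("Grup Life") == "Group Life"
    assert correct_typo("Dentl") == "Dental"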
- Used ETL (SSIS) to develop jobs for extracting, cleaning, transforming, and loading data into the data warehouse.
- Created stored procedures and executed them manually before calling them in the SSIS package creation process.
- Wrote SQL test scripts to validate data for different test cases and test scenarios.
- Created SSIS packages to export and import data from CSV files, text files, and Excel spreadsheets.
- Performed data manipulation: inserting, updating, and deleting data from datasets.
- Developed various stored procedures for data retrieval from the database and generated different types of reports using SQL Server Reporting Services (SSRS).

Environment: Windows, SDLC-Agile/Scrum, SQL Server, Group Insurance Reporting, SSIS, SSAS, SSRS, ETL, PL/SQL, Tableau, Excel, CSV Files, Text Files, OLAP, Data Warehouse, SQL (inner, outer, and self joins).

Client: Sagar Soft Pvt Limited, India Feb 2013 - Oct 2016
Role: Data Analyst
Responsibilities:
- Evaluated new applications and identified system requirements.
- Visualized KPI metrics such as resource utilization, net profit margin, gross profit margin, and burn rate using Tableau.
- Worked on time series analysis using pandas to identify patterns in how asset variables change, which in turn helped project completion by 70% (see the sketch at the end of this section).
- Conducted data extraction, transformation, and loading (ETL) using tools like Apache NiFi and Talend to ingest healthcare data from disparate sources.
- Designed and implemented data models using tools like Erwin and SQL Server Management Studio to ensure efficient storage and retrieval of banking data.
- Recommended solutions to increase revenue, reduce expenses, and maximize operational efficiency, quality, and compliance.
- Identified business requirements and analytical needs from potential data sources.
- Performed SQL validation to verify data extract integrity and record counts in database tables.
- Worked with ETL developers on testing and data mapping, with awareness of data models to translate and migrate data.
- Created Requirements Traceability Matrices (RTMs) using Rational RequisitePro to ensure complete requirements coverage with reference to low-level design documents and test cases.
- Assisted the Project Manager in developing both high-level and detailed application architecture to meet user requests and business needs; assisted with setting project expectations and evaluating the impact of changes on project plans; conducted project-related presentations; and performed risk assessment, management, and mitigation.
- Collaborated with different teams to analyze, investigate, and diagnose root causes of problems, and published root cause analysis (RCA) reports.
- Skilled in advanced SQL queries and analytic functions for date calculations, cumulative distribution, and NTILE calculations.
- Used advanced Excel formulas and functions such as pivot tables, LOOKUP, IF with AND, and INDEX/MATCH for data cleaning.

Environment: SQL, ETL, Data Mapping, Tableau, NTILE, RCA, RTMs, Pivot Tables, KPI metrics.
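A hedged sketch (Python) of the pandas time-series analysis mentioned in the Sagar Soft role above; the file and column names are hypothetical:

    import pandas as pd

    usage = pd.read_csv("asset_usage.csv", parse_dates=["timestamp"])
    usage = usage.set_index("timestamp").sort_index()

    # Resample raw readings to daily granularity, then smooth with a 7-day window
    daily = usage["utilization"].resample("D").mean()
    trend = daily.rolling(window=7, min_periods=1).mean()

    # Flag the days that deviate most from the smoothed trend
    deviation = (daily - trend).abs()
    print(deviation.nlargest(5))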