
Thirupathi Naidu Pemmasani - Data Science
[email protected]
Location: Remote, USA
Relocation:
Visa: GC

Professional Summary:
10+ years of hands-on experience and comprehensive industry knowledge of Artificial Intelligence/ Machine Learning, Statistical Modelling, Deep Learning, Data Analytics, Data Modelling, Data Analysis, Data Mining, Text Mining & Natural Language Processing (NLP), and Business Intelligence.
Extensive experience applying AI and machine learning to real-world data challenges, including supervised and unsupervised learning, deep learning, and natural language processing (NLP).
Good experience with analytics models such as Decision Trees and Linear & Logistic Regression, and with tools including Hadoop (Hive, Pig), R, Python, Spark, Scala, MS Excel, SQL, PostgreSQL, and Erwin.
Strong knowledge of all phases of the SDLC (Software Development Life Cycle) under Agile/ Scrum, from analysis and design through development, testing, implementation, and maintenance.
Strong leadership in Data Cleansing, Web Scraping, Data Manipulation, Predictive Modeling with R and Python, and data visualization with Power BI and Tableau.
Experienced in Data Modeling techniques employing Data Warehousing concepts like Star/ Snowflake schema and Extended Star.
Hands-on experience across the entire Data Science project life cycle, including data extraction, data cleaning, statistical modeling, and data visualization with large data sets of structured and unstructured data; created ER diagrams and schemas.
Excellent knowledge of Artificial Intelligence/ Machine Learning, Mathematical Modeling and Operations Research. Comfortable with R, Python, SAS, Weka, MATLAB, and relational databases; deep understanding of and exposure to the Big Data ecosystem.
Expertise in Data Analysis, Data Migration, Data Profiling, Data Cleansing, Transformation, Integration, Data Import/ Export through the use of ETL tools such as Informatica Power Center.
Experience working in AWS environments using S3, Athena, Lambda, AWS SageMaker, AWS Lex, AWS Aurora, QuickSight, CloudFormation, CloudWatch, IAM, Glacier, EC2, EMR, Rekognition, and API Gateway.
Proficient in Artificial Intelligence/ Machine Learning, Data/ Text Mining, Statistical Analysis & Predictive Modeling.
Good knowledge and experience in deep learning algorithms such as Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and Recurrent Neural Networks (RNN), including LSTM/RNN-based speech recognition using TensorFlow.
Expertise in using AWS S3 to stage data and to support data transfer and data archival.
Excellent knowledge and experience in OLTP/ OLAP System Study with focus on Oracle Hyperion Suite of technology, developing Database Schemas like Star schema and Snowflake schema (Fact Tables, Dimension Tables) used in relational, dimensional and multidimensional modeling, physical and logical Data Modeling using Erwin tool.
Created AWS Lambda functions and assigned roles to run Python scripts, using Lambda with Python to perform event-driven processing (a minimal handler sketch follows this summary).
Proficient in Python scripting; worked with statistical functions in NumPy, visualization using Matplotlib, and Pandas for organizing data.
High-level understanding of deep learning concepts, including CNN, RNN, ANN, reinforcement learning, transfer learning, and data augmentation using Generative Adversarial Networks (GANs).
Strong analytical thinking, extensively leveraging statistical techniques such as t-tests, p-value analysis, z-score analysis, ANOVA, confidence intervals, confusion matrices, precision, recall, and ROC/AUC curve analysis.
Good knowledge of Natural Language Processing (NLP) and Time Series Analysis and Forecasting using ARIMA models in Python and R (a minimal forecasting sketch follows this summary).
Very good experience and knowledge in provisioning virtual clusters under AWS cloud which includes services like EC2, S3, and EMR.
Strong programming experience in MATLAB and Python computer-vision libraries.
Experienced in the use of active dashboards and reports that are both visually appealing and functional, such as Python Matplotlib, R Shiny, Power BI, and Tableau.
Proficiency with creating, publishing, and customizing Tableau dashboards and dynamic reports with user filters.
Experience in Dimensional Modeling, ER Modeling, Star Schema/ Snowflake Schema, FACT and Dimensional tables and Operational Data Store (ODS).
Extensive knowledge in working with Amazon EC2 to provide a solution for computing, query processing, and storage across a wide range of applications.
Proficiency in Python, a strong program development capability, with experience in image processing algorithm
Proficient in data visualization tools such as Tableau, Power BI, Python Matplotlib, R Shiny to create visually powerful and actionable interactive reports and dashboards.
Experience in building models with deep learning frameworks like TensorFlow, PyTorch, and Keras
Expertise in building, publishing customized interactive reports and dashboards with customized parameters and user-filters using Tableau.
Familiar with deep learning projects including image identification (CNN), stock price prediction (RNN), autoencoders for a movie recommender system (PyTorch), and image captioning (CNN-RNN encoder-decoder architecture).
Proficient in Python, with experience building and productionizing end-to-end systems.
Strong programming expertise in Python and strong database/SQL skills.
Solid coding and engineering skills in Artificial Intelligence/ Machine Learning.
Exposure to Python and its package ecosystem.
Experience in developing various solution driven views and dashboards by developing different chart types including Pie Charts, Bar Charts, Tree Maps, Circle Views, Line Charts, Area Charts, and Scatter Plots in Power BI.
Valued contributor to shaping the future of products and services.
Work successfully in fast-paced, multitasking environments, both independently and in collaborative teams.
Excellent communication skills needed for swift implementation of data science and data analytic projects.
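Illustrative only, not project code: a minimal sketch of the event-driven AWS Lambda processing referenced above, assuming an S3 put event triggers the function; the bucket, key, and processing step are hypothetical placeholders.

```python
# Minimal AWS Lambda handler sketch for event-driven processing of S3 uploads.
# Assumes the function's IAM role permits s3:GetObject on the source bucket.
import json
import boto3

s3 = boto3.client("s3")

def lambda_handler(event, context):
    # An S3 put event carries one record per uploaded object.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        obj = s3.get_object(Bucket=bucket, Key=key)
        body = obj["Body"].read()
        # Placeholder processing step: report the object size.
        print(f"Processed s3://{bucket}/{key} ({len(body)} bytes)")
    return {"statusCode": 200, "body": json.dumps({"status": "ok"})}
```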
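Likewise illustrative: a minimal ARIMA forecasting sketch in Python with statsmodels, as referenced in the time series bullet above; the series and the (p, d, q) order are made up for demonstration.

```python
# Minimal ARIMA forecasting sketch on a toy monthly series.
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA

idx = pd.date_range("2022-01-01", periods=24, freq="MS")
y = pd.Series(range(100, 124), index=idx, dtype="float64")  # stand-in demand series

model = ARIMA(y, order=(1, 1, 1))   # (p, d, q) chosen purely for illustration
fit = model.fit()
print(fit.forecast(steps=6))        # forecast the next 6 months
```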

Technical Skills:
Languages: SQL, Python, Java, JavaScript, jQuery, ReactJS, Next.js, HTML, CSS, C, C++, Angular, R, Impala, Hive
Statistical Methods: Hypothesis Testing, ANOVA, Time Series, Confidence Intervals, Bayes' Law, Principal Component Analysis (PCA), Dimensionality Reduction, Cross-Validation, Auto-correlation
Artificial Intelligence/ Machine Learning: Regression Analysis, Bayesian Methods, Decision Tree, Random Forests, Support Vector Machine, Neural Network, Sentiment Analysis, K-Means Clustering, KNN, and Ensemble Methods
R Packages: dplyr, sqldf, data.table, randomForest, gbm, caret
Big Data: Hadoop, Spark
Python Packages: NumPy, SciPy, Pandas, Matplotlib, Seaborn, scikit-learn, Requests, urllib3, NLTK, Pillow, Pytest
Deep Learning: CNN, RNN, ANN, Reinforcement Learning, Transfer Learning, TensorFlow, PyTorch, Keras
Python Frameworks: Django, Flask
Methodologies: SDLC Agile/ Scrum, TDD, BDD
Databases: SQL, MySQL, MongoDB, Oracle
Cloud: AWS - EMR, EC2, ENS, RDS, S3, Athena, Glue, Elasticsearch, Lambda, SQS, DynamoDB
BI/ Analysis Tools: SAS, Stata, Tableau, Power BI, Docker, Git, SAP, MS Office Suite, Anaconda, SSIS
Data Modelling: Snowflake Schema, Star Schema
Reporting Tools: Tableau, Power BI
Operating Systems: Windows, Linux

Education:
Bachelor of Computer Science from India.



Professional Experience:

Client: American Airlines, July 2023 to Present
Role: Data Scientist
Responsibilities:
Involved in requirement analysis, application development, application migration, and maintenance using Software Development Lifecycle (SDLC) and Python technologies.
Applied AI and machine learning to real-world data challenges, including supervised and unsupervised learning, deep learning, and natural language processing (NLP).
Participated in all phases of data acquisition, data cleaning, developing models, validation, and visualization to deliver data science solutions.
Hands-on experience with advanced algorithms such as neural networks (CNN, RNN, LSTM), ensemble models (XGBoost, Random Forest), and statistical techniques for predictive modeling.
Developed AI-powered solutions for fraud detection, speech recognition, image analysis, and predictive analytics using tools like AWS SageMaker, Lex, and Lambda.
Proficient in using popular AI frameworks including TensorFlow, Keras, PyTorch, and Scikit-learn to build and deploy end-to-end machine learning solutions.
Performed data parsing, data ingestion, data manipulation, data modelling, and data preparation, including describing data contents, computing descriptive statistics, and applying operations such as regex matching, split and combine, remap, merge, subset, reindex, melt, and reshape.
Built Support Vector Machine models for detecting fraudulent and dishonest customer behavior using the Scikit-learn, NumPy, SciPy, Matplotlib, and Pandas packages in Python (see the sketch following these responsibilities).
Used AWS S3, DynamoDB, AWS Lambda, and AWS EC2 for data storage and model deployment. Worked extensively on AWS services like SageMaker, Lambda, Lex, EMR, S3, Redshift, etc.
Used Amazon Transcribe to obtain call transcripts and performed text processing (cleaning, tokenization, and lemmatization).
Extensive experience working with healthcare data, including electronic health records (EHR), clinical trial datasets, and epidemiological models.
Designed and implemented predictive models for patient recovery time, adverse event detection, and medication adherence, supporting clinical decision-making.
Familiar with life sciences data standards and taxonomies such as MedDRA, ICD-10, and SNOMED for drug safety and medical coding.
Collaborated on AI-driven projects for pharmaceutical and public health organizations, including work on pharmacovigilance and disease prediction.
Participated in feature engineering, including feature-interaction generation, feature normalization, and label encoding with Scikit-learn preprocessing.
Designed the data marts in dimensional data modeling using Snowflake schemas.
Generated executive summary dashboards to display performance monitoring measures with Power BI.
Developed and implemented predictive models using Artificial Intelligence/ Machine Learning algorithms such as linear regression, classification, multivariate regression, Naïve Bayes, Random Forest, K-means clustering, KNN, PCA, and regularization for data analysis.
Leveraged AWS SageMaker to build, train, tune, and deploy state-of-the-art Artificial Intelligence/ Machine Learning and Deep Learning models.
Built classification models including Logistic Regression, SVM, Decision Tree, and Random Forest.
Used the Pandas API to organize data in time series and tabular formats for easy timestamp-based data manipulation and retrieval (see the sketch following these responsibilities).
Created ETL specification documents, flowcharts, process workflows, and data flow diagrams.
Designed both 3NF data models for OLTP systems and dimensional data models using star and snowflake Schemas.
Worked on snowflaking the dimensions to remove redundancy.
Created reports utilizing Excel services and Power BI.
Applied deep learning (RNN) to find the optimum route for guiding the tree-trim crew.
Used the XGBoost algorithm to predict storms under different weather conditions and used deep learning to analyze the severity of post-storm effects on power lines and circuits.
Worked with the Snowflake SaaS for cost-effective data warehouse implementation in the cloud.
Developed data mapping, transformation, and cleansing rules for the Master Data Management architecture involving OLTP, ODS, and OLAP.
Produced A/B test readouts to drive launch decisions for search algorithms including query refinement, topic modeling, and signal boosting and machine-learned weights for ranking signals.
Implemented an image recognition (CNN + SVM) anomaly detector and convolutional neural networks to detect fraudulent purchases.
Designed and developed Power BI graphical and visualization solutions based on business requirement documents and plans for creating interactive dashboards.
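Illustrative sketch of the Scikit-learn SVM fraud-detection approach mentioned above; the transaction features and labels below are synthetic stand-ins, not client data.

```python
# Sketch of an SVM-based fraud classifier with Scikit-learn on synthetic data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import classification_report

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                   # stand-in transaction features
y = (X[:, 0] + 0.5 * X[:, 1] > 1).astype(int)    # stand-in fraud label

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced"))
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```

Scaling before the RBF kernel and class_weight="balanced" are common choices when fraud cases are rare.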
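A second illustrative sketch, this time of the Pandas timestamp handling noted above; the hourly series is a toy example.

```python
# Sketch of timestamp-indexed manipulation and retrieval with Pandas (toy data).
import pandas as pd

df = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=96, freq="h"),
    "value": range(96),
})

ts = df.set_index("timestamp")["value"]     # time-series view of the tabular data
daily = ts.resample("D").mean()             # aggregate hourly values by day
window = ts["2024-01-02":"2024-01-03"]      # label-based timestamp slicing
print(daily)
print(window.head())
```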

Environment: SDLC, Python, Scikit-learn, NumPy, SciPy, Matplotlib, Pandas, AWS S3, DynamoDB, AWS Lambda, AWS EC2, SageMaker, Lex, EMR, Redshift, Snowflake, RNN, Machine Learning, Deep Learning, OLAP, ODS, OLTP, 3NF, Naïve Bayes, Random Forest, K-means clustering, KNN, PCA, Power BI.


Client: State of AR, April 2020 to June 2023
Role: Data Scientist
Responsibilities:
Involved in data analysis, data validation, data cleansing, data verification, and identifying data mismatches. Performed data imputation using the Scikit-learn package in Python (a minimal imputation sketch follows these responsibilities).
Built several predictive models using machine learning algorithms such as Logistic Regression, Linear Regression, Lasso Regression, K-Means, Decision Tree, Random Forest, Naïve Bayes, Social Network Analysis, Cluster Analysis, Neural Networks, XGBoost, KNN, and SVM.
Building detection and classification models using Python, TensorFlow, Keras, and scikit-learn.
Used Amazon Web Services (AWS) provisioning, with good knowledge of AWS services like EC2, S3, Redshift, Glacier, Bamboo, API Gateway, ELB (Load Balancers), RDS, SNS, SWF, and EBS.
Designed and developed machine learning techniques (classification, regression, clustering, ensemble learning, neural networks, and prediction).
Generated ad-hoc reports in Excel Power Pivot and shared them using Power BI to the decision makers for strategic planning.
Developed regression models to predict the time of recovery of a patient diagnosed with a disease based on previous disease report data using R programming.
Implemented and tested the model on AWS EC2 and collaborated with development team to get the best algorithms and parameters.
Built multi-layer neural networks to implement deep learning using TensorFlow and Keras (see the sketch following these responsibilities).
Developed the required data warehouse model using a Snowflake schema for the generalized model.
Worked on processing the collected data using Python Pandas and Numpy packages for statistical analysis.
Used cognitive science in Artificial Intelligence/ Machine Learning for neurofeedback training, which is essential for intentional control of brain rhythms.
Worked on data cleaning and ensured Data Quality, consistency, integrity using Pandas, and Numpy.
Developed Star and Snowflake schemas based dimensional model to develop the data warehouse.
Used Numpy, Scipy, Pandas, NLTK (Natural Language Processing Toolkit), and Matplotlib to build models.
Involved in text analytics, generating data visualizations using Python and creating dashboards using tools like Power BI.
Performed Naïve Bayes, KNN, Logistic Regression, Random Forest, SVM, and XGBoost to identify whether a loan would default or not.
Managed database design and implemented a comprehensive Snow Flake Schema with shared dimensions.
Applied various Artificial Intelligence (AI)/machine learning algorithms and statistical modeling techniques, such as decision trees, text analytics, natural language processing (NLP), supervised and unsupervised learning, and regression models.
Implemented an ensemble of Ridge, Lasso Regression, and XGBoost to predict the potential loan default loss.
Performed data cleaning and feature selection using the MLlib package in PySpark and worked with deep learning frameworks.
Involved in scheduling refresh of Power BI reports, hourly and on-demand.
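Illustrative only: a minimal Scikit-learn imputation sketch matching the data-imputation work noted above, run on toy values rather than state data.

```python
# Sketch of missing-value imputation with Scikit-learn's SimpleImputer.
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan]])
imputer = SimpleImputer(strategy="median")   # fill NaNs with per-column medians
print(imputer.fit_transform(X))
```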
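Also illustrative: a minimal multi-layer network in TensorFlow/Keras, as referenced above; the input width, layer sizes, and synthetic target are placeholders.

```python
# Sketch of a small multi-layer neural network with TensorFlow/Keras (synthetic data).
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

X = np.random.rand(500, 10).astype("float32")
y = (X.sum(axis=1) > 5).astype("float32")        # stand-in binary target

model = keras.Sequential([
    layers.Input(shape=(10,)),
    layers.Dense(32, activation="relu"),
    layers.Dense(16, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
print(model.evaluate(X, y, verbose=0))
```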

Environment: SDLC, Python, Scikit-learn, NumPy, SciPy, Matplotlib, Pandas, AWS S3, DynamoDB, AWS Lambda, AWS EC2, SageMaker, NLTK, Lex, EMR, Redshift, Machine Learning, Deep Learning, Snowflake, OLAP, OLTP, Naïve Bayes, Random Forest, K-means clustering, KNN, PCA, PySpark, XGBoost, TensorFlow, Keras, Power BI.

Client: BNYM, United States. June 2017 to Mar 2020
Role: Data Analyst
Responsibilities:
Facilitated agile team ceremonies including Daily Stand-up, Backlog Grooming, Sprint Review, Sprint Planning etc.
Collaborated with data engineers and operation team to implement ETL process, wrote and optimized SQL queries to perform data extraction to fit the analytical requirements.
Involved in building database models, APIs, and views using Python in order to build an interactive web-based solution.
Performed univariate and multivariate analysis on the data to identify any underlying pattern in the data and associations between the variables.
Used Pandas, NumPy, Seaborn, Matplotlib, Scikit-learn, SciPy, and NLTK in Python for developing various Artificial Intelligence/ Machine Learning algorithms like XGBoost.
Performed data cleaning in Python and R and visualized data and findings using Tableau.
Developed dashboards and visual KPI reports using Power BI, highlighting keyword trends, clicks, and click-through rates, impressions by device, month, and states for clients leading to an increase of 30% in client reach.
Designed and Implemented Data Cleansing Process and statistical analysis with Python.
Built and Developed an End-to End Data Engineering pipeline with automated data ingestion using Snowflake and AWS (S3, and RDS).
Analyzed technical and economic feasibility of clients performing requirement gathering to optimize and reduce project expenses by up to 60%.
Performed data imputation using Scikit-learn package in Python.
Ensured the model had a low false positive rate; performed text classification and sentiment analysis on unstructured and semi-structured data (a minimal sentiment-scoring sketch follows these responsibilities).
Curated SEO-optimized solutions for business enterprises to boost sales and internet presence by 70%.
Worked with data analytics team to develop time series and optimizations.
Developed an end-to-end multilingual E-Learning Management System (SCORM compliant) based on Articulate 360 and Redwood Web Authoring Tools.
Created Logical and Physical data models with Star and Snowflake schema techniques using Erwin in Data warehouse as well as in Data Mart.
Utilized Power Query in Power BI to Pivot and Un-pivot the data model for data cleansing and data massaging.
Performed ad-hoc requests for clients using SQL queries to extract and format requested information.
Involved in Data Analysis, Data Validation, Data Cleansing, Data Verification and identifying data mismatch.
Designed data model, analyzed data for online transactional processing (OLTP) and Online Analytical Processing (OLAP) systems.
Worked with normalization and de-normalization concepts and design methodologies.
Wrote and executed customized SQL code for ad-hoc reporting and other routine tasks.
Created customized reports in Power BI for data visualization.
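Illustrative sketch of the sentiment-analysis step mentioned above, using NLTK's VADER analyzer on made-up example sentences.

```python
# Sketch of rule-based sentiment scoring with NLTK's VADER analyzer.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)   # one-time lexicon download
sia = SentimentIntensityAnalyzer()

for text in ["Great service, very fast!", "The report was late and incomplete."]:
    print(text, "->", sia.polarity_scores(text))
```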

Environment: ER/ Studio, SQL, Python, APIs, OLAP, OLTP, PL/ SQL, Oracle, Teradata, Power BI, ETL, SQL, Redshift, Pandas, Numpy, Seaborn, Matplotlib, Scikit-Learn, Scipy, NLTK, Python, XGBOOST, Tableau, Power Query, Snowflake, AWS, S3, RDS, Erwin.

Client: Nordstrom Corporation, Seattle, WA. Oct 2015 to May 2017
Role: Data Analyst
Responsibilities:
Performed Data Analysis, Data Migration, and Data Preparation useful for Customer Segmentation and Profiling.
Implemented analysis algorithms in Python using Pandas, NumPy, Seaborn, SciPy, Matplotlib, Scikit-learn, and NLTK.
Implementing Data Warehousing and Data Modelling procedures to build ETL pipelines to extract and transform data across multiple sources.
Architected scalable algorithms using Python programming and capable of performing Data Mining, Predictive Modelling using all kinds of statistical algorithms as required.
Utilize ETL tooling to build, template, and rapidly deploy new pipelines for gathering and cleaning data
Developed multivariate data validation scripts in Python for equity, derivative, currency, and commodity related data, thereby improving pipeline efficiency by 17% (a minimal validation sketch follows these responsibilities).
Used predictive analysis to develop and design sampling methodologies and analyzed data for pricing of clients' products.
Involved in optimizing the ETL process of Alteryx to Snowflake.
Used Data visualization tools Such as Tableau, Advanced MS Excel (macros, index, conditional list, arrays, pivots, and lookups), Alteryx Designer, and Modeler.
Applied data analytics and data automation, and worked with custom visualization tools using Python, Mahout, Hadoop, and MongoDB.
Performed all necessary day-to-day GIT support for different projects, Responsible for design and maintenance of the GIT Repositories, and the access control strategies.
Fostered teamwork, communication, and collaboration while managing competing weekly, bi-weekly, monthly, and quarterly priorities.
Worked extensively on ER/ Studio in several projects in both OLAP and OLTP applications.
Generated ad-hoc SQL queries using joins, database connections and transformation rules to fetch data from legacy Oracle and SQL Server database systems.
Analyzed business requirements and upgraded function specification while conducting testing on multiple versions and resolving critical bugs to improve the functionality of the Learning Management System.
Built and Deployed a UI/ UX e-learning web application using jQuery, JavaScript, HTML, and NodeJS for various courses.
Cleaned and transformed the data using Python, developed dashboards and visual KPI reports using Tableau.
Involved in publishing of various kinds of live, interactive data visualizations, dashboards, reports and workbooks from Tableau Desktop to Tableau servers.
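Illustrative only: a minimal Pandas sketch of the kind of multivariate data-validation checks described above; the columns and rules are hypothetical, not the actual equity/derivative rules.

```python
# Sketch of rule-based data-validation checks with Pandas (hypothetical rules).
import pandas as pd

df = pd.DataFrame({
    "symbol": ["AAPL", "MSFT", None],
    "price": [189.5, 402.1, -1.0],
    "currency": ["USD", "USD", "usd"],
})

checks = pd.DataFrame({
    "missing_symbol": df["symbol"].isna(),
    "non_positive_price": df["price"] <= 0,
    "bad_currency_code": ~df["currency"].isin(["USD", "EUR", "GBP"]),
})
print(checks.sum())               # violation count per rule
print(df[checks.any(axis=1)])     # rows failing at least one rule
```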

Environment: Python, Pandas, Numpy, Seaborn, Scipy, Matplotlib, Scikit Learn, NLTK, ETL, Alteryx, Snowflake, Tableau, jQuery, JavaScript, HTML, NodeJS, Hadoop, MongoDB, OLTP, OLAP, ER Studio, Oracle, SQL Server, SQL, Tableau Server.

Client: Nabla Infotech Pvt. Ltd., Pune, India. June 2013 to Sept 2015
Role: ETL Developer.
Responsibilities:
Developed Mappings as per the technical specification approved by Client.
Extracted data from various sources like Oracle, flat files, XML files, Sybase, SAP, and SQL Server.
Worked with various types of transformations such as Lookup, Update Strategy, Stored Procedure, Joiner, Filter, Aggregator, Rank, Router, and Expression.
Created Mapplets to reduce the development time and complexity of Mappings and improve maintainability.
Extensively worked with Informatica Designer components: Source Analyzer, Warehouse Designer, Mapping Designer, Transformation Developer, and Workflow Manager.
Developed Complex mappings by extensively using Informatica Transformations.
Implemented SCD Type 2 to keep track of historical data (an illustrative sketch of the pattern follows these responsibilities).
Monitored sessions using the workflow monitor, which were scheduled, running, completed, or failed. Debugged mappings for failed sessions. Conducted Unit Testing.
Involved in scheduling Informatica sessions using the Workflow Manager to automate the loading process, and imported and exported data from the production to the development box for testing.
Developed Shell scripts through Putty for scheduling and automating the job flow. Transferred files through FileZilla.
Optimized and tuned mappings for better performance and efficiency.
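The SCD Type 2 handling above was built with Informatica transformations; purely as an illustration of the pattern (not the original tooling), here is a small Python/pandas sketch of the expire-and-insert logic on toy data.

```python
# Illustration of the SCD Type 2 expire-and-insert pattern (toy data, not Informatica).
import pandas as pd

dim = pd.DataFrame({
    "customer_id": [1], "city": ["Pune"],
    "valid_from": ["2014-01-01"], "valid_to": ["9999-12-31"], "is_current": [True],
})
incoming = pd.DataFrame({"customer_id": [1], "city": ["Mumbai"]})
load_date = "2015-06-01"

# Find keys whose tracked attribute changed.
changed = dim.merge(incoming, on="customer_id", suffixes=("", "_new"))
changed = changed[changed["city"] != changed["city_new"]]

# Expire the current row for changed keys...
mask = dim["customer_id"].isin(changed["customer_id"]) & dim["is_current"]
dim.loc[mask, ["valid_to", "is_current"]] = [load_date, False]

# ...and insert a new current row carrying the changed value.
new_rows = changed[["customer_id", "city_new"]].rename(columns={"city_new": "city"})
new_rows = new_rows.assign(valid_from=load_date, valid_to="9999-12-31", is_current=True)
print(pd.concat([dim, new_rows], ignore_index=True))
```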

Environment: Informatica Power Center 10.5, SQL Developer, MS SQL Server, Flat Files, XML files Oracle 10g, DB2, SQL, PL/SQL, Unix/Linux, Putty, FileZilla.