Koteswara Singothu - AWS Cloud Data Engineer
Location: Dallas, Texas, USA
Relocation: No
Visa: H1B
Koteswara Singothu
Mobile: +1-469-933-9876 Email/Skype: [email protected]
LinkedIn Profile: https://www.linkedin.com/in/koteswara-rao-singothu-19b32571/

Professional Summary
15+ years of experience in IT, covering analysis, design, development, implementation, and support of business applications across multiple domains (Banking, Retail, and Insurance) using technologies such as Big Data, AWS Cloud, DevOps, ETL tools, SQL, NoSQL, Python, Scala, and UNIX.
8+ years of experience with the Hadoop ecosystem, with hands-on experience installing, configuring, and using components such as HDFS, MapReduce, YARN, Pig, Hive, HBase, Oozie, Impala, Tez, Sqoop, Kafka, HUE, NiFi, Databricks, and Spark on both Hortonworks and Cloudera clusters.
3+ years of experience developing AWS cloud solutions using S3, Lambda, EC2, EMR, IAM, Athena, Redshift, Glue, Airflow, AWS Data Pipeline, and CloudFormation.
1+ years of experience with Snowflake multi-cluster warehouses.
1+ years of experience with CDP 7.x, the platform resulting from the merger of Cloudera and Hortonworks.
Hands-on experience building data pipelines with PySpark and AWS services such as IAM, Lambda, Glue, SNS, CloudWatch, and Redshift (a minimal sketch follows this summary).
Hands-on experience with unified data analytics in Databricks, including the Databricks workspace UI, managing Databricks notebooks, and Delta Lake with Python and Spark SQL.
Experience developing Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into cluster usage patterns.
Extensive experience analyzing, building, testing, and implementing ETL functionality using IBM DataStage, Informatica, and SSIS, and visualizing data with BI tools such as Tableau, OBIEE, ThoughtSpot, and MicroStrategy against databases including Oracle, Teradata, SQL Server, MariaDB, Greenplum, and MongoDB.
Experience with Spark Core, Spark SQL, and Spark Streaming, using Python and Scala to build batch and real-time pipelines that move heterogeneous data into the Hadoop data lake or the AWS cloud.
Strong experience extracting and loading data with complex business logic using Spark, Sqoop, and Pig from different data sources, and building ETL pipelines that process terabytes of data daily.
Experience migrating workflow jobs written in Scalding, a MapReduce-based framework, to Spark jobs.
Designed and performed performance tuning of Spark applications, adjusting batch interval, level of parallelism, and memory settings to achieve optimized throughput for long-running jobs and reduce costs caused by inefficient resource use.
Monitored containerized Spark applications running through Jenkins using Grafana.
Worked with Spark on EMR clusters alongside other Hadoop applications, leveraging the EMR File System (EMRFS) to access data directly in Amazon S3 and the AWS Glue Data Catalog.
Experience with version control tools such as Git and Bitbucket, DevOps tools such as Jenkins and Bamboo, and scheduling tools such as Crontab, Autosys, Tidal, and Control-M.
Experience writing and running shell scripts with Sqoop, HQL, spark-shell, and spark-submit tasks, along with file validations and data archiving.
Knowledge of file formats such as Avro, Parquet, and ORC, and compression codecs such as GZip and Snappy.
Strong understanding of the Software Development Lifecycle (SDLC) and Agile methodology.
Worked on projects following Agile-Scrum methods.
Excellent communication and interpersonal skills that contribute to timely completion of project deliverables ahead of schedule.
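
For illustration, a minimal sketch of a PySpark batch pipeline of the kind described above; the bucket names, paths, and column names are hypothetical placeholders rather than details of any specific project.

# Minimal PySpark sketch: read raw CSV from S3, apply a simple transformation,
# and write partitioned Parquet back to S3. All names are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("s3-batch-pipeline").getOrCreate()

raw = (spark.read
       .option("header", "true")
       .csv("s3://example-raw-bucket/claims/"))          # assumed input location

curated = (raw
           .withColumn("load_date", F.current_date())    # audit column
           .filter(F.col("status").isNotNull()))         # drop incomplete rows

(curated.write
 .mode("overwrite")
 .partitionBy("load_date")
 .parquet("s3://example-curated-bucket/claims/"))        # assumed output location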
Education:
B.Tech (Computer Science Engineering), Sep 2002 - Jul 2006, Andhra University, India

Professional Experience:
Working as a Senior Hadoop Data Engineer since Aug 2019.
Worked as a Cloud Hadoop Data Engineer in Mphasis Limited from April 2019 to Aug 2019
Worked as a Cloud Hadoop Data Engineer in Net Logic Solutions from April 2018 to April 2019
Worked as a Technology Lead in Infosys Pvt Ltd from July 2013 to April 2018.
Worked as an Associate in Cognizant Technology Solutions from July 2009 to Aug 2013.
Worked as a Programmer Analyst in IP Creations Pvt Ltd from May 2008 to July 2009.


Certifications:
CCA Spark and Hadoop Developer: http://certification.cloudera.com/verify - 100-024-306
AWS Certified Solutions Architect Associate: http://aws.amazon.com/verification - 4S0EM7RL11FQQDWZ
AWS Certified Machine Learning Specialty: http://aws.amazon.com/verification - WE9KBJ4BBJB4QQGF
Coursera Machine Learning: https://coursera.org/share/5c26059eb23ff105d12d2ffb181e987a
Data Science with Math skills: https://coursera.org/share/8bd20c7f422ee69943a114e49f3c8040


Technical Skills:
Hadoop Platforms: Cloudera (CDH), Hortonworks (HDP), CDP
Hadoop Tools: HDFS, Hive, Pig, Sqoop, HBase, Oozie, Tez, Spark, Impala, Kafka, YARN, ZooKeeper, Flume, NiFi
AWS Tools: IAM, S3, Lambda, Athena, Redshift, Glue, Glacier, CloudFormation, CloudWatch
Programming: Python, Scala, UNIX shell, Java, YAML, BTEQ, PL/SQL
ETL Tools: IBM DataStage, Informatica, SSIS
Reporting Tools: Tableau, OBIEE, MicroStrategy, ThoughtSpot, Alteryx
CI/CD & Version Control: Git, Bitbucket, Bamboo, Jenkins
Databases: Teradata, Oracle, SQL Server, Greenplum, MariaDB
NoSQL Databases: HBase, MongoDB
Scheduling Tools: Control-M, Autosys, Crontab
Domains: Finance, Retail, Insurance
Agile Tools: JIRA, Remedy, ServiceNow
Developer Tools: WinSCP, PuTTY, SQL Server 2014 Management Studio, Developer, Teradata Developer Studio, Greenplum developer tool
Operating Systems: Windows, Linux, UNIX

Work Experience:
Bank of America, Addison, TX Oct 2021 to Present
Project: Client Writing & Client Underwriting Role: Cloud Data Engineer

Responsibilities:
Worked with the business users to understand their needs, analyze requirements, and determine the feasibility of designs within time and cost constraints; evaluated various big data technology solutions to identify the best fit for the current requirements.
Analyzed the existing system and worked with the infrastructure team to provision the Hadoop and AWS platforms, along with AD groups and databases, across the DEV, UAT, and PROD environments.
Designed data pipelines using PySpark and AWS services such as IAM, Lambda, Glue, SNS, CloudWatch, and Redshift.
Developed Spark applications using Spark SQL in Databricks for data extraction, transformation, and aggregation from multiple file formats, analyzing and transforming the data to uncover insights into cluster usage patterns.
Designed, built, installed, configured, and supported the Cloudera Hadoop platform and integrated it with Bitbucket and Jenkins for the CI/CD process.
Migrated the CDH 6.x environment to CDP 7.x, including all HDFS files, Hive queries, and Spark jobs.
Created an architecture framework to move data from different sources into HDFS, Hive, and HBase, and delivered aggregated data to a data mart for reporting and predictive analytics using Sqoop, Kafka, Spark Core, Spark SQL, Scala, Hive, HBase, HUE, and Impala.
Created end-to-end data pipelines on AWS using S3, Lambda, Athena, AWS Glue, and Redshift.
Stored incoming data in S3 buckets and designed a Lambda function triggered when data lands in the buckets (see the sketch after this list).
Used Athena to analyze the data in S3 buckets through the AWS Glue Data Catalog.
Created and scheduled Glue jobs, building the Data Catalog with crawlers and applying transformations for the end-to-end data flow.
Created tables and views in the Redshift database and applied optimization techniques to the Redshift tables.
Wrote complex UDFs in Python to implement business logic and leveraged them in PySpark for computing over large volumes of data.
Pushed the code to Bitbucket once development, unit testing, and review were complete, then merged and built the repositories using Jenkins to promote the code to UAT.
Assisted the team with coordination activities, actively participated in daily stand-ups, planning, reviews, and retrospectives in two-week iterations, and guided the team through demos and knowledge-sharing sessions.
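
For illustration, a minimal sketch of an S3-triggered Lambda that starts a Glue job; the job name, environment variable, and argument names are hypothetical placeholders, not the project's actual configuration.

# Hedged sketch: Lambda handler invoked by an S3 "ObjectCreated" event that
# starts an AWS Glue job for the newly landed object. Names are placeholders.
import os
import boto3

glue = boto3.client("glue")

def handler(event, context):
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        # Pass the landed object's location to the Glue job as job arguments.
        response = glue.start_job_run(
            JobName=os.environ.get("GLUE_JOB_NAME", "example-curation-job"),
            Arguments={"--source_bucket": bucket, "--source_key": key},
        )
        print(f"Started Glue run {response['JobRunId']} for s3://{bucket}/{key}")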
Environment: Hadoop Cloudera (CDH 5.x, CDH 6.x & CDP 7.x), Databricks, Snowflake, HDFS, Sqoop, Spark, Spark SQL, S3, Glue, Athena, Redshift, Python, Hive, HUE, Impala, JIRA, SSIS, SQL Server, Oracle, Linux, Git, Bitbucket, Jenkins

Wells Fargo, Dallas, TX Mar 2021 to Oct 2021
Project: RFDS Role: Big Data Engineer

Responsibilities:
Worked with the business users to understand their needs and determine the feasibility of the design within time and cost constraints.
Analyzed the source systems and raised requests to create AD groups where they did not already exist.
Raised requests for service IDs to access newly created AD groups.
Emailed the DevOps team to create Git repositories where none existed.
Checked whether the Git repository already contained the Flowmaster framework.
Checked that all environment folders matched the upstream and downstream connections.
Analyzed the mapping sheet and created the JSON files that serve as input to the Flowmaster framework.
Analyzed workflows powered by offline batch jobs written in Scalding, a MapReduce-based framework.
Developed Spark jobs to enhance scalability and performance by migrating the Scalding and MapReduce jobs (see the sketch after this list).
Created Hive external tables over the data files loaded into HDFS.
Created MongoDB collections and views.
Tested both history and go-forward loads.
Designed and performed performance tuning of Spark applications, adjusting batch interval, level of parallelism, and memory settings to achieve optimized throughput for long-running jobs and reduce costs caused by inefficient resource use.
Created auto-generated HQL scripts for Oracle-to-Hive column mapping and Hive DDL scripts.
Pushed the code to Git once development, unit testing, and review were complete, then merged and built the repositories using Jenkins to promote the code to UAT.
Created the implementation document for the PROD release and reviewed it with the support team.
Coordinated with the support and DevOps teams on PROD release day to deploy the code into PROD.
Ran the post-validation scripts and had the results confirmed by the business team.
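
For illustration, a minimal PySpark sketch of the kind of migration described above: an aggregation originally expressed as a Scalding/MapReduce job rewritten with the DataFrame API over a Hive external table. The database, table, and column names are hypothetical placeholders (the project itself used Scala).

# Hedged sketch: re-expressing a Scalding/MapReduce-style aggregation in Spark.
# Database, table, and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (SparkSession.builder
         .appName("scalding-to-spark-migration")
         .enableHiveSupport()                     # read Hive external tables directly
         .getOrCreate())

txns = spark.table("rfds_db.transactions_ext")    # Hive external table (assumed)

daily_totals = (txns
                .groupBy("account_id", "txn_date")
                .agg(F.sum("amount").alias("total_amount"),
                     F.count("*").alias("txn_count")))

# Write the results back as a managed Hive table for downstream consumers.
daily_totals.write.mode("overwrite").saveAsTable("rfds_db.daily_account_totals")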
Environment: Hadoop MapR, HDFS, Flowmaster Framework, Spark Core, Spark SQL, Scala, Hive, HBase, HUE, JIRA, SQL Server, Oracle, Linux, GIT, Jenkins

Bank of America, Addison, TX Aug 2019 to Mar 2021
Project: Client Writing & Client Underwriting Role: Big Data Engineer

Responsibilities:
Worked with the business users to understand their needs, analyze requirements, and determine the feasibility of designs within time and cost constraints; evaluated various big data technology solutions to identify the best fit for the current requirements.
Analyzed the existing system and worked with the infrastructure team to provision the Hadoop and AWS platforms, along with AD groups and databases, across the DEV, UAT, and PROD environments.
Designed, built, installed, configured, and supported the Cloudera Hadoop platform and integrated it with Bitbucket and Jenkins for the CI/CD process.
Created an architecture framework to move data from different sources into HDFS, Hive, and HBase, and delivered aggregated data to a data mart for reporting and predictive analytics using Sqoop, Kafka, Spark Core, Spark SQL, Scala, Hive, HBase, HUE, and Impala.
Created streams using the Spark framework, processed real-time data into RDDs, and built analytics with Spark SQL and the DataFrame API in Python to load data into HDFS.
Analyzed the existing SSIS ETL packages and prepared high- and low-level designs to create and migrate data into the data lake using Spark Core and Spark SQL with Python.
Wrote complex UDFs in Python to implement business logic and leveraged them in PySpark for computing over large volumes of data (see the sketch after this list).
Designed and performed performance tuning of Spark applications, adjusting batch interval, level of parallelism, and memory settings to achieve optimized throughput for long-running jobs and reduce costs caused by inefficient resource use.
Created auto-generated HQL scripts for Oracle-to-Hive column mapping, Hive DDL scripts matching the Oracle tables, custom scripts, and DML scripts.
Created views to expose the updated data to end business users.
Published the Hive views to Impala using INVALIDATE METADATA or REFRESH.
Pushed the code to Bitbucket once development, unit testing, and review were complete, then merged and built the repositories using Jenkins to promote the code to UAT.
Assisted the team with coordination activities, actively participated in daily stand-ups, planning, reviews, and retrospectives in two-week iterations, and guided the team through demos and knowledge-sharing sessions.
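
For illustration, a minimal sketch of a Python UDF used from both the DataFrame API and Spark SQL; the masking rule and column names are hypothetical placeholders rather than the project's actual business logic.

# Hedged sketch: a Python UDF applying simple business logic in PySpark.
# The masking rule and column names are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-example").getOrCreate()

def mask_account(account_number):
    """Keep only the last four characters of an account number."""
    if account_number is None:
        return None
    return "*" * max(len(account_number) - 4, 0) + account_number[-4:]

mask_account_udf = F.udf(mask_account, StringType())             # DataFrame API use
spark.udf.register("mask_account", mask_account, StringType())   # Spark SQL use

df = spark.createDataFrame([("1234567890",), ("9876",)], ["account_number"])
df.withColumn("masked", mask_account_udf("account_number")).show()
spark.sql("SELECT mask_account('1234567890') AS masked").show()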
Environment: Hadoop Cloudera, HDFS, SQOOP, Spark Core, Spark SQL, Python, Hive, HBase, HUE, Impala, JIRA, Rally, SSIS, SQL Server, Oracle, Linux, GIT, Bit Bucket, Jenkins

Mphasis, USA APR 2019 to AUG 2019
Project: Historical Access Archive Role: Cloud Big Data Engineer

Responsibilities:
Worked with the business users to understand their needs, analyze requirements, and determine the feasibility of designs within time and cost constraints; evaluated various big data technology solutions to identify the best fit for the current requirements.
Designed PySpark programs to read data from different sources, apply complex transformations, and load the data into AWS S3.
Applied lifecycle conditions to archive the S3 data to Glacier (see the sketch after this list).
Developed Lambda and Glue ETL jobs to load the S3 data into Redshift.
Created CloudFormation templates to provision the infrastructure for AWS Data Pipeline.
Created monitors, alarms, notifications, and logs for Lambda functions, Glue jobs, and EC2 hosts using CloudWatch.
Created tables and views in Redshift and copied the data from S3 into Redshift tables.
Created stack templates using CloudFormation.
Created AWS data pipelines using IAM, EC2, EMR, S3, Glacier, Lambda, Glue, Athena, and Redshift.
Scheduled the Hadoop jobs in Autosys and the AWS jobs in Airflow.
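
For illustration, a minimal boto3 sketch that applies an S3 lifecycle rule transitioning objects to Glacier; the bucket name, prefix, and retention period are hypothetical placeholders.

# Hedged sketch: apply an S3 lifecycle rule that transitions objects under a
# prefix to Glacier after 90 days. Bucket, prefix, and days are placeholders.
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-archive-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-to-glacier",
                "Filter": {"Prefix": "historical/"},
                "Status": "Enabled",
                "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
            }
        ]
    },
)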
Environment: Spark Core, Spark SQL, Python, Hive, HUE, IAM, EC2, EMR, S3, Glacier, Lambda, Glue, Athena, Redshift, CloudFormation, Data Pipeline, JIRA, Oracle, Linux, Windows

Anthem, Mason, OH APR 2018 to APR 2019
Project: RX Finance Role: Cloud Big Data Engineer

Responsibilities:
Provided direction on solutions, recommended options, and drove work from design through development in big data/Hadoop.
Processed the claims data by applying transformations in PySpark and provided fixed-length files to the end users.
Created PySpark programs to read fixed-length files and Oracle table data, apply complex transformations on DataFrames, and load the results into fixed-length HDFS files for downstream processing (see the sketch after this list).
Designed and implemented UDFs in Python for evaluating, filtering, loading, and storing data in Spark DataFrames.
Designed and performed performance tuning of Spark applications, adjusting batch interval, level of parallelism, and memory settings to achieve optimized throughput for long-running jobs and reduce costs caused by inefficient resource use.
Created Hive tables to capture the missing and invalid data from the source files.
Pushed the code to Bitbucket and created snapshots in Bamboo.
Loaded the on-premises data into AWS S3 using Spark programs.
Ran the Spark jobs by configuring an EMR cluster.
Created a data pipeline to move data from the landing S3 bucket to the permanent S3 bucket using AWS services such as Lambda, Glue, Athena, and Redshift.
Scheduled and monitored the jobs in Control-M.
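
For illustration, a minimal PySpark sketch for parsing a fixed-length file by slicing each line into columns; the field positions, names, and paths are hypothetical placeholders, not the project's actual layout.

# Hedged sketch: parse a fixed-length claims file in PySpark by slicing each
# line into columns. Field positions, names, and paths are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("fixed-width-claims").getOrCreate()

lines = spark.read.text("hdfs:///data/claims/incoming/")   # one record per line

claims = lines.select(
    F.trim(F.col("value").substr(1, 10)).alias("claim_id"),      # positions 1-10
    F.trim(F.col("value").substr(11, 8)).alias("service_date"),  # positions 11-18
    F.trim(F.col("value").substr(19, 12))
        .cast("decimal(12,2)").alias("billed_amount"),           # positions 19-30
)

claims.write.mode("overwrite").parquet("hdfs:///data/claims/curated/")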
Environment: Hadoop Cloudera, Spark, Python, HDFS, Hive, HUE, Impala, IAM, EC2, EMR, S3, Lambda, Athena, CloudFormation, JIRA, Oracle, Linux, Git, Bitbucket, Bamboo, UNIX, Windows

Infosys Limited, USA FEB 2017 to APR 2018
Project: Global Data Portal Role: Technology Lead

Responsibilities:
Provided direction on solutions, recommended options, and drove work from design through development in big data/Hadoop.
Designed the Spark Streaming application in Scala, with producers and consumers pushing and pulling data through Kafka topics (see the sketch after this list).
Built data applications and automated Spark pipelines for bulk as well as incremental loads of various datasets.
Loaded the raw data into Hive tables for querying large datasets and storing results.
Designed and performed performance tuning of Spark applications, adjusting batch interval, level of parallelism, and memory settings to achieve optimized throughput for long-running jobs and reduce costs caused by inefficient resource use.
Supervised complex data workflows built on integration between Kafka, Apache Spark, HBase, Hive, and similar systems.
Captured data from Node.js jobs via REST APIs for streaming through Kafka producers.
Made Spark job statistics, monitoring, and data quality checks available for each dataset.
Designed and developed scalable data pipelines using Apache Spark and Scala to process terabytes of data.
Monitored containerized Spark applications running through Jenkins using Grafana.
Improved Spark application performance by monitoring CPU and memory utilization of the Jenkins hosts in Grafana.
Managed applications by building them with Maven and the Eclipse and IntelliJ IDEs with their supporting plugins.
Helped DevOps engineers deploy code and debug issues.
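
For illustration, a minimal Structured Streaming sketch for consuming a Kafka topic and landing events on HDFS. The project used Scala; this shows the equivalent pattern in PySpark, with a hypothetical broker, topic, and paths, and it assumes the spark-sql-kafka connector package is available.

# Hedged sketch: consume a Kafka topic with Spark Structured Streaming and land
# the events on HDFS as Parquet. Broker, topic, and paths are placeholders, and
# the spark-sql-kafka connector is assumed to be on the classpath.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("kafka-stream-ingest").getOrCreate()

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "broker1:9092")
          .option("subscribe", "portal-events")
          .option("startingOffsets", "latest")
          .load())

parsed = events.select(
    F.col("key").cast("string"),
    F.col("value").cast("string").alias("payload"),
    F.col("timestamp"),
)

query = (parsed.writeStream
         .format("parquet")
         .option("path", "hdfs:///data/portal/events/")
         .option("checkpointLocation", "hdfs:///checkpoints/portal-events/")
         .trigger(processingTime="1 minute")
         .start())

query.awaitTermination()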
Environment: Hadoop Hortonworks, HDFS, Hive, Spark Core, Spark SQL, Spark Streaming, Kafka, YARN, Git, Bitbucket, Jenkins, Grafana, MongoDB, JIRA, UNIX, Windows

Infosys Limited, Hyderabad, IND JAN 2015 to JAN 2017
Project: Data Café Tech Foundation Client: Walmart, Bentonville, AR
Role: Technology Lead

Responsibilities:
Interacted with the business users to understand the requirements and the existing system, and conducted gap analysis to identify whether the existing system could accommodate the new integrations.
Migrated Java code to Python scripts to standardize the database for future use, and migrated bash scripts to point to SQL Server instead of the Greenplum database.
Moved SAP and non-SAP flat files generated by various retailers to HDFS for further processing in historical and daily loads.
Wrote Apache Pig scripts to process the HDFS data into the required output datasets.
Created Hive tables to store the processed results in a tabular format.
Developed Sqoop scripts to move data between the Pig outputs and the Teradata and SQL Server databases (see the sketch after this list).
Created HBase tables to store the transactional data.
Developed UNIX shell scripts to create reports from the Hive data.
Set up cron jobs to delete Hadoop logs, old local job files, and cluster temp files.
Scheduled these jobs in Crontab and the Atomic scheduler to run daily, weekly, or monthly.
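
For illustration, a minimal Python wrapper of the kind that could drive such a Sqoop import from a scheduled script; the JDBC URL, credentials file, table, and HDFS target path are hypothetical placeholders, and the JDBC driver for the source database is assumed to be available to Sqoop.

# Hedged sketch: drive a Sqoop import from a scheduled Python script via
# subprocess. JDBC URL, credentials, table, and target path are placeholders.
import subprocess

def sqoop_import(table, target_dir):
    cmd = [
        "sqoop", "import",
        "--connect", "jdbc:sqlserver://dbhost:1433;databaseName=sales",
        "--username", "etl_user",
        "--password-file", "/user/etl/.dbpass",   # password file stored on HDFS
        "--table", table,
        "--target-dir", target_dir,
        "--num-mappers", "4",
        "--as-textfile",
    ]
    subprocess.run(cmd, check=True)   # raise if the Sqoop job fails

if __name__ == "__main__":
    sqoop_import("DAILY_SALES", "/data/raw/daily_sales")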
Environment: Hadoop Hortonworks, HDFS, Hive, HBase, Sqoop, BTEQ, TPT, YAML, Teradata, SQL Server, Greenplum, Crontab, Autosys, Git, Bitbucket, Jenkins, JIRA, UNIX, Windows

Infosys Limited, Hyderabad, IND SEP 2013 to JAN 2015
Project: Store Support & Enhancements Client: Starbucks, Seattle, WA, USA
Role: Data Engineer

Responsibilities:
Extensively involved in preparing low-level design documents and unit test cases.
Designed DataStage jobs per the ETL specifications to extract data from heterogeneous source systems, transform it, and load it into the target.
Used DataStage Designer and DataStage Director to create, implement, and schedule jobs.
Designed and coded the ETL logic in DataStage to enable the initial load and incremental processing from Oracle and SQL sources.
Created scripts to import and export data using the Teradata utilities BTEQ, FastLoad, MultiLoad, and FastExport.
Scheduled the designed jobs in the Control-M scheduler.
Developed multi-tab reports in Tableau Desktop, connecting live or via extracts to historical and transactional data, and published them to the internal team for review.
Provided OBIEE and MicroStrategy technical support for BIAPPS reporting.
Wrote UNIX scripts to auto-generate job execution status reports.
Environment: DataStage 9.1, Informatica, Tableau, OBIEE, MicroStrategy, Oracle, Teradata, Control-M, Remedy, JIRA, UNIX, Windows

Cognizant Technology Solutions, Hyd, IND JUL 2009 to AUG 2013
Project: BSC Dev Client: Blue Shield of California, USA
Role: ETL Developer

Role and Responsibilities:
Extensively involved in preparing low-level design documents and unit test cases.
Extensively analyzed the subscriber and group information against claims processing.
Designed DataStage jobs per the ETL specifications to extract data from heterogeneous source systems, transform it, and load it into the target.
Used DataStage Manager to export and import metadata definitions, create metadata definitions, and view and edit the contents of the repository.
Designed and coded the ETL logic in DataStage to enable the initial load and incremental processing from Oracle.
Extensively used activities such as job activity, routine activity, wait-for-file activity, sequencer activity, termination activity, and exception handler activity.
Created scripts to import and export data using the Teradata utilities BTEQ, FastLoad, MultiLoad, and FastExport.
Involved in creating shell scripts.
Environment: DataStage 9.1, Oracle, Teradata, Facets, Remedy, Windows, UNIX

IP Creations Ltd, Chennai, IND May 2008 to Jul 2009
Client: Daman National Health Insurance, Abu Dhabi, UAE
Role: Java Documentum Developer

Roles and Responsibilities:
Extensively involved in preparing low-level design documents and unit test cases.
Good exposure to the connectivity architecture of DocBrokers, Docbases, and servers.
Exposure to the object hierarchy in Documentum as well as to workflows.
Involved in creating cabinets, folders, and Documentum object types with the corresponding attributes for those documents.
Well versed in Import, Export, Copy, Move, Delete, CheckOut, CheckIn, and CancelCheckOut operations.
Designed and implemented applications supporting new features using Java, JSP, Documentum, DFC, HTML, JavaScript, and XML.
Involved in creating virtual documents and developing components using DFC and DQL programming.
Excellent communication and presentation skills; a vibrant team player.

Environment: Java, JSP, CSS, HTML, JavaScript, DFC, DQL, WDK, Webtop, Oracle, Windows 7, UNIX