Vamsi - GCP Data Engineer
[email protected]
Location: Texas City, Texas, USA
Relocation: Open to relocate
Visa: H1B
PROFESSIONAL SUMMARY:
9+ years of experience in designing and implementing data processing solutions using GCP and Hadoop technologies.
- Proven hands-on experience migrating on-premises ETLs to Google Cloud Platform (GCP) using cloud-native tools like BigQuery, Cloud Dataproc, Google Cloud Storage, and Composer.
- Working knowledge of relational and dimensional data modeling concepts, such as fact and dimension tables, star-schema modeling, and snowflake modeling.
- Highly skilled in creating data marts and warehouse designs that use distributed SQL concepts, Presto SQL, Hive SQL, Python (Pandas, NumPy, SciPy, Matplotlib), and PySpark to handle growing data volumes.
- Practical experience creating data pipelines on Unix/Linux systems with Bash scripting.
- Skilled in creating SSIS packages to extract, transform, and load (ETL) data from various sources into data marts.
- Practical knowledge of several programming languages, including SAS, Python, and Java.
- Developed multiple pass-through SQL queries against the Hive database using SAS Enterprise Guide.
- Previous experience creating PySpark ETL scripts that handle Python, Java, and Spark contexts.
- Previous experience importing and exporting data between RDBMS and HDFS/Hive using Sqoop.
- Broad experience across all stages of the software development life cycle (SDLC), particularly in the creation, testing, and deployment of applications.
- Extensive expertise creating a variety of reports and dashboards using Tableau visualizations, as well as strong knowledge of data preparation, data modeling, and data visualization using Power BI.
- Examined and assessed the reporting needs of different business divisions.
- Familiarity with agile software development methodologies and experience working in agile teams.
- Strong analytical and problem-solving skills for troubleshooting and resolving data-related issues.

TECHNICAL SKILLS:
Hadoop/Big Data Technologies: HDFS, Hive, Pig, Sqoop, YARN, Spark, Spark SQL, Kafka
Cloud Services: Cloud Storage, BigQuery, Composer, Cloud Dataproc, Cloud SQL, Cloud Functions, Cloud Pub/Sub, AWS, Azure
Distributions: Cloudera Hadoop
Languages & Scripting: Python, Java, R, SQL, SAS, XML, machine learning, PySpark, Scala
Libraries & Packages: Pandas, NumPy, Matplotlib, Seaborn, SciPy, MLlib, NLTK
Data Visualization Tools: Tableau, Power BI
Development Tools: Microsoft SQL Studio, Eclipse, Visual Studio, Informatica PowerCenter
Data Modeling Tools: Erwin 9.6, Lucidchart
Databases: MySQL, MS SQL Server, Postgres, MongoDB, NoSQL
Operating Systems: Unix, Linux, Windows, macOS

PWC, Tampa, FL
Sr. Data Engineer | Aug 2021 - Present
Responsibilities:
- Developed and maintained data pipelines using Hadoop, Hive, and Spark/Scala for processing and analysis.
- Created data pipelines in GCP utilizing various Airflow operators for ETL-related tasks (see the sketch after this section).
- Knowledge of BigQuery, GCS, Cloud Functions, and GCP Dataproc.
- Knowledge of utilizing Azure Data Factory to transfer data between GCP and Azure.
- Improved performance by generating Power BI reports using Azure Analysis Services.
- Migrated an entire Oracle database to BigQuery and utilized Power BI for reporting.
- Configured the Dataproc, Storage, and BigQuery services using the Cloud Shell SDK in GCP.
- Consulted with the group and developed a framework for extracting business data from BigQuery and generating daily ad hoc reports.
- Designed and coordinated the implementation of advanced analytical models on a Hadoop cluster over sizable datasets with the data science team.
- Wrote Hive SQL scripts to create complex tables with performance features such as partitioning, clustering, and skewing.
- Downloaded data from BigQuery into Pandas and Spark data frames for advanced ETL capabilities.
- Worked with Google Data Catalog and other Google Cloud APIs to monitor, query, and bill BigQuery usage.
- Created a proof of concept for using ML models and Cloud ML for table-quality analysis in batch processing.
- Understanding of Cloud Dataflow and Apache Beam.
- Solid understanding of Cloud Shell for various tasks and service deployment.
- Created BigQuery authorized views for row-level security and for exposing data to other teams.
Environment: SQL, Oracle, PL/SQL, GCP Cloud Storage, BigQuery, Composer, Cloud Dataproc, Cloud SQL, Cloud Functions, Cloud Pub/Sub, Spark/Scala, Azure Storage, Azure Database, Azure Data Factory, Azure Analysis Services, Power BI, Data Studio, Tableau, Pandas, NumPy, SciPy, Matplotlib.
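Illustrative sketch of the Composer/Airflow pipeline pattern referenced above: a minimal daily DAG that stages CSV files from GCS into a BigQuery landing table and then runs a SQL transform, using the standard Google provider operators. All project, bucket, dataset, and table names here are hypothetical placeholders, not client specifics.

    from datetime import datetime

    from airflow import DAG
    from airflow.providers.google.cloud.operators.bigquery import BigQueryInsertJobOperator
    from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator

    with DAG(
        dag_id="daily_sales_load",        # hypothetical pipeline name
        schedule_interval="@daily",
        start_date=datetime(2023, 1, 1),
        catchup=False,
    ) as dag:
        # Stage raw CSV files from GCS into a BigQuery landing table.
        load_raw = GCSToBigQueryOperator(
            task_id="load_raw_csv",
            bucket="raw_sales_bucket",    # hypothetical bucket
            source_objects=["sales/*.csv"],
            destination_project_dataset_table="my_project.sales_ds.raw_sales",
            source_format="CSV",
            skip_leading_rows=1,
            write_disposition="WRITE_TRUNCATE",
        )

        # Aggregate the landing table into a reporting table with a SQL job.
        transform = BigQueryInsertJobOperator(
            task_id="transform_sales",
            configuration={
                "query": {
                    "query": """
                        CREATE OR REPLACE TABLE my_project.sales_ds.daily_sales AS
                        SELECT sale_date, region, SUM(amount) AS total_amount
                        FROM my_project.sales_ds.raw_sales
                        GROUP BY sale_date, region
                    """,
                    "useLegacySql": False,
                }
            },
        )

        load_raw >> transform

In practice, a DAG like this is dropped into the Composer environment's dags/ bucket and scheduled by Composer's managed Airflow.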
Cornerstone Bank, Overland Park, KS
Data Engineer | Sept 2020 - July 2021
Responsibilities:
- Experience building and designing multiple data pipelines and end-to-end ETL and ELT processes for data ingestion and transformation on AWS; deep understanding of AWS components such as EC2 and S3.
- Used Python in a Google Cloud Function to load data from CSV files arriving in a GCS bucket into BigQuery (see the sketch after this section).
- Created pipelines in Azure Data Factory (ADF) using Linked Services, Datasets, and Pipelines to extract, transform, and load data from many sources, including Azure SQL, Blob Storage, an Azure SQL Data Warehouse, and a write-back tool.
- Created an Azure WebJob allowing Product Management teams to connect to various APIs and sources, retrieve data, and load it into an Azure Data Warehouse.
- Created several pipelines to link the Azure cloud to AWS S3 and transfer data to an Azure database.
- Developed simple and complex SQL scripts to check and validate data flows in various applications.
- Performed data analysis, data migration, data cleansing, transformation, integration, data import, and data export with Java/Scala.
- Developed and deployed data pipelines in clouds such as AWS and GCP.
- Hands-on experience architecting the ETL transformation layers and writing Spark jobs to do the processing.
- Developed logistic regression models in Python to predict subscription response rates based on customer variables such as past transactions, responses to earlier mailings, promotions, demographics, interests, and hobbies.
- Processed and loaded bounded and unbounded data from a Google Pub/Sub topic into BigQuery using Cloud Dataflow with Java.
- Implemented Apache Airflow for authoring, scheduling, and tracking data pipelines.
- Proficient in machine learning techniques (decision trees, linear/logistic regressors) and statistical modeling.
- Worked on Confluence and Jira; proficient in data visualization with the Matplotlib and Seaborn libraries.
Environment: Azure, GCP, BigQuery, GCS bucket, AWS, Azure Data Factory, Apache Airflow, Java, Python, Pandas, Matplotlib, Seaborn, text mining, NumPy, Scikit-learn, heatmaps, bar charts, line charts, ETL, Scala, Spark
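Illustrative sketch of the GCS-to-BigQuery Cloud Function mentioned above: a minimal first-generation (background) function that loads each CSV object finalized in a bucket into a BigQuery table. The destination table ID is a hypothetical placeholder.

    from google.cloud import bigquery

    # Hypothetical destination table; adjust to your project/dataset.
    TABLE_ID = "my_project.analytics.transactions"

    def load_csv_to_bigquery(event, context):
        """Background Cloud Function triggered by a GCS object finalize event."""
        client = bigquery.Client()
        uri = f"gs://{event['bucket']}/{event['name']}"
        job_config = bigquery.LoadJobConfig(
            source_format=bigquery.SourceFormat.CSV,
            skip_leading_rows=1,
            autodetect=True,
            write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        )
        # Kick off the load job and wait for it; raises on load errors.
        load_job = client.load_table_from_uri(uri, TABLE_ID, job_config=job_config)
        load_job.result()

Deployed with a google.storage.object.finalize trigger on the landing bucket, this appends each arriving file to the table and surfaces load failures as function errors.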
Tandy Leather, Fort Worth, TX
Data Engineer - Associate | July 2019 - Aug 2020
Responsibilities:
- Performed ETL procedures on existing data after migrating it from Teradata/SQL Server to Hadoop.
- Created static and dynamic partitions to load structured, unstructured, and semi-structured data into Hadoop.
- Designed a task scheduling program for use across numerous servers in an EC2 environment.
- Created Hive partitioned tables in Parquet/Avro format to boost query performance and maximize storage.
- Transferred the ETL pipelines from SQL Server to the Hadoop environment.
- Developed Spark applications using PySpark and Spark SQL for data extraction, transformation, and aggregation from various file formats, and for analysis and transformation of the data to reveal insights into client usage patterns.
- Created data flow pipelines using SSIS, NiFi, Python scripts, and Spark applications.
- Transformed data from old tables into Hive tables, HBase tables, and S3 buckets for transfer to business and data science teams.
- Coded, tested, debugged, and documented complex database queries using advanced SQL techniques.
- Created and implemented Scala workflows for obtaining data from cloud-based applications and transforming it.
- Used Spark to analyze the data and stored the final computation results in HBase tables.
- Provided data analysis and data validation while resolving complicated production issues.
Environment: SQL Server, Hadoop, ETL operations, Data Warehousing, Data Modeling, Cassandra, AWS cloud computing architecture, EC2, S3, advanced SQL methods, NiFi, Python, Linux, Apache Spark, Scala, Spark SQL, HBase

CAPEFOX TECHNOLOGIES PVT LTD, Hyderabad, India
ETL Developer | June 2015 - May 2019
Responsibilities:
- Analyzed current business functionality and process flows and prepared ETL process flows for predictive analysis.
- Contributed to the conceptual, logical, and physical model design for the staging and target databases.
- Created logical and physical dimensional data models using Erwin 9.6.
- Developed design documents for each module and technical documentation for the ETL process.
- Designed, developed, and supported the extraction, transformation, and load (ETL) process.
- Created high-level and low-level design document requirements for source-target mapping based on the transformation rules.
- Developed complex mappings involving slowly changing dimensions, business logic, and the capture of records deleted from the source systems.
- Working knowledge of incremental updates from source systems to the data warehouse and staging area.
- Worked extensively to enhance the effectiveness of Informatica mappings/sessions/workflows.
- Created PL/SQL stored procedures to update the target tables' indexes and perform database updates.
- Used the Control-M tool to finalize the scheduling of workflows and database scripts in collaboration with production support.
- Performance-tuned mappings, sessions, databases, and ETL code.
- Performed simple Informatica administration tasks, including setting up users, files, permissions, deployment groups, server optimization, etc.
Environment: Informatica PowerCenter 9.1/8.6.1, DB2, TOAD, Oracle 11g, PL/SQL, UNIX, MicroStrategy, Erwin 9.6.

Education: Bachelor of Technology in Computer Science

Thank you,
(512) 599-9499 | [email protected]