
Teja Ponnaluru - Senior Data Engineer
[email protected]
Location: Houston, Texas, USA
Relocation: no
Visa: H1B
TEJA PONNALURU
(832)-886-0785 | [email protected]


PROFESSIONAL SUMMARY
Over 10 years of IT experience across diverse domains, including Pharmaceutical, Oil & Gas, Online Real Estate, and Supply Chain, demonstrating adaptability and expertise in various industry-specific technologies.
Extensive proficiency in AWS services such as EC2, S3, Redshift, and Glue, enabling efficient cloud-based data storage, processing, and management for large-scale enterprise data warehousing solutions.
Strong background in Databricks, leveraging its capabilities for collaborative data engineering and analytics, enhancing productivity through optimized Spark jobs and seamless integration with various data sources.
Proficient in the Hadoop ecosystem, utilizing tools like HDFS and MapReduce for big data processing, ensuring efficient data handling and analysis in large-scale environments.
Skilled in Python and PySpark for data manipulation and transformation, developing robust ETL processes that streamline data workflows and enhance data quality across multiple platforms.
Experience in Snowflake development, including the lift and shift of historic EDW star schemas, ensuring efficient data migration and optimized performance in cloud-based data warehousing.
Implemented Delta Lake for reliable data lakes, ensuring ACID transactions and scalable data management, enhancing data integrity and availability for analytics and reporting.
Developed and scheduled data flows using Apache Airflow on EC2 instances, automating complex workflows and ensuring timely data processing for business intelligence applications.
Created end-to-end automation for launching EC2 instances and installing services using CI/CD tools like CloudFormation and CodePipeline, facilitating seamless application migration to the cloud.
Proficient in integrating RESTful web services for efficient data exchange, enhancing interoperability between systems and enabling real-time data access for analytics and reporting.
Designed and developed user-friendly web interfaces for over 100 customers using Django, MySQL, and AngularJS, improving user experience and data visualization capabilities.
Developed a multi-threaded application wrapper in Python, optimizing performance and resource utilization for data processing tasks across various platforms.
Created frameworks in Python for data validation and monitoring, ensuring data quality and reliability throughout the ETL process and enhancing overall data governance.
Designed and implemented ETL flows in Python to transform and load data from diverse sources into target databases, ensuring efficient data integration and accessibility.
Collaborated with Data Science teams to integrate machine learning algorithms, utilizing clustering methods to derive insights and enhance data-driven decision-making processes.
Leveraged OCR and Tesseract technologies to process PDFs and images, enabling automated data extraction and enhancing data availability for analytics.
Utilized SOLR for custom business data searches, improving search capabilities and enabling users to access relevant information quickly and efficiently.
Actively participated in architectural discussions and designs for Data Mesh, contributing to innovative data management strategies that enhance data accessibility and collaboration.
Experienced in writing Linux/Unix/Shell scripts for automation, streamlining processes and improving operational efficiency across various IT environments.
Strong analytical and problem-solving skills, complemented by excellent communication abilities, enabling effective collaboration with technical and non-technical stakeholders in Agile and Waterfall methodologies.


EDUCATION
Bachelor of Technology in Computer Science and Engineering, JNTU Anantapur, India
PG Diploma in Data Science, IIIT Bangalore, India





TECHNICAL SKILLS:

Big Data Ecosystem: Apache Spark, Hadoop, MapReduce, NiFi, YARN, Atlas, Cloudera Data Platform
Distributions: Cloudera, Hortonworks, HDFS, Teradata
Databases: HBase, Redis, Cassandra, MongoDB, MySQL, PostgreSQL, AlloyDB, Snowflake
Data Services: Hive, Pig, Impala, Sqoop, Flume, Kafka, dbt, Tableau, Power BI, Informatica
Scheduling Tools: ZooKeeper, Oozie, Apache Airflow, Azure Data Factory (ADF), Automic
Cloud Computing Tools: AWS, Databricks, Azure, GCP, Snowflake
Google Cloud Platform: Cloud Storage, BigQuery, Bigtable, Dataproc, Cloud Functions, Apache Beam, Cloud Shell, gsutil, Cloud Composer, IAM, Dataflow, Cloud Pub/Sub
Programming Languages: Scala, Python, SQL, PL/SQL, HiveQL, Unix Shell Scripting, Java
Microsoft Azure: Blob Storage, SQL Server, Postgres, ADF, Synapse, ADLS, Azure Databricks, Cosmos DB, HDInsight, Event Hubs, Stream Analytics, SQL Database, Data Lake
Build Tools: Jenkins, Maven, ANT, SBT, Docker, Kubernetes, DevOps
Version Control: GitLab, GitHub, SVN, Bitbucket

PROFESSIONAL EXPERIENCE:

Company: NC Department of Health and Human Services (NC DHHS) NOV 2021 - Present
Project: Cloud Migration.
Role: Senior AWS Data Engineer
Responsibilities:
Provided a comprehensive design and infrastructure solution for the migration project, ensuring all configurations and storage components were optimized for performance and scalability.
Developed end-to-end automation processes to efficiently transform and load data from S3 into Glue Catalog, Redshift, and DynamoDB using AWS Lambda functions for seamless data integration (see the illustrative sketch after this project).
Created PySpark scripts to process large volumes of data, generating dynamic tables that facilitated advanced analytics and reporting capabilities for business stakeholders.
Designed and implemented AWS Step Functions, triggering them based on events through Lambda, to serialize the user registration process and enhance user experience.
Built data pipelines utilizing AWS Data Pipeline and AWS Glue to facilitate the loading of data from S3 into Redshift, ensuring timely and accurate data availability for analytics.
Successfully migrated data from on-premises Oracle databases to Redshift using AWS Database Migration Service (DMS), ensuring data integrity and minimal downtime during the transition.
Conducted data validation processes using Amazon Athena, leveraging S3 as the data source to ensure accuracy and consistency of migrated datasets.
Established a data lake architecture using the Aurora database, enabling efficient data analysis and storage for various business intelligence applications.
Transformed data according to specific business logic and loaded it into Redshift, providing stakeholders with reliable data for reporting and decision-making purposes.
Managed IAM user creation and policy development, ensuring secure access controls and compliance with organizational security standards.
Developed Lambda scripts to automate schema creation and historical data loads, streamlining the data ingestion process and reducing manual intervention.
Created custom data pipelines to load data from S3 into DynamoDB tables using Python, enhancing data accessibility and performance for application developers.
Designed and implemented QuickSight reports and dashboards to monitor daily job performance and data loads, providing insights into operational efficiency.
Automated deployment processes using CI/CD pipelines via GitHub, CloudFormation, and CodePipeline, ensuring consistent and reliable infrastructure management.
Developed Jenkins pipelines to manage resource creation and change sets, preventing duplicate resource creation and optimizing deployment workflows using Python.
Designed and implemented Lakehouse architectures using Delta Lake, ensuring ACID compliance, schema enforcement, and time travel for efficient big data processing and real-time analytics.
Optimized Spark jobs through adaptive query execution, caching, partitioning, and broadcast joins, achieving a 30-50% reduction in processing time and improved cluster utilization.
Implemented Delta Lake optimization techniques, including Z-Ordering, data skipping, and Change Data Capture (CDC), to enhance query performance and maintain data consistency.
Built real-time data pipelines using Spark Structured Streaming and Autoloader, ingesting high-velocity event data from Kafka, Kinesis, and IoT devices for near-instantaneous insights.
Automated data engineering workflows with Databricks Workflows, scheduling ETL processes, machine learning model training, and data quality checks without relying on external orchestration tools like Airflow.

Environment/Tools: AWS, EC2, S3, Glue, Delta Lake, Databricks Workflows, Databricks Jobs, Data Pipeline, DMS, Lambda, DynamoDB, Aurora, Athena, CloudFormation, CodePipeline, Python, PySpark, Airflow, PostgreSQL, Oracle, Tableau, Unix, Shell Script, GitHub, Oracle Data Modeler.
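
A minimal sketch of the Lambda-driven S3 ingestion described above, assuming an S3 object-created event trigger; the Glue job name and argument key are hypothetical placeholders:

import boto3

glue = boto3.client("glue")

def handler(event, context):
    # Triggered by S3 object-created events; start the Glue job that registers the
    # file in the Glue Catalog and loads it into Redshift / DynamoDB downstream.
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        glue.start_job_run(
            JobName="s3_to_redshift_load",                        # hypothetical Glue job name
            Arguments={"--source_path": f"s3://{bucket}/{key}"},  # consumed by the Glue script
        )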


Company: Baker Hughes JUNE 2019 - OCT 2021
Project: Non-Conformities Similarities.
Role: Senior Data and Information Architect
Responsibilities:
Gathered raw data from multiple sources using Sqoop, processed it with machine learning techniques, and indexed it into SOLR for efficient similarity searches across service requests.
Developed a machine learning algorithm to generate a list of similar issues based on historical service requests in ServiceNow, identifying clusters of similarities in datasets of escaping defects.
Designed and implemented an end-to-end architecture for the project, ensuring seamless integration of data sources, processing, and delivery of actionable insights.
Conducted data extraction, cleansing, and exploratory data analysis (EDA) using Python, preparing datasets for machine learning model training and evaluation.
Collaborated closely with the data science team to create a machine learning algorithm that calculates similarities using clustering techniques, improving the accuracy of defect identification.
Responsible for indexing all clustered data in SOLR, enabling effective search and visualization of similarities to support informed business decision-making.
Configured and deployed machine learning algorithms as operational models, ensuring they were effectively integrated into production environments for real-time insights.
Enhanced model upgrades and implementations in production, continuously improving the performance and accuracy of machine learning solutions.
Provided data insights to the engineering team to identify defects swiftly, enabling prompt decision-making to address day-to-day business challenges.
Developed scalable ETL/ELT pipelines using AWS services like Glue and Lambda, optimizing transformations for high-performance batch processing and data integration.
Designed and managed Databricks Workflows to automate ETL processes, significantly reducing processing time and enhancing operational efficiency.
Integrated Databricks Delta Lake to enhance data consistency, versioning, and reliability, ensuring high-quality analytics and reporting capabilities.
Developed and implemented logic to calculate similarity as a distance between vectors representing service requests, facilitating more accurate clustering and analysis (see the illustrative sketch after this project).
Implemented Databricks Unity Catalog for fine-grained access control, column-level security, and lineage tracking, ensuring data security and compliance across multiple workspaces.
Designed end-to-end machine learning pipelines using Databricks MLflow, encompassing feature engineering, hyperparameter tuning, model versioning, and deployment for real-time AI applications.
Developed high-performance SQL queries and dashboards using Databricks SQL, optimizing query execution with materialized views, result caching, and query acceleration techniques.
Environment/Tools: AWS Glue, Lambda, Sqoop, SOLR, ServiceNow, Python, Machine Learning, Databricks Workflows, Delta Lake, Unity Catalog, MLflow, Databricks SQL, Git, Jenkins, CI/CD Pipelines, Agile Methodology.
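
A minimal sketch of the vector-similarity and clustering step described above, using TF-IDF features with scikit-learn; the feature settings and cluster count are illustrative assumptions:

from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def cluster_service_requests(descriptions, n_clusters=20):
    """Vectorize service-request text, group similar issues, and score pairwise similarity."""
    vectorizer = TfidfVectorizer(stop_words="english", max_features=5000)
    vectors = vectorizer.fit_transform(descriptions)

    # Clusters of similar issues across the historical service requests
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=42).fit_predict(vectors)

    # Similarity via distance between request vectors: distance = 1 - cosine similarity
    similarity = cosine_similarity(vectors)
    return labels, similarity
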
Project: Access Knowledge.
Role: Data Engineer Specialist
Responsibilities:
Built scalable ELT pipelines to ingest CRM data into Snowflake using Fivetran, transforming OLTP data to OLAP format for Redshift storage and analytics.
Captured real-time IoT data via Amazon Kinesis and automated pipelines for timely ingestion and processing with Airflow orchestration.
Designed staging processes in Amazon S3 for real-time data, applying scalable transformations using EMR and Python.
Developed AWS Glue pipelines to process real-time data into Redshift and DynamoDB, employing advanced partitioning and clustering to enhance analytics capabilities.
Developed and optimized Databricks notebooks for large-scale data transformations in a cloud environment.
Orchestrated ETL pipelines across platforms using AWS Step Functions, integrating data with Python and Spark transformations.
Applied machine learning models like classification and recommendation systems to real-time datasets using Python and Scala for advanced predictive analytics.
Automated ETL pipelines with Apache Airflow, enhanced scheduling with cron jobs, and monitored workflows with comprehensive logging mechanisms.
Integrated Databricks Delta Lake for improved data consistency, versioning, and reliability in analytics.
Created Python DAGs in Airflow to orchestrate end-to-end data pipelines and integrated real-time datasets via EMR (see the illustrative sketch after this project).
Streamlined CI/CD processes using Jenkins, ensuring automated deployment of data pipelines to production.
Version-controlled Python and Scala scripts in Git, enabling smooth collaboration and clean versioning within Agile teams.
Environment/Tools: Snowflake, Amazon Kinesis, Amazon S3, AWS Step Functions, AWS Glue, EMR, Redshift, DynamoDB, SQL, Python, Machine Learning, Apache Airflow, Unity Catalog, Databricks Workflows, Databricks Jobs, Cron Jobs, Python DAGs, Git, Jenkins, CI/CD Pipelines, Agile Methodology.
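
A minimal sketch of an Airflow DAG of the kind described above, assuming Airflow 2.x; the DAG id, schedule, and task callables are hypothetical placeholders:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract_crm(**_):
    pass  # pull CRM / real-time data (placeholder)

def load_to_redshift(**_):
    pass  # transform and load into Redshift (placeholder)

with DAG(
    dag_id="crm_realtime_pipeline",      # hypothetical DAG id
    start_date=datetime(2021, 1, 1),
    schedule_interval="@hourly",
    catchup=False,
) as dag:
    extract = PythonOperator(task_id="extract_crm", python_callable=extract_crm)
    load = PythonOperator(task_id="load_to_redshift", python_callable=load_to_redshift)
    extract >> load
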
Project: Data Lineage (Graph Database Neo4j).
Role: Data Engineer Specialist
Responsibilities:
Extracted synchronous and asynchronous unstructured data in JSON and RTF formats from the Ariba procurement tool using Python APIs, processing 50K+ records weekly.
Designed and implemented Python-based workflows for data extraction, cleansing, and preprocessing, improving data readiness by 40% for structured storage.
Transformed raw text and JSON data into structured PostgreSQL tables, reducing manual data handling efforts by 85%.
Engineered logic to dynamically flatten complex and nested JSON structures into relational table formats, handling 100+ schema variations per dataset (see the illustrative sketch after this project).
Automated daily creation of dynamic PostgreSQL tables and loading of flattened JSON data, maintaining 99.9% data availability for critical business reporting.
Developed and deployed 25+ custom PostgreSQL views to support downstream analytics, enhancing data accessibility for reporting and BI tools.
Built end-to-end Python automation pipelines, cutting manual intervention time from 6 hours weekly to near-zero across the ingestion and loading process.
Optimized Python scripts, improving table creation and data loading speeds by 60%, significantly reducing ETL processing windows.
Designed and implemented 15+ key performance indicators (KPIs) for procurement analysis using processed Ariba data, driving executive decision-making.
Ensured system scalability, maintainability, and error resilience through modular Python code design and robust PostgreSQL architectural best practices.
Integrated comprehensive logging and exception handling mechanisms into Python workflows, improving issue detection and recovery time by 50%.
Collaborated closely with business and technical stakeholders, reducing feedback loops by 30% through agile sprint-based development and early prototyping.
Environment/Tools: AWS, EC2, EMR, S3, Python, Neo4j, Cypher, PostgreSQL, Airflow, Oracle, ThoughtSpot, Unix, Shell Script, Windows, GitHub.
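
A minimal sketch of the dynamic JSON-flattening logic described above; the column-naming convention and list handling are illustrative assumptions:

import json

def flatten(record, parent_key="", sep="_"):
    """Recursively flatten a nested JSON object into a single-level dict of column/value pairs."""
    items = {}
    for key, value in record.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            items.update(flatten(value, new_key, sep))   # recurse into nested objects
        elif isinstance(value, list):
            items[new_key] = json.dumps(value)           # serialize lists; could also explode to child tables
        else:
            items[new_key] = value
    return items

# Example: {"id": 1, "vendor": {"name": "Acme", "tier": 2}} -> {"id": 1, "vendor_name": "Acme", "vendor_tier": 2}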

Project: Ariba & CMM.
Role: Data Engineer Specialist
Responsibilities:
Extracted synchronous and asynchronous unstructured data in JSON and RTF formats from the Ariba procurement tool using Python APIs.
Designed and implemented Python-based workflows for data extraction, cleansing, and preprocessing, ensuring readiness for structured storage.
Transformed raw text and JSON data into structured PostgreSQL tables, enabling seamless integration with analytical and reporting systems.
Engineered logic to dynamically flatten complex and nested JSON structures into relational table formats aligned with evolving business rules.
Automated daily creation of dynamic PostgreSQL tables and loading of flattened JSON data, ensuring continuous data availability for business processes (see the illustrative sketch after this project).
Developed and deployed custom PostgreSQL views to provide downstream applications with clean, accessible, and business-ready datasets.
Built end-to-end Python automation to eliminate manual intervention across the data ingestion, transformation, and loading lifecycle.
Optimized Python scripts for high-performance table creation and data loading, significantly improving operational speed and resource efficiency.
Designed and implemented key performance indicators (KPIs) for business users, leveraging insights derived from processed Ariba procurement data.
Ensured system scalability, maintainability, and error resilience through modular Python code design and robust PostgreSQL architecture practices.
Environment/Tools: AWS, EC2, EMR, S3, Python, PostgreSQL, Oracle, Airflow, ThoughtSpot, Unix, Shell Script, Windows, GitHub.
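
A minimal sketch of the dynamic PostgreSQL table creation and load described above, using psycopg2; the TEXT-typed columns and table naming are simplifying assumptions:

import psycopg2
from psycopg2 import sql

def create_and_load(conn, table_name, rows):
    """Create the target table from the flattened columns (if missing) and bulk-insert the rows."""
    columns = sorted(rows[0].keys())
    with conn, conn.cursor() as cur:
        cur.execute(sql.SQL("CREATE TABLE IF NOT EXISTS {} ({})").format(
            sql.Identifier(table_name),
            sql.SQL(", ").join(sql.SQL("{} TEXT").format(sql.Identifier(c)) for c in columns),
        ))
        insert = sql.SQL("INSERT INTO {} ({}) VALUES ({})").format(
            sql.Identifier(table_name),
            sql.SQL(", ").join(map(sql.Identifier, columns)),
            sql.SQL(", ").join(sql.Placeholder() * len(columns)),
        )
        cur.executemany(insert, [[row.get(c) for c in columns] for row in rows])

# conn = psycopg2.connect(dsn)  # connection details omitted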


Company: Accenture SEP 2017 - JUNE 2019
Role: Application Development Analyst
Responsibilities:
Designed and delivered scalable data solutions using SQL, Python, and Java, aligning with business and technical requirements to ensure optimal system performance.
Executed data extraction, cleansing, modeling, seeding, and migration for master data with Talend and Informatica, achieving high data accuracy, integrity, and regulatory compliance.
Built efficient ETL pipelines using Apache NiFi and Talend, embedding complex business logic to streamline data flows and improve processing speed and reliability.
Developed data marts and applied robust data modeling strategies with ERwin and SQL Server, managing both normalized and de-normalized client datasets for advanced analytics.
Enhanced server performance and ensured high availability across AWS and Azure cloud environments by leveraging monitoring tools such as CloudWatch and Azure Monitor.
Architected and developed Hadoop applications with Hive, Pig, and Spark, delivering scalable big data solutions in adherence to performance and best-practice standards (see the illustrative sketch after this role).
Managed end-to-end installation, configuration, and support for Hadoop ecosystems, including HDFS and YARN, optimizing data storage and processing capabilities.
Translated complex business requirements into detailed technical designs using UML diagrams and Agile frameworks, driving alignment between development teams and stakeholders.
Developed responsive web applications with JavaScript, React, and RESTful APIs to enable rapid data querying and real-time tracking, significantly enhancing user experience.
Led documentation efforts and maintained system architecture repositories, ensuring continuous improvement, scalability, and knowledge sharing across development teams.
Environment/Tools: AWS, EC2, EMR, S3, Hive, Sqoop, Spark, Python, PostgreSQL, MySQL, Pentaho Data Integration, Informatica PowerCenter, Pentaho User Console, SSIS, Unix, Windows, GitHub.
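
A minimal sketch of a Spark job over Hive tables of the kind described above; the database, table, and column names are hypothetical:

from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("daily_sales_rollup")
         .enableHiveSupport()             # read and write Hive-managed tables
         .getOrCreate())

# Aggregate a staging table and publish a partitioned mart table
orders = spark.table("staging.orders")    # hypothetical Hive table
daily = (orders.groupBy("order_date", "region")
               .agg(F.sum("amount").alias("total_amount"),
                    F.countDistinct("customer_id").alias("distinct_customers")))
daily.write.mode("overwrite").partitionBy("order_date").saveAsTable("marts.daily_sales")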


Company: India Property Online Pvt Limited AUG 2014 - SEP 2017
Role: Data Management Analyst
Responsibilities:

Optimized MySQL database schemas for 5 TB+ property listings, reducing query times by 30% and enabling near real-time analytics for dynamic pricing.
Automated ingestion from 15+ data sources using Pentaho ETL, improving data freshness from 60% to 90% and eliminating 20 manual hours weekly.
Cut monthly reporting time from 8 hours to 45 minutes by tuning queries and implementing materialized views, accelerating executive decision-making.
Implemented Python-based data validation for 100K+ daily records, reducing reporting errors by 75% through automated quality checks (see the illustrative sketch after this role).
Designed and implemented database architecture supporting real-time property valuation adjustments and trend analysis.
Developed complex stored procedures powering dynamic pricing models responsive to market fluctuations.
Migrated legacy static reports to interactive Tableau dashboards with drill-down capabilities, enhancing sales analysis depth.
Established scalable AWS data lake infrastructure for historical transaction storage and forecasting models.
Built a comprehensive backup and disaster recovery system ensuring continuous data availability and minimal downtime.
Partnered with the data science team to develop behavioral analytics models driving personalized customer recommendations.
Environment/Tools: AWS, EC2, EMR, S3, Hive, Sqoop, MySQL, Informatica PowerCenter, Pentaho Data Integration, Pentaho User Console, Unix.
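
A minimal sketch of the automated record validation described above, using pandas; the required columns and plausibility bounds are illustrative assumptions:

import pandas as pd

REQUIRED_COLUMNS = ["listing_id", "price", "city", "listed_date"]   # assumed schema

def find_invalid_rows(df: pd.DataFrame) -> pd.DataFrame:
    """Return listings that fail basic quality checks so they can be quarantined for review."""
    issues = pd.Series(False, index=df.index)
    issues |= df[REQUIRED_COLUMNS].isnull().any(axis=1)              # missing mandatory fields
    issues |= ~df["price"].between(1_000, 100_000_000)               # implausible prices (assumed bounds)
    issues |= df.duplicated(subset="listing_id", keep="first")       # duplicate listing IDs
    return df[issues]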