Gopi Maguluri | ML Engineer & Data Scientist

I'm a Machine Learning Engineer and Data Scientist with proven expertise in developing and deploying end-to-end AI solutions for complex business challenges. I specialize in NLP, LLM fine-tuning, and reinforcement learning techniques, with a track record of creating production-ready systems that deliver measurable impact. My experience includes building natural language interfaces for databases, implementing text ranking solutions, and developing intelligent document processing systems that improve operational efficiency and decision-making across organizations.

ABOUT ME

👋 Hey there! Gopi Maguluri here. Nice to meet you!

👨‍💻 Machine Learning Engineer at ArangoDB.

🎓 Pursuing a Master of Science in Data Science at the University of San Francisco, San Francisco.

🔍 Specialized in LLM fine-tuning, NLP, Generative AI and Intelligent Document Processing.

💡 3+ years of experience building production-ready AI solutions across industries.

🏆 Winner of the Chess GPU Hackathon, with a $10,000 credit prize.

🔧 Skilled in Python, SQL, PyTorch, TensorFlow and deploying models with Docker and FastAPI.

🌐 Previously worked as a Data Scientist at Tatras Data, developing Generative AI and NLP solutions and Intelligent Document Processing systems.

📊 Passionate about creating AI systems that deliver measurable business impact.

🤝 Open to collaborating on innovative ML and AI projects.

SKILLS

Data Scientist

Python

PostgreSQL

Statistics

Data Modeling

NumPy

Pandas

Matplotlib

Seaborn

PyTorch

TensorFlow

WORK EXPERIENCE

Machine Learning Engineer

October 2024 - Present

ArangoDB • San Francisco, United States

• Developed a Natural Language to ArangoDB Query Language (AQL) system by fine-tuning LLMs using PEFT, enabling seamless interaction with graph databases and improving query efficiency by 78% for customers.
• Engineered LLM fine-tuning pipelines with custom reward functions for reinforcement learning, optimizing GPU utilization and training efficiency, resulting in a 25% improvement in AQL generation accuracy.
• Developed an automated data generation pipeline for graph databases to fine-tune LLMs and built an LLM-as-a-Judge validation system, reducing manual effort by 85% during my practicum.

Data Scientist

July 2021 - June 2024

Tatras Data • New Delhi, India

• Developed an NLP solution to automate matching job categories based on job titles and industries by training Bi-Encoders, Cross-Encoders and fine-tuning Gemma-2B, achieving a 75% reduction in matching time.
• Developed a production-ready, dockerized API to interact with databases in natural language by automating SQL query generation using LLMs and prompt engineering, delivering 80% accuracy.
• Developed and deployed a dockerized end-to-end solution for a text ranking use case using Word2Vec, FastText, GRUs, BERT, and Sentence Transformers, reducing the customer's manual effort in ranking by 90%.
• Implemented and deployed an Intelligent Document Processing solution with a document clustering and token classification pipeline by training LayoutLMv3, improving the customer's document processing efficiency by 5x.

Data Scientist

January 2021 - June 2021

Smart Energy Water • Noida, India

• Developed a text classification solution for customer complaints using custom-trained FastText embeddings and an ensemble of Decision Trees, Random Forest and XGBoost algorithms, resulting in 92% accuracy.
• Completed a research-based project aimed at disaggregating household electricity consumption to the appliance level, involving data analysis, feature engineering and model training using CNNs.

Data Science Mentor

July 2021 - June 2024

Sabudh Foundation • New Delhi, India

• Mentored 20 interns across two Data Science projects, Intelligent Document Processing and Raag Identification in Indian Classical Music, while also guiding them in Python and Data Science concepts.

Data Scientist, Intern

July 2020 - December 2020

Sabudh Foundation • New Delhi, India

• Developed and deployed a web application to predict Ragas from audio recordings by extracting Mel spectrograms and pitch contours and training CNNs, achieving 82% accuracy.

PROJECTS

Deep Learning based Chess Bot (Hackathon Winner)

Developed a chess bot that combines convolutional neural networks with self-attention and squeeze-excitation blocks. Unlike chess engines that rely on reinforcement learning, this approach treats board evaluation as a computer vision problem, processing 8×8 chess boards with piece encoding to produce a scalar position score. Trained on 350,000 Grandmaster games using a 48-GPU cluster, this model won my team and me a $10,000 credit prize in a hackathon.

Source Code • Blog Post

Raga Identification in Indian Classical Music

Built a deep learning application to automatically identify ragas in Indian classical music. It processes high-quality audio recordings to extract Mel spectrograms, MFCCs, and pitch contours, which are used to train a custom convolutional neural network, achieving 82% classification accuracy. Designed to support novice music learners, the tool highlights raga characteristics and aids recognition through real-time predictions. Deployed as an interactive web application using Flask.

Source Code • Live Demo

Fine-Tuning Gemma for Law Stack Exchange

Preprocessed 20,000+ legal forum records with NVIDIA NeMo Curator and fine-tuned the Gemma 2B model using a PEFT implementation and NeMo microservices, improving tagging accuracy by 15% during the ODSC NVIDIA Hackathon. Applied filters for word count, score, PII removal, and repeated n-grams to clean and refine the Law Stack Exchange dataset. Converted the model to .nemo format and fine-tuned it on Brev GPU instances using bf16 precision and a cosine annealing learning rate schedule.

Source Code

EduGraphix

eduGRAPHIX is an AI-powered tool designed to enhance learning through visuals. It utilizes fine-tuned Stability AI diffusion and Flux models on GPUs to generate high-quality educational illustrations based on user input. Additionally, it leverages LLaMA 3 for generating text explanations. The entire system is wrapped in Gradio. This project aims to improve knowledge retention and engagement by making complex concepts more accessible through AI-generated visuals.

Source Code

Generalized Linear Models

Developed an interactive blog to explain Generalized Linear Models (GLMs), covering linear, logistic, Poisson, and gamma regression. The blog includes visualizations and examples to illustrate key statistical concepts and link functions, making complex topics more accessible. The project was built and deployed using Streamlit.

Source Code • Live Demo

EDUCATION

Master of Science in Data Science

University of San Francisco • San Francisco, United States

July 2024 - June 2025

Relevant Courses:
Advanced Machine Learning, Deep Learning, Probability and Statistics, Linear Regression Analysis, Data Visualization, Data Acquisition, Distributed Computing & Data Systems, MLOps, Ethics in Data Science

Bachelor of Technology in Electronics and Communication Engineering

BML Munjal University • New Delhi, India

August 2017 - June 2021

Relevant Courses:
Data Structures, Algorithms, Computer Architecture, Introduction to Machine Learning, Calculus, Linear Algebra

ACHIEVEMENTS

Deep Generative Models

Completed comprehensive training in deep generative models, covering probabilistic deep learning, variational autoencoders, and GANs, at the Indian Institute of Science (IISc).