About
"Data is the new oil, and I'm the expert in refining it to uncover hidden treasures."

Data Analytics & Machine Learning
- Education: Master of Computer Science
- Location: San Francisco, CA, USA
- Email: paramps@g.clemson.edu
- DOB: December 5, 1998
I am a Machine Learning and Data Science professional with expertise in GenAI, data analytics, and AI-driven solutions that enhance business efficiency and decision-making.
With hands-on experience in developing predictive models, AI knowledge agents, and advanced data pipelines, I have streamlined workflows and reduced search times by 50% in research environments. I have successfully built and deployed predictive maintenance models on Azure, reducing IT downtime by 25%, and optimized product strategies at Maruti Suzuki, boosting market penetration confidence by 30%.
Skilled in leveraging cloud platforms like Azure and AWS, I bring a strong track record of delivering scalable, impactful solutions across various industries.
Facts
Years of Experience
Projects
Graduation
Time 100% Scholarship Awardee
Skills
I'm proficient in a broad array of technologies including AWS, Apache Spark, SQL, Python, and R. I'm a certified AWS Machine Learning Specialist skilled in advanced ML modeling techniques.
Resume
Data Enthusiast with expertise in Machine Learning, GenAI, and Data Analytics, committed to developing AI-driven solutions that enhance business efficiency and decision-making, with a proven ability to quickly adapt and deliver impactful results.
Summary
Parampreet Singh
An aspiring Data Scientist with a proven track record of discipline and perseverance, bringing 4 years of academic and professional experience in Machine Learning, Artificial Intelligence, and Analytics.
- United States of America
- paramps@g.clemson.edu
Education
Master of Computer Science
2022 - 2024
Clemson University, SC
Highlighted Coursework: Statistics, Applied Data Science, Artificial Intelligence, Advanced Machine Learning, Deep Learning for Computer Vision, Database Management Systems, Data Mining, Data Analysis
Bachelor of Technology (Mechanical)
2016 - 2020
Punjab Engineering College (Deemed to be University), Chandigarh, India
Highlighted Coursework: Mathematics, Partial Differential Equations, Computer Programming, Economics, Ethics, Business Environment and Laws
Professional Experience
Data Scientist
Jan 2024 - Aug 2024
Clemson University, SC
- Boosted predictive accuracy of vacancy enthalpies by 15% by using a modified CGCNN, resulting in an R² score of 0.85+.
- Accelerated DFT-relaxed crystal structure prediction and cut computational costs by 7x by developing a deep learning model using Graph Convolutional Networks (GCNs) and Generative Adversarial Networks (GANs).
- Optimized data processing by designing a data pipeline in PyTorch, enabling efficient handling of variable-sized graph data.
- Streamlined research workflows by building a multi-source AI knowledge agent that intelligently routes and retrieves scientific research data between a Chroma DB and various online sources.
- Enhanced data retrieval and response accuracy by developing a custom vector-based search engine using LangGraph.
Project 1: Crystal-Graph CNN (CGCNN) and Generative Modeling
Project 2: AI Knowledge Agent for Material Science Data Retrieval
IT Data Engineer
Oct 2022 - Dec 2023
Clemson University, SC
- Reduced unexpected IT asset downtime by 25% by building a predictive maintenance model using Azure Machine Learning with KNN, Random Forest, and XGBoost.
- Preprocessed large-scale historical IT asset data through an ETL pipeline using Azure Databricks, Synapse Analytics, and PySpark, ensuring efficient data readiness for analysis.
- Increased operational efficiency by designing a Power BI dashboard visualizing maintenance insights and schedules.
- Accelerated ticket resolution time by 20% by developing a GenAI-powered IT assistant chatbot using Azure Functions, Azure Cognitive Search, and Azure OpenAI with a RAG pattern for intelligent data retrieval.
Data Engineer
Jul 2020 - Jul 2022
Maruti Suzuki India Ltd, Gurugram, India
- Optimized lead prioritization by engineering a daily batch prediction pipeline using Azure Data Factory and Azure ML, processing over 10,000 sales opportunities daily.
- Implemented a bi-weekly model retraining workflow in Azure ML, improving prediction accuracy by 15% month-over-month and adapting the model to evolving sales patterns.
- Integrated prediction results into Azure SQL Database for CRM accessibility, increasing sales engagement efficiency by 20%.
- Designed an end-to-end data pipeline with Azure Data Factory, enabling seamless data ingestion, transformation, and daily batch predictions with 99% uptime.
Research Intern
Jan 2019 - May 2019
Mahindra Research Valley, Tamil Nadu, India
- Attained a 92% R² score when forecasting the rear axle power loss split, using mathematical modeling in MS Excel.
- Proposed design improvements, reducing rear axle power loss by 3%, and published the findings in SAE International.
Portfolio
Discover a showcase of innovation and technical prowess in my Portfolio section. Each project is a testament to my dedication to harnessing data to unravel complex problems and create impactful solutions.

Resume RAG
This project demonstrates a powerful Retrieval-Augmented Generation (RAG) system designed to create interactive resumes through a Q&A conversation interface. Using LangChain and Chroma, the tool integrates past chat history, allows retrieval of relevant data from documents, and provides users with a seamless experience for generating dynamic content from their resume PDFs.

Spectral Clustering for Image Segmentation
This project showcases the application of spectral clustering for segmenting images. I used affinity matrices, eigenvalue decomposition, and normalization techniques to segment various datasets, comparing the effectiveness of spectral clustering with k-means. The work highlights my skills in machine learning and advanced image processing.
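The pipeline described above (affinity matrix, normalization, eigenvector, partition) can be sketched roughly as follows. This is an illustrative sketch, not the project's code. It assumes 1-D points standing in for pixel features, a Gaussian affinity with a hypothetical `sigma`, and a two-way split only. It also swaps in deflated power iteration for a full eigenvalue decomposition so that it runs with the standard library alone. The name `spectral_bipartition` is made up for this example.

```python
import math

def spectral_bipartition(points, sigma=1.0, iterations=200):
    """Split 1-D points into two groups using the second eigenvector
    of the symmetrically normalized affinity matrix."""
    n = len(points)
    # Gaussian affinity matrix W (sigma is an assumed bandwidth).
    W = [[math.exp(-((points[i] - points[j]) ** 2) / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in W]
    # Symmetric normalization: M = D^(-1/2) W D^(-1/2).
    M = [[W[i][j] / math.sqrt(degree[i] * degree[j])
          for j in range(n)] for i in range(n)]
    # The leading eigenvector of M is proportional to sqrt(degree).
    norm = math.sqrt(sum(degree))
    top = [math.sqrt(d) / norm for d in degree]

    def deflate(v):
        # Remove the component along the leading eigenvector, then rescale.
        proj = sum(a * b for a, b in zip(v, top))
        v = [a - proj * b for a, b in zip(v, top)]
        length = math.sqrt(sum(a * a for a in v))
        return [a / length for a in v]

    v = deflate([float(i) for i in range(n)])
    for _ in range(iterations):
        Mv = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        # Iterate with (M + I) / 2 so every eigenvalue is non-negative.
        v = deflate([(mv + vi) / 2 for mv, vi in zip(Mv, v)])
    # The sign of the second eigenvector gives the partition.
    return [0 if x < 0 else 1 for x in v]
```

Calling `spectral_bipartition([0.0, 0.1, 0.2, 5.0, 5.1, 5.2])` should place the first three points in one group and the last three in the other. Which group is labeled 0 depends on the sign of the eigenvector.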

Extended Linear Regression Analysis on Boston Housing Dataset
In this project, I conducted a detailed linear regression analysis on the Boston Housing Dataset using MATLAB. The work included data partitioning, standardization, and the application of linear, ridge, and Lasso regression models. I analyzed the effects of training size, feature expansion, and regularization on model accuracy, illustrated through MSE trends and weight adjustments, offering insights into effective predictive modeling techniques.
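To illustrate the effect of regularization mentioned above, here is a minimal sketch in Python (the project itself used MATLAB). It assumes a single feature, population standard deviation for standardization, and a centered target, so the ridge solution reduces to a one-line closed form. The helper names `standardize` and `ridge_slope` are hypothetical.

```python
import math

def standardize(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

def ridge_slope(x, y, lam):
    """Closed-form ridge weight for one standardized feature.

    Minimizes sum((y_c - w * z)^2) + lam * w^2, where z is the
    standardized feature and y_c the centered target.
    """
    z = standardize(x)
    y_mean = sum(y) / len(y)
    y_centered = [v - y_mean for v in y]
    numerator = sum(zi * yi for zi, yi in zip(z, y_centered))
    denominator = sum(zi * zi for zi in z) + lam
    return numerator / denominator
```

With `lam = 0` this is ordinary least squares. Increasing `lam` shrinks the weight toward zero, which is the kind of weight adjustment the analysis tracks.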

Image Captioning with RNNs and LSTMs
This project explored image captioning by training vanilla RNNs, LSTMs, and attention-based LSTMs on the COCO Captions dataset. It demonstrated not only my skill in neural network architectures but also my ability to integrate complex machine learning techniques for meaningful computer vision and natural language processing applications.

Advanced AI Strategies for 9-Piece Puzzle
In this project, I employed various AI search algorithms to tackle the classic 9-piece puzzle, using Python for state generation and manipulation. Key highlights include utilizing random actions to reach specific goal states, implementing and contrasting Breadth-First and Depth-First Searches, and exploring Uniform Cost Search under different cost structures.
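As a rough illustration of the Breadth-First Search part, the sketch below solves a 3x3 sliding puzzle. It is not the project's code. The goal layout, the use of `0` for the blank, and the names `neighbors` and `bfs_solve` are assumptions made for this example.

```python
from collections import deque

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # assumed goal layout; 0 marks the blank

def neighbors(state):
    """Yield every state reachable by sliding one tile into the blank."""
    blank = state.index(0)
    row, col = divmod(blank, 3)
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        new_row, new_col = row + d_row, col + d_col
        if 0 <= new_row < 3 and 0 <= new_col < 3:
            target = new_row * 3 + new_col
            tiles = list(state)
            tiles[blank], tiles[target] = tiles[target], tiles[blank]
            yield tuple(tiles)

def bfs_solve(start, goal=GOAL):
    """Return the shortest sequence of states from start to goal, or None."""
    parents = {start: None}  # doubles as the visited set
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parents[state]
            return path[::-1]
        for nxt in neighbors(state):
            if nxt not in parents:
                parents[nxt] = state
                frontier.append(nxt)
    return None
```

For example, `bfs_solve((1, 2, 3, 4, 5, 6, 0, 7, 8))` returns a path of three states, meaning two moves. Replacing the `deque` with a stack gives a Depth-First Search, and replacing it with a priority queue keyed on path cost gives Uniform Cost Search.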

Optimized Fully Connected Network for CIFAR-10
This project showcases the development of an optimized fully connected neural network for the CIFAR-10 dataset, featuring modular design, ReLU nonlinearity, and advanced optimization algorithms including SGD+Momentum, RMSProp, and Adam. I also integrated dropout regularization, demonstrating a nuanced approach to enhancing network performance and robustness in Python.
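For reference, the SGD+Momentum update rule can be sketched in a few lines. This is a plain-Python illustration on a list of weights, not the project's implementation. The function name `sgd_momentum`, the `grad_fn` callback, and the default hyperparameters are assumptions.

```python
def sgd_momentum(grad_fn, w, learning_rate=0.1, momentum=0.9, steps=300):
    """Minimize a function with SGD plus momentum.

    grad_fn takes the current weights (a list of floats) and returns
    the gradient as a list of the same length.
    """
    velocity = [0.0] * len(w)
    for _ in range(steps):
        grads = grad_fn(w)
        # The velocity is a decaying running sum of past gradient steps.
        velocity = [momentum * v - learning_rate * g
                    for v, g in zip(velocity, grads)]
        w = [wi + vi for wi, vi in zip(w, velocity)]
    return w
```

On a simple quadratic such as `sum((w_i - t_i) ** 2)`, passing its gradient as `grad_fn` drives the weights toward the targets `t_i`.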

Deep CNN Development for Image Classification
In this project, I built and optimized deep CNNs for CIFAR-10 image classification, implementing them from scratch in Python. Key achievements include creating efficient convolutional and pooling layers, integrating Kaiming initialization for the weights, and employing spatial and traditional batch normalization to enhance model training.
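Kaiming initialization draws weights from a zero-mean normal distribution with variance 2 / fan_in, which suits ReLU layers. The sketch below shows the idea for a fully connected layer using only the standard library. The name `kaiming_normal` and the nested-list layout are assumptions for illustration, not the project's code.

```python
import math
import random

def kaiming_normal(fan_in, fan_out, seed=None):
    """Return a fan_in x fan_out weight matrix drawn from N(0, 2 / fan_in)."""
    rng = random.Random(seed)
    std = math.sqrt(2.0 / fan_in)  # gain of sqrt(2) for ReLU
    return [[rng.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]
```

For a convolutional layer, `fan_in` would be the number of input channels times the kernel height and width.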