Parampreet's Portfolio

About

"Data is the new oil, and I'm the expert in refining it to uncover hidden treasures."

Data Analytics & Machine Learning

Education: Master of Computer Science

Location: San Francisco, CA, USA

Email: paramps@g.clemson.edu

DOB: December 5, 1998

I am a Machine Learning and Data Science professional with expertise in GenAI, data analytics, and AI-driven solutions that enhance business efficiency and decision-making.

With hands-on experience in developing predictive models, AI knowledge agents, and advanced data pipelines, I have streamlined workflows and reduced search times by 50% in research environments. I have successfully built and deployed predictive maintenance models on Azure, reducing IT downtime by 25%, and optimized product strategies at Maruti Suzuki, boosting market penetration confidence by 30%.

Skilled in leveraging cloud platforms like Azure and AWS, I bring a strong track record of delivering scalable, impactful solutions across various industries.

Facts

Years of Experience

Projects

August 2024

Graduation

Time 100% Scholarship Awardee

Skills

I'm proficient in a broad array of technologies including AWS, Apache Spark, SQL, Python, and R. I'm a certified AWS Machine Learning Specialist skilled in advanced ML modeling techniques.

Python (PyTorch, TensorFlow, NumPy, Pandas, Matplotlib) 95%

SQL (MySQL, MS SQL Server) 95%

GenAI (LangChain, RAG, HuggingFace, AI Agents) 90%

C/C++ 80%

HTML/CSS 70%

Statistical Modelling 90%

Cloud Services (AWS, Azure, GCP) 85%

Tableau / Power BI 95%

Microsoft Office 95%

Git (Version Control) 80%

MATLAB 90%

Hypothesis Testing 95%

Resume

Data Enthusiast with expertise in Machine Learning, GenAI, and Data Analytics, committed to developing AI-driven solutions that enhance business efficiency and decision-making, with a proven ability to quickly adapt and deliver impactful results.

Summary

Parampreet Singh

An aspiring Data Scientist with a proven track record of discipline and perseverance, bringing 4 years of academic and professional experience in Machine Learning, Artificial Intelligence, and Analytics.

United States of America
paramps@g.clemson.edu

Education

Master of Computer Science

2022 - 2024

Clemson University, SC

Highlighted Coursework: Statistics, Applied Data Science, Artificial Intelligence, Advanced Machine Learning, Deep Learning for Computer Vision, Database Management Systems, Data Mining, Data Analysis

Bachelor of Technology (Mechanical)

2016 - 2020

Punjab Engineering College (Deemed to be University), Chandigarh, India

Highlighted Coursework: Mathematics, Partial Differential Equations, Computer Programming, Economics, Ethics, Business Environment and Laws

Professional Experience

Data Scientist

Jan 2024 - Aug 2024

Clemson University, SC

Project 1: Crystal-Graph CNN (CGCNN) and Generative Modeling

Boosted predictive accuracy of vacancy enthalpies by 15% by using a modified CGCNN, resulting in an R² score of 0.85+.
Accelerated DFT-relaxed crystal structure prediction and cut computational costs by 7x by developing a deep learning model using Graph Convolutional Networks (GCNs) and Generative Adversarial Networks (GANs).
Optimized data processing by designing a data pipeline in PyTorch, enabling efficient handling of variable-sized graph data.

Project 2: AI Knowledge Agent for Material Science Data Retrieval

Streamlined research workflows by building a multi-source AI knowledge agent that intelligently routes and retrieves scientific research data between a Chroma DB and various online sources.
Enhanced data retrieval and response accuracy by developing a custom vector-based search engine using LangGraph.

IT Data Engineer

Oct 2022 - Dec 2023

Clemson University, SC

Reduced unexpected IT asset downtime by 25% by building a predictive maintenance model using Azure Machine Learning with KNN, Random Forest, and XGBoost.
Preprocessed large-scale historical IT asset data through an ETL pipeline using Azure Databricks, Synapse Analytics, and PySpark, ensuring efficient data readiness for analysis.
Increased operational efficiency by designing a Power BI dashboard visualizing maintenance insights and schedules.
Reduced ticket resolution time by 20% by developing a GenAI-powered IT assistant chatbot using Azure Functions, Azure Cognitive Search, and Azure OpenAI, with a RAG pattern for intelligent data retrieval.

Data Engineer

July 2020 - July 2022

Maruti Suzuki India Ltd, Gurugram, India

Optimized lead prioritization by engineering a daily batch prediction pipeline using Azure Data Factory and Azure ML, processing over 10,000 sales opportunities daily.
Implemented a bi-weekly model retraining workflow in Azure ML, improving prediction accuracy by 15% month-over-month and adapting the model to evolving sales patterns.
Integrated prediction results into Azure SQL Database for CRM accessibility, increasing sales engagement efficiency by 20%.
Designed an end-to-end data pipeline with Azure Data Factory, enabling seamless data ingestion, transformation, and daily batch predictions with 99% uptime.

Research Intern

Jan 2019 - May 2019

Mahindra Research Valley, Tamil Nadu, India

Attained an R² score of 0.92 when forecasting the rear axle power loss split, using mathematical modeling in MS Excel.
Proposed design improvements, reducing rear axle power loss by 3%, and published the findings in SAE International.

Portfolio

Discover a showcase of innovation and technical prowess in my Portfolio section. Each project is a testament to my dedication in harnessing data to unravel complex problems and create impactful solutions.

Car Value Prediction

Explore the Car Price Prediction Project: From Data Analysis to a Functional Web App. A practical approach to learning data science concepts and tools.

Resume RAG

This project demonstrates a powerful Retrieval-Augmented Generation (RAG) system designed to create interactive resumes through a Q&A conversation interface. Using LangChain and Chroma, the tool integrates past chat history, allows retrieval of relevant data from documents, and provides users with a seamless experience for generating dynamic content from their resume PDFs.

Enhancing Movie Recommendation Systems

A comparative analysis of matrix factorization techniques and collaborative filtering integration.

Predicting Hearing Thresholds from Brain MRI Scans

A comprehensive work on predicting hearing thresholds using advanced data mining techniques applied to Brain MRI scans. The project explores the evaluation of CNNs, SVR combined with PCA, and XGBoost to design effective predictive models.

Emotion Recognition from Multimodal Signals in Videos

This study investigates improving emotion recognition accuracy by merging audio-visual data from the OMG Dataset, targeting enhancements in affective computing for applications across marketing, robotics, and mental health.

Spectral Clustering for Image Segmentation

This project showcases the application of spectral clustering for segmenting images. I used affinity matrices, eigenvalue decomposition, and normalization techniques to segment various datasets, comparing the effectiveness of spectral clustering with k-means. The work highlights my skills in machine learning and advanced image processing.

Extended Linear Regression Analysis on Boston Housing Dataset

In this project, I conducted a detailed linear regression analysis on the Boston Housing Dataset using MATLAB. The work included data partitioning, standardization, and the application of linear, ridge, and Lasso regression models. I analyzed the effects of training size, feature expansion, and regularization on model accuracy, illustrated through MSE trends and weight adjustments, offering insights into effective predictive modeling techniques.

Image Captioning with RNNs and LSTMs

This project explored image captioning through the training of vanilla RNNs, LSTMs, and attention-based LSTMs on the COCO Captions dataset. This endeavor not only demonstrated my skill in neural network architectures but also my capability to integrate complex machine learning techniques for meaningful computer vision and natural language processing applications.

Advanced AI Strategies for 9-Piece Puzzle

In this project, I employed various AI search algorithms to tackle the classic 9-piece puzzle, using Python for state generation and manipulation. Key highlights include utilizing random actions to reach specific goal states, implementing and contrasting Breadth-First and Depth-First Searches, and exploring Uniform Cost Search under different cost structures.
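
As an illustration of the Breadth-First Search part, here is a minimal sketch rather than the project's actual code. It assumes the puzzle is the standard 3x3 sliding-tile board with `0` as the blank; the `GOAL` layout and the function names `neighbors` and `bfs` are chosen for this example only.

```python
from collections import deque

# Assumed goal layout for this sketch: tiles in order, blank (0) last.
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)


def neighbors(state):
    """Yield every state reachable by sliding one tile into the blank."""
    blank = state.index(0)
    row, col = divmod(blank, 3)
    for d_row, d_col in ((-1, 0), (1, 0), (0, -1), (0, 1)):
        r, c = row + d_row, col + d_col
        if 0 <= r < 3 and 0 <= c < 3:
            target = r * 3 + c
            nxt = list(state)
            nxt[blank], nxt[target] = nxt[target], nxt[blank]
            yield tuple(nxt)


def bfs(start, goal=GOAL):
    """Return the shortest sequence of states from start to goal, or None."""
    parent = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in neighbors(state):
            if nxt not in parent:
                parent[nxt] = state
                queue.append(nxt)
    return None  # goal not reachable from start


path = bfs((1, 2, 3, 4, 5, 6, 0, 7, 8))
```

Swapping the `deque` for a stack (pop from the same end that is pushed) turns the same loop into a Depth-First Search, which no longer guarantees the shortest path.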

Optimized Fully Connected Network for CIFAR-10

This project showcases the development of an optimized fully connected neural network for the CIFAR-10 dataset, featuring modular design, ReLU nonlinearity, and advanced optimization algorithms including SGD+Momentum, RMSProp, and Adam. I also integrated dropout regularization, demonstrating a nuanced approach to enhancing network performance and robustness in Python.

Deep CNN Development for Image Classification

In this project, I built and optimized deep CNNs for CIFAR-10 image classification, implementing them from scratch in Python. Key achievements include creating efficient convolutional and pooling layers, integrating Kaiming initialization for the weights, and employing spatial and traditional batch normalization to enhance model training.
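
For readers unfamiliar with Kaiming initialization, the idea can be sketched in a few lines. This is an illustrative example, not the project's code: it assumes the normal-distribution variant for ReLU layers, where weights are drawn with standard deviation sqrt(2 / fan_in), and the name `kaiming_normal` and its parameters are hypothetical.

```python
import math
import random


def kaiming_normal(fan_in, fan_out, rng=None):
    """Return a fan_in x fan_out weight matrix drawn from N(0, 2 / fan_in).

    The factor 2 compensates for ReLU zeroing roughly half of the
    activations, which keeps the signal variance stable across layers.
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is reproducible
    std = math.sqrt(2.0 / fan_in)
    return [[rng.gauss(0.0, std) for _ in range(fan_out)]
            for _ in range(fan_in)]


weights = kaiming_normal(fan_in=500, fan_out=200)
```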

AI-Optimized Robot Navigation

Leveraged MDPs and Value Iteration in Python to guide a robot through a complex grid, accounting for noise and directionality. Achieved dynamic pathfinding, showcasing the application of AI in strategic planning and execution.
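
A minimal sketch of Value Iteration on a noisy grid follows. It is not the project's implementation, and it rests on stated assumptions: with probability `1 - noise` the robot moves in the intended direction, otherwise it slips to one of the two perpendicular directions; blocked moves leave it in place; rewards are paid on entering a terminal cell. All names and default values are illustrative.

```python
ACTIONS = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}
PERPENDICULAR = {"N": "EW", "S": "EW", "E": "NS", "W": "NS"}


def value_iteration(rows, cols, terminals, walls=(), noise=0.2,
                    gamma=0.9, living_reward=0.0, tol=1e-6):
    """Return (values, policy) for a grid world with noisy moves.

    terminals maps (row, col) to the reward collected on entering that cell.
    """
    states = [(r, c) for r in range(rows) for c in range(cols)
              if (r, c) not in walls]
    values = {s: 0.0 for s in states}

    def move(state, action):
        d_row, d_col = ACTIONS[action]
        nxt = (state[0] + d_row, state[1] + d_col)
        return nxt if nxt in values else state  # blocked moves stay put

    def q_value(state, action):
        outcomes = [(1.0 - noise, action)]
        outcomes += [(noise / 2.0, a) for a in PERPENDICULAR[action]]
        total = 0.0
        for prob, actual in outcomes:
            nxt = move(state, actual)
            reward = terminals.get(nxt, living_reward)
            total += prob * (reward + gamma * values[nxt])
        return total

    while True:
        # Synchronous Bellman backup: new values depend only on old ones.
        new_values = {}
        for s in states:
            if s in terminals:
                new_values[s] = 0.0  # episode ends; reward was paid on entry
            else:
                new_values[s] = max(q_value(s, a) for a in ACTIONS)
        delta = max(abs(new_values[s] - values[s]) for s in states)
        values = new_values
        if delta < tol:
            break

    policy = {s: max(ACTIONS, key=lambda a: q_value(s, a))
              for s in states if s not in terminals}
    return values, policy


# A 1x4 corridor with a reward of 1 at the right end and no noise.
values, policy = value_iteration(1, 4, {(0, 3): 1.0}, noise=0.0)
```

With `noise=0.0` the values simply decay by `gamma` per step away from the goal; raising `noise` lowers them because some moves are wasted on slips.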

Reddit Comment Scraper

The Reddit Comments Fetcher is a Python-based tool that captures and stores comments from Reddit posts using praw and pymongo. Designed for data analysts and researchers, it facilitates sentiment analysis and content archiving by efficiently handling complex comment threads.

Data Segregation with K-Means Clustering

Implemented a K-Means clustering algorithm to organize data into two groups, visualizing the results with Python’s Matplotlib and evaluating cluster quality with the cost function J.
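
The approach can be sketched as below. This is an illustrative example, not the project's code. It assumes that J is the usual distortion cost, the mean squared distance from each point to its assigned centroid; the function names and the sample points are made up for the sketch, and plotting is omitted.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)),
                key=lambda j: squared_distance(p, centroids[j]))
            for p in points]


def kmeans(points, k=2, iterations=100, seed=0):
    """Return (labels, centroids, cost), where cost is the distortion J."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    for _ in range(iterations):
        labels = assign(points, centroids)
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    labels = assign(points, centroids)
    cost = sum(squared_distance(p, centroids[lab])
               for p, lab in zip(points, labels)) / len(points)
    return labels, centroids, cost


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids, cost = kmeans(points, k=2)
```

A lower J means tighter clusters, so comparing J across several random seeds is a simple way to pick the best run.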