machine-learning-portfolio

🚀 Machine Learning & AI Engineering Portfolio

License: MIT

Welcome to my comprehensive machine-learning and AI engineering portfolio! This repository showcases end-to-end ML projects, from research and experimentation to production-ready deployments with complete MLOps pipelines.

πŸ‘¨β€πŸ’» About Me

I’m Marcus, a passionate Machine Learning Engineer and AI practitioner focused on building robust, scalable and production-ready AI systems. This portfolio demonstrates my expertise across the entire ML lifecycle, from data preprocessing and model development to deployment and monitoring. These projects showcase modern AI-augmented development practices, leveraging advanced AI assistants (Claude, Gemini, ChatGPT) to accelerate development cycles while maintaining enterprise-grade code quality and architectural excellence.

Core Competencies:

🧰 Overall Tech Stack Summary

The table below summarizes the key technologies used across my completed projects and coursework. Each entry is grouped by its place in the machine-learning pipeline and includes a brief explanation written in plain language.

| Pipeline Stage | Tool/Technology | Usage (Project/Course) | Simple Explanation |
| --- | --- | --- | --- |
| Data Storage & Sources | AWS S3 | Fraud-Detection MLOps, Edenred Invoice Assistant, GRC-LLM: stores datasets and model artifacts | S3 is like a big cloud hard-drive. It keeps our data and trained models so we can load them later. |
| | DynamoDB | Digital-Value-Chain serverless e-commerce: stores product offers and cart data | DynamoDB is a fast cloud database. It keeps items (like products) in a table so the app can read and write quickly. |
| | PostgreSQL + pgvector | CareCopilot Healthcare AI: stores medical documents with vector similarity search | PostgreSQL with pgvector is a database that can store both text and number lists (embeddings) to find similar medical records. |
| | CSV/JSON files | Pinecone Vector DB, Fraud-Detection MLOps, GRC-LLM, PromptOps Policy Coach: holds training tables and text data | These are simple text files that hold tables or lists. They let us load training data from our computer. |
| | Audio files | Speech-Recognition project: WAV/MP3 clips for speech-to-text | Sound files are recordings. We feed them to the model to teach it to hear and transcribe speech. |
| | Synthetic Medical Data | CareCopilot Healthcare AI: realistic clinical documents for HIPAA-safe development | Fake but realistic medical records that look real but don’t contain actual patient information, keeping data safe. |
| | Policy Documents | PromptOps Policy Coach: company policy documents for enterprise Q&A system | Real workplace policies (expense, vacation, remote work) that employees ask questions about every day. |
| Data Preprocessing & Feature Engineering | Pandas | Bike-Rental Predictor, Pinecone Vector DB, Fraud-Detection: reading CSVs, cleaning and encoding data | Pandas is like a spreadsheet for Python. It helps us read tables, clean them and get them ready for training. |
| | NumPy | All projects, PromptOps Policy Coach: math operations and array manipulation | NumPy lets us work with lists of numbers. It makes math operations fast and easy. |
| | scikit-learn | Bike-Rental preprocessing, Fraud-Detection metrics & validation | scikit-learn has tools to split data, scale numbers and measure how good a model is. |
| | Librosa / soundfile / pydub / wave | Speech-Recognition project: loading audio and extracting features | These libraries open sound files and turn them into numbers so a model can understand speech. |
| | sentence-transformers | Pinecone Vector DB, PromptOps Policy Coach: converts text into numeric embeddings | This library takes sentences and turns them into long lists of numbers so we can compare meanings. |
| | dotenv | Pinecone Vector DB, PromptOps Policy Coach: reads API keys from .env files | dotenv lets us keep secret keys in a file and load them into our program safely. |
| | Medical NLP Processing | CareCopilot Healthcare AI: extracts conditions, medications, and medical entities from clinical notes | Special tools that read doctor’s notes and pull out important medical information like diseases and medicines. |
| | Document Chunking | PromptOps Policy Coach: splits policy documents into searchable pieces | Breaks big documents into small pieces so the AI can find the right information quickly. |
| Embeddings & Vectorization | Pinecone | Pinecone Vector DB: cloud vector store for semantic search | Pinecone is a special database that stores those long number lists (embeddings). It helps us search for similar texts. |
| | pgvector | CareCopilot Healthcare AI: vector similarity search within PostgreSQL | pgvector adds vector search to regular databases, so we can find similar medical records by meaning, not just keywords. |
| | Custom Vector Search | PromptOps Policy Coach: numpy-based embedding system for policy documents | A simple but effective way to search documents by meaning using basic math operations. |
| | Vector Similarity Search | CareCopilot Healthcare AI, PromptOps Policy Coach: finds relevant documents using semantic matching | This compares documents by meaning to find the most relevant ones for a user’s question. |
| Model Training | PyTorch | Bike-Rental prediction: neural network training | PyTorch is a toolkit that lets us build and train neural networks. It teaches the computer to predict things. |
| | TensorFlow + Keras | Simple neural network notebook: single-layer perceptron for MNIST digits | TensorFlow and Keras help us build a simple “brain” to recognize handwritten numbers. |
| | XGBoost (via SageMaker) | Fraud-Detection MLOps: training the fraud classifier | XGBoost is a tree-based algorithm. It learns to tell normal transactions from fraudulent ones. |
| | Transformers (BERT/GPT/XLNet) | LLMs coursework: exploring large language models | These models understand and generate text. We used them to learn about language processing. |
| | LoRA / PEFT | GRC-LLM: efficient fine-tuning of TinyLlama | LoRA adapts a big language model using small extra pieces, saving time and cost. |
| | Whisper & speech_recognition | Speech-Recognition project: transcribes audio to text | Whisper and the speech_recognition library help the app understand spoken words. |
| | OpenAI API | LLMs coursework, LLM-Engineering app, PromptOps Policy Coach: chat and interview responses | This API calls a chat model like ChatGPT to answer questions. It lets our apps have conversations. |
| | Mock AI Services | CareCopilot Healthcare AI, PromptOps Policy Coach: realistic AI responses without real ML models | Instead of expensive AI models, we use pre-written smart responses that act like real AI for demonstrations. |
| Model Evaluation & Explainability | scikit-learn metrics | Fraud-Detection MLOps: AUC-ROC, precision/recall calculations | These measurements show how well the fraud model works. |
| | SHAP | Fraud-Detection MLOps: global feature importance | SHAP tells us which features are most important for the model’s decisions. |
| | LIME | Fraud-Detection MLOps: local explanation for single predictions | LIME explains why the model made a particular decision for one example. |
| | Matplotlib / Seaborn | Fraud-Detection MLOps: plotting feature importance and ROC/PR curves | These libraries draw charts to help us see model performance. |
| | Healthcare Similarity Metrics | CareCopilot Healthcare AI: medical document relevance scoring with confidence percentages | Measures how well medical documents match a doctor’s question, giving a confidence score like “46.2% similar”. |
| | Prompt Framework Analytics | PromptOps Policy Coach: comparing different AI reasoning approaches with performance metrics | Tracks how well different prompt strategies work and helps choose the best approach for each question. |
| Deployment & Serving | Flask | Bike-Rental API: REST endpoint for predictions | Flask lets us build a small web server so outside programs can ask for predictions. |
| | Streamlit | GRC-LLM, LLM-Engineering app, CareCopilot Healthcare AI, PromptOps Policy Coach: interactive web front-ends | Streamlit makes it easy to create a chat interface or dashboard from Python code. |
| | AWS SageMaker endpoints | Fraud-Detection MLOps, GRC-LLM, Edenred Invoice Assistant: hosting trained models | SageMaker runs our trained models in the cloud so users can send requests and get answers. |
| | AWS Lambda | Digital-Value-Chain and Edenred Invoice Assistant: serverless backend functions | Lambda runs small pieces of code only when needed. This saves money because there is no always-running server. |
| | AWS API Gateway | Digital-Value-Chain and Invoice Assistant: routes HTTP requests to Lambda | API Gateway receives web requests and sends them to the right Lambda function. |
| | AWS EC2 | Bike-Rental API deployment, CareCopilot Healthcare AI: hosts the REST service and runs CI tests | EC2 is a virtual machine in the cloud. We used it to run our bike-rental API and healthcare demo in production. |
| | Docker | Bike-Rental project, PromptOps Policy Coach: containerizes the API for consistent deployment | Docker packages our app and its dependencies so it runs the same everywhere. |
| | GitHub Actions | Bike-Rental project: CI/CD pipeline for testing and deployment | GitHub Actions automatically tests code and deploys it when we push changes. |
| | AWS SAM / CloudFormation | Digital-Value-Chain: infrastructure as code for serverless stack | SAM and CloudFormation are templates that tell AWS how to build all the resources we need. |
| | CloudWatch | Edenred Invoice Assistant: monitoring and logging for Lambda | CloudWatch records logs and metrics so we can see what our Lambda functions are doing. |
| | GitHub Pages | Edenred Invoice Assistant: hosts the static chat interface | GitHub Pages serves our HTML and JavaScript files so users can access the chatbot in a browser. |
| | Google Cloud Shell | PromptOps Policy Coach: cloud-based development and deployment environment | A free cloud computer with all the tools pre-installed for developing and testing applications. |
| | Stripe | Digital-Value-Chain: handles payment checkout | Stripe processes credit-card payments securely. |
| | Boto3 | Digital-Value-Chain, GRC-LLM: Python SDK to access AWS services | Boto3 lets our Python code talk to AWS services like DynamoDB, S3 and SageMaker. |
| Healthcare & Compliance | FHIR Standards | CareCopilot Healthcare AI: converts clinical notes to structured healthcare data format | FHIR is the standard way hospitals share patient data. It turns doctor’s notes into organized information other systems can read. |
| | HIPAA Compliance Architecture | CareCopilot Healthcare AI: healthcare data privacy and security design patterns | HIPAA is the law that protects patient information. Our architecture follows rules to keep medical data safe and private. |
| | Clinical Terminology | CareCopilot Healthcare AI: medical vocabulary and healthcare workflow understanding | Using proper medical terms and understanding how doctors, nurses, and hospitals actually work day-to-day. |
| | Medical Document Processing | CareCopilot Healthcare AI: discharge summaries, progress notes, clinical documentation | Reading and understanding different types of medical records like when patients leave the hospital or daily care notes. |
| DevOps & Infrastructure | Git | All projects: version control and collaboration | Git keeps track of code changes and lets multiple people work together. |
| | AWS IAM | Fraud-Detection MLOps and Invoice Assistant: role-based access control | IAM is a permission system. It decides who can use which AWS resources. |
| | Cost-optimization strategies | Fraud-Detection MLOps, Edenred Invoice Assistant, CareCopilot Healthcare AI, PromptOps Policy Coach: turning off endpoints when idle | To save money, we shut down cloud resources when they are not being used and restart them only when needed. |
| Front-end & User Interface | React 18 + Vite | Digital-Value-Chain: modern, responsive e-commerce dashboard | React builds interactive web pages, and Vite makes development fast. |
| | HTML / CSS / JavaScript | Edenred Invoice Assistant: static chat interface | These are the basic building blocks of web pages. |
| | Healthcare UI/UX Design | CareCopilot Healthcare AI: clinical workflow-optimized interface with accessibility | Designing interfaces that doctors and nurses can actually use in hospitals, following healthcare design patterns. |
| | Enterprise UI/UX Design | PromptOps Policy Coach: professional enterprise interface with comprehensive monitoring | Creating business applications that look and feel like professional software used in Fortune 500 companies. |
| LLM Tools & Frameworks | LangChain / LangGraph | LangChain & LangGraph coursework: chain and graph structures for LLMs | LangChain and LangGraph help build complex chat flows. They handle prompts, output parsing and memory. |
| | OpenAI Chat models (ChatGPT/GPT-4) | LLM coursework & LLM-Engineering app, PromptOps Policy Coach: used for text generation and interviews | These models chat with users, answer questions and conduct mock interviews. |
| | PEFT / LoRA | GRC-LLM: parameter-efficient fine-tuning | LoRA is a trick to train large models cheaply by adding small adapter layers. |
| | Multi-Framework Prompt Engineering | PromptOps Policy Coach: CRAFT, CRISPE, Chain-of-Thought, Constitutional AI, ReAct frameworks | Different ways to ask AI questions that get better and more consistent answers for business use. |
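
The Document Chunking and Custom Vector Search entries describe splitting documents into pieces and ranking them against a question. Below is a minimal sketch of that flow in NumPy. It assumes a toy bag-of-words embedding in place of a real sentence-transformers model, and every name in it (`tokenize`, `chunk_text`, `build_vocab`, `embed`, `search`) as well as the sample policy text is hypothetical, not taken from the projects.

```python
import numpy as np


def tokenize(text: str) -> list[str]:
    """Lowercase words with surrounding punctuation removed."""
    tokens = (w.strip(".,?!:;").lower() for w in text.split())
    return [t for t in tokens if t]


def chunk_text(text: str, max_words: int = 9) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def build_vocab(chunks: list[str]) -> dict[str, int]:
    """Assign one vector position to every distinct token in the chunks."""
    vocab: dict[str, int] = {}
    for chunk in chunks:
        for token in tokenize(chunk):
            vocab.setdefault(token, len(vocab))
    return vocab


def embed(text: str, vocab: dict[str, int]) -> np.ndarray:
    """Toy embedding (assumption): token counts, scaled to unit length."""
    vec = np.zeros(len(vocab))
    for token in tokenize(text):
        if token in vocab:
            vec[vocab[token]] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def search(query: str, chunks: list[str], top_k: int = 1) -> list[tuple[float, str]]:
    """Rank chunks by cosine similarity (dot product of unit vectors)."""
    vocab = build_vocab(chunks)
    matrix = np.stack([embed(c, vocab) for c in chunks])
    scores = matrix @ embed(query, vocab)
    order = np.argsort(scores)[::-1][:top_k]
    return [(float(scores[i]), chunks[i]) for i in order]


# Made-up policy text, for illustration only.
policy = (
    "Gym memberships are not reimbursable under the expense policy. "
    "Remote work requires manager approval two weeks in advance. "
    "Vacation days accrue monthly and unused days expire in March."
)
chunks = chunk_text(policy, max_words=9)
best_score, best_chunk = search("Can I expense my gym membership?", chunks)[0]
print(f"{best_score:.2f}  {best_chunk}")
```

Swapping `embed` for a real sentence-embedding model would leave `search` unchanged, since it only relies on unit-length vectors.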

🎯 Portfolio Objectives

This repository serves multiple purposes:

🔬 Research & Development

Exploring cutting-edge ML techniques, experimenting with new algorithms and implementing research papers to stay current with the latest advancements in AI.

πŸ—οΈ Production-Ready Solutions

Building complete MLOps pipelines that demonstrate enterprise-level practices including automated testing, containerization, CI/CD, monitoring and scalable deployment strategies.

πŸ₯ Healthcare AI Innovation

Developing HIPAA-compliant, FHIR-native AI systems that address real clinical workflows and demonstrate understanding of healthcare technology requirements.

🎯 Enterprise Prompt Engineering

Demonstrating production-grade prompt engineering patterns with multi-framework approaches, cost optimization, and enterprise AI governance suitable for Fortune 500 implementations.

📚 Learning & Growth

Documenting my journey in machine learning, sharing knowledge through well-documented code and contributing to the ML community.

💼 Professional Showcase

Demonstrating practical skills in machine learning engineering, data science, healthcare AI, prompt engineering and AI system architecture for potential collaborators and employers.

πŸ₯ CareCopilot - HIPAA-Ready Healthcare AI Platform

Enterprise Healthcare AI System: RAG + FHIR Agent for Clinical Workflows

Production-grade healthcare AI platform combining intelligent document retrieval with automated FHIR conversion, designed specifically for enterprise healthcare environments like PointClickCare’s 30,000+ provider ecosystem.

🎯 Highlights:

πŸ› οΈ Tech Stack: Streamlit, PostgreSQL+pgvector, FHIR R4, Python, AWS EC2, Healthcare NLP, Mock AI Services

# Example RAG Query - Medical Document Search
response = rag_system.query("What medications were prescribed for diabetes?")
print(f"Answer: {response['answer']}")
print(f"Similarity: {response['similarity']}%")
print(f"Source: {response['source_document']}")

# Example FHIR Conversion - Clinical Note to Structured Data
clinical_note = "Patient discharged with pneumonia, prescribed Azithromycin 250mg daily x5 days"
fhir_bundle = fhir_agent.convert_to_fhir(clinical_note)
print(f"Generated {fhir_bundle['total_resources']} FHIR resources")
print(f"Conditions: {fhir_bundle['conditions_detected']}")
print(f"Medications: {fhir_bundle['medications_detected']}")
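
The conversion step above returns counts of detected conditions and medications. As a rough sketch of what the structured output of such a step might look like, the code below builds a minimal FHIR R4 `Bundle` of type `collection` from keyword matches. The keyword lists and the `note_to_bundle` function are hypothetical stand-ins, not the project's `fhir_agent`, and the resources carry only free-text `code` values with no terminology coding.

```python
import json
import uuid

# Tiny illustrative vocabularies (assumption); a real system would use clinical NLP.
KNOWN_CONDITIONS = ("pneumonia", "diabetes", "hypertension")
KNOWN_MEDICATIONS = ("azithromycin", "metformin", "lisinopril")


def note_to_bundle(note: str, patient_id: str = "example") -> dict:
    """Build a minimal FHIR R4 'collection' Bundle from keyword matches in a note."""
    lowered = note.lower()
    subject = {"reference": f"Patient/{patient_id}"}
    resources = []
    for name in KNOWN_CONDITIONS:
        if name in lowered:
            resources.append({
                "resourceType": "Condition",
                "id": str(uuid.uuid4()),
                "code": {"text": name},
                "subject": subject,
            })
    for name in KNOWN_MEDICATIONS:
        if name in lowered:
            resources.append({
                "resourceType": "MedicationRequest",
                "id": str(uuid.uuid4()),
                "status": "active",
                "intent": "order",
                "medicationCodeableConcept": {"text": name},
                "subject": subject,
            })
    return {
        "resourceType": "Bundle",
        "type": "collection",
        "entry": [{"resource": r} for r in resources],
    }


bundle = note_to_bundle(
    "Patient discharged with pneumonia, prescribed Azithromycin 250mg daily x5 days"
)
print(json.dumps([e["resource"]["resourceType"] for e in bundle["entry"]]))
```

The sketch does not validate the bundle against a FHIR server or profile; it only shows the overall shape of the resources.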

📊 Production Performance:

🤖 PromptOps Policy Coach - Enterprise Prompt Engineering Platform

Fortune 500-Ready AI System: Multi-Framework Prompt Engineering with Production RAG Pipeline

Enterprise-grade prompt engineering platform demonstrating how Fortune 500 companies implement production-ready AI systems with standardized prompt frameworks, cost optimization, and measurable quality controls for corporate policy Q&A.

🎯 Highlights:

πŸ› οΈ Tech Stack: Streamlit, OpenAI GPT-4o-mini, NumPy, Docker, Google Cloud Shell, Python-dotenv

# Example Multi-Framework Query - Same Question, Different AI Reasoning
from prompt_coach import PolicyCoach

coach = PolicyCoach()
question = "Can I expense my gym membership?"

# CRAFT Framework (Structured Professional)
craft_response = coach.query(question, framework="CRAFT")
print(f"CRAFT: {craft_response['answer']}")

# ReAct Framework (Reasoning + Acting)  
react_response = coach.query(question, framework="ReAct")
print(f"ReAct: {react_response['answer']}")

# Chain of Thought (Step-by-step Analysis)
cot_response = coach.query(question, framework="Chain of Thought")
print(f"CoT: {cot_response['answer']}")

# Performance Metrics
print(f"Response Time: {coach.get_metrics()['avg_response_time']}s")
print(f"Total Cost: ${coach.get_metrics()['total_cost']:.4f}")
print(f"Framework Effectiveness: {coach.compare_frameworks()}")
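
The example above sends one question through several prompt frameworks. One way such a switch might work is a lookup of prompt templates keyed by framework name, sketched below. The template wording is illustrative only, CRAFT is assumed here to stand for Context, Role, Action, Format, Target audience (one common expansion), and `FRAMEWORK_TEMPLATES` and `build_prompt` are hypothetical names, not part of the project's `PolicyCoach`.

```python
FRAMEWORK_TEMPLATES = {
    "CRAFT": (
        "Context: {context}\n"
        "Role: You are a corporate policy assistant.\n"
        "Action: Answer the employee's question using only the context.\n"
        "Format: Two or three sentences that name the relevant policy.\n"
        "Target audience: Employees without legal training.\n"
        "Question: {question}"
    ),
    "Chain of Thought": (
        "Context: {context}\n"
        "Question: {question}\n"
        "Work through the relevant policy rules step by step, "
        "then state the final answer."
    ),
    "ReAct": (
        "Context: {context}\n"
        "Question: {question}\n"
        "Alternate between 'Thought:' and 'Action:' lines, "
        "and finish with a line starting 'Final Answer:'."
    ),
}


def build_prompt(question: str, context: str, framework: str = "CRAFT") -> str:
    """Fill the template for the chosen framework with the question and context."""
    try:
        template = FRAMEWORK_TEMPLATES[framework]
    except KeyError:
        raise ValueError(f"Unknown framework: {framework!r}") from None
    return template.format(question=question, context=context)


# Made-up policy context, for illustration only.
prompt = build_prompt(
    "Can I expense my gym membership?",
    "Wellness expenses require pre-approval.",
    framework="ReAct",
)
print(prompt)
```

Keeping the templates in one dictionary makes it straightforward to send the same question through every framework and compare the answers side by side.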

📊 Production Performance:

πŸ›‘οΈ GRC Compliance LLM - AI-Powered Compliance Assistant

Production-Ready LoRA Fine-tuning with AWS SageMaker and Cost-Optimized Architecture

Enterprise-grade compliance question-answering system that fine-tunes a TinyLlama 1.1B model using LoRA (Low-Rank Adaptation) for governance, risk and compliance queries across SOC 2, ISO 27001 and HIPAA frameworks.

🎯 Highlights:

πŸ› οΈ Tech Stack: TinyLlama, LoRA/PEFT, AWS SageMaker, Streamlit, PyTorch, Transformers, EC2, S3

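# Example Compliance Query
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_id = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

# Load the base model and tokenizer, then attach the fine-tuned LoRA adapter
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(model, "outputs/compliance-tinyllama-lora")

# Ask compliance question: generate() expects token IDs, not raw text
inputs = tokenizer("Which SOC 2 control covers password requirements?", return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=64)
# Decode only the newly generated tokens, skipping the prompt
new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
# Output: "SOC 2 CC6.1 covers password requirements: organizations must implement complexity, length, and rotation policies."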
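
LoRA keeps the base weights frozen and learns a low-rank update, replacing a full `d_out x d_in` weight update with two small matrices of shapes `d_out x r` and `r x d_in`. The arithmetic below sketches why that is cheap. The layer size (2048 x 2048) and rank (8) are illustrative assumptions, not the project's actual configuration, and `lora_param_counts` is a hypothetical helper.

```python
def lora_param_counts(d_out: int, d_in: int, rank: int) -> tuple[int, int]:
    """Return (full_update_params, lora_params) for one weight matrix."""
    full = d_out * d_in            # training the whole matrix
    lora = rank * (d_out + d_in)   # B is d_out x r, A is r x d_in
    return full, lora


# Illustrative sizes (assumption): one square projection layer, rank 8.
full, lora = lora_param_counts(2048, 2048, 8)
print(f"full update: {full:,} params, LoRA update: {lora:,} params")
print(f"trainable fraction: {lora / full:.2%}")
```

The saving grows with layer size, because the LoRA term scales with `d_out + d_in` while the full update scales with their product.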

📊 Production Performance:

🤖 Edenred Invoice Assistant - Production AI Chatbot

End-to-End ML Pipeline: Training to Production with Cost-Optimized AWS SageMaker

Complete production-ready AI chatbot for invoice and payment support, showcasing enterprise-level ML deployment with intelligent cost management and serverless architecture.

🎯 Highlights:

πŸ› οΈ Tech Stack: AWS SageMaker, Lambda, API Gateway, HuggingFace Transformers, Python, HTML/CSS/JS, CloudWatch

# Example API Usage – Production Endpoint with Intelligent Fallbacks
import requests
response = requests.post(
    'https://zg4ja3aub5lvqzsbomo7nrhw7m0rjqms.lambda-url.us-east-1.on.aws/',
    json={'message': 'How do I submit an invoice?'}
)
print(f"AI Response: {response.json()['response']}")
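
The request above targets an endpoint described as having intelligent fallbacks. A minimal sketch of one possible fallback pattern follows: a Lambda-style handler that tries the model first and returns a canned reply if the model call fails, for example because the SageMaker endpoint is switched off. `make_handler`, `fallback_answer` and the placeholder replies are hypothetical; only the `message` and `response` JSON keys come from the example above.

```python
import json

# Placeholder canned replies (assumption), keyed by a keyword in the message.
FALLBACK_ANSWERS = {
    "invoice": "Placeholder answer about submitting invoices.",
    "payment": "Placeholder answer about payment status.",
}
DEFAULT_ANSWER = "Placeholder answer: please contact support."


def fallback_answer(message: str) -> str:
    """Pick a canned reply by keyword, or the default if nothing matches."""
    lowered = message.lower()
    for keyword, answer in FALLBACK_ANSWERS.items():
        if keyword in lowered:
            return answer
    return DEFAULT_ANSWER


def make_handler(invoke_model):
    """Return a Lambda-style handler that prefers the model and falls back on errors."""
    def handler(event, context=None):
        message = json.loads(event.get("body") or "{}").get("message", "")
        try:
            reply, source = invoke_model(message), "model"
        except Exception:  # endpoint stopped, timeout, throttling, ...
            reply, source = fallback_answer(message), "fallback"
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps({"response": reply, "source": source}),
        }
    return handler


def offline_model(message: str) -> str:
    """Stand-in for a model call that fails because the endpoint is down."""
    raise ConnectionError("endpoint is not running")


handler = make_handler(offline_model)
result = handler({"body": json.dumps({"message": "How do I submit an invoice?"})})
print(json.loads(result["body"]))
```

Returning a `source` field alongside the reply is a design choice that lets the front-end or logs distinguish model answers from fallback answers.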

📊 Production Performance:

🚴 Bike Rental Prediction - MLOps Pipeline

Production-Ready ML System with Full CI/CD

A complete end-to-end MLOps pipeline for predicting hourly bike rental demand, showcasing enterprise-level practices.

🎯 Highlights:

πŸ› οΈ Tech Stack: PyTorch, Flask, Docker, AWS (EC2, ECR), GitHub Actions, NumPy, Pandas

# Example API Usage
import requests
response = requests.post('http://18.233.252.250/predict', json={'features': [0.1] * 53})
print(f"Predicted bike rentals: {response.json()['prediction']}")
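
The request above sends a list of 53 feature values to `/predict`. Below is a framework-independent sketch of how the server side might validate that payload before calling the model. `parse_features` and `handle_predict` are hypothetical names, and `predict_fn` stands in for the trained PyTorch model, which is not shown here.

```python
import json
import math

N_FEATURES = 53  # feature count used in the request example


def parse_features(raw_body: str, n_features: int = N_FEATURES) -> list[float]:
    """Parse and validate the JSON body of a /predict request."""
    try:
        payload = json.loads(raw_body)
    except json.JSONDecodeError as exc:
        raise ValueError("body is not valid JSON") from exc
    features = payload.get("features") if isinstance(payload, dict) else None
    if not isinstance(features, list) or len(features) != n_features:
        raise ValueError(f"'features' must be a list of {n_features} numbers")
    values = []
    for x in features:
        # bool is a subclass of int, so reject it explicitly
        if isinstance(x, bool) or not isinstance(x, (int, float)) or not math.isfinite(x):
            raise ValueError("every feature must be a finite number")
        values.append(float(x))
    return values


def handle_predict(raw_body: str, predict_fn) -> tuple[int, dict]:
    """Return (status_code, response_dict) for a prediction request."""
    try:
        features = parse_features(raw_body)
    except ValueError as exc:
        return 400, {"error": str(exc)}
    return 200, {"prediction": predict_fn(features)}


# `sum` is a stand-in for the real model (assumption).
status, body = handle_predict(json.dumps({"features": [0.1] * 53}), predict_fn=sum)
print(status, body)
```

In a Flask route, the same function could be called with the raw request body and its return value turned into a JSON response.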

πŸ•΅οΈ Fraud Detection β€” Enterprise MLOps with Explainability

Production-Ready Fraud Detection with SHAP/LIME and Cost-Optimized SageMaker Pipeline

Complete end-to-end MLOps pipeline for credit card fraud detection using AWS SageMaker, demonstrating enterprise-level practices with automated deployment, monitoring, model explainability and intelligent cost management for production-ready fraud prevention.

🎯 Highlights:

πŸ› οΈ Tech Stack: XGBoost, SageMaker, SHAP, LIME, Model Registry, CloudWatch, S3, Boto3

# Example Production Pattern – Reactivation Ready
import json
import boto3
runtime = boto3.client('sagemaker-runtime')
response = runtime.invoke_endpoint(
    EndpointName='fraud-detection-endpoint-1755128252',
    ContentType='text/csv',
    Body='0.5,-1.2,0.8,...'  # PCA features
)
result = json.loads(response['Body'].read())
print(f"Fraud probability: {result['probability']:.3f}")
print(f"Decision: {'FRAUD' if result['prediction'] > 0.5 else 'LEGITIMATE'}")
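
The "Reactivation Ready" label reflects the cost strategy of turning endpoints off when idle. A minimal sketch of that lifecycle is shown below, using the SageMaker client calls `describe_endpoint`, `delete_endpoint` and `create_endpoint`. The helper names, the endpoint and config names, and the `FakeSageMaker` class are hypothetical; the fake exists only so the sketch runs without AWS credentials. With boto3, `sm_client` would be `boto3.client("sagemaker")`.

```python
def endpoint_status(sm_client, name):
    """Return the endpoint status, or None if the lookup fails.

    Simplification: any ClientError is treated as 'endpoint does not exist'.
    """
    try:
        return sm_client.describe_endpoint(EndpointName=name)["EndpointStatus"]
    except sm_client.exceptions.ClientError:
        return None


def stop_endpoint(sm_client, name):
    """Delete the endpoint to stop billing; the endpoint config and model remain."""
    if endpoint_status(sm_client, name) is None:
        return False
    sm_client.delete_endpoint(EndpointName=name)
    return True


def start_endpoint(sm_client, name, config_name):
    """Recreate the endpoint from its saved endpoint config if it is not running."""
    if endpoint_status(sm_client, name) is not None:
        return False
    sm_client.create_endpoint(EndpointName=name, EndpointConfigName=config_name)
    return True


class FakeSageMaker:
    """In-memory stand-in for the SageMaker client (for this sketch only)."""

    class exceptions:
        class ClientError(Exception):
            pass

    def __init__(self):
        self.endpoints = {}

    def describe_endpoint(self, EndpointName):
        if EndpointName not in self.endpoints:
            raise self.exceptions.ClientError("Could not find endpoint")
        return {"EndpointStatus": "InService"}

    def create_endpoint(self, EndpointName, EndpointConfigName):
        self.endpoints[EndpointName] = EndpointConfigName

    def delete_endpoint(self, EndpointName):
        del self.endpoints[EndpointName]


client = FakeSageMaker()
started = start_endpoint(client, "demo-endpoint", "demo-endpoint-config")
stopped = stop_endpoint(client, "demo-endpoint")
print(started, stopped, endpoint_status(client, "demo-endpoint"))
```

A real deployment would also need to wait for the endpoint to reach `InService` after creation before sending requests, which this sketch omits.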

📊 Production Performance:

🏢 Digital Value Chain - Enterprise Serverless E-commerce

Full-Stack Serverless Platform with Cost-Optimized Architecture and AI-Assisted Development

Complete serverless e-commerce platform demonstrating enterprise-level architecture, modern development practices, intelligent cost management and scalable cloud solutions built collaboratively with AI assistants.

🎯 Highlights:

πŸ› οΈ Tech Stack: React 18, AWS Lambda, API Gateway, DynamoDB, AWS SAM, Stripe, Vite, Python

# Example API Usage – Production Endpoints (Reactivation Ready)
import requests
api_base = 'https://f59moopdx0.execute-api.us-east-1.amazonaws.com'
# List all offers
print(requests.get(f'{api_base}/offers').json())
# Create a new offer
print(requests.post(f'{api_base}/offers', json={'sku': 'premium-001', 'name': 'Premium Plan', 'price': 99.99}).json())
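
On the server side, the `/offers` calls above would be handled by a Lambda function backed by DynamoDB. The sketch below shows one way such a handler might look. It assumes the API Gateway HTTP API (payload version 2.0) event shape and a table keyed by `sku`; `make_offers_handler` and `FakeTable` are hypothetical, and the fake table stands in for a boto3 DynamoDB `Table` resource so the sketch runs without AWS. One detail worth noting: the boto3 resource API rejects Python floats, so the price is parsed as `Decimal`.

```python
import json
from decimal import Decimal


def _json_default(value):
    """json.dumps helper: DynamoDB numbers come back as Decimal."""
    if isinstance(value, Decimal):
        return float(value)
    raise TypeError(f"not JSON serializable: {type(value).__name__}")


def _response(status, payload):
    return {
        "statusCode": status,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(payload, default=_json_default),
    }


def make_offers_handler(table):
    """Return a handler for GET /offers and POST /offers backed by `table`."""
    def handler(event, context=None):
        method = event.get("requestContext", {}).get("http", {}).get("method", "GET")
        if method == "POST":
            # Parse floats as Decimal so the item can be written to DynamoDB
            offer = json.loads(event.get("body") or "{}", parse_float=Decimal)
            if not isinstance(offer, dict) or "sku" not in offer:
                return _response(400, {"error": "sku is required"})
            table.put_item(Item=offer)
            return _response(201, offer)
        # scan() returns one page of results; pagination is omitted here
        return _response(200, table.scan().get("Items", []))
    return handler


class FakeTable:
    """In-memory stand-in for a DynamoDB Table resource (for this sketch only)."""

    def __init__(self):
        self.items = {}

    def put_item(self, Item):
        self.items[Item["sku"]] = Item

    def scan(self):
        return {"Items": list(self.items.values())}


table = FakeTable()
handler = make_offers_handler(table)
created = handler({
    "requestContext": {"http": {"method": "POST"}},
    "body": json.dumps({"sku": "premium-001", "name": "Premium Plan", "price": 99.99}),
})
listed = handler({"requestContext": {"http": {"method": "GET"}}})
print(created["statusCode"], json.loads(listed["body"]))
```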

📊 Production Performance & Evidence:

🎭 Sentiment Analysis Web App (Coming Soon)

Real-time sentiment analysis with modern transformers

Web application for analyzing sentiment in text using Hugging Face Transformers, deployed as a scalable REST API.

Planned Features:

πŸ–ΌοΈ Image Classifier on CIFAR-10 (Coming Soon)

CNN-based image classification with MLflow tracking

Deep learning image classifier using PyTorch CNNs with comprehensive model tracking and cloud storage integration.

Planned Features:

📈 Time Series Forecasting (Weather/Energy) (Coming Soon)

LSTM-based forecasting with automated scheduling

Stay tuned for more exciting projects!


📫 Get In Touch


⭐ Star this repository if you find it helpful! Your support motivates me to keep building and sharing innovative ML solutions.