Ads Keyword Recommender & LLM Personalization (OBMedia)

Role

Lead Data Scientist & ML Engineer

Timeline

6 Months

Team

2 Data Scientists, 1 PM, Backend Engineering Team

My Focus

Architecture Design, End-to-End Pipeline, MLOps

Python, GCP, BigQuery, Vertex AI, BERT, Embeddings, Cosine Similarity, ETL

Business Impact

+5% RPC (revenue per click); +8% ROI

Scale

High-volume ads pipeline

The Challenge: The Scale Bottleneck

Walmart needed to increase average order value by recommending relevant items to millions of users. However, the existing experience was static and generic.

The Bottleneck: Our legacy rule-based system could not scale to the size of the catalog, which led to missed revenue opportunities.
The Goal: Build a scalable semantic engine that understands user intent in real time.

The Architecture

I designed a two-tower recommendation system to capture semantic relationships between users and products:

Data Processing: Used BigQuery and PySpark on Dataproc to process billions of historical transaction logs.
Model Logic: Used BERT embeddings to represent items as vectors, moving beyond simple keyword matching.
Serving: Deployed the final ranking model (XGBoost) on Vertex AI Endpoints for low-latency, real-time scoring.
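To make the embedding step concrete, here is a minimal sketch of how a two-tower setup typically scores candidates: each tower maps its input to a vector, and items are retrieved by cosine similarity to the user vector. It assumes the embeddings have already been computed (for example by a BERT-based encoder); the function names and the toy 3-dimensional vectors are hypothetical, not taken from the production system.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_items(
    user_vec: Sequence[float],
    item_vecs: Dict[str, Sequence[float]],
    k: int,
) -> List[Tuple[str, float]]:
    """Brute-force retrieval: score every item against the user vector, keep the k best."""
    scored = ((item_id, cosine_similarity(user_vec, vec)) for item_id, vec in item_vecs.items())
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])


if __name__ == "__main__":
    # Toy 3-d vectors standing in for tower outputs (hypothetical values).
    items = {
        "running shoes": [0.9, 0.1, 0.0],
        "trail sneakers": [0.8, 0.2, 0.1],
        "coffee maker": [0.0, 0.1, 0.9],
    }
    user = [1.0, 0.2, 0.0]
    for item_id, score in top_k_items(user, items, k=2):
        print(f"{item_id}: {score:.3f}")
```

At catalog scale, the linear scan over every item would normally be replaced by an approximate nearest-neighbor index; the similarity function being ranked on stays the same.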

System Architecture Diagram

```mermaid
graph LR
    A[Data Lake<br/>BigQuery] --> B[Feature Engineering<br/>PySpark/Dataproc]
    B --> C[BERT Embedding<br/>Layer]
    C --> D[Ranking Algorithm<br/>XGBoost + Rules]
    D --> E[Serving Infrastructure<br/>Vertex AI Endpoints]
    E --> F[Walmart.com<br/>Personalization]

    G[A/B Testing<br/>Framework] -.->|Metrics| E
    H[Retraining<br/>Pipeline] -.->|Daily| C

    style A fill:#0066ff,stroke:#0052cc,stroke-width:2px,color:#fff
    style B fill:#4C9AFF,stroke:#0066ff,stroke-width:2px,color:#fff
    style C fill:#0066ff,stroke:#0052cc,stroke-width:2px,color:#fff
    style D fill:#4C9AFF,stroke:#0066ff,stroke-width:2px,color:#fff
    style E fill:#0066ff,stroke:#0052cc,stroke-width:2px,color:#fff
    style F fill:#4C9AFF,stroke:#0066ff,stroke-width:2px,color:#fff
    style G fill:#666,stroke:#444,stroke-width:1px,color:#fff
    style H fill:#666,stroke:#444,stroke-width:1px,color:#fff
```
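The "XGBoost + Rules" stage in the diagram combines a learned ranker with business rules. The sketch below shows one way to structure that stage, assuming a generic `score_fn` in place of the trained XGBoost model's prediction call; the blocklist and the per-category cap are hypothetical example rules, not the ones used in this project.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set


def rank_candidates(
    candidates: Iterable[str],
    score_fn: Callable[[str], float],
    category_of: Dict[str, str],
    blocked: Set[str],
    max_per_category: int,
    k: int,
) -> List[str]:
    """Score candidates with a model, then apply post-scoring business rules.

    score_fn stands in for the trained ranker (hypothetical); both rules are illustrative.
    """
    if k <= 0:
        return []
    # Rule 1 (hypothetical): drop blocked candidates before scoring.
    allowed = [c for c in candidates if c not in blocked]
    # Model step: order the remaining candidates by predicted score, highest first.
    ordered = sorted(allowed, key=score_fn, reverse=True)
    # Rule 2 (hypothetical): keep at most max_per_category items from any one category.
    per_category: Dict[str, int] = defaultdict(int)
    ranked: List[str] = []
    for item in ordered:
        category = category_of.get(item, "unknown")
        if per_category[category] >= max_per_category:
            continue
        per_category[category] += 1
        ranked.append(item)
        if len(ranked) == k:
            break
    return ranked


if __name__ == "__main__":
    scores = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6}  # toy model scores
    categories = {"a": "shoes", "b": "shoes", "c": "kitchen", "d": "garden"}
    print(rank_candidates(scores, scores.__getitem__, categories, set(), 1, 3))
```

Keeping rules outside the model means they can change without retraining, while the daily retraining loop in the diagram only has to refresh the learned part.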

The Impact

We replaced a manual, maintenance-heavy system with an automated AI pipeline.

Dimension	Legacy System	New Scale-Aware Engine
Methodology	Manual rules (hard to scale)	BERT embeddings + XGBoost ranking
Personalization	Generic / segment-based	1:1 real-time personalization
Click-through rate (CTR)	Baseline	+10%
Recall@K	Baseline (limited context)	+25%
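Recall@K measures what fraction of a user's relevant items appear in the top K recommendations. A minimal sketch of the per-user and averaged computation follows; K is left as a parameter because the value used in the project is not given, and the function names and toy data are illustrative.

```python
from typing import Iterable, Sequence, Set, Tuple


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of a user's relevant items that appear in the top-k of the ranked list."""
    if not relevant or k <= 0:
        return 0.0
    hits = len(set(ranked[:k]) & relevant)  # set() guards against duplicate IDs
    return hits / len(relevant)


def mean_recall_at_k(results: Iterable[Tuple[Sequence[str], Set[str]]], k: int) -> float:
    """Average Recall@K over users, skipping users with no relevant items."""
    scores = [recall_at_k(ranked, relevant, k) for ranked, relevant in results if relevant]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    users = [
        (["a", "b", "c", "d"], {"b", "d", "e"}),  # 1 of 3 relevant items in the top 2
        (["x", "y", "z"], {"x"}),                  # 1 of 1 relevant item in the top 2
    ]
    print(mean_recall_at_k(users, k=2))
```

Because Recall@K only checks membership in the top K, it rewards retrieving the right candidates rather than ordering them perfectly, which suits a retrieval stage feeding a separate ranker.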

Collaboration & MLOps

This project required tight alignment between Data Science and Product:

Product Alignment: I met weekly with product managers to translate "user engagement" goals into a technical optimization metric (Recall@K).
Engineering Handoff: I built the A/B testing framework for a safe rollout and worked with backend engineers to keep API response times under 100 ms.
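An A/B testing framework for a CTR change usually needs a significance check before traffic is ramped up. A minimal sketch, assuming per-arm click and impression counts are available, is a two-sided two-proportion z-test using only the standard library; the function name and the counts in the example are hypothetical and are not results from this project.

```python
import math
from statistics import NormalDist
from typing import Tuple


def ctr_z_test(clicks_a: int, views_a: int, clicks_b: int, views_b: int) -> Tuple[float, float, float]:
    """Two-sided two-proportion z-test comparing the CTR of arm B (treatment) to arm A (control).

    Returns (relative lift of B over A, z statistic, p-value).
    """
    if min(views_a, views_b) <= 0 or clicks_a <= 0:
        raise ValueError("both arms need impressions, and control needs at least one click")
    ctr_a = clicks_a / views_a
    ctr_b = clicks_b / views_b
    # Pooled CTR under the null hypothesis that both arms share the same rate.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if se == 0.0:
        raise ValueError("standard error is zero; the test is undefined")
    z = (ctr_b - ctr_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    lift = (ctr_b - ctr_a) / ctr_a
    return lift, z, p_value


if __name__ == "__main__":
    # Toy counts: control 500 clicks / 20,000 impressions, treatment 560 / 20,000.
    lift, z, p = ctr_z_test(500, 20_000, 560, 20_000)
    print(f"lift={lift:.1%} z={z:.2f} p={p:.3f}")
```

With these toy counts the relative lift is 12%, but the p-value is about 0.06, so a rollout gate using a 0.05 threshold would keep collecting traffic rather than ship the change.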
