Youngbin Kim

LLM Engineer / Data Scientist (Clinical AI)

// about

About

Who I Am

I'm a Senior Data Scientist at NewYork-Presbyterian, where I was the first dedicated LLM hire. I build clinical NLP pipelines within HIPAA-compliant LLM infrastructure (local and cloud) and design agentic workflows that help clinicians make better decisions. I earned my PhD in Biomedical Engineering from Columbia University in 2024, focusing on machine learning approaches to cardiac signal analysis in disease models built from stem cell-derived engineered heart tissues.

Before NYP, I interned at Genentech building ML models during my PhD, and before Columbia I studied Bioengineering and EECS at UC Berkeley. I published nine peer-reviewed papers during my PhD and built BeatProfiler, an open-source ML platform for cardiac analysis that has been adopted by external researchers. When I'm not wrangling data, you'll find me singing with the Young New Yorkers' Chorus, biking across the city, or planning my next trip.

What I Do

  • LLM Engineering

    HIPAA-compliant infrastructure, LangGraph pipelines, agentic workflows

  • Clinical NLP

    Medical text extraction, clinical decision support, EHR integration

  • Cardiac Signal Analysis

    ML for ECG-like/calcium transient signals, BeatProfiler platform

  • Data Science

    Python, deep learning, statistical modeling

// experience

Experience

2024 – Present

Senior Data Scientist

NewYork-Presbyterian

First dedicated LLM hire. Built clinical NLP pipelines, HIPAA-compliant LLM infrastructure, and agentic workflows.

2019 – 2024

PhD, Biomedical Engineering

Columbia University

ML for cardiac signal analysis. Created BeatProfiler. Nine peer-reviewed publications, including IEEE-published work.

2022

Machine Learning Intern

Genentech

Built multimodal ML models for biomarker discovery from Alzheimer's drug clinical trial data.

2015 – 2019

BS, Bioengineering & EECS

UC Berkeley

Foundation in engineering and computer science.

// projects

Projects

ChartExtract

In Development

Extracting structured data from clinical documents like discharge summaries, operative notes, and pathology reports is one of the most labor-intensive tasks in healthcare. ChartExtract is an agentic platform that lets clinical staff design, test, and optimize multi-step extraction pipelines through a visual interface and conversational AI instead of hand-building brittle review workflows.

Pipeline Builder Workflow

Workflow diagram labels: Design Loop, Test edge, Revision edge, Final Pipeline.

Example generated DAG

ChainOfThought

Cancer Type Extraction

Extracts cancer type from clinical text (pathology report, discharge summary).

inputs

clinical_text

outputs

cancer_type

Condition: if cancer_type = "breast cancer"

ChainOfThought

Biomarker Extraction

Extracts HER2, ER, PR receptor status from pathology report for breast cancer cases.

inputs

clinical_text, cancer_type

outputs

her2_status, er_status, pr_status

Workflow diagram labels: Workbench Loop, Optimize path, Optimize Loop, Skip optimization, Improved pipeline.
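
To make the example DAG above concrete, here is a minimal sketch of how a two-step pipeline with a conditional edge could be represented in plain Python. Everything in it is an assumption made for illustration: the `Step` class, the `run_pipeline` function, and the keyword and regex extractors are hypothetical stand-ins, not ChartExtract's actual implementation. The diagram labels each step as a ChainOfThought module, so in practice each `run` function would call a language model rather than match strings.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Record = Dict[str, str]


@dataclass
class Step:
    """One node of the DAG (hypothetical structure for illustration)."""
    name: str
    inputs: List[str]
    outputs: List[str]
    run: Callable[[Record], Record]  # stand-in for an LLM-backed module
    condition: Optional[Callable[[Record], bool]] = None  # guard on the incoming edge


def extract_cancer_type(fields: Record) -> Record:
    # Toy keyword rule standing in for a model call.
    text = fields["clinical_text"].lower()
    return {"cancer_type": "breast cancer" if "breast" in text else "unknown"}


def extract_biomarkers(fields: Record) -> Record:
    # Toy regex rule standing in for a model call.
    text = fields["clinical_text"]
    result = {}
    for marker, key in (("HER2", "her2_status"), ("ER", "er_status"), ("PR", "pr_status")):
        match = re.search(rf"\b{marker}\b\s*[:\-]?\s*(positive|negative)", text, re.IGNORECASE)
        result[key] = match.group(1).lower() if match else "not found"
    return result


PIPELINE = [
    Step("Cancer Type Extraction", ["clinical_text"], ["cancer_type"], extract_cancer_type),
    Step(
        "Biomarker Extraction",
        ["clinical_text", "cancer_type"],
        ["her2_status", "er_status", "pr_status"],
        extract_biomarkers,
        condition=lambda state: state.get("cancer_type") == "breast cancer",
    ),
]


def run_pipeline(steps: List[Step], record: Record) -> Record:
    """Run steps in order, skipping any step whose edge condition is false."""
    state = dict(record)
    for step in steps:
        if step.condition is not None and not step.condition(state):
            continue
        missing = [name for name in step.inputs if name not in state]
        if missing:
            raise KeyError(f"{step.name} is missing inputs: {missing}")
        produced = step.run({name: state[name] for name in step.inputs})
        state.update({name: produced[name] for name in step.outputs})
    return state


breast_note = "Invasive ductal carcinoma of the left breast. ER positive, PR positive, HER2 negative."
colon_note = "Adenocarcinoma of the sigmoid colon."

breast_result = run_pipeline(PIPELINE, {"clinical_text": breast_note})
colon_result = run_pipeline(PIPELINE, {"clinical_text": colon_note})
```

In this sketch the biomarker step runs only for the breast cancer note, so `colon_result` contains `cancer_type` but no receptor fields. Keeping the condition on the step rather than inside the extractor mirrors the conditional edge shown in the diagram.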

BeatProfiler

Open Source

An end-to-end machine learning platform for automated cardiac signal analysis. BeatProfiler takes raw contractile videos or calcium transient / field potential recordings from stem cell-derived cardiomyocytes, processes them through automated pipelines, and classifies disease phenotypes — turning hours of manual analysis into seconds.

500+ downloads 📄 IEEE Published

Processing Pipeline

Calcium Transient Signal

Plot axes: Amplitude (y), Time (x)