Research Engineer & Machine Learning Engineer
Research engineer focused on deep learning and scalable ML infrastructure. I'm drawn to the unsolved parts of the field, where the science is still being written and the engineering hasn't caught up yet, and to building the systems that make those ideas real.
Skills
Experience
- Currently researching YORO, a novel LLM architecture that runs the main reasoning block once and reuses its output across all generated tokens, replacing repeated full-model passes with a lightweight auxiliary network
- Built a pipelined layer-streaming system that enables full LLM inference at a fraction of the model's VRAM footprint by running disk, CPU, and GPU transfers concurrently
- Developed a runtime correctness checker for CUDA kernels that uses outlier-biased sampling and has zero impact on the training graph
- Implemented a GPU-accelerated deep learning framework in Mojo with custom autograd and explicitly written GPU kernels
- Built a distributed LLM inference system that routes OpenAI-compatible requests across llama.cpp nodes, with automatic model distribution and mutual TLS for secure communication
- Built agentic AI pipelines end-to-end, including monitoring and observability
- Played a major role in the company's agentic AI transition, contributing to technical direction and architecture decisions
- Implemented semantic indexing and retrieval systems based on embedding models
- Actively consulted on AI and data science decisions, shaping how the team approached and integrated intelligent systems
- Handled massive data volumes across distributed systems, ensuring data quality, feature reliability, and pipeline integrity that models directly depended on
- Actively consulted the ML team on causality analysis algorithms and contributed technical input throughout the model design and development process
- Managed complex DevOps workflows and large-scale deployment pipelines for global production systems
- Architected a scalable, production-grade system from scratch, owning the full pipeline from design through deployment
Education
Thesis:
Efficient Deep Single-Image Super-Resolution on Mobile Devices
A study of deep learning-based methods for efficient single-image super-resolution (SISR) on mobile devices across multiple upscaling factors (×2, ×3, ×4), optimizing first for PSNR and then for perceptual loss. Architectures were kept deliberately shallow to minimize inference time, with increased width compensating for the reduced depth and maximizing GPU parallelism.
Thesis:
Resource Streaming using a Peer-to-Peer Architecture
Proposed a decentralized P2P resource streaming framework with no imposed hierarchy, eliminating single points of failure and reducing privacy risk. The architecture uses locality-aware distributed hash tables (LDHTs) for per-node network state and a custom epidemic protocol for network-wide information propagation.
Certifications