About

I am currently a Senior Systems Software Engineer on the RAPIDS team at NVIDIA. Most of my work focuses on the development of parallel algorithms for distributed IO and ETL, especially on systems with multiple GPUs. I contribute heavily to both Dask-DataFrame and Dask-CuDF, where I have been responsible for many improvements to the IO, shuffling, and join APIs. I also personally implemented the Multi-GPU NVTabular proof of concept (POC) introduced at GTC 2020, and I am now spearheading the ongoing development of parallel feature-engineering operations for recommender-system applications.

Most of my recent software contributions can be found in the following libraries:

  • Dask: A flexible parallel-computing library for Python (repo)
  • CuDF & Dask-CuDF: A Pandas-like API for GPU-accelerated DataFrames (repo)
  • NVTabular: A feature engineering and preprocessing library for tabular data (repo, blog)
  • PyNVML: A Python interface to GPU management and monitoring utilities (repo)
  • NVDashboard: A JupyterLab extension for displaying GPU dashboards (repo, blog)

Previous (Research) Experience

Before joining NVIDIA, I worked in the ALCF Data Science group at Argonne National Laboratory (January 2018 - April 2019). My research focused on the development of parallel I/O algorithms and software for extreme-scale computing. Some notable projects include:

Before joining Argonne, I was a postdoctoral researcher (and then staff scientist) in the Theoretical Division at Los Alamos National Laboratory. During my postdoctoral appointment in the Physics and Chemistry of Materials Group, I worked under the supervision of Danny Perez and Arthur Voter (the pioneer of the accelerated molecular dynamics (AMD) approach to long-timescale simulations of interacting atoms). Through my wonderful experience at LANL, I had the opportunity to build unique expertise in the development of parallel methodologies for high-fidelity materials simulation. Some notable work includes:

  • Parallel Methods for Accelerated Molecular Dynamics (AMD)
    • SpecTAD: Speculatively-parallel temperature-accelerated dynamics (repo)
    • Initial SpecTAD design/development guided by discrete-event simulation (paper)
    • TAD/SpecTAD Review Chapter
    • TAD/SpecTAD/ParTAD Algorithms paper
    • Parallel AMD Review Chapter
  • EXAALT: Exascale Atomistics for Accuracy, Length & Time (repo)
  • Performance Prediction Toolkit (PPT) (repo)

Publicly accessible repositories: GitHub, GitLab, xGitLab