Projects

These are some of the open-source projects I work on outside of research. Some of the topics I am interested in are: programming languages for machine learning, tools for learning and teaching data science, NLP visualization and data exploration, and efficient code for non-matrix-based algorithms.

Llama2 Rust
Llama2 in Rust
github
LLM Training Puzzles
Puzzles for learning about distributed training
github
Thinking Like Transformers
Learn to think like a transformer
github
GPU-Puzzles
A series of puzzles for learning about the core aspects of modern deep learning coding. Includes puzzles for tensors, GPUs, and auto-differentiation.
github
Annotated S4
Annotated S4 is a pedagogical implementation of the S4 model for very long-range sequence modeling, using JAX as a way to explain mathematically complex code.
github
PromptSource
PromptSource is an IDE for producing natural language prompts on real datasets. It was the basis of the T0 model for large-scale multitask training.
github
Break Through AI
Break Through AI is a free summer program that supports female undergraduates in learning AI and ML skills in an applied environment. I teach an 8-week summer program on the core elements of ML in a coding-first environment.
github
MiniConf
MiniConf is a project developed for ICLR as an easy-to-use tool for hosting fully remote asynchronous virtual conferences. It was heavily used in 2020 to host ACL, ICML, AKBC, AISTATS, EMNLP, NeurIPS, and many other virtual conferences.
github
MiniTorch
MiniTorch is a DIY teaching library that walks through the process of building a tensor and autodifferentiation library from scratch. It is used to teach machine learning engineering at Cornell Tech.
github
Streambook
Streambook is a literate programming environment designed to make it easy to write publishable Jupyter notebooks without ever having to open a browser or break your github flow.
github
Named Tensor Notation
Named Tensor Notation was a follow-up to the named tensor proposal to develop a mathematical notation for more explicit multi-dimensional dot products when describing neural network interactions.
github
NLP Browser
NLP Browser is a web app that lets anyone easily browse through more than 150 datasets used in NLP and hosted by Hugging Face. The app is a pretty addictive way to casually learn about new datasets and challenges.
github
NamedTensor (Tensor Considered Harmful)
Named Tensor is a proposal for adding a new data structure to mathematical libraries to treat tensors more like dicts and less like tuples. This blog post led PyTorch to add a NamedTensor annotation in v1.3 of the library.
github
Torch Struct
Torch-Struct is a passion project of mine to test whether deep learning libraries can be used to implement classical structured prediction. It includes heavily tested reference reimplementations of many core NLP algorithms.
github
OpenNMT
A full-service open-source neural machine translation system. Originally developed in Lua with Systran, it has since been ported to PyTorch and TensorFlow and is maintained externally.
github
The Annotated Transformer
The Annotated Transformer was an experiment in blogging based on literate papers. The idea was to teach researchers how an important model in NLP works by aligning the paper line-by-line with an implementation. The blog post was widely distributed, and there have been many follow-ups for new models.
github