Welcome to devinterp’s documentation!
DevInterp is a Python library for conducting research on developmental interpretability, which is a novel AI safety research agenda rooted in Singular Learning Theory (SLT). DevInterp proposes tools for detecting, locating, and ultimately controlling the development of structure over training.
Read more about developmental interpretability here!
For an overview of papers that either used or inspired this package, click here.
For questions, it’s easiest to join the DevInterp Discord!
Warning
This library is still in early development. Don’t expect everything to work on the first attempt. We are actively working on improving the library and adding new features.
Installation
To install devinterp, simply run:
pip install devinterp
Requirements: Python 3.8 or higher.
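To confirm that an environment meets the requirement above and that the package is visible to Python, a quick check like the following can help. It is a small sketch using only the standard library; `check_environment` is a hypothetical helper, not part of devinterp.

```python
import sys
from importlib import metadata


def check_environment(package="devinterp", min_python=(3, 8)):
    """Hypothetical helper: return (python_ok, installed_version_or_None)."""
    # Compare only (major, minor) against the documented minimum.
    python_ok = sys.version_info[:2] >= min_python
    try:
        version = metadata.version(package)
    except metadata.PackageNotFoundError:
        version = None  # package is not installed in this environment
    return python_ok, version


python_ok, version = check_environment()
print("Python >= 3.8:", python_ok)
print("devinterp version:", version or "not installed (run: pip install devinterp)")
```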
Minimal Example
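from devinterp.slt import sample, LLCEstimator
from devinterp.optim import SGLD
from devinterp.utils import optimal_temperature  # import location assumed; check your installed version
# Assuming you have a PyTorch Module (model) and DataLoader (trainloader)
llc_estimator = LLCEstimator(..., temperature=optimal_temperature(trainloader))
# Run the sampler; the estimator collects losses through the callback
sample(model, trainloader, ..., callbacks=[llc_estimator])
# Read off the mean LLC estimate
llc_mean = llc_estimator.sample()["llc/mean"]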
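To build intuition for what the estimator in the minimal example computes, here is a toy, dependency-free sketch. It is not devinterp's implementation: `sgld_llc` is a hypothetical helper that runs localized SGLD on a known 1-D loss instead of a model and DataLoader, `n` is a hypothetical dataset size, and we assume the scaled inverse temperature nβ = n / log n (that is, β = 1/log n) plays the role of the `temperature` argument. The estimate is nβ(E[L(w)] - L(w*)), averaged over post-burn-in samples.

```python
import math
import random


def sgld_llc(loss, grad, w_star, n=1000, gamma=1.0, step=1e-3,
             num_steps=20_000, burn_in=1_000, seed=0):
    """Hypothetical toy LLC estimate for a 1-D loss (not devinterp's code).

    Samples from exp(-n*beta*L(w) - gamma/2 * (w - w_star)**2) with SGLD,
    then returns n*beta * (mean sampled loss - loss(w_star)).
    """
    rng = random.Random(seed)
    n_beta = n / math.log(n)  # assumed temperature: n * beta with beta = 1 / log n
    w = w_star
    total, count = 0.0, 0
    for t in range(num_steps):
        # Drift: descend the tempered loss, plus a pull back towards w_star.
        drift = -n_beta * grad(w) - gamma * (w - w_star)
        # Langevin step: half-step along the drift plus Gaussian noise of variance `step`.
        w = w + 0.5 * step * drift + rng.gauss(0.0, math.sqrt(step))
        if t >= burn_in:
            total += loss(w)
            count += 1
    return n_beta * (total / count - loss(w_star))


# Regular minimum, L(w) = w^2 / 2: the true learning coefficient is 1/2.
regular = sgld_llc(lambda w: 0.5 * w * w, lambda w: w, w_star=0.0)
# Degenerate minimum, L(w) = w^4: the true learning coefficient is 1/4.
singular = sgld_llc(lambda w: w ** 4, lambda w: 4 * w ** 3, w_star=0.0)
print(f"regular ~ {regular:.2f}, singular ~ {singular:.2f}")
```

The regular minimum should give an estimate near 1/2 and the degenerate one an estimate near 1/4, illustrating that more singular minima have a smaller LLC. Exact values shift with the seed, step size, localization strength `gamma`, and number of steps, which is one reason hyperparameter calibration matters.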
Examples
To see DevInterp in action, check out our example notebooks:
For more advanced usage, see the Diagnostics notebook; for a quick guide to picking hyperparameters, see the Calibration notebook.
Known Issues
LLC estimation is currently more of an art than a science. Expect it to take some time (and some pain) to get it to work reliably.
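When an estimate looks off, a couple of cheap checks on the sampled loss traces can narrow things down. The sketch below is a heuristic of our own, not a devinterp API: `llc_sanity_checks` is hypothetical and assumes you have the loss at the initial weights plus one list of sampled losses per chain. A non-finite loss usually points to a step size that is too large, and a negative per-chain estimate means the sampler found lower loss than the initial weights, so those weights are not behaving like a local minimum under the current settings.

```python
import math
from statistics import mean


def llc_sanity_checks(init_loss, chain_losses, n_beta):
    """Hypothetical heuristic checks on per-chain sampled loss traces."""
    problems = []
    for i, trace in enumerate(chain_losses):
        if not all(math.isfinite(x) for x in trace):
            problems.append(f"chain {i}: non-finite loss (step size may be too large)")
            continue
        # Per-chain estimate: n*beta * (mean sampled loss - initial loss).
        llc = n_beta * (mean(trace) - init_loss)
        if llc < 0:
            problems.append(f"chain {i}: negative LLC estimate ({llc:.3f}); "
                            "sampler found lower loss than the initial weights")
    return problems


# Illustrative, made-up traces: chain 0 looks fine, chains 1 and 2 do not.
for msg in llc_sanity_checks(1.0, [[1.1, 1.2], [0.9, 0.8], [1.0, float("nan")]], n_beta=10.0):
    print(msg)
```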
If you run into issues not mentioned here, please first check the GitHub issues, then ask in the DevInterp Discord, and only then open a new GitHub issue.
Credits & Citations
This package was created by Timaeus. The main contributors to this package are Stan van Wingerden, Jesse Hoogland, and George Wang. Zach Furman, Matthew Farrugia-Roberts, William Zhou, Rohan Hitchcock and Edmund Lau also made valuable contributions or provided useful advice.
If this package was useful in your work, please cite it as:
@misc{devinterp2024,
title = {DevInterp},
author = {Stan van Wingerden and Jesse Hoogland and George Wang},
year = {2024},
howpublished = {\url{https://github.com/timaeus-research/devinterp}},
}
Table of Contents
- Welcome to devinterp’s documentation!
- SLT Observables
- Sampling Methods
- Utils & Visualisation Methods