## Overview

I am a PhD student at the University of Washington,
working with Carlos Guestrin
on machine learning problems.
I plan to defend my dissertation in October.

Before graduate school, I studied engineering at the University of Michigan.
There I worked on research projects with Clayton Scott.

In November, I will start a job at Apple!

## Publications

Training Deep Models Faster with Robust, Approximate Importance Sampling.

Tyler B. Johnson and Carlos Guestrin.

*NIPS*, 2018.

In theory, importance sampling speeds up stochastic gradient algorithms for supervised learning by prioritizing training instances. In practice, the cost of computing importances greatly limits the impact of importance sampling. We propose a robust, approximate importance sampling procedure (RAIS) for stochastic gradient descent. By approximating the ideal sampling distribution using robust optimization, RAIS provides much of the benefit of exact importance sampling with drastically reduced overhead. Empirically, we find RAIS-SGD and standard SGD follow similar learning curves, but RAIS moves faster through these paths, achieving speed-ups of at least 20% and sometimes more than 2x in our comparisons. RAIS requires only simple changes to SGD, and the procedure depends minimally on hyperparameters.

```bibtex
@inproceedings{johnson2018rais,
  author    = {Tyler B. Johnson and Carlos Guestrin},
  title     = {Training Deep Models Faster with Robust, Approximate Importance Sampling},
  booktitle = {Advances in Neural Information Processing Systems 31},
  year      = {2018}
}
```
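The core idea can be sketched in a few lines. This is a deliberately naive version, not RAIS itself: it recomputes the sampling distribution from exact residuals at every step, which is precisely the overhead RAIS avoids with its robust approximation. The toy problem and all names below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy least-squares problem: minimize (1/n) * sum_i (x_i @ w - y_i)^2.
n, d = 1000, 10
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d)
y = X @ w_star

w = np.zeros(d)
lr = 0.01

for step in range(2000):
    # Ideal importance weights are proportional to per-example gradient
    # norms. This sketch recomputes them exactly each step -- the exact
    # cost RAIS is designed to avoid.
    residuals = X @ w - y
    scores = np.abs(residuals) + 1e-8
    p = scores / scores.sum()

    i = rng.choice(n, p=p)
    grad = 2.0 * residuals[i] * X[i]
    # Dividing by n * p_i keeps the sampled gradient unbiased.
    w -= lr * grad / (n * p[i])

final_loss = float(np.mean((X @ w - y) ** 2))
```

Because examples with large residuals are both sampled more often and down-weighted accordingly, the gradient noise shrinks with the residuals, which is where the speed-up over uniform sampling comes from.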

A Fast, Principled Working Set Algorithm for Exploiting Piecewise Linear Structure in Convex Problems.

Tyler B. Johnson and Carlos Guestrin.

Preprint.

By reducing optimization to a sequence of smaller subproblems, working set algorithms achieve fast convergence times for many machine learning problems. Despite such performance, working set implementations often resort to heuristics to determine subproblem size, makeup, and stopping criteria. We propose BlitzWS, a working set algorithm with useful theoretical guarantees. Our theory relates subproblem size and stopping criteria to the amount of progress during each iteration. This result motivates strategies for optimizing algorithmic parameters and discarding irrelevant components as BlitzWS progresses toward a solution. BlitzWS applies to many convex problems, including training L1-regularized models and support vector machines. We showcase this versatility with empirical comparisons, which demonstrate BlitzWS is indeed a fast algorithm.

```bibtex
@misc{johnson2018blitzws,
  author       = {Tyler B. Johnson and Carlos Guestrin},
  title        = {A Fast, Principled Working Set Algorithm for Exploiting Piecewise Linear Structure in Convex Problems},
  howpublished = {arXiv:1807.08046},
  year         = {2018}
}
```
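To make the working set pattern concrete, here is a toy working-set loop for L1-regularized least squares. It uses a fixed expansion size and a fixed inner-iteration budget, which are exactly the heuristic choices BlitzWS replaces with theory-driven ones; the problem instance and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# L1-regularized least squares: minimize 0.5*||X w - y||^2 + lam*||w||_1.
n, d = 200, 100
X = rng.normal(size=(n, d))
w_star = np.zeros(d)
w_star[:5] = rng.normal(size=5)
y = X @ w_star
lam = 0.1 * np.max(np.abs(X.T @ y))

col_sq = (X ** 2).sum(axis=0)
w = np.zeros(d)
resid = y.copy()
working = set()

for outer in range(30):
    grad = X.T @ resid
    # Features violating the KKT condition |x_j^T resid| <= lam at w_j = 0
    # must eventually enter the working set.
    violating = np.where((np.abs(grad) > lam + 1e-6) & (w == 0))[0]
    if violating.size == 0:
        break
    # Fixed-size expansion by the worst violators (a heuristic; BlitzWS
    # derives subproblem size and stopping criteria from its theory).
    worst = violating[np.argsort(-np.abs(grad[violating]))]
    working.update(worst[:10].tolist())
    # Solve the restricted subproblem by coordinate descent.
    for _ in range(50):
        for j in sorted(working):
            r_j = resid + X[:, j] * w[j]   # residual with feature j removed
            z = X[:, j] @ r_j
            w[j] = np.sign(z) * max(abs(z) - lam, 0.0) / col_sq[j]
            resid = r_j - X[:, j] * w[j]

objective = 0.5 * float(resid @ resid) + lam * float(np.abs(w).sum())
```

Each subproblem only touches the working set's columns, so most of the compute goes to the few features that matter at the solution.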

StingyCD: Safely Avoiding Wasteful Updates in Coordinate Descent.

Tyler B. Johnson and Carlos Guestrin.

*ICML*, 2017.

Coordinate descent (CD) is a scalable and simple algorithm for solving many optimization problems in machine learning. Despite this fact, CD can also be very computationally wasteful. Due to sparsity in sparse regression problems, for example, often the majority of CD updates result in no progress toward the solution. To address this inefficiency, we propose a modified CD algorithm named "StingyCD." By skipping over many updates that are guaranteed not to decrease the objective value, StingyCD significantly reduces convergence times. Since StingyCD only skips updates with this guarantee, however, StingyCD does not fully exploit the problem's sparsity. For this reason, we also propose StingyCD+, an algorithm that achieves further speed-ups by skipping updates more aggressively. Since StingyCD and StingyCD+ rely on simple modifications to CD, it is also straightforward to use these algorithms with other approaches to scaling optimization. In empirical comparisons, StingyCD and StingyCD+ improve convergence times considerably for L1-regularized optimization problems.

```bibtex
@inproceedings{johnson2017stingycd,
  author    = {Tyler B. Johnson and Carlos Guestrin},
  title     = {StingyCD: Safely Avoiding Wasteful Updates in Coordinate Descent},
  booktitle = {Proceedings of the 34th International Conference on Machine Learning},
  year      = {2017}
}
```
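The skip idea can be illustrated with a simplified safe test. This is not StingyCD's actual condition: the sketch below bounds the current correlation via a reference residual and the Cauchy-Schwarz inequality. The spirit is the same, though: skip a zero coordinate, without touching its data column, whenever the update is provably a no-op.

```python
import numpy as np

rng = np.random.default_rng(1)

# Lasso with unit-norm columns: minimize 0.5*||X w - y||^2 + lam*||w||_1.
n, d = 200, 400
X = rng.normal(size=(n, d))
X /= np.linalg.norm(X, axis=0)
w_star = np.zeros(d)
w_star[:4] = 1.0
y = X @ w_star
lam = 0.5 * np.max(np.abs(X.T @ y))

w = np.zeros(d)
resid = y.copy()
skipped = computed = 0

for epoch in range(30):
    # Reference correlations for the cheap skip test, refreshed each epoch.
    corr_ref = X.T @ resid
    drift_bound = 0.0
    for j in range(d):
        # Cauchy-Schwarz with unit-norm columns gives
        # |x_j^T resid| <= |x_j^T resid_ref| + ||resid - resid_ref||.
        # If this bound is <= lam and w_j = 0, soft-thresholding provably
        # leaves w_j at zero, so the O(n) dot product can be skipped.
        if w[j] == 0.0 and abs(corr_ref[j]) + drift_bound <= lam:
            skipped += 1
            continue
        computed += 1
        xj = X[:, j]
        rho = xj @ resid + w[j]              # valid because ||x_j|| = 1
        wj_new = np.sign(rho) * max(abs(rho) - lam, 0.0)
        resid += xj * (w[j] - wj_new)
        drift_bound += abs(w[j] - wj_new)    # triangle inequality keeps it valid
        w[j] = wj_new
```

Since each skip replaces an O(n) inner product with an O(1) comparison, the savings grow with how sparse the solution is.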

Unified Methods for Exploiting Piecewise Linear Structure in Convex Optimization.

Tyler B. Johnson and Carlos Guestrin.

*NIPS*, 2016.

We develop methods for rapidly identifying important components of a convex optimization problem for the purpose of achieving fast convergence times. By considering a novel problem formulation—the minimization of a sum of piecewise functions—we describe a principled and general mechanism for exploiting piecewise linear structure in convex optimization. This result leads to a theoretically justified working set algorithm and a novel screening test, which generalize and improve upon many prior results on exploiting structure in convex optimization. In empirical comparisons, we study the scalability of our methods. We find that screening scales surprisingly poorly with the size of the problem, while our working set algorithm convincingly outperforms alternative approaches.

```bibtex
@inproceedings{johnson2016piecewise,
  author    = {Tyler B. Johnson and Carlos Guestrin},
  title     = {Unified Methods for Exploiting Piecewise Linear Structure in Convex Optimization},
  booktitle = {Advances in Neural Information Processing Systems 29},
  year      = {2016}
}
```
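For a concrete picture of what a screening test does, here is the standard gap-safe sphere test for the lasso (not necessarily the exact test from this paper): a dual-feasible point plus the duality gap certify that some coordinates are zero at the optimum, so those features can be discarded permanently. The problem instance and names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Lasso: minimize 0.5*||X w - y||^2 + lam*||w||_1.
n, d = 100, 300
X = rng.normal(size=(n, d))
w_star = np.zeros(d)
w_star[:3] = 2.0
y = X @ w_star
lam = 0.3 * np.max(np.abs(X.T @ y))

col_sq = (X ** 2).sum(axis=0)
col_norm = np.sqrt(col_sq)
w = np.zeros(d)
active = np.ones(d, dtype=bool)

for epoch in range(40):
    resid = y - X @ w
    # Dual-feasible point from the scaled residual; the duality gap then
    # yields a ball guaranteed to contain the dual optimum.
    theta = resid / max(lam, np.max(np.abs(X.T @ resid)))
    primal = 0.5 * resid @ resid + lam * np.abs(w).sum()
    dual = 0.5 * y @ y - 0.5 * lam**2 * np.sum((theta - y / lam) ** 2)
    radius = np.sqrt(2.0 * max(primal - dual, 0.0)) / lam
    # Sphere test: if |x_j^T theta| + radius*||x_j|| < 1, then w_j = 0 at
    # the optimum, so feature j can be discarded for good.
    active &= (np.abs(X.T @ theta) + radius * col_norm >= 1.0) | (w != 0)
    # One coordinate-descent sweep over the surviving features.
    for j in np.where(active)[0]:
        xj = X[:, j]
        z = xj @ resid + col_sq[j] * w[j]
        wj_new = np.sign(z) * max(abs(z) - lam, 0.0) / col_sq[j]
        resid += xj * (w[j] - wj_new)
        w[j] = wj_new

n_active = int(active.sum())
```

The test is cheap relative to a full sweep, but note it relies on the duality gap shrinking first, which is one reason screening can help less than working sets on large problems.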

Blitz: A Principled Meta-Algorithm for Scaling Sparse Optimization.

Tyler B. Johnson and Carlos Guestrin.

*ICML*, 2015.

By reducing optimization to a sequence of small subproblems, working set methods achieve fast convergence times for many challenging problems. Despite excellent performance, theoretical understanding of working sets is limited, and implementations often resort to heuristics to determine subproblem size, makeup, and stopping criteria. We propose Blitz, a fast working set algorithm accompanied by useful guarantees. Making no assumptions on data, our theory relates subproblem size to progress toward convergence. This result motivates methods for optimizing algorithmic parameters and discarding irrelevant variables as iterations progress. Applied to L1-regularized learning, Blitz convincingly outperforms existing solvers in sequential, limited-memory, and distributed settings. Blitz is not specific to L1-regularized learning, making the algorithm relevant to many applications involving sparsity or constraints.

```bibtex
@inproceedings{johnson2015blitz,
  author    = {Tyler B. Johnson and Carlos Guestrin},
  title     = {Blitz: A Principled Meta-Algorithm for Scaling Sparse Optimization},
  booktitle = {Proceedings of the 32nd International Conference on Machine Learning},
  year      = {2015}
}
```