Tyler
Johnson

Graduate Student
Machine Learning

318 Allen Center
University of Washington

Overview

I am a PhD student at the University of Washington, working with Carlos Guestrin on machine learning problems. I plan to graduate at the end of summer.

I research optimization algorithms for scalable machine learning. My work focuses on practical but principled ways to train models faster by exploiting the problem's structure.

For my most recent project, I accelerated the training of deep models by adaptively prioritizing challenging training instances. Before that, I designed a fast and elegant algorithm that exploits the sparsity of solutions to sparse optimization problems. You may find my BlitzML solver useful for Lasso-type and linear SVM problems.

During graduate school, I also completed industry internships. At Amazon, I prototyped a tool for broad use within the company; the project moved into production after my internship ended. At GM, near my hometown in Michigan, I worked on ideas for autonomous vehicles.

Before graduate school, I studied engineering at the University of Michigan. There I worked on some research projects with Clayton Scott.

Publications

Training Deep Models Faster with Robust, Approximate Importance Sampling.

Tyler B. Johnson and Carlos Guestrin.

In preparation.

In theory, importance sampling speeds up stochastic gradient algorithms for supervised learning by prioritizing training instances. In practice, the cost of computing importances greatly limits the impact of importance sampling. We propose a robust, approximate importance sampling procedure (RAIS) for stochastic gradient descent. By approximating the ideal sampling distribution using robust optimization, RAIS provides much of the benefit of exact importance sampling with drastically reduced overhead. Empirically, we find RAIS-SGD and standard SGD follow similar learning curves, but RAIS moves faster through these paths, achieving speed-ups of at least 20% and as much as 2x in our comparisons. RAIS requires only simple changes to SGD, and the procedure depends minimally on hyperparameters.
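
The idea behind this line of work can be illustrated with a short sketch. The code below is not the RAIS procedure from the paper; it is a minimal importance-sampled SGD loop on a least-squares problem, using stale per-example gradient norms as a cheap stand-in for the importance estimates and mixing the sampling distribution with uniform so the reweighting stays bounded.

import numpy as np

# Minimal sketch of importance-sampled SGD on least squares (not the paper's
# RAIS algorithm). Stale per-example gradient norms serve as rough importance
# estimates, and the distribution is mixed with uniform to keep weights bounded.
rng = np.random.default_rng(0)
n, d = 1000, 20
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)

w = np.zeros(d)
importance = np.ones(n)                        # stale gradient-norm estimates
step_size = 0.01

for t in range(5000):
    p = importance / importance.sum()
    p = 0.5 * p + 0.5 / n                      # mix with uniform for robustness
    i = rng.choice(n, p=p)
    grad_i = (X[i] @ w - y[i]) * X[i]          # gradient of 0.5 * (x_i^T w - y_i)^2
    w -= step_size * grad_i / (n * p[i])       # reweight so the update stays unbiased
    importance[i] = np.linalg.norm(grad_i) + 1e-8

print(f"final mean squared error: {np.mean((X @ w - y) ** 2):.4f}")

The reweighting by 1/(n * p_i) keeps each sampled gradient an unbiased estimate of the full gradient, which is what lets a skewed sampling distribution reduce variance without changing what SGD converges to.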

A Fast, Principled Working Set Algorithm for Exploiting Piecewise Linear Structure in Convex Optimization.

Tyler B. Johnson and Carlos Guestrin.

In preparation.

StingyCD: Safely Avoiding Wasteful Updates in Coordinate Descent.

Tyler B. Johnson and Carlos Guestrin.

International Conference on Machine Learning, 2017.

Coordinate descent (CD) is a scalable and simple algorithm for solving many optimization problems in machine learning. Despite this fact, CD can also be very computationally wasteful. In sparse regression problems, for example, the majority of CD updates often result in no progress toward the solution. To address this inefficiency, we propose a modified CD algorithm named "StingyCD." By skipping over many updates that are guaranteed not to decrease the objective value, StingyCD significantly reduces convergence times. Since StingyCD only skips updates with this guarantee, however, StingyCD does not fully exploit the problem's sparsity. For this reason, we also propose StingyCD+, an algorithm that achieves further speed-ups by skipping updates more aggressively. Since StingyCD and StingyCD+ rely on simple modifications to CD, it is also straightforward to use these algorithms with other approaches to scaling optimization. In empirical comparisons, StingyCD and StingyCD+ improve convergence times considerably for L1-regularized optimization problems.
@inproceedings{johnson2017stingycd,
author = {Tyler B. Johnson and Carlos Guestrin},
title = {StingyCD: Safely Avoiding Wasteful Updates in Coordinate Descent},
booktitle = {International Conference on Machine Learning},
year = {2017}
}
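
To make the source of waste concrete, here is a toy coordinate descent loop for the Lasso that simply counts how many updates compute a full inner product only to leave a zero coordinate at zero. This is plain CD, not StingyCD; StingyCD's contribution is a cheap, safe test that skips such updates before doing the work.

import numpy as np

# Toy coordinate descent for the Lasso: min_w 0.5*||X w - y||^2 + lam*||w||_1.
# Plain CD, not StingyCD; it just counts how many updates leave a zero
# coordinate at zero, which is the waste StingyCD's safe test avoids.
rng = np.random.default_rng(0)
n, d = 200, 500
X = rng.normal(size=(n, d))
w_true = np.zeros(d)
w_true[:10] = rng.normal(size=10)
y = X @ w_true + 0.1 * rng.normal(size=n)

lam = 0.1 * np.max(np.abs(X.T @ y))
col_sq = (X ** 2).sum(axis=0)

w = np.zeros(d)
residual = y - X @ w
wasted = total = 0

for epoch in range(10):
    for j in range(d):
        old = w[j]
        rho = X[:, j] @ residual + col_sq[j] * old
        new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]   # soft threshold
        total += 1
        if old == 0.0 and new == 0.0:
            wasted += 1            # full update computed, iterate unchanged
        if new != old:
            residual += X[:, j] * (old - new)
            w[j] = new

print(f"{wasted}/{total} updates left a zero coordinate at zero")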

Unified Methods for Exploiting Piecewise Linear Structure in Convex Optimization.

Tyler B. Johnson and Carlos Guestrin.

Advances in Neural Information Processing Systems 29, 2016.

We develop methods for rapidly identifying important components of a convex optimization problem for the purpose of achieving fast convergence times. By considering a novel problem formulation—the minimization of a sum of piecewise functions—we describe a principled and general mechanism for exploiting piecewise linear structure in convex optimization. This result leads to a theoretically justified working set algorithm and a novel screening test, which generalize and improve upon many prior results on exploiting structure in convex optimization. In empirical comparisons, we study the scalability of our methods. We find that screening scales surprisingly poorly with the size of the problem, while our working set algorithm convincingly outperforms alternative approaches.
@inproceedings{johnson2016piecewise,
author = {Tyler B. Johnson and Carlos Guestrin},
title = {Unified Methods for Exploiting Piecewise Linear Structure in Convex Optimization},
booktitle = {Advances in Neural Information Processing Systems 29},
year = {2016}
}
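
For a sense of what a screening test does, the sketch below applies the standard gap-safe rule for the Lasso from the prior literature: any feature whose bound stays below one is provably zero at the optimum and can be discarded. This is not the screening test proposed in the paper, and the working-set side of the paper is sketched under the Blitz entry below.

import numpy as np

# Gap-safe-style screening for the Lasso, min_w 0.5*||X w - y||^2 + lam*||w||_1.
# A standard rule from the screening literature, shown only as an illustration.
def gap_safe_screen(X, y, w, lam):
    residual = y - X @ w
    # Rescale the residual so the dual point is feasible: |x_j^T theta| <= 1.
    theta = residual / max(lam, np.max(np.abs(X.T @ residual)))
    primal = 0.5 * residual @ residual + lam * np.abs(w).sum()
    dual = 0.5 * y @ y - 0.5 * lam ** 2 * np.sum((theta - y / lam) ** 2)
    radius = np.sqrt(2 * max(primal - dual, 0.0)) / lam
    col_norms = np.linalg.norm(X, axis=0)
    # Feature j is provably zero at the optimum if this bound stays below 1.
    return np.abs(X.T @ theta) + col_norms * radius < 1.0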

Blitz: A Principled Meta-Algorithm for Scaling Sparse Optimization.

Tyler B. Johnson and Carlos Guestrin.

International Conference on Machine Learning, 2015.

By reducing optimization to a sequence of small subproblems, working set methods achieve fast convergence times for many challenging problems. Despite excellent performance, theoretical understanding of working sets is limited, and implementations often resort to heuristics to determine subproblem size, makeup, and stopping criteria. We propose Blitz, a fast working set algorithm accompanied by useful guarantees. Making no assumptions on data, our theory relates subproblem size to progress toward convergence. This result motivates methods for optimizing algorithmic parameters and discarding irrelevant variables as iterations progress. Applied to L1-regularized learning, Blitz convincingly outperforms existing solvers in sequential, limited-memory, and distributed settings. Blitz is not specific to L1-regularized learning, making the algorithm relevant to many applications involving sparsity or constraints.
@inproceedings{johnson2015blitz,
author = {Tyler B. Johnson and Carlos Guestrin},
title = {Blitz: A Principled Meta-Algorithm for Scaling Sparse Optimization},
booktitle = {International Conference on Machine Learning},
year = {2015}
}
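
The working-set pattern that Blitz builds on can be sketched in a few lines for the Lasso: pick the features that most violate the optimality conditions, solve the problem restricted to that small set, and repeat. The subproblem sizing, prioritization, and guarantees that distinguish Blitz are not reproduced here; the subproblems below are solved with scikit-learn's Lasso, whose penalty uses a 1/(2n) scaling, hence alpha = lam / n.

import numpy as np
from sklearn.linear_model import Lasso

# Toy working set loop for the Lasso (not Blitz itself).
def working_set_lasso(X, y, lam, ws_size=10, outer_iters=20, tol=1e-6):
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(outer_iters):
        # Zero-valued features are optimal only if |x_j^T residual| <= lam.
        violation = np.abs(X.T @ (y - X @ w)) - lam
        if np.all(violation[w == 0.0] <= tol):
            break
        # Working set = currently active features plus the worst violators.
        candidates = np.where(w == 0.0)[0]
        worst = candidates[np.argsort(violation[candidates])[-ws_size:]]
        working_set = np.union1d(np.flatnonzero(w), worst)
        # Solve the Lasso restricted to the working set (other weights stay zero).
        sub = Lasso(alpha=lam / n, fit_intercept=False).fit(X[:, working_set], y)
        w[:] = 0.0
        w[working_set] = sub.coef_
    return w

# Example: w_hat = working_set_lasso(X, y, lam=0.1 * np.max(np.abs(X.T @ y)))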

Software

Use my BlitzML package to quickly solve Lasso, sparse logistic regression, and linear SVM problems.
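
A minimal usage sketch is below. The class and method names are assumptions for illustration, not the documented BlitzML interface; see the package documentation for the actual API.

import numpy as np
import blitzml   # assumed import name for the package

# Hypothetical Lasso example; the names below are assumptions, not the
# documented BlitzML API.
A = np.random.randn(200, 1000)
b = np.random.randn(200)

problem = blitzml.LassoProblem(A, b)   # assumed constructor
solution = problem.solve(10.0)         # assumed: solve at a given L1 penalty
print(solution.objective_value)        # assumed attribute of the returned solution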

Teaching

I have enjoyed helping teach the following courses: