Index
Why do we need Machine Learning for Systems?
(Paraphrased from Jeff Dean's keynote talk at SysML) Traditional low-level systems (e.g., operating systems, compilers, and storage systems) do not make extensive use of machine learning today. However, computer systems are filled with heuristics that have to work well "in the general case", so they generally don't adapt to actual usage patterns and don't take available context into account.
For example, when BigTable receives a request to load data from disk, it needs to decide whether or not to cache a particular block. If the client is doing a sequential scan of the data, it might never re-access that block, but if it is doing random access, it will likely access the same block again in the future. Jobs like MapReduce, for instance, are very likely to do a sequential scan. We cannot hard-code this type of information into the heuristics, but a learned system might actually take this information (e.g., the job name and the user) into account.
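To make the caching example concrete, here is a minimal sketch of a learned admission policy. It is not BigTable's implementation: the class `LearnedAdmissionPolicy`, its methods, the smoothing priors, the 0.5 threshold, and the job and user names are all hypothetical, chosen only for illustration. The policy keeps a smoothed estimate, per (job name, user) context, of how often a loaded block is accessed again, and caches a block only when that estimate is high enough.

```python
from collections import defaultdict


class LearnedAdmissionPolicy:
    """Hypothetical policy: estimates block reuse per (job_name, user) context."""

    def __init__(self, threshold=0.5, prior_reuses=1.0, prior_loads=2.0):
        self.threshold = threshold
        # Smoothing priors (an assumption): unseen contexts start at 0.5.
        self.prior_reuses = prior_reuses
        self.prior_loads = prior_loads
        self.loads = defaultdict(float)   # blocks loaded per context
        self.reuses = defaultdict(float)  # loaded blocks that were re-accessed

    def reuse_probability(self, job_name, user):
        key = (job_name, user)
        return (self.reuses[key] + self.prior_reuses) / (
            self.loads[key] + self.prior_loads
        )

    def should_cache(self, job_name, user):
        # Admit the block only if it is likely to be accessed again.
        return self.reuse_probability(job_name, user) >= self.threshold

    def observe(self, job_name, user, was_reused):
        # Online update: called once we know whether the block was re-accessed.
        key = (job_name, user)
        self.loads[key] += 1
        if was_reused:
            self.reuses[key] += 1


policy = LearnedAdmissionPolicy()
for _ in range(20):
    # Made-up feedback: scans never reuse blocks, point lookups always do.
    policy.observe("mapreduce-scan", "batch", was_reused=False)
    policy.observe("point-lookup", "frontend", was_reused=True)

print(policy.should_cache("mapreduce-scan", "batch"))
print(policy.should_cache("point-lookup", "frontend"))
```

With this made-up feedback, the scan-heavy context falls well below the threshold and the lookup-heavy context rises well above it. A real system would use richer features and a real model, but the shape of the interface (predict, act, observe the outcome) would likely stay similar.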
In general, anywhere we use a heuristic to make a decision is an opportunity to use machine learning instead, in an online manner.
Compilers: instruction scheduling, register allocation, loop nest parallelization strategies...
Networking: TCP window size decisions, back-off for retransmits, data compression...
Operating Systems: process scheduling, buffer cache insertion/replacement, file system prefetching...
Job scheduling systems: which tasks/VMs to co-locate on the same machine, which tasks to pre-empt...
ASIC design: physical circuit layout, test case selection.
And anywhere we have a huge number of tunable command-line flags!
Keys for success in these settings:
Having a numeric metric to measure and optimize
Having a clean interface to easily integrate learning into all these kinds of systems.
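The two keys above can be sketched for the "tunable flag" case. This is only an illustration under stated assumptions: `FlagTuner`, its `suggest`/`report`/`best` methods, the candidate buffer sizes, and the synthetic `measured_throughput` function are all hypothetical, and epsilon-greedy is just one simple online-learning choice. The numeric metric is the value passed to `report`, and the clean interface is the suggest/report pair that a system could call around any existing flag.

```python
import random


class FlagTuner:
    """Hypothetical epsilon-greedy tuner over candidate values of one flag."""

    def __init__(self, candidates, epsilon=0.1, seed=0):
        self.candidates = list(candidates)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {c: 0 for c in self.candidates}
        self.mean_metric = {c: 0.0 for c in self.candidates}

    def suggest(self):
        # Try every candidate once before exploiting.
        untried = [c for c in self.candidates if self.counts[c] == 0]
        if untried:
            return untried[0]
        # Explore with probability epsilon, otherwise use the best so far.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.candidates)
        return self.best()

    def report(self, value, metric):
        # Incremental running mean of the metric observed for this value.
        self.counts[value] += 1
        self.mean_metric[value] += (metric - self.mean_metric[value]) / self.counts[value]

    def best(self):
        return max(self.candidates, key=lambda c: self.mean_metric[c])


def measured_throughput(buffer_kb, rng):
    # Synthetic stand-in for a real measurement: peaks at 64 KB, plus noise.
    return 100.0 - abs(buffer_kb - 64) * 0.5 + rng.gauss(0, 1.0)


tuner = FlagTuner(candidates=[16, 32, 64, 128, 256], epsilon=0.2, seed=0)
noise = random.Random(1)
for _ in range(500):
    value = tuner.suggest()
    tuner.report(value, measured_throughput(value, noise))

print(tuner.best())
```

Because the synthetic metric peaks at 64, the tuner settles on that value. In a real deployment the metric would come from production measurements, and the hard part is usually choosing a metric that actually reflects what we want to optimize.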
Database
The Case for Learned Index Structures* - Kraska et al., SIGMOD '18
SageDB: A Learned Database System* - Kraska et al., CIDR '19
Networking
Neural Adaptive Video Streaming with Pensieve - Mao et al., SIGCOMM '17
PCC Vivace: Online-Learning Congestion Control - Dong et al., NSDI '18
Neural Adaptive Content-aware Internet Video Delivery - Yeo et al., OSDI '18
Learning in situ: a randomized experiment in video streaming - Yan et al., NSDI '20
Neural-Enhanced Live Streaming: Improving Live Video Ingest via Online Learning - Kim et al., SIGCOMM '20
Interpreting Deep Learning-Based Networking Systems - Meng et al., SIGCOMM '20
Alohamora: Reviving HTTP/2 Push and Preload by Adapting Policies On the Fly - Kansal et al., NSDI '21
System
Device Placement Optimization with Reinforcement Learning - Mirhoseini et al., ICML '17
Learning Memory Access Patterns* - Hashemi et al., ICML '18
Parity Models: Erasure-Coded Resilience for Prediction Serving Systems - Kosaian et al., SOSP '19
Learning Scheduling Algorithms for Data Processing Clusters - Mao et al., SIGCOMM '19
Learning Relaxed Belady for Content Distribution Network Caching - Song et al., NSDI '20
Learning-based Memory Allocation for C++ Server Workloads - Maas et al., ASPLOS '20
LinnOS: Predictability on Unpredictable Flash Storage with a Light Neural Network - Hao et al., OSDI '20
*denotes papers that I plan to read