Index

Last updated 3 years ago

Datacenter Networks

Architecture

  • A scalable, commodity, data center network architecture - Al-Fares et al., SIGCOMM '08

RDMA

  • Anuj Kalia's PhD thesis provides a good overview of emerging network hardware/technology

  • Using One-Sided RDMA Reads to Build a Fast, CPU-Efficient Key-Value Store - Mitchell et al., ATC '13

  • FaRM: Fast Remote Memory - Dragojević et al., NSDI '14

  • Design Guidelines for High Performance RDMA Systems - Kalia et al., ATC '16

  • Revisiting Network Support for RDMA - Mittal et al., SIGCOMM '18

  • Datacenter RPCs can be General and Fast - Kalia et al., NSDI '19

  • FreeFlow: Software-based Virtual RDMA Networking for Containerized Clouds - Kim et al., NSDI '19

Kernel

  • netmap: a novel framework for fast packet I/O - Rizzo, ATC '12

  • IX: A protected dataplane operating system for high throughput and low latency - Belay et al., OSDI '14

  • Arrakis: the operating system is the control plane - Peter et al., OSDI '14

  • mTCP: a Highly Scalable User-level TCP Stack for Multicore Systems - Jeong et al., NSDI '14

  • Understanding Host Network Stack Overheads - Cai et al., SIGCOMM '21

Programmable Networks

  • IncBricks: Toward In-Network Computation with an In-Network Cache - Liu et al., ASPLOS '17

  • NetCache: Balancing key-value stores with fast in-network caching - Jin et al., SOSP '17

  • NetChain: Scale-Free Sub-RTT Coordination - Jin et al., NSDI '18

  • Offloading Distributed Applications onto SmartNICs using iPipe - Liu et al., SIGCOMM '19

  • E3: Energy-Efficient Microservices on SmartNIC-Accelerated Servers - Liu et al., ATC '19

  • When Should The Network Be The Computer? - Ports et al., HotOS '19

    • Discusses what in-network computing should be used for

  • Pegasus: Tolerating Skewed Workloads in Distributed Storage with In-Network Coherence Directories - OSDI '20

  • Gallium: Automated Software Middlebox Offloading to Programmable Switches - Zhang et al., SIGCOMM '20

    • Automatically partitions an input software middlebox into a P4 program that runs on a programmable switch and a non-offloaded x86 program that runs on a regular server

  • TEA: Enabling State-Intensive Network Functions on Programmable Switches - Kim et al., SIGCOMM '20

    • Allows NFs on programmable switches to look up large virtual tables built on external DRAM

    • Key ideas: RDMA + bounded linear probing

  • Scaling Distributed Machine Learning with In-Network Aggregation - Sapio et al., NSDI '21

  • ATP: In-network Aggregation for Multi-tenant Learning - Lao et al., NSDI '21

Wide Area Networks

Video Streaming

  • Improving Fairness, Efficiency, and Stability in HTTP-based Adaptive Video Streaming with FESTIVE - Jiang et al., CoNEXT '12

  • A Buffer-Based Approach to Rate Adaptation: Evidence from a Large Video Streaming Service - Huang et al., SIGCOMM '14

  • Neural Adaptive Video Streaming with Pensieve - Mao et al., SIGCOMM '17

  • Salsify: Low-Latency Network Video through Tighter Integration between a Video Codec and a Transport Protocol - Fouladi et al., NSDI '18

    • Proposes a tightly coupled codec and transport protocol

    • Exploits its codec's ability to save and restore its internal state

    • Has three options when sending the next frame: a lower-quality frame, a higher-quality frame, or skipping the frame

    • Never sends a frame unless the network is ready

  • Neural Adaptive Content-aware Internet Video Delivery - Yeo et al., OSDI '18

  • Vantage: optimizing video upload for time-shifted viewing of social live streams - Ray et al., SIGCOMM '19

    • Retransmits low-quality frames during high-bandwidth periods to improve QoE for delayed viewers

  • Learning in situ: a randomized experiment in video streaming - Yan et al., NSDI '20

  • Neural-Enhanced Live Streaming: Improving Live Video Ingest via Online Learning - Kim et al., SIGCOMM '20

Misc

  • The Design Philosophy of the DARPA Internet Protocols - Clark, SIGCOMM '88

  • The Road to SDN: An Intellectual History of Programmable Networks - Feamster et al., CCR '14