Index
- Cache the results to hide the network delivery and server processing latency
- Only send frames that are significantly different from previous frames
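A minimal sketch of how the two ideas above can combine on the client: keep the last server result cached and upload only frames that changed enough to invalidate it. The pixel-difference test, `DIFF_THRESHOLD`, and `FrameFilterClient` are illustrative assumptions, not taken from any particular paper.

```python
import numpy as np

# Mean absolute per-pixel difference above which a frame is considered "new enough" to upload.
DIFF_THRESHOLD = 10.0  # tuning is workload-specific

def mean_abs_diff(frame_a, frame_b):
    return float(np.mean(np.abs(frame_a.astype(np.float32) - frame_b.astype(np.float32))))

class FrameFilterClient:
    def __init__(self, send_to_server):
        self.send_to_server = send_to_server  # callable: frame -> analysis result
        self.last_sent_frame = None
        self.cached_result = None

    def process(self, frame):
        # First frame, or a frame that changed a lot: upload it and refresh the cache.
        if self.last_sent_frame is None or mean_abs_diff(frame, self.last_sent_frame) > DIFF_THRESHOLD:
            self.cached_result = self.send_to_server(frame)
            self.last_sent_frame = frame
        # Otherwise hide network/server latency by reusing the cached result.
        return self.cached_result

# Toy usage with a fake "server" that counts bright pixels.
fake_server = lambda f: int((f > 128).sum())
client = FrameFilterClient(fake_server)
frames = [np.zeros((4, 4), np.uint8), np.zeros((4, 4), np.uint8), np.full((4, 4), 255, np.uint8)]
print([client.process(f) for f in frames])  # the second frame reuses the cached result
```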
- Starfish: Efficient Concurrency Support for Computer Vision Applications - LiKamWa et al., MobiSys '15
- Track identical library calls and reuse computed results across multiple applications
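A minimal sketch of the call-deduplication idea above, assuming calls are keyed by function name plus a hash of their arguments; the shared cache, `dedup_call`, and the toy "library" function are illustrative stand-ins rather than Starfish's actual mechanism.

```python
import hashlib

shared_cache = {}  # (function name, argument digest) -> cached result, shared by all apps

def call_key(func_name, frame_bytes, params):
    digest = hashlib.sha1(frame_bytes + repr(sorted(params.items())).encode()).hexdigest()
    return (func_name, digest)

def dedup_call(func, func_name, frame_bytes, **params):
    key = call_key(func_name, frame_bytes, params)
    if key not in shared_cache:            # first app to issue this call computes the result
        shared_cache[key] = func(frame_bytes, **params)
    return shared_cache[key]               # identical calls from other apps reuse it

# Toy "library" call issued by two apps on the same frame.
extract_edges = lambda frame_bytes, threshold: f"edges(threshold={threshold})"
frame = b"\x00" * 100
a = dedup_call(extract_edges, "extract_edges", frame, threshold=30)  # computed
b = dedup_call(extract_edges, "extract_edges", frame, threshold=30)  # served from the cache
print(a == b, len(shared_cache))  # -> True 1
```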
- Group cameras monitoring the same area into clusters
- Uploads frames with high “utility” (e.g., object count)
- Uploads frames that are different from previous frames (e.g., different object counts)
- MCDNN: An Approximation-Based Execution Framework for Deep Stream Processing Under Resource Constraints - Han et al., MobiSys '16
- Adaptively pick the best specialized model
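A minimal sketch of "pick the best specialized model" as a constrained selection problem: choose the highest-accuracy variant from a catalog that fits the current memory and latency budgets. The catalog entries and budget numbers are made up for illustration, not taken from MCDNN.

```python
from dataclasses import dataclass

@dataclass
class ModelVariant:
    name: str
    accuracy: float    # expected accuracy of this (possibly specialized) variant
    memory_mb: float   # resident memory cost
    latency_ms: float  # per-frame latency

CATALOG = [
    ModelVariant("full",        0.92, 900.0, 120.0),
    ModelVariant("compressed",  0.88, 300.0,  60.0),
    ModelVariant("specialized", 0.90, 150.0,  30.0),  # specialized to the current context
]

def pick_variant(catalog, memory_budget_mb, latency_budget_ms):
    """Return the highest-accuracy variant that fits both budgets, or None."""
    feasible = [m for m in catalog
                if m.memory_mb <= memory_budget_mb and m.latency_ms <= latency_budget_ms]
    return max(feasible, key=lambda m: m.accuracy, default=None)

print(pick_variant(CATALOG, memory_budget_mb=400, latency_budget_ms=100))  # -> "specialized"
```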
- DeepEye: Resource Efficient Local Execution of Multiple Deep Vision Models using Wearable Commodity Hardware - Mathur et al., MobiSys '17
- A system that can run multiple cloud-scale DL models locally on wearable devices
- Interleave the loading of memory-intensive FC layers and the execution of compute-intensive convolution layers
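A minimal sketch of the interleaving idea above, assuming a background thread loads a model's memory-heavy FC weights while its compute-heavy convolution layers execute; all layer work is simulated with sleeps and the timings are invented.

```python
import time
from concurrent.futures import ThreadPoolExecutor

MODELS = ["face", "scene", "object"]

def load_fc_weights(model):
    time.sleep(0.3)   # memory-bound: read large FC weights from storage
    return f"{model}-fc-weights"

def run_conv_layers(model, frame):
    time.sleep(0.3)   # compute-bound: convolution layers
    return f"{model}-conv-features"

def run_fc_layers(features, fc_weights):
    time.sleep(0.05)
    return f"result({features}, {fc_weights})"

def run_all(frame):
    results = []
    with ThreadPoolExecutor(max_workers=1) as loader:
        for model in MODELS:
            fc_future = loader.submit(load_fc_weights, model)  # overlaps with conv execution
            features = run_conv_layers(model, frame)
            results.append(run_fc_layers(features, fc_future.result()))
    return results

start = time.time()
run_all(frame="frame-0")
print(f"{time.time() - start:.2f}s")  # roughly 3 x 0.35s instead of 3 x 0.65s
```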
- Leverage short-term class skew using a model cascade
- Train specialized models online on the target video
- Model cascade: difference detector (MSE) → cheap/specialized model → full model
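A minimal sketch of the three-stage cascade above (difference detector using MSE, then a cheap/specialized model, then the full model). The thresholds and the two model callables are illustrative assumptions.

```python
import numpy as np

MSE_THRESHOLD = 50.0                        # below this, the frame is "unchanged": reuse the last label
CONFIDENCE_LOW, CONFIDENCE_HIGH = 0.2, 0.8  # outside this band the cheap model is trusted

def mse(a, b):
    return float(np.mean((a.astype(np.float32) - b.astype(np.float32)) ** 2))

def cascade(frames, cheap_model, full_model):
    """cheap_model / full_model: frame -> probability that the target class is present."""
    last_frame, last_label = None, None
    labels = []
    for frame in frames:
        if last_frame is not None and mse(frame, last_frame) < MSE_THRESHOLD:
            labels.append(last_label)                         # stage 1: difference detector
            continue
        p = cheap_model(frame)                                # stage 2: cheap/specialized model
        if p <= CONFIDENCE_LOW or p >= CONFIDENCE_HIGH:
            last_label = p >= CONFIDENCE_HIGH
        else:
            last_label = full_model(frame) >= 0.5             # stage 3: fall back to the full model
        last_frame = frame
        labels.append(last_label)
    return labels

frames = [np.zeros((8, 8)), np.zeros((8, 8)), np.full((8, 8), 255.0)]
print(cascade(frames,
              cheap_model=lambda f: 0.95 if f.mean() > 100 else 0.05,
              full_model=lambda f: float(f.mean() > 100)))    # -> [False, False, True]
```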
- Objective: support efficient real-time analytics for multiple queries with different quality and lag goals
- Offline Phase: use a profiler to get a set of Pareto-optimal configurations (combinations of knobs) in the resource-quality space (with a variant of greedy hill climbing)
- Online Phase: periodically change running queries’ configurations/placement/resource allocation to maximize total utility (quality + lag goals)
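A minimal sketch of the offline phase above: profile candidate knob combinations and keep only the Pareto-optimal points in the resource-quality space. The stand-in profiler and the exhaustive enumeration here replace the paper's measured profiles and greedy hill climbing.

```python
from itertools import product

def profile(config):
    """Stand-in profiler: returns (resource_cost, quality) for one knob combination."""
    fps, resolution = config
    return fps * resolution / 100.0, 0.3 + 0.0005 * resolution + 0.005 * fps

def pareto_frontier(configs):
    points = [(cfg, *profile(cfg)) for cfg in configs]
    frontier = []
    for cfg, cost, quality in points:
        dominated = any(c2 <= cost and q2 >= quality and (c2, q2) != (cost, quality)
                        for _, c2, q2 in points)
        if not dominated:
            frontier.append((cfg, cost, quality))
    return sorted(frontier, key=lambda t: t[1])

knobs = {"fps": [1, 5, 30], "resolution": [240, 480, 1080]}
configs = list(product(knobs["fps"], knobs["resolution"]))
for cfg, cost, quality in pareto_frontier(configs):
    print(cfg, f"cost={cost:.1f}", f"quality={quality:.2f}")
```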
- Use an edge server as a cache with compute resources (similar to a CDN)
- Neurosurgeon: collaborative intelligence between the cloud and the mobile edge - Kang et al., ASPLOS '17 [Morning Paper Summary]
- Observed that (1) data transfer latency is often higher than mobile computation latency, especially on wireless networks, and (2) within a model, intermediate data size shrinks in the front-end layers whereas per-layer latency is higher in the back-end layers
- NOTE: (2) isn't necessarily true for recent networks with global average pooling
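These observations suggest picking a partition point that minimizes mobile compute up to the split, plus the transfer of that layer's output, plus cloud compute for the rest. A minimal sketch with invented per-layer numbers (not measurements from the paper):

```python
# (mobile_latency_ms, cloud_latency_ms, output_size_kb) per layer, front to back.
LAYERS = [
    (10.0, 1.0, 800.0),   # early conv layer: cheap to run, but large activation
    (25.0, 2.5, 400.0),
    (30.0, 3.0, 100.0),
    (40.0, 4.0,  20.0),   # back-end layer: heavier compute, tiny activation
]
INPUT_SIZE_KB = 1500.0
BANDWIDTH_KB_PER_MS = 50.0  # uplink bandwidth

def end_to_end_latency(split):
    """Latency if layers [0, split) run on the device and layers [split, N) run in the cloud."""
    mobile = sum(layer[0] for layer in LAYERS[:split])
    cloud = sum(layer[1] for layer in LAYERS[split:])
    transferred_kb = INPUT_SIZE_KB if split == 0 else LAYERS[split - 1][2]
    return mobile + transferred_kb / BANDWIDTH_KB_PER_MS + cloud

best_split = min(range(len(LAYERS) + 1), key=end_to_end_latency)
print(best_split, round(end_to_end_latency(best_split), 1))  # here the best split runs only the first layer on the device
```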
- Store videos as tables optimized for frame sampling on compressed videos
- Express frame operations as dataflow graphs
- Transfer learning → execute common layers only once
- Trade-off: processing more frames with the shared DNN vs. greater per-frame accuracy with a specialized DNN
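A minimal sketch of that sharing: run the frozen common layers once per frame and feed the result to each application's small specialized head. The "stem" and "heads" are random linear maps, purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
STEM = rng.standard_normal((3 * 32 * 32, 64))   # shared frozen layers ("common stem")
HEADS = {app: rng.standard_normal((64, 2)) for app in ["pedestrians", "cars", "bikes"]}

def shared_stem(frame):
    """Run the common layers once; every application reuses this feature vector."""
    return np.tanh(frame.reshape(-1) @ STEM)

def app_head(app, features):
    """Per-app specialized layers (cheap compared to the stem)."""
    return int((features @ HEADS[app]).argmax())

frame = rng.random((32, 32, 3)).astype(np.float32)
features = shared_stem(frame)                   # executed once per frame
print({app: app_head(app, features) for app in HEADS})
```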
- Resource-accuracy tradeoff is affected by some persistent characteristics, so we can reuse configurations over time → temporal correlation
- Video cameras with the same characteristics share the same best configurations → cross-camera correlations
- Configuration knobs independently impact accuracy → reduce search space
- Divide cameras into groups → periodically re-profile “leader” videos
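A minimal sketch of how the independence observation shrinks the search space: profile each knob separately while holding the others at a default, drop values that miss an accuracy target, and only search the cross product of the survivors. The knob values and the stand-in accuracy function are illustrative assumptions.

```python
from itertools import product

KNOBS = {
    "resolution": [240, 480, 720, 1080],
    "fps":        [1, 5, 10, 30],
    "detector":   ["tiny", "medium", "full"],
}
DEFAULTS = {"resolution": 1080, "fps": 30, "detector": "full"}
ACCURACY_TARGET = 0.8

def accuracy(cfg):
    """Stand-in profiler; a real system would run the pipeline on sample frames."""
    res = {240: 0.6, 480: 0.8, 720: 0.9, 1080: 1.0}[cfg["resolution"]]
    fps = {1: 0.5, 5: 0.85, 10: 0.95, 30: 1.0}[cfg["fps"]]
    det = {"tiny": 0.7, "medium": 0.9, "full": 1.0}[cfg["detector"]]
    return res * fps * det

def prune_knob(name):
    """Profile this knob alone (others at defaults) and keep only values meeting the target."""
    return [value for value in KNOBS[name]
            if accuracy(dict(DEFAULTS, **{name: value})) >= ACCURACY_TARGET]

pruned = {name: prune_knob(name) for name in KNOBS}
search_space = [dict(zip(pruned, values)) for values in product(*pruned.values())]
print(len(search_space), "configs to search instead of", len(list(product(*KNOBS.values()))))
```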
- Objective: low-latency and high-accuracy stream processing over the WAN
- Ask programmers to write degradation functions and profile those configurations
- Adaptively change the configuration at runtime → react to congestion
- Enable low-latency and low-cost querying over large historical video datasets.
- At ingest time: classify objects using a cheap CNN, cluster similar objects (kNN search), and index each cluster using the top-K most confident classification results.
- At query time: look up the ingest index for cluster centroids that match the queried class and classify them using an expensive CNN.
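A minimal sketch of that ingest/query split, with clustering reduced to exact feature matching and both classifiers replaced by toy stand-ins (the bullets above describe the real clustering and indexing).

```python
from collections import defaultdict

def ingest(objects, cheap_classifier, k=2):
    """objects: list of (object_id, feature). Returns (clusters, class -> cluster keys)."""
    clusters = defaultdict(list)   # feature -> object ids (toy "clustering" by exact match)
    index = defaultdict(set)       # class -> cluster keys
    for obj_id, feature in objects:
        clusters[feature].append(obj_id)
        for cls in cheap_classifier(feature)[:k]:   # top-K labels from the cheap CNN stand-in
            index[cls].add(feature)
    return clusters, index

def query(cls, clusters, index, expensive_classifier):
    """Return object ids whose cluster centroid the expensive model confirms as cls."""
    results = []
    for feature in index.get(cls, ()):
        if expensive_classifier(feature) == cls:
            results.extend(clusters[feature])
    return results

# Toy usage: features are strings; the cheap model over-approximates, the expensive one decides.
cheap = lambda f: ["car", "truck"] if "vehicle" in f else ["person", "bike"]
expensive = lambda f: "car" if f == "vehicle_small" else ("truck" if "vehicle" in f else "person")
clusters, index = ingest([(1, "vehicle_small"), (2, "vehicle_large"), (3, "pedestrian")], cheap)
print(query("car", clusters, index, expensive))  # -> [1]
```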
- On-Demand Deep Model Compression for Mobile Devices: A Usage-Driven Model Selection Framework - Liu et al., MobiSys '18
- Adaptively select DNN compression techniques based on user demand (accuracy / storage / computation cost / latency / energy)
- Novel straggler mitigation strategy
- Potluck: Cross-Application Approximate Deduplication for Computation-Intensive Mobile Applications - Guo et al., ASPLOS '18
- Objective: indexing and query optimization for VDMS (for complex queries like joins)
- A novel model for encoding, indexing and storing lineage
- Proposes a new “camera cluster” abstraction
- Saving computing resources
- Resource Pooling
- Improving analytics quality
- Hiding low-level intricacies
- Cracking open the DNN black-box: Video Analytics with DNNs across the Camera-Cloud Boundary - Emmons et al., HotEdgeVideo '19
- Split-brain inference
- Assumption: relevant events are rare.
- Filter frames using a micro binary classifier that takes feature maps from the base DNN as input
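A minimal sketch of that filtering step: a tiny binary "micro-classifier" scores feature maps produced by a shared base DNN and gates which frames are sent. The base network and classifier weights below are random stand-ins, not a trained model.

```python
import numpy as np

rng = np.random.default_rng(0)
W_BASE = rng.standard_normal((3, 8))   # fake "base DNN": maps 3 input channels to 8 feature channels
W_MICRO = rng.standard_normal(8)       # micro-classifier weights over pooled features
SEND_THRESHOLD = 0.5

def base_feature_map(frame):
    """frame: HxWx3 uint8 array -> HxWx8 feature map (stand-in for a real backbone)."""
    return (frame.astype(np.float32) / 255.0) @ W_BASE

def micro_classifier(feature_map):
    """Global-average-pool the feature map, then apply a logistic 'relevance' score."""
    pooled = feature_map.mean(axis=(0, 1))                    # shape (8,)
    return 1.0 / (1.0 + np.exp(-float(pooled @ W_MICRO)))

def should_send(frame):
    return micro_classifier(base_feature_map(frame)) >= SEND_THRESHOLD

frame = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
print(should_send(frame))
```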
- Down-sampling images is sometimes beneficial for accuracy (e.g., it can remove background noise)
- Adaptively scale video to improve both the speed and accuracy of object detectors
- Dynamic RoI Encoding: decrease the encoding quality of uninteresting areas (using the last processed frame as a heuristic)
- (Dependency-aware) parallel streaming and inference: divide frames into slices and parallelize processing across slices
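A minimal sketch of the dynamic RoI encoding idea: use the previous frame's detections to build a per-macroblock quality map, keeping likely-object regions at high quality and compressing the rest harder. The macroblock size and QP values are illustrative assumptions.

```python
import numpy as np

MB = 16                                   # macroblock size in pixels
QP_HIGH_QUALITY, QP_LOW_QUALITY = 20, 40  # lower QP = higher encoding quality

def roi_qp_map(frame_h, frame_w, last_detections):
    """last_detections: (x, y, w, h) boxes from the previously processed frame."""
    mb_rows, mb_cols = frame_h // MB, frame_w // MB
    qp = np.full((mb_rows, mb_cols), QP_LOW_QUALITY, dtype=np.int32)
    for x, y, w, h in last_detections:
        r0, r1 = y // MB, min(mb_rows, (y + h - 1) // MB + 1)
        c0, c1 = x // MB, min(mb_cols, (x + w - 1) // MB + 1)
        qp[r0:r1, c0:c1] = QP_HIGH_QUALITY   # keep likely-object macroblocks sharp
    return qp

qp = roi_qp_map(720, 1280, last_detections=[(100, 200, 64, 48)])
print(qp.shape, int((qp == QP_HIGH_QUALITY).sum()), "high-quality macroblocks")
```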
- Leverage cross-camera correlations to reduce resource usage and achieve higher inference accuracy
- An auto-generated benchmark that evaluates the performance of VDBMSs
- Lets users place an arbitrary number of cameras, each with configurable position, resolution, and field of view
- Composite queries and automatically generated ground truth labels
- Objective: support (approximate) aggregate and limit queries over large video datasets
- At ingest time, run object detection on a small sample of frames and store the results
- For each query, use the stored detections to train a query-specific proxy model
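A minimal sketch of answering an approximate aggregate (average objects per frame) with a query-specific proxy: fit a cheap linear proxy on the frames the detector labeled at ingest, predict on all frames, and correct with the gap observed on the labeled sample. The synthetic data and the simple difference estimator only loosely mirror the statistical machinery real systems use.

```python
import numpy as np

rng = np.random.default_rng(1)
N_FRAMES, N_LABELED = 10_000, 200

features = rng.normal(size=(N_FRAMES, 4))   # cheap per-frame features available for every frame
true_counts = np.clip(np.round(2 + features @ np.array([1.0, 0.5, 0.0, 0.0])
                               + rng.normal(scale=0.5, size=N_FRAMES)), 0, None)

# Ingest time: run the expensive detector only on a small random sample of frames.
labeled = rng.choice(N_FRAMES, size=N_LABELED, replace=False)
detector_counts = true_counts[labeled]      # stand-in for stored detector output

# Query time: fit a query-specific linear proxy on the labeled sample ...
X = np.column_stack([features[labeled], np.ones(N_LABELED)])
w, *_ = np.linalg.lstsq(X, detector_counts, rcond=None)
proxy_counts = np.column_stack([features, np.ones(N_FRAMES)]) @ w

# ... then answer the aggregate from proxy predictions over all frames, corrected by
# the detector-vs-proxy gap observed on the labeled sample (a simple difference estimator).
correction = np.mean(detector_counts - proxy_counts[labeled])
estimate = proxy_counts.mean() + correction
print(f"estimate={estimate:.3f}  ground_truth={true_counts.mean():.3f}")
```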
- Iterative video processing driven by server-side DNN
- Reducto: On-Camera Filtering for Resource-Efficient Real-Time Video Analytics - Li et al., SIGCOMM '20
- Dynamically adapts filtering decisions based on feature type, threshold, etc.
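A minimal sketch of on-camera filtering in the spirit of the bullet above: the camera computes a cheap per-frame difference feature and ships only frames whose change exceeds a threshold that the server can update over time. The pixel-fraction feature and the feedback hook are illustrative assumptions.

```python
import numpy as np

class OnCameraFilter:
    def __init__(self, threshold=0.05):
        self.threshold = threshold
        self.prev_frame = None

    def feature_diff(self, frame):
        """Cheap low-level feature: fraction of pixels that changed noticeably."""
        if self.prev_frame is None:
            return 1.0
        changed = np.abs(frame.astype(np.int16) - self.prev_frame.astype(np.int16)) > 20
        return float(changed.mean())

    def should_send(self, frame):
        send = self.feature_diff(frame) > self.threshold
        self.prev_frame = frame
        return send

    def update_threshold(self, new_threshold):
        """Called when the server re-profiles feature/threshold choices and pushes an update."""
        self.threshold = new_threshold

cam = OnCameraFilter()
frames = [np.zeros((8, 8), np.uint8), np.zeros((8, 8), np.uint8), np.full((8, 8), 200, np.uint8)]
print([cam.should_send(f) for f in frames])  # -> [True, False, True]
```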
- A system that lets users generalize to unbounded vocabularies without manual retraining