IID sampling of the training data is important because it ensures that the stochastic gradient is an unbiased estimate of the full gradient. Put differently, having IID data at the clients means that each mini-batch used for a client's local update is statistically identical to a sample drawn uniformly (with replacement) from the entire training dataset, which is the union of all local datasets at the clients. In practice, it is unrealistic to assume that the local data on each edge device is always IID. More specifically: