Conceptually, you can think of the watermark as a function, F(P) -> E, which takes a point in processing time and returns a point in event time. That point in event time, E, is the point up to which the system believes all inputs with event times less than E have been observed. In other words, it's an assertion that no more data with event times less than E will ever be seen. However, for many distributed input sources, perfect knowledge of the input data is impractical, in which case the best available option is to provide a heuristic watermark. Heuristic watermarks use whatever information is available about the inputs (partitions, ordering within partitions if any, growth rates of files, etc.) to provide an estimate of progress that is as accurate as possible. Because a heuristic watermark is not perfectly accurate, we need a way to deal with late data as it arrives. We highlight two shortcomings of watermarks below.
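To make the idea concrete, here is a minimal sketch of how a heuristic watermark might be estimated for a partitioned source whose partitions are observed roughly in order. All names and the class structure are illustrative assumptions for this sketch, not the API of any particular streaming system.

```python
from datetime import datetime, timedelta
from typing import Dict, Optional


class HeuristicWatermark:
    """A rough estimate of F(P) -> E based on per-partition progress."""

    def __init__(self, slack: timedelta = timedelta(seconds=30)):
        # Slack hedges against mild out-of-orderness within a partition.
        self.slack = slack
        self.max_event_time: Dict[str, datetime] = {}

    def observe(self, partition: str, event_time: datetime) -> None:
        """Record the newest event time seen so far on a partition."""
        current = self.max_event_time.get(partition)
        if current is None or event_time > current:
            self.max_event_time[partition] = event_time

    def current(self) -> Optional[datetime]:
        """Return E: the point below which all input is believed to have arrived.

        The slowest partition bounds the estimate; the slack allows for
        records that may still arrive slightly out of order.
        """
        if not self.max_event_time:
            return None
        return min(self.max_event_time.values()) - self.slack
```

Under these assumptions, any record whose event time falls below the value returned by `current()` would be treated as late data, which is exactly the case the heuristic cannot rule out.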