Paper published by Google: Large-scale Incremental Processing Using Distributed Transactions and Notifications.

Percolator is a system for processing small incremental updates to a very large dataset. The cited example is Google's web indexing system, where only a small fraction of pages change each day compared to the whole set of indexed pages. Processing that kind of data with chained MapReduce jobs takes time proportional to the size of the whole dataset, whereas Percolator's processing time is proportional to the size of the change. It is not designed for millisecond latency, and this relaxed requirement is part of what lets it scale well. It also provides ACID transactions (with snapshot isolation), so that repository invariants are maintained despite highly concurrent updates.
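To make the cost argument concrete, here is a toy, single-process sketch of an inverted index kept up to date in two ways. It is plain Python, not Percolator's API, and `build_index` and `apply_change` are hypothetical names: the first recomputes everything from scratch like a batch job, so its work grows with the whole corpus, while the second touches only the words of the one document that changed.

```python
from collections import defaultdict


def build_index(docs):
    """Batch approach: rebuild the whole index; work grows with the total corpus size."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for word in text.split():
            index[word].add(doc_id)
    return index


def apply_change(index, docs, doc_id, new_text):
    """Incremental approach: reprocess only the changed document; work grows with the change."""
    old_text = docs.get(doc_id, "")
    # Remove postings for words that no longer appear in this document.
    for word in set(old_text.split()) - set(new_text.split()):
        index[word].discard(doc_id)
        if not index[word]:
            del index[word]
    # Add postings for the document's current words.
    for word in new_text.split():
        index[word].add(doc_id)
    docs[doc_id] = new_text


docs = {"a": "cat sat", "b": "dog sat", "c": "cat ran"}
index = build_index(docs)
apply_change(index, docs, "b", "dog ran")  # only document "b" is reprocessed
assert index == build_index(docs)  # same result as a full rebuild
```

Both paths produce the same index; the difference is only in how much data each one has to read, which is the trade-off between chained batch jobs and incremental processing.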

It is built on top of Bigtable, Google's column-oriented storage system. Since Bigtable only supports single-row transactions, Percolator implements multi-row transactions on top of it using additional metadata columns (such as a lock column and a write column kept alongside each data column), relying on the fact that sparse rows are cheap in Bigtable.

The work is split into observers that run in a chain: each one takes its input from the output of previous observers and writes its own results to other columns (this structure is not strictly required, but the system works best with it). Observers are triggered asynchronously when workers scanning the repository detect changes in the columns they observe.
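The multi-row transaction scheme can be sketched as a two-phase commit over a store that only offers single-row atomicity. This is a minimal, single-process illustration under stated assumptions: a dict-backed store with one mutex per row stands in for Bigtable's single-row transactions, and `Store`, `Transaction`, the `col:data` / `col:lock` / `col:write` column naming and all helper methods are hypothetical, not Percolator's or Bigtable's API. Every written cell is first prewritten (data plus a lock pointing at one designated primary cell); the commit point is the single-row step that replaces the primary's lock with a write record. Crash recovery, a real timestamp oracle and notifications are left out.

```python
import itertools
import threading
from collections import defaultdict


class Store:
    """Hypothetical in-memory stand-in for Bigtable plus a timestamp oracle."""

    def __init__(self):
        self.cells = defaultdict(dict)  # (row, "col:kind") -> {timestamp: value}
        self.row_locks = defaultdict(threading.Lock)  # models single-row transactions
        self._clock = itertools.count(1)

    def timestamp(self):
        # Strictly increasing, playing the role of Percolator's timestamp oracle.
        return next(self._clock)


class Transaction:
    def __init__(self, store):
        self.store = store
        self.start_ts = store.timestamp()  # snapshot that all reads see
        self.writes = {}  # (row, col) -> value, buffered until commit

    def _cell(self, row, col, kind):  # kind is "data", "lock" or "write"
        return self.store.cells[(row, f"{col}:{kind}")]

    def set(self, row, col, value):
        self.writes[(row, col)] = value

    def get(self, row, col):
        with self.store.row_locks[row]:
            if any(ts <= self.start_ts for ts in self._cell(row, col, "lock")):
                # A real client would back off or clean up an abandoned lock.
                raise RuntimeError(f"pending lock on {row}/{col}")
            writes = self._cell(row, col, "write")
            visible = [ts for ts in writes if ts <= self.start_ts]
            if not visible:
                return None
            # A write record maps commit_ts -> the start_ts the data was stored at.
            return self._cell(row, col, "data")[writes[max(visible)]]

    def _prewrite(self, row, col, value, primary):
        with self.store.row_locks[row]:  # one single-row atomic step
            if any(ts >= self.start_ts for ts in self._cell(row, col, "write")):
                return False  # write-write conflict: a commit landed after our snapshot
            if self._cell(row, col, "lock"):
                return False  # another transaction holds a lock on this cell
            self._cell(row, col, "data")[self.start_ts] = value
            self._cell(row, col, "lock")[self.start_ts] = primary
            return True

    def _rollback(self, keys):
        # Simplified cleanup: drop our own locks and tentative data.
        for row, col in keys:
            with self.store.row_locks[row]:
                self._cell(row, col, "lock").pop(self.start_ts, None)
                self._cell(row, col, "data").pop(self.start_ts, None)

    def commit(self):
        keys = list(self.writes)
        if not keys:
            return True
        primary = keys[0]
        for i, (row, col) in enumerate(keys):  # phase 1: prewrite every cell
            if not self._prewrite(row, col, self.writes[(row, col)], primary):
                self._rollback(keys[:i])
                return False
        commit_ts = self.store.timestamp()
        row, col = primary
        with self.store.row_locks[row]:  # commit point: primary lock -> write record
            committed = self._cell(row, col, "lock").pop(self.start_ts, None) is not None
            if committed:
                self._cell(row, col, "write")[commit_ts] = self.start_ts
        if not committed:
            self._rollback(keys)  # someone cleaned up our primary lock
            return False
        for row, col in keys[1:]:  # phase 2: roll secondaries forward
            with self.store.row_locks[row]:
                self._cell(row, col, "write")[commit_ts] = self.start_ts
                self._cell(row, col, "lock").pop(self.start_ts, None)
        return True


store = Store()
setup = Transaction(store)
setup.set("Bob", "bal", 10)
setup.set("Joe", "bal", 2)
assert setup.commit()

t1, t2 = Transaction(store), Transaction(store)  # two concurrent transactions
t1.set("Bob", "bal", t1.get("Bob", "bal") - 7)
t1.set("Joe", "bal", t1.get("Joe", "bal") + 7)
t2.set("Bob", "bal", t2.get("Bob", "bal") - 5)
ok1, ok2 = t1.commit(), t2.commit()
print(ok1, ok2)  # True False: t2's snapshot predates t1's commit to Bob
```

In the usage lines, `t2` read Bob's balance from a snapshot older than `t1`'s commit, so its prewrite finds a newer write record and aborts instead of silently overwriting the transfer. Once the primary's write record exists the transaction counts as committed, which is why the secondaries can be rolled forward afterwards (in Percolator, other clients can do this if the committing client dies at that point).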