tdigest
t-digest algorithm data structure for accurate quantile estimation with configurable compression parameters
📝Syntax
td = tdigest()
td = tdigest(compression)
td = tdigest(X)
td = tdigest(compression, X)
📥Input Arguments
Parameter Description
compression compression factor: positive integer scalar.
X an array of double, single, integers, ...
📤Output Arguments
Parameter Description
td t-digest representation of the array elements.
📄Description

td = tdigest(compression, X) returns a t-digest representation of the array elements of X.

TDigest is a data structure for accurate on-line accumulation of rank-based statistics such as quantiles and cumulative distribution functions. It is particularly effective for large data sets and for estimating extreme quantiles. The algorithm is described in detail in the paper "Computing Extremely Accurate Quantiles Using t-Digests" by Ted Dunning and Otmar Ertl.

The t-digest is particularly useful for:

The compression factor (100 in the examples) controls the trade-off between accuracy and memory usage - higher values give more accuracy but use more memory.

Once you have a t-digest object, you can add new data points to it using the + operator, and compute percentiles or quantiles using the percentile or quantile methods.

For more details, see the original paper linked in the bibliography.

Methods available:

Properties:

💡Examples
M = rand(1, 15000);
td = tdigest(100, M);
td = td + [1:15000];
td.percentile([5, 50, 95])
td.quantile([0.05 0.5 0.95])
streaming updates
td = tdigest(100);
while(1)
  td = td + randn();
  td.percentile([5, 50, 95])
end
🔗See Also
mean
Used Functions
tdigest algorithm
📚Bibliography
https://www.sciencedirect.com/science/article/pii/S2665963820300403
🕔Version History
Version Description
1.15.0 initial version
Edit this page on GitHub