Static Drift Operator
Given dimension and population metrics, this operator computes a timeseries of drift metrics that measure how far the distribution of dimension values in the current data has drifted from a static baseline.
This operator requires two inputs: a timeseries dataset of new data and a baseline dataset without a time component.
The dimension values can be encoded in two ways: enumeration or linear buckets.
- The enumeration encoder should be used for string values or low-cardinality numeric values. New values encountered in the current data are always mapped to the same integer, which corresponds to the "other" category.
- The linear buckets encoder should be used for high-cardinality numeric values. It maps each dimension value to a bucket index. All values less than the smallest bucket are mapped to the smallest bucket, and all values larger than the largest bucket are mapped to the largest bucket.
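The two encoders can be sketched in Python as follows. This is a minimal illustration, not the operator's actual implementation: the names `make_enumeration_encoder` and `linear_bucket_index` are hypothetical, and the sketch assumes that the "other" category takes the next free integer and that bucket `i` covers the half-open range from `bucket_start + i * bucket_size` up to `bucket_start + (i + 1) * bucket_size`.

```python
import math
from typing import Callable, Dict, Hashable, Iterable


def make_enumeration_encoder(baseline_values: Iterable[Hashable]) -> Callable[[Hashable], int]:
    """Build an encoder that maps each known value to its own integer.

    Assumption: values are numbered in order of first appearance, and every
    value not seen while building the encoder shares one "other" index.
    """
    index: Dict[Hashable, int] = {}
    for value in baseline_values:
        if value not in index:
            index[value] = len(index)
    other_index = len(index)  # single shared slot for unseen values

    def encode(value: Hashable) -> int:
        return index.get(value, other_index)

    return encode


def linear_bucket_index(value: float, bucket_start: float,
                        bucket_size: float, bucket_count: int) -> int:
    """Map a numeric value to a bucket index, clamping out-of-range values."""
    if bucket_size <= 0 or bucket_count < 1:
        raise ValueError("bucket_size must be positive and bucket_count at least 1")
    raw = math.floor((value - bucket_start) / bucket_size)
    # Values below the first bucket go to index 0; values above the last
    # bucket go to index bucket_count - 1.
    return max(0, min(bucket_count - 1, raw))
```

With `bucket_start=0`, `bucket_size=1`, and `bucket_count=100`, a value of `-5` lands in bucket `0`, `3.7` in bucket `3`, and `250` in bucket `99`.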
At each timestamp, the operator computes a normalized Earth Mover's Distance (EMD) between the distribution of dimension values in the current data and the distribution in the baseline data.
The EMD distance function depends on the encoding:
- For enumeration, the distance used is always 1 / (number of unique values).
- For bucketing, the distance used is the normalized distance between the buckets.
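The following Python sketch shows one way these two distance functions could turn a pair of histograms into a drift score. It is an illustration under stated assumptions rather than the operator's actual code: the function names are hypothetical, both histograms are normalized to sum to 1, the number of unique values is taken to be the histogram length, and the normalized bucket distance is assumed to be `|i - j| / (bucket_count - 1)`.

```python
from typing import List, Sequence


def _normalize(counts: Sequence[float]) -> List[float]:
    """Convert raw counts into a probability distribution."""
    total = float(sum(counts))
    if total <= 0:
        raise ValueError("histogram must contain at least one observation")
    return [c / total for c in counts]


def emd_enumeration(current_counts: Sequence[float],
                    baseline_counts: Sequence[float]) -> float:
    """EMD when every pair of distinct values is 1 / n apart.

    With a constant ground distance, the cheapest transport plan moves exactly
    the mass that differs between the two distributions, so the EMD is the
    total variation distance scaled by 1 / n.
    Assumption: n is the histogram length (including any "other" slot).
    """
    if len(current_counts) != len(baseline_counts):
        raise ValueError("histograms must have the same length")
    p, q = _normalize(current_counts), _normalize(baseline_counts)
    moved_mass = 0.5 * sum(abs(a - b) for a, b in zip(p, q))
    return moved_mass / len(p)


def emd_linear_buckets(current_counts: Sequence[float],
                       baseline_counts: Sequence[float]) -> float:
    """EMD over ordered buckets with distance |i - j| / (n - 1).

    In one dimension the EMD equals the summed absolute difference of the
    cumulative distributions; dividing by n - 1 keeps the score in [0, 1].
    """
    if len(current_counts) != len(baseline_counts):
        raise ValueError("histograms must have the same length")
    p, q = _normalize(current_counts), _normalize(baseline_counts)
    n = len(p)
    if n < 2:
        return 0.0
    cumulative_diff = 0.0
    work = 0.0
    for a, b in zip(p[:-1], q[:-1]):
        cumulative_diff += a - b
        work += abs(cumulative_diff)
    return work / (n - 1)
```

Under these assumptions, `emd_linear_buckets([1, 0, 0], [0, 0, 1])` returns `1.0`, because all mass moves across the full bucket range, while identical distributions score `0.0` with either function.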
Inputs
targetProperty | description |
---|---|
currentData | A data frame with columns |
baselineData | A data frame with columns |
Outputs
outputName | description |
---|---|
driftScoreData | The resulting data frame of drift metrics. The data frame has columns |
Parameters
name | description |
---|---|
monitoringGranularity | The time granularity of the output timeseries. |
encoder | The encoder parameters json object. |
Encoder parameters
name | description |
---|---|
type | The type of encoder. One of |
linearBucket | The linear bucket parameters json object. |
Linear bucket parameters
name | description |
---|---|
bucketStart | The value corresponding to the first bucket. |
bucketSize | The size of each bucket. |
bucketCount | The number of buckets to use. |
Example
{
  "name": "driftScore",
  "type": "StaticBaselineDrift",
  "params": {
    "encoder": {
      "type": "LINEAR_BUCKETS",
      "linearBucket": {
        "bucketStart": "0",
        "bucketSize": "1",
        "bucketCount": "100"
      }
    },
    "monitoringGranularity": "P1D"
  },
  "inputs": [
    {
      "targetProperty": "currentData",
      "sourcePlanNode": "currentDataFetcher",
      "sourceProperty": "currentData"
    },
    {
      "targetProperty": "baselineData",
      "sourcePlanNode": "baselineDataFetcher",
      "sourceProperty": "baselineData"
    }
  ],
  "outputs": [
    {
      "outputName": "driftScoreData"
    }
  ]
}
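When plan nodes are generated programmatically, a configuration like the one above could be assembled with a small helper. This is only a sketch: `build_static_drift_node` is a hypothetical function, not part of any documented API, and it simply mirrors the field names, node names, and string-valued numbers shown in the example.

```python
import json


def build_static_drift_node(bucket_start, bucket_size, bucket_count,
                            granularity="P1D"):
    """Return a plan-node dict shaped like the example (hypothetical helper)."""
    if bucket_size <= 0 or bucket_count < 1:
        raise ValueError("bucket_size must be positive and bucket_count at least 1")
    return {
        "name": "driftScore",
        "type": "StaticBaselineDrift",
        "params": {
            "encoder": {
                "type": "LINEAR_BUCKETS",
                "linearBucket": {
                    # The example encodes numbers as strings, so this sketch does too.
                    "bucketStart": str(bucket_start),
                    "bucketSize": str(bucket_size),
                    "bucketCount": str(bucket_count),
                },
            },
            "monitoringGranularity": granularity,
        },
        "inputs": [
            {
                "targetProperty": "currentData",
                "sourcePlanNode": "currentDataFetcher",
                "sourceProperty": "currentData",
            },
            {
                "targetProperty": "baselineData",
                "sourcePlanNode": "baselineDataFetcher",
                "sourceProperty": "baselineData",
            },
        ],
        "outputs": [{"outputName": "driftScoreData"}],
    }


# Serialize the node with the same values as the example.
print(json.dumps(build_static_drift_node(0, 1, 100), indent=2))
```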