Converting DateTime strings to Timestamps
In this recipe we’ll learn how to import DateTime strings into Pinot.
Prerequisites
To follow the code examples in this guide, you must install Docker locally and download the recipes repository.
Navigate to recipe
- If you haven't already, download the recipes repository.
- In a terminal window, navigate to this recipe's directory:

```bash
cd pinot-recipes/recipes/datetime-string-to-timestamp
```
Launch Pinot Cluster and Kafka
You can spin up a Pinot Cluster and Kafka Broker by running the following command:
```bash
docker-compose up
```
This command runs a single instance of the Pinot Controller, Pinot Server, Pinot Broker, Kafka, and Zookeeper. You can find the docker-compose.yml file on GitHub.
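If you'd rather keep your terminal free, you can start the containers in the background instead and then check on them (standard Docker Compose flags):

```bash
# start all containers in the background
docker-compose up -d

# list the containers and confirm they are all running
docker-compose ps
```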
DateTime Strings in Pinot
Pinot automatically creates Timestamps from DateTime values that are in one of the following formats:

- `yyyy-mm-dd hh:mm:ss[.fffffffff]`
- Millis since epoch
If we have a DateTime value in another format we’ll need to convert that value using a transformation function.
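As a preview of what that looks like, Pinot's `FromDateTime` function takes a DateTime string and a Joda-Time format pattern and returns milliseconds since epoch. A minimal illustration (the result shown assumes the string is interpreted as UTC):

```sql
-- parse a non-standard DateTime string into epoch millis
-- returns 1570659925000 when the input is interpreted as UTC
SELECT FromDateTime('10/09/2019T22:25:25', 'MM/dd/yyyy''T''HH:mm:ss') AS epochMillis
```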
We’re going to import the following message into Kafka:
```json
{
  "timestamp1": "2019-10-09 22:25:25",
  "timestamp2": "1570656325000",
  "timestamp3": "10/09/2019T22:25:25"
}
```
We'll ingest these values using the following schema:

```json
{
  "schemaName": "dates",
  "dimensionFieldSpecs": [
    {"name": "Date1", "dataType": "TIMESTAMP"},
    {"name": "Date2", "dataType": "TIMESTAMP"},
    {"name": "Date3", "dataType": "TIMESTAMP"}
  ],
  "dateTimeFieldSpecs": [
    {
      "name": "Date4",
      "dataType": "TIMESTAMP",
      "format": "1:MILLISECONDS:EPOCH",
      "granularity": "1:MILLISECONDS"
    }
  ]
}
```
config/schema.json

Date1, Date2, and Date3 are defined as TIMESTAMP dimension columns, while Date4 is defined as a dateTimeFieldSpec so that it can be used as the table's time column.
And we'll use the following table config:

```json
{
  "tableName": "dates",
  "tableType": "REALTIME",
  "segmentsConfig": {
    "timeColumnName": "Date4",
    "schemaName": "dates",
    "replication": "1",
    "replicasPerPartition": "1"
  },
  "ingestionConfig": {
    "batchIngestionConfig": {
      "segmentIngestionType": "APPEND",
      "segmentIngestionFrequency": "DAILY"
    },
    "transformConfigs": [
      {"columnName": "Date1", "transformFunction": "timestamp1"},
      {"columnName": "Date2", "transformFunction": "timestamp2"},
      {"columnName": "Date3", "transformFunction": "FromDateTime(timestamp3, 'MM/dd/yyyy''T''HH:mm:ss')"},
      {"columnName": "Date4", "transformFunction": "timestamp1"}
    ]
  },
  "tableIndexConfig": {
    "loadMode": "MMAP",
    "streamConfigs": {
      "streamType": "kafka",
      "stream.kafka.topic.name": "dates",
      "stream.kafka.broker.list": "kafka-datetime:9093",
      "stream.kafka.consumer.type": "lowlevel",
      "stream.kafka.consumer.prop.auto.offset.reset": "smallest",
      "stream.kafka.consumer.factory.class.name": "org.apache.pinot.plugin.stream.kafka20.KafkaConsumerFactory",
      "stream.kafka.decoder.class.name": "org.apache.pinot.plugin.stream.kafka.KafkaJSONMessageDecoder"
    }
  },
  "tenants": {},
  "metadata": {}
}
```
config/table.json
This table has multiple transform configs defined that:

- Map `timestamp1` from the data source into the `Date1` column.
- Map `timestamp2` from the data source into the `Date2` column.
- Map `timestamp3` from the data source into the `Date3` column, with help from the `FromDateTime` function.
- Map `timestamp1` from the data source into the `Date4` column, which serves as the table's time column.

`timestamp1` and `timestamp2` are already in formats that Pinot converts automatically, so they can be mapped directly. `timestamp3` is not, so `FromDateTime` parses it using the Joda-Time pattern `MM/dd/yyyy'T'HH:mm:ss` (in the table config the single quotes are doubled to escape them inside the string literal, and the quoted `T` tells Joda-Time to treat it as a literal character).
You can create the schema and table by running the following command:
```bash
docker run \
  --network datetime \
  -v $PWD/config:/config \
  apachepinot/pinot:1.0.0 AddTable \
  -tableConfigFile /config/table.json \
  -schemaFile /config/schema.json \
  -controllerHost "pinot-controller-datetime" \
  -exec
```
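To confirm that the table was created, you can also ask the Pinot Controller's REST API, which this recipe maps to port 9000 on localhost:

```bash
# returns the table config for the dates table
curl localhost:9000/tables/dates
```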
Ingesting data into Kafka
We can import the message into Kafka by running the following command:

```bash
printf '{"timestamp1": "2019-10-09 22:25:25", "timestamp2": "1570656325000", "timestamp3": "10/09/2019T22:25:25"}\n' |
kcat -P -b localhost:9092 -t dates
```
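If you want to check that the message made it into the topic, you can consume it back with kcat (`-C` switches to consumer mode and `-e` exits once the end of the topic is reached):

```bash
kcat -C -b localhost:9092 -t dates -e
```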
Querying
Once that's completed, navigate to localhost:9000/#/query and click on the dates table, or copy/paste the following query:

```sql
select *
from dates
limit 10
```
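If you'd like to double-check that the columns really hold Timestamps, a query along these lines should also work (a sketch: casting a `TIMESTAMP` column to `LONG` yields epoch millis, which `ToDateTime` formats back into a string):

```sql
select Date1,
       ToDateTime(CAST(Date1 AS LONG), 'yyyy-MM-dd HH:mm:ss') AS formatted
from dates
limit 10
```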
You will see the query results, with all four columns rendered as Timestamp values.