How to query on timeseries data based on time window?
Suppose I have the following time series data: I want to calculate mean, std value inside a time window with the value of 2 for different name. I used to use spark to first group by name and then do the calculation in the list within the different group. This is actually quite fast. But I later on figured out that we can first self join based on the time window condition like below:
select a.name, a.load_datetime, b.name, b.load_datetime from temp_data as a join temp_data as b on cast(a.load_datetime as date) < cast(b.load_datetime as date) + integer '2'
Using SQL statement, a more elegant and shorter solution can be achieved.