Comparative Analysis of Unevenly Spaced Time-Series Data
We often encounter situations where we have two sensors monitoring different parameters of a battery pack, such as current and voltage. For effective analysis, we aim to calculate the power consumed over time (which is simply the product of current and voltage) and then integrate this signal to determine the energy used over a specified duration. However, upon reviewing the data, we find that while the overall time frames are identical, the timestamps for the individual data points do not align. What steps should we take now?
To illustrate this scenario, let’s simulate the data collection from two sensors: one measuring current every 10 seconds and another measuring voltage every 15 seconds, with the voltage timestamps shifted by 3 seconds. We will use the pandas function date_range() to build each time range from a start time, an end time, and a frequency, and apply the constant time shift with pd.Timedelta(). To add some variability to our measurements, we will sample noise from a uniform distribution using numpy (between -0.025 and 0.025 per step in the code below) and accumulate it with cumsum(), so that each signal drifts as a random walk around its nominal value of 5 A or 12 V.
# Package Imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# Current Data
np.random.seed(11)
t_current = pd.date_range("01:00:00", "02:00:00", freq="10s")
noise_current = 0.05 * (np.random.random(len(t_current)) - 0.5)
current = 5 + noise_current.cumsum()
# Voltage Data
np.random.seed(7)
t_voltage = pd.date_range("01:00:00", "02:00:00", freq="15s") + pd.Timedelta("3s")
noise_voltage = 0.05 * (np.random.random(len(t_voltage)) - 0.5)
voltage = 12 + noise_voltage.cumsum()
Next, we will convert these datasets into pandas dataframes and visualize the two signals:
# Create current and voltage dataframes
df_current = pd.DataFrame({"timestamp": t_current, "current": current})
df_voltage = pd.DataFrame({"timestamp": t_voltage, "voltage": voltage})
# Plot two signals
# This style was named "seaborn-darkgrid" in matplotlib versions before 3.6
plt.style.use("seaborn-v0_8-darkgrid")
plt.rcParams["font.family"] = "Poppins"
plt.rcParams["font.size"] = 16
# Create subplots and plot
fig, (ax1, ax2) = plt.subplots(nrows=2, ncols=1, figsize=(12, 6))
ax1.plot(df_current["timestamp"], df_current["current"])
ax2.plot(df_voltage["timestamp"], df_voltage["voltage"])
# Edit x-tick time format
ax1.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
ax2.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
# Add axis labels
ax1.set(ylabel="Current (A)")
ax2.set(ylabel="Voltage (V)")
plt.show()
Upon closer examination of a specific section of the plot with markers highlighting the actual timestamps, we realize the challenge we face in multiplying these two signals together:
# Create subplots and plot
fig, (ax1, ax2) = plt.subplots(nrows=2, ncols=1, figsize=(12, 6))
# Markers show where each sensor actually recorded a value
ax1.plot(df_current["timestamp"], df_current["current"], marker="o")
ax2.plot(df_voltage["timestamp"], df_voltage["voltage"], marker="o")
# Edit x-tick time format
ax1.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
ax2.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
# Add axis labels
ax1.set(ylabel="Current (A)", xlim=[pd.Timestamp("01:00:00"), pd.Timestamp("01:03:00")])
ax2.set(ylabel="Voltage (V)", xlim=[pd.Timestamp("01:00:00"), pd.Timestamp("01:03:00")])
plt.show()
Since none of the timestamps between the two signals align, we must explore how to effectively multiply them. We will investigate two strategies: filling missing values and resampling.
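A quick way to confirm that the two time grids share no timestamps is to intersect them. The following is a minimal sketch that rebuilds the grids as described above (10-second current, 15-second voltage shifted by 3 seconds); the variable name shared is only illustrative.

```python
import pandas as pd

# Rebuild the two timestamp grids: 10 s current, 15 s voltage shifted by 3 s
t_current = pd.date_range("01:00:00", "02:00:00", freq="10s")
t_voltage = pd.date_range("01:00:00", "02:00:00", freq="15s") + pd.Timedelta("3s")

# Timestamps that appear in both grids
shared = t_current.intersection(t_voltage)

print(len(t_current), len(t_voltage), len(shared))
```

The voltage timestamps always fall on seconds ending in 3 or 8, while the current timestamps fall on multiples of 10 seconds, so the intersection is empty.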
Initially, we will merge our current and voltage dataframes. To achieve this, we will add a column called sensor to identify the measurement type, along with a value column containing the corresponding sensor data. This creates a consistent structure across both dataframes, enabling us to combine them easily.
# Update dataframe schemas
df_current["sensor"] = "current"
df_current = df_current.rename(columns={"current": "value"})
df_voltage["sensor"] = "voltage"
df_voltage = df_voltage.rename(columns={"voltage": "value"})
# Combine dataframes
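# DataFrame.append() was removed in pandas 2.0, so we use pd.concat() instead
df_sensor = pd.concat([df_current, df_voltage], ignore_index=True)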
Now, we will create a pivot table from our combined dataframe, which allows us to present the unique values from the sensor column as new columns, facilitating a side-by-side comparison of the sensor measurements.
# Pivot the dataframe
df_sensor_pivot = df_sensor.pivot(index="timestamp", columns="sensor", values="value")
Here, we observe the issue clearly: there are no overlapping rows where both sensors have values—either the current or voltage is null. We will begin by addressing this through value filling.
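To make the shape of the pivoted table concrete, here is a small sketch on made-up readings (the values and the names df_long, df_wide, and complete_rows are hypothetical, not taken from the simulated data above). It shows that each timestamp ends up with a value for only one of the two sensors.

```python
import pandas as pd

# Hypothetical long-format readings: two current samples and one voltage sample
df_long = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-01 01:00:00",
        "2024-01-01 01:00:03",
        "2024-01-01 01:00:10",
    ]),
    "sensor": ["current", "voltage", "current"],
    "value": [5.00, 12.00, 5.01],
})

# One row per timestamp, one column per sensor
df_wide = df_long.pivot(index="timestamp", columns="sensor", values="value")
print(df_wide)

# Rows where both sensors have a value (none in this toy example)
complete_rows = df_wide.dropna()
print(len(complete_rows))
```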
Forward and Backward Filling
The simplest approach to tackle null values is to fill each gap with a neighboring reading: forward filling (ffill()) carries the last known value forward, while backward filling (bfill()) copies the next known value backward. Typically, forward filling is preferred, as backward filling utilizes future data to infer past values. Forward filling creates a step-like signal, holding each sensor value until a new reading arrives. We will implement this and visualize the results:
# Forward fill data
df_sensor_pivot = df_sensor_pivot.ffill()
# Create subplots and plot
fig, (ax1, ax2) = plt.subplots(nrows=2, ncols=1, figsize=(12, 6))
ax1.plot(df_sensor_pivot["current"], marker="o")
ax2.plot(df_sensor_pivot["voltage"], marker="o")
# Edit x-tick time format
ax1.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
ax2.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
# Add axis labels
ax1.set(ylabel="Current (A)", xlim=[pd.Timestamp("01:00:00"), pd.Timestamp("01:03:00")])
ax2.set(ylabel="Voltage (V)", xlim=[pd.Timestamp("01:00:00"), pd.Timestamp("01:03:00")])
plt.show()
The application of forward filling results in a pronounced step-shaped signal, particularly for voltage due to its lower measurement frequency. However, with overlapping timestamps, we can now calculate power by multiplying the two signals:
# Calculate Power
df_sensor_pivot["power"] = df_sensor_pivot["current"] * df_sensor_pivot["voltage"]
# Create subplots and plot
fig, (ax1, ax2, ax3) = plt.subplots(nrows=3, ncols=1, figsize=(12, 9))
ax1.plot(df_sensor_pivot["current"])
ax2.plot(df_sensor_pivot["voltage"])
ax3.plot(df_sensor_pivot["power"])
# Edit x-tick time format
ax1.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
ax2.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
ax3.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
# Add axis labels
ax1.set(ylabel="Current (A)")
ax2.set(ylabel="Voltage (V)")
ax3.set(ylabel="Power (W)")
plt.show()
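The difference between the two fill directions is easiest to see on a tiny series. This is a minimal sketch with hypothetical voltage readings; note that forward filling leaves a leading gap unfilled, because there is no earlier value to carry forward.

```python
import numpy as np
import pandas as pd

# Hypothetical voltage column after pivoting: gaps where only current was recorded
idx = pd.to_datetime([
    "2024-01-01 01:00:00",
    "2024-01-01 01:00:03",
    "2024-01-01 01:00:10",
    "2024-01-01 01:00:18",
])
voltage = pd.Series([np.nan, 12.0, np.nan, 12.1], index=idx)

# Carry the last known value forward; the first row stays NaN
forward = voltage.ffill()

# Copy the next known value backward; this uses future readings
backward = voltage.bfill()

print(pd.DataFrame({"raw": voltage, "ffill": forward, "bfill": backward}))
```

In the simulated data the same thing happens at the very first timestamp, where current has a reading but voltage does not yet, so the power for that row is null and is skipped when the values are summed.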
Resampling Data
An alternative method involves resampling the data to a consistent time interval, ideally to the lower frequency of 15 seconds in this case. However, since current data points are collected more frequently (every 10 seconds), we need an aggregation function to summarize these values. We will use the mean of the values within each time window for this operation, assuming that df_sensor_pivot is in its state before the forward fill.
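# Resample our dataframe to 15 seconds, averaging the readings in each window
df_sensor_pivot = df_sensor_pivot.resample("15s").mean()
# Calculate power from the resampled signals (used later for the energy calculation)
df_sensor_pivot["power"] = df_sensor_pivot["current"] * df_sensor_pivot["voltage"]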
# Create subplots and plot
fig, (ax1, ax2) = plt.subplots(nrows=2, ncols=1, figsize=(12, 6))
ax1.plot(df_sensor_pivot["current"], marker="o")
ax2.plot(df_sensor_pivot["voltage"], marker="o")
# Edit x-tick time format
ax1.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
ax2.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
# Add axis labels
ax1.set(ylabel="Current (A)", xlim=[pd.Timestamp("01:00:00"), pd.Timestamp("01:03:00")])
ax2.set(ylabel="Voltage (V)", xlim=[pd.Timestamp("01:00:00"), pd.Timestamp("01:03:00")])
plt.show()
After resampling, we now have evenly spaced data points for both current and voltage signals, allowing us to calculate power similarly to the previous example involving forward filling.
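The following sketch shows how the windowed mean behaves on a handful of hypothetical readings (df_toy and its values are made up for illustration). Null values are skipped when averaging, so each 15-second window ends up with one current and one voltage value that can be multiplied.

```python
import numpy as np
import pandas as pd

# Hypothetical pivoted readings: current and voltage never share a timestamp
idx = pd.to_datetime([
    "2024-01-01 01:00:00",
    "2024-01-01 01:00:03",
    "2024-01-01 01:00:10",
    "2024-01-01 01:00:18",
    "2024-01-01 01:00:20",
])
df_toy = pd.DataFrame(
    {
        "current": [5.0, np.nan, 5.2, np.nan, 5.4],
        "voltage": [np.nan, 12.0, np.nan, 12.2, np.nan],
    },
    index=idx,
)

# Average the readings in each 15-second window (NaN values are ignored)
df_resampled = df_toy.resample("15s").mean()

# Both columns are now populated in every window, so power is defined
df_resampled["power"] = df_resampled["current"] * df_resampled["voltage"]
print(df_resampled)
```

Here the first window averages two current readings (5.0 and 5.2) and pairs the result with the single voltage reading in that window.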
To compare the two methods, we will calculate the total energy, defined as the time integral of power. Utilizing trapezoidal integration, we compute the average of consecutive values, multiply by the time interval, and sum them up.
# Reset index to get timestamp column
df_sensor_pivot = df_sensor_pivot.reset_index()
# Get average of consecutive values
df_sensor_pivot["midpoint"] = (df_sensor_pivot["power"] + df_sensor_pivot["power"].shift()) / 2
# Get the time difference between rows
df_sensor_pivot["time_diff"] = df_sensor_pivot["timestamp"].diff().dt.total_seconds()
# Calculate the area of the trapezoid
df_sensor_pivot["energy_kJ"] = df_sensor_pivot["midpoint"] * df_sensor_pivot["time_diff"] / 1000
# Get total energy
total_energy = df_sensor_pivot["energy_kJ"].sum().round(3)
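As a sanity check, the column-based trapezoid calculation can be compared against the same rule written directly with NumPy arrays. The sketch below uses a short, hypothetical power signal on an uneven time grid (df_power and its values are illustrative only); both approaches should agree.

```python
import numpy as np
import pandas as pd

# Hypothetical power signal in watts on an unevenly spaced time grid
df_power = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-01 01:00:00",
        "2024-01-01 01:00:10",
        "2024-01-01 01:00:25",
        "2024-01-01 01:00:30",
    ]),
    "power": [60.0, 62.0, 61.0, 63.0],
})

# Column-based trapezoids, mirroring the steps above
midpoint = (df_power["power"] + df_power["power"].shift()) / 2
time_diff = df_power["timestamp"].diff().dt.total_seconds()
energy_kj_pandas = (midpoint * time_diff).sum() / 1000

# The same trapezoidal rule written with NumPy arrays
p = df_power["power"].to_numpy()
t = (df_power["timestamp"] - df_power["timestamp"].iloc[0]).dt.total_seconds().to_numpy()
energy_kj_numpy = np.sum((p[1:] + p[:-1]) / 2 * np.diff(t)) / 1000

print(energy_kj_pandas, energy_kj_numpy)
```

The first row has no predecessor, so its midpoint and time difference are null; sum() skips that row, which is why the two results match.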
We obtain the following energy values using both methods:
Forward filling: 219.233 kJ
Resampling: 219.229 kJ
The difference is a mere 0.004 kJ (about 0.002% of the total), indicating that either method yields satisfactory results here. The two methods treat the gaps differently: forward filling holds the last known value until a new reading arrives, so it lags the underlying signal, while resampling averages the values within a defined window. Which method integrates to the larger value therefore depends on the data rather than on the method itself. Both techniques can effectively process time-series data from multiple sources, even when their timestamps do not align.
Conclusion
Thank you for exploring this tutorial on time-series analysis for unevenly spaced signals. A notebook with the examples from this article will be available at this GitHub repository.
I appreciate your reading! Feel free to connect with me on Twitter or LinkedIn for more updates and articles.