# Comparative Analysis of Unevenly Spaced Time-Series Data


We often encounter situations where we have two sensors monitoring different parameters of a battery pack, such as current and voltage. For effective analysis, we aim to calculate the power consumed over time (which is simply the product of current and voltage) and then integrate this signal to determine the energy used over a specified duration. However, upon reviewing the data, we find that while the overall time frames are identical, the timestamps for the individual data points do not align. What steps should we take now?

To illustrate this scenario, let's simulate data from two sensors: one measuring current every 10 seconds and another measuring voltage every 15 seconds, with the voltage timestamps shifted by 3 seconds. We use the pandas function date_range() to build each time range from a start time, an end time, and a frequency, and apply the constant shift with pd.Timedelta(). To add some variability, we draw noise from a uniform distribution with numpy, scale it (each step lies within ±0.025 in the code below), and accumulate it with cumsum() so that each signal drifts like a random walk around 5 A for current and 12 V for voltage.

```python
# Package imports
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates  # needed for the DateFormatter calls in the plots

# Current data: one reading every 10 seconds
np.random.seed(11)
t_current = pd.date_range("01:00:00", "02:00:00", freq="10s")
noise_current = 0.05 * (np.random.random(len(t_current)) - 0.5)
current = 5 + noise_current.cumsum()

# Voltage data: one reading every 15 seconds, shifted by 3 seconds
np.random.seed(7)
t_voltage = pd.date_range("01:00:00", "02:00:00", freq="15s") + pd.Timedelta("3s")
noise_voltage = 0.05 * (np.random.random(len(t_voltage)) - 0.5)
voltage = 12 + noise_voltage.cumsum()
```

Next, we will convert these datasets into pandas dataframes and visualize the two signals:

```python
# Create current and voltage dataframes
df_current = pd.DataFrame({"timestamp": t_current, "current": current})
df_voltage = pd.DataFrame({"timestamp": t_voltage, "voltage": voltage})

# Plot style (this style was named "seaborn-darkgrid" before Matplotlib 3.6)
plt.style.use("seaborn-v0_8-darkgrid")
plt.rcParams["font.family"] = "Poppins"
plt.rcParams["font.size"] = 16

# Create subplots and plot
fig, (ax1, ax2) = plt.subplots(nrows=2, ncols=1, figsize=(12, 6))
ax1.plot(df_current["timestamp"], df_current["current"])
ax2.plot(df_voltage["timestamp"], df_voltage["voltage"])

# Edit x-tick time format
ax1.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
ax2.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))

# Add axis labels
ax1.set(ylabel="Current (A)")
ax2.set(ylabel="Voltage (V)")
plt.show()
```

Upon closer examination of a specific section of the plot with markers highlighting the actual timestamps, we realize the challenge we face in multiplying these two signals together:

```python
# Create subplots and plot, with markers at the actual timestamps
fig, (ax1, ax2) = plt.subplots(nrows=2, ncols=1, figsize=(12, 6))
ax1.plot(df_current["timestamp"], df_current["current"], marker="o")
ax2.plot(df_voltage["timestamp"], df_voltage["voltage"], marker="o")

# Edit x-tick time format
ax1.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
ax2.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))

# Add axis labels and zoom in on the first three minutes
ax1.set(ylabel="Current (A)", xlim=[pd.Timestamp("01:00:00"), pd.Timestamp("01:03:00")])
ax2.set(ylabel="Voltage (V)", xlim=[pd.Timestamp("01:00:00"), pd.Timestamp("01:03:00")])
plt.show()
```

Since none of the timestamps between the two signals align, we must explore how to effectively multiply them. We will investigate two strategies: filling missing values and resampling.
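As a quick sanity check of the claim that no timestamps line up, the two time indexes can be intersected. This is a minimal sketch that rebuilds the indexes with an explicit date (2024-01-01 is an arbitrary assumption, used only so the snippet is reproducible) and assumes the 10-second and 15-second intervals described earlier:

```python
import pandas as pd

# Hypothetical timestamps mirroring the setup described above:
# current every 10 s, voltage every 15 s with a 3 s offset.
t_current = pd.date_range("2024-01-01 01:00:00", "2024-01-01 02:00:00", freq="10s")
t_voltage = (
    pd.date_range("2024-01-01 01:00:00", "2024-01-01 02:00:00", freq="15s")
    + pd.Timedelta("3s")
)

# Timestamps that appear in both indexes
shared = t_current.intersection(t_voltage)

print(len(t_current), len(t_voltage), len(shared))  # 361 241 0
```

The empty intersection is what makes a direct row-by-row multiplication impossible.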

Initially, we will merge our current and voltage dataframes. To achieve this, we will add a column called sensor to identify the measurement type, along with a value column containing the corresponding sensor data. This creates a consistent structure across both dataframes, enabling us to combine them easily.

```python
# Update dataframe schemas
df_current["sensor"] = "current"
df_current = df_current.rename(columns={"current": "value"})
df_voltage["sensor"] = "voltage"
df_voltage = df_voltage.rename(columns={"voltage": "value"})

# Combine dataframes (DataFrame.append was removed in pandas 2.0, so use pd.concat)
df_sensor = pd.concat([df_current, df_voltage], ignore_index=True)
```

Now, we will create a pivot table from our combined dataframe, which allows us to present the unique values from the sensor column as new columns, facilitating a side-by-side comparison of the sensor measurements.

```python
# Pivot the dataframe
df_sensor_pivot = df_sensor.pivot(index="timestamp", columns="sensor", values="value")
```

Here, we observe the issue clearly: there are no rows where both sensors have values; in every row, either the current or the voltage is null. We will begin by addressing this through value filling.
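To make the shape of the pivoted table concrete, here is a minimal sketch on four made-up readings (the values and the `df_long` / `df_wide` names are hypothetical, chosen only for illustration):

```python
import pandas as pd

# Hypothetical long-format readings from two sensors with unaligned timestamps
df_long = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2024-01-01 01:00:00", "2024-01-01 01:00:10",
        "2024-01-01 01:00:03", "2024-01-01 01:00:18",
    ]),
    "sensor": ["current", "current", "voltage", "voltage"],
    "value": [5.0, 5.1, 12.0, 11.9],
})

# One column per sensor, one row per distinct timestamp
df_wide = df_long.pivot(index="timestamp", columns="sensor", values="value")

# Rows where both sensors have a value
n_complete = len(df_wide.dropna())
# Rows where exactly one of the two values is missing
n_partial = int(df_wide.isna().sum(axis=1).eq(1).sum())

print(df_wide)
print(n_complete, n_partial)  # 0 4
```

Every row has exactly one missing value, which is the pattern the fill and resample steps are meant to resolve.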

## Forward and Backward Filling

The simplest approach to tackle null values is to fill them with a neighboring known value: forward filling (ffill()) carries the last known value forward, while backward filling (bfill()) pulls the next known value back. Typically, forward filling is preferred, as backward filling uses future data to infer past values. Forward filling creates a step-like signal that holds the last sensor value until a new reading arrives. We will implement this and visualize the results:

```python
# Forward fill data
df_sensor_pivot = df_sensor_pivot.ffill()

# Create subplots and plot
fig, (ax1, ax2) = plt.subplots(nrows=2, ncols=1, figsize=(12, 6))
ax1.plot(df_sensor_pivot["current"], marker="o")
ax2.plot(df_sensor_pivot["voltage"], marker="o")

# Edit x-tick time format
ax1.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
ax2.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))

# Add axis labels and zoom in on the first three minutes
ax1.set(ylabel="Current (A)", xlim=[pd.Timestamp("01:00:00"), pd.Timestamp("01:03:00")])
ax2.set(ylabel="Voltage (V)", xlim=[pd.Timestamp("01:00:00"), pd.Timestamp("01:03:00")])
plt.show()
```
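The difference between the two fill directions is easiest to see on a tiny series. The sketch below uses made-up voltage values at a handful of timestamps (all hypothetical) and is only meant to illustrate the behavior of ffill() and bfill():

```python
import numpy as np
import pandas as pd

# Hypothetical voltage column after pivoting: real readings at 01:00:03 and
# 01:00:18, with gaps at timestamps where only the current sensor reported.
s = pd.Series(
    [np.nan, 12.0, np.nan, 11.0, np.nan],
    index=pd.to_datetime([
        "2024-01-01 01:00:00",
        "2024-01-01 01:00:03",
        "2024-01-01 01:00:10",
        "2024-01-01 01:00:18",
        "2024-01-01 01:00:20",
    ]),
    name="voltage",
)

forward = s.ffill()   # NaN, 12, 12, 11, 11 (gaps take the previous reading)
backward = s.bfill()  # 12, 12, 11, 11, NaN (gaps take the next reading)

print(pd.DataFrame({"raw": s, "ffill": forward, "bfill": backward}))
```

Note that forward filling leaves a leading gap untouched because there is no earlier reading to carry forward. Any power value computed from such a row is NaN, which pandas skips by default when summing.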

The application of forward filling results in a pronounced step-shaped signal, particularly for voltage due to its lower measurement frequency. However, with overlapping timestamps, we can now calculate power by multiplying the two signals:

```python
# Calculate power
df_sensor_pivot["power"] = df_sensor_pivot["current"] * df_sensor_pivot["voltage"]

# Create subplots and plot
fig, (ax1, ax2, ax3) = plt.subplots(nrows=3, ncols=1, figsize=(12, 9))
ax1.plot(df_sensor_pivot["current"])
ax2.plot(df_sensor_pivot["voltage"])
ax3.plot(df_sensor_pivot["power"])

# Edit x-tick time format
ax1.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
ax2.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
ax3.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))

# Add axis labels
ax1.set(ylabel="Current (A)")
ax2.set(ylabel="Voltage (V)")
ax3.set(ylabel="Power (W)")
plt.show()
```

## Resampling Data

An alternative method is to resample the data onto a consistent time interval, ideally that of the slower sensor (15 seconds in this case). Since current is sampled more often (every 10 seconds), some 15-second windows contain two current readings, so we need an aggregation function to summarize them. We will use the mean of the values within each window. The code below assumes that df_sensor_pivot is in its original pivoted state, before the forward fill and the power calculation.

```python
# Resample our dataframe to 15 seconds
df_sensor_pivot = df_sensor_pivot.resample("15s").mean()

# Create subplots and plot
fig, (ax1, ax2) = plt.subplots(nrows=2, ncols=1, figsize=(12, 6))
ax1.plot(df_sensor_pivot["current"], marker="o")
ax2.plot(df_sensor_pivot["voltage"], marker="o")

# Edit x-tick time format
ax1.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
ax2.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))

# Add axis labels and zoom in on the first three minutes
ax1.set(ylabel="Current (A)", xlim=[pd.Timestamp("01:00:00"), pd.Timestamp("01:03:00")])
ax2.set(ylabel="Voltage (V)", xlim=[pd.Timestamp("01:00:00"), pd.Timestamp("01:03:00")])
plt.show()
```

After resampling, we now have evenly spaced data points for both current and voltage signals, allowing us to calculate power similarly to the previous example involving forward filling.
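As a sketch of how resampling followed by the power calculation might look, here is a small self-contained example. The signals are made up (a slowly rising current and a constant voltage on the same 10-second and offset 15-second grids), and the names `df_wide` and `df_resampled` are hypothetical:

```python
import numpy as np
import pandas as pd

# Hypothetical signals on the same kind of grids as the simulated sensors
t_current = pd.date_range("2024-01-01 01:00:00", periods=7, freq="10s")
t_voltage = (
    pd.date_range("2024-01-01 01:00:00", periods=5, freq="15s")
    + pd.Timedelta("3s")
)

# Side-by-side columns with NaN wherever a sensor has no reading
df_wide = pd.concat(
    [
        pd.Series(np.linspace(5.0, 5.6, 7), index=t_current, name="current"),
        pd.Series(np.full(5, 12.0), index=t_voltage, name="voltage"),
    ],
    axis=1,
).sort_index()

# Average all readings that fall inside each 15-second window
df_resampled = df_wide.resample("15s").mean()

# Both columns are now populated in every window, so power is a plain product
df_resampled["power"] = df_resampled["current"] * df_resampled["voltage"]

print(df_resampled)
```

The first window contains the current readings at 0 s and 10 s, so its current value is their mean (5.05 A), while the second window contains only the reading at 20 s.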

To compare the two methods, we will calculate the total energy, defined as the time integral of power. Utilizing trapezoidal integration, we compute the average of consecutive values, multiply by the time interval, and sum them up.
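Written out, with $P_i$ the power at timestamp $t_i$ and $N$ rows indexed from $0$ to $N-1$ (notation introduced here for illustration, not taken from the original code), the trapezoidal estimate of the energy is:

$$
E \approx \sum_{i=1}^{N-1} \frac{P_i + P_{i-1}}{2}\,\left(t_i - t_{i-1}\right)
$$

With power in watts and time in seconds, $E$ comes out in joules; dividing by 1000 gives kilojoules.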

```python
# Reset index to get timestamp column
df_sensor_pivot = df_sensor_pivot.reset_index()

# Get average of consecutive power values
df_sensor_pivot["midpoint"] = (df_sensor_pivot["power"] + df_sensor_pivot["power"].shift()) / 2

# Get the time difference between rows, in seconds
df_sensor_pivot["time_diff"] = df_sensor_pivot["timestamp"].diff().dt.total_seconds()

# Calculate the area of each trapezoid (W * s = J, divided by 1000 for kJ)
df_sensor_pivot["energy_kJ"] = df_sensor_pivot["midpoint"] * df_sensor_pivot["time_diff"] / 1000

# Get total energy
total_energy = df_sensor_pivot["energy_kJ"].sum().round(3)
```
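The same integration can be wrapped in a small helper and checked against cases with a known answer. This is a minimal sketch; the function name `trapezoid_energy_kj` is hypothetical, and it assumes the power series is indexed by timestamp:

```python
import pandas as pd


def trapezoid_energy_kj(power: pd.Series) -> float:
    """Integrate a power series (in W, indexed by timestamp) and return kJ."""
    power = power.dropna()
    # Seconds elapsed between consecutive rows (NaN for the first row)
    dt = power.index.to_series().diff().dt.total_seconds()
    # Average power over each interval
    mean_power = (power + power.shift()) / 2
    # Sum of trapezoid areas in joules, converted to kilojoules
    return float((mean_power * dt).sum() / 1000)


# Check 1: a constant 60 W over one hour is 60 * 3600 J = 216 kJ
t_even = pd.date_range("2024-01-01 01:00:00", "2024-01-01 02:00:00", freq="15s")
constant = pd.Series(60.0, index=t_even)
print(trapezoid_energy_kj(constant))  # 216.0

# Check 2: unevenly spaced samples of a linear ramp P = t (t in seconds);
# the exact integral from 0 to 40 s is 40**2 / 2 = 800 J = 0.8 kJ
t_uneven = pd.to_datetime(
    ["2024-01-01 01:00:00", "2024-01-01 01:00:10", "2024-01-01 01:00:40"]
)
ramp = pd.Series([0.0, 10.0, 40.0], index=t_uneven)
print(trapezoid_energy_kj(ramp))  # 0.8
```

Because each interval uses its own time difference, the helper does not require evenly spaced rows, which matters for the forward-filled table.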

We obtain the following energy values using both methods:

**Forward filling:** 219.233 kJ
**Resampling:** 219.229 kJ

The difference is a mere 0.004 kJ, so both methods give effectively the same result here. In general, forward filling holds each reading until the next one arrives, so the filled signal lags the real one and the integral can come out slightly high or slightly low depending on whether the signal is falling or rising, whereas resampling averages the values within a defined window. Both techniques can effectively process time-series data from multiple sources, even when their timestamps do not align.
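How forward filling shifts the integral can be illustrated on a signal whose true integral is known. The sketch below is a toy example rather than part of the original analysis: it takes a linear ramp on a 10-second grid, keeps only every third reading, forward fills the gaps, and integrates. All names and values are hypothetical.

```python
import numpy as np
import pandas as pd

grid = pd.date_range("2024-01-01 01:00:00", periods=7, freq="10s")
elapsed = np.arange(7) * 10.0          # seconds since the first sample
keep = np.arange(7) % 3 == 0           # keep readings at 0 s, 30 s, and 60 s


def trapezoid(values: pd.Series) -> float:
    """Trapezoidal integral of a timestamp-indexed series, in value * seconds."""
    dt = values.index.to_series().diff().dt.total_seconds()
    return float((((values + values.shift()) / 2) * dt).sum())


def ffill_integral(true_signal: np.ndarray) -> float:
    """Drop the readings outside `keep`, forward fill, then integrate."""
    sparse = pd.Series(true_signal, index=grid).where(keep)
    return trapezoid(sparse.ffill())


# Both ramps have a true integral of 1800 over the 60-second span
rising = ffill_integral(elapsed)          # ramp from 0 up to 60
falling = ffill_integral(60.0 - elapsed)  # ramp from 60 down to 0

print(rising, falling)  # 1200.0 2400.0
```

On the rising ramp the held values trail below the true signal and the integral is too low; on the falling ramp the opposite happens. For a signal that wanders up and down, as in the simulated data, these errors largely cancel.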

## Conclusion

Thank you for exploring this tutorial on time-series analysis for unevenly spaced signals. A notebook with the examples from this article will be available at this GitHub repository.

I appreciate your reading! Feel free to connect with me on Twitter or LinkedIn for more updates and articles.