datamatrix.series

What are series?

A SeriesColumn is a column with a depth. For example, imagine a table that combines the names of two cities with their populations in four different years. Here, the names of the cities are single values that fit into a normal table. But the population corresponds to a series of values for each city. This is where the SeriesColumn comes in.

Example:

from matplotlib import pyplot as plt
from datamatrix import DataMatrix, SeriesColumn

NR_CITIES = 2
NR_YEARS = 4

dm = DataMatrix(length=NR_CITIES)
dm.city = 'Marseille', 'Lyon'
# Create a series for the population
dm.population = SeriesColumn(depth=NR_YEARS)
dm.population[0] = 850726, 850602, 851420, 797491 # Marseille
dm.population[1] = 484344, 479803, 474946, 445274 # Lyon
# Create a series for the years that correspond to the populations
dm.year = SeriesColumn(depth=NR_YEARS)
dm.year.setallrows([2010, 2009, 2008, 1999])

print(dm)

plt.clf()
for row in dm:
    plt.plot(row.year, row.population, 'o-', label=row.city)
plt.legend(loc='upper left')
plt.xlabel('Year')
plt.ylabel('Population')
plt.xlim(1998, 2011)
plt.ylim(400000, 1000000)
plt.savefig('content/pages/img/series/series.png')

Output:

+---+-----------+---------------------------------------+-------------------------------+
| # |    city   |               population              |              year             |
+---+-----------+---------------------------------------+-------------------------------+
| 0 | Marseille | [ 850726.  850602.  851420.  797491.] | [ 2010.  2009.  2008.  1999.] |
| 1 |    Lyon   | [ 484344.  479803.  474946.  445274.] | [ 2010.  2009.  2008.  1999.] |
+---+-----------+---------------------------------------+-------------------------------+

/pages/img/series/series.png

Figure 1. The populations of Marseille and Lyon over time.

Data of this kind is very common. For example, imagine a psychology experiment in which participants see positive or negative pictures, while their brain activity is recorded using electroencephalography (EEG). Here, picture type (positive or negative) is a single value that could be stored in a normal table. But EEG activity is a continuous signal, and could be stored as a SeriesColumn.

function baseline(series, baseline, bl_start=-100, bl_end=None, reduce_fnc=None, method=u'divisive')

Applies a baseline to a signal.

Example:

import numpy as np
from matplotlib import pyplot as plt
from datamatrix import DataMatrix, SeriesColumn, series

LENGTH = 5 # Number of rows
DEPTH = 10 # Depth (or length) of SeriesColumns

sinewave = np.sin(np.linspace(0, 2*np.pi, DEPTH))

dm = DataMatrix(length=LENGTH)
# First create five identical rows with a sinewave
dm.y = SeriesColumn(depth=DEPTH)
dm.y.setallrows(sinewave)
# Add a random offset to the Y values
dm.y += np.random.random(LENGTH)
# And also a bit of random jitter
dm.y += .2*np.random.random( (LENGTH, DEPTH) )
# Baseline-correct the traces. This will remove the vertical
# offset
dm.y2 = series.baseline(dm.y, dm.y, bl_start=0, bl_end=10,
                        method='subtractive')

plt.clf()
plt.subplot(121)
plt.title('Original')
plt.plot(dm.y.plottable)
plt.subplot(122)
plt.title('Baseline corrected')
plt.plot(dm.y2.plottable)
plt.savefig('content/pages/img/series/baseline.png')

/pages/img/series/baseline.png

Figure 2.

Arguments:

  • series -- The signal to apply a baseline to.
    • Type: SeriesColumn
  • baseline -- The signal to use as a baseline.
    • Type: SeriesColumn

Keywords:

  • bl_start -- The start of the window from baseline to use.
    • Type: int
    • Default: -100
  • bl_end -- The end of the window from baseline to use, or None to go to the end.
    • Type: int, None
    • Default: None
  • reduce_fnc -- The function to reduce the baseline epoch to a single value. If None, np.nanmedian() is used.
    • Type: FunctionType, None
    • Default: None
  • method -- Specifies whether divisive or subtractive correction should be used. Divisive is the default for historical reasons, but subtractive is generally preferred.
    • Type: str
    • Default: 'divisive'

Returns:

A baseline-corrected version of the signal.
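
The difference between the two correction methods can be illustrated on a single trace in plain Python. This is a minimal sketch, not the library's implementation: `baseline_correct` is a hypothetical helper, and it assumes that subtractive correction subtracts the reduced baseline value from every sample while divisive correction divides every sample by it. The median is used as the reducing function, in line with the documented default, but without any special handling of nan values.

```python
from statistics import median


def baseline_correct(trace, baseline, bl_start=0, bl_end=None,
                     method='subtractive'):
    """Hypothetical helper: baseline-correct one trace (a list of floats)."""
    # Reduce the baseline window to a single value
    bl = median(baseline[bl_start:bl_end])
    if method == 'subtractive':
        # Assumption: remove the offset by subtraction
        return [value - bl for value in trace]
    if method == 'divisive':
        # Assumption: express each sample as a proportion of the baseline
        return [value / bl for value in trace]
    raise ValueError('method should be subtractive or divisive')


trace = [10.0, 12.0, 11.0, 15.0, 18.0]
# Use the first three samples of the trace itself as the baseline window
sub = baseline_correct(trace, trace, bl_end=3, method='subtractive')
div = baseline_correct(trace, trace, bl_end=3, method='divisive')
print(sub)
print(div)
```

After subtractive correction the baseline window is centered on 0, whereas after divisive correction it is centered on 1.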

function blinkreconstruct(series, vt=5, maxdur=500, margin=10, smooth_winlen=21, std_thr=3)

Reconstructs pupil size during blinks. This algorithm has been designed and tested largely with the EyeLink 1000 eye tracker.

Source:

Arguments:

  • series -- A signal to reconstruct.
    • Type: SeriesColumn

Keywords:

  • vt -- A pupil velocity threshold. Lower thresholds more easily trigger blinks.
    • Type: int, float
    • Default: 5
  • maxdur -- The maximum duration (in samples) for a blink. Longer blinks are not reconstructed.
    • Type: int
    • Default: 500
  • margin -- The margin to take around missing data.
    • Type: int
    • Default: 10
  • smooth_winlen -- No description
    • Default: 21
  • std_thr -- No description
    • Default: 3

Returns:

A reconstructed signal.

  • Type: SeriesColumn
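
To give an intuition for the vt keyword, the plain-Python sketch below flags samples at which pupil size changes quickly. It is only an illustration of what a velocity threshold is, not the reconstruction algorithm itself: `fast_changes` is a hypothetical helper, and velocity is assumed here to be the absolute difference between successive samples.

```python
def fast_changes(pupil, vt=5):
    """Hypothetical helper: flag samples where the absolute change relative
    to the previous sample exceeds the velocity threshold vt."""
    flags = [False]  # The first sample has no predecessor
    for previous, current in zip(pupil, pupil[1:]):
        flags.append(abs(current - previous) > vt)
    return flags


# A made-up pupil trace with a blink-like dip in the middle
pupil = [100, 101, 100, 80, 20, 0, 0, 30, 90, 100, 101]
strict = fast_changes(pupil, vt=5)
lenient = fast_changes(pupil, vt=50)
print(sum(strict), sum(lenient))
```

The lower threshold flags more samples than the higher one, which is the sense in which lower thresholds more easily trigger blinks.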

function concatenate(*series)

Concatenates multiple series such that a new series is created with a depth that is equal to the sum of the depths of all input series.

Example:

from datamatrix import DataMatrix, SeriesColumn, series as srs

dm = DataMatrix(length=1)
dm.s1 = SeriesColumn(depth=3)
dm.s1[:] = 1,2,3
dm.s2 = SeriesColumn(depth=3)
dm.s2[:] = 3,2,1
dm.s = srs.concatenate(dm.s1, dm.s2)
print(dm.s)

Output:

col[[ 1.  2.  3.  3.  2.  1.]]

Argument list:

  • *series: A list of series.

Returns:

A new series.

  • Type: SeriesColumn

function downsample(series, by, fnc=)

Downsamples a series by a factor, so that it becomes 'by' times shorter. The depth of the downsampled series is the depth of the original series divided by 'by', rounded down. For example, downsampling a series with a depth of 10 by 3 results in a depth of 3.

Example:

import numpy as np
from matplotlib import pyplot as plt
from datamatrix import DataMatrix, SeriesColumn, series

LENGTH = 1 # Number of rows
DEPTH = 100 # Depth (or length) of SeriesColumns

sinewave = np.sin(np.linspace(0, 2*np.pi, DEPTH))

dm = DataMatrix(length=LENGTH)
dm.y = SeriesColumn(depth=DEPTH)
dm.y.setallrows(sinewave)
dm.y2 = series.downsample(dm.y, by=10)

plt.clf()
plt.subplot(121)
plt.title('Original')
plt.plot(dm.y.plottable, 'o-')
plt.subplot(122)
plt.title('Downsampled')
plt.plot(dm.y2.plottable, 'o-')
plt.savefig('content/pages/img/series/downsample.png')

/pages/img/series/downsample.png

Figure 3.

Arguments:

  • series -- The signal to downsample.
    • Type: SeriesColumn
  • by -- The downsampling factor.
    • Type: int

Keywords:

  • fnc -- The function to average the samples that are combined into 1 value. Typically an average or a median.
    • Type: callable
    • Default:

Returns:

A downsampled series.

  • Type: SeriesColumn
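
The depth arithmetic described above can be sketched in plain Python for a single trace. This is an illustration under assumptions rather than the library's code: the local `downsample` function is hypothetical, it uses the mean to combine samples, and it assumes that samples left over after the last complete chunk are dropped.

```python
from statistics import mean


def downsample(trace, by, fnc=mean):
    """Hypothetical helper: combine every 'by' samples into one value."""
    new_depth = len(trace) // by  # Depth divided by 'by', rounded down
    return [fnc(trace[i * by:(i + 1) * by]) for i in range(new_depth)]


trace = list(range(10))  # Depth of 10
short = downsample(trace, by=3)
# Three complete chunks of three samples; the tenth sample is left over
print(len(short), short)
```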

function endlock(series)

Locks a series to the end, so that any nan-values that were at the end are moved to the start.

Example:

import numpy as np
from matplotlib import pyplot as plt
from datamatrix import DataMatrix, SeriesColumn, series

LENGTH = 5 # Number of rows
DEPTH = 10 # Depth (or length) of SeriesColumns

sinewave = np.sin(np.linspace(0, 2*np.pi, DEPTH))

dm = DataMatrix(length=LENGTH)
# First create five identical rows with a sinewave
dm.y = SeriesColumn(depth=DEPTH)
dm.y.setallrows(sinewave)
# Add a random offset to the Y values
dm.y += np.random.random(LENGTH)
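# Set some observations at the end to nan. Slicing from DEPTH-i (rather
# than -i) leaves the first row intact, because y[-0:] selects everything.
for i, row in enumerate(dm):
    row.y[DEPTH-i:] = np.nan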
# Lock the degraded traces to the end, so that all nans
# now come at the start of the trace
dm.y2 = series.endlock(dm.y)

plt.clf()
plt.subplot(121)
plt.title('Original (nans at end)')
plt.plot(dm.y.plottable)
plt.subplot(122)
plt.title('Endlocked (nans at start)')
plt.plot(dm.y2.plottable)
plt.savefig('content/pages/img/series/endlock.png')

/pages/img/series/endlock.png

Figure 4.

Arguments:

  • series -- The signal to end-lock.
    • Type: SeriesColumn

Returns:

An end-locked signal.

  • Type: SeriesColumn
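
What end-locking does to a single trace can be sketched in plain Python. The local `endlock` function below is a hypothetical stand-in that works on a list of floats, not the library function, and it assumes that only the nan values at the very end of the trace are moved.

```python
import math


def endlock(trace):
    """Hypothetical helper: move trailing nan values to the start."""
    n_trailing = 0
    for value in reversed(trace):
        if not math.isnan(value):
            break
        n_trailing += 1
    if n_trailing == 0:
        return list(trace)
    # Put the trailing nans in front of the remaining samples
    return trace[-n_trailing:] + trace[:-n_trailing]


trace = [1.0, 2.0, 3.0, math.nan, math.nan]
locked = endlock(trace)
print(locked)
```

The valid samples keep their order and now end at the last position, so that traces of different effective lengths are aligned at their ends.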

function interpolate(series)

Linearly interpolates missing (nan) data.

Example:

import numpy as np
from matplotlib import pyplot as plt
from datamatrix import DataMatrix, SeriesColumn, series

LENGTH = 1 # Number of rows
DEPTH = 100 # Depth (or length) of SeriesColumns
MISSING = 50 # Nr of missing samples

# Create a sine wave with missing data
sinewave = np.sin(np.linspace(0, 2*np.pi, DEPTH))
sinewave[np.random.choice(np.arange(DEPTH), MISSING)] = np.nan
# And turn this into a DataMatrix
dm = DataMatrix(length=LENGTH)
dm.y = SeriesColumn(depth=DEPTH)
dm.y = sinewave
# Now interpolate the missing data!
dm.i = series.interpolate(dm.y)

# And plot the original data as circles and the interpolated data as dotted
# lines
plt.clf()
plt.plot(dm.i.plottable, ':')
plt.plot(dm.y.plottable, 'o')
plt.savefig('content/pages/img/series/interpolate.png')

/pages/img/series/interpolate.png

Figure 5.

Arguments:

  • series -- A signal to interpolate.
    • Type: SeriesColumn

Returns:

The interpolated signal.

  • Type: SeriesColumn
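
Linear interpolation of missing data can be sketched for a single trace in plain Python. This is an illustration, not the library's implementation: the local `interpolate` function is hypothetical, and it assumes that nan values before the first or after the last valid sample are left untouched, because there is nothing to interpolate between.

```python
import math


def interpolate(trace):
    """Hypothetical helper: fill nan gaps between valid samples linearly."""
    out = list(trace)
    valid = [i for i, value in enumerate(trace) if not math.isnan(value)]
    # Walk over each pair of neighboring valid samples and fill the gap
    for left, right in zip(valid, valid[1:]):
        step = (trace[right] - trace[left]) / (right - left)
        for i in range(left + 1, right):
            out[i] = trace[left] + step * (i - left)
    return out


trace = [0.0, math.nan, math.nan, 3.0, math.nan, 5.0]
filled = interpolate(trace)
print(filled)
```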

function lock(series, lock)

Shifts each row from a series by a certain number of steps along its depth. This is useful to lock, or align, a series based on a sequence of values.

Example:

import numpy as np
from matplotlib import pyplot as plt
from datamatrix import DataMatrix, SeriesColumn, series as srs

LENGTH = 5 # Number of rows
DEPTH = 10 # Depth (or length) of SeriesColumns

dm = DataMatrix(length=LENGTH)
# First create five traces with a partial cosinewave. Each row is
# offset slightly on the x and y axes
dm.y = SeriesColumn(depth=DEPTH)
dm.x_offset = -1
dm.y_offset = -1
for row in dm:
    row.x_offset = np.random.randint(0, DEPTH)
    row.y_offset = np.random.random()
    row.y = np.roll(np.cos(np.linspace(0, np.pi, DEPTH)),
                    row.x_offset) + row.y_offset
# Now use the x offset to lock the traces to the 0 point of the cosine,
# i.e. to their peaks.
dm.y2, zero_point = srs.lock(dm.y, lock=dm.x_offset)

plt.clf()
plt.subplot(121)
plt.title('Original')
plt.plot(dm.y.plottable)
plt.subplot(122)
plt.title('Locked to peak')
plt.plot(dm.y2.plottable)
plt.axvline(zero_point, color='black', linestyle=':')
plt.savefig('content/pages/img/series/lock.png')

/pages/img/series/lock.png

Figure 6.

Arguments:

  • series -- The signal to lock.
    • Type: SeriesColumn
  • lock -- A sequence of lock values with the same length as the Series. This can be a column, a list, a numpy array, etc.

Returns:

A (series, zero_point) tuple, in which series is a SeriesColumn and zero_point is the zero point to which the signal has been locked.
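
The idea of locking can be sketched in plain Python with lists instead of a SeriesColumn. This is a sketch under assumptions, not the library's code: the local `lock` function is hypothetical, the zero point is assumed to be the largest lock value, and the locked rows are assumed to be padded with nan so that every row fits after shifting.

```python
import math


def lock(rows, lock_points):
    """Hypothetical helper: shift rows so that their lock points line up."""
    zero_point = max(lock_points)
    # Assumption: the result is deep enough to hold every shifted row
    depth = len(rows[0]) + zero_point - min(lock_points)
    locked = []
    for row, lock_point in zip(rows, lock_points):
        shift = zero_point - lock_point
        padded = [math.nan] * depth
        padded[shift:shift + len(row)] = row
        locked.append(padded)
    return locked, zero_point


# Two rows with a peak (9) at index 2 and index 0, respectively
rows = [[0, 1, 9, 1], [9, 1, 0, 0]]
locked, zero_point = lock(rows, [2, 0])
print(zero_point)
for row in locked:
    print(row)
```

After locking, both peaks sit at the index given by the zero point.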

function reduce_(series, operation=)

Transforms series to single values by applying an operation (typically a mean) to each series.

Example:

import numpy as np
from datamatrix import DataMatrix, SeriesColumn, series

LENGTH = 5 # Number of rows
DEPTH = 10 # Depth (or length) of SeriesColumns

dm = DataMatrix(length=LENGTH)
dm.y = SeriesColumn(depth=DEPTH)
dm.y = np.random.random( (LENGTH, DEPTH) )
dm.mean_y = series.reduce_(dm.y)

print(dm)

Output:

+---+-------------------------------------------------------+----------------+
| # |                           y                           |     mean_y     |
+---+-------------------------------------------------------+----------------+
| 0 | [ 0.23848985  0.27589314 ...  0.54389968  0.3990454 ] | 0.383525721162 |
| 1 | [ 0.37327191  0.61277239 ...  0.71862324  0.80175957] | 0.537008559895 |
| 2 | [ 0.23461541  0.13423965 ...  0.45392644  0.28868331] | 0.451322478327 |
| 3 | [ 0.22869476  0.55976666 ...  0.43865862  0.2388049 ] | 0.321977365512 |
| 4 | [ 0.03027954  0.05928762 ...  0.59054909  0.08383194] | 0.425470406609 |
+---+-------------------------------------------------------+----------------+

Arguments:

  • series -- The signal to reduce.
    • Type: SeriesColumn

Keywords:

  • operation -- The operation function to use for the reduction. This function should accept series as first argument, and axis=1 as keyword argument.
    • Default:

Returns:

A reduction of the signal.

  • Type: FloatColumn

function smooth(series, winlen=11, wintype=u'hanning')

Smooths a signal using a window with requested size.

This method is based on the convolution of a scaled window with the signal. The signal is prepared by introducing reflected copies of the signal (with the window size) at both ends so that transient parts are minimized in the beginning and end part of the output signal.

Adapted from:

Example:

import numpy as np
from matplotlib import pyplot as plt
from datamatrix import DataMatrix, SeriesColumn, series

LENGTH = 5 # Number of rows
DEPTH = 100 # Depth (or length) of SeriesColumns

sinewave = np.sin(np.linspace(0, 2*np.pi, DEPTH))

dm = DataMatrix(length=LENGTH)
# First create five identical rows with a sinewave
dm.y = SeriesColumn(depth=DEPTH)
dm.y.setallrows(sinewave)
# And add a bit of random jitter
dm.y += np.random.random( (LENGTH, DEPTH) )
# Smooth the traces to reduce the jitter
dm.y2 = series.smooth(dm.y)

plt.clf()
plt.subplot(121)
plt.title('Original')
plt.plot(dm.y.plottable)
plt.subplot(122)
plt.title('Smoothed')
plt.plot(dm.y2.plottable)
plt.savefig('content/pages/img/series/smooth.png')

/pages/img/series/smooth.png

Figure 7.

Arguments:

  • series -- A signal to smooth.
    • Type: SeriesColumn

Keywords:

  • winlen -- The width of the smoothing window. This should be an odd integer.
    • Type: int
    • Default: 11
  • wintype -- The type of window from 'flat', 'hanning', 'hamming', 'bartlett', 'blackman'. A flat window produces a moving average smoothing.
    • Type: str
    • Default: 'hanning'

Returns:

A smoothed signal.

  • Type: SeriesColumn
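
The approach described above, reflecting the signal at both ends and then convolving it with a scaled window, can be sketched in plain Python for the 'flat' window type. This is a simplified illustration, not the library's implementation: `smooth_flat` is a hypothetical helper, it pads with only half a window on each side, and it assumes that winlen is odd and smaller than the trace.

```python
def smooth_flat(trace, winlen=3):
    """Hypothetical helper: moving average with reflected edges."""
    half = winlen // 2
    # Reflect the signal at both ends (without repeating the edge sample)
    padded = trace[half:0:-1] + trace + trace[-2:-half - 2:-1]
    # A flat window, scaled so that its weights sum to 1
    window = [1.0 / winlen] * winlen
    return [
        sum(w * v for w, v in zip(window, padded[i:i + winlen]))
        for i in range(len(trace))
    ]


spike = [0.0, 0.0, 3.0, 0.0, 0.0]
smoothed = smooth_flat(spike, winlen=3)
print(smoothed)
```

The spike is spread out over three samples, while the output keeps the same depth as the input.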

function threshold(series, fnc, min_length=1)

Finds samples that satisfy some threshold criterion for a given period.

Example:

import numpy as np
from matplotlib import pyplot as plt
from datamatrix import DataMatrix, SeriesColumn, series

LENGTH = 1 # Number of rows
DEPTH = 100 # Depth (or length) of SeriesColumns

sinewave = np.sin(np.linspace(0, 2*np.pi, DEPTH))

dm = DataMatrix(length=LENGTH)
# First create a single row with a sinewave
dm.y = SeriesColumn(depth=DEPTH)
dm.y.setallrows(sinewave)
# And add a bit of random jitter
dm.y += np.random.random( (LENGTH, DEPTH) )
# Threshold the signal by > 0 for at least 10 samples
dm.t = series.threshold(dm.y, fnc=lambda y: y > 0, min_length=10)

plt.clf()
# Mark the thresholded signal
plt.fill_between(np.arange(DEPTH), dm.t[0], color='black', alpha=.25)
plt.plot(dm.y.plottable)
plt.savefig('content/pages/img/series/threshold.png')

print(dm)

Output:

+---+-------------------------------------------------------+-----------------------+
| # |                           y                           |           t           |
+---+-------------------------------------------------------+-----------------------+
| 0 | [ 0.77613174  0.89270501 ...  0.41222954  0.37243534] | [ 1.  1. ...  0.  0.] |
+---+-------------------------------------------------------+-----------------------+

/pages/img/series/threshold.png

Figure 8.

Arguments:

  • series -- A signal to threshold.
    • Type: SeriesColumn
  • fnc -- A function that takes a single value and returns True if this value exceeds a threshold, and False otherwise.
    • Type: FunctionType

Keywords:

  • min_length -- The minimum number of samples for which fnc must return True.
    • Type: int
    • Default: 1

Returns:

A series where 0 indicates below threshold, and 1 indicates above threshold.

  • Type: SeriesColumn
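
The role of min_length can be sketched in plain Python for a single trace. This is an illustration, not the library's implementation: the local `threshold` function is hypothetical, and it assumes that a run of consecutive samples for which fnc returns True is marked only if the run is at least min_length samples long.

```python
def threshold(trace, fnc, min_length=1):
    """Hypothetical helper: mark runs of hits of at least min_length."""
    out = [0] * len(trace)
    run_start = None
    # Iterate one step past the end so that a run at the end is also closed
    for i in range(len(trace) + 1):
        hit = i < len(trace) and fnc(trace[i])
        if hit and run_start is None:
            run_start = i
        elif not hit and run_start is not None:
            if i - run_start >= min_length:
                out[run_start:i] = [1] * (i - run_start)
            run_start = None
    return out


trace = [1, 1, -1, 1, 1, 1, -1]
short_runs = threshold(trace, lambda y: y > 0, min_length=1)
long_runs = threshold(trace, lambda y: y > 0, min_length=3)
print(short_runs)
print(long_runs)
```

With min_length=3, the first run of two positive samples is too short to be marked, whereas the second run of three samples is.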

function window(series, start=0, end=None)

Extracts a window from a signal.

Example:

import numpy as np
from matplotlib import pyplot as plt
from datamatrix import DataMatrix, SeriesColumn, series

LENGTH = 5 # Number of rows
DEPTH = 10 # Depth (or length) of SeriesColumns

sinewave = np.sin(np.linspace(0, 2*np.pi, DEPTH))

dm = DataMatrix(length=LENGTH)
# First create five identical rows with a sinewave
dm.y = SeriesColumn(depth=DEPTH)
dm.y.setallrows(sinewave)
# Add a random offset to the Y values
dm.y += np.random.random(LENGTH)
# Look only at the middle half of the signal
dm.y2 = series.window(dm.y, start=DEPTH//4, end=-DEPTH//4)

plt.clf()
plt.subplot(121)
plt.title('Original')
plt.plot(dm.y.plottable)
plt.subplot(122)
plt.title('Window (middle half)')
plt.plot(dm.y2.plottable)
plt.savefig('content/pages/img/series/window.png')

/pages/img/series/window.png

Figure 9.

Arguments:

  • series -- The signal to get a window from.
    • Type: SeriesColumn

Keywords:

  • start -- The window start.
    • Type: int
    • Default: 0
  • end -- The window end, or None to go to the signal end.
    • Type: int, None
    • Default: None

Returns:

A window of the signal.

  • Type: SeriesColumn
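
The start and end values used in the example deserve a closer look. The plain-Python sketch below assumes that start and end behave like regular slice indices along the depth of the series, which the source does not state explicitly. It shows that -DEPTH//4 is not the same as -(DEPTH//4) when DEPTH is not a multiple of 4, because the minus sign is applied before the floor division.

```python
DEPTH = 10
trace = list(range(DEPTH))

start = DEPTH // 4           # 2
end_floor = -DEPTH // 4      # (-10) // 4, rounded down to -3
end_symmetric = -(DEPTH // 4)  # -(10 // 4), which is -2

# Assumption: a window corresponds to a slice along the depth
window_floor = trace[start:end_floor]
window_symmetric = trace[start:end_symmetric]
print(window_floor)
print(window_symmetric)
```

With a depth of 10, the first form drops three samples at the end and keeps five, whereas the second form drops two samples on each side and keeps six.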