datamatrix.operations

A set of operations to apply to columns and DataMatrix objects.

function auto_type(dm)

This modifies the DataMatrix in place.

Converts all columns of type MixedColumn to IntColumn if all values are integers, or to FloatColumn if all values are non-integer numbers.

Example:

from datamatrix import DataMatrix, operations

dm = DataMatrix(length=5)
dm.A = 'a'
dm.B = 1
dm.C = 1.1
operations.auto_type(dm)
print('dm.A: %s' % type(dm.A))
print('dm.B: %s' % type(dm.B))
print('dm.C: %s' % type(dm.C))

Output:

dm.A: <class 'datamatrix._datamatrix._mixedcolumn.MixedColumn'>
dm.B: <class 'datamatrix._datamatrix._numericcolumn.IntColumn'>
dm.C: <class 'datamatrix._datamatrix._numericcolumn.FloatColumn'>

Arguments:

  • dm -- The DataMatrix whose columns are converted.
    • Type: DataMatrix

function bin_split(col, bins)

Splits a DataMatrix into bins; that is, the DataMatrix is first sorted by a column, and then split into equal-size (or roughly equal-size) bins.

Example:

from datamatrix import DataMatrix, operations

dm = DataMatrix(length=5)
dm.A = 1, 0, 3, 2, 4
dm.B = 'a', 'b', 'c', 'd', 'e'
for bin_nr, bin_dm in enumerate(operations.bin_split(dm.A, bins=3)):
    print('bin %d' % bin_nr)
    print(bin_dm)

Output:

bin 0
+---+---+---+
| # | A | B |
+---+---+---+
| 1 | 0 | b |
+---+---+---+
bin 1
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | 1 | a |
| 3 | 2 | d |
+---+---+---+
bin 2
+---+---+---+
| # | A | B |
+---+---+---+
| 2 | 3 | c |
| 4 | 4 | e |
+---+---+---+

Arguments:

  • col -- The column to split by.
    • Type: BaseColumn
  • bins -- The number of bins.
    • Type: int

Returns:

A generator that iterates over the bins.
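
The sort-then-slice idea behind bin_split can be sketched in plain Python, without DataMatrix. The helper name bin_split_sketch is hypothetical, and the boundary rule (cumulative row count, rounded down) is an assumption chosen because it reproduces the bin sizes 1, 2, 2 from the example above; the library's actual rule may differ.

```python
def bin_split_sketch(values, bins):
    """Yield lists of row indices, ordered by value and cut into bins.

    Hypothetical helper for illustration; not part of datamatrix.
    """
    if bins < 1 or bins > len(values):
        raise ValueError('bins must be between 1 and the number of rows')
    # Row indices ordered by the value in the sort column
    order = sorted(range(len(values)), key=lambda i: values[i])
    start = 0
    for b in range(1, bins + 1):
        # Assumed boundary rule: cumulative row count, rounded down
        end = len(order) * b // bins
        yield order[start:end]
        start = end


A = [1, 0, 3, 2, 4]
for bin_nr, rows in enumerate(bin_split_sketch(A, bins=3)):
    print('bin %d: rows %s' % (bin_nr, rows))
```

The row indices in each bin correspond to the # column of the tables above.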

function fullfactorial(dm, ignore=u'')

Requires numpy

Creates a new DataMatrix that uses a specified DataMatrix as the base of a full-factorial design. That is, each value of every column is combined with each value from every other column.

Example:

from datamatrix import DataMatrix, operations

dm = DataMatrix(length=2)
dm.A = 'x', 'y'
dm.B = 3, 4
dm = operations.fullfactorial(dm)
print(dm)

Output:

+---+---+---+
| # | A | B |
+---+---+---+
| 0 | x | 3 |
| 1 | y | 3 |
| 2 | x | 4 |
| 3 | y | 4 |
+---+---+---+

Arguments:

  • dm -- The source DataMatrix.
    • Type: DataMatrix

Keywords:

  • ignore -- A value that should be ignored.
    • Default: ''
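
As a rough illustration of what a full-factorial design means, the combinations can be generated with itertools.product from the standard library. The helper name fullfactorial_sketch is hypothetical, the ignore keyword is left out, and the ordering (first column varies fastest) is chosen only to match the table above; none of this is taken from the library's implementation.

```python
import itertools


def fullfactorial_sketch(columns):
    """Return every combination of values across the given columns.

    `columns` maps a column name to its list of values. Hypothetical
    helper for illustration; not part of datamatrix.
    """
    names = list(columns)
    # Iterate over the columns in reverse order and flip each combination
    # back, so that the first column varies fastest
    combos = itertools.product(*(columns[name] for name in reversed(names)))
    return [dict(zip(names, reversed(combo))) for combo in combos]


for row in fullfactorial_sketch({'A': ['x', 'y'], 'B': [3, 4]}):
    print(row)
```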

function group(dm, by)

Requires numpy

Groups the DataMatrix by unique values in a set of grouping columns. Grouped columns are stored as SeriesColumns. The columns that are grouped should contain numeric values.

Example:

from datamatrix import DataMatrix, operations

dm = DataMatrix(length=4)
dm.A = 'x', 'x', 'y', 'y'
dm.B = 0, 1, 2, 3
print('Original:')
print(dm)
dm = operations.group(dm, by=dm.A)
print('Grouped by A:')
print(dm)

Output:

Original:
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | x | 0 |
| 1 | x | 1 |
| 2 | y | 2 |
| 3 | y | 3 |
+---+---+---+
Grouped by A:
+---+-----------+---+
| # |     B     | A |
+---+-----------+---+
| 0 | [ 0.  1.] | x |
| 1 | [ 2.  3.] | y |
+---+-----------+---+

Arguments:

  • dm -- The DataMatrix to group.
    • Type: DataMatrix
  • by -- A column or list of columns to group by.
    • Type: BaseColumn, list

Returns:

A grouped DataMatrix.

  • Type: DataMatrix
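
Conceptually, grouping collects all values that share a key into one series per key. The following plain-Python sketch shows that idea with a dictionary; group_sketch is a hypothetical name, and the sketch does not use SeriesColumn or numpy as the library does.

```python
def group_sketch(keys, values):
    """Collect the values that share a key into one list per key.

    Hypothetical helper for illustration; not part of datamatrix.
    """
    groups = {}
    for key, value in zip(keys, values):
        # Start a new list the first time a key is seen
        groups.setdefault(key, []).append(value)
    return groups


print(group_sketch(['x', 'x', 'y', 'y'], [0, 1, 2, 3]))
# prints {'x': [0, 1], 'y': [2, 3]}
```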

function keep_only(dm, cols=[])

This modifies the DataMatrix in place.

Removes all columns from the DataMatrix, except those listed in cols.

Example:

from datamatrix import DataMatrix, operations

dm = DataMatrix(length=5)
dm.A = 'a', 'b', 'c', 'd', 'e'
dm.B = range(5)
operations.keep_only(dm, [dm.A])
print(dm)

Output:

+---+---+
| # | A |
+---+---+
| 0 | a |
| 1 | b |
| 2 | c |
| 3 | d |
| 4 | e |
+---+---+

Arguments:

  • dm -- The DataMatrix from which columns are removed.
    • Type: DataMatrix

Keywords:

  • cols -- A list of column names, or columns.
    • Type: list
    • Default: []

function shuffle(obj)

Shuffles a DataMatrix or a column. If a DataMatrix is shuffled, the order of the rows is shuffled, but values that were in the same row will stay in the same row.

Example:

from datamatrix import DataMatrix, operations

dm = DataMatrix(length=5)
dm.A = 'a', 'b', 'c', 'd', 'e'
dm.B = operations.shuffle(dm.A)
print(dm)

Output:

+---+---+---+
| # | A | B |
+---+---+---+
| 0 | a | e |
| 1 | b | c |
| 2 | c | d |
| 3 | d | b |
| 4 | e | a |
+---+---+---+

Arguments:

  • obj -- The DataMatrix or column to shuffle.
    • Type: DataMatrix, BaseColumn

Returns:

The shuffled DataMatrix or column.

  • Type: DataMatrix, BaseColumn
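
The statement that values from the same row stay together when a DataMatrix is shuffled can be illustrated with a list of row tuples. This is a minimal sketch using the standard random module; shuffle_rows_sketch and its seed parameter are hypothetical and not part of datamatrix.

```python
import random


def shuffle_rows_sketch(rows, seed=None):
    """Return the rows in random order, leaving each row intact.

    Hypothetical helper for illustration; not part of datamatrix.
    """
    rng = random.Random(seed)
    shuffled = list(rows)  # copy, so the input is not modified
    rng.shuffle(shuffled)
    return shuffled


rows = list(zip('abcde', range(5)))
for row in shuffle_rows_sketch(rows):
    # Each letter is still paired with its original number
    print(row)
```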

function shuffle_horiz(*obj)

Shuffles a DataMatrix, or several columns from a DataMatrix, horizontally. That is, within each row, the values are shuffled between the columns.

Example:

from datamatrix import DataMatrix, operations

dm = DataMatrix(length=5)
dm.A = 'a', 'b', 'c', 'd', 'e'
dm.B = range(5)
dm = operations.shuffle_horiz(dm.A, dm.B)
print(dm)

Output:

+---+---+---+
| # | A | B |
+---+---+---+
| 0 | a | 0 |
| 1 | b | 1 |
| 2 | 2 | c |
| 3 | d | 3 |
| 4 | 4 | e |
+---+---+---+

Argument list:

  • *obj: A list of BaseColumns, or a single DataMatrix.

Returns:

The shuffled DataMatrix.

  • Type: DataMatrix
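
Horizontal shuffling can be pictured as shuffling the cells of each row independently, so that a value may change column but never changes row. A minimal sketch, assuming rows are given as tuples; shuffle_horiz_sketch and its seed parameter are hypothetical names.

```python
import random


def shuffle_horiz_sketch(rows, seed=None):
    """Shuffle the cells within each row independently.

    Hypothetical helper for illustration; not part of datamatrix.
    """
    rng = random.Random(seed)
    result = []
    for row in rows:
        cells = list(row)
        rng.shuffle(cells)  # reorder the cells of this row only
        result.append(tuple(cells))
    return result


rows = list(zip('abcde', range(5)))
for row in shuffle_horiz_sketch(rows):
    print(row)
```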

function sort(obj, by=None)

Sorts a column or DataMatrix. When sorting a DataMatrix, the by column must be specified to determine the sort order. When sorting a column, by is only needed if the column should be sorted by the values of another column.

Example:

from datamatrix import DataMatrix, operations

dm = DataMatrix(length=3)
dm.A = 2, 0, 1
dm.B = 'a', 'b', 'c'
dm = operations.sort(dm, by=dm.A)
print(dm)

Output:

+---+---+---+
| # | A | B |
+---+---+---+
| 1 | 0 | b |
| 2 | 1 | c |
| 0 | 2 | a |
+---+---+---+

Arguments:

  • obj -- The DataMatrix or column to sort.
    • Type: DataMatrix, BaseColumn

Keywords:

  • by -- The sort key, that is, the column whose values determine the order of the DataMatrix or of the column being sorted.
    • Type: BaseColumn
    • Default: None

Returns:

The sorted DataMatrix, or the sorted column.

  • Type: DataMatrix, BaseColumn
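
Sorting one column by another amounts to computing the row order from the key column and applying it to the values. A plain-Python sketch of that idea follows; sort_by_sketch is a hypothetical name and the code is not taken from the library.

```python
def sort_by_sketch(values, by=None):
    """Return values ordered by the key list `by` (or by themselves).

    Hypothetical helper for illustration; not part of datamatrix.
    """
    key = values if by is None else by
    # Row order implied by the key column
    order = sorted(range(len(values)), key=lambda i: key[i])
    return [values[i] for i in order]


print(sort_by_sketch([2, 0, 1]))                      # prints [0, 1, 2]
print(sort_by_sketch(['a', 'b', 'c'], by=[2, 0, 1]))  # prints ['b', 'c', 'a']
```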

function split(col)

Splits a DataMatrix by unique values in a column.

Example:

from datamatrix import DataMatrix, operations

dm = DataMatrix(length=4)
dm.A = 0, 0, 1, 1
dm.B = 'a', 'b', 'c', 'd'
for value, sub_dm in operations.split(dm.A):
    print('col.A = %s' % value)
    print(sub_dm)

Output:

col.A = 0
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | 0 | a |
| 1 | 0 | b |
+---+---+---+
col.A = 1
+---+---+---+
| # | A | B |
+---+---+---+
| 2 | 1 | c |
| 3 | 1 | d |
+---+---+---+

Arguments:

  • col -- The column to split by.
    • Type: BaseColumn

Returns:

An iterator over (value, DataMatrix) tuples.

  • Type: Iterator
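
Splitting by unique values can be sketched as yielding, for each distinct value, the indices of the rows that hold it. The helper split_sketch is hypothetical, and iterating over the values in sorted order is an assumption made for a predictable result.

```python
def split_sketch(values):
    """Yield (value, row_indices) for each unique value.

    Hypothetical helper for illustration; not part of datamatrix.
    """
    # Assumption: unique values are visited in sorted order
    for value in sorted(set(values)):
        yield value, [i for i, v in enumerate(values) if v == value]


for value, rows in split_sketch([0, 0, 1, 1]):
    print('value %s: rows %s' % (value, rows))
```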

function tuple_split(col, *values)

Splits a DataMatrix by values in a column, and returns the split as a tuple of DataMatrix objects.

Example:

from datamatrix import DataMatrix, operations

dm = DataMatrix(length=4)
dm.A = 0, 0, 1, 1
dm.B = 'a', 'b', 'c', 'd'
dm0, dm1 = operations.tuple_split(dm.A, 0, 1)
print('dm.A = 0')
print(dm0)
print('dm.A = 1')
print(dm1)

Output:

dm.A = 0
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | 0 | a |
| 1 | 0 | b |
+---+---+---+
dm.A = 1
+---+---+---+
| # | A | B |
+---+---+---+
| 2 | 1 | c |
| 3 | 1 | d |
+---+---+---+

Arguments:

  • col -- The column to split by.
    • Type: BaseColumn

Argument list:

  • *values: The values to split by.

Returns:

A tuple of DataMatrix objects.

function weight(col)

Weights a DataMatrix by a column. That is, each row from a DataMatrix is repeated as many times as the value in the weighting column.

Example:

from datamatrix import DataMatrix, operations

dm = DataMatrix(length=3)
dm.A = 1, 2, 0
dm.B = 'x', 'y', 'z'
print('Original:')
print(dm)
dm = operations.weight(dm.A)
print('Weighted by A:')
print(dm)

Output:

Original:
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | 1 | x |
| 1 | 2 | y |
| 2 | 0 | z |
+---+---+---+
Weighted by A:
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | 1 | x |
| 1 | 2 | y |
| 2 | 2 | y |
+---+---+---+

Arguments:

  • col -- The column to weight by.
    • Type: BaseColumn

Returns:

The weighted DataMatrix.

  • Type: DataMatrix
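
Weighting repeats each row as many times as its weight, so a weight of 0 drops the row. A plain-Python sketch of that rule, using a hypothetical helper named weight_sketch and assuming non-negative integer weights:

```python
def weight_sketch(weights, rows):
    """Repeat each row as many times as its weight.

    Hypothetical helper for illustration; not part of datamatrix.
    """
    weighted = []
    for w, row in zip(weights, rows):
        # A weight of 0 contributes no copies of the row
        weighted.extend([row] * w)
    return weighted


print(weight_sketch([1, 2, 0], ['x', 'y', 'z']))
# prints ['x', 'y', 'y']
```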

function z(col)

Transforms a column into z scores.

Example:

from datamatrix import DataMatrix, operations

dm = DataMatrix(length=5)
dm.col = range(5)
dm.z = operations.z(dm.col)
print(dm)

Output:

+---+-----+-----------------+
| # | col |        z        |
+---+-----+-----------------+
| 0 |  0  |  -1.26491106407 |
| 1 |  1  | -0.632455532034 |
| 2 |  2  |       0.0       |
| 3 |  3  |  0.632455532034 |
| 4 |  4  |  1.26491106407  |
+---+-----+-----------------+

Arguments:

  • col -- The column to transform.
    • Type: BaseColumn

Returns:

The column transformed into z scores.

  • Type: BaseColumn
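
A z score is the value minus the mean, divided by the standard deviation. The sketch below uses the standard statistics module; z_sketch is a hypothetical name. Using the sample standard deviation (n - 1 in the denominator) is an inference from the numbers in the output above, not something the documentation states.

```python
import statistics


def z_sketch(values):
    """Return (value - mean) / standard deviation for each value.

    Hypothetical helper for illustration; not part of datamatrix.
    """
    mean = statistics.mean(values)
    # Sample standard deviation (n - 1), inferred from the example output
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


for v, z in zip(range(5), z_sketch(list(range(5)))):
    print('%d -> %.6f' % (v, z))
```

For the values 0 to 4, the mean is 2 and the sample standard deviation is the square root of 2.5, so the first z score is -2 / 1.5811, or about -1.2649.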