datamatrix.operations

A set of operations to apply to columns and DataMatrix objects.

function auto_type(dm)

Requires fastnumbers

Converts all columns of type MixedColumn to IntColumn if all values are integer numbers, or to FloatColumn if all values are non-integer numbers.

from datamatrix import DataMatrix, operations

dm = DataMatrix(length=5)
dm.A = 'a'
dm.B = 1
dm.C = 1.1
dm_new = operations.auto_type(dm)
print('dm_new.A: %s' % type(dm_new.A))
print('dm_new.B: %s' % type(dm_new.B))
print('dm_new.C: %s' % type(dm_new.C))

Output:

dm_new.A: <class 'datamatrix._datamatrix._mixedcolumn.MixedColumn'>
dm_new.B: <class 'datamatrix._datamatrix._numericcolumn.IntColumn'>
dm_new.C: <class 'datamatrix._datamatrix._numericcolumn.FloatColumn'>

Arguments:

  • dm -- The DataMatrix whose MixedColumn columns should be converted.
    • Type: DataMatrix

Returns:

A DataMatrix with converted columns.

  • Type: DataMatrix
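
The conversion rule can be illustrated without the library. The following is a plain-Python sketch, not the code that auto_type() uses; the names _as_number and infer_type are hypothetical, and treating a column that mixes integers and non-integers as 'float' is an assumption.

```python
def _as_number(value):
    """Return value as an int or float, or None if it is not numeric."""
    if isinstance(value, bool):
        return None
    if isinstance(value, (int, float)):
        return value
    try:
        return int(value)
    except (TypeError, ValueError):
        pass
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def infer_type(values):
    """Classify a list of values as 'int', 'float', or 'mixed'."""
    numbers = [_as_number(v) for v in values]
    if any(n is None for n in numbers):
        return 'mixed'   # at least one non-numeric value: leave as is
    if all(isinstance(n, int) for n in numbers):
        return 'int'     # every value is an integer number
    return 'float'       # numeric, but not all integers (assumption)


print(infer_type(['a'] * 5))   # mixed
print(infer_type([1] * 5))     # int
print(infer_type([1.1] * 5))   # float
```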

function bin_split(col, bins)

Splits a DataMatrix into bins; that is, the DataMatrix is first sorted by a column, and then split into equal-size (or roughly equal-size) bins.

Example:

from datamatrix import DataMatrix, operations

dm = DataMatrix(length=5)
dm.A = 1, 0, 3, 2, 4
dm.B = 'a', 'b', 'c', 'd', 'e'
# Use loop names that do not shadow the built-in bin() or the original dm
for i, dm_bin in enumerate(operations.bin_split(dm.A, bins=3)):
    print('bin %d' % i)
    print(dm_bin)

Output:

bin 0
+---+---+---+
| # | A | B |
+---+---+---+
| 1 | 0 | b |
+---+---+---+
bin 1
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | 1 | a |
| 3 | 2 | d |
+---+---+---+
bin 2
+---+---+---+
| # | A | B |
+---+---+---+
| 2 | 3 | c |
| 4 | 4 | e |
+---+---+---+

Arguments:

  • col -- The column to split by.
    • Type: BaseColumn
  • bins -- The number of bins.
    • Type: int

Returns:

A generator that iterates over the bins.
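
The sort-then-split idea can be sketched in plain Python. This is an illustration rather than the library's implementation; bin_split_rows is a hypothetical name, and the boundary formula is an assumption chosen because it reproduces the bin sizes (1, 2, 2) of the example above.

```python
def bin_split_rows(rows, key, bins):
    """Yield `bins` lists of rows, sorted by `key`, of roughly equal size."""
    if bins < 1 or bins > len(rows):
        raise ValueError('bins must be between 1 and the number of rows')
    ordered = sorted(rows, key=key)
    start = 0
    for i in range(bins):
        # Integer division spreads the remainder over the later bins
        end = len(ordered) * (i + 1) // bins
        yield ordered[start:end]
        start = end


# Rows are (A, B) pairs, mirroring the example above
rows = list(zip([1, 0, 3, 2, 4], 'abcde'))
result = list(bin_split_rows(rows, key=lambda row: row[0], bins=3))
for i, rows_in_bin in enumerate(result):
    print('bin %d: %s' % (i, rows_in_bin))
```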

function fullfactorial(dm, ignore=u'')

Requires numpy

Creates a new DataMatrix that uses a specified DataMatrix as the base of a full-factorial design. That is, each value of every column is combined with each value from every other column.

Example:

from datamatrix import DataMatrix, operations

dm = DataMatrix(length=2)
dm.A = 'x', 'y'
dm.B = 3, 4
dm = operations.fullfactorial(dm)
print(dm)

Output:

+---+---+---+
| # | A | B |
+---+---+---+
| 0 | x | 3 |
| 1 | y | 3 |
| 2 | x | 4 |
| 3 | y | 4 |
+---+---+---+

Arguments:

  • dm -- The source DataMatrix.
    • Type: DataMatrix

Keywords:

  • ignore -- A value that should be ignored.
    • Default: ''
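
A full-factorial design is the Cartesian product of the levels of each factor. The sketch below shows this with itertools.product; it is not the library's implementation, full_factorial is a hypothetical name, and the row order it produces may differ from the order shown in the output above.

```python
from itertools import product


def full_factorial(columns, ignore=''):
    """Combine every value of each column with every value of the others.

    `columns` maps column names to lists of values. Values equal to
    `ignore` are skipped.
    """
    names = list(columns)
    levels = [[v for v in columns[name] if v != ignore] for name in names]
    return [dict(zip(names, combo)) for combo in product(*levels)]


design = full_factorial({'A': ['x', 'y'], 'B': [3, 4]})
for row in design:
    print(row)
```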

function group(dm, by)

Requires numpy

Groups the DataMatrix by unique values in a set of grouping columns. Grouped columns are stored as SeriesColumns. The columns that are grouped should contain numeric values. The order in which groups appear in the grouped DataMatrix is unpredictable.

Example:

from datamatrix import DataMatrix, operations

dm = DataMatrix(length=4)
dm.A = 'x', 'x', 'y', 'y'
dm.B = 0, 1, 2, 3
print('Original:')
print(dm)
dm = operations.group(dm, by=dm.A)
print('Grouped by A:')
print(dm)

Output:

Original:
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | x | 0 |
| 1 | x | 1 |
| 2 | y | 2 |
| 3 | y | 3 |
+---+---+---+
Grouped by A:
+---+---+---------+
| # | A |    B    |
+---+---+---------+
| 0 | x | [0. 1.] |
| 1 | y | [2. 3.] |
+---+---+---------+

Arguments:

  • dm -- The DataMatrix to group.
    • Type: DataMatrix
  • by -- A column or list of columns to group by.
    • Type: BaseColumn, list

Returns:

A grouped DataMatrix.

  • Type: DataMatrix
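
Grouping can be pictured as collecting, for each unique combination of grouping values, the remaining values into a series. The following plain-Python sketch uses lists of dicts in place of a DataMatrix; group_rows is a hypothetical name, and unlike the library (where group order is unpredictable) this sketch keeps groups in order of first appearance.

```python
def group_rows(rows, by):
    """Group a list of row dicts by the column names listed in `by`."""
    groups = {}
    for row in rows:
        key = tuple(row[name] for name in by)
        groups.setdefault(key, []).append(row)
    grouped = []
    for key, members in groups.items():
        new_row = dict(zip(by, key))
        for name in members[0]:
            if name not in by:
                # Non-grouping columns become a series of values
                new_row[name] = [m[name] for m in members]
        grouped.append(new_row)
    return grouped


rows = [
    {'A': 'x', 'B': 0},
    {'A': 'x', 'B': 1},
    {'A': 'y', 'B': 2},
    {'A': 'y', 'B': 3},
]
grouped = group_rows(rows, by=['A'])
print(grouped)
```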

function keep_only(dm, *cols)

Removes all columns from the DataMatrix, except those listed in cols.

Example:

from datamatrix import DataMatrix, operations as ops

dm = DataMatrix(length=5)
dm.A = 'a', 'b', 'c', 'd', 'e'
dm.B = range(5)
dm.C = range(5, 10)
dm_new = ops.keep_only(dm, dm.A, dm.C)
print(dm_new)

Output:

+---+---+---+
| # | A | C |
+---+---+---+
| 0 | a | 5 |
| 1 | b | 6 |
| 2 | c | 7 |
| 3 | d | 8 |
| 4 | e | 9 |
+---+---+---+

Arguments:

  • dm -- The DataMatrix from which columns are kept.
    • Type: DataMatrix

Argument list:

  • *cols: A list of column names, or column objects.

function replace(col, mappings={})

Replaces values in a column by other values.

Example:

from datamatrix import DataMatrix, operations as ops

dm = DataMatrix(length=3)
dm.old = 0, 1, 2
dm.new = ops.replace(dm.old, {0 : 'a', 2 : 'c'})
print(dm)

Output:

+---+-----+-----+
| # | new | old |
+---+-----+-----+
| 0 |  a  |  0  |
| 1 |  1  |  1  |
| 2 |  c  |  2  |
+---+-----+-----+

Arguments:

  • col -- The column in which to replace values.
    • Type: BaseColumn

Keywords:

  • mappings -- A dict where old values are keys and new values are values.
    • Type: dict
    • Default: {}
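
The mapping semantics amount to a dictionary lookup that falls back to the original value. A minimal plain-Python sketch, assuming hashable values; replace_values is a hypothetical name and not part of the library:

```python
def replace_values(values, mappings=None):
    """Return a new list in which mapped values are replaced.

    Values that do not occur as a key in `mappings` are left unchanged.
    """
    mappings = mappings or {}
    return [mappings.get(v, v) for v in values]


new = replace_values([0, 1, 2], {0: 'a', 2: 'c'})
print(new)
```

The sketch uses None as the default instead of {} to avoid a shared mutable default argument.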

function shuffle(obj)

Shuffles a DataMatrix or a column. If a DataMatrix is shuffled, the order of the rows is shuffled, but values that were in the same row will stay in the same row.

Example:

from datamatrix import DataMatrix, operations

dm = DataMatrix(length=5)
dm.A = 'a', 'b', 'c', 'd', 'e'
dm.B = operations.shuffle(dm.A)
print(dm)

Output:

+---+---+---+
| # | A | B |
+---+---+---+
| 0 | a | e |
| 1 | b | c |
| 2 | c | b |
| 3 | d | d |
| 4 | e | a |
+---+---+---+

Arguments:

  • obj -- The DataMatrix or column to shuffle.
    • Type: DataMatrix, BaseColumn

Returns:

The shuffled DataMatrix or column.

  • Type: DataMatrix, BaseColumn

function shuffle_horiz(*obj)

Shuffles a DataMatrix, or several columns from a DataMatrix, horizontally. That is, the values are shuffled between columns from the same row.

Example:

from datamatrix import DataMatrix, operations

dm = DataMatrix(length=5)
dm.A = 'a', 'b', 'c', 'd', 'e'
dm.B = range(5)
dm = operations.shuffle_horiz(dm.A, dm.B)
print(dm)

Output:

+---+---+---+
| # | A | B |
+---+---+---+
| 0 | 0 | a |
| 1 | 1 | b |
| 2 | c | 2 |
| 3 | 3 | d |
| 4 | e | 4 |
+---+---+---+

Argument list:

  • *obj: A list of BaseColumns, or a single DataMatrix.

Returns:

The shuffled DataMatrix.

  • Type: DataMatrix
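
Horizontal shuffling permutes the cells within each row independently, so every row keeps its own values. A plain-Python sketch of that idea follows; shuffle_horiz_rows and its rng parameter are hypothetical and not part of the library.

```python
import random


def shuffle_horiz_rows(rows, rng=random):
    """Shuffle the cells within each row; rows keep their position."""
    shuffled = []
    for row in rows:
        cells = list(row)
        rng.shuffle(cells)   # in-place shuffle of this row only
        shuffled.append(tuple(cells))
    return shuffled


# Rows are (A, B) pairs, mirroring the example above
rows = list(zip('abcde', range(5)))
shuffled = shuffle_horiz_rows(rows, rng=random.Random(0))
for row in shuffled:
    print(row)
```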

function sort(obj, by=None)

Sorts a column or DataMatrix. To sort a DataMatrix, the by keyword must specify the column that determines the sort order. To sort a column, by needs to be specified only if the column should be sorted by another column.

The sort order depends on the version of Python. Python 2 is more flexible, and allows comparisons between types such as str and int. Python 3 does not allow such comparisons.

In general, whenever incomparable values are encountered, all values are forced to float. Values that cannot be converted to float are considered inf.

Example:

from datamatrix import DataMatrix, operations

dm = DataMatrix(length=3)
dm.A = 2, 0, 1
dm.B = 'a', 'b', 'c'
dm = operations.sort(dm, by=dm.A)
print(dm)

Output:

+---+---+---+
| # | A | B |
+---+---+---+
| 1 | 0 | b |
| 2 | 1 | c |
| 0 | 2 | a |
+---+---+---+

Arguments:

  • obj -- The DataMatrix or column to sort.
    • Type: DataMatrix, BaseColumn

Keywords:

  • by -- The sort key, that is, the column that determines the sort order of the DataMatrix or of the column being sorted.
    • Type: BaseColumn
    • Default: None

Returns:

The sorted DataMatrix, or the sorted column.

  • Type: DataMatrix, BaseColumn
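
The fallback for incomparable values described above can be sketched in plain Python. This is an illustration under assumptions, not the library's code: robust_sorted and _to_float are hypothetical names, and the sketch uses the float conversion only as a sort key, so the returned values keep their original type.

```python
import math


def _to_float(value):
    """Convert value to float; values that cannot be converted become inf."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.inf


def robust_sorted(values):
    """Sort values, falling back to a float key if they are incomparable."""
    try:
        return sorted(values)
    except TypeError:
        # Python 3 raises TypeError when comparing, for example, str and int
        return sorted(values, key=_to_float)


print(robust_sorted([2, 0, 1]))
print(robust_sorted([3, 'a', '1', 2]))
```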

function split(col, *values)

Splits a DataMatrix by unique values in a column.

Example:

from datamatrix import DataMatrix, operations as ops

dm = DataMatrix(length=4)
dm.A = 0, 0, 1, 1
dm.B = 'a', 'b', 'c', 'd'
# If no values are specified, a (value, DataMatrix) iterator is
# returned. The loop variable is named sub_dm so that the original
# dm is not overwritten.
for A, sub_dm in ops.split(dm.A):
    print('dm.A = %s' % A)
    print(sub_dm)
# If values are specified, an iterator over DataMatrix objects is
# returned.
dm_a, dm_c = ops.split(dm.B, 'a', 'c')
print('dm.B == "a"')
print(dm_a)
print('dm.B == "c"')
print(dm_c)

Output:

dm.A = 0
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | 0 | a |
| 1 | 0 | b |
+---+---+---+
dm.A = 1
+---+---+---+
| # | A | B |
+---+---+---+
| 2 | 1 | c |
| 3 | 1 | d |
+---+---+---+
dm.B == "a"
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | 0 | a |
+---+---+---+
dm.B == "c"
+---+---+---+
| # | A | B |
+---+---+---+
| 2 | 1 | c |
+---+---+---+

Arguments:

  • col -- The column to split by.
    • Type: BaseColumn

Argument list:

  • *values: Splits the DataMatrix based on these values. If this is provided, an iterator over DataMatrix objects is returned, rather than an iterator over (value, DataMatrix) tuples.

Returns:

An iterator over (value, DataMatrix) tuples if no values are provided; an iterator over DataMatrix objects if values are provided.

  • Type: Iterator
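
The two return shapes can be illustrated with a plain-Python sketch that uses lists of dicts in place of a DataMatrix. split_rows is a hypothetical name, and yielding unique values in order of first appearance is an assumption about ordering.

```python
def split_rows(rows, column, *values):
    """Split row dicts by the values found in `column`."""
    if values:
        # One list of rows per requested value, in the order requested
        return ([row for row in rows if row[column] == v] for v in values)
    unique = []
    for row in rows:
        if row[column] not in unique:
            unique.append(row[column])
    # (value, rows) tuples, one per unique value
    return ((v, [row for row in rows if row[column] == v]) for v in unique)


rows = [
    {'A': 0, 'B': 'a'},
    {'A': 0, 'B': 'b'},
    {'A': 1, 'B': 'c'},
    {'A': 1, 'B': 'd'},
]
by_value = list(split_rows(rows, 'A'))
rows_a, rows_c = split_rows(rows, 'B', 'a', 'c')
print(by_value)
print(rows_a, rows_c)
```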

function weight(col)

Weights a DataMatrix by a column. That is, each row from a DataMatrix is repeated as many times as the value in the weighting column.

Example:

from datamatrix import DataMatrix, operations

dm = DataMatrix(length=3)
dm.A = 1, 2, 0
dm.B = 'x', 'y', 'z'
print('Original:')
print(dm)
dm = operations.weight(dm.A)
print('Weighted by A:')
print(dm)

Output:

Original:
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | 1 | x |
| 1 | 2 | y |
| 2 | 0 | z |
+---+---+---+
Weighted by A:
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | 1 | x |
| 1 | 2 | y |
| 2 | 2 | y |
+---+---+---+

Arguments:

  • col -- The column to weight by.
    • Type: BaseColumn

Returns:

The weighted DataMatrix.

  • Type: DataMatrix
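
Weighting repeats each row as many times as its weight, so a weight of 0 removes the row. A plain-Python sketch of this behavior; weight_rows is a hypothetical name, and rejecting weights that are not non-negative integers is an assumption.

```python
def weight_rows(rows, column):
    """Repeat each row dict as many times as its value in `column`."""
    weighted = []
    for row in rows:
        count = row[column]
        if not isinstance(count, int) or count < 0:
            raise ValueError('weights must be non-negative integers')
        weighted.extend(dict(row) for _ in range(count))
    return weighted


rows = [{'A': 1, 'B': 'x'}, {'A': 2, 'B': 'y'}, {'A': 0, 'B': 'z'}]
weighted = weight_rows(rows, 'A')
print(weighted)
```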

function z(col)

Transforms a column into z scores.

Example:

from datamatrix import DataMatrix, operations

dm = DataMatrix(length=5)
dm.col = range(5)
dm.z = operations.z(dm.col)
print(dm)

Output:

+---+-----+---------------------+
| # | col |          z          |
+---+-----+---------------------+
| 0 |  0  | -1.2649110640673518 |
| 1 |  1  | -0.6324555320336759 |
| 2 |  2  |         0.0         |
| 3 |  3  |  0.6324555320336759 |
| 4 |  4  |  1.2649110640673518 |
+---+-----+---------------------+

Arguments:

  • col -- The column to transform.
    • Type: BaseColumn

Returns:

The column transformed into z scores.

  • Type: BaseColumn
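
A z score is the distance of a value from the mean, expressed in standard deviations. The values in the output above are consistent with the sample standard deviation (n - 1 in the denominator), which the sketch below uses; z_scores is a hypothetical name and not part of the library.

```python
import statistics


def z_scores(values):
    """Return (value - mean) / sample standard deviation for each value."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)   # sample SD, n - 1 denominator
    return [(v - mean) / sd for v in values]


scores = z_scores(list(range(5)))
print(scores)
```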