datamatrix.operations
A set of operations to apply to columns and DataMatrix objects.
- function auto_type(dm)
- function bin_split(col, bins)
- function filter_(fnc, obj)
- function fullfactorial(dm, ignore=u'')
- function group(dm, by)
- function keep_only(dm, *cols)
- function map_(fnc, obj)
- function replace(col, mappings={})
- function setcol(dm, name, value)
- function shuffle(obj)
- function shuffle_horiz(*obj)
- function sort(obj, by=None)
- function split(col, *values)
- function weight(col)
- function z(col)
function auto_type(dm)
Requires fastnumbers
Converts all columns of type MixedColumn to IntColumn if all values are integer numbers, or FloatColumn if all values are non-integer numbers.
from datamatrix import DataMatrix, operations
dm = DataMatrix(length=5)
dm.A = 'a'
dm.B = 1
dm.C = 1.1
dm_new = operations.auto_type(dm)
print('dm_new.A: %s' % type(dm_new.A))
print('dm_new.B: %s' % type(dm_new.B))
print('dm_new.C: %s' % type(dm_new.C))
Output:
dm_new.A: <class 'datamatrix._datamatrix._mixedcolumn.MixedColumn'>
dm_new.B: <class 'datamatrix._datamatrix._numericcolumn.IntColumn'>
dm_new.C: <class 'datamatrix._datamatrix._numericcolumn.FloatColumn'>
Arguments:
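dm-- The DataMatrix whose MixedColumns should be converted.- Type: DataMatrix
Returns:
A DataMatrix in which MixedColumns have been converted to IntColumn or FloatColumn where possible.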
function bin_split(col, bins)
Splits a DataMatrix into bins; that is, the DataMatrix is first sorted by a column, and then split into equal-size (or roughly equal-size) bins.
Example:
from datamatrix import DataMatrix, operations
dm = DataMatrix(length=5)
dm.A = 1, 0, 3, 2, 4
dm.B = 'a', 'b', 'c', 'd', 'e'
for bin, dm in enumerate(operations.bin_split(dm.A, bins=3)):
    print('bin %d' % bin)
    print(dm)
Output:
bin 0
+---+---+---+
| # | A | B |
+---+---+---+
| 1 | 0 | b |
+---+---+---+
bin 1
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | 1 | a |
| 3 | 2 | d |
+---+---+---+
bin 2
+---+---+---+
| # | A | B |
+---+---+---+
| 2 | 3 | c |
| 4 | 4 | e |
+---+---+---+
Arguments:
col-- The column to split by.- Type: BaseColumn
bins-- The number of bins.- Type: int
Returns:
A generator that iterates over the bins.
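The sort-then-split rule can be illustrated without the library. The following is a minimal plain-Python sketch, not the library's implementation: the helper name bin_split_sketch is hypothetical, and the boundary rule (integer division of the row count over the bins) is an assumption chosen because it reproduces the bin sizes 1, 2, 2 of the example above.

```python
def bin_split_sketch(values, bins):
    """Yield one list of (row_index, value) pairs per bin."""
    # Sort the row indices by value, so the original index is kept
    # (like the '#' column in the printed DataMatrix).
    order = sorted(range(len(values)), key=lambda i: values[i])
    n = len(order)
    for b in range(bins):
        # Assumed boundary rule: integer division gives roughly equal bins.
        start = n * b // bins
        end = n * (b + 1) // bins
        yield [(i, values[i]) for i in order[start:end]]


result = list(bin_split_sketch([1, 0, 3, 2, 4], bins=3))
for b, rows in enumerate(result):
    print('bin %d: %s' % (b, rows))
```

When the number of rows is not a multiple of the number of bins, this rule puts the smaller bins first.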
function filter_(fnc, obj)
Filters rows from a datamatrix or column based on a filter function
(fnc).
If obj is a column, fnc should be a function that accepts a single
value. If obj is a datamatrix, fnc should be a function that accepts
a keyword dict, where column names are keys and cells are values. In
both cases, fnc should return a bool indicating whether the row or
value should be included.
Example:
from datamatrix import DataMatrix, operations as ops
dm = DataMatrix(length=5)
dm.col = range(5)
# Create a column with only odd values
col_new = ops.filter_(lambda x: x % 2, dm.col)
print(col_new)
# Create a new datamatrix with only odd values in col
dm_new = ops.filter_(lambda **d: d['col'] % 2, dm)
print(dm_new)
Output:
col[1, 3]
+---+-----+
| # | col |
+---+-----+
| 1 | 1 |
| 3 | 3 |
+---+-----+
Arguments:
fnc-- A filter function.- Type: callable
obj-- A datamatrix or column to filter.- Type: BaseColumn, DataMatrix
Returns:
A new column or datamatrix.
- Type: BaseColumn, DataMatrix
function fullfactorial(dm, ignore=u'')
Requires numpy
Creates a new DataMatrix that uses a specified DataMatrix as the base of a full-factorial design. That is, each value of every column is combined with each value from every other column.
Example:
from datamatrix import DataMatrix, operations
dm = DataMatrix(length=2)
dm.A = 'x', 'y'
dm.B = 3, 4
dm = operations.fullfactorial(dm)
print(dm)
Output:
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | x | 3 |
| 1 | y | 3 |
| 2 | x | 4 |
| 3 | y | 4 |
+---+---+---+
Arguments:
dm-- The source DataMatrix.- Type: DataMatrix
Keywords:
ignore-- A value that should be ignored.- Default: ''
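The combination step of a full-factorial design can be sketched with the standard library. This is an illustration of the idea only, not the library's code: the helper name fullfactorial_sketch and the dict-of-lists representation are assumptions, and the ignore keyword is not modeled.

```python
import itertools


def fullfactorial_sketch(columns):
    """columns: dict mapping column name -> list of levels."""
    # itertools.product varies its last argument fastest. Reversing the
    # column order makes the first column vary fastest, which matches the
    # row order of the example output above.
    names = list(columns)[::-1]
    rows = []
    for combo in itertools.product(*(columns[name] for name in names)):
        rows.append(dict(zip(names, combo)))
    return rows


rows = fullfactorial_sketch({'A': ['x', 'y'], 'B': [3, 4]})
for row in rows:
    print(row['A'], row['B'])
```

The number of rows in the result is the product of the number of levels per column, here 2 x 2 = 4.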
function group(dm, by)
Requires numpy
Groups the DataMatrix by unique values in a set of grouping columns. Grouped columns are stored as SeriesColumns. The columns that are grouped should contain numeric values.
Example:
from datamatrix import DataMatrix, operations
dm = DataMatrix(length=4)
dm.A = 'x', 'x', 'y', 'y'
dm.B = 0, 1, 2, 3
print('Original:')
print(dm)
dm = operations.group(dm, by=dm.A)
print('Grouped by A:')
print(dm)
Output:
Original:
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | x | 0 |
| 1 | x | 1 |
| 2 | y | 2 |
| 3 | y | 3 |
+---+---+---+
Grouped by A:
+---+-----------+---+
| # | B | A |
+---+-----------+---+
| 0 | [ 0. 1.] | x |
| 1 | [ 2. 3.] | y |
+---+-----------+---+
Arguments:
dm-- The DataMatrix to group.- Type: DataMatrix
by-- A column or list of columns to group by.- Type: BaseColumn, list
Returns:
A grouped DataMatrix.
- Type: DataMatrix
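Conceptually, grouping collects the values of the non-grouping columns for each unique value of the grouping column. A plain-Python sketch under stated assumptions: rows are represented as dicts, the helper name group_sketch is hypothetical, and plain lists of floats stand in for the SeriesColumns that the library uses.

```python
def group_sketch(rows, by):
    """Group a list of row dicts by the value in column `by`."""
    groups = {}
    for row in rows:
        # One entry per unique grouping value, in order of first appearance.
        group = groups.setdefault(row[by], {})
        for name, value in row.items():
            if name != by:
                # Grouped values are numeric, so they are stored as floats.
                group.setdefault(name, []).append(float(value))
    return groups


rows = [
    {'A': 'x', 'B': 0},
    {'A': 'x', 'B': 1},
    {'A': 'y', 'B': 2},
    {'A': 'y', 'B': 3},
]
grouped = group_sketch(rows, by='A')
print(grouped)
```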
function keep_only(dm, *cols)
Removes all columns from the DataMatrix, except those listed in cols.
Example:
from datamatrix import DataMatrix, operations as ops
dm = DataMatrix(length=5)
dm.A = 'a', 'b', 'c', 'd', 'e'
dm.B = range(5)
dm.C = range(5, 10)
dm_new = ops.keep_only(dm, dm.A, dm.C)
print(dm_new)
Output:
+---+---+---+
| # | A | C |
+---+---+---+
| 0 | a | 5 |
| 1 | b | 6 |
| 2 | c | 7 |
| 3 | d | 8 |
| 4 | e | 9 |
+---+---+---+
Arguments:
dm-- The DataMatrix from which to remove columns.- Type: DataMatrix
Argument list:
*cols: A list of column names, or columns.
function map_(fnc, obj)
Maps a function (fnc) onto rows of a datamatrix or cells of a column.
If obj is a column, the function fnc is mapped onto each
cell of the column, and a new column is returned. In this case,
fnc should be a function that accepts and returns a single value.
If obj is a datamatrix, the function fnc is mapped onto each row,
and a new datamatrix is returned. In this case, fnc should be a
function that accepts a keyword dict, where column names are keys and
cells are values. The return value should be another dict, again with
column names as keys, and cells as values. Columns that are not part of
the returned dict are left unchanged.
Example:
from datamatrix import DataMatrix, operations as ops
dm = DataMatrix(length=3)
dm.old = 0, 1, 2
# Map a 2x function onto dm.old to create dm.new
dm.new = ops.map_(lambda i: i*2, dm.old)
print(dm)
# Map a 2x function onto the entire dm to create dm_new, using a fancy
# dict comprehension wrapped inside a lambda function.
dm_new = ops.map_(
    lambda **d: {col: 2*val for col, val in d.items()},
    dm)
print(dm_new)
Output:
+---+-----+-----+
| # | old | new |
+---+-----+-----+
| 0 | 0 | 0 |
| 1 | 1 | 2 |
| 2 | 2 | 4 |
+---+-----+-----+
+---+-----+-----+
| # | old | new |
+---+-----+-----+
| 0 | 0 | 0 |
| 1 | 2 | 4 |
| 2 | 4 | 8 |
+---+-----+-----+
Arguments:
fnc-- A function to map onto each row or each cell.- Type: callable
obj-- A datamatrix or column to map fnc onto.- Type: BaseColumn, DataMatrix
Returns:
A new column or datamatrix.
- Type: BaseColumn, DataMatrix
function replace(col, mappings={})
Replaces values in a column by other values.
Example:
from datamatrix import DataMatrix, operations as ops
dm = DataMatrix(length=3)
dm.old = 0, 1, 2
dm.new = ops.replace(dm.old, {0 : 'a', 2 : 'c'})
print(dm)
Output:
+---+-----+-----+
| # | old | new |
+---+-----+-----+
| 0 | 0 | a |
| 1 | 1 | 1 |
| 2 | 2 | c |
+---+-----+-----+
Arguments:
col-- The column in which to replace values.- Type: BaseColumn
Keywords:
mappings-- A dict where old values are keys and new values are values.- Type: dict
- Default: {}
function setcol(dm, name, value)
Returns a new DataMatrix to which a column has been added or modified.
The main difference with using a regular assignment (dm.col = 'x') is
that this does not modify the original DataMatrix, and is suitable for
use in lambda expressions.
Example:
from datamatrix import DataMatrix, operations as ops
dm1 = DataMatrix(length=5)
dm2 = ops.setcol(dm1, 'y', range(5))
print(dm2)
Output:
+---+---+
| # | y |
+---+---+
| 0 | 0 |
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
+---+---+
Arguments:
dm-- A DataMatrix.- Type: DataMatrix
name-- A column name.- Type: str
value-- The value to be assigned to the column. This can be any value that is valid for a regular column assignment.
Returns:
A new DataMatrix.
- Type: DataMatrix
function shuffle(obj)
Shuffles a DataMatrix or a column. If a DataMatrix is shuffled, the order of the rows is shuffled, but values that were in the same row will stay in the same row.
Example:
from datamatrix import DataMatrix, operations
dm = DataMatrix(length=5)
dm.A = 'a', 'b', 'c', 'd', 'e'
dm.B = operations.shuffle(dm.A)
print(dm)
Output:
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | a | a |
| 1 | b | d |
| 2 | c | e |
| 3 | d | c |
| 4 | e | b |
+---+---+---+
Arguments:
obj-- The DataMatrix or column to shuffle.- Type: DataMatrix, BaseColumn
Returns:
The shuffled DataMatrix or column.
- Type: DataMatrix, BaseColumn
function shuffle_horiz(*obj)
Shuffles a DataMatrix, or several columns from a DataMatrix, horizontally. That is, the values are shuffled between columns from the same row.
Example:
from datamatrix import DataMatrix, operations
dm = DataMatrix(length=5)
dm.A = 'a', 'b', 'c', 'd', 'e'
dm.B = range(5)
dm = operations.shuffle_horiz(dm.A, dm.B)
print(dm)
Output:
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | 0 | a |
| 1 | 1 | b |
| 2 | c | 2 |
| 3 | 3 | d |
| 4 | 4 | e |
+---+---+---+
Argument list:
*obj: A list of BaseColumns, or a single DataMatrix.
Returns:
The shuffled DataMatrix.
- Type: DataMatrix
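Horizontal shuffling means that each row keeps its own set of values, but the values change places between columns. A plain-Python sketch of that idea, assuming rows are represented as tuples; the helper name shuffle_horiz_sketch and the optional rng parameter are hypothetical and not part of the library:

```python
import random


def shuffle_horiz_sketch(rows, rng=random):
    """Shuffle the cells within each row independently."""
    shuffled = []
    for row in rows:
        cells = list(row)
        # Only the order within the row changes; no value moves to another row.
        rng.shuffle(cells)
        shuffled.append(tuple(cells))
    return shuffled


rows = [('a', 0), ('b', 1), ('c', 2), ('d', 3), ('e', 4)]
shuffled = shuffle_horiz_sketch(rows, rng=random.Random(1))
for row in shuffled:
    print(row)
```

Because each row is shuffled independently, some rows may end up in their original order, as in the example output above.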
function sort(obj, by=None)
Sorts a column or DataMatrix. For a DataMatrix, by must specify the column that determines the sort order. For a column, by only needs to be specified if the column should be sorted by another column.
The sort order depends on the version of Python. Python 2 is more
flexible, and allows comparisons between types such as str and int.
Python 3 does not allow such comparisons.
In general, whenever incomparable values are encountered, all values are
forced to float. Values that cannot be converted to float are
considered inf.
Example:
from datamatrix import DataMatrix, operations
dm = DataMatrix(length=3)
dm.A = 2, 0, 1
dm.B = 'a', 'b', 'c'
dm = operations.sort(dm, by=dm.A)
print(dm)
Output:
+---+---+---+
| # | A | B |
+---+---+---+
| 1 | 0 | b |
| 2 | 1 | c |
| 0 | 2 | a |
+---+---+---+
Arguments:
obj-- The DataMatrix or column to sort.- Type: DataMatrix, BaseColumn
Keywords:
by-- The sort key, that is, the column that is used for sorting the DataMatrix, or the other column.- Type: BaseColumn
- Default: None
Returns:
The sorted DataMatrix, or the sorted column.
- Type: DataMatrix, BaseColumn
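The fallback for incomparable values described above can be sketched in plain Python 3. This illustrates the stated rule only and is not the library's implementation; the helper names sort_key_sketch and sort_sketch are hypothetical.

```python
def sort_key_sketch(value):
    """Force a value to float; values that cannot be converted count as inf."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return float('inf')


def sort_sketch(values):
    """Sort normally, falling back to float keys for incomparable values."""
    try:
        return sorted(values)
    except TypeError:
        # Python 3 raises TypeError when comparing, for example, str and int.
        return sorted(values, key=sort_key_sketch)


mixed = sort_sketch([2, 'a', 0, '1.5'])
plain = sort_sketch([2, 0, 1])
print(mixed)
print(plain)
```

In this sketch the string '1.5' sorts between 0 and 2 because it converts to a float, while 'a' sorts last because it is treated as inf.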
function split(col, *values)
Splits a DataMatrix by unique values in a column.
Example:
from datamatrix import DataMatrix, operations as ops
dm = DataMatrix(length=4)
dm.A = 0, 0, 1, 1
dm.B = 'a', 'b', 'c', 'd'
# If no values are specified, a (value, DataMatrix) iterator is
# returned.
for A, sub_dm in ops.split(dm.A):
    print('dm.A = %s' % A)
    print(sub_dm)
# If values are specified, an iterator over DataMatrix objects is
# returned.
dm_a, dm_c = ops.split(dm.B, 'a', 'c')
print('dm.B == "a"')
print(dm_a)
print('dm.B == "c"')
print(dm_c)
Output:
dm.A = 0
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | 0 | a |
| 1 | 0 | b |
+---+---+---+
dm.A = 1
+---+---+---+
| # | A | B |
+---+---+---+
| 2 | 1 | c |
| 3 | 1 | d |
+---+---+---+
dm.B == "a"
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | 0 | a |
+---+---+---+
dm.B == "c"
+---+---+---+
| # | A | B |
+---+---+---+
| 2 | 1 | c |
+---+---+---+
Arguments:
col-- The column to split by.- Type: BaseColumn
Argument list:
*values: Splits the DataMatrix based on these values. If this is provided, an iterator over DataMatrix objects is returned, rather than an iterator over (value, DataMatrix) tuples.
Returns:
An iterator over (value, DataMatrix) tuples if no values are provided; an iterator over DataMatrix objects if values are provided.
- Type: Iterator
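The two calling conventions can be mimicked in plain Python to make the difference in return values explicit. This is a sketch under stated assumptions, not the library's code: rows are dicts, the helper name split_sketch is hypothetical, and unique values are visited in order of first appearance, which the library may or may not do.

```python
def split_sketch(rows, col, *values):
    """Split a list of row dicts by the values in column `col`."""
    if values:
        # Values given: yield one list of matching rows per requested value.
        for value in values:
            yield [row for row in rows if row[col] == value]
    else:
        # No values given: yield (value, rows) pairs for each unique value.
        for value in dict.fromkeys(row[col] for row in rows):
            yield value, [row for row in rows if row[col] == value]


rows = [
    {'A': 0, 'B': 'a'},
    {'A': 0, 'B': 'b'},
    {'A': 1, 'B': 'c'},
    {'A': 1, 'B': 'd'},
]
pairs = list(split_sketch(rows, 'A'))
rows_a, rows_c = split_sketch(rows, 'B', 'a', 'c')
print(pairs)
print(rows_a, rows_c)
```

Passing explicit values makes tuple unpacking convenient, because the number and order of the results are known in advance.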
function weight(col)
Weights a DataMatrix by a column. That is, each row from a DataMatrix is repeated as many times as the value in the weighting column.
Example:
from datamatrix import DataMatrix, operations
dm = DataMatrix(length=3)
dm.A = 1, 2, 0
dm.B = 'x', 'y', 'z'
print('Original:')
print(dm)
dm = operations.weight(dm.A)
print('Weighted by A:')
print(dm)
Output:
Original:
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | 1 | x |
| 1 | 2 | y |
| 2 | 0 | z |
+---+---+---+
Weighted by A:
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | 1 | x |
| 1 | 2 | y |
| 2 | 2 | y |
+---+---+---+
Arguments:
col-- The column to weight by.- Type: BaseColumn
Returns:
The weighted DataMatrix.
- Type: DataMatrix
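The repetition rule is simple enough to sketch in plain Python. The helper name weight_sketch and the list-of-dicts representation are assumptions for illustration; this is not the library's implementation.

```python
def weight_sketch(rows, col):
    """Repeat each row as many times as the value in column `col`."""
    weighted = []
    for row in rows:
        # A weight of 0 drops the row; a weight of n yields n copies.
        for _ in range(row[col]):
            weighted.append(dict(row))
    return weighted


rows = [
    {'A': 1, 'B': 'x'},
    {'A': 2, 'B': 'y'},
    {'A': 0, 'B': 'z'},
]
weighted = weight_sketch(rows, 'A')
print([row['B'] for row in weighted])
```

The length of the result equals the sum of the weights, here 1 + 2 + 0 = 3.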
function z(col)
Transforms a column into z scores.
Example:
from datamatrix import DataMatrix, operations
dm = DataMatrix(length=5)
dm.col = range(5)
dm.z = operations.z(dm.col)
print(dm)
Output:
+---+-----+-----------------+
| # | col | z |
+---+-----+-----------------+
| 0 | 0 | -1.26491106407 |
| 1 | 1 | -0.632455532034 |
| 2 | 2 | 0.0 |
| 3 | 3 | 0.632455532034 |
| 4 | 4 | 1.26491106407 |
+---+-----+-----------------+
Arguments:
col-- The column to transform.- Type: BaseColumn
Returns:
The column transformed into z scores.
- Type: BaseColumn
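A z score expresses each value as its distance from the mean in units of standard deviation. The sketch below uses only the standard library; the helper name z_sketch is hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption inferred from the example output above, which it reproduces.

```python
import statistics


def z_sketch(values):
    """Return (value - mean) / sample standard deviation for each value."""
    mean = statistics.mean(values)
    # statistics.stdev uses the n - 1 denominator (sample standard deviation).
    sd = statistics.stdev(values)
    return [(value - mean) / sd for value in values]


scores = z_sketch(list(range(5)))
print(scores)
```

For the values 0 to 4, the mean is 2 and the sample standard deviation is the square root of 2.5, so the first score is about -1.2649.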



