datamatrix.operations
A set of operations to apply to columns and DataMatrix objects.
- function auto_type(dm)
- function bin_split(col, bins)
- function fullfactorial(dm, ignore=u'')
- function group(dm, by)
- function keep_only(dm, *cols)
- function replace(col, mappings={})
- function shuffle(obj)
- function shuffle_horiz(*obj)
- function sort(obj, by=None)
- function split(col, *values)
- function weight(col)
- function z(col)
function auto_type(dm)
Requires fastnumbers
Converts all columns of type MixedColumn to IntColumn if all values are integer numbers, or to FloatColumn if all values are non-integer numbers.
Example:
from datamatrix import DataMatrix, operations
dm = DataMatrix(length=5)
dm.A = 'a'
dm.B = 1
dm.C = 1.1
dm_new = operations.auto_type(dm)
print('dm_new.A: %s' % type(dm_new.A))
print('dm_new.B: %s' % type(dm_new.B))
print('dm_new.C: %s' % type(dm_new.C))
Output:
dm_new.A: <class 'datamatrix._datamatrix._mixedcolumn.MixedColumn'>
dm_new.B: <class 'datamatrix._datamatrix._numericcolumn.IntColumn'>
dm_new.C: <class 'datamatrix._datamatrix._numericcolumn.FloatColumn'>
Arguments:
dm
-- The DataMatrix whose columns are converted.- Type: DataMatrix
Returns:
A DataMatrix with converted column types.
- Type: DataMatrix
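The conversion rule can be illustrated in plain Python. This is a minimal sketch of the rule as described above, not the library's implementation: the helper name `infer_type` is hypothetical, and treating a mix of integers and floats as float is an assumption that the description does not confirm.

```python
def infer_type(values):
    """Return 'int', 'float', or 'mixed' for a sequence of values."""
    if all(isinstance(v, int) for v in values):
        return 'int'
    if all(isinstance(v, (int, float)) for v in values):
        # Assumption: a column mixing ints and floats is treated as float.
        return 'float'
    # Anything non-numeric keeps the column mixed.
    return 'mixed'


print(infer_type(['a'] * 5))
print(infer_type([1] * 5))
print(infer_type([1.1] * 5))
```

The three calls mirror columns A, B, and C from the example.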
function bin_split(col, bins)
Splits a DataMatrix into bins; that is, the DataMatrix is first sorted by a column, and then split into equal-size (or roughly equal-size) bins.
Example:
from datamatrix import DataMatrix, operations
dm = DataMatrix(length=5)
dm.A = 1, 0, 3, 2, 4
dm.B = 'a', 'b', 'c', 'd', 'e'
for bin_nr, bin_dm in enumerate(operations.bin_split(dm.A, bins=3)):
    print('bin %d' % bin_nr)
    print(bin_dm)
Output:
bin 0
+---+---+---+
| # | A | B |
+---+---+---+
| 1 | 0 | b |
+---+---+---+
bin 1
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | 1 | a |
| 3 | 2 | d |
+---+---+---+
bin 2
+---+---+---+
| # | A | B |
+---+---+---+
| 2 | 3 | c |
| 4 | 4 | e |
+---+---+---+
Arguments:
col
-- The column to split by.- Type: BaseColumn
bins
-- The number of bins.- Type: int
Returns:
A generator that iterates over the bins.
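To make "roughly equal-size" concrete, the following plain-Python sketch sorts rows and cuts them at integer boundaries. It is an illustration only: the function name `bin_split_rows` is hypothetical, and the boundary rule (`n * i // bins`) is an assumption chosen because it reproduces the bin sizes 1, 2, 2 shown in the output above.

```python
def bin_split_rows(rows, key, bins):
    """Yield `bins` lists of rows, sorted by `key`, of roughly equal size."""
    ordered = sorted(rows, key=key)
    n = len(ordered)
    for i in range(bins):
        # Integer boundaries; earlier bins get the smaller share.
        start = n * i // bins
        end = n * (i + 1) // bins
        yield ordered[start:end]


rows = list(zip([1, 0, 3, 2, 4], 'abcde'))
for i, chunk in enumerate(bin_split_rows(rows, key=lambda r: r[0], bins=3)):
    print(i, chunk)
```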
function fullfactorial(dm, ignore=u'')
Requires numpy
Creates a new DataMatrix that uses a specified DataMatrix as the base of a full-factorial design. That is, each value of every column is combined with each value from every other column.
Example:
from datamatrix import DataMatrix, operations
dm = DataMatrix(length=2)
dm.A = 'x', 'y'
dm.B = 3, 4
dm = operations.fullfactorial(dm)
print(dm)
Output:
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | x | 3 |
| 1 | y | 3 |
| 2 | x | 4 |
| 3 | y | 4 |
+---+---+---+
Arguments:
dm
-- The source DataMatrix.- Type: DataMatrix
Keywords:
ignore
-- A value that should be ignored.- Default: ''
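The idea of a full-factorial design can be sketched with `itertools.product` from the standard library. This is not how the library is implemented; the function name `full_factorial` is hypothetical, the `ignore` keyword is not modelled, and the column order is reversed only so that the first column varies fastest, as in the output above.

```python
from itertools import product


def full_factorial(columns):
    """Combine every value of each column with every value of the others."""
    names = list(columns)
    # Iterate over reversed columns so that the first column varies fastest.
    combos = product(*(columns[name] for name in reversed(names)))
    return [dict(zip(names, reversed(combo))) for combo in combos]


design = full_factorial({'A': ['x', 'y'], 'B': [3, 4]})
for row in design:
    print(row)
```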
function group(dm, by)
Requires numpy
Groups the DataMatrix by unique values in a set of grouping columns. Grouped columns are stored as SeriesColumns. The columns that are grouped should contain numeric values.
Example:
from datamatrix import DataMatrix, operations
dm = DataMatrix(length=4)
dm.A = 'x', 'x', 'y', 'y'
dm.B = 0, 1, 2, 3
print('Original:')
print(dm)
dm = operations.group(dm, by=dm.A)
print('Grouped by A:')
print(dm)
Output:
Original:
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | x | 0 |
| 1 | x | 1 |
| 2 | y | 2 |
| 3 | y | 3 |
+---+---+---+
Grouped by A:
+---+---+-----------+
| # | A | B |
+---+---+-----------+
| 0 | x | [ 0. 1.] |
| 1 | y | [ 2. 3.] |
+---+---+-----------+
Arguments:
dm
-- The DataMatrix to group.- Type: DataMatrix
by
-- A column or list of columns to group by.- Type: BaseColumn, list
Returns:
A grouped DataMatrix.
- Type: DataMatrix
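Conceptually, grouping collects the values of one column under each unique value of the grouping column. The sketch below shows this with a plain dict; `group_rows` is a hypothetical helper, and it returns lists rather than the SeriesColumns that the library uses.

```python
def group_rows(keys, values):
    """Collect values under each unique key, preserving first-seen order."""
    grouped = {}
    for key, value in zip(keys, values):
        grouped.setdefault(key, []).append(value)
    return grouped


grouped = group_rows(['x', 'x', 'y', 'y'], [0, 1, 2, 3])
print(grouped)
```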
function keep_only(dm, *cols)
Removes all columns from the DataMatrix, except those listed in cols.
Example:
from datamatrix import DataMatrix, operations as ops
dm = DataMatrix(length=5)
dm.A = 'a', 'b', 'c', 'd', 'e'
dm.B = range(5)
dm.C = range(5, 10)
dm_new = ops.keep_only(dm, dm.A, dm.C)
print(dm_new)
Output:
+---+---+---+
| # | A | C |
+---+---+---+
| 0 | a | 5 |
| 1 | b | 6 |
| 2 | c | 7 |
| 3 | d | 8 |
| 4 | e | 9 |
+---+---+---+
Arguments:
dm
-- The source DataMatrix.- Type: DataMatrix
Argument list:
*cols
: A list of column names, or column objects.
function replace(col, mappings={})
Replaces values in a column by other values.
Example:
from datamatrix import DataMatrix, operations as ops
dm = DataMatrix(length=3)
dm.old = 0, 1, 2
dm.new = ops.replace(dm.old, {0 : 'a', 2 : 'c'})
print(dm)
Output:
+---+-----+-----+
| # | new | old |
+---+-----+-----+
| 0 | a | 0 |
| 1 | 1 | 1 |
| 2 | c | 2 |
+---+-----+-----+
Arguments:
col
-- The column in which to replace values.- Type: BaseColumn
Keywords:
mappings
-- A dict where old values are keys and new values are values.- Type: dict
- Default: {}
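The mapping behaviour can be sketched on a plain list: values that appear as keys in the mapping are replaced, and all other values are left unchanged. The helper name `replace_values` is hypothetical and this is an illustration rather than the library's code.

```python
def replace_values(values, mappings):
    """Replace each value that is a key in `mappings`; keep the rest."""
    return [mappings.get(value, value) for value in values]


new = replace_values([0, 1, 2], {0: 'a', 2: 'c'})
print(new)
```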
function shuffle(obj)
Shuffles a DataMatrix or a column. If a DataMatrix is shuffled, the order of the rows is shuffled, but values that were in the same row will stay in the same row.
Example:
from datamatrix import DataMatrix, operations
dm = DataMatrix(length=5)
dm.A = 'a', 'b', 'c', 'd', 'e'
dm.B = operations.shuffle(dm.A)
print(dm)
Output:
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | a | d |
| 1 | b | a |
| 2 | c | c |
| 3 | d | e |
| 4 | e | b |
+---+---+---+
Arguments:
obj
-- The DataMatrix or column to shuffle.- Type: DataMatrix, BaseColumn
Returns:
The shuffled DataMatrix or column.
- Type: DataMatrix, BaseColumn
function shuffle_horiz(*obj)
Shuffles a DataMatrix, or several columns from a DataMatrix, horizontally. That is, the values are shuffled between columns from the same row.
Example:
from datamatrix import DataMatrix, operations
dm = DataMatrix(length=5)
dm.A = 'a', 'b', 'c', 'd', 'e'
dm.B = range(5)
dm = operations.shuffle_horiz(dm.A, dm.B)
print(dm)
Output:
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | 0 | a |
| 1 | b | 1 |
| 2 | 2 | c |
| 3 | 3 | d |
| 4 | 4 | e |
+---+---+---+
Argument list:
*obj
: A list of BaseColumns, or a single DataMatrix.
Returns:
The shuffled DataMatrix.
- Type: DataMatrix
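Horizontal shuffling means that each row keeps its own set of values, but their assignment to columns is randomized. A minimal sketch on plain lists follows; the function name `shuffle_rows_horizontally` and its `seed` parameter are hypothetical and not part of the library.

```python
import random


def shuffle_rows_horizontally(*columns, seed=None):
    """Shuffle values within each row, across the given columns."""
    rng = random.Random(seed)
    # Transpose columns into rows, shuffle each row, and transpose back.
    rows = [list(row) for row in zip(*columns)]
    for row in rows:
        rng.shuffle(row)
    return [list(col) for col in zip(*rows)]


a, b = shuffle_rows_horizontally(['a', 'b', 'c'], [0, 1, 2], seed=1)
print(a)
print(b)
```

Every row still contains the same two values as before; only their column may have changed.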
function sort(obj, by=None)
Sorts a column or DataMatrix. In the case of a DataMatrix, a column must be specified to determine the sort order. In the case of a column, a sort key needs to be specified only if the column should be sorted by another column.
The sort order depends on the version of Python. Python 2 is more flexible, and allows comparisons between types such as str and int. Python 3 does not allow such comparisons.
In general, whenever incomparable values are encountered, all values are forced to float. Values that cannot be converted to float are considered inf.
Example:
from datamatrix import DataMatrix, operations
dm = DataMatrix(length=3)
dm.A = 2, 0, 1
dm.B = 'a', 'b', 'c'
dm = operations.sort(dm, by=dm.A)
print(dm)
Output:
+---+---+---+
| # | A | B |
+---+---+---+
| 1 | 0 | b |
| 2 | 1 | c |
| 0 | 2 | a |
+---+---+---+
Arguments:
obj
-- The DataMatrix or column to sort.- Type: DataMatrix, BaseColumn
Keywords:
by
-- The sort key, that is, the column that is used for sorting the DataMatrix, or the other column.- Type: BaseColumn
- Default: None
Returns:
The sorted DataMatrix, or the sorted column.
- Type: DataMatrix, BaseColumn
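The fallback rule for incomparable values can be sketched in plain Python 3. This is an illustration of the rule as described, not the library's code; the names `to_float` and `tolerant_sort` are hypothetical.

```python
import math


def to_float(value):
    """Convert to float; values that cannot be converted count as inf."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.inf


def tolerant_sort(values):
    """Sort normally; on incomparable values, sort by float value instead."""
    try:
        return sorted(values)
    except TypeError:
        # Python 3 refuses to compare, for example, str with int.
        return sorted(values, key=to_float)


result = tolerant_sort([2, 'x', 0, '1.5'])
print(result)
```

Here `'1.5'` sorts as 1.5 and `'x'`, which cannot be converted, sorts last.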
function split(col, *values)
Splits a DataMatrix by unique values in a column.
Example:
from datamatrix import DataMatrix, operations as ops
dm = DataMatrix(length=4)
dm.A = 0, 0, 1, 1
dm.B = 'a', 'b', 'c', 'd'
# If no values are specified, a (value, DataMatrix) iterator is
# returned.
for A, sdm in ops.split(dm.A):
    print('dm.A = %s' % A)
    print(sdm)
# If values are specified, an iterator over DataMatrix objects is
# returned.
dm_a, dm_c = ops.split(dm.B, 'a', 'c')
print('dm.B == "a"')
print(dm_a)
print('dm.B == "c"')
print(dm_c)
Output:
dm.A = 0
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | 0 | a |
| 1 | 0 | b |
+---+---+---+
dm.A = 1
+---+---+---+
| # | A | B |
+---+---+---+
| 2 | 1 | c |
| 3 | 1 | d |
+---+---+---+
dm.B == "a"
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | 0 | a |
+---+---+---+
dm.B == "c"
+---+---+---+
| # | A | B |
+---+---+---+
| 2 | 1 | c |
+---+---+---+
Arguments:
col
-- The column to split by.- Type: BaseColumn
Argument list:
*values
: Splits the DataMatrix based on these values. If this is provided, an iterator over DataMatrix objects is returned, rather than an iterator over (value, DataMatrix) tuples.
Returns:
An iterator over (value, DataMatrix) tuples if no values are provided; an iterator over DataMatrix objects if values are provided.
- Type: Iterator
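The two return modes can be sketched with a plain-Python generator over rows. The name `split_rows` is hypothetical, and yielding unique values in order of first appearance is an assumption; the library may order them differently.

```python
def split_rows(rows, key, *values):
    """Yield subsets of rows; with no values, yield (value, subset) pairs."""
    if values:
        # Values given: yield one subset per requested value.
        for value in values:
            yield [row for row in rows if key(row) == value]
        return
    # No values given: find the unique values (assumed first-seen order).
    unique = []
    for row in rows:
        if key(row) not in unique:
            unique.append(key(row))
    for value in unique:
        yield value, [row for row in rows if key(row) == value]


rows = list(zip([0, 0, 1, 1], 'abcd'))
pairs = list(split_rows(rows, lambda r: r[0]))
rows_a, rows_c = split_rows(rows, lambda r: r[1], 'a', 'c')
print(pairs)
print(rows_a, rows_c)
```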
function weight(col)
Weights a DataMatrix by a column. That is, each row from a DataMatrix is repeated as many times as the value in the weighting column.
Example:
from datamatrix import DataMatrix, operations
dm = DataMatrix(length=3)
dm.A = 1, 2, 0
dm.B = 'x', 'y', 'z'
print('Original:')
print(dm)
dm = operations.weight(dm.A)
print('Weighted by A:')
print(dm)
Output:
Original:
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | 1 | x |
| 1 | 2 | y |
| 2 | 0 | z |
+---+---+---+
Weighted by A:
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | 1 | x |
| 1 | 2 | y |
| 2 | 2 | y |
+---+---+---+
Arguments:
col
-- The column to weight by.- Type: BaseColumn
Returns:
The weighted DataMatrix.
- Type: DataMatrix
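Weighting amounts to repeating each row as many times as its weight, so a weight of 0 drops the row. A minimal sketch on a list of tuples, with the hypothetical helper name `weight_rows`:

```python
def weight_rows(rows, weights):
    """Repeat each row `weight` times; rows with weight 0 are dropped."""
    return [row for row, weight in zip(rows, weights) for _ in range(weight)]


weighted = weight_rows([(1, 'x'), (2, 'y'), (0, 'z')], [1, 2, 0])
print(weighted)
```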
function z(col)
Transforms a column into z scores.
Example:
from datamatrix import DataMatrix, operations
dm = DataMatrix(length=5)
dm.col = range(5)
dm.z = operations.z(dm.col)
print(dm)
Output:
+---+-----+-----------------+
| # | col | z |
+---+-----+-----------------+
| 0 | 0 | -1.26491106407 |
| 1 | 1 | -0.632455532034 |
| 2 | 2 | 0.0 |
| 3 | 3 | 0.632455532034 |
| 4 | 4 | 1.26491106407 |
+---+-----+-----------------+
Arguments:
col
-- The column to transform.- Type: BaseColumn
Returns:
The column transformed into z scores.
- Type: BaseColumn
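As a worked check of the output above, z scores can be computed with the standard library. The sketch uses the sample standard deviation (`statistics.stdev`, which divides by n - 1) because that reproduces the values shown; this suggests, but does not confirm, that the library does the same. The function name `z_scores` is hypothetical.

```python
from statistics import mean, stdev


def z_scores(values):
    """Subtract the mean and divide by the sample standard deviation."""
    values = list(values)
    m = mean(values)
    sd = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(value - m) / sd for value in values]


scores = z_scores(range(5))
print(scores)
```

For `range(5)` the mean is 2 and the sample variance is 10 / 4 = 2.5, so the first score is -2 / sqrt(2.5), or about -1.2649.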