# datamatrix.operations

A set of operations to apply to columns and `DataMatrix` objects.

## function auto_type(dm)

Requires fastnumbers

Converts all columns of type MixedColumn to IntColumn if all values are integer numbers, or to FloatColumn if all values are non-integer numbers.

```python
from datamatrix import DataMatrix, operations

dm = DataMatrix(length=5)
dm.A = 'a'
dm.B = 1
dm.C = 1.1
dm_new = operations.auto_type(dm)
print('dm_new.A: %s' % type(dm_new.A))
print('dm_new.B: %s' % type(dm_new.B))
print('dm_new.C: %s' % type(dm_new.C))
```

Output:

```
dm_new.A: <class 'datamatrix._datamatrix._mixedcolumn.MixedColumn'>
dm_new.B: <class 'datamatrix._datamatrix._numericcolumn.IntColumn'>
dm_new.C: <class 'datamatrix._datamatrix._numericcolumn.FloatColumn'>
```

Arguments:

• `dm` -- The DataMatrix whose columns are converted.
• Type: DataMatrix

Returns:

A DataMatrix in which the eligible columns have been converted.

## function bin_split(col, bins)

Splits a DataMatrix into bins; that is, the DataMatrix is first sorted by a column, and then split into equal-size (or roughly equal-size) bins.

Example:

```python
from datamatrix import DataMatrix, operations

dm = DataMatrix(length=5)
dm.A = 1, 0, 3, 2, 4
dm.B = 'a', 'b', 'c', 'd', 'e'
# Use separate loop names so that neither dm nor the builtin bin is shadowed
for i, dm_bin in enumerate(operations.bin_split(dm.A, bins=3)):
    print('bin %d' % i)
    print(dm_bin)
```

Output:

```
bin 0
+---+---+---+
| # | A | B |
+---+---+---+
| 1 | 0 | b |
+---+---+---+
bin 1
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | 1 | a |
| 3 | 2 | d |
+---+---+---+
bin 2
+---+---+---+
| # | A | B |
+---+---+---+
| 2 | 3 | c |
| 4 | 4 | e |
+---+---+---+
```

Arguments:

• `col` -- The column to split by.
• Type: BaseColumn
• `bins` -- The number of bins.
• Type: int

Returns:

A generator that iterates over the bins.
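
The sort-then-split behaviour can be sketched in plain Python. This is an illustration only, not the library's implementation: the helper name `bin_split_sketch` is hypothetical, and the boundary rule (`int(n * i / bins)`) is an assumption, chosen because it reproduces the bin sizes 1, 2, 2 shown in the example above.

```python
def bin_split_sketch(rows, key, bins):
    """Yield `bins` lists of rows, sorted by `key` and roughly equal in size.

    Hypothetical helper for illustration; not part of datamatrix.
    """
    ordered = sorted(rows, key=key)
    n = len(ordered)
    for i in range(bins):
        # Assumed boundary rule: truncate the ideal fractional boundaries
        start = int(n * i / bins)
        end = int(n * (i + 1) / bins)
        yield ordered[start:end]


rows = [(1, 'a'), (0, 'b'), (3, 'c'), (2, 'd'), (4, 'e')]
result = list(bin_split_sketch(rows, key=lambda row: row[0], bins=3))
for i, chunk in enumerate(result):
    print('bin %d: %s' % (i, chunk))
```

When the number of rows is not divisible by `bins`, a rule of this kind puts the smaller bins first.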

## function filter_(fnc, obj)

Filters rows from a datamatrix or column based on a filter function (`fnc`).

If `obj` is a column, `fnc` should be a function that accepts a single value. If `obj` is a datamatrix, `fnc` should be a function that accepts a keyword `dict`, where column names are keys and cells are values. In both cases, `fnc` should return a `bool` indicating whether the row or value should be included.

Example:

```python
from datamatrix import DataMatrix, operations as ops

dm = DataMatrix(length=5)
dm.col = range(5)
# Create a column with only odd values
col_new = ops.filter_(lambda x: x % 2, dm.col)
print(col_new)
# Create a new datamatrix with only odd values in col
dm_new = ops.filter_(lambda **d: d['col'] % 2, dm)
print(dm_new)
```

Output:

```
col[1, 3]
+---+-----+
| # | col |
+---+-----+
| 1 |  1  |
| 3 |  3  |
+---+-----+
```

Arguments:

• `fnc` -- A filter function.
• Type: callable
• `obj` -- A datamatrix or column to filter.
• Type: BaseColumn, DataMatrix

Returns:

A new column or datamatrix.

• Type: BaseColumn, DataMatrix
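
The two calling conventions for `fnc` can be illustrated without the library. The sketch below models a column as a list and a datamatrix as a list of row dicts; `filter_values` and `filter_rows` are hypothetical names, and the code is not how `filter_` itself is implemented.

```python
def filter_values(fnc, values):
    """Column case: `fnc` receives one value at a time."""
    return [v for v in values if fnc(v)]


def filter_rows(fnc, rows):
    """DataMatrix case: each row (a dict) is unpacked into keyword arguments."""
    return [row for row in rows if fnc(**row)]


values = list(range(5))
rows = [{'col': v} for v in values]
odd_values = filter_values(lambda x: x % 2, values)
odd_rows = filter_rows(lambda **d: d['col'] % 2, rows)
print(odd_values)
print(odd_rows)
```

In the row case, `lambda **d` collects the keywords back into a `dict`, which is why the filter function looks up `d['col']`.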

## function fullfactorial(dm, ignore=u'')

Requires numpy

Creates a new DataMatrix that uses a specified DataMatrix as the base of a full-factorial design. That is, each value of every column is combined with each value from every other column.

Example:

```python
from datamatrix import DataMatrix, operations

dm = DataMatrix(length=2)
dm.A = 'x', 'y'
dm.B = 3, 4
dm = operations.fullfactorial(dm)
print(dm)
```

Output:

```
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | x | 3 |
| 1 | y | 3 |
| 2 | x | 4 |
| 3 | y | 4 |
+---+---+---+
```

Arguments:

• `dm` -- The source DataMatrix.
• Type: DataMatrix

Keywords:

• `ignore` -- A value that should be ignored.
• Default: ''
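
A full-factorial design is a Cartesian product of the values in each column, which can be sketched with `itertools.product`. The helper `fullfactorial_sketch` is hypothetical, and two things are assumptions: that `ignore` means values equal to it are left out of the design, and the row order, which may differ from what the library produces.

```python
from itertools import product


def fullfactorial_sketch(columns, ignore=''):
    """Return all combinations of the values in `columns` (a dict of lists).

    Hypothetical helper; values equal to `ignore` are skipped, which is the
    assumed meaning of the `ignore` keyword.
    """
    names = list(columns)
    levels = [[v for v in columns[name] if v != ignore] for name in names]
    return [dict(zip(names, combo)) for combo in product(*levels)]


design = fullfactorial_sketch({'A': ['x', 'y'], 'B': [3, 4]})
for row in design:
    print(row)

# Columns of unequal length, padded with the ignored value
padded = fullfactorial_sketch({'A': ['x', 'y', ''], 'B': [3, 4, 5]})
print(len(padded))
```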

## function group(dm, by)

Requires numpy

Groups the DataMatrix by unique values in a set of grouping columns. Grouped columns are stored as SeriesColumns. The columns that are grouped should contain numeric values.

Example:

```python
from datamatrix import DataMatrix, operations

dm = DataMatrix(length=4)
dm.A = 'x', 'x', 'y', 'y'
dm.B = 0, 1, 2, 3
print('Original:')
print(dm)
dm = operations.group(dm, by=dm.A)
print('Grouped by A:')
print(dm)
```

Output:

```
Original:
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | x | 0 |
| 1 | x | 1 |
| 2 | y | 2 |
| 3 | y | 3 |
+---+---+---+
Grouped by A:
+---+-----------+---+
| # |     B     | A |
+---+-----------+---+
| 0 | [ 0.  1.] | x |
| 1 | [ 2.  3.] | y |
+---+-----------+---+
```

Arguments:

• `dm` -- The DataMatrix to group.
• Type: DataMatrix
• `by` -- A column or list of columns to group by.
• Type: BaseColumn, list

Returns:

A grouped DataMatrix.

• Type: DataMatrix
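
Conceptually, grouping collects the values of the non-grouping columns into one series per unique key. The following plain-Python sketch mirrors the example above using lists of row dicts; `group_sketch` is a hypothetical name, a single grouping column is assumed, and the conversion to `float` is an assumption based on the `[ 0.  1.]` values in the output.

```python
def group_sketch(rows, by):
    """Group row dicts by the value of column `by`.

    Hypothetical helper: every other column becomes a list of floats,
    one list per unique value of `by`.
    """
    groups = {}  # dicts keep insertion order on Python 3.7+
    for row in rows:
        bucket = groups.setdefault(row[by], {})
        for name, value in row.items():
            if name != by:
                bucket.setdefault(name, []).append(float(value))
    return [{by: key, **cols} for key, cols in groups.items()]


rows = [
    {'A': 'x', 'B': 0},
    {'A': 'x', 'B': 1},
    {'A': 'y', 'B': 2},
    {'A': 'y', 'B': 3},
]
grouped = group_sketch(rows, by='A')
for row in grouped:
    print(row)
```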

## function keep_only(dm, *cols)

Removes all columns from the DataMatrix, except those listed in `cols`.

Example:

```python
from datamatrix import DataMatrix, operations as ops

dm = DataMatrix(length=5)
dm.A = 'a', 'b', 'c', 'd', 'e'
dm.B = range(5)
dm.C = range(5, 10)
dm_new = ops.keep_only(dm, dm.A, dm.C)
print(dm_new)
```

Output:

```
+---+---+---+
| # | A | C |
+---+---+---+
| 0 | a | 5 |
| 1 | b | 6 |
| 2 | c | 7 |
| 3 | d | 8 |
| 4 | e | 9 |
+---+---+---+
```

Arguments:

• `dm` -- The DataMatrix from which columns are kept.
• Type: DataMatrix

Argument list:

• `*cols`: A list of column names, or columns.

## function map_(fnc, obj)

Maps a function (`fnc`) onto rows of a datamatrix or cells of a column.

If `obj` is a column, the function `fnc` is mapped onto each cell of the column, and a new column is returned. In this case, `fnc` should be a function that accepts and returns a single value.

If `obj` is a datamatrix, the function `fnc` is mapped onto each row, and a new datamatrix is returned. In this case, `fnc` should be a function that accepts a keyword `dict`, where column names are keys and cells are values. The return value should be another `dict`, again with column names as keys, and cells as values. Columns that are not part of the returned `dict` are left unchanged.

Example:

```python
from datamatrix import DataMatrix, operations as ops

dm = DataMatrix(length=3)
dm.old = 0, 1, 2
# Map a 2x function onto dm.old to create dm.new
dm.new = ops.map_(lambda i: i*2, dm.old)
print(dm)
# Map a 2x function onto the entire dm to create dm_new, using a fancy
# dict comprehension wrapped inside a lambda function.
dm_new = ops.map_(
    lambda **d: {col: 2*val for col, val in d.items()},
    dm)
print(dm_new)
```

Output:

```
+---+-----+-----+
| # | old | new |
+---+-----+-----+
| 0 |  0  |  0  |
| 1 |  1  |  2  |
| 2 |  2  |  4  |
+---+-----+-----+
+---+-----+-----+
| # | old | new |
+---+-----+-----+
| 0 |  0  |  0  |
| 1 |  2  |  4  |
| 2 |  4  |  8  |
+---+-----+-----+
```

Arguments:

• `fnc` -- A function to map onto each row or each cell.
• Type: callable
• `obj` -- A datamatrix or column to map `fnc` onto.
• Type: BaseColumn, DataMatrix

Returns:

A new column or datamatrix.

• Type: BaseColumn, DataMatrix
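
The rule that columns missing from the returned `dict` are left unchanged amounts to a per-row dict update. A minimal sketch, assuming rows are modeled as plain dicts (`map_rows` is a hypothetical helper, not the library's code):

```python
def map_rows(fnc, rows):
    """Apply `fnc(**row)` to each row and merge the returned dict into it."""
    new_rows = []
    for row in rows:
        updated = dict(row)         # copy, so the input rows are untouched
        updated.update(fnc(**row))  # only the returned keys are overwritten
        new_rows.append(updated)
    return new_rows


rows = [{'old': 0, 'new': 0}, {'old': 1, 'new': 2}, {'old': 2, 'new': 4}]
# Only 'new' is returned, so 'old' stays as it was
result = map_rows(lambda **d: {'new': d['new'] + 1}, rows)
for row in result:
    print(row)
```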

## function replace(col, mappings={})

Replaces values in a column by other values.

Example:

```python
from datamatrix import DataMatrix, operations as ops

dm = DataMatrix(length=3)
dm.old = 0, 1, 2
# Replace 0 by 'a' and 2 by 'c'; other values are left as they are
dm.new = ops.replace(dm.old, {0 : 'a', 2 : 'c'})
print(dm)
```

Output:

```
+---+-----+-----+
| # | old | new |
+---+-----+-----+
| 0 |  0  |  a  |
| 1 |  1  |  1  |
| 2 |  2  |  c  |
+---+-----+-----+
```

Arguments:

• `col` -- The column in which to replace values.
• Type: BaseColumn

Keywords:

• `mappings` -- A dict where old values are keys and new values are values.
• Type: dict
• Default: {}
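
The mapping semantics correspond to a dictionary lookup with the original value as fallback. A plain-Python sketch, with a list standing in for the column (`replace_sketch` is a hypothetical name):

```python
def replace_sketch(values, mappings):
    """Return a new list in which values found in `mappings` are replaced."""
    # dict.get falls back to the value itself when there is no mapping
    return [mappings.get(v, v) for v in values]


new = replace_sketch([0, 1, 2], {0: 'a', 2: 'c'})
print(new)
```

Values without an entry in `mappings`, such as `1` here, pass through unchanged.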

## function setcol(dm, name, value)

Returns a new DataMatrix to which a column has been added or modified.

The main difference with using a regular assignment (`dm.col = 'x'`) is that this does not modify the original DataMatrix, and is suitable for use in `lambda` expressions.

Example:

```python
from datamatrix import DataMatrix, operations as ops

dm1 = DataMatrix(length=5)
dm2 = ops.setcol(dm1, 'y', range(5))
print(dm2)
```

Output:

```
+---+---+
| # | y |
+---+---+
| 0 | 0 |
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
+---+---+
```

Arguments:

• `dm` -- A DataMatrix.
• Type: DataMatrix
• `name` -- A column name.
• Type: str
• `value` -- The value to be assigned to the column. This can be any value that is valid for a regular column assignment.

Returns:

A new DataMatrix.

• Type: DataMatrix

## function shuffle(obj)

Shuffles a DataMatrix or a column. If a DataMatrix is shuffled, the order of the rows is shuffled, but values that were in the same row will stay in the same row.

Example:

```python
from datamatrix import DataMatrix, operations

dm = DataMatrix(length=5)
dm.A = 'a', 'b', 'c', 'd', 'e'
dm.B = operations.shuffle(dm.A)
print(dm)
```

Output:

```
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | a | a |
| 1 | b | d |
| 2 | c | e |
| 3 | d | c |
| 4 | e | b |
+---+---+---+
```

Arguments:

• `obj` -- The DataMatrix or column to shuffle.
• Type: DataMatrix, BaseColumn

Returns:

The shuffled DataMatrix or column.

• Type: DataMatrix, BaseColumn

## function shuffle_horiz(*obj)

Shuffles a DataMatrix, or several columns from a DataMatrix, horizontally. That is, the values are shuffled between columns from the same row.

Example:

```python
from datamatrix import DataMatrix, operations

dm = DataMatrix(length=5)
dm.A = 'a', 'b', 'c', 'd', 'e'
dm.B = range(5)
dm = operations.shuffle_horiz(dm.A, dm.B)
print(dm)
```

Output:

```
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | 0 | a |
| 1 | 1 | b |
| 2 | c | 2 |
| 3 | 3 | d |
| 4 | 4 | e |
+---+---+---+
```

Argument list:

• `*obj`: A list of BaseColumns, or a single DataMatrix.

Returns:

The shuffled DataMatrix.

• Type: DataMatrix
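
Horizontal shuffling permutes values within each row, so a value never moves to a different row. A sketch with rows as dicts; `shuffle_horiz_sketch` and its `rng` parameter are hypothetical and only serve the illustration:

```python
import random


def shuffle_horiz_sketch(rows, names, rng=random):
    """Shuffle the values of columns `names` within each row."""
    shuffled = []
    for row in rows:
        values = [row[name] for name in names]
        rng.shuffle(values)                 # permute within this row only
        new_row = dict(row)
        new_row.update(zip(names, values))  # write the permuted values back
        shuffled.append(new_row)
    return shuffled


rows = [{'A': letter, 'B': i} for i, letter in enumerate('abcde')]
result = shuffle_horiz_sketch(rows, ['A', 'B'], rng=random.Random(0))
for row in result:
    print(row)
```

Each output row holds the same two values as the corresponding input row, possibly swapped between `A` and `B`.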

## function sort(obj, by=None)

Sorts a column or DataMatrix. For a DataMatrix, a column must be specified (`by`) to determine the sort order. For a column, `by` only needs to be specified if the column should be sorted by another column.

The sort order depends on the version of Python. Python 2 is more flexible, and allows comparisons between types such as `str` and `int`. Python 3 does not allow such comparisons.

In general, whenever incomparable values are encountered, all values are forced to `float`. Values that cannot be converted to float are considered `inf`.

Example:

```python
from datamatrix import DataMatrix, operations

dm = DataMatrix(length=3)
dm.A = 2, 0, 1
dm.B = 'a', 'b', 'c'
dm = operations.sort(dm, by=dm.A)
print(dm)
```

Output:

```
+---+---+---+
| # | A | B |
+---+---+---+
| 1 | 0 | b |
| 2 | 1 | c |
| 0 | 2 | a |
+---+---+---+
```

Arguments:

• `obj` -- The DataMatrix or column to sort.
• Type: DataMatrix, BaseColumn

Keywords:

• `by` -- The sort key, that is, the column that is used for sorting the DataMatrix, or the other column.
• Type: BaseColumn
• Default: None

Returns:

The sorted DataMatrix, or the sorted column.

• Type: DataMatrix, BaseColumn
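
The fallback described above for incomparable values (force everything to `float`, treat unconvertible values as `inf`) can be sketched in plain Python 3. This is an interpretation of that rule, not the library's implementation; `sort_sketch` is a hypothetical name, and the sketch orders the original values by a float key rather than returning converted values.

```python
def sort_sketch(values):
    """Sort `values`, falling back to a float key if they are incomparable."""
    try:
        return sorted(values)
    except TypeError:
        # Raised by Python 3 for comparisons such as 'a' < 1
        def as_float(value):
            try:
                return float(value)
            except (TypeError, ValueError):
                return float('inf')  # unconvertible values sort last
        return sorted(values, key=as_float)


mixed = sort_sketch([3, 'a', 1, '2'])
print(mixed)
```

Here `'2'` converts to `2.0` and therefore lands between `1` and `3`, while `'a'` cannot be converted and ends up last.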

## function split(col, *values)

Splits a DataMatrix by unique values in a column.

Example:

```python
from datamatrix import DataMatrix, operations as ops

dm = DataMatrix(length=4)
dm.A = 0, 0, 1, 1
dm.B = 'a', 'b', 'c', 'd'
# If no values are specified, a (value, DataMatrix) iterator is
# returned. The loop variable is named sdm so that dm is not overwritten.
for A, sdm in ops.split(dm.A):
    print('dm.A = %s' % A)
    print(sdm)
# If values are specified, an iterator over DataMatrix objects is
# returned.
dm_a, dm_c = ops.split(dm.B, 'a', 'c')
print('dm.B == "a"')
print(dm_a)
print('dm.B == "c"')
print(dm_c)
```

Output:

```
dm.A = 0
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | 0 | a |
| 1 | 0 | b |
+---+---+---+
dm.A = 1
+---+---+---+
| # | A | B |
+---+---+---+
| 2 | 1 | c |
| 3 | 1 | d |
+---+---+---+
dm.B == "a"
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | 0 | a |
+---+---+---+
dm.B == "c"
+---+---+---+
| # | A | B |
+---+---+---+
| 2 | 1 | c |
+---+---+---+
```

Arguments:

• `col` -- The column to split by.
• Type: BaseColumn

Argument list:

• `*values`: Splits the DataMatrix based on these values. If this is provided, an iterator over DataMatrix objects is returned, rather than an iterator over (value, DataMatrix) tuples.

Returns:

An iterator over (value, DataMatrix) tuples if no values are provided; an iterator over DataMatrix objects if values are provided.

• Type: Iterator

## function weight(col)

Weights a DataMatrix by a column. That is, each row from a DataMatrix is repeated as many times as the value in the weighting column.

Example:

```python
from datamatrix import DataMatrix, operations

dm = DataMatrix(length=3)
dm.A = 1, 2, 0
dm.B = 'x', 'y', 'z'
print('Original:')
print(dm)
dm = operations.weight(dm.A)
print('Weighted by A:')
print(dm)
```

Output:

```
Original:
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | 1 | x |
| 1 | 2 | y |
| 2 | 0 | z |
+---+---+---+
Weighted by A:
+---+---+---+
| # | A | B |
+---+---+---+
| 0 | 1 | x |
| 1 | 2 | y |
| 2 | 2 | y |
+---+---+---+
```

Arguments:

• `col` -- The column to weight by.
• Type: BaseColumn

Returns:

The weighted DataMatrix.

• Type: DataMatrix
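
Weighting is row repetition: a weight of 0 drops the row and a weight of 2 duplicates it. A plain-Python sketch with rows as dicts (`weight_sketch` is a hypothetical helper, and non-negative integer weights are assumed):

```python
def weight_sketch(rows, by):
    """Repeat each row as many times as its value in column `by`."""
    return [dict(row) for row in rows for _ in range(row[by])]


rows = [{'A': 1, 'B': 'x'}, {'A': 2, 'B': 'y'}, {'A': 0, 'B': 'z'}]
weighted = weight_sketch(rows, by='A')
for row in weighted:
    print(row)
```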

## function z(col)

Transforms a column into z scores.

Example:

```python
from datamatrix import DataMatrix, operations

dm = DataMatrix(length=5)
dm.col = range(5)
dm.z = operations.z(dm.col)
print(dm)
```

Output:

```
+---+-----+-----------------+
| # | col |        z        |
+---+-----+-----------------+
| 0 |  0  |  -1.26491106407 |
| 1 |  1  | -0.632455532034 |
| 2 |  2  |       0.0       |
| 3 |  3  |  0.632455532034 |
| 4 |  4  |  1.26491106407  |
+---+-----+-----------------+
```

Arguments:

• `col` -- The column to transform.
• Type: BaseColumn

Returns:

A new column with z scores.

• Type: BaseColumn
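
A z score is a value's distance from the mean, expressed in standard deviations. The values in the output above are consistent with the sample standard deviation (n - 1 denominator): for 0 to 4 the mean is 2 and the standard deviation is sqrt(2.5), so 0 maps to about -1.265. That choice is inferred from the example output rather than stated by the documentation, and `z_sketch` is a hypothetical helper.

```python
from statistics import mean, stdev


def z_sketch(values):
    """Return z scores, using the sample standard deviation (assumed)."""
    values = list(values)
    m = mean(values)
    sd = stdev(values)  # n - 1 denominator
    return [(v - m) / sd for v in values]


scores = z_sketch(range(5))
for value, score in zip(range(5), scores):
    print(value, score)
```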