# Basic use

Ultra-short cheat sheet:

```
from datamatrix import DataMatrix
# Create a new DataMatrix
dm = DataMatrix(length=5)
# The first two rows
print(dm[:2])
# Create a new column and initialize it with the Fibonacci series
dm.fibonacci = 0, 1, 1, 2, 3
# A simple selection (remove 0 and 2)
dm = (dm.fibonacci != 0) & (dm.fibonacci != 2)
# The first two cells from the fibonacci column
print(dm.fibonacci[:2])
# Column mean
print('Mean: %s' % dm.fibonacci.mean)
# Multiply all fibonacci cells by 2
dm.fibonacci_times_two = dm.fibonacci * 2
# Loop through all rows
for row in dm:
    print(row.fibonacci) # get the fibonacci cell from the row
# Loop through all columns
for colname, col in dm.columns:
    for cell in col: # Loop through all cells in the column
        print(cell) # do something with the cell
```

Slightly longer cheat sheet:

## Basic operations

### Creating a DataMatrix

Create a new `DataMatrix` object, and add a column (named `col`). By default, the column is of the `MixedColumn` type, which can store numeric and string data.

```
from datamatrix import DataMatrix, __version__
dm = DataMatrix(length=2)
dm.col = ':-)'
print('These examples were generated with DataMatrix v%s\n' % __version__)
print(dm)
```

**Output:**

```
These examples were generated with DataMatrix v0.3.8
+---+-----+
| # | col |
+---+-----+
| 0 | :-) |
| 1 | :-) |
+---+-----+
```

You can change the length of the `DataMatrix` later on. If you reduce the length, data will be lost. If you increase the length, empty cells will be added.

```
dm.length = 3
```

### Concatenating two DataMatrix objects

You can concatenate two `DataMatrix` objects using the `<<` operator. Matching columns will be combined. (Note that row 2 is empty. This is because we have increased the length of `dm` in the previous step, causing an empty row to be added.)

```
dm2 = DataMatrix(length=2)
dm2.col = ';-)'
dm2.col2 = 10, 20
dm3 = dm << dm2
print(dm3)
```

**Output:**

```
+---+-----+------+
| # | col | col2 |
+---+-----+------+
| 0 | :-) |      |
| 1 | :-) |      |
| 2 |     |      |
| 3 | ;-) |  10  |
| 4 | ;-) |  20  |
+---+-----+------+
```
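The row layout in this output can be reproduced with a small plain-Python model. This is only a sketch of the behaviour shown above, not the library's implementation: `concat` is a hypothetical helper, tables are modelled as dicts of lists, and empty cells are modelled as empty strings.

```python
def concat(a, b):
    """Stack two column-oriented tables (dict: column name -> list of cells).

    Hypothetical helper for illustration only; not part of DataMatrix.
    """
    len_a = len(next(iter(a.values()), []))
    len_b = len(next(iter(b.values()), []))
    result = {}
    # Columns of `a` first, then columns that only exist in `b`
    for name in list(a) + [n for n in b if n not in a]:
        top = a.get(name, [''] * len_a)      # pad when the column is missing in `a`
        bottom = b.get(name, [''] * len_b)   # pad when the column is missing in `b`
        result[name] = top + bottom
    return result


dm = {'col': [':-)', ':-)', '']}                  # length 3, last row empty
dm2 = {'col': [';-)', ';-)'], 'col2': [10, 20]}
dm3 = concat(dm, dm2)
for name, cells in dm3.items():
    print(name, cells)
```

Matching columns are stacked, and a column that exists in only one of the two tables is padded with empty cells for the rows that come from the other.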

### Creating columns

You can set all cells in a column to a single value. This creates the column if it doesn't exist yet.

```
dm.col = 'Another value'
print(dm)
```

**Output:**

```
+---+---------------+
| # |      col      |
+---+---------------+
| 0 | Another value |
| 1 | Another value |
| 2 | Another value |
+---+---------------+
```

You can change all cells in a column based on a sequence. This creates a new column if it doesn't exist yet. This sequence must have the same length as the column (3 in this case).

```
dm.col = 1, 2, 3
print(dm)
```

**Output:**

```
+---+-----+
| # | col |
+---+-----+
| 0 |  1  |
| 1 |  2  |
| 2 |  3  |
+---+-----+
```

If you do not know the name of a column in advance, for example because it is stored in a variable, you can also refer to columns as though they are items of a `dict`. However, this is *not* recommended, because it makes it less clear whether you are referring to a column or a row.

```
dm['col'] = 'X'
print(dm)
```

**Output:**

```
+---+-----+
| # | col |
+---+-----+
| 0 |  X  |
| 1 |  X  |
| 2 |  X  |
+---+-----+
```

### Renaming columns

```
dm.rename('col', 'col2')
print(dm)
```

**Output:**

```
+---+------+
| # | col2 |
+---+------+
| 0 |  X   |
| 1 |  X   |
| 2 |  X   |
+---+------+
```

### Deleting columns

You can delete a column using the `del` keyword:

```
dm.col = 'x'
del dm.col2
print(dm)
```

**Output:**

```
+---+-----+
| # | col |
+---+-----+
| 0 |  x  |
| 1 |  x  |
| 2 |  x  |
+---+-----+
```

### Changing column cells (and slicing)

Change one cell:

```
dm.col[1] = ':-)'
print(dm)
```

**Output:**

```
+---+-----+
| # | col |
+---+-----+
| 0 |  x  |
| 1 | :-) |
| 2 |  x  |
+---+-----+
```

Change multiple cells. (This changes row 0 and 2. It is not a slice!)

```
dm.col[0,2] = ':P'
print(dm)
```

**Output:**

```
+---+-----+
| # | col |
+---+-----+
| 0 |  :P |
| 1 | :-) |
| 2 |  :P |
+---+-----+
```

Change a slice of cells:

```
dm.col[1:] = ':D'
print(dm)
```

**Output:**

```
+---+-----+
| # | col |
+---+-----+
| 0 |  :P |
| 1 |  :D |
| 2 |  :D |
+---+-----+
```
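The difference between `dm.col[0,2]` and `dm.col[1:]` comes down to the key that Python hands to the indexing method: comma-separated indices arrive as a tuple, whereas a colon produces a `slice` object. A minimal sketch in plain Python, using a hypothetical `KeyProbe` class that is not part of DataMatrix:

```python
class KeyProbe:
    """Hypothetical helper that returns the key Python passes for each indexing form."""

    def __getitem__(self, key):
        return key


probe = KeyProbe()
print(probe[1])     # 1
print(probe[0, 2])  # (0, 2)
print(probe[1:])    # slice(1, None, None)
```

A container can therefore treat `[0, 2]` as "rows 0 and 2" and `[0:2]` as "rows 0 up to but not including 2", which is why the two forms select different cells.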

### Column properties

Basic numeric properties, such as the mean, can be accessed directly. Only numeric values are taken into account.

```
dm.col = 1, 2, 'not a number'
# Numeric descriptives
print('mean: %s' % dm.col.mean)
print('median: %s' % dm.col.median)
print('standard deviation: %s' % dm.col.std)
print('sum: %s' % dm.col.sum)
print('min: %s' % dm.col.min)
print('max: %s' % dm.col.max)
# Other properties
print('unique values: %s' % dm.col.unique)
print('number of unique values: %s' % dm.col.count)
print('column name: %s' % dm.col.name)
```

**Output:**

```
mean: 1.5
median: 1.5
standard deviation: 0.707106781187
sum: 3.0
min: 1.0
max: 2.0
unique values: [1, 2, u'not a number']
number of unique values: 3
column name: col
```
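The numbers above can be checked by hand with Python's standard `statistics` module. The sketch below is not how DataMatrix computes these properties; it only filters out the non-numeric cell and recomputes the descriptives. It also shows that the reported standard deviation (0.707...) corresponds to the sample standard deviation (n − 1 denominator), whereas the population standard deviation of 1 and 2 is 0.5.

```python
import statistics

cells = [1, 2, 'not a number']
# Keep only numeric cells; excluding bool is an illustrative choice
numeric = [c for c in cells
           if isinstance(c, (int, float)) and not isinstance(c, bool)]

mean = statistics.mean(numeric)
median = statistics.median(numeric)
sample_std = statistics.stdev(numeric)        # n - 1 denominator
population_std = statistics.pstdev(numeric)   # n denominator

print('mean: %s' % mean)
print('median: %s' % median)
print('sample standard deviation: %s' % sample_std)
print('population standard deviation: %s' % population_std)
```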

### Iterating over rows, columns, and cells

By iterating directly over a `DataMatrix` object, you get successive `Row` objects. From a `Row` object, you can directly access cells.

```
dm.col = 'a', 'b', 'c'
for row in dm:
    print(row)
    print(row.col)
```

**Output:**

```
+------+-------+
| Name | Value |
+------+-------+
| col  |   a   |
+------+-------+
a
+------+-------+
| Name | Value |
+------+-------+
| col  |   b   |
+------+-------+
b
+------+-------+
| Name | Value |
+------+-------+
| col  |   c   |
+------+-------+
c
```

By iterating over `DataMatrix.columns`, you get successive `(column_name, column)` tuples.

```
for colname, col in dm.columns:
    print('%s = %s' % (colname, col))
```

**Output:**

```
col = col[u'a', u'b', u'c']
```

By iterating over a column, you get successive cells:

```
for cell in dm.col:
    print(cell)
```

**Output:**

```
a
b
c
```

By iterating over a `Row` object, you get `(column_name, cell)` tuples:

```
row = dm[0] # Get the first row
for colname, cell in row:
    print('%s = %s' % (colname, cell))
```

**Output:**

```
col = a
```

### Selecting data

You can select by directly comparing columns to values. This returns a new `DataMatrix` object with only the selected rows.

```
dm = DataMatrix(length=10)
dm.col = range(10)
dm_subset = dm.col > 5
print(dm_subset)
```

**Output:**

```
+---+-----+
| # | col |
+---+-----+
| 6 |  6  |
| 7 |  7  |
| 8 |  8  |
| 9 |  9  |
+---+-----+
```

You can select by multiple criteria using the `|` (or), `&` (and), and `^` (xor) operators (but not the actual words `and` and `or`). Note the parentheses, which are necessary because in Python `|`, `&`, and `^` bind more tightly than comparison operators such as `<` and `>`.

```
dm_subset = (dm.col < 1) | (dm.col > 8)
print(dm_subset)
```

**Output:**

```
+---+-----+
| # | col |
+---+-----+
| 0 |  0  |
| 9 |  9  |
+---+-----+
```

```
dm_subset = (dm.col > 1) & (dm.col < 8)
print(dm_subset)
```

**Output:**

```
+---+-----+
| # | col |
+---+-----+
| 2 |  2  |
| 3 |  3  |
| 4 |  4  |
| 5 |  5  |
| 6 |  6  |
| 7 |  7  |
+---+-----+
```
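To see why the parentheses in the selections above matter, here is a small plain-Python model. It is a sketch under stated assumptions, not DataMatrix code: `TinyColumn` is a hypothetical stand-in whose comparisons return sets of matching row indices, so that `|` and `&` become ordinary set union and intersection.

```python
class TinyColumn:
    """Hypothetical stand-in for a column; comparisons return sets of row indices."""

    def __init__(self, cells):
        self.cells = list(cells)

    def __lt__(self, value):
        return {i for i, cell in enumerate(self.cells) if cell < value}

    def __gt__(self, value):
        return {i for i, cell in enumerate(self.cells) if cell > value}


col = TinyColumn(range(10))
either = (col < 1) | (col > 8)   # union of two selections
both = (col > 1) & (col < 8)     # intersection of two selections
print(sorted(either))  # [0, 9]
print(sorted(both))    # [2, 3, 4, 5, 6, 7]

# Without parentheses, Python parses this as: col < (1 | col) > 8
unparenthesized_error = None
try:
    col < 1 | col > 8
except TypeError as e:
    unparenthesized_error = e
    print('TypeError:', e)
```

In this model the unparenthesized form fails outright, because `1 | col` is evaluated before either comparison. With a real library the result may be an error or a silently wrong selection, so the parentheses are the safe habit either way.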

### Basic column operations (multiplication, addition, etc.)

You can apply basic mathematical operations on all cells in a column simultaneously. Cells with non-numeric values are ignored, except by the `+` operator, which then results in concatenation.

```
dm = DataMatrix(length=3)
dm.col = 0, 'a', 20
dm.col2 = dm.col*.5
dm.col3 = dm.col+10
dm.col4 = dm.col-10
dm.col5 = dm.col/50
print(dm)
```

**Output:**

```
+---+-----+------+------+------+------+
| # | col | col2 | col3 | col4 | col5 |
+---+-----+------+------+------+------+
| 0 |  0  | 0.0  |  10  | -10  | 0.0  |
| 1 |  a  |  a   | a10  |  a   |  a   |
| 2 |  20 | 10.0 |  30  |  10  | 0.4  |
+---+-----+------+------+------+------+
```
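The rule illustrated by this table (numeric cells are transformed, non-numeric cells are left alone, and `+` concatenates) can be written out as a short plain-Python model. This is only a sketch that reproduces the output above; `apply_op` is a hypothetical helper and says nothing about how DataMatrix implements these operators internally.

```python
import operator


def apply_op(cells, op, operand):
    """Apply `op` to numeric cells; hypothetical helper, not part of DataMatrix."""
    out = []
    for cell in cells:
        if isinstance(cell, (int, float)):
            out.append(op(cell, operand))            # numeric cell: do the math
        elif op is operator.add:
            out.append('%s%s' % (cell, operand))     # '+' on text: concatenate
        else:
            out.append(cell)                         # other operators: leave as is
    return out


col = [0, 'a', 20]
col2 = apply_op(col, operator.mul, .5)
col3 = apply_op(col, operator.add, 10)
col4 = apply_op(col, operator.sub, 10)
col5 = apply_op(col, operator.truediv, 50)
print(col2)  # [0.0, 'a', 10.0]
print(col3)  # [10, 'a10', 30]
print(col4)  # [-10, 'a', 10]
print(col5)  # [0.0, 'a', 0.4]
```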

## Working with numeric data (requires numpy)

If you do not specify a column type (as in the examples above), the `MixedColumn` will be used. When you work with large amounts of numeric data, you can use the `IntColumn` or `FloatColumn` to improve performance. These columns are built on top of `numpy` arrays.

```
import numpy as np
from matplotlib import pyplot as plt
from datamatrix import DataMatrix, IntColumn, FloatColumn
dm = DataMatrix(length=1000)
dm.x = IntColumn # Initialized with all 0 values
dm.x = np.arange(0, 1000)
dm.y = FloatColumn
dm.y = np.sin(np.linspace(0, 2*np.pi, 1000))
plt.plot(dm.x, dm.y)
plt.savefig('content/pages/img/basic/sinewave.png')
```

## Working with continuous data (requires numpy)

The `SeriesColumn` is two-dimensional; that is, each cell is by itself an array of values. Therefore, the `SeriesColumn` can be used to work with sets of continuous data, such as EEG or eye-position traces.

For more information about series, see:

```
import numpy as np
from matplotlib import pyplot as plt
from datamatrix import DataMatrix, SeriesColumn
length = 10 # Number of traces
depth = 50 # Size of each trace
x = np.linspace(0, 2*np.pi, depth)
sinewave = np.sin(x)
noise = np.random.random(depth)*2-1
dm = DataMatrix(length=length)
dm.series = SeriesColumn(depth=depth)
dm.series[0] = noise
dm.series[1:].setallrows(sinewave)
dm.series[1:] *= np.linspace(-1, 1, 9)
plt.xlim(x.min(), x.max())
plt.plot(x, dm.series.plottable, color='green', linestyle=':')
y1 = dm.series.mean-dm.series.std
y2 = dm.series.mean+dm.series.std
plt.fill_between(x, y1, y2, alpha=.2, color='blue')
plt.plot(x, dm.series.mean, color='blue')
plt.savefig('content/pages/img/basic/sinewave-series.png')
```
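The two-dimensional layout can also be pictured with a bare `numpy` array of shape `(length, depth)`. The sketch below does not use DataMatrix at all. It assumes, as the plotting calls above suggest, that the series mean and standard deviation are taken across rows, giving one value per sample; the variable names are illustrative.

```python
import numpy as np

length = 10  # number of traces (rows)
depth = 50   # number of samples per trace
x = np.linspace(0, 2 * np.pi, depth)

# One sine wave per row: shape (length, depth)
traces = np.tile(np.sin(x), (length, 1))
# Scale each row by its own factor; the new axis broadcasts across samples
scale = np.linspace(-1, 1, length)
traces = traces * scale[:, np.newaxis]

# Averaging across rows (axis 0) yields a single trace of `depth` samples
mean_trace = traces.mean(axis=0)
std_trace = traces.std(axis=0)
print(traces.shape, mean_trace.shape, std_trace.shape)
```

Because the scale factors are symmetric around zero, the mean trace is flat at (approximately) zero, while the standard deviation follows the absolute shape of the sine wave.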