Advertisement

datamatrix.functional

A set of functions and decorators for functional programming.

What is functional programming?

Functional programming is a style of programming that is characterized by the following:

  • Lack of statements—In its purest form, functional programming does not use any statements. Statements are things like assignments (e.g. x = 1), for loops, if statements, etc. Instead of statements, functional programs are chains of function calls.
  • Short functions—In the purest form of functional programming, each function is a single expression. In Python, this can be implemented through lambda expressions.
  • Referential transparency—Functions are referentially transparent when they always return the same result given the same set of arguments (i.e. they are stateless), and when they do not alter the state of the program (i.e. they have no side effects).

function curry(fnc)

A currying decorator that turns a function with multiple arguments into a chain of partial functions, each of which takes at least a single argument. The input function may accept keywords, but the output function no longer does (i.e. currying turns all keywords into positional arguments).

Example:

from datamatrix import functional as fnc

@fnc.curry
def add(a, b, c):

       return a + b + c

print(add(1)(2)(3)) # Curried approach with single arguments
print(add(1, 2)(3)) # Partly curried approach
print(add(1)(2, 3)) # Partly curried approach
print(add(1, 2, 3)) # Original approach multiple arguments

Output:

6
6
6
6

Arguments:

  • fnc -- A function to curry.
    • Type: callable

Returns:

A curried function that accepts at least the first argument, and returns a function that accepts the second argument, etc.

function filter_(fnc, obj)

Filters rows from a datamatrix or column based on filter function (fnc).

If obj is a column, fnc should be a function that accepts a single value. If obj is a datamatrix, fnc should be a function that accepts a keyword dict, where column names are keys and cells are values. In both cases, fnc should return a bool indicating whether the row or value should be included.

Example:

from datamatrix import DataMatrix, functional as fnc

dm = DataMatrix(length=5)
dm.col = range(5)
# Create a column with only odd values
col_new = fnc.filter_(lambda x: x % 2, dm.col)
print(col_new)
# Create a new datamatrix with only odd values in col
dm_new = fnc.filter_(lambda **d: d['col'] % 2, dm)
print(dm_new)

Output:

col[1, 3]
+---+-----+
| # | col |
+---+-----+
| 1 |  1  |
| 3 |  3  |
+---+-----+

Arguments:

  • fnc -- A filter function.
    • Type: callable
  • obj -- A datamatrix or column to filter.
    • Type: BaseColumn, DataMatrix

Returns:

A new column or datamatrix.

  • Type: BaseColumn, DataMatrix

function map_(fnc, obj)

Maps a function (fnc) onto rows of datamatrix or cells of a column.

If obj is a column, the function fnc is mapped is mapped onto each cell of the column, and a new column is returned. In this case, fnc should be a function that accepts and returns a single value.

If obj is a datamatrix, the function fnc is mapped onto each row, and a new datamatrix is returned. In this case, fnc should be a function that accepts a keyword dict, where column names are keys and cells are values. The return value should be another dict, again with column names as keys, and cells as values. Columns that are not part of the returned dict are left unchanged.

Example:

from datamatrix import DataMatrix, functional as fnc

dm = DataMatrix(length=3)
dm.old = 0, 1, 2
# Map a 2x function onto dm.old to create dm.new
dm.new = fnc.map_(lambda i: i*2, dm.old)
print(dm)
# Map a 2x function onto the entire dm to create dm_new, using a fancy
# dict comprehension wrapped inside a lambda function.
dm_new = fnc.map_(
       lambda **d: {col : 2*val for col, val in d.items()},
       dm)
print(dm_new)

Output:

+---+-----+-----+
| # | new | old |
+---+-----+-----+
| 0 |  0  |  0  |
| 1 |  2  |  1  |
| 2 |  4  |  2  |
+---+-----+-----+
+---+-----+-----+
| # | new | old |
+---+-----+-----+
| 0 |  0  |  0  |
| 1 |  4  |  2  |
| 2 |  8  |  4  |
+---+-----+-----+

Arguments:

  • fnc -- A function to map onto each row or each cell.
    • Type: callable
  • obj -- A datamatrix or column to map fnc onto.
    • Type: BaseColumn, DataMatrix

Returns:

A new column or datamatrix.

  • Type: BaseColumn, DataMatrix

function memoize(fnc=None, key=None, persistent=False, lazy=False, debug=False)

A memoization decorator that stores the result of a function call, and returns the stored value when the function is called again with the same arguments. That is, memoization is a specific kind of caching that improves performance for expensive function calls.

This decorator only works for arguments and return values that can be serialized (i.e. arguments that you can pickle).

To clear memoization, either pass ~[function name] as a command line argument to a script, or pass memoclear=True as a keyword to the memoized function (not to the decorator).

For a more detailed description, see:

Example:

from datamatrix import functional as fnc

@fnc.memoize
def add(a, b):

       print('add(%d, %d)' % (a, b))
       return a + b

three = add(1, 2) # Storing result in memory
three = add(1, 2) # Re-using previous result
three = add(1, 2, memoclear=True) # Clear cache!

@fnc.memoize(persistent=True, key='persistent-add')
def persistent_add(a, b):

       print('persistent_add(%d, %d)' % (a, b))
       return a + b

three = persistent_add(1, 2) # Writing result to disk
three = persistent_add(1, 2) # Re-using previous result        

Output:

add(1, 2)
add(1, 2)
persistent_add(1, 2)

Keywords:

  • fnc -- A function to memoize.
    • Type: callable
    • Default: None
  • key -- Indicates a key that identifies the results. If no key is provided, a key is generated based on the function name, and the arguments passed to the function. However, this requires the arguments to be serialized, which can take some time.
    • Type: str, None
    • Default: None
  • persistent -- Indicates whether the result should be written to disk so that the result can be re-used when the script is run again. If set to True, the result is stored as a pickle in a .memoize subfolder of the working directory.
    • Type: bool
    • Default: False
  • lazy -- If True, any callable that is passed onto the memoized function is automatically called, and the memoized function receives the return value instead of the function object. This allows for lazy evaluation.
    • Type: bool
    • Default: False
  • debug -- If True, the memoized function returns a (retval, memkey, source) tuple, where retval is the function's return value, memkey is the key used for caching, and source is one of 'memory', 'disk', or 'function', indicating whether and how the return value was cached. This is mostly for debugging and testing.
    • Type: bool
    • Default: False

Returns:

A memoized version of fnc.

  • Type: callable

function setcol(dm, name, value)

Returns a new DataMatrix to which a column has been added or in which a column has been modified.

The main difference with regular assignment (dm.col = 'x') is that setcol() does not modify the original DataMatrix, and can be used in lambda expressions.

Example:

from datamatrix import DataMatrix, functional as fnc

dm1 = DataMatrix(length=5)
dm2 = fnc.setcol(dm1, 'y', range(5))
print(dm2)

Output:

+---+---+
| # | y |
+---+---+
| 0 | 0 |
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
+---+---+

Arguments:

  • dm -- A DataMatrix.
    • Type: DataMatrix
  • name -- A column name.
    • Type: str
  • value -- The value to be assigned to the column. This can be any value this is valid for a regular column assignment.

Returns:

A new DataMatrix.

  • Type: DataMatrix