Visit http://datamatrix.cogsci.nl/statistics for the latest documentation

Statistics

Compatibility with Pandas and StatsModels
Creating a pivot table
Running a repeated measures ANOVA

Compatibility with Pandas and StatsModels

StatsModels is a Python library for statistics that relies heavily on pandas.DataFrame objects. Even so, both libraries are easy to use in combination with DataMatrix objects.

Creating a pivot table

A pivot table is a table of aggregate data, grouped by one or more variables. For example, the data used below¹ is from a behavioral experiment in which participants, coded by subject_nr, pressed a key on each trial. The key-press response time is stored as RT_search. The experiment manipulated two factors, stored as condition and load.

A common way to summarize this kind of data is to put each participant in a different row, and each condition in a different column. The cells then contain the mean response time for a specific participant in a specific condition. That's a pivot table!
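To make the idea concrete, here is a minimal sketch that builds such a table by hand with plain pandas. The numbers are made up for illustration and are not taken from the example dataset; only the column names subject_nr, condition, and RT_search follow the text above.

```python
import pandas as pd

# Hypothetical trial-level data: two participants, two conditions,
# two trials per participant per condition.
df = pd.DataFrame({
    "subject_nr": [1, 1, 1, 1, 2, 2, 2, 2],
    "condition": ["no", "no", "rel-match", "rel-match"] * 2,
    "RT_search": [600, 620, 700, 720, 500, 540, 650, 610],
})

# Average the trials within each participant-by-condition cell, then
# move the condition labels from the row index into the columns.
pivot = (
    df.groupby(["subject_nr", "condition"])["RT_search"]
    .mean()
    .unstack("condition")
)
print(pivot)
```

Each row of the result is one participant, each column is one condition, and each cell is that participant's mean response time in that condition.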

You can create a pivot table with pandas.pivot_table(). This function accepts a pandas.DataFrame as its first argument, and also returns a pandas.DataFrame. By wrapping this function with the datamatrix.convert.wrap_pandas() decorator, you can modify the function so that it works with DataMatrix objects instead.

This sounds complicated, but it's actually really simple:

from datamatrix import io, convert as cnv
from pandas import pivot_table
pivot_table = cnv.wrap_pandas(pivot_table)  # Make compatible with DataMatrix

dm = io.readtxt('data/fratescu-replication-data-exp1.csv')
pm = pivot_table(
    dm,
    values='RT_search',
    index='subject_nr',
    columns=['condition', 'load']
)
print(pm)

Output:

+----+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+
| #  |    _rel-match_1   |    _rel-match_2   |     _rel-mis_1    |     _rel-mis_2    |        no_1       |        no_2       |
+----+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+
| 0  | 721.4264335302757 |  678.62248420715  | 685.3204861021579 | 703.0902484367584 |  663.93980185195  | 640.9299055734667 |
| 1  |    1.049748E+03   |    1.086552E+03   |    1.026647E+03   |    1.073280E+03   |    1.043109E+03   |    1.073110E+03   |
| 2  | 747.1289429170688 | 736.2263889634912 | 738.6820358142281 | 744.1116293270499 | 683.7063603479663 | 721.6051816940831 |
| 3  | 721.5824018210506 | 704.0504122852334 | 724.5789777125425 | 667.0987650908792 | 631.9008166862757 | 621.8372427916667 |
| 4  |    1.066829E+03   |    1.101238E+03   |    1.079018E+03   |    1.056681E+03   |    1.040665E+03   |    1.032087E+03   |
| 5  | 944.8047085385438 | 877.7510489187414 | 939.1616570818946 | 913.7586951925085 |  784.30217138565  | 781.7352213120665 |
| 6  | 778.3312959187457 | 767.7643134674997 | 807.1390118517368 | 750.4302243054747 | 714.1548334541525 | 707.1057035211404 |
| 7  | 645.0043094569484 |  622.780356490842 | 657.6860411125612 | 608.2251920538812 | 612.3927894391929 | 617.6175157229669 |
| 8  | 604.8138705167639 | 581.2789082527502 | 611.6210301716668 | 596.9222869191427 |  558.924202191678 | 541.8257834547288 |
| 9  | 648.7648608321693 | 597.8963051811694 | 606.5543829384917 | 601.6104015811035 | 591.4159889879137 | 547.5444670381725 |
| 10 | 925.7685151588449 | 927.5271563695864 | 988.1638811341933 | 917.9524664292982 | 820.8943605424137 | 803.0527729099997 |
| 11 | 840.0069568102066 | 779.0747596095669 | 795.9650817147334 | 712.3922798413394 | 729.0225658726833 | 719.8863909229138 |
| 12 | 785.4290803274558 | 829.8763324478275 | 880.3699945995425 | 852.3502020999309 | 744.7291530412242 | 804.1836475503965 |
| 13 | 811.9313416795176 | 768.2711858095346 |   822.6864349627  | 786.2689597969495 |  651.666963199417 | 686.4645410935345 |
| 14 | 639.2940345528929 | 608.4262710005933 | 638.9515802488793 |  660.65921024839  | 602.5885415030835 | 576.7948877834484 |
| 15 |    1.146967E+03   |    1.079700E+03   |    1.191826E+03   |    1.078569E+03   |    1.045561E+03   |    1.122266E+03   |
| 16 | 681.6418850396843 | 688.7799572607164 | 655.7796765503728 | 668.5310374830702 | 661.6116239557585 | 632.9321146466666 |
| 17 | 687.1236300064744 | 642.2102689742334 | 679.8497239748168 | 675.6232203099297 | 641.3620422626376 | 625.5237773313729 |
| 18 |   714.8987321726  |  686.699794571276 | 721.9998585333003 | 697.0831653612756 | 658.1154848856497 | 672.7054691563666 |
| 19 | 693.6501065890332 | 681.1448272906491 | 729.0235611430351 | 708.9501084952931 | 667.5677975018832 | 675.5407358470878 |
+----+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+
(+ 3 columns not shown)
(+ 36 rows not shown)
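The column names in the output above combine a level of condition with a level of load. The following sketch, which uses plain pandas and made-up response times rather than the example dataset, shows how passing two variables to columns yields one column per combination. How DataMatrix flattens these combined labels into single names is not shown here; the underscore-joined names above suggest that it concatenates them.

```python
import pandas as pd

# Hypothetical data: 2 participants x 2 conditions x 2 loads x 2 trials.
rows = []
for subject in (1, 2):
    for condition in ("no", "rel-match"):
        for load in (1, 2):
            for trial in (0, 1):
                rt = (500 + 10 * subject + 20 * load + 5 * trial
                      + (100 if condition == "rel-match" else 0))
                rows.append({
                    "subject_nr": subject,
                    "condition": condition,
                    "load": load,
                    "RT_search": rt,
                })
df = pd.DataFrame(rows)

# The default aggregation function of pivot_table() is the mean.
pm = pd.pivot_table(
    df,
    values="RT_search",
    index="subject_nr",
    columns=["condition", "load"],
)
# Each column label is a (condition, load) pair.
print(list(pm.columns))
print(pm)
```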

Running a repeated measures ANOVA

A repeated measures ANOVA is a type of statistical analysis for within-subject designs, such as the one used for this tutorial. You typically run a repeated measures ANOVA on a dataset where one person contributes multiple data points.

You can perform a repeated measures ANOVA with statsmodels.stats.anova.AnovaRM.

http://www.statsmodels.org/stable/generated/statsmodels.stats.anova.AnovaRM.html

Let's see how this works:

from statsmodels.stats.anova import AnovaRM
AnovaRM = cnv.wrap_pandas(AnovaRM)  # Make compatible with DataMatrix

aov = AnovaRM(
    dm,
    depvar='RT_search',
    subject='subject_nr',
    within=['condition', 'load'],
    aggregate_func='mean'  # Average the trials within each subject and cell
)
print(aov.fit())

Output:

                    Anova
=============================================
               Num DF  Den DF  F Value Pr > F
---------------------------------------------
condition      3.0000 165.0000 84.0678 0.0000
load           1.0000  55.0000 29.8054 0.0000
condition:load 3.0000 165.0000  8.4493 0.0000
=============================================
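If you need the statistics as numbers rather than as printed text, the fitted result exposes them as a pandas.DataFrame through its anova_table attribute. The sketch below uses AnovaRM directly on a pandas.DataFrame with randomly generated data; the data, the condition labels "a" and "b", and the effect size are assumptions made purely for illustration. Because each participant contributes several trials per condition, aggregate_func='mean' is passed so that the trials are averaged into one value per participant per cell before the analysis.

```python
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Hypothetical data: 6 participants x 2 conditions x 3 trials.
rng = np.random.default_rng(0)
rows = []
for subject in range(1, 7):
    for condition in ("a", "b"):
        for trial in range(3):
            rt = (600 + 10 * subject
                  + (50 if condition == "b" else 0)
                  + rng.normal(0, 20))
            rows.append({
                "subject_nr": subject,
                "condition": condition,
                "RT_search": rt,
            })
df = pd.DataFrame(rows)

results = AnovaRM(
    df,
    depvar="RT_search",
    subject="subject_nr",
    within=["condition"],
    aggregate_func="mean",  # Collapse the trials within each cell
).fit()

# The results table is a DataFrame with one row per effect.
table = results.anova_table
print(table)
p_value = table.loc["condition", "Pr > F"]
```

With six participants and one two-level factor, the effect has 1 numerator and 5 denominator degrees of freedom.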

¹ The example data is adapted from Frătescu et al. (2018), Experiment 1.