Statistics
Compatibility with Pandas and StatsModels
statsmodels is a Python library for statistics. It relies heavily on pandas.DataFrame objects. However, it is easy to use these two libraries in combination with DataMatrix objects.
Creating a pivot table
A pivot table is a table that contains aggregate data that is grouped in a certain way. For example, the data used below [1] is from a behavioral experiment in which participants, coded by subject_nr, pressed a key on each trial. The key-press response time is stored as RT_search. The experiment manipulated two factors: condition and load.
A common way to summarize this kind of data is to put each participant in a different row, and each condition in a different column. The cells then contain the mean response time for a specific participant in a specific condition. That's a pivot table!
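To make the idea concrete, here is a minimal sketch that builds such a table by hand with plain pandas. The data is made up for illustration (two participants, two conditions, two trials each) and is not taken from the example dataset; only the column names subject_nr, condition, and RT_search follow the text above.

```python
import pandas as pd

# Made-up trial-level data: two participants, two conditions, two trials each
trials = pd.DataFrame({
    'subject_nr': [1, 1, 1, 1, 2, 2, 2, 2],
    'condition': ['a', 'a', 'b', 'b', 'a', 'a', 'b', 'b'],
    'RT_search': [500, 600, 700, 800, 400, 500, 600, 700],
})

# Group by participant and condition, average the response times, and then
# move the condition labels from the row index into the columns
pivot = (
    trials
    .groupby(['subject_nr', 'condition'])['RT_search']
    .mean()
    .unstack()
)
print(pivot)
```

Each row of the result corresponds to one participant, each column to one condition, and each cell holds that participant's mean response time in that condition.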
You can create a pivot table with pandas.pivot_table(). This function accepts a pandas.DataFrame as its first argument and also returns a pandas.DataFrame. By wrapping this function with the datamatrix.convert.wrap_pandas() decorator, you can modify the function so that it works with DataMatrix objects instead.
- https://pandas.pydata.org/pandas-docs/stable/generated/pandas.pivot_table.html
- https://datamatrix.cogsci.nl/0.9/convert
This sounds complicated, but it's actually really simple:
from datamatrix import io, convert as cnv
from pandas import pivot_table
pivot_table = cnv.wrap_pandas(pivot_table) # Make compatible with DataMatrix
dm = io.readtxt('data/fratescu-replication-data-exp1.csv')
pm = pivot_table(
    dm,
    values='RT_search',
    index='subject_nr',
    columns=['condition', 'load']
)
print(pm)
Output:
+----+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+
| # | _rel-match_1 | _rel-match_2 | _rel-mis_1 | _rel-mis_2 | no_1 | no_2 |
+----+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+
| 0 | 721.4264335302757 | 678.62248420715 | 685.3204861021579 | 703.0902484367584 | 663.93980185195 | 640.9299055734667 |
| 1 | 1.049748E+03 | 1.086552E+03 | 1.026647E+03 | 1.073280E+03 | 1.043109E+03 | 1.073110E+03 |
| 2 | 747.1289429170688 | 736.2263889634912 | 738.6820358142281 | 744.1116293270499 | 683.7063603479663 | 721.6051816940831 |
| 3 | 721.5824018210506 | 704.0504122852334 | 724.5789777125425 | 667.0987650908792 | 631.9008166862757 | 621.8372427916667 |
| 4 | 1.066829E+03 | 1.101238E+03 | 1.079018E+03 | 1.056681E+03 | 1.040665E+03 | 1.032087E+03 |
| 5 | 944.8047085385438 | 877.7510489187414 | 939.1616570818946 | 913.7586951925085 | 784.30217138565 | 781.7352213120665 |
| 6 | 778.3312959187457 | 767.7643134674997 | 807.1390118517368 | 750.4302243054747 | 714.1548334541525 | 707.1057035211404 |
| 7 | 645.0043094569484 | 622.780356490842 | 657.6860411125612 | 608.2251920538812 | 612.3927894391929 | 617.6175157229669 |
| 8 | 604.8138705167639 | 581.2789082527502 | 611.6210301716668 | 596.9222869191427 | 558.924202191678 | 541.8257834547288 |
| 9 | 648.7648608321693 | 597.8963051811694 | 606.5543829384917 | 601.6104015811035 | 591.4159889879137 | 547.5444670381725 |
| 10 | 925.7685151588449 | 927.5271563695864 | 988.1638811341933 | 917.9524664292982 | 820.8943605424137 | 803.0527729099997 |
| 11 | 840.0069568102066 | 779.0747596095669 | 795.9650817147334 | 712.3922798413394 | 729.0225658726833 | 719.8863909229138 |
| 12 | 785.4290803274558 | 829.8763324478275 | 880.3699945995425 | 852.3502020999309 | 744.7291530412242 | 804.1836475503965 |
| 13 | 811.9313416795176 | 768.2711858095346 | 822.6864349627 | 786.2689597969495 | 651.666963199417 | 686.4645410935345 |
| 14 | 639.2940345528929 | 608.4262710005933 | 638.9515802488793 | 660.65921024839 | 602.5885415030835 | 576.7948877834484 |
| 15 | 1.146967E+03 | 1.079700E+03 | 1.191826E+03 | 1.078569E+03 | 1.045561E+03 | 1.122266E+03 |
| 16 | 681.6418850396843 | 688.7799572607164 | 655.7796765503728 | 668.5310374830702 | 661.6116239557585 | 632.9321146466666 |
| 17 | 687.1236300064744 | 642.2102689742334 | 679.8497239748168 | 675.6232203099297 | 641.3620422626376 | 625.5237773313729 |
| 18 | 714.8987321726 | 686.699794571276 | 721.9998585333003 | 697.0831653612756 | 658.1154848856497 | 672.7054691563666 |
| 19 | 693.6501065890332 | 681.1448272906491 | 729.0235611430351 | 708.9501084952931 | 667.5677975018832 | 675.5407358470878 |
+----+-------------------+-------------------+-------------------+-------------------+-------------------+-------------------+
(+ 3 columns not shown)
(+ 36 rows not shown)
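Conceptually, a wrapper of this kind converts the first argument to a pandas.DataFrame before calling the original function, and converts a returned pandas.DataFrame back afterwards. The sketch below illustrates that pattern only; it is not the implementation of wrap_pandas(). A plain dict of columns stands in for a DataMatrix, and the names to_frame, from_frame, wrap_frame_function, and double_rt are hypothetical helpers invented for this example.

```python
import functools

import pandas as pd


def to_frame(table):
    """Hypothetical converter: dict of columns -> pandas.DataFrame."""
    return pd.DataFrame(table)


def from_frame(df):
    """Hypothetical converter: pandas.DataFrame -> dict of columns."""
    return {str(col): df[col].tolist() for col in df.columns}


def wrap_frame_function(fnc):
    """Wrap fnc so that it accepts and returns dicts of columns."""
    @functools.wraps(fnc)
    def inner(table, *args, **kwargs):
        # Convert the first argument before calling the original function
        result = fnc(to_frame(table), *args, **kwargs)
        # Convert back only if the function returned a DataFrame
        if isinstance(result, pd.DataFrame):
            return from_frame(result)
        return result
    return inner


@wrap_frame_function
def double_rt(df):
    """Toy DataFrame function: doubles the (made-up) RT column."""
    return df.assign(RT=df['RT'] * 2)


doubled = double_rt({'subject_nr': [1, 2], 'RT': [500, 600]})
print(doubled)
```

The caller never handles a DataFrame directly: the conversion happens on the way in and on the way out, which is the same division of labor that lets the wrapped pivot_table() above take and return DataMatrix objects.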
Running a repeated measures ANOVA
A repeated measures ANOVA is a type of statistical analysis for within-subject designs, such as the one used for this tutorial. You typically run a repeated measures ANOVA on a dataset where one person contributes multiple data points.
You can perform a repeated measures ANOVA with statsmodels.stats.anova.AnovaRM.
Let's see how this works:
from statsmodels.stats.anova import AnovaRM
AnovaRM = cnv.wrap_pandas(AnovaRM) # Make compatible with DataMatrix
aov = AnovaRM(
    dm,
    depvar='RT_search',
    subject='subject_nr',
    within=['condition', 'load'],
    aggregate_func='mean'
)
print(aov.fit())
Output:
                    Anova
=============================================
               Num DF   Den DF F Value Pr > F
---------------------------------------------
condition      3.0000 165.0000 84.0678 0.0000
load           1.0000  55.0000 29.8054 0.0000
condition:load 3.0000 165.0000  8.4493 0.0000
=============================================
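The aggregate_func='mean' argument matters because the data contains many trials per participant in each cell of the design, whereas the analysis needs a single observation per participant per cell. The pandas sketch below shows roughly what this aggregation step amounts to. It uses made-up numbers and is an illustration of the idea, not the code that statsmodels runs internally.

```python
import pandas as pd

# Made-up data: one participant, two cells of the design, two trials per cell
trials = pd.DataFrame({
    'subject_nr': [1, 1, 1, 1],
    'condition': ['a', 'a', 'b', 'b'],
    'load': [1, 1, 1, 1],
    'RT_search': [500, 600, 700, 900],
})

# Collapse the trials to one mean per participant per cell of the design
cell_means = (
    trials
    .groupby(['subject_nr', 'condition', 'load'], as_index=False)['RT_search']
    .mean()
)
print(cell_means)
```

After aggregation, each combination of subject_nr, condition, and load appears exactly once, with the mean response time as its value.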
1. The example data is adapted from Frătescu et al. (2018), Experiment 1.