Tutorial 3: Cross-Task and Cross-Session Data Merging for Unified CAPs Analysis

merge_dicts() combines timeseries data from different tasks and sessions, enabling analyses that identify similar CAPs across tasks, sessions, or both. This is only useful when the tasks and sessions include the same subjects. The function produces a merged dictionary containing only the subject IDs present in all input dictionaries. Additionally, while the run IDs do not need to match across tasks, timeseries that share a run ID across dictionaries are appended to one another. Note that successful merging requires all dictionaries to contain the same number of columns (ROIs).
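The merging rules described above can be illustrated with a small NumPy sketch. This is not the neurocaps implementation; `merge_sketch` and the toy dictionaries are hypothetical names used only to show the subject intersection and the per-run appending.

```python
import numpy as np


def merge_sketch(dicts):
    """Illustrative stand-in for the merging rules; not the neurocaps code."""
    # Keep only the subject IDs present in every input dictionary
    shared = set(dicts[0]).intersection(*dicts[1:])
    merged = {}
    for subj_id in sorted(shared):
        merged[subj_id] = {}
        # Collect every run ID this subject has in any of the dictionaries
        run_ids = sorted({run_id for d in dicts for run_id in d[subj_id]})
        for run_id in run_ids:
            # Stack timeseries sharing a run ID along the time axis, in list order
            parts = [d[subj_id][run_id] for d in dicts if run_id in d[subj_id]]
            merged[subj_id][run_id] = np.vstack(parts)
    return merged


# Toy data with 3 ROIs; subject "2" only appears in the first dictionary
pre = {
    "0": {"run-0": np.zeros((4, 3)), "run-1": np.zeros((4, 3))},
    "2": {"run-0": np.zeros((4, 3))},
}
post = {"0": {"run-0": np.ones((2, 3))}}

merged = merge_sketch([pre, post])
for run_id, timeseries in merged["0"].items():
    print(run_id, timeseries.shape)
```

Subject "2" is dropped because it is missing from the second dictionary, "run-0" grows along the time axis, and "run-1" is carried over unchanged because only one dictionary contains it.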

[1]:

# Install neurocaps if it is not already available
try:
    import neurocaps
except ImportError:
    !pip install neurocaps[windows,demo]

[2]:

import numpy as np

from neurocaps.analysis import merge_dicts
from neurocaps.utils import simulate_subject_timeseries

np.random.seed(0)

# Simulate two subject_timeseries dictionaries with different numbers of subjects, runs, and timepoints
subject_timeseries_session_pre = simulate_subject_timeseries(n_subs=8, n_runs=3, shape=(200, 400))
subject_timeseries_session_post = simulate_subject_timeseries(n_subs=6, n_runs=2, shape=(100, 400))

# `subject_timeseries_list` also accepts pickle files, and the modified dictionaries can be
# saved as pickle files.
subject_timeseries_merged = merge_dicts(
    subject_timeseries_list=[subject_timeseries_session_pre, subject_timeseries_session_post],
    return_merged_dict=True,
    return_reduced_dicts=False,
)

for subj_id in subject_timeseries_merged["merged"]:
    for run_id in subject_timeseries_merged["merged"][subj_id]:
        timeseries = subject_timeseries_merged["merged"][subj_id][run_id]
        print(f"sub-{subj_id}; {run_id} shape is {timeseries.shape}")

sub-0; run-0 shape is (300, 400)
sub-0; run-1 shape is (300, 400)
sub-0; run-2 shape is (200, 400)
sub-1; run-0 shape is (300, 400)
sub-1; run-1 shape is (300, 400)
sub-1; run-2 shape is (200, 400)
sub-2; run-0 shape is (300, 400)
sub-2; run-1 shape is (300, 400)
sub-2; run-2 shape is (200, 400)
sub-3; run-0 shape is (300, 400)
sub-3; run-1 shape is (300, 400)
sub-3; run-2 shape is (200, 400)
sub-4; run-0 shape is (300, 400)
sub-4; run-1 shape is (300, 400)
sub-4; run-2 shape is (200, 400)
sub-5; run-0 shape is (300, 400)
sub-5; run-1 shape is (300, 400)
sub-5; run-2 shape is (200, 400)

[3]:

# The original dictionaries can also be returned. The only modification is that they are reduced
# to the subjects present in all dictionaries in the list. Note that the "dict_#" IDs correspond
# to the index of each subject timeseries dictionary in `subject_timeseries_list`.

merged_dicts = merge_dicts(
    subject_timeseries_list=[subject_timeseries_session_pre, subject_timeseries_session_post],
    return_merged_dict=True,
    return_reduced_dicts=True,
)

for dict_id in merged_dicts:
    for subj_id in merged_dicts[dict_id]:
        for run_id in merged_dicts[dict_id][subj_id]:
            timeseries = merged_dicts[dict_id][subj_id][run_id]
            print(f"For {dict_id} sub-{subj_id}; {run_id} shape is {timeseries.shape}")

For dict_0 sub-0; run-0 shape is (200, 400)
For dict_0 sub-0; run-1 shape is (200, 400)
For dict_0 sub-0; run-2 shape is (200, 400)
For dict_0 sub-1; run-0 shape is (200, 400)
For dict_0 sub-1; run-1 shape is (200, 400)
For dict_0 sub-1; run-2 shape is (200, 400)
For dict_0 sub-2; run-0 shape is (200, 400)
For dict_0 sub-2; run-1 shape is (200, 400)
For dict_0 sub-2; run-2 shape is (200, 400)
For dict_0 sub-3; run-0 shape is (200, 400)
For dict_0 sub-3; run-1 shape is (200, 400)
For dict_0 sub-3; run-2 shape is (200, 400)
For dict_0 sub-4; run-0 shape is (200, 400)
For dict_0 sub-4; run-1 shape is (200, 400)
For dict_0 sub-4; run-2 shape is (200, 400)
For dict_0 sub-5; run-0 shape is (200, 400)
For dict_0 sub-5; run-1 shape is (200, 400)
For dict_0 sub-5; run-2 shape is (200, 400)
For dict_1 sub-0; run-0 shape is (100, 400)
For dict_1 sub-0; run-1 shape is (100, 400)
For dict_1 sub-1; run-0 shape is (100, 400)
For dict_1 sub-1; run-1 shape is (100, 400)
For dict_1 sub-2; run-0 shape is (100, 400)
For dict_1 sub-2; run-1 shape is (100, 400)
For dict_1 sub-3; run-0 shape is (100, 400)
For dict_1 sub-3; run-1 shape is (100, 400)
For dict_1 sub-4; run-0 shape is (100, 400)
For dict_1 sub-4; run-1 shape is (100, 400)
For dict_1 sub-5; run-0 shape is (100, 400)
For dict_1 sub-5; run-1 shape is (100, 400)
For merged sub-0; run-0 shape is (300, 400)
For merged sub-0; run-1 shape is (300, 400)
For merged sub-0; run-2 shape is (200, 400)
For merged sub-1; run-0 shape is (300, 400)
For merged sub-1; run-1 shape is (300, 400)
For merged sub-1; run-2 shape is (200, 400)
For merged sub-2; run-0 shape is (300, 400)
For merged sub-2; run-1 shape is (300, 400)
For merged sub-2; run-2 shape is (200, 400)
For merged sub-3; run-0 shape is (300, 400)
For merged sub-3; run-1 shape is (300, 400)
For merged sub-3; run-2 shape is (200, 400)
For merged sub-4; run-0 shape is (300, 400)
For merged sub-4; run-1 shape is (300, 400)
For merged sub-4; run-2 shape is (200, 400)
For merged sub-5; run-0 shape is (300, 400)
For merged sub-5; run-1 shape is (300, 400)
For merged sub-5; run-2 shape is (200, 400)

CAPs can be derived using the merged subject timeseries data. This analysis will identify CAPs present across sessions or tasks.

[4]:

from neurocaps.analysis import CAP

cap_analysis = CAP()

# Deriving CAPs from the merged timeseries data
cap_analysis.get_caps(
    merged_dicts["merged"],
    n_clusters=range(2, 8),
    cluster_selection_method="davies_bouldin",
    show_figs=True,
)

2025-07-21 20:42:25,946 neurocaps.analysis.cap._internals.cluster [INFO] No groups specified. Using default group 'All Subjects' containing all subject IDs from `subject_timeseries`. The `groups` dictionary will remain fixed unless the `CAP` class is re-initialized or `clear_groups()` is used.
2025-07-21 20:42:26,543 neurocaps.analysis.cap._internals.cluster [INFO] [GROUP: All Subjects | METHOD: davies_bouldin] Optimal cluster size is 7.

[4]:

<neurocaps.analysis.cap.cap.CAP at 0x2036ddc6f10>

Each reduced subject timeseries (representing a session or task) can then be used to compute the temporal dynamics of the CAPs previously identified from the merged timeseries. The resulting files can be used to assess how the same CAPs changed across time, tasks, or both. Note that if standardize was set to True in CAP.get_caps(), the column (ROI) means and standard deviations computed from the concatenated data used to obtain the CAPs are also used to standardize each subject in the timeseries data passed to CAP.calculate_metrics(). This ensures proper CAP assignment for each subject's frames.
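The reuse of the stored ROI statistics can be sketched in plain NumPy. This is a conceptual illustration only: the random data, the `centroids` array, the `assign_caps` helper, and the `ddof=1` choice are all assumptions of the sketch, not details taken from neurocaps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the concatenated (merged) data: frames x ROIs
concatenated = rng.normal(loc=5.0, scale=2.0, size=(500, 4))

# Column (ROI) statistics are computed once, from the data used to derive the CAPs
roi_means = concatenated.mean(axis=0)
roi_stds = concatenated.std(axis=0, ddof=1)  # ddof=1 is an assumption of this sketch

# Hypothetical cluster centroids in the standardized space (2 CAPs x 4 ROIs)
centroids = np.array([[1.0, 1.0, -1.0, -1.0], [-1.0, -1.0, 1.0, 1.0]])


def assign_caps(timeseries):
    """Scale frames with the stored statistics, then pick the nearest centroid."""
    scaled = (timeseries - roi_means) / roi_stds
    distances = np.linalg.norm(scaled[:, None, :] - centroids[None, :, :], axis=2)
    return distances.argmin(axis=1) + 1  # 1-based labels, mirroring "CAP-1", "CAP-2"


# A single run is scaled with the stored statistics rather than its own
subject_run = rng.normal(loc=5.0, scale=2.0, size=(100, 4))
labels = assign_caps(subject_run)
print(labels.shape, np.unique(labels))
```

Scaling each run with its own mean and standard deviation would place its frames in a slightly different space from the one the centroids were estimated in, which is why the stored statistics are reused.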

[5]:

cap_analysis.calculate_metrics(
    merged_dicts["dict_0"],
    continuous_runs=False,
    metrics=["persistence"],
    output_dir="neurocaps_demo",
    prefix_filename="session-pre",
)["persistence"]

[5]:

	Subject_ID	Group	Run	CAP-1	CAP-2	CAP-3	CAP-4	CAP-5	CAP-6	CAP-7
0	0	All Subjects	run-0	1.241379	1.080000	1.178571	1.166667	1.259259	1.055556	1.111111
1	0	All Subjects	run-1	1.133333	1.142857	1.192308	1.200000	1.095238	1.176471	1.200000
2	0	All Subjects	run-2	1.120000	1.150000	1.130435	1.269231	1.333333	1.035714	1.260870
3	1	All Subjects	run-0	1.033333	1.150000	1.074074	1.096774	1.160000	1.052632	1.172414
4	1	All Subjects	run-1	1.259259	1.260870	1.269231	1.200000	1.083333	1.000000	1.105263
5	1	All Subjects	run-2	1.285714	1.120000	1.291667	1.076923	1.105263	1.037037	1.166667
6	2	All Subjects	run-0	1.321429	1.294118	1.214286	1.240000	1.227273	1.095238	1.130435
7	2	All Subjects	run-1	1.205882	1.095238	1.125000	1.200000	1.192308	1.058824	1.153846
8	2	All Subjects	run-2	1.137931	1.166667	1.150000	1.272727	1.142857	1.333333	1.150000
9	3	All Subjects	run-0	1.290323	1.142857	1.166667	1.320000	1.181818	1.300000	1.217391
10	3	All Subjects	run-1	1.300000	1.333333	1.050000	1.200000	1.272727	1.083333	1.066667
11	3	All Subjects	run-2	1.088235	1.103448	1.105263	1.206897	1.083333	1.055556	1.111111
12	4	All Subjects	run-0	1.206897	1.160000	1.238095	1.250000	1.080000	1.238095	1.185185
13	4	All Subjects	run-1	1.360000	1.250000	1.130435	1.040000	1.318182	1.190476	1.304348
14	4	All Subjects	run-2	1.166667	1.058824	1.222222	1.166667	1.031250	1.000000	1.178571
15	5	All Subjects	run-0	1.250000	1.178571	1.142857	1.153846	1.117647	1.058824	1.047619
16	5	All Subjects	run-1	1.320000	1.631579	1.083333	1.321429	1.200000	1.181818	1.000000
17	5	All Subjects	run-2	1.250000	1.217391	1.166667	1.120000	1.230769	1.100000	1.230769

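When continuous_runs is set to True, all runs of a subject are treated as a single run labeled “run-continuous”. Note that if a subject only has a single run, the run name does not change to “run-continuous”.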

[6]:

cap_analysis.calculate_metrics(
    merged_dicts["dict_1"],
    continuous_runs=True,
    metrics=["persistence"],
    output_dir="neurocaps_demo",
    prefix_filename="session-post",
)["persistence"]

[6]:

	Subject_ID	Group	Run	CAP-1	CAP-2	CAP-3	CAP-4	CAP-5	CAP-6	CAP-7
0	0	All Subjects	run-continuous	1.115385	1.125000	1.285714	1.240000	1.150000	1.040000	1.156250
1	1	All Subjects	run-continuous	1.206897	1.100000	1.142857	1.238095	1.074074	1.130435	1.153846
2	2	All Subjects	run-continuous	1.304348	1.307692	1.200000	1.045455	1.259259	1.294118	1.100000
3	3	All Subjects	run-continuous	1.153846	1.173913	1.208333	1.500000	1.238095	1.100000	1.227273
4	4	All Subjects	run-continuous	1.350000	1.360000	1.250000	1.200000	1.117647	1.235294	1.235294
5	5	All Subjects	run-continuous	1.241379	1.294118	1.086957	1.080000	1.250000	1.272727	1.157895
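To make the difference between the two persistence tables more concrete, here is a minimal sketch. It assumes persistence is the average number of consecutive frames spent in a CAP before switching to another, and that treating runs as continuous amounts to joining the per-frame CAP labels before counting. The `persistence` helper and the label sequences are hypothetical and are not part of neurocaps.

```python
from itertools import groupby


def persistence(labels):
    """Mean length of the uninterrupted segments of each CAP label."""
    segments = {}
    for cap, frames in groupby(labels):
        segments.setdefault(cap, []).append(len(list(frames)))
    return {cap: sum(lengths) / len(lengths) for cap, lengths in segments.items()}


# Hypothetical per-frame CAP assignments for two runs of one subject
run_0 = [1, 1, 2, 2, 2, 1]
run_1 = [1, 1, 1, 2]

per_run = {"run-0": persistence(run_0), "run-1": persistence(run_1)}
continuous = persistence(run_0 + run_1)

print(per_run)
print(continuous)
```

In this sketch, a segment that spans the boundary between the two runs is counted as one longer segment once the runs are joined, so the per-run and continuous values can differ.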