Tutorial 3: Cross-Task and Cross-Session Data Merging for Unified CAPs Analysis
merge_dicts() combines timeseries data from different tasks and sessions, enabling analyses that identify similar CAPs across tasks, sessions, or both. This is only useful when the tasks and sessions include the same subjects. The function produces a merged dictionary containing only the subject IDs present in all input dictionaries. Additionally, while the run IDs do not need to match across tasks, the timeseries of runs that share the same run ID across dictionaries are appended to one another. Note that successful merging requires all dictionaries to contain the same number of columns (ROIs).
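
The merging rule described above can be illustrated with a small NumPy-only sketch. This is a toy approximation for intuition, not the neurocaps implementation: the helper name `merge_sketch`, the subject IDs, and the array sizes are all hypothetical, and the real function performs additional validation.

```python
import numpy as np


def merge_sketch(dicts):
    """Toy illustration of the merging rule; not the neurocaps implementation."""
    # Keep only the subject IDs present in every input dictionary
    shared = set(dicts[0]).intersection(*dicts[1:])
    merged = {}
    for subj_id in sorted(shared):
        merged[subj_id] = {}
        # Collect every run ID this subject has in any dictionary
        run_ids = sorted({run for d in dicts for run in d[subj_id]})
        for run_id in run_ids:
            # Stack the frames (rows) of runs sharing the same ID, in input order
            parts = [d[subj_id][run_id] for d in dicts if run_id in d[subj_id]]
            merged[subj_id][run_id] = np.vstack(parts)
    return merged


rng = np.random.default_rng(0)
# Hypothetical inputs: {subject ID: {run ID: array of shape (timepoints, ROIs)}}
pre = {s: {f"run-{r}": rng.standard_normal((200, 4)) for r in range(3)} for s in ["0", "1", "2"]}
post = {s: {f"run-{r}": rng.standard_normal((100, 4)) for r in range(2)} for s in ["0", "1"]}

merged = merge_sketch([pre, post])
for subj_id, runs in merged.items():
    for run_id, ts in runs.items():
        print(f"sub-{subj_id}; {run_id} shape is {ts.shape}")
```

In this sketch, subject "2" is dropped because it appears only in the first dictionary, and run-2 keeps its 200 frames because no second dictionary contributes a run with that ID. Stacking rows is also why every dictionary must have the same number of columns.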
[1]:
# Install neurocaps if it is not already available
try:
    import neurocaps
except ImportError:
    !pip install neurocaps[windows,demo]
[2]:
import numpy as np
from neurocaps.analysis import merge_dicts
from neurocaps.utils import simulate_subject_timeseries
np.random.seed(0)
# Simulate two subject_timeseries dictionaries with different numbers of subjects, runs, and timepoints
subject_timeseries_session_pre = simulate_subject_timeseries(n_subs=8, n_runs=3, shape=(200, 400))
subject_timeseries_session_post = simulate_subject_timeseries(n_subs=6, n_runs=2, shape=(100, 400))
# `subject_timeseries_list` also accepts pickle files, and the modified dictionaries can be
# saved as pickle files too.
subject_timeseries_merged = merge_dicts(
    subject_timeseries_list=[subject_timeseries_session_pre, subject_timeseries_session_post],
    return_merged_dict=True,
    return_reduced_dicts=False,
)
for subj_id in subject_timeseries_merged["merged"]:
    for run_id in subject_timeseries_merged["merged"][subj_id]:
        timeseries = subject_timeseries_merged["merged"][subj_id][run_id]
        print(f"sub-{subj_id}; {run_id} shape is {timeseries.shape}")
sub-0; run-0 shape is (300, 400)
sub-0; run-1 shape is (300, 400)
sub-0; run-2 shape is (200, 400)
sub-1; run-0 shape is (300, 400)
sub-1; run-1 shape is (300, 400)
sub-1; run-2 shape is (200, 400)
sub-2; run-0 shape is (300, 400)
sub-2; run-1 shape is (300, 400)
sub-2; run-2 shape is (200, 400)
sub-3; run-0 shape is (300, 400)
sub-3; run-1 shape is (300, 400)
sub-3; run-2 shape is (200, 400)
sub-4; run-0 shape is (300, 400)
sub-4; run-1 shape is (300, 400)
sub-4; run-2 shape is (200, 400)
sub-5; run-0 shape is (300, 400)
sub-5; run-1 shape is (300, 400)
sub-5; run-2 shape is (200, 400)
[3]:
# The original dictionaries can also be returned. The only modification is that the returned
# originals contain only the subjects present across all dictionaries in the list. Note that
# the "dict_#" IDs correspond to the index of each subject timeseries in
# `subject_timeseries_list`.
merged_dicts = merge_dicts(
    subject_timeseries_list=[subject_timeseries_session_pre, subject_timeseries_session_post],
    return_merged_dict=True,
    return_reduced_dicts=True,
)
for dict_id in merged_dicts:
    for subj_id in merged_dicts[dict_id]:
        for run_id in merged_dicts[dict_id][subj_id]:
            timeseries = merged_dicts[dict_id][subj_id][run_id]
            print(f"For {dict_id} sub-{subj_id}; {run_id} shape is {timeseries.shape}")
For dict_0 sub-0; run-0 shape is (200, 400)
For dict_0 sub-0; run-1 shape is (200, 400)
For dict_0 sub-0; run-2 shape is (200, 400)
For dict_0 sub-1; run-0 shape is (200, 400)
For dict_0 sub-1; run-1 shape is (200, 400)
For dict_0 sub-1; run-2 shape is (200, 400)
For dict_0 sub-2; run-0 shape is (200, 400)
For dict_0 sub-2; run-1 shape is (200, 400)
For dict_0 sub-2; run-2 shape is (200, 400)
For dict_0 sub-3; run-0 shape is (200, 400)
For dict_0 sub-3; run-1 shape is (200, 400)
For dict_0 sub-3; run-2 shape is (200, 400)
For dict_0 sub-4; run-0 shape is (200, 400)
For dict_0 sub-4; run-1 shape is (200, 400)
For dict_0 sub-4; run-2 shape is (200, 400)
For dict_0 sub-5; run-0 shape is (200, 400)
For dict_0 sub-5; run-1 shape is (200, 400)
For dict_0 sub-5; run-2 shape is (200, 400)
For dict_1 sub-0; run-0 shape is (100, 400)
For dict_1 sub-0; run-1 shape is (100, 400)
For dict_1 sub-1; run-0 shape is (100, 400)
For dict_1 sub-1; run-1 shape is (100, 400)
For dict_1 sub-2; run-0 shape is (100, 400)
For dict_1 sub-2; run-1 shape is (100, 400)
For dict_1 sub-3; run-0 shape is (100, 400)
For dict_1 sub-3; run-1 shape is (100, 400)
For dict_1 sub-4; run-0 shape is (100, 400)
For dict_1 sub-4; run-1 shape is (100, 400)
For dict_1 sub-5; run-0 shape is (100, 400)
For dict_1 sub-5; run-1 shape is (100, 400)
For merged sub-0; run-0 shape is (300, 400)
For merged sub-0; run-1 shape is (300, 400)
For merged sub-0; run-2 shape is (200, 400)
For merged sub-1; run-0 shape is (300, 400)
For merged sub-1; run-1 shape is (300, 400)
For merged sub-1; run-2 shape is (200, 400)
For merged sub-2; run-0 shape is (300, 400)
For merged sub-2; run-1 shape is (300, 400)
For merged sub-2; run-2 shape is (200, 400)
For merged sub-3; run-0 shape is (300, 400)
For merged sub-3; run-1 shape is (300, 400)
For merged sub-3; run-2 shape is (200, 400)
For merged sub-4; run-0 shape is (300, 400)
For merged sub-4; run-1 shape is (300, 400)
For merged sub-4; run-2 shape is (200, 400)
For merged sub-5; run-0 shape is (300, 400)
For merged sub-5; run-1 shape is (300, 400)
For merged sub-5; run-2 shape is (200, 400)
CAPs can be derived using the merged subject timeseries data. This analysis will identify CAPs present across sessions or tasks.
[4]:
from neurocaps.analysis import CAP
cap_analysis = CAP()
# Deriving CAPs from the merged timeseries data
cap_analysis.get_caps(
    merged_dicts["merged"],
    n_clusters=range(2, 8),
    cluster_selection_method="davies_bouldin",
    show_figs=True,
)
2025-07-21 20:42:25,946 neurocaps.analysis.cap._internals.cluster [INFO] No groups specified. Using default group 'All Subjects' containing all subject IDs from `subject_timeseries`. The `groups` dictionary will remain fixed unless the `CAP` class is re-initialized or `clear_groups()` is used.
2025-07-21 20:42:26,543 neurocaps.analysis.cap._internals.cluster [INFO] [GROUP: All Subjects | METHOD: davies_bouldin] Optimal cluster size is 7.
[4]:
<neurocaps.analysis.cap.cap.CAP at 0x2036ddc6f10>
Each reduced subject timeseries (representing a session or task) can then be used to compute the temporal dynamics of the CAPs previously identified from the merged timeseries. The resulting files can be used to assess how the same CAPs changed across time, tasks, or both. Note that if standardize was set to True in CAP.get_caps(), the column (ROI) means and standard deviations computed from the concatenated data used to obtain the CAPs are also used to standardize each subject in the timeseries data passed to CAP.calculate_metrics(). This ensures proper CAP assignments for each subject's frames.
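
The standardization step described above can be sketched with plain NumPy. This is a minimal illustration under stated assumptions, not the neurocaps internals: the arrays are random placeholders, the variable names are hypothetical, and the use of the sample standard deviation (`ddof=1`) is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: rows are frames, columns are ROIs
concatenated = rng.normal(loc=5.0, scale=2.0, size=(600, 4))  # data used to derive the CAPs
session_data = rng.normal(loc=5.5, scale=2.0, size=(100, 4))  # one subject's reduced timeseries

# Column (ROI) statistics are computed from the concatenated data only
col_means = concatenated.mean(axis=0)
col_stds = concatenated.std(axis=0, ddof=1)  # ddof=1 is an assumption in this sketch

# The same scaling is applied to the new data before frames are assigned to CAPs
session_scaled = (session_data - col_means) / col_stds

# The session's own column means are not forced to zero, so any offset
# relative to the concatenated data is preserved
print(session_scaled.mean(axis=0))
```

Reusing the original statistics keeps each frame in the same coordinate system as the cluster centroids, whereas standardizing each session separately would shift the frames relative to the CAPs.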
[5]:
cap_analysis.calculate_metrics(
    merged_dicts["dict_0"],
    continuous_runs=False,
    metrics=["persistence"],
    output_dir="neurocaps_demo",
    prefix_filename="session-pre",
)["persistence"]
[5]:
| | Subject_ID | Group | Run | CAP-1 | CAP-2 | CAP-3 | CAP-4 | CAP-5 | CAP-6 | CAP-7 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | All Subjects | run-0 | 1.241379 | 1.080000 | 1.178571 | 1.166667 | 1.259259 | 1.055556 | 1.111111 |
| 1 | 0 | All Subjects | run-1 | 1.133333 | 1.142857 | 1.192308 | 1.200000 | 1.095238 | 1.176471 | 1.200000 |
| 2 | 0 | All Subjects | run-2 | 1.120000 | 1.150000 | 1.130435 | 1.269231 | 1.333333 | 1.035714 | 1.260870 |
| 3 | 1 | All Subjects | run-0 | 1.033333 | 1.150000 | 1.074074 | 1.096774 | 1.160000 | 1.052632 | 1.172414 |
| 4 | 1 | All Subjects | run-1 | 1.259259 | 1.260870 | 1.269231 | 1.200000 | 1.083333 | 1.000000 | 1.105263 |
| 5 | 1 | All Subjects | run-2 | 1.285714 | 1.120000 | 1.291667 | 1.076923 | 1.105263 | 1.037037 | 1.166667 |
| 6 | 2 | All Subjects | run-0 | 1.321429 | 1.294118 | 1.214286 | 1.240000 | 1.227273 | 1.095238 | 1.130435 |
| 7 | 2 | All Subjects | run-1 | 1.205882 | 1.095238 | 1.125000 | 1.200000 | 1.192308 | 1.058824 | 1.153846 |
| 8 | 2 | All Subjects | run-2 | 1.137931 | 1.166667 | 1.150000 | 1.272727 | 1.142857 | 1.333333 | 1.150000 |
| 9 | 3 | All Subjects | run-0 | 1.290323 | 1.142857 | 1.166667 | 1.320000 | 1.181818 | 1.300000 | 1.217391 |
| 10 | 3 | All Subjects | run-1 | 1.300000 | 1.333333 | 1.050000 | 1.200000 | 1.272727 | 1.083333 | 1.066667 |
| 11 | 3 | All Subjects | run-2 | 1.088235 | 1.103448 | 1.105263 | 1.206897 | 1.083333 | 1.055556 | 1.111111 |
| 12 | 4 | All Subjects | run-0 | 1.206897 | 1.160000 | 1.238095 | 1.250000 | 1.080000 | 1.238095 | 1.185185 |
| 13 | 4 | All Subjects | run-1 | 1.360000 | 1.250000 | 1.130435 | 1.040000 | 1.318182 | 1.190476 | 1.304348 |
| 14 | 4 | All Subjects | run-2 | 1.166667 | 1.058824 | 1.222222 | 1.166667 | 1.031250 | 1.000000 | 1.178571 |
| 15 | 5 | All Subjects | run-0 | 1.250000 | 1.178571 | 1.142857 | 1.153846 | 1.117647 | 1.058824 | 1.047619 |
| 16 | 5 | All Subjects | run-1 | 1.320000 | 1.631579 | 1.083333 | 1.321429 | 1.200000 | 1.181818 | 1.000000 |
| 17 | 5 | All Subjects | run-2 | 1.250000 | 1.217391 | 1.166667 | 1.120000 | 1.230769 | 1.100000 | 1.230769 |
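Setting continuous_runs to True treats all of a subject's runs as a single continuous run, so the run label becomes “run-continuous”. Note that if a subject only has a single run, the run name does not change to “run-continuous”.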
[6]:
cap_analysis.calculate_metrics(
    merged_dicts["dict_1"],
    continuous_runs=True,
    metrics=["persistence"],
    output_dir="neurocaps_demo",
    prefix_filename="session-post",
)["persistence"]
[6]:
| | Subject_ID | Group | Run | CAP-1 | CAP-2 | CAP-3 | CAP-4 | CAP-5 | CAP-6 | CAP-7 |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | All Subjects | run-continuous | 1.115385 | 1.125000 | 1.285714 | 1.240000 | 1.150000 | 1.040000 | 1.156250 |
| 1 | 1 | All Subjects | run-continuous | 1.206897 | 1.100000 | 1.142857 | 1.238095 | 1.074074 | 1.130435 | 1.153846 |
| 2 | 2 | All Subjects | run-continuous | 1.304348 | 1.307692 | 1.200000 | 1.045455 | 1.259259 | 1.294118 | 1.100000 |
| 3 | 3 | All Subjects | run-continuous | 1.153846 | 1.173913 | 1.208333 | 1.500000 | 1.238095 | 1.100000 | 1.227273 |
| 4 | 4 | All Subjects | run-continuous | 1.350000 | 1.360000 | 1.250000 | 1.200000 | 1.117647 | 1.235294 | 1.235294 |
| 5 | 5 | All Subjects | run-continuous | 1.241379 | 1.294118 | 1.086957 | 1.080000 | 1.250000 | 1.272727 | 1.157895 |
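
As a rough illustration of a follow-up comparison, the sketch below contrasts CAP-1 persistence between the two sessions for subjects 0 and 1, using values copied from the two tables above. Averaging the per-run values of the first session is an assumption made here for simplicity; it is not strictly equivalent to a continuous-run estimate, so a like-for-like analysis would use the same continuous_runs setting for both sessions.

```python
import numpy as np

# CAP-1 persistence copied from the two tables above (subjects 0 and 1 only)
pre_cap1 = {"0": [1.241379, 1.133333, 1.120000], "1": [1.033333, 1.259259, 1.285714]}
post_cap1 = {"0": 1.115385, "1": 1.206897}

change = {}
for subj_id, run_values in pre_cap1.items():
    # Simplifying assumption: summarize the pre session by its mean across runs
    pre_mean = float(np.mean(run_values))
    change[subj_id] = post_cap1[subj_id] - pre_mean
    print(
        f"sub-{subj_id}: pre mean = {pre_mean:.3f}, "
        f"post = {post_cap1[subj_id]:.3f}, change = {change[subj_id]:+.3f}"
    )
```

Because the timeseries in this tutorial are simulated, the differences carry no substantive meaning; the sketch only shows the shape of such a comparison.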