TimeseriesExtractor.get_bold
- TimeseriesExtractor.get_bold(bids_dir, task, session=None, runs=None, condition=None, condition_tr_shift=0, tr=None, slice_time_ref=0.0, run_subjects=None, exclude_subjects=None, exclude_niftis=None, pipeline_name=None, n_cores=None, parallel_log_config=None, verbose=True, flush=False, progress_bar=False)
Retrieve Preprocessed BOLD Data from BIDS Datasets.
Extracts the timeseries data from preprocessed BOLD images located in the derivatives folder of a BIDS-compliant dataset. The timeseries data of all subjects are appended to a single dictionary, self.subject_timeseries.
Important
For proper querying, a “dataset_description.json” file must be located in both the root of the BIDS directory and the pipeline directory (located in the derivatives folder).
Refer to NeuroCAPs’ BIDS Structure and Entities documentation for additional information about the expected directory structure and file naming scheme (entities) needed for querying.
This pipeline is optimized primarily for BOLD data preprocessed by fMRIPrep.
- Parameters:
bids_dir (str) – Path to a BIDS-compliant directory. A “dataset_description.json” file must be located in this directory or an error will be raised.
task (str) – Name of the task to extract timeseries data from (e.g. “rest”, “n-back”, etc.).
session (int, str, or None, default=None) – The session ID to extract timeseries data from. Only a single session can be extracted at a time. The value can be an integer (e.g. session=2) or a string (e.g. session="001").
runs (int, str, list[int], list[str], or None, default=None) – Run number or list of run numbers to extract timeseries data from (e.g. runs=["000", "001"]). Extracts all runs if unspecified.
condition (str or None, default=None) – Isolates the timeseries data corresponding to a specific condition (listed in the “trial_type” column of the “events.tsv” file) after the timeseries has been extracted and subjected to nuisance regression. Only a single condition can be extracted at a time.
condition_tr_shift (int, default=0) – Number of TR units by which to offset both the start and end scan indices of a condition to account for a fixed hemodynamic delay. This parameter only applies when a condition is specified. For more details about how this offset affects the calculation of task conditions, see the “Extraction of Task Conditions” section below.
tr (int, float, or None, default=None) – Repetition time (TR), in seconds, for the specified task. If not provided, the TR is automatically extracted from the first BOLD metadata file found for the task, searching first in the pipeline directory, then in the bids_dir if not found.
slice_time_ref (int or float, default=0.0) – The reference slice, expressed as a fraction of the tr. When condition is not None, slice_time_ref * tr is subtracted from the condition onset times to adjust for slice time correction. Values can range from 0 to 1. For more details, see the “Extraction of Task Conditions” section below.
run_subjects (str, list[str], or None, default=None) – A string (if a single subject) or list of subject IDs to process (e.g. run_subjects=["01", "02"]). Processes all subjects if None.
exclude_subjects (str, list[str], or None, default=None) – A string (if a single subject) or list of subject IDs to exclude (e.g. exclude_subjects=["01", "02"]).
exclude_niftis (str, list[str], or None, default=None) – A string (if a single file) or list of the specific preprocessed NIfTI files to exclude, preventing their timeseries data from being extracted. Used if there are specific runs across different participants that need to be excluded.
pipeline_name (str or None, default=None) – The name of the pipeline folder in the derivatives folder containing the preprocessed data. Used if multiple pipeline folders exist in the derivatives folder or the pipeline folder is nested (e.g. “fmriprep/fmriprep-20.0.0”).
n_cores (int or None, default=None) – The number of cores to use for multiprocessing with Joblib. The “loky” backend is used.
parallel_log_config (dict[str, multiprocessing.Manager.Queue | int]) – Passes a user-defined managed queue and logging level to the internal timeseries extraction function when parallel processing (n_cores) is used. Available dictionary keys are:
“queue”: The instance of multiprocessing.Manager.Queue to pass to QueueHandler. If not specified, all logs will output to sys.stdout.
“level”: The logging level (e.g. logging.INFO, logging.WARNING). If not specified, the default level is logging.INFO.
Refer to the NeuroCAPs’ Logging documentation for a detailed example of setting up this parameter.
verbose (bool, default=True) – If True, logs detailed subject-specific information, including: subjects skipped due to missing required files, the current subject being processed for timeseries extraction, confounds identified for nuisance regression in addition to requested confounds that are missing for a subject, and additional warnings encountered during the timeseries extraction process.
flush (bool, default=False) – If True, flushes the logged subject-specific information produced during the timeseries extraction process.
progress_bar (bool, default=False) – If True, displays a progress bar.
- Returns:
self
- Raises:
BIDSQueryError – Occurs when subject IDs were not found during querying.
See also
neurocaps.typing.SubjectTimeseries
Type definition representing the structure of the subject timeseries.
Important
Subject Timeseries Dictionary: This function stores the extracted timeseries of all subjects in the subject_timeseries property, which can be deleted using del self.subject_timeseries (note that self.timeseries_to_pickle() and self.visualize_bold() need this property in order to be used). The structure is a dictionary mapping subject IDs to their run IDs and their associated timeseries (TRs x ROIs) as a NumPy array. Refer to the documentation for SubjectTimeseries in the “See Also” section for an example structure.
Data/Property Persistence: Each time this function is called, its associated properties, such as self.subject_timeseries, self.task_info, self.qc, etc., are automatically initialized/overwritten to create a clean state for the subsequent analysis. To save the subject timeseries dictionary, self.timeseries_to_pickle() can be used. Additionally, to save the quality control dictionary, self.report_qc() can be used.
NIfTI Files Without “run-” Entity: By default, “run-0” is used as a placeholder if run IDs are not specified in the NIfTI file.
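The dictionary structure and the “run-0” placeholder described above can be pictured with a small, self-contained sketch. The subject IDs, array sizes, and random values here are purely illustrative, and the “run-<ID>” key format for runs other than the placeholder is an assumption based on the “run-0” convention, not a confirmed detail of the library.

```python
import numpy as np

# Hypothetical sizes: 40 TRs per run and 100 ROIs (illustrative only).
n_trs, n_rois = 40, 100
rng = np.random.default_rng(0)

subject_timeseries = {
    # Subject "01" has two runs (key format "run-<ID>" is assumed).
    "01": {
        "run-1": rng.standard_normal((n_trs, n_rois)),
        "run-2": rng.standard_normal((n_trs, n_rois)),
    },
    # Subject "02" has no "run-" entity in its file name,
    # so the "run-0" placeholder key is used.
    "02": {"run-0": rng.standard_normal((n_trs, n_rois))},
}

for subject_id, runs in subject_timeseries.items():
    for run_id, timeseries in runs.items():
        # Rows are TRs (timepoints); columns are ROIs (parcels).
        print(subject_id, run_id, timeseries.shape)
```

Each innermost value is a 2D array, so indexing a row selects one TR across all ROIs and indexing a column selects one ROI across time.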
Parcellation & Nuisance Regression: For timeseries extraction, nuisance regression, and spatial dimensionality reduction using a parcellation, Nilearn’s NiftiLabelsMasker class is used. If requested, dummy scans are removed from the NIfTI images and confound dataset prior to timeseries extraction. For volumes exceeding a specified framewise displacement (FD) threshold, if the “use_sample_mask” key in the fd_threshold dictionary is set to True, then a boolean sample mask is generated (where False indicates the high-motion volumes) and passed to the sample_mask parameter in Nilearn’s NiftiLabelsMasker. If the “use_sample_mask” key is False or not specified in the fd_threshold dictionary, then censoring is done after nuisance regression, which is the default behavior.
Extraction of Task Conditions: The formula used for computing the scan indices corresponding to a specific condition:
    adjusted_onset = condition_df.loc[i, "onset"] - slice_time_ref * tr
    onset_scan = math.floor(adjusted_onset / tr)
    onset_scan += condition_tr_shift
    end_scan = math.ceil((adjusted_onset + condition_df.loc[i, "duration"]) / tr)
    end_scan += condition_tr_shift
    onset_scan = max([0, onset_scan])
    end_scan = max([0, end_scan])
    scans.extend(range(onset_scan, end_scan))
    scans = sorted(list(set(scans)))
Changed in version 0.28.4: The max check is applied to onset_scan and end_scan instead of adjusted_onset.
When partial scans are computed, math.floor is used to round down for the beginning scan index and math.ceil is used to round up for the ending scan index. Negative scan indices are set to 0 to avoid unintentional negative indexing. For simplicity, note that when slice_time_ref and condition_tr_shift are 0, the formula simplifies to:
    onset_scan = math.floor(onset / tr)
    end_scan = math.ceil((onset + duration) / tr)
    scans.extend(range(onset_scan, end_scan))
    scans = sorted(list(set(scans)))
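To make the arithmetic concrete, the formula can be wrapped in a small standalone function and run on made-up events. This is a sketch only: the function name condition_scans and the (onset, duration) tuple input are hypothetical stand-ins for the rows of the “events.tsv” file, and it is not the library's internal code.

```python
import math


def condition_scans(events, tr, slice_time_ref=0.0, condition_tr_shift=0):
    """Return sorted, unique scan indices for (onset, duration) events in seconds."""
    scans = []
    for onset, duration in events:
        # Shift the onset back by the slice timing reference.
        adjusted_onset = onset - slice_time_ref * tr
        # Round down for the first scan and up for the (exclusive) end scan,
        # then apply the fixed hemodynamic delay in TR units.
        onset_scan = math.floor(adjusted_onset / tr) + condition_tr_shift
        end_scan = math.ceil((adjusted_onset + duration) / tr) + condition_tr_shift
        # Clip negative indices to 0 to avoid negative indexing.
        onset_scan = max(0, onset_scan)
        end_scan = max(0, end_scan)
        scans.extend(range(onset_scan, end_scan))
    return sorted(set(scans))


# Two hypothetical events of the same condition, with a TR of 2 seconds.
events = [(5.0, 5.0), (21.0, 4.0)]
print(condition_scans(events, tr=2.0))
# → [2, 3, 4, 10, 11, 12]
print(condition_scans(events, tr=2.0, slice_time_ref=0.5, condition_tr_shift=2))
# → [4, 5, 6, 12, 13]
```

In the first call, the event starting at 5 s and lasting 5 s covers scans 2 through 4 because floor(5 / 2) = 2 and ceil(10 / 2) = 5, with the end index excluded by range.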
Filtering a specific condition from the timeseries is done after nuisance regression. The indices corresponding to the condition are used to extract the TRs from the timeseries, that is, the timepoints that fall within the event window(s), adjusted by the slice timing reference (slice_time_ref) and a fixed hemodynamic delay (condition_tr_shift) if specified.
If the “use_sample_mask” key in the fd_threshold dictionary is set to True, the truncated 2D timeseries is temporarily padded to ensure the correct rows corresponding to the condition are obtained.
If the “interpolate” key in the fd_threshold dictionary is set to True, interpolation is performed using the full timeseries data (excluding dummy volumes) to replace only the censored (high-motion) volumes. Then, the indices corresponding to the condition are extracted from the timeseries, excluding any frames that do not have non-censored data at both edges.
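The reason padding is needed when a sample mask is used can be illustrated with plain Python lists. This is a minimal sketch under stated assumptions, not the library's implementation: the FD values, the threshold, the two-ROI timeseries, and the choice to drop censored volumes that fall inside the condition window are all hypothetical.

```python
# Hypothetical framewise displacement (FD) values, one per volume, and a threshold.
fd = [0.10, 0.12, 0.60, 0.08, 0.09, 0.70, 0.11, 0.10]
fd_threshold = 0.35

# Boolean sample mask: False marks high-motion volumes (FD exceeds the threshold).
sample_mask = [value <= fd_threshold for value in fd]

# Stand-in for a masked timeseries: only retained volumes remain (rows x 2 ROIs).
full = [[float(i), float(i) + 0.5] for i in range(len(fd))]
truncated = [row for row, keep in zip(full, sample_mask) if keep]

# Pad back to the original length so scan indices line up with rows again.
padded = [None] * len(sample_mask)
kept_rows = iter(truncated)
for index, keep in enumerate(sample_mask):
    if keep:
        padded[index] = next(kept_rows)

# Condition scan indices computed on the original (uncensored) time axis.
condition_scans = [1, 2, 3, 4]

# Assumption: censored volumes inside the condition window are simply dropped.
condition_rows = [padded[i] for i in condition_scans if padded[i] is not None]
print(condition_rows)
# → [[1.0, 1.5], [3.0, 3.5], [4.0, 4.5]]
```

Indexing the truncated list directly with the same scan indices would instead return the rows for volumes 1, 3, 4, and 6, because every row after a censored volume shifts up by one position.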