neurocaps.extraction.TimeseriesExtractor.get_bold
- TimeseriesExtractor.get_bold(bids_dir, task, session=None, runs=None, condition=None, tr=None, run_subjects=None, exclude_subjects=None, exclude_niftis=None, pipeline_name=None, n_cores=None, parallel_log_config=None, verbose=True, flush=False)
Retrieve Preprocessed BOLD Data from BIDS Datasets.
This function uses pybids for querying and requires the BOLD data directory (specified in bids_dir) to be BIDS-compliant, including a "dataset_description.json" file. It assumes the dataset contains a derivatives folder with BOLD data preprocessed using a standard pipeline, specifically fMRIPrep. The pipeline directory must also include a "dataset_description.json" file for proper querying.
The timeseries data of all subjects are appended to a single dictionary, self.subject_timeseries. Additional information regarding the structure of this dictionary can be found in the "Note" section.
Basic BIDS directory:
bids_root/
├── dataset_description.json
├── sub-<subject_label>/
│   └── func/
│       └── *task-*_events.tsv
├── derivatives/
│   └── fmriprep-<version_label>/
│       ├── dataset_description.json
│       └── sub-<subject_label>/
│           └── func/
│               ├── *confounds_timeseries.tsv
│               ├── *brain_mask.nii.gz
│               └── *preproc_bold.nii.gz
BIDS directory with session-level organization:
bids_root/
├── dataset_description.json
├── sub-<subject_label>/
│   └── ses-<session_label>/
│       └── func/
│           └── *task-*_events.tsv
├── derivatives/
│   └── fmriprep-<version_label>/
│       ├── dataset_description.json
│       └── sub-<subject_label>/
│           └── ses-<session_label>/
│               └── func/
│                   ├── *confounds_timeseries.tsv
│                   ├── *brain_mask.nii.gz
│                   └── *preproc_bold.nii.gz
Note: Only the preprocessed BOLD file is required. Additional files such as the confounds tsv (needed for denoising), mask, and task timing tsv file (needed for filtering a specific task condition) depend on the specific analyses. As mentioned previously, the "dataset_description.json" is required in both the bids root and pipeline directories for querying with pybids.
This pipeline is optimized primarily for BOLD data preprocessed by fMRIPrep.
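The requirement that "dataset_description.json" exists in both the BIDS root and the pipeline directory can be checked before calling get_bold. The following is a minimal sketch using only the standard library; the helper name missing_description_files, the version label "fmriprep-1.0.0", and the placeholder JSON contents are assumptions made for illustration and are not part of neurocaps or pybids.

```python
import json
import tempfile
from pathlib import Path


def missing_description_files(bids_dir, pipeline_name):
    """Return the required "dataset_description.json" paths that do not exist.

    Hypothetical helper for illustration; not part of neurocaps or pybids.
    """
    bids_root = Path(bids_dir)
    required = [
        bids_root / "dataset_description.json",
        bids_root / "derivatives" / pipeline_name / "dataset_description.json",
    ]
    return [path for path in required if not path.is_file()]


# Build a throwaway dataset skeleton to demonstrate the check.
# "fmriprep-1.0.0" is an arbitrary, made-up version label.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "bids_root"
    pipeline_dir = root / "derivatives" / "fmriprep-1.0.0"
    pipeline_dir.mkdir(parents=True)

    # Only the root description exists at first, so the pipeline one is reported.
    (root / "dataset_description.json").write_text(json.dumps({"Name": "demo"}))
    missing_before = missing_description_files(root, "fmriprep-1.0.0")

    # After adding the pipeline description, nothing is missing.
    (pipeline_dir / "dataset_description.json").write_text(json.dumps({"Name": "demo"}))
    missing_after = missing_description_files(root, "fmriprep-1.0.0")

print(len(missing_before), len(missing_after))
```

The check only confirms that the two files are present; it does not validate their contents or the rest of the BIDS structure.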
- Parameters:
bids_dir (os.PathLike) -- Path to a BIDS-compliant directory. A "dataset_description.json" file must be located in this directory or an error will be raised.
task (str) -- Name of the task to extract timeseries data from (e.g. "rest", "n-back", etc.).
session (int, str, or None, default=None) -- Session ID to extract timeseries data from. Only a single session can be extracted at a time. While files having session IDs are not mandatory, this parameter must be specified if the dataset has multiple sessions. If session is None and multiple sessions are detected when the preprocessed NIfTI files are queried, an error will be raised. The value can be an integer (e.g. session=2) or a string (e.g. session="001").
runs (int, str, list[int], list[str], or None, default=None) -- List of run numbers to extract timeseries data from. Extracts all runs if unspecified. For instance, to extract only "run-0" and "run-1", use runs=[0, 1]. For non-integer run IDs, use strings: runs=["000", "001"].
condition (str or None, default=None) -- Isolates the timeseries data corresponding to a specific condition, only after the timeseries has been extracted and subjected to nuisance regression. Only a single condition can be extracted at a time.
tr (int, float, or None, default=None) -- Repetition time (TR) for the specified task. If not provided, the TR will be automatically extracted from the first BOLD metadata file found for the task, searching first in the pipeline directory, then in the bids_dir if not found.
run_subjects (list[str] or None, default=None) -- List of subject IDs to process (e.g. run_subjects=["01", "02"]). Processes all subjects if None.
exclude_subjects (list[str] or None, default=None) -- List of subject IDs to exclude (e.g. exclude_subjects=["01", "02"]).
exclude_niftis (list[str] or None, default=None) -- List of the specific preprocessed NIfTI files to exclude, preventing their timeseries data from being extracted. Used if there are specific runs across different participants that need to be excluded.
Changed in version 0.18.0: moved from being the second to last parameter to being underneath exclude_subjects.
pipeline_name (str or None, default=None) -- The name of the pipeline folder in the derivatives folder containing the preprocessed data. If None, BIDSLayout will default to using the bids_dir with derivatives=True. This parameter should be used if multiple pipelines exist or when the pipeline folder containing the "dataset_description.json" file is nested within another folder. The specified folder must contain the "dataset_description.json" file at its root level. For instance, if the json file is in "path/to/bids/derivatives/fmriprep/fmriprep-20.0.0", then pipeline_name = "fmriprep/fmriprep-20.0.0".
n_cores (int or None, default=None) -- The number of cores to use for multiprocessing with joblib. The default backend for joblib is used.
parallel_log_config (dict[str, Union[multiprocessing.Manager.Queue, int]]) -- Passes a user-defined managed queue and logging level to the internal timeseries extraction function when parallel processing (n_cores) is used. Note that if parallel processing is used, global logging configurations won't be passed to the child processes. Thus, to prevent the child processes from using the default logging behavior, this parameter must be used. Additionally, this parameter must be a dictionary, and the available keys are:
- "queue": The instance of multiprocessing.Manager.Queue to pass to QueueHandler. If not specified, all logs will output to sys.stdout.
- "level": The logging level (e.g. logging.INFO, logging.WARNING). If not specified, the default level is logging.INFO.
import logging
from logging.handlers import QueueListener
from multiprocessing import Manager

# Configure root with FileHandler
root_logger = logging.getLogger()
root_logger.setLevel(logging.INFO)
file_handler = logging.FileHandler('neurocaps.log')
file_handler.setFormatter(logging.Formatter('%(asctime)s %(name)s [%(levelname)s] %(message)s'))
root_logger.addHandler(file_handler)

if __name__ == "__main__":
    # Import the TimeseriesExtractor
    from neurocaps.extraction import TimeseriesExtractor

    # Setup managed queue
    manager = Manager()
    queue = manager.Queue()

    # Set up the queue listener
    listener = QueueListener(queue, *root_logger.handlers)

    # Start listener
    listener.start()

    extractor = TimeseriesExtractor()

    # Use the `parallel_log_config` parameter to pass queue and the logging level
    extractor.get_bold(bids_dir="path/to/bids/dir",
                       task="rest",
                       tr=2,
                       n_cores=5,
                       parallel_log_config={"queue": queue, "level": logging.WARNING})

    # Stop listener
    listener.stop()
Added in version 0.17.8.
Changed in version 0.18.0: moved from being the last parameter to being underneath n_cores.
verbose (bool, default=True) -- If True, logs detailed subject-specific information including: subjects skipped due to missing required files, the current subject being processed for timeseries extraction, confounds identified for nuisance regression in addition to requested confounds that are missing for a subject, and additional warnings encountered during the timeseries extraction process.
flush (bool, default=False) -- If True, flushes the logged subject-specific information produced during the timeseries extraction process.
Changed in version 0.17.0: Changed from flush_print to flush.
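The "queue" and "level" keys of parallel_log_config rely on the standard library's QueueHandler and QueueListener pattern. The sketch below illustrates that pattern in a single process. It is not the neurocaps implementation: the logger name "sketch.worker", the ListHandler class, and the use of queue.Queue in place of a managed multiprocessing queue are assumptions made so the example runs on its own.

```python
import logging
import queue
from logging.handlers import QueueHandler, QueueListener


class ListHandler(logging.Handler):
    """Collect formatted messages in a list so the result is easy to inspect."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        self.messages.append(self.format(record))


def make_worker_logger(log_queue, level):
    """Mimic what a child process might do with the "queue" and "level" keys."""
    logger = logging.getLogger("sketch.worker")  # Hypothetical logger name
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(level)
    logger.addHandler(QueueHandler(log_queue))
    return logger


log_queue = queue.Queue()  # Stand-in for multiprocessing.Manager().Queue()
collector = ListHandler()
listener = QueueListener(log_queue, collector)
listener.start()

worker_logger = make_worker_logger(log_queue, logging.WARNING)
worker_logger.info("dropped: below the configured level")
worker_logger.warning("kept: forwarded through the queue")

listener.stop()  # Handles every record still queued before returning
print(collector.messages)
```

Only the warning reaches the collector, because the worker logger's level filters the info record before it is placed on the queue.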
Note
Subject Timeseries Dictionary: This method stores the extracted timeseries of all subjects in self.subject_timeseries. The structure is a dictionary mapping subject IDs to their run IDs and their associated timeseries (TRs x ROIs) as a numpy array:

subject_timeseries = {
    "101": {
        "run-0": np.array([timeseries]),  # Shape: TRs x ROIs
        "run-1": np.array([timeseries]),  # Shape: TRs x ROIs
        "run-2": np.array([timeseries]),  # Shape: TRs x ROIs
    },
    "102": {
        "run-0": np.array([timeseries]),  # Shape: TRs x ROIs
        "run-1": np.array([timeseries]),  # Shape: TRs x ROIs
    }
}
By default, "run-0" will be used if run IDs are not specified in the NIfTI file.
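As a rough illustration of working with this nested structure, the sketch below builds a toy dictionary with the same layout and walks through it. The array sizes (40 TRs, 5 ROIs) and the random values are arbitrary assumptions; real data would come from self.subject_timeseries after calling get_bold.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for `subject_timeseries`; 40 TRs and 5 ROIs are arbitrary choices.
subject_timeseries = {
    "101": {
        "run-0": rng.standard_normal((40, 5)),
        "run-1": rng.standard_normal((40, 5)),
    },
    "102": {
        "run-0": rng.standard_normal((40, 5)),
    },
}

# Record the (TRs, ROIs) shape of every run, keyed by subject and run ID.
shapes = {
    (subject_id, run_id): timeseries.shape
    for subject_id, runs in subject_timeseries.items()
    for run_id, timeseries in runs.items()
}

# Stack all runs of one subject along the TR axis.
subject_101 = np.vstack(list(subject_timeseries["101"].values()))

print(shapes)
print(subject_101.shape)
```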
Parcellation & Nuisance Regression: For timeseries extraction, nuisance regression, and spatial dimensionality reduction using a parcellation, nilearn's NiftiLabelsMasker is used. If requested, dummy scans are removed from the NIfTI images and confound dataset prior to timeseries extraction. For volumes exceeding a specified framewise displacement (FD) threshold, if the "use_sample_mask" key in the fd_threshold dictionary is set to True, then a boolean sample mask is generated (where False indicates the high motion volumes) and passed to the sample_mask parameter in nilearn's NiftiLabelsMasker. If the "use_sample_mask" key is False or not specified in the fd_threshold dictionary, then censoring is done after nuisance regression, which is the default behavior.
Extraction of Task Conditions: When extracting specific conditions, int is used to round down for the beginning scan index (start_scan = int(onset/tr)) and math.ceil is used to round up for the ending scan index (end_scan = math.ceil((onset + duration)/tr)). Filtering a specific condition from the timeseries is done after nuisance regression. Additionally, if the "use_sample_mask" key in the fd_threshold dictionary is set to True, then the truncated 2D timeseries is temporarily padded to ensure the correct rows corresponding to the condition are obtained.
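The two rounding formulas and the boolean sample mask can be illustrated with a small, self-contained sketch. The onsets, durations, FD values, and the 0.35 threshold are made-up numbers, and the function names are hypothetical. The sketch only computes the start and end indices and the mask; how the library turns them into selected rows or pads the truncated timeseries is not shown here.

```python
import math

import numpy as np


def condition_scan_bounds(onsets, durations, tr):
    """Return (start_scan, end_scan) pairs using the documented rounding."""
    bounds = []
    for onset, duration in zip(onsets, durations):
        start_scan = int(onset / tr)  # Round down for the beginning index
        end_scan = math.ceil((onset + duration) / tr)  # Round up for the ending index
        bounds.append((start_scan, end_scan))
    return bounds


def fd_sample_mask(fd_values, threshold):
    """Return a boolean mask where False marks volumes exceeding the FD threshold."""
    return np.asarray(fd_values) <= threshold


# Made-up event timings in seconds with a TR of 2 seconds.
scan_bounds = condition_scan_bounds(onsets=[5, 20], durations=[6, 4], tr=2)

# Made-up framewise displacement values with a hypothetical 0.35 threshold.
sample_mask = fd_sample_mask([0.1, 0.5, 0.2, 0.9], threshold=0.35)

print(scan_bounds)
print(sample_mask)
```

For the first event, an onset of 5 s falls within scan 2 (5 / 2 = 2.5, rounded down), and the event ends at 11 s, which rounds up to scan 6 (11 / 2 = 5.5).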