sm.engine package¶

Subpackages¶

sm.engine.msm_basic package

Submodules¶

sm.engine.dataset module¶

class sm.engine.dataset.Dataset(sc, id, name, drop, input_path, wd_manager, db, es)[source]¶

Bases: object

A class representing a mass spectrometry dataset. Backed by a couple of plain text files containing coordinates and spectra.

copy_read_data()[source]¶: Read and convert the input data. Read and update the metadata and config if needed.

get_dims()[source]¶

Returns: tuple

A pair of int values: the number of rows and columns.

get_norm_img_pixel_inds()[source]¶

Returns: ndarray

One-dimensional array of indices of the dataset pixels, taken row by row.

get_sample_area_mask()[source]¶

Returns: ndarray

One-dimensional bool array marking the pixels where spectra were sampled.
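The relationship between the three accessors above can be illustrated with a minimal sketch. It assumes that pixel coordinates are (x, y) pairs shifted so that the smallest value on each axis becomes 0, and that indices are assigned in row-major order. The function names and the use of plain lists instead of an ndarray are illustrative assumptions, not the engine's actual implementation.

```python
def norm_pixel_inds(coords):
    """Return row-major pixel indices and (n_rows, n_cols) for (x, y) pairs."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    min_x, min_y = min(xs), min(ys)
    # Image dimensions after shifting the smallest x and y to 0.
    n_cols = max(xs) - min_x + 1
    n_rows = max(ys) - min_y + 1
    # Row-major index: row offset times row length, plus column offset.
    inds = [(y - min_y) * n_cols + (x - min_x) for x, y in coords]
    return inds, (n_rows, n_cols)


def sample_area_mask(inds, dims):
    """Return a flat bool list that is True where a spectrum was sampled."""
    n_rows, n_cols = dims
    mask = [False] * (n_rows * n_cols)
    for i in inds:
        mask[i] = True
    return mask


# Hypothetical dataset: three sampled pixels, pixel (11, 6) is missing.
coords = [(10, 5), (11, 5), (10, 6)]
inds, dims = norm_pixel_inds(coords)
mask = sample_area_mask(inds, dims)
```

In this sketch the mask has one entry per pixel of the bounding rectangle, so unsampled pixels stay False.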

get_spectra()[source]¶

Returns: pyspark.rdd.RDD

Spark RDD with spectra, one spectrum per RDD entry.

static txt_to_spectrum_non_cum(s)[source]¶

sm.engine.db module¶

synopsis:	Database interface

class sm.engine.db.DB(config, autocommit=False)[source]¶

Bases: object

Postgres database access provider

alter(*args, **kwargs)[source]¶: Execute alter query

close()[source]¶: Close the connection to the database

copy(*args, **kwargs)[source]¶: Copy data from a file to a table

insert(*args, **kwargs)[source]¶: Execute insert query

insert_return(*args, **kwargs)[source]¶: Execute insert query

select(*args, **kwargs)[source]¶: Execute select query

select_one(sql, *args)[source]¶: Execute select query and take the first row
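The "select and take the first row" behaviour can be sketched as follows. The engine's DB class targets Postgres; the standard library sqlite3 module is swapped in here only so that the example runs without a database server. The helper name, the table, and the choice to return None when no row matches are assumptions made for illustration.

```python
import sqlite3


def select_one_sketch(conn, sql, *args):
    """Execute a select query and return its first row, or None if empty."""
    cursor = conn.execute(sql, args)
    return cursor.fetchone()


# In-memory database with a hypothetical table, used only for this example.
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE dataset (id INTEGER, name TEXT)')
conn.executemany('INSERT INTO dataset VALUES (?, ?)', [(1, 'a'), (2, 'b')])

first_row = select_one_sketch(
    conn, 'SELECT id, name FROM dataset WHERE name = ? ORDER BY id', 'a')
no_row = select_one_sketch(
    conn, 'SELECT id, name FROM dataset WHERE name = ?', 'missing')
conn.close()
```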

sm.engine.db.db_decor(func)[source]¶

sm.engine.es_export module¶

class sm.engine.es_export.ESExporter(sm_config)[source]¶

create_index()[source]¶

delete_ds(ds_id)[source]¶

delete_index()[source]¶

index_ds(db, ds_id)[source]¶

sm.engine.fdr module¶

class sm.engine.fdr.FDR(job_id, db_id, decoy_sample_size, target_adducts, db)[source]¶

Bases: object

clean_target_decoy_table()[source]¶

decoy_adduct_selection()[source]¶

estimate_fdr(msm_df)[source]¶

sm.engine.formulas_segm module¶

class sm.engine.formulas_segm.FormulasSegm(job_id, db_id, ds_config, db)[source]¶

Bases: object

A class representing a molecule database to search through. Provides several data structures used in the engine to speed up computation.

static check_formula_uniqueness(sf_df)[source]¶

get_sf_adduct_peaksn()[source]¶

Returns: list

A list of triples (formula id, adduct, number of theoretical peaks).

get_sf_adduct_sorted_df()[source]¶

get_sf_peak_df()[source]¶

get_sf_peak_ints()[source]¶

static sf_peak_gen(sf_df)[source]¶

sm.engine.imzml_txt_converter module¶

synopsis:	Converter of ImzML into a text format accessible from pyspark

class sm.engine.imzml_txt_converter.ImzmlTxtConverter(imzml_path, txt_path, coord_path=None)[source]¶

Bases: object

Converts spectra from imzML/ibd to plain text files for later access from Spark

convert(preprocess=False, print_progress=True)[source]¶: Converts MS imaging data provided by the given parser to a text-based format. Optionally writes the coordinates into a coordinate file.

parse_save_spectrum(i, x, y)[source]¶: Parses the spectrum with index i and saves it, together with its coordinates x, y, to the output files.

sm.engine.imzml_txt_converter.encode_coord_line(index, x, y)[source]¶: Encodes the given coordinate as a CSV line: “index,x,y”

sm.engine.imzml_txt_converter.encode_data_line(index, mzs, ints, decimals=3)[source]¶: Encodes the given spectrum as a line in a text-based format: “index|int_1 int_2 ... int_n|mz_1 mz_2 ... mz_n”
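The two line formats quoted above can be reproduced with a short sketch. It follows the field order exactly as documented (intensities before m/z values) and assumes that `decimals` controls how values are rounded. The `_sketch` function names, including the decoder, are hypothetical and are not the module's own code.

```python
def encode_coord_line_sketch(index, x, y):
    """Encode a coordinate as a CSV line: index,x,y."""
    return f'{index},{x},{y}'


def encode_data_line_sketch(index, mzs, ints, decimals=3):
    """Encode a spectrum as: index|int_1 ... int_n|mz_1 ... mz_n."""
    # Assumption: `decimals` is the rounding precision for both lists.
    ints_str = ' '.join(str(round(v, decimals)) for v in ints)
    mzs_str = ' '.join(str(round(v, decimals)) for v in mzs)
    return f'{index}|{ints_str}|{mzs_str}'


def decode_data_line_sketch(line):
    """Parse a line produced by encode_data_line_sketch."""
    index, ints_str, mzs_str = line.split('|')
    ints = [float(v) for v in ints_str.split()]
    mzs = [float(v) for v in mzs_str.split()]
    return int(index), mzs, ints


coord_line = encode_coord_line_sketch(0, 3, 7)
data_line = encode_data_line_sketch(0, [100.12345, 200.5], [10.0, 3.25])
decoded = decode_data_line_sketch(data_line)
```

Keeping one spectrum per line is what lets Spark split the text file into independent records.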

sm.engine.imzml_txt_converter.get_track_progress(n_points, step, active=False)[source]¶

sm.engine.imzml_txt_converter.preprocess_spectrum(mzs, ints)[source]¶

sm.engine.imzml_txt_converter.to_space_separated_string(seq)[source]¶

sm.engine.isocalc_wrapper module¶

class sm.engine.isocalc_wrapper.Centroids(isotope_pattern, resolving_power, pts_per_mz=None)[source]¶

Bases: object

empty¶

spectrum_chart(n_peaks=4)[source]¶

class sm.engine.isocalc_wrapper.IsocalcWrapper(isocalc_config)[source]¶

Bases: object

Wrapper around pyMSpec.pyisocalc.pyisocalc used for getting theoretical isotope peaks’ centroids and profiles for a sum formula.

formatted_iso_peaks(sf, adduct)[source]¶

Returns: str

A one-line string with tab-separated lists. Every list is a comma-separated string.
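The string layout described above (lists joined by tabs, values within a list joined by commas) can be sketched like this. Which lists the engine actually writes and the number of decimals it keeps are not stated here, so the arguments and the six-decimal precision below are assumptions.

```python
def floats_to_str_sketch(values, decimals=6):
    """Join floats into a comma-separated string with fixed precision."""
    return ','.join(f'{v:.{decimals}f}' for v in values)


def format_lists_sketch(*lists):
    """Join several float lists into one tab-separated line."""
    return '\t'.join(floats_to_str_sketch(values) for values in lists)


# Hypothetical m/z values and intensities for two peaks.
line = format_lists_sketch([100.0, 101.0], [1.0, 0.5])
```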

isotope_peaks(sf, adduct)[source]¶

Returns: Centroids

If any error occurs, returns an object with empty ‘mzs’ and ‘ints’ fields.

static slice_array(mzs, lower, upper)[source]¶

sm.engine.isocalc_wrapper.list_of_floats_to_str(l)[source]¶

sm.engine.isocalc_wrapper.trim_centroids(mzs, intensities, k)[source]¶

sm.engine.metabolights module¶

class sm.engine.metabolights.BatchConfig(study_id, metadata_template)[source]¶

Bases: object

class sm.engine.metabolights.MetabolightsBatch(batch_config, tmp_dir='/tmp', s3bucket='sm-engine-icl-data')[source]¶

Bases: object

cleanup()[source]¶: Removes temporary files

run(job_queue, remove_previous_results=False)[source]¶: Submits job descriptions into the queue.

study()[source]¶: Information about the study returned by ISATab parser

sm.engine.metabolights.setupQueue(sm_config_path)[source]¶

sm.engine.metabolights.sm_engine_config(meta_json, mass_accuracy_ppm=2)[source]¶

sm.engine.queue module¶

class sm.engine.queue.Queue(config, qname)[source]¶

Bases: object

close()[source]¶

publish(msg)[source]¶

start_consuming(callback)[source]¶

sm.engine.search_algorithm module¶

class sm.engine.search_algorithm.SearchAlgorithm(sc, ds, formulas, fdr, ds_config)[source]¶

Bases: object

calc_metrics(sf_images)[source]¶

estimate_fdr(all_sf_metrics_df)[source]¶

filter_sf_images(sf_images, sf_metrics_df)[source]¶

filter_sf_metrics(sf_metrics_df)[source]¶

search()[source]¶

sm.engine.search_job module¶

synopsis:	Molecular search job driver

class sm.engine.search_job.SearchJob(ds_id, ds_name, drop, input_path, sm_config_path, no_clean=False)[source]¶

Bases: object

Main class responsible for molecule search. Uses other modules of the engine.

run(ds_config_path=None)[source]¶

Entry point of the engine. The molecule search is completed in several steps:

Copying the input data to the engine work directory
Converting the input data (imzML+ibd) to a plain text format, one spectrum per line
Generating theoretical peaks for all formulas from the molecule database and saving them to the database
Searching for molecules. This is the most compute-intensive part; Spark is used to run it in a distributed manner.
Saving the results (isotope images and their quality metrics for each putative molecule) to the database
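The ordering of the steps above can be sketched as a simple sequential pipeline. This is only an outline of the control flow: the runner, the step names, and the placeholder callables are hypothetical and stand in for the engine's real work.

```python
def run_pipeline(steps):
    """Run (name, callable) steps in order; return the names that completed."""
    completed = []
    for name, step in steps:
        step()  # an exception here stops the pipeline at the failing step
        completed.append(name)
    return completed


def placeholder():
    """Stand-in for a real step; does nothing."""


steps = [
    ('copy input data', placeholder),
    ('convert imzML to text', placeholder),
    ('generate theoretical peaks', placeholder),
    ('search molecules', placeholder),
    ('store results', placeholder),
]
completed = run_pipeline(steps)
```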

store_job_meta()[source]¶: Store search job metadata in the database

sm.engine.search_results module¶

class sm.engine.search_results.SearchResults(sf_db_id, ds_id, job_id, sf_adduct_peaksn, db, sm_config, ds_config)[source]¶

Bases: object

Container for molecule search results

store()[source]¶

store_sf_img_metrics()[source]¶: Store formula image metrics in the database

store_sf_iso_images()[source]¶: Store formula images in the database

sm.engine.theor_peaks_gen module¶

class sm.engine.theor_peaks_gen.TheorPeaksGenerator(sc, sm_config, ds_config)[source]¶

Bases: object

Generator of theoretical isotope peaks for all molecules in a database.

find_sf_adduct_cand(sf_list, stored_sf_adduct)[source]¶

Returns: list

List of (formula id, formula, adduct) triples which don’t have theoretical patterns saved in the database.
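The candidate selection described above amounts to a set difference, sketched below. The shapes of the inputs are assumptions: `sf_list` is taken to hold (formula id, formula) pairs, `stored_sf_adduct` to hold (formula, adduct) pairs already in the database, and the adduct list is passed explicitly, whereas the real method presumably reads it from the dataset config.

```python
def find_candidates_sketch(sf_list, adducts, stored_sf_adduct):
    """Return (formula id, formula, adduct) triples that are not stored yet."""
    stored = set(stored_sf_adduct)  # set gives constant-time membership tests
    return [
        (sf_id, sf, adduct)
        for sf_id, sf in sf_list
        for adduct in adducts
        if (sf, adduct) not in stored
    ]


# Hypothetical inputs: two formulas, two adducts, one pattern already stored.
sf_list = [(1, 'H2O'), (2, 'C6H12O6')]
adducts = ['+H', '+Na']
stored = [('H2O', '+H')]
candidates = find_candidates_sketch(sf_list, adducts, stored)
```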

generate_theor_peaks(sf_adduct_cand)[source]¶

Returns: list

List of strings with formatted theoretical peaks data.

run()[source]¶: Starts peak generation. Checks which formula peaks are already saved in the database and generates peaks only for new ones.

sm.engine.util module¶

class sm.engine.util.SMConfig[source]¶

Bases: object

Engine configuration manager

classmethod get_conf()[source]¶

Returns: dict

SM engine configuration.

classmethod set_path(path)[source]¶

Set the path to an SM configuration file

Parameters:	path : String
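A class-level configuration manager with this pair of classmethods could look like the sketch below. It assumes the configuration is a JSON file that is read lazily and cached. The class name, the caching behaviour, and the sample config keys are assumptions, not a description of SMConfig internals.

```python
import json
import os
import tempfile


class ConfigManagerSketch:
    """Holds the config path and the parsed config at class level."""

    _path = None
    _config = None

    @classmethod
    def set_path(cls, path):
        """Remember the config file path and drop any cached config."""
        cls._path = path
        cls._config = None

    @classmethod
    def get_conf(cls):
        """Load the config on first use and return the cached dict afterwards."""
        if cls._config is None:
            with open(cls._path) as f:
                cls._config = json.load(f)
        return cls._config


# Write a hypothetical config to a temporary file and read it back.
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as f:
    json.dump({'db': {'host': 'localhost'}}, f)
ConfigManagerSketch.set_path(f.name)
conf = ConfigManagerSketch.get_conf()
os.remove(f.name)
```

Because the state lives on the class, any module can call `get_conf()` without passing a config object around, as long as `set_path()` was called first.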

sm.engine.util.cmd(template, *args)[source]¶

sm.engine.util.cmd_check(template, *args)[source]¶

sm.engine.util.init_logger(log_config=None)[source]¶

sm.engine.util.local_path(path)[source]¶

sm.engine.util.proj_root()[source]¶

sm.engine.util.read_json(path)[source]¶

sm.engine.util.s3_path(path)[source]¶

sm.engine.work_dir module¶

synopsis:	Access to datasets stored in a local directory or on S3

class sm.engine.work_dir.LocalWorkDir(base_path, ds_name)[source]¶

Bases: object

clean()[source]¶

coord_path¶

copy(source, dest, is_file=False)[source]¶

ds_config_path¶

ds_metadata_path¶

exists(path)[source]¶

imzml_path¶

txt_path¶

class sm.engine.work_dir.S3WorkDir(base_path, ds_name, s3, s3transfer)[source]¶

Bases: object

clean()[source]¶

coord_path¶

copy(local, remote)[source]¶

ds_config_path¶

exists(path)[source]¶

txt_path¶

class sm.engine.work_dir.WorkDirManager(ds_id)[source]¶

Bases: object

Provides access to the work directory of the target dataset

clean()[source]¶

coord_path¶

copy_input_data(input_data_path, ds_config_path)[source]¶: Copy imzML/ibd/config/meta files from input path to a dataset work directory

ds_config_path¶

ds_metadata_path¶

exists(path)[source]¶

txt_path¶

upload_to_remote()[source]¶

sm.engine.work_dir.split_local_path(path)[source]¶

sm.engine.work_dir.split_s3_path(path)[source]¶

Module contents¶