sm.engine package

Submodules

sm.engine.dataset module

class sm.engine.dataset.Dataset(sc, id, name, drop, input_path, wd_manager, db, es)[source]

Bases: object

A class representing a mass spectrometry dataset, backed by plain text files containing pixel coordinates and spectra.

copy_read_data()[source]

Reads and converts the input data, and reads or updates the metadata and config if needed.

get_dims()[source]
Returns:
tuple: A pair of int values, the number of rows and the number of columns.

get_norm_img_pixel_inds()[source]
Returns:
ndarray: One-dimensional array of dataset pixel indices, in row-major order.

get_sample_area_mask()[source]
Returns:
ndarray: One-dimensional boolean mask over the dataset pixels, True where a spectrum was sampled.
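The three methods above describe a raster of rows × columns pixels, of which only some carry a spectrum. The following is a minimal sketch of how row-major pixel indices and a boolean sample mask could be derived from pixel coordinates. It assumes 0-based (x, y) coordinates with x as the column and y as the row; the helper names are hypothetical and not part of sm.engine.

```python
import numpy as np


def row_major_indices(xs, ys, n_cols):
    """Hypothetical helper: flatten 0-based (x, y) pixel coordinates into row-major indices."""
    xs = np.asarray(xs, dtype=int)
    ys = np.asarray(ys, dtype=int)
    # In row-major order, pixel (x, y) sits at position y * n_cols + x
    return ys * n_cols + xs


def sample_area_mask(pixel_inds, n_rows, n_cols):
    """Hypothetical helper: boolean mask over all pixels, True where a spectrum exists."""
    mask = np.zeros(n_rows * n_cols, dtype=bool)
    mask[pixel_inds] = True
    return mask


# A 2 x 3 raster with spectra sampled at three pixels
inds = row_major_indices(xs=[0, 2, 1], ys=[0, 0, 1], n_cols=3)
mask = sample_area_mask(inds, n_rows=2, n_cols=3)
print(inds.tolist())  # [0, 2, 4]
print(mask.tolist())  # [True, False, True, False, True, False]
```

A flat boolean mask of length rows × columns can be reshaped to (rows, columns) whenever an image view is needed.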

get_spectra()[source]
Returns:
pyspark.rdd.RDD: Spark RDD with spectra, one spectrum per RDD entry.

static txt_to_spectrum_non_cum(s)[source]

sm.engine.db module

synopsis: Database interface
class sm.engine.db.DB(config, autocommit=False)[source]

Bases: object

Postgres database access provider

alter(*args, **kwargs)[source]

Execute an alter query

close()[source]

Close the connection to the database

copy(*args, **kwargs)[source]

Copy data from a file to a table

insert(*args, **kwargs)[source]

Execute an insert query

insert_return(*args, **kwargs)[source]

Execute an insert query and return its result

select(*args, **kwargs)[source]

Execute a select query

select_one(sql, *args)[source]

Execute a select query and return the first row
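DB wraps a Postgres connection behind thin helpers such as select, select_one and insert. As a rough, self-contained illustration of that interface shape (not the engine's implementation), the sketch below uses the standard-library sqlite3 module in place of Postgres. The class name MiniDB and its exact behavior, including returning None when select_one finds no rows, are assumptions. Note that sqlite3 uses `?` placeholders, whereas Postgres drivers such as psycopg2 use `%s`.

```python
import sqlite3


class MiniDB:
    """Hypothetical stand-in for sm.engine.db.DB, backed by SQLite instead of Postgres."""

    def __init__(self, path=":memory:", autocommit=False):
        self.conn = sqlite3.connect(path)
        self.autocommit = autocommit

    def select(self, sql, *args):
        # Run the query with positional parameters and return all rows
        return self.conn.execute(sql, args).fetchall()

    def select_one(self, sql, *args):
        # Return only the first row, or None if nothing matched (assumed behavior)
        rows = self.select(sql, *args)
        return rows[0] if rows else None

    def insert(self, sql, rows):
        # Insert many rows with one parameterized statement
        self.conn.executemany(sql, rows)
        if self.autocommit:
            self.conn.commit()

    def close(self):
        self.conn.close()


db = MiniDB(autocommit=True)
db.conn.execute("CREATE TABLE dataset (id TEXT, name TEXT)")
db.insert("INSERT INTO dataset VALUES (?, ?)", [("ds1", "brain"), ("ds2", "liver")])
names = db.select("SELECT name FROM dataset ORDER BY id")
row = db.select_one("SELECT name FROM dataset WHERE id = ?", "ds2")
db.close()
print(names)  # [('brain',), ('liver',)]
print(row)    # ('liver',)
```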

sm.engine.db.db_decor(func)[source]

sm.engine.es_export module

class sm.engine.es_export.ESExporter(sm_config)[source]
create_index()[source]
delete_ds(ds_id)[source]
delete_index()[source]
index_ds(db, ds_id)[source]

sm.engine.fdr module

class sm.engine.fdr.FDR(job_id, db_id, decoy_sample_size, target_adducts, db)[source]

Bases: object

clean_target_decoy_table()[source]
decoy_adduct_selection()[source]
estimate_fdr(msm_df)[source]

sm.engine.formulas_segm module

class sm.engine.formulas_segm.FormulasSegm(job_id, db_id, ds_config, db)[source]

Bases: object

A class representing a molecule database to search through. Provides several data structures used by the engine to speed up computation.

static check_formula_uniqueness(sf_df)[source]
get_sf_adduct_peaksn()[source]
Returns:
list: A list of (formula id, adduct, number of theoretical peaks) triples.

get_sf_adduct_sorted_df()[source]
get_sf_peak_df()[source]
get_sf_peak_ints()[source]
static sf_peak_gen(sf_df)[source]

sm.engine.imzml_txt_converter module

synopsis: Converter of imzML into a text format accessible from PySpark
class sm.engine.imzml_txt_converter.ImzmlTxtConverter(imzml_path, txt_path, coord_path=None)[source]

Bases: object

Converts spectra from imzML/ibd to plain text files for later access from Spark

convert(preprocess=False, print_progress=True)[source]

Converts MS imaging data provided by the given parser to a text-based format. Optionally writes the pixel coordinates to a coordinate file.

parse_save_spectrum(i, x, y)[source]

Parse the spectrum with index i and save it, together with its coordinates x, y, to the output files.

sm.engine.imzml_txt_converter.encode_coord_line(index, x, y)[source]

Encodes the given coordinates into a CSV line: “index,x,y”

sm.engine.imzml_txt_converter.encode_data_line(index, mzs, ints, decimals=3)[source]

Encodes the given spectrum into a single line of a text-based format: “index|int_1 int_2 ... int_n|mz_1 mz_2 ... mz_n”
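Both encoders produce one text line per record. The sketch below follows the two formats documented above (“index,x,y” for coordinates and “index|ints|mzs” for spectra) and adds a matching decoder. The function names are hypothetical and the bodies are not the engine's code. The use of `decimals` to round only the m/z values is an assumption.

```python
import numpy as np


def coord_line(index, x, y):
    """Hypothetical sketch of the 'index,x,y' coordinate format."""
    return f"{index},{x},{y}"


def data_line(index, mzs, ints, decimals=3):
    """Hypothetical sketch of the 'index|int_1 ... int_n|mz_1 ... mz_n' format.

    Assumption: `decimals` rounds the m/z values only.
    """
    int_str = " ".join(str(v) for v in ints)
    mz_str = " ".join(str(v) for v in np.round(mzs, decimals))
    return f"{index}|{int_str}|{mz_str}"


def parse_data_line(line):
    """Inverse of data_line: returns (index, mzs, ints) with NumPy arrays."""
    index, int_str, mz_str = line.strip().split("|")
    ints = np.array(int_str.split(), dtype=float)
    mzs = np.array(mz_str.split(), dtype=float)
    return int(index), mzs, ints


print(coord_line(7, 12, 3))                               # 7,12,3
line = data_line(7, [100.123456, 200.5], [10.0, 20.5])
print(line)                                               # 7|10.0 20.5|100.123 200.5
idx, mzs, ints = parse_data_line(line)
```

Keeping one spectrum per line makes the file trivially splittable, which is what allows Spark to read it in parallel with plain text input.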

sm.engine.imzml_txt_converter.get_track_progress(n_points, step, active=False)[source]
sm.engine.imzml_txt_converter.preprocess_spectrum(mzs, ints)[source]
sm.engine.imzml_txt_converter.to_space_separated_string(seq)[source]

sm.engine.isocalc_wrapper module

class sm.engine.isocalc_wrapper.Centroids(isotope_pattern, resolving_power, pts_per_mz=None)[source]

Bases: object

empty
spectrum_chart(n_peaks=4)[source]
class sm.engine.isocalc_wrapper.IsocalcWrapper(isocalc_config)[source]

Bases: object

Wrapper around pyMSpec.pyisocalc.pyisocalc used to compute the centroids and profiles of theoretical isotope peaks for a sum formula.

formatted_iso_peaks(sf, adduct)[source]
Returns:
str: A single-line string of tab-separated lists, where every list is a comma-separated string.
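The return value of formatted_iso_peaks packs several lists into one line: tabs separate the lists, and commas separate the values inside each list. The following is a minimal sketch of that layout together with a parser. The helper names, the fixed four-decimal formatting and the example lists are assumptions; the actual contents and precision of each list are not stated here.

```python
def floats_to_csv(values, decimals=4):
    # Hypothetical: format each float with a fixed number of decimals, comma-separated
    return ",".join(f"{v:.{decimals}f}" for v in values)


def format_peaks_line(*lists, decimals=4):
    # One line: lists separated by tabs, values within a list separated by commas
    return "\t".join(floats_to_csv(values, decimals) for values in lists)


def parse_peaks_line(line):
    # Drop only the trailing newline; stripping tabs would lose empty fields
    fields = line.rstrip("\n").split("\t")
    # Skip empty strings so an empty list round-trips as []
    return [[float(v) for v in field.split(",") if v] for field in fields]


line = format_peaks_line([100.0, 101.0034], [100.0, 5.5])
print(repr(line))              # '100.0000,101.0034\t100.0000,5.5000'
print(parse_peaks_line(line))  # [[100.0, 101.0034], [100.0, 5.5]]
```

The empty-field handling matters because isotope_peaks may return a Centroids object with empty ‘mzs’ and ‘ints’, which this layout encodes as empty fields between tabs.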

isotope_peaks(sf, adduct)[source]
Returns:
Centroids: If any error occurs, an object with empty ‘mzs’ and ‘ints’ fields is returned.

static slice_array(mzs, lower, upper)[source]
sm.engine.isocalc_wrapper.list_of_floats_to_str(l)[source]
sm.engine.isocalc_wrapper.trim_centroids(mzs, intensities, k)[source]

sm.engine.metabolights module

class sm.engine.metabolights.BatchConfig(study_id, metadata_template)[source]

Bases: object

class sm.engine.metabolights.MetabolightsBatch(batch_config, tmp_dir='/tmp', s3bucket='sm-engine-icl-data')[source]

Bases: object

cleanup()[source]

Removes temporary files

run(job_queue, remove_previous_results=False)[source]

Submits job descriptions to the queue.

study()[source]

Information about the study, as returned by the ISATab parser

sm.engine.metabolights.setupQueue(sm_config_path)[source]
sm.engine.metabolights.sm_engine_config(meta_json, mass_accuracy_ppm=2)[source]

sm.engine.queue module

class sm.engine.queue.Queue(config, qname)[source]

Bases: object

close()[source]
publish(msg)[source]
start_consuming(callback)[source]

sm.engine.search_algorithm module

class sm.engine.search_algorithm.SearchAlgorithm(sc, ds, formulas, fdr, ds_config)[source]

Bases: object

calc_metrics(sf_images)[source]
estimate_fdr(all_sf_metrics_df)[source]
filter_sf_images(sf_images, sf_metrics_df)[source]
filter_sf_metrics(sf_metrics_df)[source]
search()[source]

sm.engine.search_job module

synopsis: Molecular search job driver
class sm.engine.search_job.SearchJob(ds_id, ds_name, drop, input_path, sm_config_path, no_clean=False)[source]

Bases: object

Main class responsible for molecule search. Uses other modules of the engine.

run(ds_config_path=None)[source]
Entry point of the engine. Molecule search is completed in several steps:
  • Copying the input data to the engine work directory
  • Converting the input data (imzML + ibd) to a plain text format, one spectrum per line
  • Generating theoretical peaks for all formulas in the molecule database and saving them to the database
  • Searching for molecules: the most compute-intensive step, run in a distributed manner with Spark
  • Saving the results (isotope images and their quality metrics for each putative molecule) to the database
store_job_meta()[source]

Store search job metadata in the database

sm.engine.search_results module

class sm.engine.search_results.SearchResults(sf_db_id, ds_id, job_id, sf_adduct_peaksn, db, sm_config, ds_config)[source]

Bases: object

Container for molecule search results

store()[source]
store_sf_img_metrics()[source]

Store formula image metrics in the database

store_sf_iso_images()[source]

Store formula images in the database

sm.engine.theor_peaks_gen module

class sm.engine.theor_peaks_gen.TheorPeaksGenerator(sc, sm_config, ds_config)[source]

Bases: object

Generator of theoretical isotope peaks for all molecules in a database.

find_sf_adduct_cand(sf_list, stored_sf_adduct)[source]
Returns:
list: List of (formula id, formula, adduct) triples that do not yet have theoretical patterns saved in the database
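Because find_sf_adduct_cand returns only the triples whose patterns are not stored yet, run() can skip formulas processed in earlier jobs. Here is a minimal sketch of that filtering step, written as a free function with hypothetical input shapes: sf_list holds (formula id, formula) pairs, the adducts come from a separate list, and the stored entries are (formula, adduct) pairs. None of these shapes are confirmed by the engine.

```python
def find_candidates(sf_list, adducts, stored_sf_adduct):
    """Hypothetical sketch: keep (formula id, formula, adduct) triples not stored yet.

    sf_list: iterable of (formula_id, formula) pairs
    adducts: iterable of adduct strings, e.g. '+H'
    stored_sf_adduct: iterable of (formula, adduct) pairs already in the database
    """
    stored = set(stored_sf_adduct)  # set membership tests are O(1) on average
    return [
        (sf_id, sf, adduct)
        for sf_id, sf in sf_list
        for adduct in adducts
        if (sf, adduct) not in stored
    ]


cands = find_candidates(
    [(1, "C6H12O6"), (2, "H2O")],
    ["+H", "+Na"],
    {("H2O", "+H"), ("C6H12O6", "+Na")},
)
print(cands)  # [(1, 'C6H12O6', '+H'), (2, 'H2O', '+Na')]
```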

generate_theor_peaks(sf_adduct_cand)[source]
Returns:
list: List of strings with formatted theoretical peaks data

run()[source]

Starts peak generation. Checks which formula peaks are already saved in the database and generates peaks only for the new ones.

sm.engine.util module

class sm.engine.util.SMConfig[source]

Bases: object

Engine configuration manager

classmethod get_conf()[source]
Returns:
dict: SM engine configuration

classmethod set_path(path)[source]

Set the path to the SM configuration file

Parameters: path : str
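SMConfig exposes its state through class methods: set_path records where the configuration file lives, and get_conf returns the parsed dictionary. The following is a sketch of that class-level pattern. It assumes the file is JSON (util also provides read_json) and that the parsed result is cached after the first read. The class name ConfigSketch and the caching behavior are hypothetical.

```python
import json
import os
import tempfile


class ConfigSketch:
    """Hypothetical class-level config manager modelled on SMConfig."""

    _path = None
    _config = None

    @classmethod
    def set_path(cls, path):
        # Remember the file location and drop any previously cached config
        cls._path = path
        cls._config = None

    @classmethod
    def get_conf(cls):
        # Parse the JSON file on first access, then reuse the cached dict
        if cls._config is None:
            if cls._path is None:
                raise RuntimeError("set_path() must be called first")
            with open(cls._path) as f:
                cls._config = json.load(f)
        return cls._config


# Usage: write a throwaway config file, point the manager at it, read it back
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"db": {"host": "localhost"}}, f)
ConfigSketch.set_path(f.name)
print(ConfigSketch.get_conf()["db"]["host"])  # localhost
os.remove(f.name)  # the parsed config stays cached on the class
```

Keeping the configuration on the class rather than on an instance lets any module call get_conf() without passing a config object around, at the cost of global state.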
sm.engine.util.cmd(template, *args)[source]
sm.engine.util.cmd_check(template, *args)[source]
sm.engine.util.init_logger(log_config=None)[source]
sm.engine.util.local_path(path)[source]
sm.engine.util.proj_root()[source]
sm.engine.util.read_json(path)[source]
sm.engine.util.s3_path(path)[source]

sm.engine.work_dir module

synopsis: Access to datasets stored in a local directory or on S3
class sm.engine.work_dir.LocalWorkDir(base_path, ds_name)[source]

Bases: object

clean()[source]
coord_path
copy(source, dest, is_file=False)[source]
ds_config_path
ds_metadata_path
exists(path)[source]
imzml_path
txt_path
class sm.engine.work_dir.S3WorkDir(base_path, ds_name, s3, s3transfer)[source]

Bases: object

clean()[source]
coord_path
copy(local, remote)[source]
ds_config_path
exists(path)[source]
txt_path
class sm.engine.work_dir.WorkDirManager(ds_id)[source]

Bases: object

Provides access to the work directory of the target dataset

clean()[source]
coord_path
copy_input_data(input_data_path, ds_config_path)[source]

Copy imzML/ibd/config/meta files from input path to a dataset work directory

ds_config_path
ds_metadata_path
exists(path)[source]
txt_path
upload_to_remote()[source]
sm.engine.work_dir.split_local_path(path)[source]
sm.engine.work_dir.split_s3_path(path)[source]

Module contents