Jiri Borovec committed
# Atomic Pattern Dictionary Learning

 We present an image processing pipeline that accepts a large number of images containing spatial expression information for thousands of genes in Drosophila imaginal discs. We assume that the gene activations are binary and can be expressed as a union of a small set of non-overlapping spatial patterns, yielding a compact representation of the spatial activation of each gene. This lends itself well to further automatic analysis, with the hope of discovering new biological relationships. Traditionally, the images were labeled manually, which was very time-consuming. The key part of our work is a binary pattern dictionary learning algorithm that takes a set of binary images and determines a set of patterns which can be used to represent the input images with a small error. We also describe the preprocessing phase, where input images are segmented to recover the activation images and spatially aligned to a common reference. We compare binary pattern dictionary learning to existing alternative methods on synthetic data and also show results of the algorithm on real microscopy images of the Drosophila imaginal discs.
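
 As a minimal sketch of the representation described above, assume the dictionary is stored as a label image (an atlas) in which pixel value `k` marks pattern `k` and `0` marks background, so the patterns cannot overlap. Each binary image is then encoded by one binary weight per pattern. The function names and the error measure below are illustrative assumptions, not the project's API.

```python
import numpy as np


def reconstruct(atlas, weights):
    """Return the union of the patterns switched on by `weights`.

    atlas   : int array (H, W); 0 = background, k = pattern k (assumed layout)
    weights : binary sequence; weights[k - 1] == 1 activates pattern k
    """
    # Look-up table: background maps to 0, pattern k maps to its weight.
    lut = np.concatenate(([0], np.asarray(weights, dtype=int)))
    return lut[atlas]


def reconstruction_error(image, atlas, weights):
    """Fraction of pixels where the reconstruction differs from the image."""
    return float(np.mean(image != reconstruct(atlas, weights)))


# Toy atlas with three non-overlapping patterns.
atlas = np.array([[1, 1, 0, 2],
                  [1, 1, 0, 2],
                  [0, 0, 3, 3]])
image = reconstruct(atlas, [1, 0, 1])  # patterns 1 and 3 are active
```

 With this encoding, an image of `H x W` pixels is summarised by as many bits as there are patterns.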


# Project

* **WWW:** https://gitlab.fel.cvut.cz/borovji3/atomicpatterndict
* **SSH:** git@gitlab.fel.cvut.cz:borovji3/atomicpatterndict.git
* **HTTPS:** https://gitlab.fel.cvut.cz/borovji3/atomicpatterndict.git

# Methods

We compare our method, BPDL, with state-of-the-art decomposition methods; see [Faces dataset decompositions](http://scikit-learn.org/stable/auto_examples/decomposition/plot_faces_decomposition.html#example-decomposition-plot-faces-decomposition-py):
 
 * **FastICA**, [sklearn.decomposition.FastICA](http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.FastICA.html)
 * **SparsePCA**, [sklearn.decomposition.SparsePCA](http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.SparsePCA.html)
 * **Non-negative Matrix Factorisation**, [sklearn.decomposition.NMF](http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.NMF.html)
 * **Dictionary Learning** with Matching pursuit, [sklearn.decomposition.DictionaryLearning](http://scikit-learn.org/stable/modules/generated/sklearn.decomposition.DictionaryLearning.html)
 * our **Binary Pattern Dictionary Learning**
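
The scikit-learn baselines listed above can be fitted along the following lines. This is only a sketch: flattening each image into a row vector, the hyperparameters, and the helper name `fit_baselines` are assumptions and not necessarily the settings used by the experiment scripts.

```python
import numpy as np
from sklearn.decomposition import FastICA, SparsePCA, NMF, DictionaryLearning


def fit_baselines(images, n_patterns, seed=0):
    """Fit each baseline on flattened images; return {name: components}.

    images : array (n_images, H, W) with binary values
    Each returned array has shape (n_patterns, H, W).
    """
    images = np.asarray(images, dtype=float)
    # One row per image, one column per pixel (assumed data layout).
    data = images.reshape(len(images), -1)
    models = {
        'FastICA': FastICA(n_components=n_patterns, random_state=seed),
        'SparsePCA': SparsePCA(n_components=n_patterns, random_state=seed),
        'NMF': NMF(n_components=n_patterns, init='nndsvda',
                   max_iter=500, random_state=seed),
        # Orthogonal matching pursuit is used for the sparse coding step.
        'DictionaryLearning': DictionaryLearning(
            n_components=n_patterns, transform_algorithm='omp',
            random_state=seed),
    }
    shape = (n_patterns,) + images.shape[1:]
    return {name: model.fit(data).components_.reshape(shape)
            for name, model in models.items()}
```

Unlike the binary patterns of BPDL, the components returned by these methods are real-valued, so they need thresholding or another post-processing step before they can be compared with a binary atlas.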

# Data

We work with both synthetic and real images.

## Synthetic datasets
 
 The script **run_generate_dataset.py** generates a dataset with a given configuration. The image subsets are:
  
  1. **pure**: images generated directly from the atlas
  2. **noise**: images from (1) with added binary noise
  3. **deform**: images from (1) with a small elastic deformation applied
  4. **deform&noise**: images from (3) with added binary noise
  
  Some parameters, such as the noise and deformation ratios, are specified in the script.
  Other parameters, such as the number of patterns and the image size (2D or 3D), are passed to the script as arguments.
  
  The location is **/datagrid/Medical/microscopy/drosophila/synthetic_data**
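
 The **pure** and **noise** subsets described above can be sketched as follows. This is an illustration under stated assumptions, not the code of **run_generate_dataset.py**: the atlas is taken to be a label image (0 = background, `k` = pattern `k`), "binary noise" is interpreted as flipping each pixel independently with a fixed probability, and the elastic deformation step is left out.

```python
import numpy as np


def generate_pure(atlas, n_images, rng):
    """Draw random binary weights and render one image per weight vector."""
    n_patterns = int(atlas.max())
    weights = rng.integers(0, 2, size=(n_images, n_patterns))
    # Per-image look-up table; column 0 keeps the background at zero.
    lut = np.concatenate(
        [np.zeros((n_images, 1), dtype=int), weights], axis=1)
    images = lut[:, atlas]  # shape (n_images, H, W)
    return images, weights


def add_binary_noise(images, ratio, rng):
    """Flip each pixel with probability `ratio` (assumed noise model)."""
    flip = rng.random(images.shape) < ratio
    return np.where(flip, 1 - images, images)


rng = np.random.default_rng(0)
atlas = np.array([[1, 1, 0, 2],
                  [1, 1, 0, 2],
                  [0, 0, 3, 3]])
pure, weights = generate_pure(atlas, n_images=5, rng=rng)
noisy = add_binary_noise(pure, ratio=0.1, rng=rng)
```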
 
## Real images - drosophila

 One source of images is probabilistic segmentations, which we binarize with
 **run_preproc_prob_segm.py**
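
 A minimal sketch of such a binarization, assuming each pixel holds the probability of gene activation and that a fixed threshold of 0.5 is acceptable. Both the threshold and the function name are assumptions; the actual script may use a different rule.

```python
import numpy as np


def binarize(prob, threshold=0.5):
    """Mark pixels whose activation probability reaches the threshold."""
    # The 0.5 default is an assumed value, not taken from the script.
    return (np.asarray(prob) >= threshold).astype(np.uint8)


binary = binarize([[0.1, 0.5],
                   [0.9, 0.49]])
```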
 
 We use our **segmentation-registration** pipeline:
 
 1. segment the images with **run_experiment_segm_disc.py**
 2. contact M. Dolejsi for registration

# Experiments

We run experiments both for debugging and for evaluating performance.
To collect the results we use **run_experiments_parser.py**, which visits all experiments and aggregates their configurations and results into one large CSV file:

```
python run_experiments_parser.py \
    -p ~/Medical-data/microscopy/drosophila/TEMPORARY/experiments_APDL_synth \
    --fn_results results.csv --func_stat mean
```
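
The aggregation idea can be sketched with the standard library alone. Everything about the on-disk layout here is hypothetical: the sketch assumes one sub-folder per experiment containing a `config.json` and a per-run `results.csv` with numeric columns only, and its parameter names do not mirror the flags of **run_experiments_parser.py**.

```python
import csv
import json
import os
from statistics import mean


def aggregate_experiments(path_root, name_config='config.json',
                          name_results='results.csv',
                          name_output='aggregated.csv'):
    """Merge each experiment's config with the mean of its run results."""
    rows = []
    for name in sorted(os.listdir(path_root)):
        path_cfg = os.path.join(path_root, name, name_config)
        path_res = os.path.join(path_root, name, name_results)
        # Skip anything that is not a complete experiment folder.
        if not (os.path.isfile(path_cfg) and os.path.isfile(path_res)):
            continue
        with open(path_cfg) as fp:
            row = dict(json.load(fp))
        with open(path_res, newline='') as fp:
            runs = list(csv.DictReader(fp))
        if not runs:
            continue
        # Average every result column over the runs.
        for col in runs[0]:
            row[col] = mean(float(run[col]) for run in runs)
        row['experiment'] = name
        rows.append(row)
    # Union of all keys, so differing configurations still fit one table.
    fields = sorted({key for row in rows for key in row})
    with open(os.path.join(path_root, name_output), 'w', newline='') as fp:
        writer = csv.DictWriter(fp, fieldnames=fields)
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

Taking the union of keys as the CSV header lets experiments with different configuration parameters share a single table; missing values are written as empty cells.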

## Binary Pattern Dictionary Learning

 We run only our method on both synthetic and real images using **run_experiment_apd_apdl.py**, where each configuration has several runs in debug mode
 (saving more log information and also exporting all partial estimated atlases).
 
 1. **Synthetic datasets**
```
python run_experiment_apd_apdl.py \
    -in /datagrid/Medical/microscopy/drosophila/synthetic_data/atomicPatternDictionary_v1 \
    -out /datagrid/Medical/microscopy/drosophila/TEMPORARY/experiments_APDL_synth
```
 2. **Real images - drosophila**
```
python run_experiment_apd_apdl.py --type real \
    -in /datagrid/Medical/microscopy/drosophila/TEMPORARY/type_1_segm_reg_binary \
    -out /datagrid/Medical/microscopy/drosophila/TEMPORARY/experiments_APDL_real \
    --dataset gene_ssmall
```

## All methods

 We run all methods with the same configuration on the given synthetic or real data using **run_experiment_apd_all.py**, which runs in info mode and prints only a few messages.
 
 1. **Synthetic datasets**
```
python run_experiment_apd_all.py \
    -in /datagrid/Medical/microscopy/drosophila/synthetic_data/atomicPatternDictionary_v1 \
    -out /datagrid/Medical/microscopy/drosophila/TEMPORARY/experiments_APDL_synth2 \
    --method APDL
```
 2. **Real images - drosophila**
```
python run_experiment_apd_all.py --type real \
    -in /datagrid/Medical/microscopy/drosophila/TEMPORARY/type_1_segm_reg_binary \
    -out /datagrid/Medical/microscopy/drosophila/TEMPORARY/experiments_APD_real \
    --dataset gene_ssmall
```