Helper Module for Deep Learning.
Module that provides generative losses.
Code: https://github.com/YannDubs/disentangling-vae
-
class
pynet.losses.generative.BaseLoss(steps_anneal=0, use_mse=False) Base class for losses.
-
__init__(steps_anneal=0, use_mse=False) Init class.
- Parameters
steps_anneal: int, default 0
number of annealing steps over which the regularisation is gradually added.
use_mse: bool, default False
if set, use MSE for the reconstruction loss rather than the log likelihood.
-
compute_ll(p, data) Compute log likelihood.
- Parameters
p: torch.distributions
probabilistic decoder (or likelihood of generating a true data sample given the latent code).
data: torch.Tensor
reference data.
-
get_params() Get the outputs of the forward layers.
- Returns
q: torch.distributions
probabilistic encoder (or estimated posterior probability function).
z: torch.Tensor
the compressed code learned in the bottleneck layer.
model: nn.Module
the network.
-
kl_log_uniform(normal) Calculates the approximate KL divergence between a normal distribution and a log-uniform prior.
Section 4.2 of: Variational Dropout Sparsifies Deep Neural Networks, Molchanov, Dmitry; Ashukha, Arsenii; Vetrov, Dmitry. https://arxiv.org/abs/1701.05369 https://github.com/senya-ashukha/variational-dropout-sparsifies-dnn/blob/master/KL%20approximation.ipynb
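As a rough illustration, the approximation from Section 4.2 of Molchanov et al. can be written for a single weight with mean `mu` and standard deviation `sigma`. This is a scalar sketch in plain Python, not the module's tensor implementation: the `(mu, sigma)` signature is hypothetical (the documented method takes a `normal` distribution). The constants k1, k2, k3 are the values fitted in the paper.

```python
import math

# Constants fitted by Molchanov et al. (2017) for the KL approximation.
K1, K2, K3 = 0.63576, 1.87320, 1.48695


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def kl_log_uniform(mu, sigma):
    """Approximate KL(N(mu, sigma^2) || log-uniform prior) for one weight.

    Hypothetical scalar version: the documented method takes a `normal`
    distribution object instead of (mu, sigma). mu must be non-zero.
    """
    # alpha = sigma^2 / mu^2 (dropout-rate parameterization)
    log_alpha = math.log(sigma ** 2) - math.log(mu ** 2)
    neg_kl = (K1 * sigmoid(K2 + K3 * log_alpha)
              - 0.5 * math.log1p(math.exp(-log_alpha))
              - K1)
    return -neg_kl
```

The approximation goes to 0 as alpha grows (very noisy weights cost nothing) and increases as sigma shrinks relative to |mu|, which is what pushes weights towards high noise and thus prunability.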
-
kl_normal_loss(q) Calculates the KL divergence between a normal distribution with diagonal covariance and a unit normal distribution.
- Parameters
q: torch.distributions
probabilistic encoder (or estimated posterior probability function).
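For a diagonal Gaussian posterior with means mu_j and log variances logvar_j, the KL divergence to N(0, I) has the closed form 0.5 * sum_j (mu_j^2 + exp(logvar_j) - 1 - logvar_j). A minimal sketch for one sample, assuming the encoder outputs a mean and a log variance per latent dimension (the `(mus, logvars)` signature is hypothetical; the documented method takes the distribution `q`):

```python
import math


def kl_normal_loss(mus, logvars):
    """KL(N(mu, diag(exp(logvar))) || N(0, I)) for one sample.

    Hypothetical list-based signature; summed over latent dimensions.
    """
    return 0.5 * sum(mu ** 2 + math.exp(lv) - 1.0 - lv
                     for mu, lv in zip(mus, logvars))
```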
-
linear_annealing(init, fin) Linear annealing of a parameter.
- Returns
annealed: float
loss factor to gradually add the regularisation.
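The docstring does not show how the step counter reaches the method, so the sketch below takes the current training step and `steps_anneal` explicitly; both arguments are assumptions. It moves linearly from `init` to `fin`, then stays at `fin`, and returns `fin` immediately when `steps_anneal` is 0 so that the default `steps_anneal=0` disables annealing (assumed behaviour).

```python
def linear_annealing(init, fin, step, steps_anneal):
    """Linearly move from `init` to `fin` over `steps_anneal` steps.

    `step` and `steps_anneal` are hypothetical explicit arguments.
    """
    if steps_anneal == 0:
        # No annealing: use the final value straight away.
        return fin
    # Fraction of the annealing schedule completed, capped at 1.
    frac = min(step / steps_anneal, 1.0)
    return init + (fin - init) * frac
```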
-
reconstruction_loss(p, data) Calculates the per-image reconstruction loss for a batch of data (i.e. the negative log likelihood).
The distribution of the likelihood on each pixel implicitly defines the loss. A Bernoulli distribution corresponds to a binary cross-entropy. A Gaussian distribution corresponds to MSE; it is sometimes used but is hard to train because the loss ends up focusing on only a few pixels that are very wrong. A Laplace distribution corresponds to an L1 loss, which partially solves this issue of MSE.
- Parameters
p: torch.distributions
probabilistic decoder (or likelihood of generating a true data sample given the latent code).
data: torch.Tensor
reference data.
- Returns
loss: torch.Tensor
per-image cross-entropy (i.e. averaged over the batch but summed, not averaged, over pixels and channels).
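The correspondence between likelihood and loss can be checked numerically. The sketch below computes the per-image negative log-likelihood of a flattened image for the three distributions mentioned above, assuming unit variance for the Gaussian and unit scale for the Laplace (the module may parameterize these differently). Up to an additive constant, the Gaussian case is half the squared error and the Laplace case is the absolute error; batch averaging would happen outside this function.

```python
import math


def reconstruction_loss(pred, target, distribution="bernoulli"):
    """Per-image negative log-likelihood, summed over pixels.

    pred: decoder output per pixel (probability for Bernoulli, mean otherwise).
    target: observed pixel values.
    """
    eps = 1e-12  # guards log(0) in the Bernoulli case
    total = 0.0
    for p, x in zip(pred, target):
        if distribution == "bernoulli":
            # Binary cross-entropy
            total += -(x * math.log(p + eps) + (1 - x) * math.log(1 - p + eps))
        elif distribution == "gaussian":
            # Unit-variance Gaussian: half squared error plus a constant
            total += 0.5 * (x - p) ** 2 + 0.5 * math.log(2 * math.pi)
        elif distribution == "laplace":
            # Unit-scale Laplace: absolute error plus a constant
            total += abs(x - p) + math.log(2.0)
        else:
            raise ValueError(f"unknown distribution: {distribution}")
    return total
```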
-
-
class
pynet.losses.generative.BetaBLoss(C_init=0.0, C_fin=20.0, gamma=100.0, **kwargs) Compute the Beta-VAE loss.
Understanding disentangling in beta-VAE, Burgess, arXiv 2018.
-
__init__(C_init=0.0, C_fin=20.0, gamma=100.0, **kwargs) Init class.
- Parameters
C_init: float, default 0
starting annealed capacity C.
C_fin: float, default 20
final annealed capacity C.
gamma: float, default 100
weight of the KL divergence term.
kwargs: dict
additional arguments for ‘BaseLoss’.
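In Burgess et al., the capacity C is increased during training and the loss penalizes the distance between the KL term and C: loss = reconstruction + gamma * |KL - C|. A minimal scalar sketch, assuming C grows linearly from `C_init` to `C_fin` over `steps_anneal` steps (the function name and the `step` argument are hypothetical):

```python
def beta_b_loss(rec_loss, kl_loss, step, C_init=0.0, C_fin=20.0,
                gamma=100.0, steps_anneal=0):
    """Capacity-controlled beta-VAE objective (scalar sketch)."""
    # Annealed capacity C (assumed linear schedule)
    if steps_anneal == 0:
        C = C_fin
    else:
        C = C_init + (C_fin - C_init) * min(step / steps_anneal, 1.0)
    # Penalize the KL for deviating from the target capacity in either direction
    return rec_loss + gamma * abs(kl_loss - C)
```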
-
-
class
pynet.losses.generative.BetaHLoss(beta=4, **kwargs) Compute the Beta-VAE loss.
beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework, Irina Higgins, ICLR 2017.
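The Higgins et al. objective scales the KL term by `beta`. The sketch below also multiplies the regulariser by a linear annealing factor going from 0 to 1 over `steps_anneal` steps; this mirrors the upstream disentangling-vae code the module links to, but treat it as an assumption for this module. Function and argument names are illustrative.

```python
def beta_h_loss(rec_loss, kl_loss, beta=4.0, step=0, steps_anneal=0):
    """beta-VAE objective: reconstruction + anneal * beta * KL (scalar sketch)."""
    # Assumed annealing factor: 0 -> 1 linearly, or 1 if annealing is disabled
    anneal = 1.0 if steps_anneal == 0 else min(step / steps_anneal, 1.0)
    return rec_loss + anneal * beta * kl_loss
```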
-
class
pynet.losses.generative.BtcvaeLoss(dataset_size, alpha=1.0, beta=6.0, gamma=1.0, is_mss=True, **kwargs) Compute the decomposed KL loss using either minibatch weighted sampling or minibatch stratified sampling.
Isolating sources of disentanglement in variational autoencoders, Tian Qi Chen, Advances in Neural Information Processing Systems, 2018.
-
__init__(dataset_size, alpha=1.0, beta=6.0, gamma=1.0, is_mss=True, **kwargs) Init class.
- Parameters
dataset_size: int
number of training images in the dataset.
alpha: float, default 1
weight of the mutual information term.
beta: float, default 6
weight of the total correlation term.
gamma: float, default 1
weight of the dimension-wise KL term.
is_mss: bool, default True
whether to use minibatch stratified sampling instead of minibatch weighted sampling.
kwargs: dict
additional arguments for ‘BaseLoss’.
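Chen et al. split the KL term into index-code mutual information, total correlation and dimension-wise KL, weighted by `alpha`, `beta` and `gamma` respectively. Given per-sample estimates of log q(z|x), log q(z), log prod_j q(z_j) and log p(z) (how they are estimated, MWS or MSS, is left out), the regulariser can be sketched as follows; the function name is hypothetical and the reconstruction loss would be added on top.

```python
from statistics import mean


def btcvae_regulariser(log_q_zCx, log_qz, log_prod_qzi, log_pz,
                       alpha=1.0, beta=6.0, gamma=1.0):
    """Weighted sum of the three KL decomposition terms (sketch).

    Each argument holds one value per sample of the batch.
    """
    mi = mean(a - b for a, b in zip(log_q_zCx, log_qz))        # mutual information
    tc = mean(a - b for a, b in zip(log_qz, log_prod_qzi))     # total correlation
    dw_kl = mean(a - b for a, b in zip(log_prod_qzi, log_pz))  # dimension-wise KL
    return alpha * mi + beta * tc + gamma * dw_kl
```

With alpha = beta = gamma = 1 the three terms telescope back to the mean of log q(z|x) - log p(z), i.e. the usual KL estimate.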
-
static
log_importance_weight_matrix(batch_size, dataset_size) Calculates a log importance weight matrix.
- Parameters
batch_size: int
number of training images in the batch.
dataset_size: int
number of training images in the dataset.
-
static
matrix_log_density_gaussian(x, q) Calculates the log density of a Gaussian for all combinations of batch pairs of ‘x’ and ‘mu’ (the mean of ‘q’), i.e. returns a tensor of shape (batch_size, batch_size, dim) instead of the (batch_size, dim) of the usual log density.
- Parameters
x: torch.Tensor (batch_size, dim)
value at which to compute the density.
q: torch.distributions
probabilistic encoder (or estimated posterior probability function).
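The pairwise density makes the minibatch weighted sampling (MWS) estimate cheap: log q(z_i) is approximated by logsumexp_j log q(z_i | x_j) - log(N * M), with N the dataset size and M the batch size. The sketch below uses nested lists and explicit means and log variances instead of a torch distribution, so all names and signatures are illustrative. It covers only MWS; with `is_mss=True` the module instead weights the pairs with the matrix returned by `log_importance_weight_matrix`, which is not reproduced here.

```python
import math


def log_density_gaussian(x, mu, logvar):
    """Log density of a univariate Gaussian N(mu, exp(logvar)) at x."""
    return -0.5 * (math.log(2 * math.pi) + logvar + (x - mu) ** 2 / math.exp(logvar))


def matrix_log_density_gaussian(x, mu, logvar):
    """Entry [i][j][d] = log N(x[i][d]; mu[j][d], exp(logvar[j][d]))."""
    batch_size, dim = len(x), len(x[0])
    return [[[log_density_gaussian(x[i][d], mu[j][d], logvar[j][d])
              for d in range(dim)]
             for j in range(batch_size)]
            for i in range(batch_size)]


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_qz_mws(mat, dataset_size):
    """MWS estimate of log q(z_i) for every sample i of the batch."""
    batch_size = len(mat)
    out = []
    for i in range(batch_size):
        # log q(z_i | x_j) for the full latent vector: sum over dimensions
        joint = [sum(mat[i][j]) for j in range(batch_size)]
        out.append(logsumexp(joint) - math.log(dataset_size * batch_size))
    return out
```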
-
-
class
pynet.losses.generative.FactorKLoss(device, gamma=10.0, disc_kwargs={}, optim_kwargs={'betas': (0.5, 0.9), 'lr': 5e-05}, **kwargs) Compute the Factor-VAE loss (algorithm 2).
Disentangling by factorising, Hyunjik Kim, arXiv 2018.
-
__init__(device, gamma=10.0, disc_kwargs={}, optim_kwargs={'betas': (0.5, 0.9), 'lr': 5e-05}, **kwargs) Init class.
- Parameters
device: torch.device
the device.
gamma: float, default 10
weight of the TC loss term (gamma in the paper).
disc_kwargs: dict
discriminator arguments.
optim_kwargs: dict
Adam optimizer arguments.
kwargs: dict
additional arguments for ‘BaseLoss’.
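Factor-VAE estimates the total correlation with a discriminator that separates samples of q(z) from samples whose dimensions were shuffled independently across the batch (an approximate sample of the product of marginals). A sketch of the two ingredients, with hypothetical names: `permute_dims` builds the shuffled batch, and `tc_estimate` turns discriminator logits into the density-ratio estimate mean(logit_0 - logit_1), assuming column 0 scores the unshuffled samples.

```python
import random


def permute_dims(z, rng=None):
    """Shuffle each latent dimension independently across the batch.

    z: list of latent vectors (batch_size x dim).
    """
    if rng is None:
        rng = random.Random()
    batch_size, dim = len(z), len(z[0])
    columns = []
    for d in range(dim):
        col = [z[i][d] for i in range(batch_size)]
        rng.shuffle(col)  # break the dependence between dimensions
        columns.append(col)
    return [[columns[d][i] for d in range(dim)] for i in range(batch_size)]


def tc_estimate(logits):
    """Density-ratio TC estimate from (logit_real, logit_permuted) pairs."""
    return sum(l0 - l1 for l0, l1 in logits) / len(logits)
```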
-
-
class
pynet.losses.generative.MCVAELoss(n_channels, beta=1.0, enc_channels=None, dec_channels=None, sparse=False, nodecoding=False) MCVAE loss.
Sparse Multi-Channel Variational Autoencoder for the Joint Analysis of Heterogeneous Data, Luigi Antelmi, Nicholas Ayache, Philippe Robert, Marco Lorenzi Proceedings of the 36th International Conference on Machine Learning, PMLR 97:302-311, 2019.
The MCVAE loss combines two terms:
KL divergence loss (KL_loss): measures how far the distribution over the latent space is from the prior. Since the prior is a standard Gaussian and the inferred distribution is a Gaussian with a diagonal covariance matrix, the KL divergence can be computed analytically.
log-likelihood loss (LL_loss): the negative log-likelihood of the data under the decoders.
loss = beta * KL_loss + LL_loss.
-
__init__(n_channels, beta=1.0, enc_channels=None, dec_channels=None, sparse=False, nodecoding=False) Init class.
- Parameters
n_channels: int
the number of channels.
beta: float, default 1
weight of the KL divergence term (as in beta-VAE).
enc_channels: list of int, default None
encode only these channels (for kl computation).
dec_channels: list of int, default None
decode only these channels (for ll computation).
sparse: bool, default False
use sparsity constraint.
nodecoding: bool, default False
if set, do not apply the decoding.
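With several channels, each channel has its own encoder and each latent code can be decoded into every channel. The sketch below assembles loss = beta * KL_loss + LL_loss from precomputed per-channel KL values and a matrix of cross-channel log-likelihoods, restricting the sums to `enc_channels` and `dec_channels`. Summing (rather than averaging) over channels is an assumption, and the function name is hypothetical.

```python
def mcvae_loss(kl_per_channel, ll_matrix, beta=1.0,
               enc_channels=None, dec_channels=None):
    """beta * KL_loss + LL_loss for a multi-channel VAE (sketch).

    kl_per_channel[c]: KL(q(z | x_c) || N(0, I)) for encoder channel c.
    ll_matrix[c1][c2]: log-likelihood of channel c2 decoded from the latent of c1.
    """
    n_channels = len(kl_per_channel)
    enc = range(n_channels) if enc_channels is None else enc_channels
    dec = range(n_channels) if dec_channels is None else dec_channels
    kl_loss = sum(kl_per_channel[c] for c in enc)
    # LL_loss is the negative log-likelihood over every encoder/decoder pair
    ll_loss = -sum(ll_matrix[c1][c2] for c1 in enc for c2 in dec)
    return beta * kl_loss + ll_loss
```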
-
class
pynet.losses.generative.MOESimVAELoss(beta=1.0, alpha=1.0, n_components_umap=2, n_neighbors_knn=10, use_similarity_loss=False, use_balancing_loss=True) MoE-Sim-VAE loss.
-
__init__(beta=1.0, alpha=1.0, n_components_umap=2, n_neighbors_knn=10, use_similarity_loss=False, use_balancing_loss=True) Init class.
- Parameters
beta: float, default 1
the weight of KL regularization term.
alpha: float, default 1
the weight of the DEPICT term.
n_components_umap: int, default 2
the desired number of dimensions of the UMAP projection of the data.
n_neighbors_knn: int, default 10
the number of k-nearest-neighbors used to define the adjacency matrix.
use_similarity_loss: bool, default False
activate the similarity loss.
use_balancing_loss: bool, default True
activate the balancing loss.
-
static
balancing(probs) One thing to be careful about when training this model is that the manager can easily degenerate into outputting a constant vector regardless of the input at hand. For example, with ten experts trained on digit images, this results in one VAE specialized in all digits and nine VAEs specialized in nothing. One way to mitigate this is to add a balancing term to the loss. It encourages the outputs of the manager over a batch of inputs to be balanced, i.e. the sum of the probabilities over the batch is distributed almost uniformly across the experts.
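The exact balancing term is not given here. One simple form, assumed for illustration, averages the manager probabilities over the batch to get each expert's share and penalizes the squared distance of these shares from the uniform value 1 / n_experts (the function name is hypothetical):

```python
def balancing_loss(probs):
    """Penalize a manager whose batch-level expert usage is far from uniform.

    probs: batch_size x n_experts manager probabilities (each row sums to 1).
    """
    batch_size, n_experts = len(probs), len(probs[0])
    # Share of the batch routed to each expert
    usage = [sum(row[k] for row in probs) / batch_size for k in range(n_experts)]
    uniform = 1.0 / n_experts
    return sum((u - uniform) ** 2 for u in usage)
```

A batch where every input is confidently routed to a different expert still scores 0, because the constraint is on batch-level usage rather than on individual outputs.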
-
static
depict(probs, probs_noisy) The DEPICT loss encourages the model to learn, from the latent representation, clustering features that are invariant to noise.
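DEPICT (Ghasedi Dizaji et al., ICCV 2017) builds a sharpened target distribution from the clean predictions, q_ik proportional to p_ik / sqrt(sum_i p_ik), and takes the cross-entropy between this target and the predictions on noisy inputs. The sketch below follows that formulation with plain lists; whether `depict` normalizes exactly this way is an assumption, and the function name is hypothetical. In training, the target would be treated as a constant (no gradient).

```python
import math


def depict_loss(probs, probs_noisy, eps=1e-12):
    """Cross-entropy between a sharpened clean target and noisy predictions.

    probs, probs_noisy: batch_size x n_clusters soft cluster assignments.
    """
    n, k = len(probs), len(probs[0])
    # Soft cluster frequencies over the batch
    freq = [sum(probs[i][j] for i in range(n)) for j in range(k)]
    loss = 0.0
    for i in range(n):
        # Target q_ij proportional to p_ij / sqrt(f_j), normalized over clusters
        unnorm = [probs[i][j] / math.sqrt(freq[j] + eps) for j in range(k)]
        total = sum(unnorm)
        target = [u / total for u in unnorm]
        loss -= sum(t * math.log(pn + eps) for t, pn in zip(target, probs_noisy[i]))
    return loss / n
```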
-
static
get_similarity_matrix(data, n_components_umap=2, n_neighbors_knn=10, random_state=None) The similarity matrix is derived in an unsupervised way (e.g., a UMAP projection of the data followed by k-nearest-neighbors or distance thresholding to define the adjacency matrix for the batch), but it can also be used to include weakly-supervised information (e.g., knowledge about diseased vs. non-diseased patients). If labels are available, the model could even be used to derive a latent representation with supervision. The similarity feature in MoE-Sim-VAE thus allows prior knowledge about the best similarity measure on the data to be included.
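The k-nearest-neighbors step can be sketched independently of UMAP: given a low-dimensional embedding of the batch, link each point to its `n_neighbors` closest points and symmetrize. The function below is hypothetical and assumes the embedding has already been computed (for instance with the umap-learn package); it returns a 0/1 adjacency matrix rather than weighted similarities.

```python
import math


def knn_adjacency(points, n_neighbors=10):
    """Symmetric 0/1 adjacency matrix linking each point to its k nearest neighbours.

    points: low-dimensional embedding of the batch (e.g. a UMAP projection).
    """
    n = len(points)
    k = min(n_neighbors, n - 1)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        # Sort the other points by Euclidean distance to point i
        others = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for _, j in others[:k]:
            adj[i][j] = adj[j][i] = 1  # symmetrize
    return adj
```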
-
-
class
pynet.losses.generative.PMVAELoss(beta=1) PMVAE loss.
Compute a global and a local (per-pathway) reconstruction loss, together with a beta-weighted KL divergence regularization loss.
-
class
pynet.losses.generative.SparseLoss(beta=4, **kwargs) Compute the Beta-Sparse VAE loss.
Sparse Multi-Channel Variational Autoencoder for the Joint Analysis of Heterogeneous Data, Luigi Antelmi, Nicholas Ayache, Philippe Robert, Marco Lorenzi, PMLR 2019.
-
pynet.losses.generative.get_vae_loss(loss_name, **kwargs) Return the correct VAE loss function given the input arguments.
The parameters for each loss:
vae: -
betah: beta
betab: C_init, C_fin, gamma
factor: device, gamma, latent_dim, lr_disc
btcvae: dataset_size, alpha, beta, gamma
sparse: beta
- Parameters
loss_name: str
the name of the loss.
kwargs: dict
the loss kwargs.
- Returns
loss: @callable
the loss function.