\documentclass[a4paper]{article}
\usepackage[english]{babel}
\usepackage{amsmath,amsthm,amssymb,amsfonts}
\usepackage[utf8]{inputenc}
\usepackage{color}
\usepackage{hyperref}
\usepackage[round]{natbib}
\topmargin-3mm
\addtolength{\textheight}{115pt}
\textwidth15.2cm
\textheight 22cm
\oddsidemargin0pt
\evensidemargin0pt
\hoffset0.15in
\setlength{\parindent}{6mm}
\unitlength1.0cm
<>=
library(knitr)
opts_chunk$set(
fig.path='Figures/',fig.width=5.625,fig.height=5.625,cache=TRUE,tidy=TRUE, fig.align = "center"
)
@
\title{Computing local specificity index}
\author{Mahendra Mariadassou \\ \url{mahendra.mariadassou@jouy.inra.fr}}
\begin{document}
\setkeys{Gin}{width=1.0\textwidth}
\maketitle
% \tableofcontents
This vignette shows how to reproduce the analysis and graphics used in \citet{Mariadassou2015}. The function used to compute local specificity index make heavy use of \texttt{phyloseq} \citep{McMurdie2013a} and the graphics are all made using \texttt{ggplot2} \citep{Wickham2009} so we load those two packages.
<>=
library(phyloseq); packageVersion("phyloseq")
library(ggplot2); packageVersion("ggplot2")
@
Instructions about how to install \texttt{phyloseq} are available at the author's website: \url{https://joey711.github.io/phyloseq/}. \texttt{ggplot2} is available on CRAN and can be installed with \texttt{install.packages}. We then load the custom functions used to compute local specificity index.
<>=
source("specificity_methods.R")
@
and illustrate their use on the Gloabl Patterns data set \citep{Caporaso2011} provided by \texttt{phyloseq}
<>=
data(GlobalPatterns)
## Filter out mock communities
GP <- subset_samples(GlobalPatterns, SampleType != "Mock")
@
To use your own data, you can use the various data import functions of \texttt{phyloseq}, detailed in the very nice following tutorial: \url{http://joey711.github.io/phyloseq/import-data.html}
To speed up computations a bit, we filter out the singletons and rarefy the dataset to \Sexpr{format(10000, big.mark = ",", scientific = FALSE)} reads per sample.
<>=
set.seed(24082015) ## for reproducibility
GP.down <- prune_taxa(taxa_sums(GP) > 1, GP) ## remove singletons
GP.down <- rarefy_even_depth(GP.down, sample.size = 10000) ## rarefaction
@
\texttt{GP.down} is a \texttt{phyloseq}-class object with several components. We only use two here: \texttt{otu\_table}, the count table, and \texttt{sample\_data}, the metadata associated with the samples (see \texttt{phyloseq} documentation for more details on additional components).
<>=
print(GP.down)
head(otu_table(GP.down), n = 2)
head(sample_data(GP.down), n = 2)
@
We can now compute the local specificity using \texttt{SampleType} (read directly from the sample data component) as a grouping factor. Alternatively, you can provide the grouping factor directly. We also use 999 stratified bootstrap replicates to estimate the error of the local specificity coefficient. As stated in \cite{Mariadassou2015}, the bootstrap is stratified by levels of the grouping factor to preserve the structure of the data set (with respect to the grouping factor): to create bootstrap Soil samples, we only resample from Soil samples.
<>=
specOTUS <- estimate_local_specificity(GP.down, group = "SampleType", index = "indspec", B = 999)
@
The result is a data frame with one line per taxon and the following columns:
\begin{itemize}
\item specificity: observed local specificity
\item group: grouping factor level
\item abundance: local relative abundance of otu (all samples of a level are weighted equally)
\item mean, sd, quantiles 5, 25, 50, 75 and 95\% (of the stratified bootstrap distribution) if 'se' is TRUE
\end{itemize}
<>=
options(digits = 2)
@
<>=
head(specOTUS, n = 2)
@
We can then plot the results (Figure~\ref{fig:plot-local-specificity}) using default settings of the \texttt{plot\_local\_specificity} function. You may get some warnings for the loess fit to the data but it is safe to ignore them.
<>=
plot_local_specificity(specOTUS)
@
The default settings plot the standard error but you can remove them from the plot to make it easier to read (Figure~\ref{fig:plot-local-specificity-without-se}) using \texttt{se = FALSE} in the call:
<>=
plot_local_specificity(specOTUS, se = FALSE)
@
Stratified bootstrap computes the variance associated with a limited number of samples in each group. It is clear from Figure~\ref{fig:plot-local-specificity} that local specificity increases on average with local abundance but this could simply be a fluke. We test whether abundance-specificity relationship derives from the grouping factor by ``breaking'' the structure imposed by the grouping factor and assessing whether the relationship is preserved. We do so with standard bootstrap replicates, \emph{i.e.} to create bootstrap Soil samples, we resample from all samples. Once again, we use 999 boostrap replicates.
<>=
specOTUStest <- test_local_specificity(GP.down, group = "SampleType", B = 999, replace = TRUE, type = "global", index = "indspec")
@
The result is a data frame with one line per taxon and the following columns:
\begin{itemize}
\item specificity: observed specificity
\item level: grouping factor level
\item abundance: local relative abundance of otu (all samples of a level are weighted equally)
\item rawp: raw p-value
\item adjp: adjusted p-value (corrected using "fdr")
\item mean, sd, quantiles 50, 75, 90, 95 and 99\% (of the standard bootstrap distribution) if 'se' is TRUE
\end{itemize}
The p-values correspond to the probability that the local specificity is as high as observed under the hypothesis of no structure from the grouping factor. They are computed from the quantiles of the standard bootstrap distribution.
<>=
head(specOTUStest, n = 3)
@
We can represent the local specificity under the null hypothesis of no structure, taken here to be the mean of the standard bootstrap distribution (Figure~\ref{fig:plot-null-local-specificity-without-se}):
<>=
plot_local_specificity(specOTUStest, y = "bmean", se = FALSE)
@
There is a upward trend, but it is much shallower than for the structured data (note that the y-scale is different)
We can finally plot the observed specificity and its expected value under the null distribution on the same graph but it require a bit more work.
<>=
specOTUS <- merge(specOTUS, specOTUStest)
attr(specOTUS, "index") <- 'indspec'
## Plot local specificty values for SampleType
p <- plot_local_specificity(specOTUS, y = "mean", plot = FALSE)
## Add smoother (loess fit) to highlight the abundance-specificity relationship
p <- p + geom_smooth(color = "grey40", method = "loess", se = TRUE)
## Add expected specificity values under the null distribution
p <- p + geom_point(aes(x = abundance, y = bmean), color = "grey60", alpha = 0.5)
## Add error bars for the expected values
p <- p + geom_errorbar(aes(x = abundance, ymin = bmean -bsd, ymax = bmean + bsd), color = "grey60", alpha = 0.2)
## Add smoother for the abundance-specificity relationship for the expected values
p <- p + geom_smooth(aes(x = abundance, y = bmean), method = "loess", color = "grey40", se = TRUE)
## change axes titles, ticks and tick label
p <- p + labs(x = "Local abundance", y = "Local specificity")
p <- p + scale_y_continuous(breaks = c(0:4)/4, limits = c(0, 1))
## remove the legend and use white background
p <- p + theme_bw() + theme(legend.position = "none")
plot(p)
## Save the figure in your favorite format
ggsave(filename = "Figure.png", plot = p, width = 10, height = 6)
@
\bibliographystyle{plainnat}
\bibliography{/home/mahendra/Migale/Bibliography/Biblio2/GeneralBibliography}
\end{document}