Title: Continuous-Time Multivariate Analysis
Description: Implements a basis-function (functional data analysis) framework for several techniques of multivariate analysis in a continuous-time setting. Specifically, it provides continuous-time analogues of classical techniques such as principal component analysis, canonical correlation analysis, Fisher linear discriminant analysis, and k-means clustering. Details are in Biplab Paul, Philip T. Reiss and Erjia Cui (2023), "Continuous-time multivariate analysis", <doi:10.48550/arXiv.2307.09404>.
Authors: Biplab Paul [aut, cre], Philip Tzvi Reiss [aut]
Maintainer: Biplab Paul <[email protected]>
License: GPL (>= 2)
Version: 1.4.0
Built: 2024-11-03 03:05:03 UTC
Source: https://github.com/cran/ctmva
Implements continuous-time analogues of several classical techniques of multivariate analysis.
The inputs are "fd" (functional data) objects from the fda package.
Biplab Paul <[email protected]> and Philip Tzvi Reiss <[email protected]>
Paul, B., Reiss, P. T. and Cui, E. (2023). Continuous-time multivariate analysis. arXiv:2307.09404 [stat.ME]
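As a quick orientation (a minimal sketch, not taken from the package manual), the code below builds an "fd" object from the fda package's CanadianWeather data and passes it to two of the functions documented here; the basis dimension of 55 simply mirrors the examples further down.
require(ctmva)
require(fda)
data(CanadianWeather)
daybasis <- create.bspline.basis(c(0,365), nbasis=55)
tempfd <- smooth.basis(day.5, CanadianWeather$dailyAv[,,"Temperature.C"], daybasis)$fd
pca.ct(tempfd)$var                             # variances of the continuous-time principal components
cor.ct(tempfd, common_trend=TRUE)[1:3, 1:3]    # detrended correlations among the first three stations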
A continuous-time version of canonical correlation analysis (CCA).
cca.ct(fdobj1, fdobj2)
fdobj1, fdobj2: a pair of continuous-time multivariate data sets, of class "fd"
A list consisting of
vex1, vex2: matrices defining the canonical variates. The first column of each gives the coefficients defining the first pair of canonical variates, and so on.
cor: canonical correlations, i.e., correlations between the pairs of canonical variates
Columns of the output matrix vex2 are flipped as needed to ensure positive correlations.
Biplab Paul <[email protected]> and Philip Tzvi Reiss <[email protected]>
cancor, for classical CCA
## Not run:
# CCA relating Canadian daily temperature and precipitation data
require(fda)
data(CanadianWeather)
daybasis <- create.bspline.basis(c(0,365), nbasis=80)
tempfd <- smooth.basis(day.5, CanadianWeather$dailyAv[,,"Temperature.C"], daybasis)$fd
precfd <- smooth.basis(day.5, CanadianWeather$dailyAv[,,"log10precip"], daybasis)$fd
tpcor <- cca.ct(tempfd, precfd)
par(mfrow=1:2)
barplot(tpcor$vex1[,1], horiz=TRUE, las=1, main="Temperature",
        sub="First canonical coefficients vector")
barplot(tpcor$vex2[,1], horiz=TRUE, las=1, main="Log precipitation",
        sub="First canonical coefficients vector")
## End(Not run)
Subtracts the (continuous-time) mean of each of the variables. This is analogous
to column-centering a data matrix.
center.ct(fdobj)
fdobj: continuous-time multivariate data set of class "fd"
A centered version of the input data.
Philip Tzvi Reiss <[email protected]>
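The sketch below (not from the package manual) centers the Canadian temperature curves used in the other examples; assuming the returned object is again of class "fd", it can be plotted with the fda package's plot method, and each centered curve should then have continuous-time mean approximately zero.
require(fda)
data(CanadianWeather)
daybasis <- create.bspline.basis(c(0,365), nbasis=55)
tempfd <- smooth.basis(day.5, CanadianWeather$dailyAv[,,"Temperature.C"], daybasis)$fd
ctempfd <- center.ct(tempfd)
plot(ctempfd)   # centered curves; each should average to roughly zero over the year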
Computes the correlation matrix of a continuous-time multivariate data set represented as an fd object, or the cross-correlation matrix of two such data sets.
cor.ct(fdobj1, fdobj2 = fdobj1, common_trend = FALSE)
fdobj1: continuous-time multivariate data set of class "fd"
fdobj2: an optional second data set
common_trend: logical: centering with respect to the mean function if TRUE; defaults to FALSE
A matrix of (cross-) correlations
Biplab Paul <[email protected]> and Philip Tzvi Reiss <[email protected]>
center.fd, for centering of "fd" objects; inprod.cent
## Not run:
# Canadian temperature data
require(fda)
require(corrplot)
data(CanadianWeather)
daybasis <- create.fourier.basis(c(0,365), nbasis=55)
tempfd <- smooth.basis(day.5, CanadianWeather$dailyAv[,,"Temperature.C"], daybasis)$fd

## The following yields a matrix of correlations that are all near 1:
rawcor <- cor.ct(tempfd)
corrplot(rawcor, method = 'square', type = 'lower', tl.col="black", tl.cex = 0.6)

## This occurs due to a strong seasonal trend that is common to all stations
## Removing this common trend leads to a more interesting result:
dtcor <- cor.ct(tempfd, common_trend = TRUE)
ord <- corrMatOrder(dtcor)
dtcord <- dtcor[ord,ord]
corrplot(dtcord, method = 'square', type = 'lower', tl.col="black", tl.cex = 0.6)
## End(Not run)
Computes the covariance matrix of a continuous-time multivariate data set represented as an fd object, or the cross-covariance matrix of two such data sets.
cov.ct(fdobj1, fdobj2 = fdobj1, common_trend = FALSE)
fdobj1: continuous-time multivariate data set of class "fd"
fdobj2: an optional second data set
common_trend: logical: centering with respect to the mean function if TRUE; defaults to FALSE
A matrix of (cross-) covariances
Biplab Paul <[email protected]> and Philip Tzvi Reiss <[email protected]>
# see example for cor.ct, which works similarly
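For concreteness, a minimal sketch (not from the package manual) of the three call patterns, using the same Canadian weather data as the cor.ct example:
require(fda)
data(CanadianWeather)
daybasis <- create.fourier.basis(c(0,365), nbasis=55)
tempfd <- smooth.basis(day.5, CanadianWeather$dailyAv[,,"Temperature.C"], daybasis)$fd
precfd <- smooth.basis(day.5, CanadianWeather$dailyAv[,,"log10precip"], daybasis)$fd
cov.ct(tempfd)                       # covariance matrix of the temperature curves
cov.ct(tempfd, precfd)               # cross-covariances of temperature with log precipitation
cov.ct(tempfd, common_trend=TRUE)    # covariances after removing the common seasonal trend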
Several methods of continuous-time multivariate analysis require a matrix of inner products of pairs of centered functions from a basis, such as a B-spline basis, or pairs consisting of one function from each of two bases. This function computes such matrices via 7-point Newton-Cotes integration, which is exact for cubic B-splines. For a Fourier basis with the inner product taken over the entire range, a simple closed form is used instead of integration.
inprod.cent(basis1, basis2 = basis1, rng = NULL)
basis1: basis object from the fda package
basis2: an optional second basis
rng: time range. By default, the entire range spanned by the basis, or the intersection of the ranges of the two bases.
Matrix of inner products of each pair of centered basis functions.
Biplab Paul <[email protected]> and Philip Tzvi Reiss <[email protected]>
create.bspline.basis from package fda, for the most commonly used basis object type.
## Not run:
require(fda)
bbasis6 <- create.bspline.basis(nbasis=6)
inprod.cent(bbasis6)
fbasis7 <- create.fourier.basis(nbasis=7)
inprod.cent(fbasis7)
## End(Not run)
A continuous-time version of k-means clustering in which each cluster is a time segment or set of time segments.
kmeans.ct( fdobj, k, common_trend = FALSE, init.pts = NULL, tol = 0.001, max.iter = 100 )
fdobj: continuous-time multivariate data set of class "fd"
k: number of clusters
common_trend: logical: Should the curves be centered with respect to the mean function? Defaults to FALSE.
init.pts: a set of k time points. The observations at these time points serve as initial values for the k means. Randomly generated if not supplied.
tol: convergence tolerance for the k means
max.iter: maximum number of iterations
Object of class "kmeans.ct
", a list consisting of
fdobj |
the supplied |
means |
means of the k clusters |
transitions |
transition points between segments |
cluster |
cluster memberships in the segments defined by the transitions |
size |
length of each cluster, i.e. sum of lengths of subintervals making up each cluster |
totisd |
total integrated sum of distances from the overall mean; this is the analogue of |
withinisd |
within-cluster integrated sum of distances, i.e. integrated sum of distances from each cluster mean |
tot.withinisd |
total within-cluster integrated sum of distances, i.e. |
betweenisd |
between-cluster integrated sum of distances, i.e. |
Biplab Paul <[email protected]> and Philip Tzvi Reiss <[email protected]>
## Not run:
require(fda)
data(CanadianWeather)
daybasis <- create.bspline.basis(c(0,365), nbasis=55)
tempfd <- smooth.basis(day.5, CanadianWeather$dailyAv[,,"Temperature.C"], daybasis)$fd
kmtemp3 <- kmeans.ct(tempfd, 3)
plot(kmtemp3, axes=FALSE)
axesIntervals(); box()
plot(silhouette.ct(kmtemp3), axes=FALSE)
axesIntervals(); box()
## End(Not run)
A continuous-time version of Fisher's LDA, in which segments of the time interval take the place of groups of observations.
lda.ct(fdobj, partition, part.names = NULL)
fdobj: continuous-time multivariate data set of class "fd"
partition: a priori break points dividing the time interval into segments
part.names: optional character vector of names for the segments
The means and scaling components of the output are similar to those of lda, but unlike that function, lda.ct performs only Fisher's LDA and cannot incorporate priors or perform classification.
Object of class "lda.ct
", a list consisting of
means |
means of the variables within each segment |
scaling |
matrix of coefficients defining the discriminants (as in |
values |
eigenvalues giving the ratios of between to within sums of squares |
partition |
the supplied |
fdobj |
linear discriminants represented as an |
nld |
number of linear discriminants |
Biplab Paul <[email protected]> and Philip Tzvi Reiss <[email protected]>
plot.lda.ct; lda, for the classical version
## see end of example in ?pca.ct
Given a basis object as defined in the fda package (see basisfd), this function simply computes the vector of means of the basis functions. Used internally.
meanbasis(basis, rng = NULL)
basis: a basis object of class "basisfd"
rng: time range. By default, the entire interval spanned by the basis. Must be left NULL for Fourier bases.
Vector of means of the basis functions
Biplab Paul <[email protected]> and Philip Tzvi Reiss <[email protected]>
require(fda)
bbasis6 <- create.bspline.basis(nbasis=6)
meanbasis(bbasis6)
meanbasis(bbasis6, c(.3,.6))
fbasis11 <- create.fourier.basis(nbasis=11)
meanbasis(fbasis11)
A continuous-time version of principal component analysis.
pca.ct(fdobj, cor = FALSE, common_trend = FALSE)
fdobj: continuous-time multivariate data set of class "fd"
cor: logical: use the correlation matrix if TRUE, the covariance matrix if FALSE (the default)
common_trend: logical: Should the curves be centered with respect to the mean function? Defaults to FALSE.
Returns a list including:
var: variances of the principal components.
loadings: the matrix of loadings (i.e., its columns are the eigenvectors of the continuous-time covariance).
scorefd: score functions.
Biplab Paul <[email protected]> and Philip Tzvi Reiss <[email protected]>
cov.ct; princomp, for the classical version
## Not run:
# Data for one session from a classic EEG data set
require(fda)
require(eegkit)
data(eegdata)
data(eegcoord)
longdat <- subset(eegdata, subject=="co2a0000369" & trial==0)
widedat <- reshape(longdat, direction="wide",
                   drop=c("subject","group","condition","trial"),
                   v.names="voltage", idvar="channel")

# Convert time series for 64 channels to a functional data object
bsb <- create.bspline.basis(c(0,255), nbasis=30)
fdo <- Data2fd(argvals=0:255, y=t(as.matrix(widedat[,-1])), basisobj=bsb)
plot(fdo)

# Now do PCA and display first loadings for 3 PC's,
# along with percent variance explained by each
pcc <- pca.ct(fdo)
pve <- 100*pcc$var/sum(pcc$var)
par(mfrow=c(1,3))
cidx <- match(widedat[,1], rownames(eegcoord))
eegspace(eegcoord[cidx,4:5], pcc$loadings[,1], colorlab="PC1 loadings",
         main=paste0(round(pve[1],0), "%"), mar=c(17,3,12,2), cex.main=2)
eegspace(eegcoord[cidx,4:5], pcc$loadings[,2], colorlab="PC2 loadings",
         main=paste0(round(pve[2],0), "%"), mar=c(17,3,12,2), cex.main=2)
eegspace(eegcoord[cidx,4:5], pcc$loadings[,3], colorlab="PC3 loadings",
         main=paste0(round(pve[3],0), "%"), mar=c(17,3,12,2), cex.main=2)

# Linear discriminant analysis: discriminating among the 1st, 2nd and 3rd portions
# of the time interval
ld <- lda.ct(fdo, c(85,170))
plot(ld)
eegspace(eegcoord[cidx,4:5], ld$scaling[,1], colorlab="LD1 coefficients",
         mar=c(17,3,12,2), cex.main=2)
eegspace(eegcoord[cidx,4:5], ld$scaling[,2], colorlab="LD2 coefficients",
         mar=c(17,3,12,2), cex.main=2)
## End(Not run)
Plots a continuous-time k-means clustering object generated by a call to kmeans.ct.
## S3 method for class 'kmeans.ct'
plot(x, plottype = "functions", mark.transitions = TRUE, col = NULL, lty = NULL,
     xlab = "Time", ylab = NULL, legend = TRUE, ncol.legend = 1, cex.legend = 1, ...)
x: clustering object produced by kmeans.ct
plottype: type of plot to produce; defaults to "functions"
mark.transitions: logical: Should transitions between clusters be marked with vertical lines? Defaults to TRUE.
col: plot colors
lty: line type
xlab, ylab: x- and y-axis labels
legend: either a logical value (whether a legend should be included) or a character vector to appear in the legend. Defaults to TRUE.
ncol.legend: number of columns for the legend
cex.legend: character expansion factor for the legend
...: other arguments passed to the underlying plotting function
None; a plot is generated.
Philip Tzvi Reiss <[email protected]> and Biplab Paul <[email protected]>
kmeans.ct, which includes an example
Plots the Fisher linear discriminant functions generated by a call to lda.ct.
## S3 method for class 'lda.ct'
plot(x, ylab = "Discriminants", xlab = "Time", which = NULL, col = NULL, ...)
x: linear discriminant analysis object produced by lda.ct
ylab, xlab: y- and x-axis labels
which: which of the linear discriminants to plot
col: color vector
...: other arguments passed to the underlying plotting function
None; a plot is generated.
Biplab Paul <[email protected]> and Philip Tzvi Reiss <[email protected]>
## see the example at the end of ?pca.ct
Plots the silhouette index, generated by a call to silhouette.ct, for a continuous-time k-means clustering object.
## S3 method for class 'silhouette.ct'
plot(x, mark.transitions = TRUE, xlab = "Time", ylab = "Silhouette", ...)
x: silhouette object produced by silhouette.ct
mark.transitions: logical: Should transitions between clusters be marked with vertical lines? Defaults to TRUE.
xlab, ylab: x- and y-axis labels
...: other arguments passed to the underlying plotting function
None; a plot is generated.
Philip Tzvi Reiss <[email protected]>
kmeans.ct, which includes an example; silhouette.ct
Computes the silhouette index, at a grid of time points, for a continuous-time k-means clustering object produced by kmeans.ct.
silhouette.ct(kmobj, ngrid = 5000)
kmobj: continuous-time k-means clustering from kmeans.ct
ngrid: number of equally spaced grid points at which to compute the silhouette index
Object of class "silhouette.ct
", a list consisting of
grid |
grid of |
value |
silhouette index at each point along the grid |
transitions |
transition points between segments |
cluster |
cluster memberships in the segments defined by the transitions |
mean |
mean silhouette index |
An error is issued if the grid of time points contains one or more of the cluster transition points. This should not ordinarily occur, but if it does, it can be remedied by modifying ngrid.
Philip Tzvi Reiss <[email protected]>
kmeans.ct, which includes an example; plot.silhouette.ct
Subtracts the (continuous-time) mean and divides by the (continuous-time) standard deviation of each of the variables. This is the continuous-time analogue of taking a data matrix, subtracting the mean of each column, and dividing by the standard deviation of each column, as is done by scale(..., center=TRUE, scale=TRUE).
standardize.ct(fdobj)
fdobj: continuous-time multivariate data set of class "fd"
A standardized (centered and scaled) version of the input data.
Philip Tzvi Reiss <[email protected]>
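A minimal sketch (not from the package manual): after standardizing, each variable should have continuous-time mean 0 and standard deviation 1, so the covariance and correlation matrices of the result should coincide up to numerical error.
require(fda)
data(CanadianWeather)
daybasis <- create.bspline.basis(c(0,365), nbasis=55)
tempfd <- smooth.basis(day.5, CanadianWeather$dailyAv[,,"Temperature.C"], daybasis)$fd
stdfd <- standardize.ct(tempfd)
all.equal(cov.ct(stdfd), cor.ct(stdfd))   # expected to be TRUE, up to numerical tolerance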