|
The Department of Statistics at Indiana University, Bloomington, will host the third annual "Stat
Day" on April 25, 2011. In 2009 it was hosted by Purdue University. In 2010, it was held at IUPUI.
Stat Day is an informal gathering of statisticians and biostatisticians from universities throughout Indiana. Its goal is to bring together statisticians with
varied research interests as a way to facilitate the sharing of research ideas and the forming of
collaborations. Participants share their research through presentations and informal group
discussions. There are many excellent speakers on this year's schedule. The final talk, by Dr.
Karen Kafadar, will chronicle her work on the FBI investigation of the 2001 anthrax mailings.
Please join us for Stat Day 2011 at IU Bloomington:
April 25, 2011, at the Indiana University Memorial Union (IMU), Dogwood Room, from 10:00am to 4:15pm |
Speakers:
Jeesun Jung, Karen Kafadar, Chuanhai Liu, Hanxiang Peng, Predrag Radivojac, Guilherme Rocha, Michael Trosset, William Wyatt & Tonglin Zhang
Schedule:
10:00-10:30 : Guilherme Rocha
10:30-11:00 : Chuanhai Liu
11:00-11:30 : Hanxiang Peng
11:30-12:00 : Tonglin Zhang
12:00-1:00 : LUNCH BREAK
(vouchers provided for use in IMU)
1:00-1:30 : Predrag Radivojac
1:30-2:00 : William Wyatt
2:00-2:30 : Michael Trosset
2:30-3:00 : Jeesun Jung
3:00-3:15 : BREAK
3:15-4:15 : Karen Kafadar
| Guilherme Rocha, IU Bloomington :: 10:00 - 10:30 |
| Monitoring Civil Structures Using Restricted Autoregressive Models and Wireless Sensor Networks |
Wireless Sensor Networks (WSNs) are a promising technology to detect changes in the state of a structure by monitoring its features such as its natural vibration properties. The natural vibration properties of the structure can be estimated using a multivariate autoregressive model (AR model) of its measured response to ambient vibrations. Fitting a multivariate AR model to the observed data requires the computation of the lagged covariance between measurements in all nodes. The resulting volume of data transmission causes significant latency due to the low data bandwidth of WSNs in addition to having a high transmission energy cost. In this talk, a set of physically motivated restrictions to the estimation of the AR model is presented. Such restrictions significantly reduce the volume of data flowing through the WSN thus reducing the latency in obtaining modal parameters and extending the battery lifetime of the WSN. The stabilization plots for the restricted and full AR models fitted using data simulated from linear structures are compared. Data collected from a WSN deployed on the Golden Gate Bridge are used to compare the stabilization plots and the estimated modes using the restricted and full AR models. The comparisons show that the restricted form of the AR leads to estimates of the modal parameters of comparable quality to that of the full AR model while substantially reducing the volume of transmitted data.
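As a rough illustration of how lagged covariances lead to AR coefficients and then to a vibration frequency, here is a minimal sketch for a single sensor. It is a simplification and not the speaker's method: it fits a univariate AR(2) model by the Yule-Walker equations, whereas the talk concerns multivariate AR models with restrictions that are not specified in this abstract. The function names, the 50 Hz sampling rate, and the simulated 2 Hz mode are all assumptions made for the example.

```python
import cmath
import math
import random


def autocovariance(x, lag):
    """Sample autocovariance of the sequence x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    return sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag)) / n


def fit_ar2(x):
    """Yule-Walker estimates (a1, a2) for x[t] = a1*x[t-1] + a2*x[t-2] + noise."""
    c0, c1, c2 = (autocovariance(x, lag) for lag in range(3))
    # Solve [[c0, c1], [c1, c0]] @ [a1, a2] = [c1, c2] by Cramer's rule.
    det = c0 * c0 - c1 * c1
    a1 = c1 * (c0 - c2) / det
    a2 = (c0 * c2 - c1 * c1) / det
    return a1, a2


def pole_frequency(a1, a2, fs):
    """Frequency in Hz of the pole pair of z**2 - a1*z - a2 = 0 at sampling rate fs."""
    pole = (a1 + cmath.sqrt(a1 * a1 + 4.0 * a2)) / 2.0
    return abs(cmath.phase(pole)) * fs / (2.0 * math.pi)


def simulate_ar2(a1, a2, n, seed=0):
    """Simulate n samples of an AR(2) process driven by unit-variance Gaussian noise."""
    rng = random.Random(seed)
    x = [0.0, 0.0]
    for _ in range(n):
        x.append(a1 * x[-1] + a2 * x[-2] + rng.gauss(0.0, 1.0))
    return x[2:]


# Hypothetical example: a lightly damped 2 Hz mode sampled at 50 Hz.
fs, f0, r = 50.0, 2.0, 0.98
a1_true = 2.0 * r * math.cos(2.0 * math.pi * f0 / fs)
a2_true = -r * r
x = simulate_ar2(a1_true, a2_true, n=5000)
a1_hat, a2_hat = fit_ar2(x)
print(pole_frequency(a1_hat, a2_hat, fs))
```

In the multivariate case the scalar autocovariances become cross-covariance matrices between every pair of nodes, which is where the data transmission cost described above comes from.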
|
| Chuanhai Liu, Purdue :: 10:30 - 11:00 |
| Exact Probabilistic Inference and its Applications in Medical Statistics |
Valid, prior-free, and situation-specific probabilistic inference is desirable for serious uncertain inference, especially in medical statistics. This talk introduces such an inferential framework, called the Inferential Model (IM) framework, recently proposed by our research team. IMs are prior-free and produce inferential results that are probabilistic and have desired exact frequency properties. The talk illustrates the IM framework and demonstrates its potential applications in medical statistics with a collection of benchmark examples, including (i) 2x2 tables and Simpson's paradox, (ii) selective inference for partially selective sampling, (iii) a Poisson model with a constrained parameter, and (iv) a many-normal-means problem in meta-analysis. It concludes with a few remarks on further applications of IMs in modern, large-scale, and challenging statistical problems, such as Stein's paradox, the Behrens-Fisher problem, multiple hypothesis testing, and variable selection in linear regression.
|
| Hanxiang Peng, IUPUI :: 11:00 - 11:30 |
| An Empirical Likelihood Approach To Goodness of Fit Testing |
Motivated by applications to goodness of fit testing, the empirical likelihood approach is generalized to allow for the number of constraints to grow with the sample size and for the constraints to use estimated criteria functions. The latter is needed to handle naturally occurring nuisance parameters. A central limit theorem is proved to deal with quadratic forms based on random vectors of increasing dimensions. This result is needed to prove the appropriate Wilks theorems. The proposed empirical likelihood based goodness of fit tests are asymptotically distribution free. For univariate observations, tests for a specified distribution, for a distribution of parametric form, and for a symmetric distribution are presented. For bivariate observations, tests for independence, for spherical symmetry, and for equal marginals are developed.
|
| Tonglin Zhang, Purdue :: 11:30 - 12:00 |
| Investigating the Net Primary Production of Chinese Forest Ecosystem with Spatial Statistical Methods |
Forest ecosystems play an important role in the global carbon cycle, and quantification of forest net primary production (NPP) in a spatial context remains an important challenge at landscape, regional and continental scales. Previous studies have revealed that spatial correlation is present in forest NPP distributions. By using spatial statistical methods, this study first investigated the local relationship between Chinese forest NPP density and many climate variables, and then derived a prediction of the national forest NPP total. The results showed that the geostatistical modeling method made significantly better predictions than the multiple regression method, and that it was more robust than the remote sensing or process-based methods.
|
| Predrag Radivojac, IU Bloomington :: 1:00 - 1:30 |
| Towards Predicting Molecular Cause of Disease from Amino Acid Substitutions |
Advances in high-throughput genotyping and next generation sequencing have generated vast amounts of human genetic variation data. Single nucleotide substitutions within protein coding regions are of particular importance due to their potential to give rise to amino acid substitutions that affect protein structure and function, which may ultimately lead to disease. Over the last decade, a number of computational methods have been developed to predict whether such amino acid substitutions result in an altered phenotype, but these methods are not well suited to providing probabilistic estimates of the underlying disease mechanism. In this talk I will present our (supervised, kernel and non-kernel) methods for predicting functionally important amino acid substitutions from protein sequence and structure. I will argue that the molecular cause of disease can be confidently predicted in about 10% of currently available disease-associated mutations and that some hints on the molecular mechanisms can be obtained in as many as 50% of mutations. I will discuss both algorithmic issues and significant differences in the patterns of amino acid substitutions between inherited disease, somatic disease and putatively neutral polymorphisms.
|
| William Wyatt, IU Bloomington :: 1:30 - 2:00 |
| Impact of Blocking Factors on the Interpretation of Motor Learning Data |
In many fields, repeated experimentation is utilized to evaluate, support and/or refute existing theories. While conclusions are frequently drawn based on conflicting experimental paradigms, little attention is given to the analysis design. Using examples from the motor learning and feedback literature, the impact of block designs, specifically variation in trial number, block size, and analysis strategy, will be explored. It will be shown that without a consistent design and analysis strategy, conclusions drawn from contradicting studies should be compared with caution. With this in mind, a clear relationship between design, statistical power, bias, and variance will be demonstrated.
|
| Michael Trosset, IU Bloomington :: 2:00 - 2:30 |
| Quasi-Newton Methods for Simulation-Based Parameter Estimation |
We describe two variants of an algorithm for stochastic optimization, i.e., optimization in which evaluation of the objective function is corrupted by chance variation. In this framework, each attempted function evaluation f(x) is drawn from a probability distribution P(x). A typical application is optimizing the parameters of a stochastic simulation, e.g., estimating the parameters of an analytically intractable stochastic process by minimizing a measure of discrepancy between simulated samples and an observed sample. The proposed algorithm synthesizes ideas from response surface methodology (local quadratic approximations of the objective function constructed from designed experiments by regression, confidence sets for constrained minimizers of quadratic functions, ridge analysis) and numerical optimization (trust region methods, secant updates).
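To make the combination of designed experiments, local quadratic regression, and trust regions more concrete, here is a minimal one-dimensional sketch. It is not the algorithm presented in the talk: it uses a fixed trust radius and omits secant updates, confidence sets, and ridge analysis. The test objective `noisy_objective`, the function names, and all tuning values (`h`, `radius`, `reps`) are assumptions chosen for illustration.

```python
import random


def noisy_objective(x, rng, noise_sd=0.2):
    """Hypothetical stochastic objective: (x - 3)^2 observed with Gaussian noise."""
    return (x - 3.0) ** 2 + rng.gauss(0.0, noise_sd)


def quadratic_step(f, x, h, radius, reps, rng):
    """One iteration: designed experiment, quadratic fit, trust-region step."""
    # Replicated three-level design centred at x, with offsets s in {-h, 0, +h}.
    means = []
    for s in (-h, 0.0, h):
        means.append(sum(f(x + s, rng) for _ in range(reps)) / reps)
    m_minus, m_zero, m_plus = means
    # With three design levels and equal replication, the least-squares fit
    # of q(s) = c + b*s + a*s^2 passes through the three level means.
    b = (m_plus - m_minus) / (2.0 * h)
    a = (m_plus + m_minus - 2.0 * m_zero) / (2.0 * h * h)
    if a > 0.0:
        # Convex fit: move to its minimizer, clipped to the trust region.
        step = max(-radius, min(radius, -b / (2.0 * a)))
    else:
        # Non-convex fit: move to the trust-region boundary in the descent direction.
        step = -radius if b > 0.0 else radius
    return x + step


def minimize(f, x0, iters=20, h=1.0, radius=1.0, reps=20, seed=0):
    """Repeat the local quadratic step from x0 with a fixed trust radius."""
    rng = random.Random(seed)
    x = x0
    for _ in range(iters):
        x = quadratic_step(f, x, h, radius, reps, rng)
    return x


print(minimize(noisy_objective, 0.0))
```

Replicating each design point averages out part of the chance variation before the quadratic is fitted, which is the main difference from applying a deterministic trust-region method directly to noisy evaluations.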
|
| Jeesun Jung, IUPUI :: 2:30 - 3:00 |
| Identification of Multiple Rare Variants Associated with a Disease |
Identification of rare variants responsible for complex disease has been promoted by advances in sequencing technologies. However, statistical methods that can handle the vast amount of data generated and interpret the complicated relationship between disease and these variants have lagged behind. In this paper, we applied a novel statistical approach, a zero-inflated Poisson regression model that accounts for the excess of zeros caused by the extremely low variant frequencies, to 24,487 exonic variants distributed by the Genetic Analysis Workshop 17. The 697 subjects provided were grouped as Europeans, Asians and Africans based on principal component analysis, and the total number of rare variants per gene was found for each individual. These collapsed variants were analyzed based on the assumption that rare variants are enriched in a group of people affected by a disease as compared to a group of unaffected people. Within the same framework, we tested the hypothesis with the given quantitative traits Q1, Q2 and Q4. Analyses performed on the combined 697 individuals and on each separate ethnic group yielded quite different results in both the full exonic SNP dataset and the nsSNP-only dataset. For the combined population analysis, UGT1A1 was found to be associated with disease liability, while FLT1 was associated with Q1. In comparison with the simulation model, our results confirmed that FLT1 and KDR were associated with Q1, while VNN1 was correlated with Q2. No significant genes were found to be associated with Q4. These results show the feasibility and capability of our new statistical model to detect multiple rare variants influencing disease risk.
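For readers unfamiliar with zero-inflated Poisson (ZIP) models, the following minimal sketch shows the distribution and an intercept-only fit by the EM algorithm. It is an illustration under simplifying assumptions and not the analysis described above: there are no covariates or regression terms, and the counts in the usage lines are made up rather than taken from the Genetic Analysis Workshop 17 data. The function names are hypothetical.

```python
import math


def zip_pmf(k, lam, pi):
    """P(Y = k) when Y is a structural zero with probability pi, else Poisson(lam)."""
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    return pi * (k == 0) + (1.0 - pi) * poisson


def zip_log_likelihood(counts, lam, pi):
    """Log-likelihood of independent counts under the ZIP model."""
    return sum(math.log(zip_pmf(k, lam, pi)) for k in counts)


def fit_zip(counts, iters=500):
    """EM estimates (lam, pi); assumes at least one count is nonzero."""
    n = len(counts)
    n_zero = sum(1 for k in counts if k == 0)
    total = sum(counts)
    lam, pi = total / n, 0.5  # starting values
    for _ in range(iters):
        # E-step: probability that an observed zero is a structural zero.
        w = pi / (pi + (1.0 - pi) * math.exp(-lam))
        expected_structural = n_zero * w
        # M-step: update the mixing weight and the Poisson mean.
        pi = expected_structural / n
        lam = total / (n - expected_structural)
    return lam, pi


# Made-up per-gene rare-variant counts with many zeros.
counts = [0] * 70 + [1] * 10 + [2] * 12 + [3] * 8
lam_hat, pi_hat = fit_zip(counts)
print(lam_hat, pi_hat)
```

The extra parameter `pi` absorbs zeros beyond what a plain Poisson model with the same mean would predict, which is the situation that arises when most individuals carry no rare variant in a given gene.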
|
| Karen Kafadar, IU Bloomington :: 3:15 - 4:15 |
| Statistical Analysis of Data from the FBI's Scientific Investigation of the 2001 Bacillus anthracis (Anthrax) Mailings |
On February 15, a Committee of the National Academy of Sciences released its report on the scientific approaches used in the investigation into the origins of the anthrax found in letters mailed to New York City and Washington, D.C., in October 2001.
Findings in the report included:
(1) the available scientific evidence alone was insufficient to reach a definitive conclusion;
(2) the letters included small amounts of silicon but no evidence that it was added as a dispersant for added weaponization;
(3) spores in the letters and in RMR-1029, a flask found at the U.S. Army Medical Research Institute of Infectious Diseases (USAMRIID), share a number of genetic similarities, which could arise in several ways;
(4) RMR-1029 was not the immediate "parent material" for spores used in the letters.
This talk will discuss the data made available to the Committee that was used in the statistical analyses which led to these findings, focusing primarily on finding (3).
The press release and full report can be found at http://www.nationalacademies.org/onpinews/newsitem.aspx?RecordID=13098
|
|
|