Event Archives
2019 Events
Tuesday, September 17, 2019 Paul Avillach, PhD, MD, Assistant Professor of Biomedical Informatics in the Department of Biomedical Informatics, Harvard Medical School
"Creating FAIR Computational Tools for the Nationally-Scaled Conduct of Biomedical Research"
The real value in biomedical research lies not in the scale of any single source of data, but in the ability to integrate and interrogate multiple, complementary datasets simultaneously. Our investigative focus is in translational bioinformatics, specifically in integrating multiple heterogeneous sources of clinical and genomics data in a meaningful way.
Tuesday, May 21, 2019 Ankur Pandya, PhD, Assistant Professor of Health Decision Science in the Department of Health Policy and Management at the Harvard T.H. Chan School of Public Health
"Modeling the Cost Effectiveness of Two Big League Pay-for-Performance Policies"
To date, evidence on pay-for-performance has been mixed. When pay-for-performance policies improve health outcomes, researchers should evaluate whether these health gains are worth the incremental costs (financial incentives and increased utilization) needed to achieve them.
Tuesday, May 7, 2019 Ludovic Trinquart, PhD, Assistant Professor of Biostatistics at the Boston University School of Public Health
"Restricted Mean Survival Times to Improve Communication of Evidence from Clinical Trials and Meta-Analyses?"
The hazard ratio (HR) has become the most commonly reported effect size measure for censored outcome data in clinical trials. In this talk, I will show how the choice of measure in clinical trial reports can influence the take-home message.
Tuesday, April 23, 2019 Suzanne G. Leveille, PhD, UMass Boston
"Generating Evidence Using Population-Based Methods: Shedding New Light on the Age-Old Problem of Chronic Pain in Older Adults"
Chronic musculoskeletal pain is so highly prevalent in old age that in some ways it has been taken for granted, attributed to the inevitable arthritis that accompanies aging. In recent years, studies have begun to examine the functional burden of arthritis and pain symptoms in older populations, often focused on selected common sites of pain.
Tuesday, April 16, 2019 Michael L. Barnett, MD, MS, Assistant Professor, Health Policy and Management, Harvard T.H. Chan School of Public Health
"Bundled Payments for Joint Replacement in Medicare: The Future of Payment Reform?"
In April 2016, Medicare implemented Comprehensive Care for Joint Replacement (CJR), a mandatory bundled payment model for inpatient lower extremity joint replacement (LEJR) of the hip and knee in Medicare beneficiaries.
Tuesday, April 2, 2019 Lane Harrison, PhD, Assistant Professor in the Department of Computer Science at Worcester Polytechnic Institute
"Model-based Approaches for Data Visualization Ability Assessment"
Data visualization is making strides in delivering new tools, techniques, and systems to analysts engaged in data analysis and communication. But providing more options leads to a paradox of choice: how do creators of data visualizations navigate the tradeoffs and uncertainty between available designs and techniques?
Tuesday, March 19th, 2019 Chi Hyun Lee, PhD., Assistant Professor, Department of Biostatistics and Epidemiology, School of Public Health and Health Sciences, University of Massachusetts Amherst
"Analysis of Restricted Mean Survival Time for Length-Biased Data"
In clinical studies with time-to-event outcomes, the restricted mean survival time (RMST) has attracted substantial attention as a summary measurement for its straightforward clinical interpretation. When the data are subject to length-biased sampling, which is frequently encountered in observational cohort studies, existing methods to estimate the RMST are not applicable.
Tuesday, March 5th, 2019 Tsung-Heng Tsai, PhD., Postdoctoral Associate in the lab of Olga Vitek at Northeastern University
“Statistical Methods for Reproducible Quantitative MS-based Proteomics”
Statistical methodology is key for reproducible research. This is especially true of quantitative mass spectrometry-based proteomic experiments, which must overcome many sources of bias and unwanted variation.
Tuesday, January 15th, 2019 Jamie Ostroff, PhD., Chief, Behavioral Sciences Services and Vice Chair for Research. Director, Tobacco Treatment Program. Department of Psychiatry & Behavioral Sciences
“Testing the Effectiveness of Implementation Strategies to Improve Adherence to Tobacco Treatment Guidelines in Public Dental Health Clinics”
Background: Despite American Dental Association recommendations, national surveys demonstrate that tobacco use assessment and treatment (TUT) has not been integrated into routine dental care.
2018 Events
Tuesday, December 4th, 2018 Dr. Juliana Cohen, PhD., Department of Public Health and Nutrition at Merrimack College and Department of Nutrition at the Harvard T.H. Chan School of Public Health
“Research and Methodology Assessing Children's Diets”
Dr. Juliana Cohen will discuss her research examining innovative techniques to improve children’s diets and the methodology used to assess diet in this population. The interventions discussed will include nutrition policies (including both the Healthy, Hunger-Free Kids Act and Massachusetts policies), the impact of cafeteria-based interventions and choice architecture, and marketing in fast food restaurants. The dietary assessments discussed will include plate waste and 24-hour recalls.
Tuesday, November 6, 2018 Dr. Paul Avillach MD, PhD., Assistant Professor of Biomedical Informatics, Harvard Medical School
“Creating FAIR Computational Tools for the Nationally-Scaled Conduct of Biomedical Research”
Paul Avillach is Assistant Professor of Biomedical Informatics in the Department of Biomedical Informatics at Harvard Medical School. He holds secondary appointments in the Department of Pediatrics at Boston Children's Hospital and in the Department of Epidemiology at the Harvard T.H. Chan School of Public Health. He holds an MD in public health and epidemiology and a PhD in biomedical informatics. His research focuses on the development of novel methods and techniques for the integration of multiple heterogeneous clinical cohorts, electronic health records data, and multiple types of genomics data to encompass biological observations. He is PI and Co-Investigator on several large projects at the Department of Biomedical Informatics at Harvard Medical School, including the BD2K PIC-SURE Center of Excellence, the Global Rare Diseases Registry project, the PCORI ARCH project, the PCORI Phelan-McDermid Syndrome project, the NIH Undiagnosed Diseases Network (UDN) coordinating center, the NIH Data Commons, and the NHLBI Data Stage project. All of these projects run in production on secured cloud environments (HIPAA and/or FISMA moderate compliant).
Tuesday, October 16, 2018 Dr. Junwei Lu, PhD., Assistant Professor of Biostatistics, Harvard T.H. Chan School of Public Health
“Combinatorial Inference for Brain Imaging Datasets”
We propose combinatorial inference to explore the global topological structures of graphical models. In particular, we conduct hypothesis tests on many combinatorial graph properties, including connectivity, hub detection, perfect matching, etc. Our methods can be applied to any graph property that is invariant under the deletion of edges. We also develop a generic minimax lower bound that shows the optimality of the proposed method for a large family of graph properties. Our methods are applied to neuroscience to discover hub voxels contributing to visual memories.
Tuesday, October 2, 2018 Dr. Kunal Mankodiya, M.Sc., PhD., University of Rhode Island
“Smart Wearable Systems for Clinical Interventions”
In this talk, Dr. Mankodiya will present Wearable IOT, a unique framework that establishes human-centered interconnections with wearable sensors, smart textiles, and data analytics, which are key elements for the future success of IOT in healthcare practices. He will demonstrate some of his ongoing (federally-funded) projects involving smartwatches and smart textiles that are designed to intervene remotely with patients suffering from neuropsychiatric disorders such as Parkinson's, post-traumatic stress disorder, and autism. He will also touch upon the emerging paradigm of modern IOT concepts. Dr. Mankodiya will also discuss his newly-developed courses and hackathons for undergraduate and graduate students, which nurture entrepreneurial and design thinking skills at the intersection of IOT and healthcare.
Tuesday, September 8, 2018 Dr. Donghui Yan, University of Massachusetts Dartmouth 
“Random Projection Forests”
In this talk, I will introduce Random Projection Forests (RPF). RPF is an ensemble of random projection trees grown by successive node splits along randomly generated directions, which preserves the locality of data points. This complements the previous development of an unsupervised extension to Random Forests, Cluster Forests (Yan, Chen and Jordan 2013), which aims at clustering by random feature pursuits. We discuss two applications of RPF: fast k-nearest neighbors (kNN) search and the scoring of tissue microarray images.
Tuesday, September 4, 2018 Dr. Firas Khatib, PhD, University of Massachusetts Dartmouth 
“Scientific Discoveries By Protein-Folding Game Players”
Can the brainpower of humans worldwide be brought to bear on critical problems posed in computational biology, such as the structure determination of proteins and designing novel enzymes? Yes! Citizen scientists—most of whom have little or no prior biochemistry experience—have uncovered knowledge that eluded scientists for years. Players of the online protein folding video game Foldit have contributed to several scientific discoveries through gameplay. Rather than solving problems with a purely computational approach, combining humans and computers can provide a means for solving problems neither could solve alone. 
Tuesday, May 1st, 2018, Dr. Annie Gjelsvik, PhD, Brown University School of Public Health
“Neighborhood Risk and Pediatric Asthma Hospital Use”
Asthma is one of the most common chronic conditions of childhood. Stressors related to the neighborhood context have been shown to exacerbate asthma symptoms in children. Children who are exposed to higher levels of neighborhood risks are at increased risk for more health care utilization. By linking data from a RI hospital network that provides about two-thirds of pediatric emergency department (ED) and 90% of inpatient services in the state with Census data, this study aims to assess the association between neighborhood risks and pediatric asthma hospital use.
Tuesday, April 17, 2018, Dr. Maoyuan Sun, PhD, University of Massachusetts Dartmouth
“Exploring a Relationship Space: Researching through Visualizations”
We are living in a connected world, where more and more information and everyday objects (e.g., photos, videos, pedometers, home appliances, and automobiles) are gathering into a worldwide digital network. This connected planet offers us unprecedented opportunities to step into the future based on today's understanding. However, how can we make sense of this web of the world and navigate it? I investigate this question by exploring a relationship space through researching usable visualizations. In this talk, I will go through multiple designs that helped form my current view of the space, highlighting its 4S aspects: Schema, Structure, Strength, and Size. I will conclude with a brief overview of design implications drawn from this in-progress relationship space, and outline future research opportunities around it.
Tuesday, April 3, 2018, Dr. Michael Sugarman, PhD, Bedford VA Medical Center
“The Placebo Effect in Clinical Trials and Health Research”
Large placebo responses in clinical trials are often viewed as a “nuisance,” and they have been referred to as “unexpected” and “problematic” responses that “mask” the true effectiveness of drugs in clinical trials. But just what are placebo effects, and what might cause their presence in clinical trials? This talk will cover common mechanisms of placebo effects in clinical trials, conditions in which large placebo effects have occurred, and relevant implications and recommendations for clinical research and clinical practice.
Tuesday, March 20, 2018, Dr. Pedro L. Gonzalo, PhD, Brown University School of Public Health
“Cross-Temporal Matching: A Quasi-Experimental Design Method”
While Randomized Controlled Trials (RCTs) still represent the gold standard for many clinical questions, they are not without shortcomings and may be hard to implement in many applications of interest. Quasi-experimental observational methods provide a viable alternative to RCTs in many situations where the RCT approach is too costly or not feasible. This talk will present a new quasi-experimental approach that utilizes a cross-temporal matching framework to carry out causal treatment effect estimation in the presence of confounders. The method takes advantage of the natural experiment created when a given treatment experiences a rapid growth of dissemination over a relatively short period of time. The method will be illustrated with several clinical examples, including the study of the effect of hospitalists in the care of hospitalized patients, and the study of the potential savings effect of hospice on Medicare expenditures.
Tuesday, March 6, 2018, Dr. Emmanuelle Belanger, PhD, Brown University
“Measurement Validity of the Patient Health Questionnaire 9 in US Nursing Home Residents”
The introduction of the Patient Health Questionnaire 9 (PHQ-9) as a measure of depression in the Minimum Dataset 3.0 (MDS) provides an unprecedented opportunity to examine depressive symptomatology in US nursing homes. In this presentation, Dr. Belanger will discuss the latest results of her work validating both the self-reported and observer versions of the PHQ-9 in a national sample of newly admitted and long-stay nursing home residents. This work draws on the 2012 COnsensus-based Standards for the selection of health measurement INstruments (COSMIN) to ensure rigor in assessments of measurement properties, with a focus on internal consistency, construct validity, and criterion validity.
Tuesday, February 20, 2018, Dr. Laura Balzer, PhD, UMass Amherst
“Targeted Learning to evaluate the effects of community-based interventions: the SEARCH trial & HIV prevention in East Africa”
Evaluation of cluster-based interventions presents significant methodological challenges. In this talk, we describe the design and analysis of the SEARCH trial, an ongoing community randomized trial to evaluate the impact of early HIV diagnosis and immediate treatment with streamlined care in rural Uganda and Kenya. We focus on 3 choices to optimize the design and analysis: causal parameter, pair-matching, and data-adaptive estimation. These choices are compared theoretically and with finite sample simulations. We demonstrate how each choice improves upon standard practice. We conclude with practical implications and some ongoing challenges.
Tuesday, February 6, 2018, Dr. Philip Thomas, UMass Amherst
“Safe Machine Learning”
Machine learning algorithms are everywhere, ranging from simple data analysis and pattern recognition tools used across the sciences to complex systems that achieve super-human performance on various tasks. Ensuring that they are safe—that they do not, for example, cause harm to humans or act in a racist or sexist way—is therefore not a hypothetical problem to be dealt with in the future, but a pressing one that we can and should address now.
In this talk I will discuss some of my recent efforts to develop safe machine learning algorithms, and particularly safe reinforcement learning algorithms, which can be responsibly applied to high-risk applications. I will focus on a specific research problem that is central to the design of safe reinforcement learning algorithms: accurately predicting how well a policy (controller) would perform if it were to be used, given data collected from the deployment of a different policy. Solutions to this problem provide a way to determine that a newly proposed policy would be dangerous to use without requiring the dangerous policy to ever actually be used.
Tuesday, January 30, 2018, Dr. Fang Zhang, Harvard Medical School
“Interrupted Time Series Design and Matching in Health Services Research”
Interrupted time series is a strong quasi-experimental research design that is increasingly applied to estimate the effects of health services and policy interventions. Basic methods and rationale will be introduced. Extensions of existing methods that incorporate various statistical matching/weighting techniques will be discussed.
Tuesday, January 16, 2018, Chanelle J. Howe, Brown University School of Public Health
“Causal Mediation Analyses When Studying HIV Racial/Ethnic and Other Health Disparities”
Reducing HIV racial/ethnic and other health disparities in the United States is a high priority. Reductions in HIV racial/ethnic and other health disparities can potentially be achieved by intervening on important intermediates. Causal mediation analysis techniques can be used to identify important intermediates of HIV racial/ethnic and other health disparities as well as estimate the impact of intervening on such intermediates when certain conditions are met. Using racial disparities in HIV virologic suppression as an example, this talk will: (1) describe a conceptual framework for studying HIV racial/ethnic and other health disparities; (2) review causal mediation analysis techniques that can be used for the aforementioned identification and estimation; (3) discuss why studies of HIV racial/ethnic and other health disparities can be particularly vulnerable to selection bias and detail potential approaches that can be used to minimize such selection bias; and (4) emphasize the importance of “good data” when performing causal mediation analyses in this setting.
2017 Events
Tuesday, December 5, 2017, Ming (Daniel) Shao, University of Massachusetts Dartmouth
“Low-Rank Transfer Learning and Its Applications”
For knowledge-based machine learning algorithms, labels or tags are critical in training the discriminative model. However, labeling data is not an easy task because these data are either too costly to obtain or too expensive to hand-label. For that reason, researchers use labeled, yet relevant, data from different databases to facilitate the learning process. This is exactly transfer learning, which studies how to transfer the knowledge gained from existing, well-established data (source) to a new problem (target). To this end, we propose a method to align the structure of the source and target data in the learned subspace by minimizing the reconstruction error, called low-rank transfer subspace learning (LTSL). The basic assumption is that if each datum in a specific neighborhood in the target domain can be reconstructed by the same neighborhood in the source domain, then the source and target data might have similar distributions. The benefits of this method are two-fold: (1) generality to subspace learning methods, (2) robustness by low-rank constraint. Extensive experiments on face recognition, kin relationship understanding, and object recognition demonstrate the effectiveness of our method. We will also discuss the potential of using low-rank modeling for other transfer learning related problems including clustering, dictionary learning, and zero-shot learning.
Tuesday, November 7, 2017, Yue Wu, Northeastern University
“Low-shot Learning and Large Scale Face Recognition”
Automatic face recognition in visual media is essential for many real-world applications, e.g., face verification, automatic photo library management, and biomedical analysis, along with many security applications. In this talk, I will first introduce the large scale face recognition problem, which suffers from an explosion of classes and data. Recognizing people at large scale naturally brings the low-shot learning problem, since many people have only a limited number of images available for training. This talk also covers how to address the low-shot learning problem. Along this line, some interesting applications, e.g., image-based kinship recognition, are also illustrated.
Yue Wu is a second-year Ph.D. candidate in the Department of Electrical & Computer Engineering at Northeastern University, supervised by Professor Yun Raymond Fu. He received his B.S. degree in Electronic Information Engineering and his M.S. degree in Information and Communication Engineering, both from Beijing University of Posts and Telecommunications, China. He was a research intern at France Telecom-Orange Lab Beijing during 2013-2014. His research is focused on face recognition, object recognition and deep learning, with the goal of making computers better understand faces and objects.
Tuesday, October 3, 2017, Brandon Marshall, PhD
"Responding to The Opioid Overdose Crisis: Insights from Rhode Island"
In this seminar, participants will learn about the state of the opioid overdose epidemic nationally and in Rhode Island. Prof. Brandon Marshall will discuss the state’s strategic plan to reduce overdose deaths, and will highlight the role that epidemiologists play in responding to the nation’s top public health crisis.
Brandon Marshall, PhD is an Associate Professor of Epidemiology at the Brown University School of Public Health. His research interests focus on infectious disease epidemiology, substance use, and the social, environmental, and structural determinants of health of vulnerable populations. He has published more than 125 scientific publications, including articles in JAMA, BMJ, and The Lancet. He works closely with the Rhode Island Department of Health on the state’s overdose epidemic efforts and directs www.PreventOverdoseRI.org, a CDC-funded statewide online surveillance system. He also chairs the Rhode Island Overdose Data Working Group and serves as an expert advisor to the Governor’s Overdose Prevention and Intervention Task Force.
Tuesday, September 19, 2017, Zhigang Li, PhD, Geisel School of Medicine at Dartmouth
“A Semiparametric Joint Model for Terminal Trend of Quality of Life and Survival in Palliative Care Research”
Dr. Li’s research interests include developing statistical modeling tools in the field of molecular epidemiology to analyze human microbiome and epigenetic changes in order to identify microbes and DNA methylation that mediate disease-leading causal pathways in children’s health research. He is also interested in developing joint modeling approaches to model longitudinal quality of life and survival data in palliative care research to answer important questions in this area. A broad range of modeling tools are involved in his research such as lasso-type regularization, SCAD regularization, mediation analysis, structural equation modeling, GEE, mixed models, Cox model, etc.
Tuesday, September 5, 2017, Jiang Gui, PhD, Geisel School of Medicine at Dartmouth
“Efficient Survival Multifactor Dimensionality Reduction Method for Detecting Gene-Gene Interaction”
The problem of identifying SNP-SNP interactions in case-control studies has been studied extensively and a number of new techniques have been developed. Little progress has been made, however, in the analysis of SNP-SNP interactions in relation to censored survival data. We present an extension of the two-class multifactor dimensionality reduction (MDR) algorithm that enables detection and characterization of epistatic SNP-SNP interactions in the context of survival outcome. The proposed Efficient Survival MDR (ES-MDR) method handles censored data by modifying MDR’s constructive induction algorithm to use the log-rank test.
We applied ES-MDR to genetic data of over 470,000 SNPs from the OncoArray Consortium. We used onset age of lung cancer and case-control status (n=27,312) as the survival outcome, and adjusted for each subject’s smoking status. We first used PLINK to generate a pruned subset of SNPs that are in approximate linkage equilibrium with each other. We then ran ES-MDR to exhaustively search over all one-way and two-way interaction models. We identified chr17_41196821_INDEL_T_D from the BRCA1 gene and rs11692723_C from the LOC102467079 gene as the top SNP-SNP interaction associated with lung cancer onset age.
Jiang Gui received his PhD in statistics from the University of California, Davis. His research involved the development of statistical and computational methods for relating high-dimensional microarray gene expression data to censored survival data. He is also interested in identifying gene-gene and gene-environment interactions using machine learning algorithms.
2015 Events
November 18, 2015, James Scanlan, Attorney at Law, Washington, DC
"The Mismeasure of Health Disparities in Massachusetts and Less Affluent Places"
Problems exist in the measurement of health and healthcare disparities arising from the failure to consider ways that standard measures of differences between outcome rates tend to be systematically affected by the prevalence of an outcome. Special attention will be given to Massachusetts in two respects. One involves the fact that relative demographic differences in adverse outcomes tend to be comparatively large, while relative differences in the corresponding favorable outcomes tend to be comparatively small, in geographic areas where adverse outcomes are comparatively uncommon. The second involves anomalies in the Massachusetts Medicaid pay-for-performance program arising from the use of a measure of healthcare disparities that is a function of absolute differences between rates.
Tuesday, November 3, 2015, Steven D. Pizer, PhD, Associate Professor of Health Economics, Department of Pharmacy and Health Systems Sciences, Northeastern University
“Big Data, Causal Inference, and Instrumental Variables”
“I plan to discuss the potential and risk related to analysis of big data in health services and clinical research, conduct a brief tutorial on causal inference in observational studies using instrumental variables, and then present some of my own work that puts these methods to use. The specific application uses prescribing pattern instrumental variables to study the comparative effectiveness of alternative 2nd-line medications for type 2 diabetes.”
Tuesday, October 20, 2015, Finale Doshi-Velez, PhD, Assistant Professor in Computer Science, Harvard University
“Data-Driven Phenotype Trajectories in Autism Spectrum Disorders”
Autism Spectrum Disorder (ASD) is an extremely heterogeneous developmental disorder that affects nearly one in fifty children today. Understanding the heterogeneity of ASD is critical to discovering distinct etiologies and guiding treatment. The proliferation of electronic health records (EHRs) has made it possible for data-driven disease subtyping in a variety of disorders, such as chronic kidney disease and diabetes. However, especially in developmental disorders, the presentation of the disease is inextricably linked to the development of the child: these stages do not necessarily mark disease progression but rather disease evolution. In these cases, it makes sense to think about pathophenotypes as phenotype trajectories that evolve over time. In this talk, I will describe several approaches for deriving phenotype trajectories from EHR, which resulted in the discovery of novel ASD phenotypes.
Tuesday, October 6, 2015, Norma Terrin, PhD, Professor, Tufts University School of Medicine
“Joint Models for Predicting Clinical Outcomes from Quality of Life Data”
Our objective was to test whether longitudinally measured health-related quality of life (HRQL) predicts transplant-related mortality in pediatric hematopoietic stem cell transplant (HSCT). A standard analysis (Cox model with a time-varying covariate) would not have been adequate because it ignores measurement error in the covariate, missing data in the covariate, correlation between measurement error and survival, and the endogeneity of the covariate. Instead we used a joint model, which analyzes both the longitudinal and time-to-event variables as outcomes. Specifically, we used a shared parameter model with other causes of mortality as a competing risk. The trajectories for each HRQL domain were modeled by random spline functions. The survival submodels were adjusted for baseline patient, family, and transplant characteristics and the longitudinal submodels were run with and without adjustment. We found that HRQL trajectories were predictive of transplant-related mortality in pediatric HSCT, even after adjusting the survival outcome for baseline characteristics. Unadjusted trajectories were better predictors than adjusted trajectories.
Tuesday, September 15, 2015, Hoifung Poon, PhD, Researcher, Microsoft Research
“Machine Reading for Cancer Panomics”
Advances in sequencing technology have made available a plethora of panomics data for cancer research, yet the search for disease genes and drug targets remains a formidable challenge. Biological knowledge such as pathways can play an important role in this quest by constraining the search space and boosting the signal-to-noise ratio. The majority of knowledge resides in text such as journal articles, which has been undergoing its own explosive growth, making it mandatory to develop machine reading methods for automating knowledge extraction. In this talk, I will formulate the machine reading task for pathway extraction, review the state of the art and open challenges, and present our Literome project and our latest attack on the problem, based on grounded semantic parsing.
Tuesday, September 1, 2015, Roee Gutman, PhD, Department of Biostatistics, Brown University
“Robust Estimation of Causal Effects with Application to Erythropoiesis-stimulating Agents for End-stage Renal Disease”
This talk will focus on the proposal of an outcome-free three-stage procedure to estimate causal effects from non-randomized studies. First, we create subclasses that include observations from each group based on the covariates. Next, we independently estimate the response surface in each group using a flexible spline model. Lastly, multiple imputations of the missing potential outcomes are performed. A simulation analysis that resembles real-life situations and compares this procedure to other common methods is carried out. In relation to other methods and in many of the experimental conditions examined, our proposed method produced a valid statistical procedure while providing a relatively precise point estimate and a relatively short interval estimate. We will demonstrate an extension of this procedure to estimate the effects of erythropoiesis-stimulating agents (ESAs) for end-stage renal disease patients undergoing hemodialysis.
May 19, 2015, Matthias Steinrücken, PhD, Assistant Professor of Biostatistics, Department of Biostatistics and Epidemiology, UMass Amherst
"Detecting Tracts of Local Ancestry in Genomic Sequence Data of Modern Humans"
The complex demographic history of modern humans has had a substantial impact on the genetic variation we observe today. Due to the process of chromosomal recombination, the genomes of contemporary individuals can be mosaics composed of different DNA segments originating from diverged subpopulations. This is of particular interest when studying variation related to genetic diseases. On the one hand, one has to account for neutral background variation resulting from the demographic history, but on the other hand, knowledge about the distribution of these ancestry segments can also be used to identify causal variants.
In this talk, I present a new method to detect tracts of local ancestry in genomic sequence data of modern humans, and demonstrate its accuracy and efficiency on simulated data. Explicitly modeling the underlying demographic history allows detection under very general scenarios. I will discuss extensions of the method and potential applications using the local ancestry information to foster the detection of functional genetic variation. The distribution of these tracts can also be used to infer features of the demographic history.
May 5, 2015, Yizhou Sun, PhD, Assistant Professor, College of Computer and Information Science, Northeastern University
"Mining Information Networks by Modeling Heterogeneous Link Types"
Real-world physical and abstract data objects are interconnected, forming gigantic networks. By structuring these data objects and the interactions between them into multiple types, such networks become semi-structured heterogeneous information networks. Most real-world applications that handle big data, including interconnected social media and social networks, scientific, engineering, or medical information systems, online e-commerce systems, and most database systems, can be structured into heterogeneous information networks. Unlike homogeneous information networks, where objects and links are treated either as being of the same type or as untyped nodes or links, heterogeneous information networks in our model are semi-structured and typed, following a network schema. We then propose different methodologies for mining heterogeneous information networks by carefully modeling the links of different types. In this talk, I will introduce three recently developed techniques, which include (1) meta-path-based mining, (2) relation strength-aware mining, and (3) semantic-aware relation modeling, and their applications, such as similarity search, clustering, information diffusion, and voting prediction.
April 21, 2015, Scott Evans, PhD, Senior Research Scientist, Department of Biostatistics, Harvard School of Public Health
"Battling Superbugs with Statistical Thinking: Using Endpoints to Analyze Patients Rather than Patients to Analyze Endpoints"
Superbugs, “nightmare bacteria” that have become resistant to our most potent antibiotics, are one of our most serious health threats. In the United States, at least 2 million people annually acquire serious bacterial infections that are resistant to the antibiotics originally designed to treat those infections. Resistance undermines our ability to fight infectious diseases and presents increased risk to vulnerable patient populations, including those with HIV, cancer, renal failure, as well as patients requiring surgery, neonatal care, and intensive care. In September of 2014, President Obama issued an Executive Order outlining a national strategy for combating antibiotic-resistant bacteria.
April 7, 2015, Nicholas Reich, PhD, Assistant Professor, Department of Biostatistics and Epidemiology, School of Public Health and Health Sciences, University of Massachusetts, Amherst
"Statistical Challenges in Real-Time Infectious Disease Forecasting"
Epidemics of communicable diseases place a huge burden on public health infrastructures across the world. Advanced warnings of increases in disease incidence can, in many cases, help public health authorities allocate resources more effectively and mitigate the impact of epidemics. However, scientists and public health officials face many obstacles in trying to create accurate real-time forecasts of infectious disease incidence. Challenges range from the logistical (what data do you need and when is it available), to the statistical (what are the best methods for training and validating a forecasting model), to the scientific (what are the best models of disease transmission). In collaboration with the Thai Ministry of Public Health, we have developed a real-time forecasting model for dengue hemorrhagic fever in the 79 provinces of Thailand. Dengue is a mosquito-borne virus that annually infects over 400 million people worldwide. In this talk we will present results from our ongoing real-time forecasting efforts in Thailand while discussing the frameworks we have developed to address the challenges of this project.
March 17, 2015, Michael McGeachie, PhD, Instructor, Channing Division of Network Medicine, Harvard Medical School
"Longitudinal Microbiome Prediction with Dynamic Bayesian Networks"
High-resolution DNA sequencing allows high-resolution quantitative assessments of microbiome bacteria populations, and emerging evidence suggests that differences or aberrations in the microbiome can lead to various diseases and chronic conditions. Dynamic Bayesian Networks (DBNs) have been used in other settings to successfully model time series data and obtain accurate predictions of future behavior as well as identify salient connections and relationships within the data. In this work, we show that a DBN model of the infant gut microbiota ecology captures explicit relationships casually observed previously, including a relationship between age and clostridia, and between clostridia, gammaproteobacteria, and bacilli. DBN models are further useful for identifying rare, dramatic, sudden shifts in microbiome population (“abruptions”) observed in some infants, and providing quantitative likelihood estimates for these events. We will further discuss the differences between iterative and sequential prediction of infant gut microbiome composition, and the DBN’s usefulness for predicting response to perturbations and unusual initial conditions.
March 3, 2015, Matthew Fox, DSc, MPH, Associate Professor, Center for Global Health & Development, Department of Epidemiology, Boston University
"Quantitative Bias Analysis: The Case of the NO-SHOTS trial"
It is well understood that both systematic and random error can impact the results of epidemiologic research; however, while random error is nearly always quantified, systematic error rarely is. Systematic error is typically relegated to discussion sections of manuscripts despite the fact that simple methods to quantify the impact of sources of bias have existed for years. This talk will demonstrate simple methods for quantitative bias analysis for misclassification problems and use the example of the NO-SHOTS randomized trial to demonstrate how the methods can be effective at exploring both the magnitude and direction of bias.
February 17, 2015, Xiangnan Kong, PhD, Assistant Professor, Computer Science Department, Worcester Polytechnic Institute
"Towards Taming Big Data Variety: From Social Networks to Brain Networks"
Over the past decade, we have been experiencing big data challenges in various research domains. The data nowadays involve an increasing number of data types that need to be handled differently from conventional data records, and an increasing number of data sources that need to be fused together. Taming data variety issues is essential to many research fields, such as biomedical research, social computing, neuroscience, and business intelligence. The data variety issues are difficult to solve because the data usually have complex structures and involve many different types of information and multiple data sources. In this talk, I'll briefly introduce the big data landscape and present two projects that help us better understand how to solve data variety issues in different domains. The first project addresses the challenge of integrating multiple data sources in the context of social network research. Specifically, I will describe a network alignment method that exploits heterogeneous information to align user accounts across different social networks. The second project addresses the challenge of analyzing complex data types in the context of brain network research. I will model the functional brain networks as uncertain graphs, and describe a subgraph mining approach to extract important linkage patterns from the uncertain graphs. I'll also introduce future work in this direction and explain some possibilities for upcoming evolutions in big data research.
January 20, 2015, Hao Wu, PhD, Assistant Professor, Department of Psychology, Boston College
“A Nonparametric Bayesian Item Response Model for Monotonic Selection Effect”
In various practical settings of educational and psychological measurement, individuals are potentially selected according to their ability levels before being measured. In this case, understanding the selection process would shed light on either possible unexpected issues in the administration of the measurement or important features of the group of people being measured. Given such importance, we will explore the potential selection process in this research. In particular, we will build a nonparametric Bayesian model to account for a monotonic selection effect in item response theory (IRT), where individuals with higher ability are more likely to be measured. Simulation results show that this model is able to identify and recover the selection effect in the population.
2014 Events
December 2, 2014: Lorenzo Trippa, PhD, Assistant Professor, Harvard School of Public Health
“Bayesian Nonparametric Cross-Study Validation Prediction Methods”
We consider comparisons of statistical learning algorithms using multiple datasets, via leave-one-in cross-study validation: each of the algorithms is trained on one dataset; the resulting model is then validated on each remaining dataset. This poses two statistical challenges that need to be addressed simultaneously. The first is the assessment of study heterogeneity, with the aim of identifying subsets of studies within which algorithm comparisons can be reliably carried out. The second is the comparison of algorithms using the ensemble of datasets. We address both problems by integrating clustering and model comparison. We formulate a Bayesian model for the array of cross-study validation statistics, which defines clusters of studies with similar properties, and provides the basis for meaningful algorithm comparison in the presence of study heterogeneity. We illustrate our approach through simulations involving studies with varying severity of systematic errors, and in the context of medical prognosis for patients diagnosed with cancer, using high-throughput measurements of the transcriptional activity of the tumor's genes.
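As a rough illustration of the leave-one-in scheme described above, the following Python sketch builds the array of cross-study validation statistics for a single algorithm. It is only a sketch under stated assumptions: the "algorithm" is a trivial mean predictor, the validation statistic is mean squared error, and the study names and outcome values are made up. It does not implement the Bayesian clustering model from the talk.

```python
from statistics import mean

def mse(prediction, outcomes):
    """Mean squared error of a constant prediction against observed outcomes."""
    return mean((y - prediction) ** 2 for y in outcomes)

def cross_study_matrix(studies):
    """Train on each study in turn and validate on every remaining study.

    studies: dict mapping a study name to a list of outcomes (hypothetical data).
    Returns a dict mapping (training study, validation study) to the MSE.
    """
    stats = {}
    for train_name, train_y in studies.items():
        # Placeholder "algorithm": predict the training-set mean everywhere.
        prediction = mean(train_y)
        for test_name, test_y in studies.items():
            if test_name != train_name:
                stats[(train_name, test_name)] = mse(prediction, test_y)
    return stats

# Made-up outcomes for three hypothetical studies.
studies = {"A": [1.0, 3.0], "B": [2.0, 2.0], "C": [5.0, 7.0]}
results = cross_study_matrix(studies)
for pair, value in sorted(results.items()):
    print(pair, value)
```

With K studies this yields K × (K − 1) statistics per algorithm; study C validates poorly against models trained on A or B, which is the kind of heterogeneity the clustering step in the abstract is meant to detect.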
November 4, 2014: Todd MacKenzie, PhD, Associate Professor, Dartmouth College
“Causal Hazard Ratio Estimation Using Instrumental Variables or Principal Strata”
Estimation of treatment effects is a primary goal of statistics in medicine. Estimates from observational studies are subject to selection bias, while estimates from non-observational (i.e. randomized) studies are subject to bias due to non-compliance. In observational studies confounding by unmeasured confounders cannot be overcome by regression adjustment, conditioning on propensity scores or inverse weighted propensities. The method of instrumental variables (IVs) can overcome bias due to unmeasured confounding. In the first part of this talk a method for using IVs to estimate hazard ratios is proposed and evaluated. In the second part of this talk the approach of principal strata for deriving treatment effects for randomized studies subject to all-or-nothing compliance is reviewed and an estimate of the complier hazard ratio is proposed and evaluated.
October 21, 2014: Wei Ding, PhD, Associate Professor, UMass Boston
“Data Mining with Big Data”
Big Data concerns large-volume, complex, growing data sets with multiple, autonomous sources. In this talk, I will give an overview of our recent machine learning and data mining results in feature selection, distance metric learning, and least squares-based optimization with applications to NASA mission data analysis, extreme weather prediction, and physical activity analysis for childhood obesity.
October 7, 2014: Tam Nguyen, PhD, Assistant Professor, Boston College, Connell School of Nursing
“Application of Item Response Theory in the Development of Patient Reported Outcome Measures: An Overview”
The growing emphasis on patient-centered care has accelerated the demand for high quality data from patient reported outcome measures (e.g., quality of life, depression, physical functioning). Traditionally, the development and validation of these measures has been guided by Classical Test Theory. However, Item Response Theory, an alternate measurement framework, offers promise for addressing practical measurement problems found in health-related research that have been difficult to solve through Classical methods. This talk will introduce foundational concepts in Item Response Theory, as well as commonly used models and their assumptions. Examples of typical applications of Item Response Theory will be provided. These examples will illustrate how Item Response Theory can be used to improve the development, refinement, and evaluation of patient reported outcome measures. Greater use of methods based on this framework can increase the accuracy and efficiency with which patient reported outcomes are measured.
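For readers unfamiliar with Item Response Theory, one widely used model is the two-parameter logistic (2PL) model, in which the probability of endorsing an item depends on the respondent's latent trait and on the item's discrimination and difficulty. The abstract does not say which models the talk covered, so the short Python sketch below is an illustration only; the function name and the parameter values are assumptions.

```python
import math

def prob_endorse_2pl(theta, a, b):
    """2PL item response function.

    theta: latent trait level of the respondent
    a:     item discrimination (slope)
    b:     item difficulty (location)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical item: discrimination 1.5, difficulty 0.0.
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(prob_endorse_2pl(theta, a=1.5, b=0.0), 3))
```

When the trait level equals the item difficulty the probability is exactly 0.5, and a larger discrimination makes the curve steeper around that point.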
September 16, 2014: Dr. Amresh Hanchate, Assistant Professor, Health Care Disparities Research Program, Boston University School of Medicine
“Did MA reform increase or decrease use of ED services? An Application of Difference-in-Differences Analysis”
This presentation will focus on difference-in-differences regression models as an approach to estimate causal relationships. This approach is commonly applied in the context of “natural experiments”; I will examine its application to evaluate the impact of Massachusetts health reform on the use of emergency department services. Two previous studies applying this approach (Miller 2012 & Smulowitz 2014) obtained contrasting results, one finding an increase in ED use and the other a decrease, following the Massachusetts insurance expansion (2006-2007). I will report the findings of a comparative assessment of these contrasting results, based on side-by-side replication of the original analysis using similar data.
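In its simplest two-group, two-period form, the difference-in-differences estimate is the change in the treated group minus the change in the comparison group. The Python sketch below shows that arithmetic only. The function name and the ED visit rates are made up for illustration and are not taken from either cited study, and the regression models discussed in the talk would additionally adjust for covariates.

```python
def did_estimate(treated_pre, treated_post, control_pre, control_post):
    """Basic 2x2 difference-in-differences estimate from four group means."""
    treated_change = treated_post - treated_pre
    control_change = control_post - control_pre
    return treated_change - control_change

# Hypothetical ED visits per 1,000 residents, before and after a reform.
effect = did_estimate(
    treated_pre=400.0, treated_post=430.0,   # state that enacted the reform
    control_pre=390.0, control_post=410.0,   # comparison states
)
print(effect)  # 10.0
```

The estimate has a causal interpretation only under the parallel-trends assumption, that is, that the treated group would have followed the comparison group's trend in the absence of the reform.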
September 2, 2014: Stavroula Chrysanthopoulou, PhD, Department of Biostatistics, Brown University School of Public Health
“Statistical Methods in Microsimulation Modeling: Calibration and Predictive Accuracy”
This presentation is concerned with the statistical properties of MicroSimulation Models (MSMs) used in Medical Decision Making. The MIcrosimulation Lung Cancer (MILC) model, a new, streamlined MSM describing the natural history of lung cancer, has been used as a tool for the implementation and comparison of complex statistical techniques for calibrating and assessing the predictive accuracy of continuous time, dynamic MSMs. We present the main features of the MILC model along with the major findings and conclusions, as well as the challenges posed by the implementation of the suggested statistical methods.
May 20, 2014: Craig Wells, Ph.D., Department of Educational Policy, Research and Administration, UMass Amherst
"Applications of Item Response Theory"
“Item response theory (IRT) is a powerful, model-based technique for developing scales and assessments. Due to the attractive features of IRT models, it is the statistical engine that is used to develop many types of assessments. The purpose of the presentation will be to describe the fundamental concepts of IRT as well as its applications in a variety of contexts. The presentation will address the advantages of IRT over classical methods, describe popular IRT models and applications.”
May 6, 2014: Jeffrey Brown, Ph.D., Department of Population Medicine, Harvard Medical School
“FDA's Mini-Sentinel Program to Evaluate the Safety of Marketed Medical Products”
“The Sentinel Initiative began in 2008 as a multi-year effort to create a national electronic system for monitoring the safety of FDA-regulated medical products (e.g., drugs, biologics, vaccines, and devices). The Initiative is the FDA’s response to the Food and Drug Administration Amendments Act requirement that the FDA work to develop a system to obtain information from existing electronic health care data from multiple sources to assess the safety of approved medical products. The Mini-Sentinel pilot is part of the Sentinel Initiative. Mini-Sentinel uses a distributed data approach in which data partners retain control over data in their possession obtained as part of normal care and reimbursement activities. Using this approach allows Mini-Sentinel queries to be executed behind the firewalls of data partners, with only summary level or minimum necessary information returned for analysis. The Mini-Sentinel network allows FDA to initiate hundreds of queries a year across a network of 18 data partners and over 350 million person-years of electronic health data. These queries use privacy-preserving approaches that have greatly minimized the need to share protected health data. Mini-Sentinel analyses have been used to support several regulatory decisions, and Mini-Sentinel.”
April 15, 2014: Jessica Meyers Franklin, Ph.D., Department of Medicine, Division of Pharmacoepidemiology & Pharmacoeconomics, Harvard Medical School
"High-dimensional simulation for evaluating high-dimensional methods: Comparing high-dimensional propensity score versus lasso variable selection for confounding adjustment in a novel simulation framework"
“The high-dimensional propensity score (hdPS) algorithm has been shown to reduce bias in nonrandomized studies of treatments in administrative claims databases through empirical selection of confounders. Lasso regression provides an alternative confounder selection method and allows for direct modeling of the outcome in a high-dimensional covariate space through shrinkage of coefficient estimates. However, these methods have not been able to be compared, due to limitations in ordinary simulation techniques. In this talk, I will discuss a novel "plasmode" simulation framework that is better suited to evaluating methods in the context of a high-dimensional covariate space, and I will present a study in progress that uses this framework to compare the performance of hdPS to that of a lasso outcome regression model for reduction of confounding bias.”
The Department of Quantitative Health Sciences and the Quantitative Methods Core will conduct monthly seminars to explore statistical issues of general interest.
Tuesday, April 1, 2014: Presented by: Michael Ash, Ph.D., Chair, Department of Economics, Professor of Economics and Public Policy, University of Massachusetts Amherst
"Critical Replication for Learning and Research"
“Critical replication asks students to replicate a published quantitative empirical paper and to extend the original study either by applying the same model and methods to new data or by applying new models or methods to the same data. Replication helps students come rapidly up to speed as practitioners. It also benefits the discipline by checking published work for accuracy and robustness. Extension gives a practical introduction to internal or external validity and can yield publishable results for students. I will discuss critical replication of three published papers: Growth in a Time of Debt (Reinhart and Rogoff 2010); Mortality, inequality and race in American cities and states (Deaton and Lubotsky 2003); and Stock markets, banks, and growth (Levine and Zervos 1998).”
Tuesday, March 18, 2014: Presented by: Balgobin Nandram, Ph.D., Professor, Mathematical Sciences, Worcester Polytechnic Institute
"A Bayesian Test of Independence for Sparse Contingency Tables of BMD and BMI"
“Interest is focused on a test of independence in contingency tables of body mass index (BMI) and bone mineral density (BMD) for small places. Techniques of small area estimation are implemented to borrow strength across U.S. counties using a hierarchical Bayesian model. For each county a pooled Bayesian test of independence of BMD and BMI is obtained. We use the Bayes factor to perform the test, and computation is performed using Monte Carlo integration via random samples rather than Gibbs samples. We show that our pooled Bayesian test is preferred over many competitors.”
Key Words: Bayes factor, Contingency tables, Cressie-Read test, Gibbs sampler, Monte Carlo integration, NHANES III, Power, Sensitivity analysis, Small area estimation.
Tuesday, March 4, 2014: Presented by: Krista Gile, Ph.D., Assistant Professor, Department of Mathematics and Statistics, University of Massachusetts
"Inference and Diagnostics for Respondent-Driven Sampling Data"
“Respondent-Driven Sampling is a type of link-tracing network sampling used to study hard-to-reach populations. Beginning with a convenience sample, each person sampled is given 2-3 uniquely identified coupons to distribute to other members of the target population, making them eligible for enrollment in the study. This is effective at collecting large diverse samples from many populations.
Unfortunately, sampling is affected by many features of the network and sampling process. In this talk, we present advances in sample diagnostics for these features, as well as advances in inference adjusting for such features.
This talk includes joint work with Mark S. Handcock, Lisa G. Johnston and Matthew J. Salganik.”
Tuesday, February 18, 2014: Presented by: John Griffith, Ph.D., Associate Dean for Research, Bouve College of Health Sciences, Northeastern University
"Translating Science to Health Care: the Use of Predictive Models in Decision Making"
“Clinical predictive models take information about a patient or subject and synthesize it into a composite score that can then assist with decision making concerning treatment for the individual patient. To be useful, these tools need to accurately categorize the risk of events for patients and their use needs to positively impact treatment decisions and patient outcomes. Statistical approaches can be used for internal validation of these models. However, clinical trials are often needed to show treatment effectiveness. The issues that arise with the development, testing, and implementation of such models will be discussed.”
Tuesday, February 4, 2014: Presented by: Christopher Schmid, Ph.D., Professor of Biostatistics, Center for Evidence Based Medicine, Brown University School of Public Health
"N-of-1 Trials"
“N-of-1 trials are a promising tool to enhance clinical decision-making and patient outcomes. These trials are single-patient multiple-crossover studies for determining the relative comparative effectiveness of two or more treatments within each individual patient. Patient and clinician select treatments and outcomes of interest to them, carry out the trial, and then make a final treatment decision together based on results of the trial. This talk will discuss the advantages and challenges in conducting N-of-1 trials, along with some of the design and analytic considerations. A study to test the effectiveness of the N-of-1 trial as a clinical decision tool comparing patients randomized to N-of-1 vs. usual care is ongoing. The challenges of implementing the decision strategy in such a context will be discussed.”
Tuesday, January 21, 2014: Presented by: David MacKinnon, Ph.D., Professor, Arizona State University, Author of "Introduction to Statistical Mediation Analysis"
"Mediation Analysis"
Learning Objective: Understanding and Running Mediation Analyses
Bring your laptops and run step-wise mediation analyses with the speaker using SAS and the free Mplus demo program.
2013 Archives
Tuesday, December 3, 2013: Presented by: Erin M. Conlon, Ph.D., Associate Professor, Department of Mathematics and Statistics Lederle Graduate Research, University of Massachusetts, Amherst
"Bayesian Meta-Analysis Models for Gene Expression Studies"
Biologists often conduct multiple independent gene expression studies that all target the same biological system or pathway. Pooling information across studies can help more accurately identify true target genes. Here, we introduce a Bayesian hierarchical model to combine gene expression data across studies to identify differentially expressed genes. Each study has several sources of variation, i.e. replicate slides within repeated experiments. Our model produces the gene-specific posterior probability of differential expression, which is the basis for inference. We further develop the models to identify up- and down-regulated genes separately, and to include gene dependence information. We evaluate the models using both simulation data and biological data for the model organisms Bacillus subtilis and Geobacter sulfurreducens.
Tuesday, November 19, 2013: Presented by: Jing Qian, Ph.D., Assistant Professor of Biostatistics, Division of Biostatistics and Epidemiology, School of Public Health and Health Sciences, University of Massachusetts, Amherst
"Statistical Methods for Analyzing Censored Medical Cost and Sojourn Time in Progressive Disease Process"
To conduct a comprehensive evaluation in clinical studies for chronic diseases like cancer, features of the disease process, such as lifetime medical cost and sojourn time in a progressive disease process, are often assessed in addition to the overall survival time. However, statistical analysis of these features is challenged by dependent censoring and identifiability issues, arising from the incomplete follow-up data in clinical studies. In this talk, I will first present a semiparametric regression model for analyzing censored lifetime medical cost, which can be used to address the cost difference between treatments in the motivating example of a lung cancer clinical trial. Next, I will discuss how to use a similar inference approach to estimate sojourn time in a progressive disease process, motivated by a colon cancer study where patients progress through cancer-free and cancer-recurrence states. Inference procedures and simulation studies will be described. The methods will be illustrated through lung cancer and colon cancer clinical trials.
Thursday, November 7, 2013: Presented by: Bei-Hung Chang, Sc.D., Associate Professor, Boston University School of Public Health, VA Boston Healthcare System
"Mind and Body Medicine Research: Study Design and Statistical Method Demonstrations"
The nature of mind/body practices, such as meditation and acupuncture, poses a challenge for evaluating the intervention effect. Blinding, randomization, control group selection, and placebo effects are among these challenges. This talk will present two studies that employed innovative study designs to overcome these challenges in investigating the health effect of acupuncture and the relaxation response/meditation. The use of statistical methods including a 2-slope regression model and mixed effects regression models in the studies will also be demonstrated.
Tuesday, October 15, 2013: Presented by: Laura Forsberg White, Ph.D., Associate Professor, Department of Biostatistics, Boston University School of Public Health
"Characterizing Infectious Disease Outbreaks: Traditional and Novel Approaches"
Infectious disease outbreaks continue to be a significant public health concern. Quantitative methods for characterizing an outbreak rapidly are of great interest in order to mount an appropriate and effective response. In this talk, I will review some traditional approaches to doing this and discuss more recent work. In particular, this talk will focus on methods for quantifying the spread of an illness through estimation of the reproductive number. We will also briefly discuss methods to determine the severity of an outbreak through estimation of the case fatality ratio and attack rate. Applications of this work to the 2009 Influenza A H1N1 outbreak will be discussed. We will also discuss methods to estimate heterogeneity in the reproductive number.
Tuesday, October 1, 2013: Presented by Molin Wang, Ph.D., Assistant Professor, Department of Medicine, Harvard Medical School, Departments of Biostatistics and Epidemiology, Harvard School of Public Health
"Statistical Methods and SAS Macros for Disease Heterogeneity Analysis"
Epidemiologic research typically investigates the associations between exposures and the risk of a disease, in which the disease of interest is treated as a single outcome. However, many human diseases, including colon cancer, type II diabetes mellitus and myocardial infarction, comprise a range of heterogeneous molecular and pathologic processes, likely reflecting the influences of diverse exposures. Molecular Pathological Epidemiology, an approach that incorporates data on the molecular and pathologic features of a disease directly into epidemiologic studies, has been proposed to better identify causal factors and better understand how potential etiologic factors influence disease development. In this talk, I will present statistical methods for evaluating whether the effect of a potential risk factor varies by subtypes of the disease, in cohort, case-control, and case-case study designs. Efficiency of the tests will also be discussed. SAS macros will be presented to implement these methods. The macros test overall heterogeneity through the common effect test (i.e., the null hypothesis is that all of the effects of exposure on the different subtypes are the same) as well as pair-wise differences in exposure effects. In adjusting for confounding, the effects are allowed to vary for the different subtypes or they can be assumed to be the same across the different subtypes. To illustrate the methods, we evaluate the effect of alcohol intake on LINE-1 methylation subtypes of colon cancer in the Health Professionals Follow-up Study, where 51,529 men have been followed since 1986 during which time 268 cases of colon cancer have occurred. Results are presented for all 3 possible study designs for comparison purposes. This is joint work with Aya Kuchiba and Donna Spiegelman.
Tuesday, September 17, 2013: Presented by Zheyang Wu, Ph.D., Assistant Professor, Department of Mathematical Sciences, Worcester Polytechnic Institute, Worcester, MA.
"Genetic Effects and Statistical Power of Gene Hunting Using GWAS and Sequence Data"
Genome-wide association studies (GWAS) use high-density genotyping platforms to reveal single-nucleotide and copy number variants over the whole genome for gene hunting. Although many significant genetic factors have been identified, genes discovered so far account for a relatively small proportion of the genetic contribution to most complex traits, the so-called “missing heritability”. A key statistical research task in advancing the discovery of novel disease genes is to reveal the capacity of association-based detection strategies and to design optimal methods. We study this problem from the view of statistical signal detection for high-dimensional data, while considering three major features of those unfound genetic factors: weak effects of association, sparse signals among all genotyped variants, and complex correlations and gene-gene interactions. In this talk, I will discuss two relevant results. First, we address how gene-gene interaction and linkage disequilibrium among variants influence the capacity of model selection strategies for searching and testing genes. In particular, we developed a novel power calculation framework for model selection strategies to pick up proper signals of disease genes. Second, the requirement for signal strength in gene detection could be reduced when we target the detection of groups of signals instead of individual signals. Specifically, we established a theory of detection boundary, which clarifies the limit of statistical analysis: genetic effects below the boundary are simply too rare and weak to be reliably detected by any statistical methods. Meanwhile, we developed optimal tests that work for these minimally detectable signals. These results are also applicable in designing statistical association tests for detecting rare variants in exome or whole-genome sequence data analysis.
Tuesday, September 3, 2013: Presented by Raji Balasubramanian, Sc.D., Assistant Professor of Biostatistics, Division of Biostatistics and Epidemiology, UMass Amherst
Variable importance in matched case control studies in settings of high dimensional data
In this talk, I’ll describe a method for assessing variable importance in matched case-control investigations and other highly stratified studies characterized by high dimensional data (p >> n). The proposed methods are motivated by a cardiovascular disease systems biology study involving matched cases and controls. In simulated and real datasets, we show that the proposed algorithm performs better than a conventional univariate method (conditional logistic regression) and a popular multivariable algorithm (Random Forests) that does not take the matching into account.
This is joint work with E. Andres Houseman (Oregon State University), Rebecca A. Betensky (Harvard School of Public Health) and Brent A. Coull (Harvard School of Public Health).
Tuesday, May 21, 2013: Presented by Alexander Turchin, MD, MS, Director of Informatics Research, Department of Endocrinology, Diabetes and Hypertension, Harvard Medical School
Using Electronic Medical Records Data for Clinical Research: Experience and Practical Implications
Electronic medical records (EMR) systems represent a rich source of clinical data that can be utilized for research, quality assurance, and pay-for-performance, among others. However, it is important to recognize that, like any other data source, EMR data has its own pitfalls that need to be approached in a rigorous fashion. In particular, a large fraction of data in EMR is “locked” in narrative documents and can therefore be especially challenging to extract. This presentation will discuss common flaws in EMR data with a special focus on a systematic approach to using data from narrative electronic documents. The discussion will be illustrated by specific examples of clinical research using EMR data, including narrative text.
Learning Objectives:
1. To understand limitations and caveats of EMR data
2. To learn how to approach development of NLP algorithms
3. To learn how to evaluate NLP algorithms
Tuesday, May 7, 2013: Presented by Tingjian Ge, PhD, Assistant Professor, Department of Computer Science, UMass Lowell
How Recent Data Management and Mining Research can Benefit Biomedical Sciences
Data management (a.k.a. databases, traditionally) and data mining have been active research topics in Computer Science since the 1960s, both in academia and in the research and development groups of companies (for example, IBM Research). In recent years we have seen a surge in this research due to the “big data” trend. At the same time, various areas in the biomedical sciences are producing increasingly large amounts of data due to the prevalence of automatic data-generating devices. It is natural to consider what some of the most recent results from data management and mining can do for state-of-the-art biomedical research and practice.
In this talk, I will discuss the potential applications of my research in data management and mining to various biomedical studies. They include: (1) complex event detection over correlated and noisy time series data, such as ECG monitoring signals and real-time dietary logs; (2) ranking and pooled analysis of noisy and conflicting data, such as microarray results and emergency medical responses in disaster scenes (e.g., terrorist attacks or earthquakes); and (3) association rule mining on mixed categorical and numerical data, such as the dietary logs, for food recommendation and weight control.
Tuesday, April 16, 2013: Presented by Jeffrey Bailey, MD, PhD
Computational Approaches for Analyzing Copy Number Variation and Standing Segmental Duplication
Segmental duplication represents the key route for the evolution of new genes within an organism, and regions of duplication are often copy number variant, providing increased functional diversity. Detecting regions of duplication and copy number variation is still a challenge even with high-throughput sequencing. The lecture will review the key methods for identifying duplicated sequence and copy number variant regions within genomic sequence and provide an overview of our laboratory's ongoing work to detect, type and correlate such regions with phenotype, particularly vis-à-vis malaria.
Tuesday, April 2, 2013: Presented by Becky Briesacher, PhD
"Offsetting Effects of Medicare Part D on Health Outcomes and Hospitalization?"
This presentation will cover a Medicare Part D policy evaluation and the novel use of time-series and bootstrapping methods. My early results challenge the assumption of the US Congressional Budget Office that Medicare prescription drug costs are offset by medical service savings. I will also describe how we used Pre-Part D data to create simulated post-Part D outcomes. Confidence intervals were constructed using bootstrapping and the test for differences was based on the proportion of simulated values that exceeded/fell below the observed value.
Tuesday, March 5, 2013: Presented by David Hoaglin, PhD, Professor, Biostatistics and Health Services Research
"Regressions Gone Wrong: Why Many Reports of Regression Analyses Mislead"
Regression methods play an important role in many analyses: multiple regression, logistic regression, survival models, longitudinal analysis. Surprisingly, many articles and books describe certain results of such analyses in ways that lead readers astray. The talk will examine reasons for these problems and suggest remedies.
February 19, 2013: Presented by Wenjun Li, PhD, Associate Professor, Preventative and Behavioral Medicine
Use of Small Area Health Statistics to Inform and Evaluate Community Health Promotion Programs
This presentation discusses the application of small area estimation methods to identify priority communities for public health intervention programs, to tailor community-specific intervention strategies, and to evaluate their effectiveness at the community level.
2012 Events
December 4, 2012: Presented by Thomas Houston, MD, MPH, Professor and Chief
Comparative Effectiveness Research (CER) Seminar Series -- Pragmatic Clinical Trials (PCT II) (following Bruce Barton's PCT 1 on Sept. 18)
Dr. Houston will describe a series of cluster-randomized trials where they have used the Internet and informatics to support interventions for providers and patients. He will also review the PRECIS tool, a way to characterize your pragmatic trials, and the Stages of Implementation Completion (SIC) measure, a time-and-milestone-based method to assess success in implementation.
November 20, 2012: Presented by Jennifer Tjia, MD, MSCE, Associate Professor of Medicine
Pharmacoepidemiologic Approaches to Evaluate Outcomes of Medication Discontinuation
The self-controlled case series method, or case series method for short, can be used to study the association between an acute event and a transient exposure using data only on cases; no separate controls are needed. The method uses exposure histories that are retrospectively ascertained in cases to estimate the relative incidence, that is, the incidence of events within risk periods (windows of time during or after the exposure when people are hypothesized to be at greater risk) relative to the incidence of events within control periods, which include all time before the case experienced the exposure and after the risk has returned to the baseline value. For many researchers, the main appeal of the self-controlled case series method is the implicit control of fixed confounders. We will discuss the application of this method in pharmacoepidemiologic outcomes studies, and explore whether this approach offers advantages over more conventional cohort studies when evaluating adverse drug withdrawal events following medication discontinuation. We will use examples from a linked Medicare Part D and Minimum Data Set database to facilitate discussion.
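The relative incidence described above can be illustrated with a minimal Python sketch. It is not the analysis presented in the talk: it assumes a constant baseline rate within each person (no age or calendar-time effects), and the function name, the input layout, and the example data are all hypothetical. Under those assumptions each case contributes a conditional likelihood in which an event falls in the risk period with probability rho*r / (rho*r + c), where r and c are that person's risk and control time and rho is the relative incidence.

```python
import math


def sccs_relative_incidence(cases, lo=1e-6, hi=1e6, tol=1e-10):
    """Conditional maximum-likelihood estimate of the relative incidence.

    cases: list of (events_in_risk, risk_time, events_in_control, control_time),
    one tuple per case (hypothetical layout, for illustration only).
    """
    total_risk_events = sum(a for a, _, _, _ in cases)
    total_control_events = sum(b for _, _, b, _ in cases)
    if total_risk_events == 0 or total_control_events == 0:
        raise ValueError("need events in both risk and control periods")

    def score(rho):
        # Derivative of the conditional log-likelihood with respect to log(rho):
        # observed risk-period events minus their expected number.
        total = 0.0
        for a, r, b, c in cases:
            n = a + b
            total += a - n * (rho * r) / (rho * r + c)
        return total

    # The score decreases in rho, so bisection on the log scale finds its root.
    log_lo, log_hi = math.log(lo), math.log(hi)
    while log_hi - log_lo > tol:
        mid = 0.5 * (log_lo + log_hi)
        if score(math.exp(mid)) > 0:
            log_lo = mid
        else:
            log_hi = mid
    return math.exp(0.5 * (log_lo + log_hi))


# Hypothetical data: 16 cases, each observed for 365 days with a 30-day risk window.
# Six cases had their event inside the risk window, ten outside it.
example_cases = [(1, 30, 0, 335)] * 6 + [(0, 30, 1, 335)] * 10
estimate = sccs_relative_incidence(example_cases)
```

When every case has the same risk and control time, as in this example, the estimate reduces to the simple ratio of event rates, (6/30) / (10/335), which is about 6.7. Because each person serves as their own comparison, characteristics that do not change over the observation period drop out of the likelihood.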
November 6, 2012: Presented by Molin Wang, PhD, Harvard University
Latency Analysis under the Cox Model when the effect may change over time
We consider estimation and inference for latency in the Cox proportional hazards model framework, where time to event is the outcome. In many public health settings, it is of interest to assess whether exposure effects are subject to a latency period, where the risk of developing disease depending on the exposure level varies over time, perhaps affecting risk only during times near the occurrence of the outcome, or perhaps affecting risk only during times preceding a lag of some duration. Identification of the latency period, if any, is an important aspect of assessing risks of environmental and occupational exposures. For example, in air pollution epidemiology, interest often lies not only in the effect of the m-year moving cumulative average air pollution level on the risk of all-cause mortality, but also in point and interval estimation of m itself. In this talk, we will focus on methods for point and interval estimation of the latency period under several models for the timing of exposure which have previously appeared in the epidemiologic literature. Computational methods will be discussed. The method will be illustrated in the study of the timing of the effects of constituents of air pollution on mortality in the Nurses’ Health Study.
October 16, 2012: Presented by Dr. Sherry Pagoto and Deepak Ganesan, PhD
mHealth-based Behavioral Sensing and Interventions
This presentation will review mHealth and sensing research and methodologies at the UMass Amherst and UMass Chan Medical School campuses. We will discuss ongoing research in mobile and on-body sensing to obtain physiological data in the field, and to design a toolkit for processing such data to derive high quality features, deal with data quality issues (e.g., loose sensors, missing data), and leverage diverse sensor modalities to improve inference quality. To demonstrate the methodologies, we will discuss a recently funded pilot project in which mobile and sensing technology will be used to assess and predict physiological and environmental factors that impact eating behavior. Once eating behavior is predictable with accuracy, interventions will be delivered via technology at the precise moments when individuals are the most likely to overeat. The purpose of this research is to improve the impact of behavioral weight loss interventions.
October 2, 2012: Presented by Amy Rosen, PhD
Assessing the Validity of the Agency for Healthcare Research and Quality (AHRQ) Patient Safety Indicators (PSIs) in the VA
This presentation will review general patient safety concepts and ways in which patient safety events are identified. Background on the PSIs will be provided, and a recent multi-faceted validation study that was conducted in the VA to examine both the criterion and attributional validity of the indicators will be presented. Two questions will be specifically addressed: 1) Do the PSIs accurately identify true safety events? 2) Are PSI rates associated with structures/processes of care?
September 18, 2012: Presented by Bruce Barton, PhD
Pragmatic Clinical Trials: Different Strokes for Different Folks
Pragmatic clinical trials (PCTs) are relatively new on the clinical research scene and are being proposed routinely for NIH funding. In a sense, PCTs are comparative effectiveness studies on steroids! This presentation will discuss the concepts behind this new breed of clinical trial, how PCTs differ from the usual randomized clinical trial, and what to be careful of when developing one. We will review two PCTs as case studies to look at different approaches to the study design. The references are two of the more recent papers on PCT methodology and approaches.
July 17, 2012: Presented by Dianne Finkelstein, PhD; Mass General Hospital and Harvard School of Public Health, Boston, MA
Developing Biostatistics Resources at an Academic Health Center.
Although biostatistics plays an important role in health-related research, biostatistics resources are often fragmented, ad hoc, or oversubscribed within Academic Health Centers (AHCs). Given the increasing complexity and quantity of health-related data, the emphasis on accelerating clinical and translational science, and the importance of reproducible research, there is a need for the thoughtful development of biostatistics resources within AHCs. I will be reporting on a recent collaboration of CTSA biostatisticians who identified strategies for developing biostatistics resources in three areas: (1) recruiting and retaining biostatisticians; (2) using biostatistics resources efficiently; and (3) improving science through biostatistics collaborations. Ultimately, it was recommended that AHCs centralize biostatistics resources in a unit rather than disperse them across clinical departments, as the former offers distinct advantages to investigator collaborators, biostatisticians, and ultimately to the success of the research and education missions of AHCs.
May 15, 2012: Presented by George Reed, PhD
Modeling disease states using Markov models with covariate dependence and time varying intervals.
This talk presents an example of modeling transitions among multiple disease states where measurements are not made at fixed and equal time intervals and the primary interest is in factors associated with the transition probabilities. Both first order and higher order Markov models are considered.
May 1, 2012: Presented by Becky Briesacher, PhD
"Medicare Prescription Drug program and Using Part D Data for Research"
In 2006, the Medicare program began offering coverage for prescription drugs, and as of June 2008, Part D data have been available to researchers. This presentation will briefly introduce the audience to the Medicare Part D program and Part D data for research purposes. The presentation will include personal reflections on becoming a drug policy researcher and excerpts from my own program evaluation research.
April 17, 2012: Presented by David Hoaglin, PhD
"Indirect Treatment Comparisons and Network Meta-Analysis: Relative Efficacy and a Basis for Comparative Effectiveness"
Evidence on the relative efficacy of two treatments may come from sets of trials that compared them directly (head to head); but often one must rely on indirect evidence, from trials that studied them separately with a common comparator (e.g., placebo) or from a connected network of treatments. The talk will review basic meta-analysis, discuss steps and assumptions in network meta-analysis, and comment on applications to comparative effectiveness.
- ISPOR States Its Position on Network Meta-Analysis
- Conducting Indirect-Treatment-Comparison and Network-Meta-Analysis Hoaglin 2011 ViH
- Appendix: Examples of Bayesian Network Hoaglin Appen 2011
- Jansen 2011 ViH
- Luce 2010 Milbank
March 20, 2012: Presented by Thomas English, PhD
"Using Allscripts Data at UMass for Clinical Research"
I will discuss work that I have done that has been enabled by EHRs. This should give an idea of how the current EHR at UMass could help your research.
February 28, 2012: Presented by Nancy Baxter, MD, PhD, FRCSC, FACRS
"Room for Improvement in Quality Improvement"
In most circumstances in clinical medicine, randomized clinical trials proving efficacy are required before widespread adoption of interventions. However, in the area of quality improvement, many strategies have been implemented with little supporting evidence. Why is this, and why worry? These are topics that will be explored in my presentation.
February 21, 2012: Presented by Stephen Baker, MScPH
"Sequentially Rejective Procedures for Multiple Comparisons in Genome Wide Association Studies (GWAS)"
The problem of additive type I error due to multiple comparisons has been well known for many years; however, with the introduction of microarrays and other technologies, it has become one of the central problems in data analysis in molecular biology. Sequential testing procedures have been popular but have limitations with these new technologies. I will discuss some popular methods and some new ones, and illustrate them with microarray data for associating gene expression with disease status.
February 7, 2012: Presented by Arlene Ash, PhD
Risk Adjustment Matters
What variables should be included, and how, in models designed to either detect differences in quality among providers with very different “case-mix” or to isolate the effect of some patient characteristic on outcome? What role does the purpose of the modeling effort play? What are the consequences of different modeling choices? What does “do no harm” mean for a statistical analyst?
- Should Health Plan Quality Measures be Adjusted for Case Mix? by Romano P.S.
- Integrating Research on Racial and Ethnic Disparities in Health Care Over Place and Time. by Zaslavsky A.M. and Ayanian J.Z.
- Statistical Issues in Assessing Hospital Performance by Ash A.S. et al
- Disparities Casemix handout
- Why Include Hospital Characteristics
- General description of the classifications:
Technical Level: Introductory; Focus: Application; Data: None; Methods: Conceptual
January 17, 2012: Presented by Zhiping Weng
Computational Identification of Transposon Movement With Whole Genome Sequencing
Transposons evolve rapidly and can mobilize and trigger genetic instability. In Drosophila melanogaster, paternally inherited transposons can escape silencing and trigger a hybrid sterility syndrome termed hybrid dysgenesis. We developed computational methods to identify transposon movement in the host genome and uncover heritable changes in genome structure that appear to enhance transposon silencing during the recovery to hybrid dysgenesis.
- Adaptation to P Element Transposon Invasion in Drosophila Melanogaster. By Khurana JS et al
2011 Events
December 20, 2011: Presented by Jacob Gagnon, PhD
Gene Set Analysis Applied to a Leukemia Data Set
Gene set analysis allows us to determine which groups of genes are differentially expressed when comparing two subtypes of a given disease. We propose a logistic kernel machine approach to determine the gene set differences between B-cell and T-cell Acute Lymphocytic Leukemia (ALL). Compared to previous work, our method has some key advantages: 1) our hypothesis testing is self-contained rather than being competitive, 2) we can model gene-gene interactions and complex pathway effects, and 3) we test for differential expression adjusting for clinical covariates. Results from simulation studies and from an application of our methods to an ALL dataset will be discussed.
- "Gene Set Enrichment Analysis: A Knowledge-based Approach for Interpreting Genome-Wide Expression Profiles" by Subramanian A, et al.
- "Identification of Differentially Expressed Gene Categories in Microarray Studies Using Nonparametric Multivariate Analysis" by Nettleton D, et al.
December 14, 2011: Presented by Yunsheng Ma, MD, PhD
Determinants of Racial/Ethnic Disparities in Incidence of Clinical Diabetes in Postmenopausal Women in the United States: The Women’s Health Initiative 1993-2009
Although racial/ethnic disparities in diabetes risk have been identified, determinants of these differences have not been well-studied. Previous studies have considered dietary and lifestyle factors individually, but few studies have considered these factors in aggregate in order to estimate the proportion of diabetes that might be avoided by adopting a pattern of low-risk behaviors. Using data from the Women’s Health Initiative, we examined determinants of racial/ethnic differences in diabetes incidence.
- This paper, "Diet, lifestyle, and the risk of type 2 diabetes mellitus in women", by Hu et al., presented ways to analyze diabetes risk factors in aggregate in order to estimate the proportion of diabetes that might be avoided by adopting a pattern of low-risk behaviors.
Technical Level: Intermediate; Focus: Application; Data: Nurses’ Health Study; Methods: Cox proportional hazards models
November 9, 2011: Presented by: Nanyin Zhang, Ph.D.
In the presentation I will introduce the fundamental mechanisms of fMRI. I will also talk about potential applications of fMRI in understanding different mental disorders.
- Article #1 (Functional Connectivity and Brain Networks in Schizophrenia), by Mary-Ellen Lynall et al., tested the hypothesis that schizophrenia is a disorder of connectivity between components of large-scale brain networks by measuring aspects of both functional connectivity and functional network topology derived from resting-state fMRI time series acquired at 72 cerebral regions over 17 min from 15 healthy volunteers (14 male, 1 female) and 12 people diagnosed with schizophrenia (10 male, 2 female).
Technical Level: Intermediate; Focus: Application; Data: Real; Methods: Proof
- Article #2 (Hyperactivity and hyperconnectivity of the default network in schizophrenia and in first-degree relatives of persons with schizophrenia), by Susan Whitfield-Gabrieli, examined the status of the neural network mediating the default mode of brain function in patients in the early phase of schizophrenia and in young first-degree relatives of persons with schizophrenia.
Technical Level: Intermediate; Focus: Application; Data: Real; Methods: Proof
October 18, 2011: Presented by: Bruce A. Barton, Ph.D.
The Continuing Evolution of Randomized Clinical Trials – the Next Steps
Continuing the discussion initiated by Wenjun Li, Ph.D., at the April QHS/QMC Methods Workshop (“Role of Probability Sampling in Clinical and Population Health Research”), this workshop will discuss some proposed designs for randomized clinical trials (RCTs) which provide partial answers to some of the problems with the current design of RCTs, as well as possible next evolutionary steps in RCT design to better address the primary issues of patient heterogeneity and of generalizability of results.
- Tunis SR, Stryer DB, Clancy CM. Practical Clinical Trials. Increasing the Value of Clinical Research for Decision Making in Clinical and Health Policy. JAMA 2003;290:1624-1632.
- Peto R, Baigent C. Trials: The Next 50 Years. Large Scale Randomized Evidence of Moderate Benefits. BMJ 1998;317:1170-1171.
- Slides from Dr. Barton's presentation
September 20, 2011: Presented by Zi Zhang, MD, MPH
Using Address-Based Sampling (ABS) to Conduct Survey Research
The traditional random-digit-dial (RDD) approach for telephone surveys has become more problematic due to landline erosion and coverage bias. The dual-frame method, which employs both landlines and cell phones, is costly and complicated. We will discuss the use of the U.S. Postal Service Delivery Sequence File as an alternative sampling source in survey research. We will focus on sample coverage and response rate in reviewing this emerging approach.
- This article "A comparison of address-based sampling (ABS) versus random-digit dialing (RDD) for general population surveys" is authored by Link M, Battaglia MP, Frankel MR, Osborn L, Mokdad AH.
 * General description of the classifications: Technical Level: Intermediate; Focus: Application; Data: Real; Methods: Proof
- This article "Using address-based sampling to survey the general public by mail vs. 'web plus mail'" is authored by Messer BL, Dillman DA.
 * General description of the classifications: Technical Level: Intermediate; Focus: Application; Data: Real; Methods: Proof
July 19, 2011: Presented by: Jennifer Tjia, MD, MSCE:
Addressing the issue of channeling bias in observational drug studies
Channeling occurs when drug therapies with similar indications are preferentially prescribed to groups of patients with varying baseline prognoses. In this session, we will discuss the phenomenon of channeling using a specific example from the Worcester Heart Attack Study.
- This article, "Channeling Bias in the Interpretation of Drug Effects", is authored by H. Petri and J. Urquhart.
Technical Level: Intermediate; Focus: Application; Data: Real; Methods: Case Study
June 21, 2011: Presented by Mark Glickman, PhD:
Multiple Testing: Is Slicing Significance Levels Producing Statistical Bologna?
Procedures for adjusting significance levels when performing many hypothesis tests are commonplace in health/medical studies. Such procedures, most notably the Bonferroni adjustment, control for study-wide false positive rates, and recognize that the probability of a single false positive result increases with the number of tests. In this talk we establish, in contrast to common wisdom, that significance level adjustments based on the number of tests performed are, in fact, unreasonable procedures, and lead to absurd conclusions if applied consistently. We argue that an increased number of tests being performed may be confused with a low (prior) probability of each null hypothesis being true. This confusion may lead to the unwarranted multiplicity adjustment. We finally demonstrate how false discovery rate adjustments are a more principled approach to significance level adjustments in health and medical studies.
- This article, "Colloquy: Should Familywise Alpha Be Adjusted?", is authored by Daniel J. O'Keefe.
Technical Level: Introductory; Focus: Theory; Data: None; Methods: Conceptual
- This article, "Implementing false discovery rate control: increasing your power", is authored by Verhoeven, K.J.F., Simonsen, K.L. and McIntyre, L.M.
Technical Level: Intermediate; Focus: Theory; Data: Simulated; Methods: Conceptual
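To make the contrast between the two kinds of adjustment named in this abstract concrete, here is a minimal Python sketch that compares the Bonferroni adjustment with the Benjamini-Hochberg step-up procedure, one common false discovery rate method. The abstract does not say which false discovery rate procedure the talk used, so Benjamini-Hochberg is an assumption here, and the function names and p-values are hypothetical.

```python
def bonferroni_reject(p_values, alpha=0.05):
    """Reject each hypothesis whose p-value is at most alpha divided by the number of tests."""
    m = len(p_values)
    return [p <= alpha / m for p in p_values]


def benjamini_hochberg_reject(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the false discovery rate."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose ordered p-value is at most (k / m) * alpha.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            k_max = rank
    # Reject every hypothesis ranked at or below k_max.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            reject[idx] = True
    return reject


# Hypothetical p-values from ten tests, for illustration only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
bonferroni_hits = sum(bonferroni_reject(p_values))
bh_hits = sum(benjamini_hochberg_reject(p_values))
```

With these values the Bonferroni threshold is 0.005, so only the smallest p-value is rejected, while Benjamini-Hochberg also rejects the second smallest. The Bonferroni threshold shrinks with the number of tests alone, whereas the Benjamini-Hochberg cutoffs depend on how the ordered p-values are distributed.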
April 19, 2011: Presented by: Wenjun Li, PhD:
- This article, "The MOBILIZE Boston Study: Design and methods of a prospective cohort study of novel risk factors for falls in an older population" by Leveille SG et al., is an example of the application of probability sampling in a cohort study to ensure the representativeness of the cohort in relation to its underlying population.
Technical Level: Intermediate; Focus: Research Idea; Data: No; Methods: RCT/Cohort Study
- This article, "Step Ahead - A Worksite Obesity Prevention Trial Among Hospital Employees" by Lemon S et al., is an example of the application of sampling in an RCT that was carried out at UMass Chan.
Technical Level: Intermediate; Focus: Research Idea; Data: No; Methods: RCT/Cohort Study
Role of Probability Sampling in Clinical and Population Health Research
This workshop uses practical examples to illustrate the use of probability sampling in RCTs and population health studies. The approach is used to optimize generalizability, increase statistical power, and add value to the collected data by preserving the possibility of subgroup analysis.
- This article, "The role of sampling in clinical trial design", explains why probability sampling of study subjects is needed in RCTs.
Technical Level: Introductory; Focus: Research Idea; Data: No; Methods: RCT/Cohort Study
March 15, 2011:
The Peters-Belson Approach to study Health Disparities: Application to the National Health Interview Survey
Cancer screening rates vary substantially by race/ethnicity, and identifying factors that contribute to this disparity between minority groups and the white majority should aid in designing successful programs. The traditional approach for examining the role of race/ethnicity is to include a categorical variable, indicating minority status, in a regression-type model, whose coefficient estimates this effect. We applied the Peters-Belson (PB) approach, used in wage discrimination studies, to analyze disparities in cancer screening rates between different race/ethnic groups from the 1998 National Health Interview Survey (NHIS), and to decompose the difference into a component due to differences in the covariate values in the two groups and a residual difference. Regression models were estimated accounting for the complex sample design. Variances were estimated by the jackknife method, where a single primary sampling unit was considered as the deleted group, and compared to analytic variances derived from Taylor linearization. We found that among both men and women, most of the disparity in colorectal cancer screening and digital rectal exam rates between whites and blacks was explained by the covariates, but the same was not true for the disparity between whites and Hispanics.
- This article, "Understanding the Factors Underlying Disparities in Cancer Screening Rates Using the Peters-Belson Approach", is authored by Rao, Sowmya, Graubard BI, Breen, Nancy and Gastwirth, Joseph.
Technical Level: Introductory; Focus: Application; Data: Real (NHIS, 1998); Methods: Case Study
- This article, "Using the Peters-Belson method to measure health care disparities from complex survey data", is by Graubard BI, Rao, Sowmya, and Gastwirth, Joseph.
Technical Level: Intermediate; Focus: Theory and Application; Data: Real (NHIS, 1998); Methods: Proof
Dr Rao also would like to suggest a book for anyone who wants to analyze national surveys. This is "Analysis of Health Surveys" by Korn EL and Graubard BI. It was published by Wiley, New York, NY in 1999.
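The decomposition at the heart of the Peters-Belson approach can be illustrated with a short Python sketch. It is a deliberately simplified version, not the analysis described in the workshop: it uses one covariate, an unweighted least-squares line, and made-up numbers, whereas the workshop analysis accounted for the complex survey design. All function and variable names are hypothetical. The idea is to fit the model in the majority group only, use it to predict what the minority group's outcomes would be given their own covariate values, and then split the observed gap into a part explained by covariates and a residual part.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for a single covariate."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def peters_belson(x_major, y_major, x_minor, y_minor):
    """Split the mean outcome gap into explained and unexplained parts."""
    # Step 1: fit the outcome model in the majority group only.
    intercept, slope = fit_simple_ols(x_major, y_major)
    # Step 2: predict the minority group's outcomes from their own covariates.
    expected_minor = mean([intercept + slope * xi for xi in x_minor])
    # Step 3: decompose the observed difference in means.
    return {
        "total": mean(y_major) - mean(y_minor),
        "explained": mean(y_major) - expected_minor,
        "unexplained": expected_minor - mean(y_minor),
    }


# Hypothetical covariate values and outcome scores, for illustration only.
result = peters_belson(
    x_major=[1, 2, 3, 4, 5], y_major=[0.2, 0.4, 0.6, 0.8, 1.0],
    x_minor=[1, 2, 3], y_minor=[0.1, 0.3, 0.5],
)
```

In this toy example the total gap is 0.3, of which 0.2 is attributable to the difference in covariate values and 0.1 remains unexplained. The explained and unexplained parts always sum to the total gap by construction.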
February 15, 2011:
Multivariable Modeling Strategies: Uses and Abuses
This workshop will be hosted by George Reed, PhD and will discuss regression modeling strategies including predictor complexity and variable selection. The workshop will examine the flaws and uses of methods like stepwise procedures, and discuss how modeling strategies should be tailored to particular problems.
- This article, "Prognostic modelling with logistic regression analysis: a comparison of selection and estimation methods in small data sets", is authored by Steyerberg, Ewout, Eijkemans, Marinus, Harrell Jr, Frank, and Habbema, J. Dik.
Technical Level: Intermediate; Focus: Application; Data: Real; Methods: Case Study
- This article, "Selection of important variables and determination of functional form for continuous predictors in multivariable model building", is by Sauerbrei, Willi, Royston, Patrick and Binder, Harald.
Technical Level: Intermediate; Focus: Application; Data: Real; Methods: Case Study
Dr Reed would also like to recommend Chapter 4 from Frank Harrell's book, "Regression Modeling Strategies."
“REGRESSION MODELING STRATEGIES: Chapter 4: ‘Multivariable Modeling Strategies’ by Frank E. Harrell, Jr. Copyright 2001 by Springer. Reprinted by permission of Springer via the Copyright Clearance Center’s Annual Academic Copyright License.”
2010 Events
November 16, 2010:
Bootstrapping: A Nonparametric Approach to Statistical Inference
This workshop will discuss analytic approaches to situations where the sampling distribution of a variable is not known and cannot be assumed to be normal. Bootstrap resampling is a feasible alternative to conventional nonparametric statistics and can also be used to estimate the power of a comparison.
- This article, “Advanced Statistics: Bootstrapping Confidence Intervals for Statistics with “Difficult” Distribution”, by Haukoos et al., provides an overview of Bootstrap.
Technical Level: Introductory; Focus: Application; Data: None; Methods: Conceptual
- This article, “Using Permutation Tests and Bootstrap Confidence Limits to Analyze Repeated Events Data from Clinical Trials”, by Freedman et al., presents the bootstrap approach to clinical trial studies.
Technical Level: Intermediate; Focus: Application; Data: Real; Methods: Case Study
- This paper, “Confidence Intervals for Cost-effectiveness Ratios: A Comparison of Four Methods”, by Polsky et al., “evaluated four methods for computing confidence intervals for cost–effectiveness ratios developed from randomized controlled trials” including the nonparametric bootstrap method. Their findings show that overall probabilities of miscoverage for the nonparametric bootstrap method and the Fieller theorem method were more accurate than those for the other methods.
Technical Level: Introductory; Focus: Application; Data: Simulated; Methods: Simulation
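As a concrete illustration of the bootstrap resampling idea discussed in this workshop, here is a minimal Python sketch of a percentile bootstrap confidence interval. It shows only the simplest variant of the bootstrap (the readings above cover others), and the function name, parameters, and data are hypothetical. The sample is resampled with replacement many times, the statistic is recomputed on each resample, and the interval is read off the empirical distribution of those recomputed values, with no normality assumption.

```python
import random
from statistics import median


def bootstrap_percentile_ci(data, statistic, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for an arbitrary statistic."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    # Recompute the statistic on each resample drawn with replacement.
    estimates = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    # Read the interval off the sorted bootstrap estimates.
    tail = (1 - confidence) / 2
    lower = estimates[int(tail * n_resamples)]
    upper = estimates[min(n_resamples - 1, int((1 - tail) * n_resamples))]
    return lower, upper


# Hypothetical right-skewed sample (e.g. lengths of stay in days), for illustration only.
lengths_of_stay = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6, 7, 9, 12, 18, 30]
ci_lower, ci_upper = bootstrap_percentile_ci(lengths_of_stay, median)
```

The median is used here because its sampling distribution is awkward to derive analytically for a small skewed sample, which is the kind of situation the workshop description has in mind. Any other function of the sample could be passed as `statistic`.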
October 19, 2010:
Propensity Score Analyses, Part II
Last month's workshop spent a lot of time on the propensity score (PS) "basics" and ended with a rather hurried discussion of what variables do and don't belong in a PS model. This month we will address a range of more advanced issues, including the previously promised discussion of why and when it may not be a good idea to include "all available" variables in a PS analysis, and the pros and cons of PS matching vs. weighting vs. covariate adjustment.
- "Toolkit for Weighting and Analysis of Nonequivalent Groups" (Twang) offers functions for propensity score estimation and weighting, nonresponse weighting, and diagnosis of the weights. Related references are attached to these functions.
Technical Level: Introductory; Focus: Application; Data: None; Methods: Conceptual
- The article, "MatchIt: Nonparametric Preprocessing for Parametric Causal Inference", by Ho, Imai, King & Stuart.
Technical Level: Introductory; Focus: Application; Data: None; Methods: Conceptual
- This paper, "Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference", by Ho, Imai, King & Stuart, proposes a unified approach that makes it possible for researchers to preprocess data with matching and "then to apply the best parametric techniques". "This procedure makes parametric models produce more accurate and considerably less model-dependent causal inferences."
Technical Level: Intermediate; Focus: Theory; Data: Real; Methods: Case Study
September 21, 2010:
Propensity Score Analyses, Part I
This meeting will discuss the separate roles of propensity scores and instrumental variables. Time permitting, we will explore implementation issues in constructing propensity score models.
- Analyzing Observational Data: Focus on Propensity Scores (Powerpoint presentation by Arlene Ash, PhD)
- This draft article, "Observational Studies in Cardiology", by Marcus et al., provides a fairly straightforward, non-technical "review of three statistical approaches for addressing selection bias: propensity score matching, instrumental variables, and sensitivity analyses." There are many other places where such issues are discussed.
Technical Level: Introductory; Focus: Application; Data: Real; Methods: Case Study
- This paper, "Variable Selection for Propensity Score Models", by Brookhart et al., presented "the results of two simulation studies designed to help epidemiologists gain insight into the variable selection problem" in a propensity score analysis.
Technical Level: Intermediate; Focus: Theory; Data: Simulated; Methods: Simulation
