Statistical Workshops
The Center for Statistical Computing (CSC) welcomes all graduate students, staff, and faculty to participate in our statistics workshops. These sessions are held on Zoom and/or in Green Labs on the upper level (UL) of Healey Library. Our workshops cover statistical software such as SPSS, SAS, Stata, Excel, R, RStudio, and Python. We also offer workshops on applied statistics topics, including recently developed statistical methods, using tools such as SPSS, SAS, Stata, R, AMOS, Mplus, and WinBUGS. Descriptions for each workshop are provided below:
Summer 2024 Statistics Workshop Schedule
Statistics Workshop Descriptions
This is a hands-on workshop designed to empower attendees with the skills to conduct meaningful data analysis using SPSS for Windows. Topics covered include entering and reading data, documenting variable and value labels, examining frequency and crosstab tables for individual and group data, recoding variables, performing independent sample t-tests, and conducting simple linear regression.
This workshop delves into advanced data management and statistical procedures. Data management topics include case selection, combining cases from two files, and linking files that contain different information. Statistical procedures covered include the chi-square test, one-way ANOVA, repeated-measures analysis, non-parametric statistics, multiple regression, and logistic regression.
This workshop emphasizes conducting fundamental statistical analyses, including descriptive statistics, frequency distributions, chi-square tests, independent sample t-tests, one-way ANOVA, and linear and logistic regressions. Additional topics cover downloading and installing R packages, reading and writing data files, and creating R graphs. R is free, open-source software supported by a strong user community.
This workshop serves as an introduction to Stata, covering both the graphical user interface and the intuitive command syntax. It aims to teach fundamental Stata operations efficiently. Topics covered include browsing data, data management, descriptive statistics, independent samples t-tests, and simple linear regression models.
This workshop delves into advanced data management topics, including data transformation, recoding variables, and constructing new variables. It also covers the use of log files and do-files, and explores further statistical procedures such as the chi-square test, one-way ANOVA, multiple linear regression with regression diagnostics, and logistic regression.
The workshop provides valuable tips for enhancing efficiency in data analysis with Excel. Topics covered include entering data, organizing data and performing descriptive statistics, examining frequencies and crosstab tables, conducting independent and paired sample t-tests, correlation analysis, and linear regression.
This workshop provides an introduction to the SAS system, focusing on the SAS DATA STEP with an emphasis on data input, manipulation, output, and summary. Topics covered include creating SAS working data sets and files, importing data from SPSS and Excel files, formatting variable and value labels, and conducting simple statistical procedures such as PROC FREQ and PROC MEANS.
This workshop explores the analysis of designed experiments with PROC ANOVA and PROC GLM, along with linear regression and generalized linear models using PROC REG and PROC GENMOD. Topics covered include one-way and two-way analysis of variance, simple and multiple linear regression, regression diagnostics, and logistic regression.
This workshop provides an overview of RStudio and the SAGE Campus platform. RStudio, a user-friendly integrated development environment for the R language, is explored alongside SAGE Campus, a learning platform offering online courses for skills and research methods. This workshop covers key R concepts, including elementary data structures, atomicity, plotting using ggplot2, regression plotting, and logistic regression. The content is based on the course offered by SAGE Campus.
This workshop will involve downloading COVID-19 data for U.S. states and for Massachusetts from the Center for Systems Science and Engineering of Johns Hopkins University and the Massachusetts Department of Public Health (DPH). We will employ time series and spatial regression models to analyze the COVID-19 data, using R packages such as forecast, tseries, spdep, maptools, and ggplot2. Additionally, this workshop will demonstrate how to use R to generate reports for COVID data.
This workshop introduces techniques for structural equation modeling (SEM). SEM is employed to test complex relationships between observed (measured) and unobserved (latent) variables. Topics covered include fundamentals underlying SEM, SEM notation, path diagrams, data preparation, mediation analysis, path analysis, parameter estimation, and assessment of model fit. AMOS and R are used to demonstrate examples.
The second SEM workshop delves into advanced topics including measurement error, latent variable analysis, exploratory factor analysis (EFA), confirmatory factor analysis (CFA), and the development, estimation, and testing of structural equation models. Additionally, this workshop introduces latent growth models for longitudinal data. R and AMOS are used to demonstrate model structures, parameter estimation, and model modification.
This workshop provides an overview of the fundamental principles of multilevel/hierarchical linear models. Topics include the necessity for appropriate methods to model dependencies (e.g., clustering of students within schools), formulating and interpreting two-level multilevel models and their relevant parameters, and using SPSS to estimate model parameters.
This workshop covers sample size determination and power estimation for various statistical comparisons and tests using PROC POWER in SAS.
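As a rough illustration of what a sample size calculation involves, the sketch below uses the textbook normal approximation for comparing two group means. It is written in Python rather than SAS, the function name `n_per_group` and the input values are hypothetical, and it is not a reproduction of PROC POWER, whose t-based calculations generally give slightly larger answers.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per group for a two-sided, two-sample
    comparison of means, using the normal approximation:
        n = 2 * ((z_{1 - alpha/2} + z_{power}) / d) ** 2
    where d is the standardized effect size (Cohen's d)."""
    z = NormalDist()  # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

# Assumed example: medium effect (d = 0.5), 5% significance level, 80% power.
n = n_per_group(0.5)
print(n)
```

Halving the effect size roughly quadruples the required sample, which is why the assumed effect size matters so much in planning.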
Missing data is a common issue in many datasets. By default, most statistical software packages drop entire cases with missing values from the analysis (listwise deletion), which can lead to small sample sizes and biased results. This workshop covers missing data mechanisms, non-random selection bias analysis, and methods of single and multiple imputation (MI) using SAS and Stata.
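To make the cost of dropping incomplete cases concrete, here is a minimal sketch in Python (not the SAS or Stata procedures used in the workshop). The toy records and the helper names `complete_cases` and `mean_impute` are hypothetical. Mean imputation is shown only as the simplest form of single imputation.

```python
from statistics import mean

# Hypothetical toy records; None marks a missing value.
records = [
    {"age": 25, "income": 40.0},
    {"age": 31, "income": None},
    {"age": None, "income": 55.0},
    {"age": 47, "income": 62.0},
    {"age": 52, "income": None},
]

def complete_cases(rows):
    """Listwise deletion: keep only rows with no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]

def mean_impute(rows, key):
    """Single imputation: fill missing values of `key` with the observed mean."""
    observed = mean(r[key] for r in rows if r[key] is not None)
    return [{**r, key: observed if r[key] is None else r[key]} for r in rows]

kept = complete_cases(records)
imputed = mean_impute(records, "income")
print(f"{len(records)} cases before deletion, {len(kept)} after")
```

Mean imputation keeps every case but understates variability, which is one motivation for multiple imputation.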
This is an introductory workshop in statistical learning focusing on the important elements of modern data analysis such as regression and classification methods. Topics covered include linear and logistic regression, linear discriminant analysis, cross-validation, principal components, and clustering. Data analysis examples in this workshop are demonstrated using R.
This workshop introduces spatial modeling, exploring tools such as R’s maptools and spdep packages. The workshop covers essential topics including spatial data visualization in R, understanding spatial autocorrelation, statistical methods for spatial dependence, creating spatial weights, and building spatial regression models.
This workshop emphasizes the practical aspects of time series analysis. Methods are introduced in stages, starting with terminology and exploratory graphics, moving to descriptive statistics, and ending with practical modeling procedures: choosing an appropriate time series forecasting method, fitting a model, evaluating its performance, and using it for forecasting. The workshop focuses on the most popular business forecasting methods: regression models, smoothing methods including Moving Average (MA) and Exponential Smoothing, and Autoregressive (AR) models. Practical implementation in R is illustrated at each stage of the workshop.
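The two smoothing methods named above can be sketched in a few lines. This is an illustrative Python version, not the R code used in the workshop. The function names, the window length, and the smoothing weight `alpha` are assumptions chosen for the example.

```python
def moving_average(series, window):
    """Trailing moving average: one value for each full window of observations."""
    return [
        sum(series[i - window:i]) / window
        for i in range(window, len(series) + 1)
    ]

def exponential_smoothing(series, alpha):
    """Simple exponential smoothing: s_t = alpha * y_t + (1 - alpha) * s_{t-1},
    initialized with the first observation."""
    smoothed = [series[0]]
    for y in series[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed

# Hypothetical monthly sales figures.
sales = [10, 20, 30, 25, 35]
ma = moving_average(sales, window=3)
ses = exponential_smoothing(sales, alpha=0.5)
print(ma)
print(ses)
```

A larger `alpha` makes the smoothed series react faster to recent observations, while a longer window makes the moving average smoother but slower to respond.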
SAGE Campus is a learning platform that offers online courses for skills and research methods. These fully self-paced courses feature an engaging mix of video content, interactive elements, and formative assessments. This workshop provides an overview of SAGE Campus courses and guides students in setting up an account to enroll in SAGE Campus courses. The session will use the SAGE online course “Introduction to R” as an example.
This workshop presents statistical methods of survival analysis using SPSS, focusing on studies where the outcome is a time-to-event variable. It covers the estimation of survival time using the life table and Kaplan-Meier methods, as well as modeling survival risk. It also assesses the relationship between risk factors and survival times using the Cox regression model. SPSS 28.0 will be used for data analysis.
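For readers who want to see what the Kaplan-Meier method computes, the following is a minimal Python sketch of the product-limit estimator. It is not SPSS output. The function name `kaplan_meier` and the toy follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times  : follow-up time for each subject
    events : 1 if the event was observed, 0 if the subject was censored
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all subjects who share the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both events and censored subjects leave the risk set after time t.
        at_risk -= removed
    return curve

# Hypothetical data: the subject followed to time 2 is censored.
curve = kaplan_meier(times=[1, 2, 3, 4], events=[1, 0, 1, 1])
print(curve)
```

Censored subjects never lower the curve directly, but they shrink the risk set, so later events carry more weight.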
An event study is a causal inference research design for analyzing the impact of a specific event on an outcome of interest over a defined time period. The event can be treated as the treatment in a difference-in-differences (DiD) analysis, and the dynamics of the impact can be assessed by comparing changes in outcomes over time between the treated and control groups. This workshop uses several R packages for event-study regression, specifically fixest, plm, and did. Topics covered include data preparation, DiD analysis, the dynamic DiD model, and the graphical display of dynamic event effects.
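The comparison of changes between treated and control groups reduces, in its simplest two-group, two-period form, to a difference of differences in group means. The sketch below shows that arithmetic in Python with made-up numbers. It does not use fixest, plm, or did, and the function name `did_estimate` is hypothetical. It assumes the usual parallel-trends condition holds.

```python
from statistics import mean

def did_estimate(rows):
    """Two-by-two difference-in-differences estimate.

    rows: iterable of (treated, post, outcome), with treated and post coded 0/1.
    Returns (treated post - treated pre) - (control post - control pre).
    """
    rows = list(rows)

    def cell(treated, post):
        return mean(y for g, p, y in rows if g == treated and p == post)

    return (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))

# Hypothetical outcomes: (treated, post, outcome)
data = [
    (1, 0, 10.0), (1, 0, 12.0),   # treated group, before the event
    (1, 1, 18.0), (1, 1, 20.0),   # treated group, after the event
    (0, 0, 9.0),  (0, 0, 11.0),   # control group, before the event
    (0, 1, 12.0), (0, 1, 14.0),   # control group, after the event
]
effect = did_estimate(data)
print(effect)
```

Here the treated group rises by 8 and the control group by 3, so the estimated effect of the event is 5. A dynamic DiD model repeats this comparison for each period relative to the event.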