Oceania Stata Conference 2026 – Virtual

5 February 2026

Oceania Stata Conference Presentations

We are excited to introduce our speakers for the 2026 Oceania Stata Conference!

Nicholas Cox

Some Graphical Tips for Stata Users

The talk will cover a miscellany of graphical tips, some old, some new. Nick will discuss using both official commands and community-contributed commands. He will range from small stuff (often it is a matter of detail to change a good graph into a much better one), through various techniques and tricks, to broad strategy, both in learning and using Stata easily and effectively and in working with graphics for research, teaching, and service.

About the speaker

Nicholas Cox is a statistically minded geographer at Durham University. He works mostly with environmental data, secondarily with social science data. His interests include statistical graphics, exploratory data analysis, distributions, transformations, generalised linear models, directional data analysis, and the history of statistics. He contributes talks, postings, FAQs, and programs to the Stata user community. He has co-authored 16 commands in official Stata. He was an author of several inserts in the Stata Technical Bulletin and is an editor of the Stata Journal. His “Speaking Stata” articles on graphics from 2004 to 2013 have been collected as Speaking Stata Graphics (2014). He also edits the Tips in the Stata Journal, intermittently collected in book form (most recently in 2024).

Aramayis Dallakyan

Introduction to Explainable Machine Learning Using Stata

Machine learning (ML) has become a powerful tool for modeling complex data and providing accurate predictions. However, the "black-box" nature of many ML models often raises concerns about their explainability and trustworthiness. Explainable machine learning (XML) seeks to address these concerns by enhancing the transparency and understanding of ML predictions. This talk aims to provide a practical guide to XML techniques. It begins with an overview of ensemble decision tree models such as random forests and gradient boosting, which are widely used but often difficult to interpret. Aramayis then introduces methods for explaining predictions using both global and local XML techniques. These include state-of-the-art approaches such as SHAP values, individual conditional expectation (ICE) plots, variable importance measures, partial dependence plots, and global surrogate models.

About the speaker

Aramayis Dallakyan is a Senior Statistician and Software Developer at StataCorp LLC. His research interests lie at the intersection of high-dimensional time series, causal discovery, and statistical/machine learning. His work has appeared in leading venues in statistics, data science, and machine learning. Aramayis earned his PhD in Statistics from Texas A&M University.

Zixuan Cong

Preliminary Findings on Advancing Women's Health in Singapore through AI Acceptance

As artificial intelligence rapidly advances, it can integrate electronic health records, genetic profiles, and clinical information to enable individualized, female-specific prevention strategies. This presentation uses structural equation modelling in Stata to analyse survey data and examine factors influencing Singapore women’s adoption of AI-enabled healthcare.

About the speaker

Zixuan Cong is a Master's student in Behavioral and Implementation Science Interventions at the National University of Singapore, with hands-on experience in data analysis and user research with Stata. She is committed to addressing real-world challenges in health and behavioral sciences through rigorous, data-driven insights.

Pablo Gluzmann

SAMREGC: Stata Module to Perform Sensitivity Analysis of Main Regression Coefficients

This presentation introduces samregc, a fast, flexible, and simple Stata command that systematizes specification-based sensitivity analysis. It evaluates the robustness of target coefficients by analyzing all-subsets (or user-defined subsets) regression results over alternative combinations of control variables.

About the speaker

Pablo Gluzmann holds a PhD in Economics and is a Senior Researcher at CEDLAS-UNLP and CONICET, specializing in applied econometrics. He has been using Stata for more than 20 years for economic research and the development of user-written commands.

Andrew Gray

"Can I just use n=30 in each group?": Using Stata for Sample Size Determination in an Increasingly Complex World

In this talk, Andrew will outline some of the practical and technical challenges involving sample size that he faces as a biostatistician in the health sciences. He will describe how Stata supports a workflow that includes simulations when needed, using examples from recent research projects.

About the speaker

Andrew Gray is a biostatistician in the Biostatistics Centre, University of Otago, where he collaborates on a wide range of health-related research projects as well as pursuing his own research. Prior to this, Andrew worked in a knowledge engineering research group in the Department of Information Science, University of Otago.

Yuke Li

How Cooking and Eating at Home Shape Emotional Well-Being: Insights from the Food & You Survey

Using data from the Food & You Survey, this study aimed to evaluate the behavioural pathways linking emotional well-being, cooking behaviour, and eating-at-home practices, and to identify leverage points for public-health and behavioural interventions. Partial Least Squares Structural Equation Modelling was performed with Stata to assess these directional relationships.

About the speaker

Yuke Li is a Master’s student in Health Behavior and Implementation Science at the National University of Singapore, with a cross-disciplinary background in Food Science and Engineering. She specializes in leveraging Stata for regression analysis and statistical modeling, having analyzed over 500 datasets to explore the correlations between dietary behaviors, food safety, and mental health in her leading research.

Dean McKenzie

Healthcare Quality Control and Improvement Using Stata

Stata is widely used in healthcare to compare events such as falls, infections and episodes of delirium over time using control charts across hospital wards or across hospitals. This presentation demonstrates methods including control charts, funnel plots, ANOM and contrast, and user-written CHAID, with the goal of developing healthcare quality improvement techniques.

About the speaker

Dean is a biostatistician with Epworth HealthCare, formerly with Monash University. He has been using Stata for more than 25 years, in a wide variety of medical, epidemiological and psychological applications.

Irma Mooi-Reci

XTVFREG: Stata Module for Estimating Variance Function Panel Regression

This presentation introduces xtvfreg, a new Stata module that implements an iterative mean-variance panel regression estimator in which both the conditional mean and the conditional variance of the dependent variable are modeled as functions of covariates. The estimator is designed for researchers working with panel data in which heteroskedasticity is substantively meaningful.

About the speaker

Irma studies labor market and career mobility dynamics and has been exploring these patterns with Stata since 2005. From tracking career jumps to decoding job flows, Irma turns complex labor data into clear insights and occasionally into graphs that make labour scholars smile.

Marianna Nitti

rdlasso: Regression Discontinuity with High-Dimensional Data

This presentation discusses a command, rdlasso, which allows the inclusion of high-dimensional covariates in Regression Discontinuity Design (RDD) settings. The command covers both sharp and fuzzy cases, making the methodology accessible to Stata users and automating the covariate selection procedure.

About the speaker

Marianna Nitti is a PhD student at Sapienza University of Rome. Her research interests include labor, family and gender economics. She is currently working on Stata and Python tools for data analysis.

Mathew Piercy

Fluid Balance in Postoperative Patients and Relationship to Acute Kidney Injury

This paper presents an audit of 330 postoperative patients examining the relationship between postoperative cumulative fluid balance over 7 days and the incidence and rate of recovery of acute kidney injury. The data was analysed with Stata using a zero-inflated Poisson regression model and compared to a GEE model and two models using H2O machine learning.

About the speaker

Matt is a newcomer to Stata, having joined the Stata community at the beginning of 2025. For the past 6 years he has worked with the Tasmanian Health Service at Northwest Regional Hospital Burnie as a staff specialist in ICU and anaesthetics.

Alannah Rudkin

The Use of Stata putdocx for Automating Data Safety Monitoring Committee Reports

Data safety monitoring committee meetings for clinical trials necessitate the creation of a statistical report from which the trial's safety, progress, and data integrity can be assessed. The presentation shows how much of the repetitive process can be streamlined and automated using Stata’s putdocx commands via a do-file.

About the speaker

Alannah is a biostatistician and project officer at the Murdoch Children's Research Institute in Melbourne, Australia.

Thomas Soseco

Household Net Wealth Inequality in Indonesia: Evidence from a Dagum Type III Model

Investigating household net wealth inequality in Indonesia is important because inequality can worsen for low-class individuals or households who are unable to pass on sufficient capital to the next generation or to maintain financial stability during a period of low or no income. This paper applies the Dagum Type III model to measure household net wealth inequality in Indonesia.

About the speaker

Thomas Soseco has over 10 years of experience using Stata, with a focus on its application in Economics and Development Economics.

Xuelu Sun

gofbinreg: Goodness-of-fit Statistics in Binary Regression Models

When reporting the results of binary regression, it’s crucial to evaluate the overall model adequacy using goodness-of-fit statistics. This presentation introduces a command, gofbinreg, which assesses the performance of the Hosmer–Lemeshow, normalised unweighted sum of squares, and Hjort–Hosmer statistics to evaluate overall model adequacy.

About the speaker

Xuelu is a PhD candidate at Swinburne University of Technology. Her research focuses on goodness-of-fit statistics in regression models with categorical outcomes and further Stata implementation.

Luyang Xiao

Young Hearts at Risk: Preliminary Insights into Detection and Personalized Management of Acute Myocardial Infarction

This study investigates acute myocardial infarction among younger adults in Singapore and characterizes their clinical and risk profiles to guide early detection and targeted management. Using data from the National Registry of Diseases Office, a retrospective cohort of patients was analysed for 1-year all-cause mortality using Bayesian proportional hazards models in Stata.

About the speaker

Luyang Xiao is a Master's student in Behavioural and Implementation Science in Healthcare at the National University of Singapore (NUS). With four years of Stata experience, he focuses on leveraging real-world evidence to analyze public health issues and design data-driven behavioural interventions for individuals and organizations.

Ricardo Rodolfo Retamoza Yocupicio

Limitations and Comparison of the DFA PP and KPSS Unit Root Test: Evidence for Labour Variables of Mexico

Unit root tests have made a major contribution to time series analysis by detecting whether variables are stationary. However, this presentation reviews some of the criticisms that have been made of unit root tests by running the three best-known tests in Stata on the main macroeconomic variables of Mexico, with the aim of analyzing, both graphically and technically, whether the series are stationary.

About the speaker

Ricardo Retamoza is a PhD candidate in Economics at the National Autonomous University of Mexico (UNAM). He has worked with Stata on time series analysis, unit root tests, cointegration models, and vector autoregression applied to problems in the Mexican labor market such as informal employment, underemployment, unemployment, etc. He has previously participated in Stata Conferences held in Oslo, Norway, and São Paulo, Brazil.

Hengni Yuan

Artificial Intelligence in Suicide Prevention: Comparative Evidence from a Network Meta Analysis

Artificial intelligence is emerging as a powerful tool in suicide prevention. Often outperforming traditional assessments, machine-learning models can analyse electronic health records and social-media language to identify subtle behavioural cues that precede suicidal thoughts or actions. This study applies network meta-analysis to the systematic review by Lejeune et al. (2022), which highlights the potential of AI in improving suicide-risk detection, screening, and monitoring.

About the speaker

Hengni is a Master's student in Behavioural and Implementation Sciences in Health at the NUS Yong Loo Lin School of Medicine. She is developing her expertise in applying Stata for data analysis within implementation and evaluation research in healthcare.

Shufan Zhao

Multistate Survival Modelling of Cardiovascular Admission and Mortality in a Heart Failure Cohort in Singapore

Heart failure patients often experience complex clinical trajectories involving hospitalisation and death. Conventional survival models that focus on a single endpoint may fail to capture these sequential outcomes adequately. This study applies a multistate survival framework to characterise transitions to cardiovascular admission and death.

About the speaker

Having graduated with a master's degree in Behavioural and Implementation Science from the National University of Singapore, Shufan is a Stata user with developing proficiency and a strong interest in conducting advanced statistical modelling in health research.

Swift Stata Stories

Mark Chatfield

Finding Incorrect References to Variable Names or Event Names in a REDCap Database

Manually checking a REDCap database before it goes into production mode is a laborious task. By importing the data dictionary into Stata and then extracting references to [variable-names] and [event-names] in calculations, branching logic etc., I show how some basic checks can be done quickly.

About the speaker

Mark is a biostatistician at The University of Queensland. He collaborates with researchers in the Faculty of Health, Medicine and Behavioural Sciences and the UQ Clinical Trials Centre.

Cindy Han

Uncovering Mathematical Values: A Stata-Powered Bilingual Text Analysis Tool

This story presents the development and application of the Values Automatic Sorting Algorithm, a custom text-analysis tool built entirely within Stata to efficiently process large-scale, open-ended bilingual survey data.

About the speaker

Dr Cindy Han is a Lecturer in the Department of Education at Swinburne University of Technology and a member of the Advisory Board for the Melbourne Girls’ Grammar Institute (MGGI). As a quantitative researcher, she specialises in using Stata to model the factors influencing educational outcomes, including student performance in mathematics and science.

Gabor Mihala

A Minimalist Approach to Version Control and Reproducibility in Stata

This story introduces a deliberately simple, almost too simple, approach to version control and reproducibility for in-house Stata workflows: a short paragraph of Stata code that can be pasted into any do-file. This code automatically records the development history of the file and supports reproducible results.

About the speaker

Gabor is a biostatistician at the Australasian Kidney Trials Network. He has 15 years of experience using Stata to analyse sensitive data from clinical trials and health research studies.