# CARMA Retreat 2015

## Abstracts

Speaker: Salman Cheema
Title: How Large Should a Large Aggregate Association Index (AAI) be?
Supervisors: Eric Beh and Irene Hudson
Abstract:
Aggregate data arises in situations where survey research or other means of collecting individual-level data are either infeasible or inefficient. The recent increasing use of aggregate data in the statistical and allied fields (including epidemiology, education and social sciences) has arisen for a number of reasons. These include the questionable reliability of estimates when sensitive information is required, the imposition of strict confidentiality policies on data by government and other organisational bodies, and the impossibility, in some contexts, of collecting the information that is needed. In this paper we present a novel approach to quantify the statistical significance of the extent of association that exists between two dichotomous variables when only the aggregate data is available. This is achieved by examining a newly developed index, called the aggregate association index (AAI), developed by Beh (2008, 2010), which quantifies the overall extent of association about individuals that may exist at the aggregate level when individual-level data is not available.
The applicability of the technique is demonstrated using the leukaemia relapse data of Cave et al. (1998). This data is presented in the form of a contingency table that cross-classifies the follow-up status of leukaemia relapse by whether cancer traces were found (or not) on the basis of polymerase chain reaction (PCR), a modern method used to detect cancerous cells in the body that was assumed superior to the conventional method of that period, microscopic identification. Assuming that the joint cell frequencies of this table are not available, and that the only available information is contained in the aggregate data, we first quantify the extent of association that exists between both variables by calculating the AAI. This index shows that the likelihood of association is high. As the AAI has been developed by exploiting Pearson's chi-squared statistic, the AAI inherently suffers from the well-known large sample size effect that can overshadow the true nature of the association shown in the aggregate data of a given table. However, in this presentation we show that the impact of sample size can be isolated by generating a pseudo population of 2x2 tables under the given sample size. Therefore, the focus of this paper is to present an approach to help answer the question "is this high AAI value statistically significant or not?" by using aggregate data only. The answer to this question lies, we believe, in the calculation of the p-value of the nominated index. We shall present a new method of numerically quantifying the p-value of the AAI, thereby gaining new insights into the statistical significance of the association between two dichotomous variables when only aggregate-level information is available. The pseudo p-value approach suggested in this paper enhances the applicability of the AAI and thus can be considered a valuable addition to the literature of aggregate data analysis.
Keywords: Aggregate data, Aggregate Association Index, pseudo p-values, Ecological inference, sample size
Speaker: David Franklin
Title: Hardy Spaces and Paley-Wiener Spaces for Clifford-valued functions
Supervisor: Jeff Hogan
Abstract:
In one dimension, the Hardy and Paley-Wiener spaces relate the support of a function's Fourier transform to the growth rate of its analytic extension. In this talk we show that analogues of these spaces exist for Clifford-valued functions in $n$ dimensions, using the Clifford-Fourier transform of Brackx et al. and the monogenic ($(n+1)$-dimensional) extension of these functions.
Title: Concentration around a hyperplane in a quasi-normed space
Supervisor: Jon Borwein
Abstract:
It is shown that if the small ball probability of a random sum of vectors in a finite dimensional quasi-normed space does not decay too fast, then many of them are concentrated around a hyperplane. Joint work with Omer Friedland (Paris VI) and Olivier Guédon (Paris Est).
Speaker: Cyriac Grigorious
Title: On the Metric Dimension of Extended de Bruijn Digraphs and Extended Kautz Digraphs
Supervisors: Mirka Miller and Joe Ryan
Abstract:
A metric basis for a digraph $G(V,A)$ is a minimum set $W \subset V$ such that for each pair of distinct vertices $u$ and $v$ of $V$, there is a vertex $w \in W$ such that the length of a shortest directed path from $w$ to $u$ is different from the length of a shortest directed path from $w$ to $v$ in $G$; that is, $d(w,u) \neq d(w,v)$. The cardinality of a metric basis of $G$ is called the metric dimension and is denoted by $\beta(G)$. We solve the metric dimension problem for extended de Bruijn and extended Kautz digraphs.
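The definition above can be checked directly on very small digraphs. The following is a minimal brute-force sketch, not the method of the talk: the function names are hypothetical, the digraph is assumed to be given as an adjacency dictionary, and unreachable vertices are assumed to sit at infinite distance.

```python
from collections import deque
from itertools import combinations


def distances_from(source, adj):
    """BFS lengths of shortest directed paths from source.

    Vertices that cannot be reached from source are simply absent.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def metric_basis(adj):
    """Return a smallest resolving set W by exhaustive search (exponential time)."""
    vertices = sorted(adj)
    dist = {w: distances_from(w, adj) for w in vertices}
    inf = float("inf")  # assumption: unreachable means infinite distance
    for size in range(1, len(vertices) + 1):
        for W in combinations(vertices, size):
            # W resolves the digraph when every vertex gets a distinct
            # vector of distances (d(w, v) for w in W).
            codes = {tuple(dist[w].get(v, inf) for w in W) for v in vertices}
            if len(codes) == len(vertices):
                return set(W)
    return set(vertices)
```

For the directed cycle `{0: [1], 1: [2], 2: [3], 3: [0]}` a single vertex already resolves every pair, so the search returns a basis of size one. The search is only practical for a handful of vertices.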
Speaker: John Harrison
Title: Asymptotic behaviour of matrix random walks
Supervisor: George Willis
Abstract:
I will describe the Poisson boundary of a discrete family of matrix groups under certain weak restrictions. The Poisson boundary is a space associated with every random walk on a locally compact group which encapsulates the behaviour of the walks at infinity and gives a description of certain harmonic functions on the group in terms of the essentially bounded functions on the boundary. I will introduce random walks and the Poisson boundary during the talk.
Speaker: Colin Reid — DECRA
Title: Chief series of locally compact groups
Abstract:
Locally compact groups are groups that also have a locally compact topology, compatible with the group structure. This class of groups includes every group with the discrete topology (where the topology is irrelevant) and also compact groups ('small' from a topological perspective), but a general locally compact group does not decompose into compact and discrete groups. A chief factor of a locally compact group $G$ is a quotient $K/L$, such that $K$ and $L$ are closed normal subgroups of $G$ and no closed normal subgroup of $G$ lies strictly between $K$ and $L$. A chief series is a series $1 = K_0 < K_1 < \dots < K_n = G$ of closed normal subgroups of $G$ such that $K_{i+1}/K_i$ is a chief factor for all $i$. Locally compact groups do not generally have finite chief series, as they have too many compact and discrete factors. However, it turns out that if $G$ is generated by a compact subset, then $G$ has a finite 'essentially' chief series, such that every factor is compact, discrete or a chief factor. The 'large' chief factors are also unique in a certain sense. This is joint work with Phillip Wesolek.
Speaker: Björn Rüffer — new lecturer
Title: The dynamics of monotone vector inequalities
Abstract:
We will investigate nonlinear versions of the relation between a monotone vector inequality and qualitative properties of an associated dynamics. The linear version of this vector inequality is $x \leq Ax + b$ (component-wise), where $x$ and $b$ are non-negative vectors, $A$ is a non-negative matrix, and the objective is to bound $x$. The matrix $A$ induces a dynamical system and, in the linear case, stability properties of that system are in one-to-one correspondence with the satisfiability of the inequality. For the nonlinear case we replace $A$ by a monotone mapping and obtain some interesting insights.
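As a sketch of why the linear inequality bounds $x$, one can iterate it, using only that $A \geq 0$ preserves the component-wise order. The stability condition assumed here, that the spectral radius satisfies $\rho(A) < 1$, is one standard reading of "stability" and is not stated in the abstract.

```latex
\begin{align*}
x &\le Ax + b \\
  &\le A(Ax + b) + b = A^2 x + Ab + b \\
  &\;\;\vdots \\
  &\le A^k x + \sum_{j=0}^{k-1} A^j b .
\end{align*}
% Assumption: \rho(A) < 1, so that A^k x \to 0 as k \to \infty.
\[
x \le \sum_{j=0}^{\infty} A^j b = (I - A)^{-1} b .
\]
```

Under that assumption the bound on $x$ depends only on $A$ and $b$.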
Speaker: Amir Salehipour
Title: Tools to analyze impact of backtest overfitting on investment strategies
Supervisor: Jon Borwein
Abstract:
In mathematical finance, backtest overfitting refers to the use of historical market data (a backtest) to develop an investment strategy, where too many variations of the strategy are tried, relative to the amount of data available. Backtest overfitting is now thought to be a primary reason why quantitative investment models and strategies that look good on paper often disappoint in practice. In this talk we introduce two online tools, the Backtest Overfitting Demonstration Tool (BODT) and the Tenure Maker Simulation Tool (TMST), which illustrate the impact of overfitting on investment models and strategies.
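The effect described above can be reproduced with a small simulation. The sketch below is not the BODT or the TMST; it is a hypothetical illustration in which every "strategy variation" is pure noise with zero expected return, and the variation with the best in-sample Sharpe ratio is then scored on fresh data. The function names, the 250-day window and the 1% daily volatility are all assumptions.

```python
import random
import statistics


def sharpe(returns):
    """Per-period Sharpe ratio (mean over standard deviation), not annualised."""
    return statistics.mean(returns) / statistics.stdev(returns)


def overfit_demo(n_trials, n_days=250, seed=0):
    """Pick the best of n_trials noise strategies in-sample, then test it out-of-sample.

    Returns (best in-sample Sharpe, out-of-sample Sharpe of that same strategy).
    """
    rng = random.Random(seed)
    best_in, best_out = float("-inf"), None
    for _ in range(n_trials):
        # Each variation is pure noise: its true expected return is zero.
        in_sample = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        out_sample = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        s_in = sharpe(in_sample)
        if s_in > best_in:
            best_in, best_out = s_in, sharpe(out_sample)
    return best_in, best_out
```

Calling `overfit_demo(200)` and comparing the two returned values shows the typical pattern: the selected in-sample figure is inflated by the selection itself, while the out-of-sample figure is just another draw of noise.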
Speaker: Sudeep Stephen
Title: Power Domination in de Bruijn and Kautz digraphs
Supervisors: Mirka Miller and Joe Ryan
Abstract:
Let $G(V,A)$ be a connected digraph. We call a set $W$ of vertices critical if there is no vertex outside $W$ which has fewer than two neighbors in $W$. In other words, $W \subseteq V$ is critical if $|N^{+}_{W}(i)| > 1$ for every $i \in N^{-}_{W}$. If $W$ is critical, but no proper subset of $W$ is critical, then we call $W$ minimal critical. A vertex set $S$ is a power dominating set if and only if $N^{+}_{G}(S) \cap W \neq \emptyset$ for every minimal critical set $W$. In this talk, I discuss the results obtained for the power domination problem in de Bruijn and Kautz digraphs.
Speaker: R. Sundara Rajan
Title: On Network Embeddings
Supervisors: Mirka Miller and Joe Ryan
Abstract:
Interconnection networks provide an effective mechanism for exchanging data between processors in a parallel computing system. An interconnection network is often represented as a graph, where nodes represent processors and edges correspond to communication links between processors. An interconnection network should be designed so that it possesses excellent graph embedding ability, in order to execute parallel algorithms efficiently. Network embedding is an important technique used in the study of computational capabilities of processor interconnection networks and task distribution. The quality of an embedding can be measured by certain cost criteria, namely dilation, congestion and wirelength. My seminar mainly focuses on the wirelength of network embeddings.
Speaker: Garth Tarr — new lecturer
Title: Robust methods and model selection
Abstract:
This presentation outlines two aspects of my research: robust statistics and model selection.
Standard robust statistical procedures assume that fewer than half the observation rows of a data matrix are contaminated, which may not be a realistic assumption when the number of variables is large. I will give an overview of some recent research into the problem of estimating covariance and precision matrices under a cellwise contamination model (Tarr et al., 2015a).
I will also briefly discuss an R package that provides a collection of functions designed to help users visualise the stability of the variable selection process (Tarr et al., 2015b). A browser-based graphical user interface is provided to facilitate interaction with the results. We have developed routines for modified versions of the simplified adaptive fence procedure (Jiang et al., 2009) and other graphical tools such as variable inclusion plots and model selection plots (Müller and Welsh, 2010; Murray et al., 2013). We also propose extensions to higher dimensional models via bootstrapping lasso estimates and incorporate robustness to outliers via an initial screening process (Filzmoser et al., 2008).
Speaker: Dushyant Tanna
Title: Reflexive irregular total labeling
Supervisors: Mirka Miller and Joe Ryan
Abstract:
Graph labeling is a mapping from a subset of graph elements to a set of numbers (usually non-negative integers). In most cases, the domain of the mapping is the set of vertices (vertex labeling), the set of edges (edge labeling) or both (total labeling). We define the weight of a vertex as the sum of the label of that vertex and the labels of its incident edges. Similarly, the weight of an edge is defined as the sum of the label of that edge and the labels of its incident vertices. An edge (vertex) irregular total k-labeling of a graph G is a total labeling such that any two different edges (vertices) have distinct weights and k is the largest label. The total edge (vertex) irregularity strength of G, tes(G) (tvs(G)), is the smallest k such that G has an edge (vertex) irregular total k-labeling. Reflexive irregular total labeling is similar to irregular total labeling in many respects, but there are some differences. In particular, in reflexive irregular total labeling the vertex labels represent loops and so have to be even numbers, and the vertex label 0 (representing a loopless vertex) is allowed. Here, we will present basic results regarding reflexive irregular total labeling and provide the reflexive irregular edge and vertex strengths (res(G) and rvs(G), respectively) for stars, paths and complete graphs.
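To make the definitions of edge weight and total edge irregularity strength concrete, here is a minimal exhaustive-search sketch. It covers only the ordinary (non-reflexive) case with labels drawn from $\{1, \dots, k\}$, which is an assumption about the label set; the function names are hypothetical and the search is feasible only for very small graphs.

```python
from itertools import product


def edge_weights(edges, vertex_label, edge_label):
    """Weight of edge uv = label(uv) + label(u) + label(v)."""
    return [edge_label[(u, v)] + vertex_label[u] + vertex_label[v] for u, v in edges]


def total_edge_irregularity_strength(vertices, edges):
    """Smallest k admitting an edge irregular total k-labeling (exhaustive search).

    Assumption: labels are positive integers 1..k on both vertices and edges.
    """
    k = 1
    while True:
        n_labels = len(vertices) + len(edges)
        for labels in product(range(1, k + 1), repeat=n_labels):
            # The first len(vertices) entries label vertices, the rest label edges.
            vertex_label = dict(zip(vertices, labels))
            edge_label = dict(zip(edges, labels[len(vertices):]))
            weights = edge_weights(edges, vertex_label, edge_label)
            if len(set(weights)) == len(weights):
                return k
        k += 1
```

For the path on three vertices, `total_edge_irregularity_strength(["a", "b", "c"], [("a", "b"), ("b", "c")])` needs $k = 2$: with every label equal to 1 both edges have weight 3, while raising one edge label to 2 separates them.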
Speaker: Andrew Thursby
Title: Software for Education: Gamification Techniques to Support Online and Traditional Learning Methods
Supervisor: Ljiljana Brankovic
Abstract:
Current research increasingly shows that utilising gamification methods in education adds value to traditional learning approaches, by increasing both students' motivation for learning, and their skills acquisition.
Analysis of the research, as well as of gamification as a key element in the success of online communities such as FourSquare and Stack Overflow, reveals an emerging set of guidelines and tools that can be utilised by educators and educational program developers to 'gamify' learning, and that are being used to develop software for gamification. Continuous evaluation, however, is required within the field of gamification of education, particularly with the rapid pace of software development. Gamification techniques should also be implemented to complement and enhance, rather than replace, traditional learning techniques.
In this talk I will give a brief survey of gamification in education and of the current software applications, and discuss the possibilities for extending the functionality of Blackboard to enable integration of gamification.
Speaker: Duc Tran
Title: Equivalences of Stability Properties for Discrete-Time Nonlinear Systems and extensions
Supervisors: Christopher Kellett and Björn Rüffer
Abstract:
Several qualitative equivalences are demonstrated between various robust stability properties for discrete-time nonlinear systems. In particular, Input-to-State Stability (ISS) and integral ISS (iISS) are shown to be qualitatively equivalent, via a nonlinear change of coordinates, to linear and nonlinear $\ell_2$-gain properties, respectively. These equivalences, together with previous results on the equivalence of 0-input global asymptotic stability and iISS, provide interesting relationships between discrete-time robust stability properties that do not hold in continuous time. Further extending these equivalences to a general input-output model, ISS with respect to two measures is used to subsume many ISS-type properties such as input-to-output stability (IOS), state-independent input-to-output stability (SI-IOS), and a version of incremental ISS.
Speaker: Duy Tran
Title: The Aggregate Association Index to the Case of Stratified 2x2 Tables. An example: the New Zealand 1893 voting data
Supervisors: Eric Beh and Irene Hudson
Abstract:
Data aggregation often occurs due to data collection methods or confidentiality laws imposed by many governments and organisations. This practice is carried out to ensure that privacy is protected and only the right amount of information is distributed. In the case of categorical analysis, the availability of only aggregate data or marginal totals of contingency tables makes it difficult to draw conclusions about the association between categorical variables. This issue lies in the field of Ecological Inference (EI) and is of growing concern for data analysts, especially for those dealing with the aggregate analysis of a single, or stratified, 2x2 tables. Currently, there are a number of EI approaches that deal with the issue, but with varying degrees of success, as they still suffer from major shortfalls in the required assumptions (Hudson et al., 2010).
As an alternative to ecological inference techniques when only marginal totals are available, one may consider the Aggregate Association Index (AAI) of Beh (2004, 2008, 2010) to obtain information about the association between two categorical variables of a single 2x2 table. The original AAI work is currently only applicable for a single 2x2 table, hence the purpose of this presentation is also to extend the application of the AAI to the case of stratified 2x2 tables. In particular, we will investigate the homogeneity among the AAIs of the stratified 2x2 tables and introduce a method to provide an overall AAI. To illustrate this new extension of the AAI, the New Zealand gendered voting data in 1893 is used in this presentation. The data set consists of a number of stratified 2x2 tables at electorate level and is also an interesting one as New Zealand was the first country in the world where women had the right to vote.
Keywords: Marginal Totals, 2x2 tables, Aggregate Data, Ecological Inference, Aggregate Association Index.
Speaker: Nathan Van Maastricht
Title: Computational Approaches to Finding Extremal Graphs
Supervisor: Judy-anne Osborn
Abstract: A brief look at a couple of different approaches to the search for graphs with the maximum number of edges, given a fixed number of vertices and a minimum girth. A Binary Integer Linear Program and a tree search will be discussed. Details of implementation optimisations in the tree search will be shown.
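A tree search of the kind mentioned above can be sketched as a simple backtracking procedure over candidate edges. This is an illustrative toy, not the speaker's implementation: it assumes "minimum girth" means that no cycle is shorter than the given value, the function names are hypothetical, and the only pruning rule is a bound on the number of edges still available.

```python
from collections import deque
from itertools import combinations


def distance(adj, source, target):
    """BFS distance in the current graph; None if target is unreachable."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None


def max_edges(n, girth):
    """Maximum number of edges of an n-vertex graph with no cycle shorter than girth."""
    candidates = list(combinations(range(n), 2))
    adj = {v: set() for v in range(n)}
    best = 0

    def search(index, count):
        nonlocal best
        best = max(best, count)
        # Prune: even taking every remaining candidate cannot beat the incumbent.
        if count + len(candidates) - index <= best:
            return
        u, v = candidates[index]
        d = distance(adj, u, v)
        # Adding uv closes a shortest new cycle of length d + 1, which must be >= girth.
        if d is None or d + 1 >= girth:
            adj[u].add(v)
            adj[v].add(u)
            search(index + 1, count + 1)
            adj[u].remove(v)
            adj[v].remove(u)
        search(index + 1, count)

    search(0, 0)
    return best
```

On five vertices with girth at least 5 the search finds the 5-cycle, so `max_edges(5, 5)` gives 5. Real searches of this kind rely on much stronger pruning and symmetry breaking than this sketch.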
Speaker: Paul Vrbik
Title: Yet Another Proof of Sylvester's Identity
Supervisor: Jon Borwein
Abstract: In 1857 Sylvester stated, without proof, a result on determinants that was recognized as important over the subsequent century. Thus it was a surprise to Akritas, Akritas and Malaschonok when they found only one English proof, given by Bareiss 111 years later! To rectify the gap in the literature these authors collected and translated six additional proofs: four from German and two from Russian. These proofs range from long and "readily understood by high school students" to elegant but high level.
We add our own proof to this collection, which exploits the product rule and the fact that taking the derivative of a determinant with respect to one of its elements yields the corresponding cofactor. A differential operator can then be used to replace one row with another.
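The two facts used in the proof can be written out as follows. This is a sketch of the standard identities only, not the speaker's argument; it assumes the entries $a_{ij}$ of an $n \times n$ matrix $A$ are treated as independent variables, with $C_{ij}$ denoting the cofactor of $a_{ij}$ and $(b_1, \dots, b_n)$ an arbitrary row introduced here for illustration.

```latex
% Laplace expansion along row i; no cofactor C_{ij} of row i
% depends on any entry of row i.
\[
\det A = \sum_{j=1}^{n} a_{ij}\, C_{ij}
\quad\Longrightarrow\quad
\frac{\partial \det A}{\partial a_{ij}} = C_{ij} .
\]
% A first-order differential operator therefore swaps row i for the row b.
\[
\Bigl( \sum_{j=1}^{n} b_j \frac{\partial}{\partial a_{ij}} \Bigr) \det A
= \sum_{j=1}^{n} b_j\, C_{ij}
= \det\bigl( A \text{ with row } i \text{ replaced by } (b_1, \dots, b_n) \bigr) .
\]
```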