Modeling Time to Open of Emails with a Latent State for User Engagement Level

Moumita Sinha; Vishwa Vinay; Harvineet Singh

arXiv:1908.06512·cs.LG·August 20, 2019


TL;DR

This paper introduces a survival analysis framework using Cox Proportional Hazards and a mixture model to predict email open times, accounting for user engagement levels, and demonstrates improved accuracy on real-world marketing data.

Contribution

It extends CoxPH models with a latent state mixture approach to better capture user engagement variability in email open time prediction.

Findings

01. Mixture model outperforms standard models in accuracy.
02. Survival analysis jointly models open event and time-to-open.
03. Approach effective on large real-world marketing dataset.


Tables

Table 1. Overview of Datasets

Dataset	#Recipients (millions)	#Emails (millions)
Training	2.05	31.86
Validation	2.04	22.73
Test	2.22	19.14

Table 2. List of Models

(1)	B: Baselines
(2)	LR: Logistic Regression (Classification)/Linear Regression (Time to Open)
(3)	CPH-L: CoxPH Model with relative hazard $\psi(\beta,\mathbf{X})=\exp(\beta^{T}X)$
(4)	CPH-G: CoxPH Model with relative hazard $\psi(\beta,\mathbf{X})$ from a GBM
(5)	MM: Mixture Model with Proportional Hazards

Table 3. Comparison of the Models under AUC and MRAD across Censoring Windows

	Censoring Window = 3 hours					Censoring Window = 6 hours					Censoring Window = 12 hours
Model	B	LR*	CPH-L	CPH-G	MM	B	LR*	CPH-L	CPH-G	MM	B	LR*	CPH-L	CPH-G	MM
AUC	0.863	0.931	0.931	0.932	0.929	0.870	0.939	0.939	0.940	0.938	0.878	0.948	0.948	0.949	0.948
MRAD(A)	1.226	1.332	1.085	0.941	0.483	2.504	1.653	1.835	1.372	0.678	5.079	2.332	1.707	1.572	1.318
MRAD(O)	26.641	8.411	11.953	12.217	9.499	40.602	11.706	23.245	14.788	9.832	62.740	17.501	19.831	28.978	15.657

Table 4. The effect of varying the percentile $p$ of Survivor Function on the prediction quality of the time-to-event

Censoring	Model	MRAD(O)
Window		$\hat{t} (5)$	$\hat{t} (10)$	$\hat{t} (25)$	$\hat{t} (50)$	$\hat{t} (75)$	$\hat{t} (90)$
3 hours	CPH-L	11.952	12.608	13.929	16.744	23.462	25.716
	CPH-G	12.217	14.593	18.501	25.407	26.629	26.641
	MM	9.499	26.641	26.641	26.641	26.641	26.641
6 hours	CPH-L	23.245	17.456	18.857	19.870	27.041	34.542
	CPH-G	14.788	19.588	26.233	30.970	38.102	40.602
	MM	9.832	12.483	40.602	40.602	40.602	40.602
12 hours	CPH-L	19.831	34.632	27.229	32.510	40.356	40.750
	CPH-G	28.978	31.866	29.986	34.590	57.086	61.040
	MM	15.657	21.705	62.545	62.740	62.740	62.740

Table 5. Mean and Standard Deviation of AUC & MRAD(O) respectively on 10 bootstrapped samples

Censoring	Model	AUC		MRAD(O)
Window		Mean	StdDev	Mean	StdDev
3 hours	LR*	0.931	4e-5	8.215	0.036
	CPH-L	0.931	2e-5	13.579	1.913
	CPH-G	0.929	8e-3	13.746	0.743
	MM	0.929	3e-4	9.277	1.854
6 hours	LR*	0.939	4e-5	11.651	0.079
	CPH-L	0.939	1e-5	20.301	2.486
	CPH-G	0.939	3e-4	19.208	0.798
	MM	0.938	2e-4	9.753	1.194
12 hours	LR*	0.948	4e-5	17.514	0.096
	CPH-L	0.948	7e-5	38.143	3.333
	CPH-G	0.949	2e-4	29.455	1.071
	MM	0.948	2e-4	15.444	1.683

Table 6. AUC and MRAD(O) for each of the models for the out-of-time dataset

Censoring Window	Model	AUC	MRAD(O)
3 hours	LR*	0.937	7.753
	CPH-L	0.937	10.009
	CPH-G	0.934	13.787
	MM	0.935	7.381
6 hours	LR*	0.944	10.194
	CPH-L	0.944	23.227
	CPH-G	0.940	18.339
	MM	0.942	9.340
12 hours	LR*	0.952	13.409
	CPH-L	0.952	29.131
	CPH-G	0.950	21.080
	MM	0.951	11.653

Equations

$$S(t) = P(T \geq t) = 1 - F(t) = \int_{t}^{\infty} f(u)\,du$$

$$h(t) = \lim_{dt \to 0} \frac{P(t \leq T < t + dt \mid T \geq t)}{dt}$$

$$\delta_{i} = \begin{cases} 1, & \text{if } t_{i} < C_{i} \\ 0, & \text{otherwise} \end{cases}$$

$$h_{i}(t \mid X_{i}) = h_{0}(t) \times \psi(X_{i})$$

$$\begin{aligned} S_{i}(t \mid X_{i}) &= \exp\Big\{-\int_{0}^{t} h_{0}(u)\,\psi(X_{i})\,du\Big\} \\ &= \big[\exp\{-H_{0}(t)\}\big]^{\psi(X_{i})} \\ &= S_{0}(t)^{\psi(X_{i})} \end{aligned}$$

$$L(\beta) = \prod_{i:\,\delta_{i}=1} \frac{\psi(X_{i};\beta)}{\sum_{l \in R(t_{i})} \psi(X_{l};\beta)}$$

$$L_{i} = \begin{cases} 1, & \text{prone to the event} \\ 0, & \text{otherwise} \end{cases}$$

$$\pi(Z_{i}) = P(L_{i}=1 \mid Z_{i}) = \frac{\exp(\mathbf{b}^{T}Z_{i})}{1+\exp(\mathbf{b}^{T}Z_{i})}$$

$$S_{i}(t \mid X_{i}) = \pi(Z_{i})\,S(t_{i} \mid L=1, X_{i}) + (1 - \pi(Z_{i}))$$

$$L(\beta, \mathbf{b}) = \prod_{i=1}^{N} \Big( [1 - \pi_{i}(Z_{i})]^{1 - L_{i}} \times \big[\pi_{i}(Z_{i})\,S(t_{i} \mid L_{i}=1, X_{i})\,\{h(t_{i} \mid L_{i}, X_{i})\}^{\delta_{i}}\big]^{L_{i}} \Big)$$

$$r_{ij} = X_{ij} - a_{ij}$$

$$a_{ij} = \frac{\sum_{k \in R(t_{i})} X_{kj}\exp(\beta^{T}X_{k})}{\sum_{k \in R(t_{i})} \exp(\beta^{T}X_{k})}$$

$$MRAD = \frac{1}{N} \sum_{i} \frac{|t_{i} - \hat{t}_{i}|}{t_{i}}$$
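As a minimal sketch of the MRAD formula that closes the list above, the following computes the mean of the absolute prediction errors, each divided by the true time-to-open. The function name `mrad` and the sample values are hypothetical, and the sketch assumes every true time is strictly positive; it does not attempt to reproduce the MRAD(A) and MRAD(O) variants reported in the tables.

```python
def mrad(actual, predicted):
    """Mean relative absolute deviation between true and predicted times.

    actual: true times-to-open t_i (assumed strictly positive).
    predicted: predicted times-to-open, same length as `actual`.
    """
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for t, t_hat in zip(actual, predicted):
        total += abs(t - t_hat) / t  # relative error for one email
    return total / len(actual)


# Illustrative values only (hours); not taken from the paper's data.
score = mrad([1.0, 2.0, 4.0], [1.5, 1.0, 4.0])
```

Because each error is divided by the true time, an error of a fixed size is penalized more heavily on emails that were opened quickly.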


Full text

Modeling Time to Open of Emails with a Latent State for User Engagement Level

Moumita Sinha (Adobe Research), Vishwa Vinay (Adobe Research), and Harvineet Singh (Adobe Research)

(2018)

Abstract.

Email messages have been an important mode of communication, not only for work, but also for social interactions and marketing. When messages have time sensitive information, it becomes relevant for the sender to know what is the expected time within which the email will be read by the recipient. In this paper we use a survival analysis framework to predict the time to open an email once it has been received. We use the Cox Proportional Hazards (CoxPH) model that offers a way to combine various features that might affect the event of opening an email. As an extension, we also apply a mixture model (MM) approach to CoxPH that distinguishes between recipients, based on a latent state of how prone to opening the messages each individual is. We compare our approach with standard classification and regression models. While the classification model provides predictions on the likelihood of an email being opened, the regression model provides prediction of the real-valued time to open. The use of survival analysis based methods allows us to jointly model both the open event as well as the time-to-open. We experimented on a large real-world dataset of marketing emails sent in a 3-month time duration. The mixture model achieves the best accuracy on our data where a high proportion of email messages go unopened.

Email interaction data, survival analysis, time-to-event prediction, enterprise email marketing, Cox-proportional hazards model

WSDM 2018: The Eleventh ACM International Conference on Web Search and Data Mining, February 5–9, 2018, Marina Del Rey, CA, USA. Copyright: ACM. Price: 15.00. DOI: 10.1145/3159652.3159683. ISBN: 978-1-4503-5581-0/18/02.

1. Introduction

Email has a rich history of being a data source for machine learning techniques. Starting with spam filtering (Cormack, 2008), the range of applications today covers a rich spectrum of scenarios. The Enron Corpus (Klimt and Yang, 2004) enabled research into the modeling of users’ interactions with email in a collaborative environment (Chapanond et al., 2005). For email service providers, detailed understanding of consumers’ interactions with the email system allows building predictive models for specific actions, e.g. if an email will be replied to or not (Yang et al., 2017) and creating rich experiences (Karagiannis and Vojnovic, 2009; Kannan et al., 2016) for the recipients. On the consumer side, given its popularity, there has been much work on different ways to handle large volumes of email effectively (Whittaker and Sidner, 1996). An early paper by Horvitz et al. (Horvitz et al., 1999) proposed that autonomous agents may be able to identify and prioritize emails that need attention. The authors of (Di Castro et al., 2016) show that historical data allows the prediction of what actions a user might take on the receipt of an email, for example, marking it for deletion (Dabbish et al., 2003). Apart from being a mode of communication, email is also used as a personal information management environment (Ducheneaut and Bellotti, 2001), leading to the need to support other forms of interactions like search (Narang et al., 2017).

The domain of interest in the current paper is marketing, where the email channel ranks high in popularity (VanBoskirk et al., 2011) alongside social media, search & display advertising. Email based marketing is predicted to have a compound annual growth rate of $10\%$ (VanBoskirk et al., 2011) and nearly every enterprise marketer uses it as a delivery channel (Tsirulnik, 2011; Chaffey, 2009). The engagement levels, however, are typically low compared to personal email messages at work or among friends. The open rates for marketing email messages vary by industry, ranging from $15\%$ to $19\%$ in the e-commerce, beauty and personal care, and gambling industries, and from $20\%$ to $28\%$ in the hobbies, home and garden, or health and fitness industries (Wells, 2016). Marketers are therefore always on the lookout for techniques that might enhance the engagement levels. For example, Kumar et al. (Kumar et al., 2014) modeled opt-in and opt-out behaviour and related these to transactions made by the consumer. Bonfrer and Drèze (Bonfrer and Drèze, 2009) proposed a framework that allows real-time evaluation of an email campaign.

In this submission, we propose the use of survival analysis for jointly modeling the open event on an email, as well as the time-to-open. The next section provides technical background to some important concepts in survival analysis that are relevant in the current scenario.

2. Survival Analysis

Survival analysis refers to an area of statistical modeling where the main variable of interest is the time to an event. Historically, the event is assumed to be death. One characteristic of data that makes the use of survival models appropriate is the presence of censoring. This refers to the fact that not all individuals would have experienced the event within the observation window. Censoring may arise because the event had not yet occurred at the time of analysis, or because the corresponding individual can no longer be tracked. Figure 1 is a pictorial representation of survival data in the context of emails. Observations are synchronized at $t=0$, which is the time at which the individuals receive the email. If the email is not read within a chosen time interval, e.g. $t=3$ hours, this is a censored data point. Some recipients may, of course, not read the email at all.

Consider a random variable $T$ for the time to the event of interest, with the corresponding probability density function $f(t)$ and the cumulative distribution function being $F(t)$ at a given time $t$. Then the survivor function is defined as

$$S(t) = P(T \geq t) = 1 - F(t) = \int_{t}^{\infty} f(u)\,du \tag{1}$$

It represents the probability that an individual will survive beyond time $t$. Given that the individual has not yet experienced the event by time $t$, the hazard function $h(t)$ represents the instantaneous rate at which the event occurs in the interval from $t$ to $t+dt$.

$$h(t) = \lim_{dt \to 0} \frac{P(t \leq T < t + dt \mid T \geq t)}{dt} \tag{2}$$

The relationship between the survivor function and the hazard function can be derived as being $S(t)=\exp\{-H(t)\}$, where $H(t)$ is the cumulative hazard function corresponding to $h(t)$.
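The relationship $S(t)=\exp\{-H(t)\}$ can be checked numerically. The sketch below integrates a hazard function with the trapezoidal rule to obtain $H(t)$ and then exponentiates. The constant hazard of 0.2 per hour is a hypothetical value chosen only because its survivor function, $\exp(-0.2t)$, is known in closed form; the function names are likewise illustrative.

```python
import math


def cumulative_hazard(hazard, t, steps=10_000):
    """Approximate H(t) = integral of h(u) from 0 to t (trapezoidal rule)."""
    if t <= 0:
        return 0.0
    width = t / steps
    total = 0.5 * (hazard(0.0) + hazard(t))  # end points get half weight
    for k in range(1, steps):
        total += hazard(k * width)
    return total * width


def survivor(hazard, t):
    """S(t) = exp(-H(t)), the probability of no event up to time t."""
    return math.exp(-cumulative_hazard(hazard, t))


def constant_hazard(u):
    # Hypothetical hazard: 0.2 events per hour, independent of time.
    return 0.2


s_3h = survivor(constant_hazard, 3.0)  # compare with exp(-0.2 * 3)
```

Any non-negative hazard function can be passed in place of `constant_hazard`; the step count trades accuracy for speed.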

A survival analysis dataset containing $N$ individuals is represented as $\{X_{i},Y_{i},\delta_{i}\}$, with $i=1\dots N$. For the $i^{th}$ individual, $X_{i}$ is a vector of features that are believed to be predictive of the survival time. The target $Y_{i}=\min(t_{i},C_{i})$ represents the survival time, where $C_{i}$ represents the duration of time for which the individual was observed and is also known as the censoring window. If observed within the censoring window, $t_{i}$ is the time to event for the $i^{th}$ individual. The indicator variable $\delta_{i}$ encodes if the $i^{th}$ individual experienced the event of interest within the censoring window.

$$\delta_{i} = \begin{cases} 1, & \text{if } t_{i} < C_{i} \\ 0, & \text{otherwise} \end{cases} \tag{3}$$
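The construction of the targets $Y_{i}$ and the indicators $\delta_{i}$ can be sketched as follows. This is an illustration under stated assumptions: a single censoring window shared by all emails, times measured in hours, and `None` standing for an email with no recorded open. The helper name `censor` and the sample values are hypothetical.

```python
def censor(times_to_open, window):
    """Build (Y, delta) for a common censoring window C.

    times_to_open: hours until the email was opened, or None if no open
        was ever recorded.
    window: the censoring window C, in hours.
    """
    Y, delta = [], []
    for t in times_to_open:
        observed = t is not None and t < window  # event inside the window
        Y.append(t if observed else window)      # Y_i = min(t_i, C_i)
        delta.append(1 if observed else 0)       # delta_i = 1 iff t_i < C_i
    return Y, delta


# Illustrative values only: two opens inside a 3 hour window, one later
# open and one email that was never opened.
Y, delta = censor([0.5, 7.2, None, 2.9], window=3.0)
```

An email opened after 7.2 hours and an email that is never opened are indistinguishable under a 3 hour window: both are recorded as $Y_{i}=3$ with $\delta_{i}=0$.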

2.1. Cox Proportional Hazard Regression

Given a feature vector $X_{i}$ for the $i^{th}$ individual, the hazard function for the individual at any given time $t$ can be defined as

$$h_{i}(t \mid X_{i}) = h_{0}(t) \times \psi(X_{i}) \tag{4}$$

Here $h_{0}(t)$ is the baseline hazard function at time $t$, and $\psi(\cdot)$ incorporates the dependence on the individual-specific features $X_{i}$, which are independent of time. The specific factorization of $h_{i}(t|X_{i})$ into a global time-dependent component ($h_{0}(t)$) and an individual’s time-independent factor ($\psi(X_{i})$) is the Proportional Hazards assumption; Section 3.2 provides a methodology to validate this assumption on a given dataset. What has been defined above is a semi-parametric approach, in that no assumptions have been made about the shape of the baseline hazard function $h_{0}(t)$. The parametric alternative would be to impose a functional form, e.g. a Weibull distribution. Based on the relation between the survivor and hazard functions, the survivor function of the $i^{th}$ individual for Cox Proportional Hazard (CoxPH) regression is

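$$\begin{aligned} S_{i}(t \mid X_{i}) &= \exp\Big\{-\int_{0}^{t} h_{0}(u)\,\psi(X_{i})\,du\Big\} \\ &= \big[\exp\{-H_{0}(t)\}\big]^{\psi(X_{i})} \\ &= S_{0}(t)^{\psi(X_{i})} \end{aligned} \tag{5}$$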

The corresponding partial likelihood function (Cox, 1972) is defined as

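$$L(\beta) = \prod_{i:\,\delta_{i}=1} \frac{\psi(X_{i};\beta)}{\sum_{l \in R(t_{i})} \psi(X_{l};\beta)} \tag{6}$$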

where the function $\psi(\cdot)$ has been parameterized by $\beta$, which controls the combination of the features. $R(t_{i})$ is the set of individuals who are at risk of the event at time $t_{i}$, that is, the set of individuals for whom the event has not occurred yet. $t_{i}$ is also the observed time to event of the $i^{th}$ individual. Note that the numerator of the likelihood is a function of only the individuals who experienced the event; censored individuals contribute only to the denominator of Equation 6. The $\beta$ values are estimated by maximizing the above likelihood using a gradient based method.
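A minimal sketch of the logarithm of the partial likelihood in Equation 6 is given below, assuming the exponential form $\psi(X;\beta)=\exp(\beta^{T}X)$, no tied event times, and a risk set $R(t_{i})$ taken as every individual whose recorded time $Y_{l}$ is at least $Y_{i}$. The function name and the toy data are hypothetical; the quadratic-time loop is written for readability rather than speed.

```python
import math


def cox_partial_log_likelihood(beta, X, Y, delta):
    """Log partial likelihood for psi(X; beta) = exp(beta . X).

    beta: list of coefficients.
    X: list of feature rows, one per individual.
    Y: recorded times, Y_i = min(t_i, C_i).
    delta: 1 if the event was observed for individual i, else 0.
    """
    scores = [sum(b * x for b, x in zip(beta, row)) for row in X]
    total = 0.0
    for i, (y_i, d_i) in enumerate(zip(Y, delta)):
        if d_i != 1:
            continue  # censored rows never appear in the numerator
        # Risk set: everyone still without the event at time Y_i.
        risk = sum(math.exp(scores[l]) for l, y_l in enumerate(Y) if y_l >= y_i)
        total += scores[i] - math.log(risk)
    return total


# Toy data: three individuals, one feature, the last one censored.
X = [[1.0], [0.0], [1.0]]
Y = [1.0, 2.0, 3.0]
delta = [1, 1, 0]
ll_zero = cox_partial_log_likelihood([0.0], X, Y, delta)
ll_log2 = cox_partial_log_likelihood([math.log(2.0)], X, Y, delta)
```

With $\beta=0$ every individual has the same relative hazard, so the two observed events contribute $-\log 3$ and $-\log 2$. The censored third individual appears only inside the risk sets.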

The most common form is $\psi(X_{i})=\exp(\beta^{T}X_{i})$, where $\beta$ is a vector of parameters controlling the dependence between the features in $X_{i}$ and target $Y_{i}$. Doing so assumes a linear scaling of the relative (log) hazards of different individuals with respect to the values of the features. Ridgeway (Ridgeway, 1999) proposed that the likelihood in Equation 6 can alternatively be optimized directly using gradient boosting methods that might provide benefits in scenarios where the effect of the features is non-linear. Note that this is still a Proportional Hazards model, but with $\psi(X_{i})$ taken to be the output of a gradient boosting machine (GBM).

2.2. Mixture Model with Cox Proportional Hazard Regression

The CoxPH model assumes that all individuals will eventually experience the event. But there may be a proportion of individuals who are not prone to the event, i.e., who are not predisposed to opening emails. The level at which an individual user is engaged with marketing messages influences his/her act of opening the email (and how quickly). The CoxPH model described earlier tries to explain all the observations using only the features ($X_{i}$) as the explanatory factors. Through the use of mixture models (Farewell, 1982; Branders et al., 2015), we might expect to get more discriminatory power. The $i^{th}$ individual is now represented as $\{X_{i},Y_{i},\delta_{i},L_{i},Z_{i}\}$, where $L_{i}$ is a latent indicator variable such that

$$L_{i} = \begin{cases} 1, & \text{prone to the event} \\ 0, & \text{otherwise} \end{cases} \tag{7}$$

$Z_{i}$ is a set of features that help predict if an individual is prone to the event of interest or not. The feature set $Z_{i}$ can also be the same as the feature set $X_{i}$.

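$$\pi(Z_{i}) = P(L_{i}=1 \mid Z_{i}) = \frac{\exp(\mathbf{b}^{T}Z_{i})}{1+\exp(\mathbf{b}^{T}Z_{i})} \tag{8}$$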

The probability $P(L_{i}=1|Z_{i})$ is estimated using logistic regression here, and is introduced as a mixture probability into the overall survivor function:

$$S_{i}(t \mid X_{i}) = \pi(Z_{i})\,S(t_{i} \mid L=1, X_{i}) + (1 - \pi(Z_{i})) \tag{9}$$

If the individual is predisposed to not experiencing the event, then $\pi(Z_{i})\simeq 0$, leading to a prediction of a survival probability close to $1$. Conversely, a scenario with $\pi(Z_{i})\simeq 1$ leads to the first term dominating, with the quantity $S(t_{i}|L=1,X_{i})$ representing the survival probability in the traditional sense. A proportional hazards assumption can be encoded by setting $S(t_{i}|L=1,X_{i})=S_{0}(t)^{\exp(\beta^{T}X_{i})}$ as before. The likelihood of the model is given by:

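$$L(\beta, \mathbf{b}) = \prod_{i=1}^{N} \Big( [1 - \pi_{i}(Z_{i})]^{1 - L_{i}} \times \big[\pi_{i}(Z_{i})\,S(t_{i} \mid L_{i}=1, X_{i})\,\{h(t_{i} \mid L_{i}, X_{i})\}^{\delta_{i}}\big]^{L_{i}} \Big) \tag{10}$$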

Since there are latent variables (the $L_{i}$), the optimization is an Expectation Maximization based iterative procedure that estimates the $L_{i}$, along with $\mathbf{b}$ (for calculating $\pi(Z_{i})$) and $\beta$, which controls how the features of an individual affect the relative hazards. In the current setting, we interpret $\pi_{i}(Z_{i})$ as the engagement level of a given user $i$; the model, however, is more general. For example, it can be used to represent the probability that a patient has been cured, which in turn affects the chances that he/she will experience the event.
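To make the shape of the mixture survivor function concrete, the sketch below evaluates it for one individual, combining the logistic engagement probability $\pi(Z_{i})$ with a proportional hazards survivor for the prone group. It covers only the evaluation of the survivor function, not the Expectation Maximization fit. The exponential baseline $S_{0}(t)=\exp(-0.5t)$, the parameter values and the function names are hypothetical choices made for illustration.

```python
import math


def engagement_probability(b, z):
    """pi(Z_i) = exp(b . Z_i) / (1 + exp(b . Z_i)), the logistic form."""
    s = sum(bj * zj for bj, zj in zip(b, z))
    return 1.0 / (1.0 + math.exp(-s))


def mixture_survivor(t, b, z, beta, x, baseline_survivor):
    """pi * S_0(t)^exp(beta . x) + (1 - pi) for a single individual."""
    pi = engagement_probability(b, z)
    relative_hazard = math.exp(sum(bj * xj for bj, xj in zip(beta, x)))
    prone = baseline_survivor(t) ** relative_hazard  # survivor given L = 1
    return pi * prone + (1.0 - pi)


def baseline(t):
    # Hypothetical baseline survivor S_0(t); not estimated from any data.
    return math.exp(-0.5 * t)


# With b = 0 the engagement probability is 0.5; with beta = 0 the relative
# hazard is 1, so the prone group follows the baseline exactly.
s_early = mixture_survivor(2.0, [0.0], [1.0], [0.0], [1.0], baseline)
s_late = mixture_survivor(1000.0, [0.0], [1.0], [0.0], [1.0], baseline)
```

Unlike the plain CoxPH survivor, which decays towards zero, the mixture survivor levels off at $1-\pi(Z_{i})$, the share of recipients who are not prone to opening.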

2.3. Related Work

Survival analysis has traditionally been used in the health-care domain to determine the time to ‘death’ in patients, but the usage of this range of techniques has recently expanded to other application areas (Wang et al., 2017). Examples include prediction of early student dropouts (Ameri et al., 2016), post-click engagement on native ads (Barbieri et al., 2016), query specific micro-blog ranking for improved retrieval (Efron, 2012), recommender systems in e-commerce (Wang and Zhang, 2013), search engine evaluation via the use of “absence time” (Chakraborty et al., 2014), and predicting time for crowd-sourced tasks (Lease et al., 2011).

By appropriately defining the event being modeled, existing marketing concepts also lend themselves to survival analysis techniques. For example, re-purchasing behavior is an indicator of high engagement (Lee et al., 2012) and a proxy for the potential value of a customer (Drye et al., 2001; Lu and Park, 2003). Attrition modeling helps businesses identify customers who are most at-risk so that attempts can be made to keep them in the system, and (Lee et al., 2012) proposes a survival analysis based solution.

Much of the literature referred to above involves applying well-known and established models (like CoxPH) in different scenarios. But more recently, growing interest in the use of survival analysis has led to modeling improvements. For instance, when modeling time-to-event of related tasks, the parameters of the different models can be more reliably estimated using regularization techniques commonly used in multi-task learning (Li et al., 2016). Even in traditional application areas of survival analysis, given a large number of data points and a variety of features that potentially have a highly non-linear dependence on the time-to-event, deep latent models provide better performance (Ranganath et al., 2016).

The closest related work to that presented here is described in (Dave et al., 2017), where time-to-event is modelled in the email domain. Given this context, the contribution of the current paper is two-fold: (1) we describe techniques from the rich history of survival analysis to identify those models whose assumptions are better matched with the characteristics of the data; and (2) for the application of predicting time-to-event when the censored rows dominate, the mixture model (MM) described above is shown to not only describe the data better but also provide better predictive performance.

3. Problem Definition and Data Description

When emails containing time sensitive information are sent, it may be relevant for the sender to know what is the expected time within which the email will be read by the recipient. Specifically in marketing messages, if the email advertises a flash sale, the marketer will need to decide on the time window for the sale - to optimize between reaching sufficient consumers within the window and yet keep it exclusive. Prediction of time-to-open of an email by a consumer helps to determine the size of the recipient list one wants to reach.

Our dataset corresponds to email marketing campaigns that are sent out to consumers of an enterprise and we are interested in a predictive model that answers questions of two types: (a) Is a particular email likely to be opened by a given recipient? (b) Can we predict the time within which the email will be opened?

In the dataset, there is a high degree of variability amongst the marketing messages - some are sent to a large group of recipients, while others are targeted at a narrow set of consumers - e.g. a personalized birthday communication. We expect that the nature of people’s interaction with these different types of emails varies drastically. In particular, we are interested in modeling how people differ in terms of their engagement with the mass marketing emails. For this reason, the analysis presented here includes only those emails that were sent to at least $50\%$ of the total consumers. We have additionally dropped those consumers who received fewer than $10$ messages during the period of interest.

The time at which an email reaches a consumer is labelled as its start-time. In the event that the email is read, the email has a corresponding open-time. The difference between the two time-stamps is referred to as the time-to-open. The emails are divided into 3 non-overlapping buckets based on the start-time: a Training dataset (spanning 4 weeks) and one dataset each for Validation and Test (spanning 3 weeks each). Table 1 shows the size of each of these datasets. Chronologically ordered, these 3 datasets cover 13 weeks of email messages with a $3$ week gap between Validation and Test. Within each group, data from the initial two weeks are used to compute features that will be used to model users’ interaction with emails sent in the subsequent week(s).
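The bucketing by start-time and the time-to-open computation can be sketched as below. The week boundaries are an assumption read off the description: Training is taken to come first (weeks 0 to 3), followed by Validation (weeks 4 to 6), the 3 week gap, and Test (weeks 10 to 12). The reference date and the helper names are hypothetical.

```python
from datetime import datetime, timedelta

# Assumed layout, in whole weeks since the first start-time:
# (name, first week inclusive, last week exclusive).
BUCKETS = [("Training", 0, 4), ("Validation", 4, 7), ("Test", 10, 13)]


def assign_bucket(start_time, first_start_time):
    """Return the dataset an email belongs to, or None for the gap weeks."""
    week = (start_time - first_start_time) // timedelta(weeks=1)
    for name, first_week, end_week in BUCKETS:
        if first_week <= week < end_week:
            return name
    return None  # falls in the gap or outside the 13 weeks


def time_to_open_hours(start_time, open_time):
    """Hours between start-time and open-time; None if never opened."""
    if open_time is None:
        return None
    return (open_time - start_time) / timedelta(hours=1)


# Hypothetical reference date; the paper does not give calendar dates.
origin = datetime(2017, 1, 2)
bucket_day_30 = assign_bucket(origin + timedelta(days=30), origin)
bucket_day_60 = assign_bucket(origin + timedelta(days=60), origin)
bucket_day_75 = assign_bucket(origin + timedelta(days=75), origin)
hours = time_to_open_hours(origin, origin + timedelta(minutes=90))
```

Emails whose start-time falls in the gap weeks are assigned to no bucket, which keeps the Test period separated in time from the data used for fitting and validation.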

Bibliography

Ameri et al. (2016) Sattar Ameri, Mahtab J. Fard, Ratna B. Chinnam, and Chandan K. Reddy. 2016. Survival Analysis Based Framework for Early Prediction of Student Dropouts. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management (CIKM ’16). ACM, New York, NY, USA, 903–912. https://doi.org/10.1145/2983323.2983351
Barbieri et al. (2016) Nicola Barbieri, Fabrizio Silvestri, and Mounia Lalmas. 2016. Improving post-click user engagement on native ads via survival analysis. In Proceedings of the 25th International Conference on World Wide Web. International World Wide Web Conferences Steering Committee, 761–770.
Bonfrer and Drèze (2009) André Bonfrer and Xavier Drèze. 2009. Real-time evaluation of e-mail campaign performance. Marketing Science 28, 2 (2009), 251–263.
Branders et al. (2015) Samuel Branders, Roberto D’Ambrosio, and Pierre Dupont. 2015. A mixture Cox-Logistic model for feature selection from survival and classification data. arXiv preprint arXiv:1502.01493 (2015).
Burke et al. (1997) Harry B Burke, Philip H Goodman, David B Rosen, Donald E Henson, John N Weinstein, Frank E Harrell, Jeffrey R Marks, David P Winchester, and David G Bostwick. 1997. Artificial neural networks improve the accuracy of cancer survival prediction. Cancer 79, 4 (1997), 857–862.
Chaffey (2009) D Chaffey. 2009. Mint.com used Strong Mail Influencer to create this viral program. http://www.strongmail.com/pdf/sm_casestudy_mint.pdf. (2009).
Chakraborty et al. (2014) Sunandan Chakraborty, Filip Radlinski, Milad Shokouhi, and Paul Baecke. 2014. On Correlation of Absence Time and Search Effectiveness. In Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval (SIGIR ’14). ACM, New York, NY, USA, 1163–1166. https://doi.org/10.1145/2600428.2609535