The Impact of Multidimensional Warning Messages on Payment Security Behavior Across Different Scenarios
Siyu Fan, Dongyu Liu, Te Ran, Yawen Guo, Haibo Yang

TL;DR
This study explores how different warning messages in mobile payments affect user behavior and decision-making, especially in high-risk scenarios.
Contribution
The study introduces a multidimensional approach to warning message design, using eye-tracking to reveal how color and message type influence user caution.
Findings
Warning messages significantly increase reaction times, indicating more cautious decision-making.
Red warnings and imperative semantics lead to higher transfer rejection rates, especially in high-risk contexts.
Eye-tracking reveals attentional mechanisms influenced by warning design features.
Abstract
To ensure the security of mobile payments, anti-fraud warning messages serve as a critical defensive interface between users and potential risks. The effectiveness of their design directly influences users’ risk perceptions and security-related behaviors. The present study employed eye-tracking technology to examine the effectiveness of warning messages in mobile payment transfer scenarios and the impact of specific warning design features on user decision-making. Experiment 1 utilized a 2 (warning message: present vs. absent) × 3 (potential risk level: high, medium, low) within-subject design to test the fundamental role of warning message presence. Results indicated that the presence of warning messages significantly prolonged participants’ reaction times when selecting the transfer option, suggesting a more cautious decision-making process. Building on Experiment 1, Experiment 2…
Funding: Tianjin Municipal Training Program of Innovation and Entrepreneurship for Undergraduates
Taxonomy
Topics: Safety Warnings and Signage · Behavioral Health and Interventions · Media Influence and Health
1. Introduction
1.1. The Role of Warning Messages in Mobile Payment Security and the Challenge of Habituation
The widespread adoption of mobile payments has integrated payment behaviors into daily life, but it also introduces significant information security risks, such as data theft, tampering, identity fraud, and transaction repudiation (Qiu & Yang, 2021; Yang, 2024). Within user interfaces, warning messages serve as a critical “defensive boundary,” helping users recognize threats promptly and adopt secure behaviors. Warning design is therefore essential to improving risk perception and security compliance (Ruddro & Mohna, 2023).
Warnings are notifications presented in hazardous contexts to alert users and mitigate risks (Rogers et al., 2000; Wogalter et al., 2021). Effective warnings typically feature a signal word, hazard description, consequences, and instructions, while following principles of conspicuity, legibility, comprehensibility, and motivational impact (Wogalter, 2018; Heaps & Henley, 1999).
In mobile payment scenarios, users make rapid risk assessments under time pressure to minimize transaction costs (Chang et al., 2024; Chen et al., 2020). Anti-fraud warnings act as proactive safeguards by interrupting automatic processing, prompting deliberate risk evaluation, and improving decision security (Wang et al., 2022; Buono et al., 2023).
Despite their established effectiveness in HCI research, warning habituation—reduced responsiveness to repeated warnings—frequently occurs in familiar, efficiency-oriented payment tasks (Anderson et al., 2016). Neuroimaging evidence shows attenuated neural responses after minimal exposures (Anderson et al., 2015; Kirwan et al., 2020), while longitudinal studies confirm declining compliance over time in routine contexts (Vance et al., 2018). This largely unconscious process can generalize to similar cues, causing users to ignore even critical warnings.
Consequently, before examining more complex design factors, it is essential to first establish the fundamental behavioral effect of warning presence versus absence in a controlled experimental setting.
1.2. Potential Risk Level and Warning Design Features
User risk decision-making is shaped by multiple interacting factors, with the magnitude of potential risk serving as a key contextual variable (Sitkin & Weingart, 1995). Loss aversion is not uniform across monetary magnitudes; in low-value contexts, sensitivity to losses may weaken or reverse (Mukherjee et al., 2017; Walasek & Stewart, 2019; Zeif & Yechiam, 2022). This raises a key question: does warning message effectiveness vary with the magnitude of potential loss? Specifically, do warnings primarily function as an “alerting” mechanism in low-risk scenarios, or as a “confirmatory” signal that reinforces caution in high-risk contexts?
Building on the established effectiveness of warnings, a practical question is how to optimize their design. Laughery (2006) stressed that effective warnings must capture attention and deliver clear, comprehensible information to support informed decisions. Hancock et al. (2020) identified two core determinants of warning effectiveness: semantic clarity (e.g., descriptive wording, explicit threat severity) and visual salience (e.g., color, placement).
The impact of warning semantic style on behavior is well documented but highly context- and user-dependent. For instance, Min (2020) found that imperative language can trigger varying emotional responses by age group, with older adults more likely to react negatively to its authoritative tone. Contextual moderation is evident: imperative warnings elicit stronger behavioral responses than informative ones in collision-avoidance tasks (Chai et al., 2022) and in fraud-interception scenarios, where direct, compelling language guides rapid action (Wang et al., 2022). In contrast, informative warnings, which explain risks, better support long-term safety awareness in educational contexts. Overall, no single style is universally superior; effectiveness is moderated by situational demands and other factors (Zhang et al., 2019; Reeder et al., 2018). Thus, warning style should align with the specific practical requirements of the context.
Color is a critical visual element in warning design. Extensive research confirms red’s effectiveness in conveying danger (Pravossoudovitch et al., 2014; Zielinska et al., 2017). ERP studies show that different background colors trigger distinct neural responses, reflecting automatic activation of associations with conventional warning colors (Yuan et al., 2021). While red and green are often contrasted, this approach has limitations: superior red performance may reflect either red’s alerting properties or green’s safety connotations reducing urgency (Bouhassoun et al., 2023; Du & Yang, 2024; Or & Wang, 2014). To isolate red’s effect, a neutral baseline color lacking strong danger or safety associations is preferable.
Eye tracking provides a precise, objective measure of attention to warnings, capturing unconscious visual processing mechanisms and subtle differences often missed by self-reports (Clay et al., 2019; Pham et al., 2018; Pastel et al., 2023). Prior work has shown dissociation between attentional metrics and behavioral intentions (Borys & Plechawska-Wójcik, 2017; Pham et al., 2018; King et al., 2021; Shi et al., 2022). In Experiment 2 of this study, eye tracking was used to record visual attention allocation during mobile payment decisions, offering direct insight into the mechanisms by which warning design features influence processing and outcomes.
1.3. The Present Study and Hypotheses
In summary, this study was designed to systematically investigate the effectiveness of anti-fraud warning messages in mobile payment contexts and to identify optimal design features capable of mitigating warning habituation and promoting security-related behaviors. Experiment 1 first established the fundamental effect of warning presence (vs. absence) on decision-making and examined whether this effect is moderated by the level of potential risk (high, medium, low). It was hypothesized that (H1a) the presence of a warning would result in longer reaction times, reflecting the disruption of automated processing, and that (H1b) the interaction between warning presence and risk level would be significant, with the most pronounced effects on reaction times and rejection rates observed under medium- and high-risk conditions.
Building on Experiment 1, Experiment 2 extended the investigation by examining the independent and interactive effects of warning color (red vs. blue) and semantic type (imperative vs. informative). Based on prior evidence regarding the alerting properties of red (Pravossoudovitch et al., 2014) and the persuasive advantage of imperative language in time-sensitive contexts (Chai et al., 2022), it was hypothesized that (H2a) red warnings and (H2b) imperative semantics would each increase transfer rejection rates, and (H2c) these effects would be most pronounced in high- and medium-risk scenarios. Additionally, it was predicted that (H2d) red and imperative warnings would capture greater visual attention, as indexed by eye-tracking metrics, particularly in higher-risk contexts. Together, these steps provide empirical evidence for enhancing users’ risk perception and anti-fraud decision-making in mobile payment contexts.
2. Experiment 1
2.1. Method
2.1.1. Participants
Participants were university students in China, recruited via campus posters and social media platforms (e.g., WeChat 8.0.60). Interested students contacted the researcher voluntarily, and 42 were selected (21 male, 21 female; age range 17 to 30 years). All participants had experience using smartphones and were frequent users of mobile payment applications (e.g., Alipay 10.7.30, WeChat Pay 8.0.60). To avoid potential confounding effects on risk perception, none of the participants reported prior experience with stock investment. All had normal or corrected-to-normal vision, with no color blindness or other visual impairments. Participation was voluntary, and each participant received compensation upon completion of the experiment. The distribution of demographic characteristics is presented in Table 1.
2.1.2. Materials
To simulate the mobile payment transfer process as closely as possible, the experimental transfer interface included fields for the recipient, the transfer amount, and a payment button. To minimize the influence of the recipient on users’ risk assessment, all elements except for the warning message and the magnitude of the transfer amount remained consistent across trials (see Figure 1). To prevent habituation and automatic dismissal of the warnings, four distinct warning prompts were designed. These prompts featured slight variations in wording while maintaining semantic equivalence and similar length.
The level of potential risk was operationalized by the size of the transfer amount, categorized as large amount (500–1000), medium amount (21–499), or small amount (0–20). After the experiment, participants were asked to rate the perceived risk of these three transfer amounts on a 5-point scale. A repeated-measures ANOVA on these ratings revealed a significant main effect of amount size (F (2, 86) = 103.52, p < 0.001, η² = 0.60). Post hoc comparisons using Bonferroni correction showed that ratings for small amounts (M = 1.29, SD = 0.92) were significantly lower than those for medium amounts (M = 2.57, SD = 0.76; p < 0.001), which in turn were significantly lower than those for large amounts (M = 4.00, SD = 1.02; p < 0.001). Therefore, the three transfer amounts successfully represented distinct levels of perceived potential risk, confirming the effectiveness of the experimental manipulation.
Consequently, 15 unique formal stimulus materials were prepared for each of the six conditions (2 warning conditions × 3 risk levels), yielding 90 formal transfer interface screens. The experiment also included 6 practice trials (using similar transfer interfaces), bringing the total number of transfer interface screens to 96. Together with one transfer success screen and one transfer failure screen, the experiment comprised a total of 98 screens.
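The screen count can be checked with a short sketch. It assumes the 15 formal stimuli were prepared per cell of the 2 × 3 design (inferred from 90 / 15 = 6); the constant and variable names are illustrative and not taken from the authors' experiment program.

```python
from itertools import product

# Factor levels as described in the design (labels are illustrative).
WARNING_LEVELS = ("present", "absent")
RISK_LEVELS = ("high", "medium", "low")

STIMULI_PER_CELL = 15   # assumption: 15 formal stimuli per design cell
PRACTICE_SCREENS = 6    # practice trials use similar transfer interfaces
FEEDBACK_SCREENS = 2    # one success screen and one failure screen

# Every combination of warning presence and risk level is one cell.
cells = list(product(WARNING_LEVELS, RISK_LEVELS))

formal_screens = len(cells) * STIMULI_PER_CELL
transfer_screens = formal_screens + PRACTICE_SCREENS
total_screens = transfer_screens + FEEDBACK_SCREENS

print(len(cells), formal_screens, transfer_screens, total_screens)
```

Under these assumptions the tallies reproduce the 90, 96, and 98 screens reported above.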
A separate group of 42 undergraduate students was recruited to rate the experimental materials for clarity and representativeness. Clarity refers to the extent to which the stimulus material was perceived as instantly and unambiguously recognizable as depicting a “transfer” or “payment” operation, with core transactional elements (e.g., recipient, amount, action buttons) being clearly identifiable (1 = very unclear, 7 = very clear). Representativeness indicates the degree to which the stimulus material was judged to typify a genuine, realistic payment confirmation interface encountered in daily online transactions (1 = not at all representative, 7 = completely representative). The pictures showed no significant differences in either clarity (t (41) = 0.37, p > 0.050) or representativeness (t (41) = −0.36, p > 0.050).
2.1.3. Apparatus
The experimental stimuli were presented on a 14.1-inch screen with a refresh rate of 60 Hz and a resolution of 1920 × 1200 pixels. A program written in Python 3.13.7 was used to control the presentation timing of the stimuli and to collect participants’ response times and key-press data.
2.1.4. Experimental Design
We employed a 2 × 3 within-subject design, with warning message (present vs. absent) and potential risk level (high, medium, low) as the two factors. All participants completed every experimental condition. The decision to employ a within-subject design was primarily guided by two methodological considerations. First, to control individual differences, this design accounts for variation in risk perception and payment habits by allowing each participant to serve as their own control, thereby enhancing statistical power and reducing error variance unrelated to the experimental manipulation. Second, given the focus on how individuals adjust their decision-making processes across different warning and risk scenarios, the within-subject design enables direct comparison of within-participant behavioral changes.
2.1.5. Procedure
The experiment was conducted individually in a quiet behavioral laboratory. Before the formal experiment began, the experimenter presented instructions to ensure participants understood the task. Participants viewed the information on the transfer interface and made a decision based on their judgment, pressing the ‘F’ key to indicate willingness to proceed with the payment or the ‘J’ key to refuse payment. The experimental procedure is illustrated in the flowchart shown in Figure 2. Participants first completed 6 practice trials to familiarize themselves with the procedure. After confirming their understanding, they proceeded with the main experiment. All trials were presented in a randomized order, and the entire session lasted approximately 8 min. Upon completion, each participant filled out a questionnaire related to the experiment.
2.1.6. Data Cleaning and Statistical Analysis
Data were cleaned by excluding trials that met either of the following criteria: (1) an incorrect key-press response or (2) a reaction time that fell outside ±3 standard deviations from the mean. Repeated-measures analyses of variance (ANOVAs) and binary logistic regression analyses were subsequently conducted using SPSS (Version 26.0).
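The two exclusion criteria can be sketched as follows. This is a minimal illustration and not the authors' pipeline (the analysis was run in SPSS). It assumes that invalid key presses are removed before the mean and sample standard deviation are computed, and that both statistics are pooled over all remaining trials; the text does not say whether trimming was done per participant or per condition. The field names `rt_ms` and `valid_key` are hypothetical.

```python
from statistics import mean, stdev

def clean_trials(trials, n_sd=3.0):
    """Drop invalid key presses, then drop RTs outside mean ± n_sd * SD."""
    valid = [t for t in trials if t["valid_key"]]
    rts = [t["rt_ms"] for t in valid]
    if len(rts) < 2:          # SD is undefined for fewer than two trials
        return valid
    centre, spread = mean(rts), stdev(rts)   # sample SD (assumption)
    lower, upper = centre - n_sd * spread, centre + n_sd * spread
    return [t for t in valid if lower <= t["rt_ms"] <= upper]

# Toy data: 19 ordinary trials, one extreme RT, one invalid key press.
trials = [{"rt_ms": 1900 + 10 * i, "valid_key": True} for i in range(19)]
trials.append({"rt_ms": 20000, "valid_key": True})    # extreme outlier
trials.append({"rt_ms": 2000, "valid_key": False})    # wrong key

cleaned = clean_trials(trials)
print(len(trials), len(cleaned))
```

In the toy data the extreme trial and the invalid key press are both removed, leaving the 19 ordinary trials.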
2.2. Results
2.2.1. Payment Decisions
Binary logistic regression analysis indicated a significant main effect of potential risk level (Wald χ² (2) = 365.196, p < 0.001). Relative to the high potential risk level, both the low risk level (B = −2.90, SE = 0.16, p < 0.001) and the medium risk level (B = −1.43, SE = 0.16, p < 0.001) were associated with a significantly lower probability of payment rejection. In contrast, the main effect of warning message presence was not significant (Wald χ² = 0.022, p = 0.882), nor was the interaction between warning presence and potential risk level (Wald χ² (2) = 1.66, p = 0.436). Detailed results are presented in Figure 3 and Table 2.
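To make the logit coefficients easier to read, they can be converted to odds ratios with exp(B). The sketch below uses the B values reported above and assumes, as the text describes, that the outcome is payment rejection and that the high-risk level is the reference category. The odds ratios themselves are derived here for illustration and are not reported in the text.

```python
import math

def odds_ratio(b):
    """Convert a logistic-regression coefficient (log-odds) to an odds ratio."""
    return math.exp(b)

# Coefficients from Experiment 1, each relative to the high-risk level.
coefficients = {
    "low vs. high risk": -2.90,
    "medium vs. high risk": -1.43,
}

for contrast, b in coefficients.items():
    print(f"{contrast}: B = {b:+.2f}, odds ratio = {odds_ratio(b):.3f}")
```

Read this way, the odds of rejecting a low-amount transfer are roughly 0.06 times the odds for a high-amount transfer, and the odds for a medium-amount transfer are roughly 0.24 times those odds.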
2.2.2. Reaction Time
The repeated-measures ANOVA revealed a significant main effect of warning message (F (1, 612) = 130.38, p < 0.001). Reaction times were significantly longer when warnings were present (M = 2407 ms) than when they were absent (M = 1662 ms). A significant main effect of potential risk level was also found (F (2, 1185) = 10.350, p < 0.001). Post hoc comparisons indicated that reaction times under medium risk (M = 2174 ms) were significantly longer than under both high (M = 1909 ms) and low (M = 2020 ms) risk levels.
Furthermore, a significant interaction was observed between warning presence and potential risk level (F (2, 1185) = 7.610, p = 0.001). Simple-effects analysis showed that in the warning present condition, the effect of potential risk level was significant (p < 0.001). Here, reaction times at the medium risk level (M = 2671 ms) were significantly longer than at both the low (M = 2374 ms) and high (M = 2176 ms) risk levels. In contrast, under the warning absent condition, reaction times did not differ significantly across the three risk levels (range: 1640–1680 ms; p > 0.050). Detailed results are presented in Table 3 and Table 4.
3. Experiment 2
3.1. Method
3.1.1. Participants
Forty-seven Chinese undergraduate students (21 males and 26 females) were recruited using the same sampling method and inclusion criteria as in Experiment 1. All participants received compensation for completion of the experiment. The distribution of demographic information is presented in Table 5.
3.1.2. Materials
As shown in Figure 4, the experimental materials in Experiment 2 were identical to those used in Experiment 1, except for the color and semantic style of the warning messages. Based on the theoretical considerations outlined in the introduction regarding the need to avoid the confounding ‘safety’ semantics of green, blue was selected as the contrast color to red. A separate group of 35 undergraduate students was recruited to rate these materials for clarity and representativeness. No significant differences were found among the stimulus sets in either clarity (t (34) = 0.825, p > 0.050) or representativeness (t (34) = 0.906, p > 0.050).
After the experiment, participants were asked to rate the perceived directive strength of the two semantic styles on a 5-point scale. A repeated-measures ANOVA revealed a significant main effect of semantic style (F (1, 46) = 16.1, p < 0.001, η² = 0.26). Ratings for the informative style (M = 3.4, SD = 0.9) were significantly lower than those for the imperative style (M = 4.1, SD = 1.01), confirming the effectiveness of the manipulation.
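The effect size can be related to the F statistic with a short calculation. The sketch assumes the reported η² is a partial eta squared, which the text does not state explicitly; under that assumption it equals F·df_effect / (F·df_effect + df_error).

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)

# Manipulation check for semantic style: F (1, 46) = 16.1
eta_p2 = partial_eta_squared(16.1, 1, 46)
print(round(eta_p2, 2))  # → 0.26
```

The result agrees with the reported value of 0.26, which is consistent with the partial eta squared reading.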
3.1.3. Apparatus
To further investigate the implicit cognitive decision-making process, Experiment 2 employed an EyeLink 1000 Plus desktop-mounted eye tracker (SR Research Ltd., Ottawa, ON, Canada). The experimental stimuli were presented on a participant monitor with a resolution of 1920 × 1080 pixels and controlled by SR Research Experiment Builder software (version 2.5.1). The eye tracker operated at a sampling rate of 1000 Hz, ensuring high-resolution gaze tracking. Participants’ heads were stabilized using a chin rest to maintain consistency, and they were seated approximately 750 mm from the screen throughout the session. The system recorded participants’ gaze position, saccades, fixations, pupil size, and blink frequency. All participant data, including reaction times, key presses, and eye movement metrics, were automatically recorded. Subsequent analysis and extraction of data from specific screen regions were performed using Data Viewer software (Version 4.4.1). For this experiment, the area where the warning message was presented was defined as the Area of Interest (AOI).
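The two AOI-based measures analyzed later (first fixation duration and total gaze duration) can be illustrated with a small sketch. In the study these values were extracted with Data Viewer; the code below is not that software's procedure. The rectangular AOI, its pixel coordinates, and the `Fixation` and `AOI` classes are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Fixation:
    x: float          # horizontal gaze position in pixels
    y: float          # vertical gaze position in pixels
    start_ms: float
    end_ms: float

    @property
    def duration(self):
        return self.end_ms - self.start_ms

@dataclass
class AOI:
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, fixation):
        return (self.left <= fixation.x <= self.right
                and self.top <= fixation.y <= self.bottom)

def aoi_metrics(fixations, aoi):
    """Return (first fixation duration, total gaze duration) inside the AOI."""
    ordered = sorted(fixations, key=lambda f: f.start_ms)
    inside = [f for f in ordered if aoi.contains(f)]
    if not inside:
        return None, 0.0          # the AOI was never fixated
    return inside[0].duration, sum(f.duration for f in inside)

# Hypothetical warning region on a 1920 × 1080 display.
warning_aoi = AOI(left=560, top=300, right=1360, bottom=500)
trial_fixations = [
    Fixation(960, 540, 0, 200),     # central fixation cross, outside the AOI
    Fixation(900, 400, 250, 420),   # first fixation on the warning
    Fixation(1000, 450, 450, 900),  # second fixation on the warning
    Fixation(300, 900, 950, 1100),  # elsewhere on the interface
]

first_fixation, total_gaze = aoi_metrics(trial_fixations, warning_aoi)
print(first_fixation, total_gaze)
```

For this toy trial the first fixation on the warning lasts 170 ms and the two fixations inside the AOI sum to 620 ms.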
3.1.4. Experimental Design
A 2 × 2 × 3 within-subject design was used, with the factors being warning color (red vs. blue), warning semantic type (imperative vs. informative), and potential risk level (high, medium, low). Each participant completed all combinations of these conditions.
3.1.5. Procedure
The experiment was conducted individually in a quiet eye-tracking laboratory. A nine-point calibration was performed at the beginning of each session. A fixation cross was displayed at the center of the screen, and participants were instructed to fixate on it. Drift correction was also conducted, with a deviation of less than 1 degree considered acceptable. The experimenter then presented the instructions to ensure participants fully understood the task, followed by a practice phase of 10 trials to familiarize them with the procedure. Each trial began with the presentation of a fixation cross; following successful calibration, the transfer interface appeared on the screen. Participants viewed the information on the transfer interface and made a decision based on their judgment, pressing the ‘F’ key to indicate willingness to proceed with the payment or the ‘J’ key to refuse payment. The experimental procedure is illustrated in Figure 2. After confirming their understanding, participants proceeded to the main experiment. All trials were presented in a randomized order, with the entire session lasting approximately 20 min. After completing the experiment, each participant filled out a questionnaire.
3.1.6. Data Cleaning and Statistical Analysis
The data were first cleaned by excluding trials that met either of the following criteria: (1) an incorrect key-press response or (2) a reaction time, first fixation duration, or total reading time falling beyond ±3 standard deviations from the mean. Repeated-measures analyses of variance (ANOVAs) and binary logistic regression analyses were subsequently conducted using SPSS (Version 26.0).
3.2. Results
3.2.1. Payment Decisions
Binary logistic regression analysis revealed a significant main effect of potential risk level (Wald χ² (2) = 433.199, p < 0.001). Relative to the high-risk level, both the medium-risk level (B = −2.372, SE = 0.113, p < 0.001) and the low-risk level (B = −0.754, SE = 0.100, p < 0.001) were associated with a significantly lower probability of payment rejection.
A significant main effect of warning color was also obtained (Wald χ² (1) = 107.559, p < 0.001). Red warnings led to a significantly higher probability of rejection compared to blue warnings (B = 1.171, SE = 0.113). Furthermore, the main effect of warning semantic type was significant (Wald χ² (1) = 11.845, p = 0.001), with imperative warnings producing a higher probability of rejection than informative warnings (B = 0.356, SE = 0.103).
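For a single-degree-of-freedom effect, the Wald χ² is the squared ratio of the coefficient to its standard error, so the reported statistics can be cross-checked against B and SE. The sketch below does this with the rounded values from the text; small discrepancies are expected because B and SE are reported to three decimals.

```python
def wald_chi_square(b, se):
    """Wald chi-square for a one-df logistic-regression coefficient: (B / SE) ** 2."""
    return (b / se) ** 2

# (B, SE, reported Wald chi-square) for the two main effects in Experiment 2.
effects = {
    "warning color (red vs. blue)": (1.171, 0.113, 107.559),
    "semantic type (imperative vs. informative)": (0.356, 0.103, 11.845),
}

for name, (b, se, reported) in effects.items():
    recomputed = wald_chi_square(b, se)
    print(f"{name}: recomputed = {recomputed:.2f}, reported = {reported:.3f}")
```

Both recomputed values fall within about 0.2 of the reported statistics, which is consistent with rounding in B and SE.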
A significant interaction emerged between potential risk level and warning semantic type (Wald χ² (2) = 6.803, p = 0.033). However, follow-up simple-effects analysis indicated that the advantage of imperative over informative warnings did not reach statistical significance at either the low-risk (p = 0.239) or medium-risk (p = 0.236) level. In addition, the interaction between potential risk level and warning color was marginally significant (Wald χ² (2) = 5.188, p = 0.075). Simple-effects analysis indicated that within the medium-risk level, red warnings were associated with a higher probability of rejection than blue warnings (B = −0.288, SE = 0.113, p = 0.028). No other interaction effects reached statistical significance. Detailed results are presented in Figure 5 and Table 6.
3.2.2. Reaction Time
A repeated-measures analysis revealed a significant main effect of potential risk level (F (2, 1249) = 11.763, p < 0.001). Post hoc comparisons indicated that reaction times at the medium risk level (M = 2322 ms) were significantly longer than those at both the low (M = 2145 ms, p < 0.001) and high (M = 2154 ms, p < 0.001) risk levels. No significant difference was observed between the low and high-risk levels. The main effects of warning semantic type and warning color were not significant.
A significant interaction was found between potential risk level and warning color (F (2, 1368) = 10.827, p < 0.001). Simple-effects analysis showed that for blue warnings, reaction times under low risk (M = 2078 ms) were significantly shorter than under both high (M = 2238 ms, p < 0.001) and medium risk (M = 2334 ms, p = 0.001), with no difference between high and medium risk levels (p = 0.100). In contrast, for red warnings, reaction times under high risk (M = 2071 ms) were significantly shorter than under both low (M = 2213 ms, p = 0.004) and medium risk (M = 2310 ms, p < 0.001), with no significant difference between low and medium risk levels (p = 0.100).
Furthermore, a significant interaction emerged between warning color and semantic type (F (1, 684) = 4.749, p = 0.030). For informative warnings, the difference between blue and red warnings was not significant (p = 0.340). However, for imperative warnings, reaction times were significantly shorter for red warnings (M = 2163 ms) than for blue warnings (M = 2233 ms, p = 0.040). Detailed results are presented in Table 7 and Table 8.
3.2.3. Eye-Tracking Metrics
Analysis of first fixation duration revealed a significant main effect of potential risk level (F (2, 1326) = 3.565, p = 0.028). Post hoc comparisons showed that first fixation durations were significantly shorter under high risk (M = 170 ms) than under low risk (M = 177 ms, p = 0.016) but did not differ significantly from those under medium risk (M = 175 ms, p = 0.100). A significant main effect of semantic type was also observed (F (1, 684) = 5.390, p = 0.021), with imperative warnings eliciting longer first fixations (M = 177 ms) than informative warnings (M = 172 ms). The main effect of warning color was not significant (F (1, 684) = 2.178, p = 0.140), and no interaction effects reached significance. Detailed results are presented in Table 9 and Table 10.
Analysis of total gaze duration showed a significant main effect of potential risk level (F (2, 1241) = 10.760, p < 0.001). Post hoc comparisons revealed that total gaze duration under the medium risk level (M = 1232 ms) was significantly longer than under both the low-risk (M = 1144 ms, p = 0.015) and high-risk levels (M = 1106 ms, p < 0.001). The main effects of warning semantic type and warning color were not significant. However, a significant interaction emerged between potential risk level and warning color (F (2, 1368) = 4.778, p = 0.009). Simple-effects analysis indicated that for blue warnings, total gaze duration was significantly longer under medium risk (M = 1226 ms) than under both low (M = 1110 ms, p = 0.001) and high risk (M = 1140 ms, p = 0.027), with no significant difference between low and high-risk levels. In contrast, for red warnings, total gaze duration was significantly shorter under high risk (M = 1073 ms) than under both low (M = 1179 ms, p = 0.001) and medium risk (M = 1237 ms, p < 0.001), with no significant difference between low and medium risk levels. Detailed results are presented in Table 11 and Table 12.
4. Discussion
4.1. Summary of Key Findings
Through two sequential lab experiments, this study examined the effectiveness of anti-fraud warning messages in mobile payment contexts and identified design features that mitigate habituation while promoting secure behaviors. Experiment 1 established the basic effect of warning presence versus absence. Experiment 2 used eye tracking to investigate the independent and interactive effects of warning color (red vs. blue) and semantic type (imperative vs. informative), along with underlying attentional mechanisms, while highlighting the moderating role of potential risk level. The findings emphasize the context-dependent nature of warning effectiveness and offer empirical guidance for developing dynamic, adaptive anti-fraud warning systems on mobile payment platforms.
Experiment 1 confirmed the core role of warning presence. Supporting H1a, warnings significantly lengthened reaction times, disrupting automated, habitual payment processing and shifting users toward deliberate evaluation (Buono et al., 2023). This suggests well-designed warnings can retain effectiveness despite some habituation (Bravo-Lillo et al., 2013). The effect was moderated by risk level: reaction time prolongation was greatest under medium risk and weaker under low and high risk. This nonlinear pattern aligns with decision conflict literature, where intermediate attribute values or preferences produce the highest cognitive conflict and hesitation (Evans et al., 2015; Fischer et al., 2000).
Although warnings extended decision time, they did not significantly increase overall payment rejection rates, consistent with warning habituation in efficiency-oriented tasks (Amran et al., 2018; Vance et al., 2018). These results provide a foundation for testing specific design enhancements: while presence interrupts automatic processing, greater visual and semantic salience is required to boost compliance.
Experiment 2 built on these findings, demonstrating optimization through design features and providing objective attentional evidence via eye tracking. Supporting H2, both red warnings and imperative semantics independently increased rejection probability, with strongest effects under medium- and high-risk conditions. Red’s advantage was most pronounced at medium risk, consistent with its role as a rapid danger signal that heightens perceived urgency and threat (Pravossoudovitch et al., 2014; Yuan et al., 2021; Zielinska et al., 2017). Imperative warnings outperformed informative ones, aligning with evidence that direct, commanding language elicits stronger responses in time-sensitive fraud contexts (Chai et al., 2022; Wang et al., 2022).
Eye-tracking metrics revealed nuanced mechanisms: imperative warnings elicited longer first fixation durations than informative warnings, suggesting greater early attentional engagement with the warning. For total gaze duration, a color × risk interaction emerged: under high risk, red warnings elicited shorter sustained attention (efficient threat confirmation and quick rejection; Du & Yang, 2024), whereas under medium risk, blue warnings required longer deliberative processing. These patterns extend prior eye-tracking work on warnings (Pham et al., 2018; Shi et al., 2022) and suggest a dissociation between early capture and later behavioral outcomes in mobile payments.
The red and imperative combination showed the strongest potential for accelerating secure decisions. This supports adaptive warning design that aligns with objective risk indicators (e.g., transaction amount) to optimize effectiveness across risk levels (Mukherjee et al., 2017).
4.2. Significance and Implications
Previous research has predominantly employed green as the control color. The present study selected blue instead to avoid the potential confound introduced by the “safety” semantics associated with green, thereby enabling a cleaner isolation of the alerting effect attributable to red as a conventional warning color. This provides more rigorous evidence for understanding the underlying mechanism of warning color efficacy (Bouhassoun et al., 2023). Furthermore, this study adapted the classic framework of warning effectiveness (Rogers et al., 2000; Wogalter et al., 2021; Hancock et al., 2020) to the unique digital context of mobile payment. This addresses a gap in HCI research concerning the mechanisms of warning habituation in highly time-compressed decision tasks.
The research paradigm adopted a sequential logic. Experiment 1 first established the foundational effect of warning presence versus absence, ensuring that the basic effect was not overlooked by the immediate manipulation of complex variables. Building upon this, Experiment 2 introduced multi-factor interactions and integrated eye-tracking. The systematic application of eye-tracking metrics captured early, automatic stages of warning processing (e.g., longer first fixation duration for imperative semantics and shorter total gaze duration for red under high risk). This approach moves beyond the limitations of traditional behavioral and self-report methods by revealing the dissociation between subjective evaluation and objective attention (Shi et al., 2022), thereby addressing a common shortcoming in prior studies that often focused on single variables or subjective reports.
This study offers concrete, actionable recommendations for optimizing anti-fraud interface design in mobile payment platforms. In the context of widespread mobile payment adoption and the high prevalence of telecom fraud (Chang et al., 2024; Chen et al., 2020), platforms should prioritize the use of red warnings paired with imperative semantics to maximize attentional capture and compliance with safe behavior. Furthermore, the development of risk-adaptive warning systems—which dynamically adjust salience and content based on transaction context—can significantly reduce users’ susceptibility to fraud and mitigate financial losses for platforms. These findings can also be extended to broader digital-risk scenarios, such as browser security warnings and phishing email interception, thereby contributing to the enhancement of public risk awareness in increasingly digitalized environments.
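The idea of a risk-adaptive warning can be illustrated with a minimal sketch. It assumes a toy risk rule based on transaction amount (the objective indicator mentioned above) plus a hypothetical `payee_is_new` flag. The thresholds, the function names, the warning texts, and the mappings chosen for medium and low risk are illustrative assumptions, not values or rules reported in the study. Only the pairing of red with imperative semantics at high risk reflects the findings discussed here.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class Warning:
    color: str      # "red" or "blue", the two colors compared in the study
    semantics: str  # "imperative" or "informative"
    text: str       # example wording only, not taken from the study materials


# Hypothetical thresholds in currency units; the paper does not specify any.
MEDIUM_AMOUNT = 1_000
HIGH_AMOUNT = 10_000


def classify_risk(amount: float, payee_is_new: bool) -> Risk:
    """Toy rule: large amounts, or mid-size amounts to a new payee, rank higher."""
    if amount >= HIGH_AMOUNT or (payee_is_new and amount >= MEDIUM_AMOUNT):
        return Risk.HIGH
    if amount >= MEDIUM_AMOUNT or payee_is_new:
        return Risk.MEDIUM
    return Risk.LOW


def select_warning(risk: Risk) -> Warning:
    """Pick warning features for a risk level.

    HIGH follows the red + imperative pairing discussed in the text.
    MEDIUM and LOW are assumed mappings chosen only for illustration.
    """
    if risk is Risk.HIGH:
        return Warning("red", "imperative",
                       "Stop. Do not transfer until you have verified the payee.")
    if risk is Risk.MEDIUM:
        return Warning("blue", "imperative",
                       "Verify the payee before you continue.")
    return Warning("blue", "informative",
                   "Transfers to unfamiliar accounts may carry fraud risk.")


if __name__ == "__main__":
    for amount, is_new in [(200, False), (1_500, False), (20_000, True)]:
        risk = classify_risk(amount, is_new)
        w = select_warning(risk)
        print(f"{amount:>6} new={is_new}: {risk.value} -> {w.color}/{w.semantics}")
```

In a deployed system the risk rule would come from the platform's own fraud model, and the mapping from risk level to warning features would need to be validated empirically rather than fixed by hand as it is here.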
4.3. Limitations and Future Research
First, the study was conducted in a laboratory setting using static screenshots of transfer interfaces rather than simulating the operational workflow of an actual mobile payment application. As a result, participants’ motivational involvement and the real financial consequences were reduced compared with genuine transactional contexts. Future research should test warnings within real application environments or employ high-fidelity simulations that incorporate real monetary incentives or loss penalties to improve ecological validity. Second, the participant sample was limited to undergraduate students from a university, which restricted the age range represented. The risk perception and mobile payment habits of this group may differ considerably from those of other demographics, such as middle-aged or older adults or professional investors. Subsequent studies should therefore expand recruitment to broader population segments, with particular attention to groups that are vulnerable to telecom fraud. Third, despite the implementation of procedural controls such as trial randomization, within-subject designs remain susceptible to potential confounding factors, including demand characteristics and fatigue effects. Future research could adopt a mixed experimental design to validate the robustness of the present findings. Finally, future work could integrate physiological and neuroscientific measures, such as event-related potentials (ERPs), functional magnetic resonance imaging (fMRI), or heart rate monitoring, to obtain deeper insight into the neural mechanisms underlying warning processing. Such approaches would support a more nuanced understanding of how warnings are processed and how their effectiveness can be enhanced, ultimately enabling more precise and personalized warning design.
5. Conclusions
In mobile payment contexts, effective anti-fraud warning design transcends simple information delivery and requires a holistic integration of visual and semantic features. Specifically, well-designed warnings can disrupt users’ automated processing and promote more deliberate, cautious decision-making. Compared to blue warnings, red warnings lead to higher security behavior compliance; similarly, imperative semantics outperform informative semantics in guiding users toward safe actions. The effectiveness of warning color is moderated by risk level, with red particularly enhancing users’ perception of and response to high-risk situations. Eye-tracking measures further provide objective evidence of the underlying attentional mechanisms. Together, these findings offer empirical support and practical design guidance for mobile payment platforms to build more efficient and precise risk–intervention interfaces.
References
- 1. Amran A., Zaaba Z. F., Mahinderjit Singh M. K. Habituation effects in computer security warning. Information Security Journal: A Global Perspective, 2018, 27(4), 192–204. doi:10.1080/19393555.2018.1505008
- 2. Anderson B. B., Jenkins J. L., Vance A., Kirwan C. B., Eargle D. Your memory is working against you: How eye tracking and memory explain habituation to security warnings. Decision Support Systems, 2016, 92, 3–13. doi:10.1016/j.dss.2016.09.010
- 3. Anderson B. B., Kirwan C. B., Jenkins J. L., Eargle D., Howard S., Vance A. How polymorphic warnings reduce habituation in the brain: Insights from an fMRI study. Proceedings of the 33rd Annual ACM Conference on Human Factors in Computing Systems, ACM, 2015, 2883–2892. doi:10.1145/2702123.2702322
- 4. Borys M., Plechawska-Wójcik M. Eye-tracking metrics in perception and visual attention research. EJMT, 2017, 3(16), 11–23.
- 5. Bouhassoun S., Naveau M., Delcroix N., Poirel N. Approach in green, avoid in red? Examining interindividual variabilities and personal color preferences through continuous measures of specific meaning associations. Psychological Research, 2023, 87(4), 1232–1242. doi:10.1007/s00426-022-01732-5. PMID: 36071301
- 6. Bravo-Lillo C., Komanduri S., Cranor L. F., Reeder R. W., Sleeper M., Downs J., Schechter S. Your attention please: Designing security-decision UIs to make genuine risks harder to ignore. Proceedings of the Ninth Symposium on Usable Privacy and Security, ACM, July 2013, 1–12. doi:10.1145/2501604.2501610
- 7. Buono P., Desolda G., Greco F., Piccinno A. Let warnings interrupt the interaction and explain: Designing and evaluating phishing email warnings. Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, ACM, April 2023, 1–6. doi:10.1145/3544549.3585802
- 8. Chai C., Zhou Z., Yin W., Hurwitz D. S., Zhang S. Evaluating the moderating effect of in-vehicle warning information on mental workload and collision avoidance performance. Journal of Intelligent and Connected Vehicles, 2022, 5(2), 49–62. doi:10.1108/JICV-03-2021-0003
