The Fraction of Broken Waves in Natural Surf Zones
Caio E. Stringari, Hannah E. Power

TL;DR
This study quantifies the fraction of broken waves (Qb) in natural surf zones using remote sensing, in-situ data, and machine learning, revealing high variability influenced by environmental factors and improving theoretical models.
Contribution
It introduces a novel quantification method for Qb across multiple beaches, linking it to environmental parameters and enhancing existing theoretical models.
Findings
Qb varies significantly with tidal and infragravity energy levels.
Existing models poorly predict Qb, but accuracy improves with Weibull distribution.
Qb correlates with beach morphodynamics and environmental forcing.
Abstract
This paper presents a novel quantification of the fraction of broken waves (Qb) in natural surf zones using data from seven wave-dominated Australian beaches. Qb is a critical, but rarely quantified, parameter for parametric surf zone energy dissipation models which are commonly used as coastal management tools. Here, Qb is quantified using a combination of remote sensing and in-situ data. These data and machine learning techniques enable quantification of Qb for a substantial dataset (>350,000 waves). The results show that Qb is a highly variable parameter with a high degree of inter- and intra-beach variability. Such variance could be explained (at least partially) by correlations between Qb and environmental parameters. Tidal variations drive changes in Qb of up to 70% for a given local water depth (h) on steep beaches, and increased infragravity energy levels decreased terminal…
| # | Location | Date | | Surf zone wave height (m) | Surf zone wave period (s) | Offshore wave height (m) | Offshore wave period (s) | Offshore wave direction | Beach slope | Morphodynamic state |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Birubi Beach | 2017-07-05 | 9 | 0.330 | 7.843 | 0.589 | 15.719 | S | 0.011 | LBT |
| 2 | Birubi Beach | 2017-07-06† | 9 | 0.378 | 12.310 | 0.641 | 12.188 | S | 0.007 | LBT |
| 3 | Boomerang Beach | 2016-09-19† | 8 | 0.778 | 9.887 | 2.027 | 7.182 | W | 0.066 | RBB/TBR |
| 4 | Boomerang Beach | 2016-09-20 | 5 | 0.685 | 8.104 | 1.275 | 7.889 | S | 0.085 | RBB/TBR |
| 5 | Boomerang Beach | 2016-09-21 | 9 | 0.721 | 7.844 | 1.213 | 7.632 | SE | 0.048 | RBB/TBR |
| 6 | Boomerang Beach | 2016-09-22 | 10 | 0.594 | 7.894 | 1.685 | 6.061 | WSW | 0.049 | RBB/TBR |
| 7 | Frazer Beach | 2018-04-24† | 11 | 0.425 | 6.586 | 1.750 | 7.623 | SE | 0.035 | TBR |
| 8 | Moreton Island | 2016-12-19 | 7 | 0.665 | 9.175 | 2.131 | 9.979 | SE | 0.069 | RTB |
| 9 | Moreton Island | 2016-12-20† | 14 | 0.822 | 7.923 | 1.462 | 10.302 | SE | 0.069 | RTB |
| 10 | One Mile Beach | 2014-08-04 | 13 | 0.669 | 8.868 | 1.457 | 8.825 | S | 0.045 | TBR/LTT |
| 11 | One Mile Beach | 2014-08-05 | 12 | 0.645 | 8.311 | 1.188 | 9.461 | SE | 0.076 | TBR/LTT |
| 12 | One Mile Beach | 2014-08-06 | 11 | 0.767 | 11.055 | 1.464 | 11.491 | S | 0.050 | TBR/LTT |
| 13 | One Mile Beach | 2014-08-07† | 9 | 0.734 | 10.673 | 1.638 | 8.892 | S | 0.048 | TBR/LTT |
| 14 | Seven Mile Beach | 2014-08-13† | 7 | 0.585 | 10.066 | 1.565 | 12.702 | SSE | 0.037 | D |
| 15 | Seven Mile Beach | 2014-08-14 | 9 | 0.556 | 9.527 | 0.964 | 11.495 | SSE | 0.035 | D |
| 16 | Seven Mile Beach | 2014-08-19 | 8 | 0.765 | 12.937 | 1.140 | 12.000 | SSE | 0.028 | D |
| 17 | Seven Mile Beach | 2014-08-20 | 8 | 0.630 | 11.648 | 1.043 | 10.133 | SSE | 0.029 | D |
| 18 | Werri Beach | 2014-08-15 | 9 | 0.959 | 10.089 | 1.519 | 11.652 | SE | 0.220 | LTT |
| 19 | Werri Beach | 2014-08-16† | 8 | 0.956 | 9.278 | 1.557 | 12.610 | ESE | 0.114 | LTT |

| Symbol | Description | Formulation | Source |
|---|---|---|---|
| | Crest water depth | Obtained from data | Svendsen (2006) |
| | Leading trough water depth | Obtained from data | Svendsen (2006) |
| | Following trough water depth | Obtained from data | Svendsen (2006) |
| | Shape parameter A | | Cowell (1982) |
| | Shape parameter B | | Cowell (1982) |
| | Wave height | | Komar (1976) |
| | Wave period | | Komar (1976) |
| | Angular frequency | | Komar (1976) |
| | Wave number | | Komar (1976) |
| | Wavelength | | Komar (1976) |
| | Wave height to trough depth ratio | | Power et al. (2010) |
| | Wave steepness | | Svendsen (2006) |
| | Vertical asymmetry | | Svendsen (2006) |
| | Surface shape parameter | | Svendsen (2006) |
| | Ursell Number | | Ursell (1953) |
| | Skewness - third central moment of | | Holthuijsen (2010) |
| | Kurtosis - fourth central moment of | | Holthuijsen (2010) |

| # | Model | Parameters | N. par. | | |
|---|---|---|---|---|---|
| 1 | TG83 (Weibull shape = 2.4, n = 1) | | 3 | -13590 | 0 |
| 2 | TG83 (Weibull shape = 2, n = 2) | | 2 | -9788 | 3801 |
| 3 | BJ78 | | 1 | -8593 | 4996 |
| 4 | TG83 (Weibull shape = 2, n = 4) | | 2 | -6797 | 6792 |
| 5 | B98 (Weibull shape = 2.4) | | 2 | -6708 | 6882 |
| 6 | B98 (Weibull shape = 2) | | 1 | -6696 | 6894 |

The Fraction of Broken Waves in Natural Surf Zones
Abstract
This paper presents a novel quantification of the fraction of broken waves (Qb) in natural surf zones using data from seven wave-dominated Australian beaches. Qb is a critical, but rarely quantified, parameter for parametric surf zone energy dissipation models, which are commonly used as coastal management tools. Here, Qb is quantified using a combination of remote sensing and in-situ data. These data and machine learning techniques enable quantification of Qb for a substantial dataset (>350,000 waves). The results show that Qb is a highly variable parameter with a high degree of inter- and intra-beach variability. Such variance could be explained (at least partially) by correlations between Qb and environmental parameters. Tidal variations drive changes in Qb of up to 70% for a given local water depth (h) on steep beaches, and increased infragravity energy levels decreased terminal values of Qb by about 20%. The links between Qb and environmental forcing lead to the development of a correspondence between Qb and the Australian beach morphodynamic model. Qb is larger for a given normalized depth (h normalized by the offshore wave height) for dissipative beaches than for intermediate beaches. Finally, when comparing data to existing models, three commonly used theoretical formulations for Qb are observed to be poor predictors, with errors of the order of 40%. Existing theoretical Qb models are shown to improve (revised errors of the order of 10%) if the Rayleigh probability distribution that describes the wave height in these models is replaced by the Weibull distribution.
Journal: JGR: Oceans
Affiliation: University of Newcastle, School of Environmental and Life Sciences, Newcastle, Australia
Corresponding author: C. E. Stringari, [email protected]
Key Points
The fraction of broken waves is a highly variable parameter in natural surf zones.
There are links between the fraction of broken waves and surf zone processes such as infragravity waves and tides.
The probability distribution of broken wave heights is well described by a Weibull distribution.
1 Introduction
As waves approach shallow water, they undergo transformations in which energy is dissipated via a range of phenomena, with wave breaking being the most significant. On gently sloping beaches, the surf zone is wide enough for most of the wave energy to be dissipated. In contrast, on steep beaches, the surf zone is narrow and the incoming energy is reflected with little or no breaking, or drives swash processes [Wright & Short (1984), Wright et al. (1982)]. The most common way of tracking the energy dissipation is through the cross-shore decay in wave height. Historically, wave heights in the surf zone have been modelled using probabilistic or parametric approaches [Baldock et al. (1998)]. Probabilistic models track wave height decay by propagating each component of a joint probability distribution of wave heights ($H$) and periods ($T$) shoreward. This approach frequently implies saturation, i.e., the wave height in shallow water is only a function of the local water depth ($h$) and of the wave height to water depth ratio, or breaker index, $\gamma$, such that $H = \gamma h$ [Thornton & Guza (1982), Mase & Iwagaki (1982), Dally & Dean (1986), Horikawa & Kuo (1966)]. Parametric models, on the other hand, use an energy balance between the incoming energy flux and the local energy dissipation to track wave height decay [Le Méhauté (1962)]:
$$\frac{\partial \left(E c_g\right)}{\partial x} = -\langle \epsilon \rangle \tag{1}$$
in which $E$ is the incoming energy in the cross-shore direction ($x$, positive onshore), $c_g$ is the wave group speed, and $\langle \epsilon \rangle$ is the combined time-averaged energy dissipation due to wave breaking and bottom friction. Thornton and Guza (1983) showed that dissipation via bottom friction on sandy beaches is small and, therefore, can be neglected in the dissipation term, leaving $\langle \epsilon \rangle$ to represent dissipation due to bottom-induced wave breaking. The energy dissipation is then derived from a bore model [Le Méhauté (1962), Battjes & Janssen (1978)]:
[Equation 2]
The terms in Equation 2 are the water density ($\rho$), the wave peak frequency ($f_p$), the gravitational acceleration ($g$), a free parameter of order 1 (O(1)), and the fraction of broken waves at any given point in the surf zone ($Q_b$).
Several parametric models based on Equations 1 and 2 have been developed to describe wave energy dissipation and to predict wave heights in the surf zone. These models are computationally efficient, relatively simple to use, and are often used alongside other applications as coastal management tools [Baldock et al. (1998), Ruessink et al. (2003), Apotsos et al. (2008), Alsina & Baldock (2007), Janssen & Battjes (2007)]. The main difference between parametric models is how the dissipation term ($\langle \epsilon \rangle$) is evaluated. From Equation 2, it follows that $\langle \epsilon \rangle$ is a direct function of the fraction of broken waves ($Q_b$) [Le Méhauté (1962), Battjes & Janssen (1978)]. Although $Q_b$ is a critical parameter for the model, very few authors have directly quantified $Q_b$, or compared it to model predictions using data from natural beaches. To the authors' knowledge, only Thornton and Guza (1983) and, more recently, Carini et al. (2015) have made attempts to do this.
Despite the vast literature on parametric wave modelling, there remain several unanswered questions. Firstly, no probability density function (PDF or $p(x)$, where $x$ is a random variable) to describe broken waves is known besides the approximations of Thornton and Guza (1983). Such approximations are not mathematically true PDFs and, as observed by Baldock et al. (1998), may not generalise to all beach types and slopes. Secondly, the extent to which errors in $Q_b$ affect the full dissipation in the models after they are individually optimised for specific beaches is currently unknown [Apotsos et al. (2008)]. Lastly, no links between $Q_b$ and other surf zone processes have been clearly established. The primary objective of this paper is to address these three knowledge gaps.
In this paper, two novel techniques to obtain $Q_b$ are presented: firstly, obtaining it directly from collocated video imagery and pressure transducer (PT) records and, secondly, using state-of-the-art machine learning and artificial intelligence methods. Field data are compared to the available analytical models of $Q_b$, and novel links between $Q_b$ and infragravity waves, tides, and beach morphodynamics are presented. This paper is organised as follows. Section 2 reviews the mathematical formulations of $Q_b$. Section 3 describes the data collection, data pre-processing, and presents direct and indirect methods to quantify $Q_b$. Section 4 presents the results of field observations, establishes links between $Q_b$ and surf zone parameters, and provides a novel description of the PDF of broken wave heights. Finally, the results are discussed in Section 5, and conclusions are provided in Section 6.
2 Review of Theoretical Formulations of $Q_b$
There are three main formulations for estimating $Q_b$ analytically, all of which rely on obtaining an expression derived from PDFs of the wave height ($p(H)$). The first approach, described in Battjes and Janssen (1978), hereafter BJ78, uses a truncated Rayleigh PDF to approximate $p(H)$ (Figure 1-a). In this formulation, all waves higher than a threshold wave height ($H_{max}$) are considered to be breaking such that:
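$$F(H) \equiv \Pr\{\underline{H} \le H\} = \begin{cases} 1 - \exp\left(-\tfrac{1}{2} H^2 / \hat{H}^2\right), & 0 \le H < H_{max} \\ 1, & H \ge H_{max} \end{cases} \tag{3}$$
in which $\hat{H}$ is the modal value that defines the Rayleigh distribution for $H < H_{max}$, $\underline{H}$ is any given random wave height, and $H_{max}$ is the maximum wave height possible for a given sea-state. From this expression, it is possible to obtain an explicit equation for $Q_b$ [Battjes & Janssen (1978)]:
$$Q_b = \Pr\{\underline{H} = H_{max}\} = \exp\left(-\tfrac{1}{2} H_{max}^2 / \hat{H}^2\right) \tag{4}$$
Equation 4 can be re-written as an implicit relation considering that $H_{rms}^2$ is equivalent to $\int_0^\infty H^2 \, \mathrm{d}F(H)$:
$$\frac{1 - Q_b}{\ln Q_b} = -\left(\frac{H_{rms}}{H_{max}}\right)^2 \tag{5}$$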
in which $H_{rms}$ is the root-mean-square wave height, and $H_{max}$ can be obtained from the Miche (1934) formulation for the maximum wave height, or from any other compatible formulation. BJ78 observed that their model underestimated energy dissipation closer to the shore and adopted $Q_b = 1$ if $H_{rms} \ge H_{max}$, i.e., a saturated surf zone. This assumption has been shown to not fully account for the energy dissipation, particularly for steeper beach profiles [Baldock et al. (1998), Alsina & Baldock (2007), Janssen & Battjes (2007)]. In addition, surf has been shown to be unsaturated on multiple beaches [Power et al. (2010)]. Nonetheless, Equation 5 is frequently used in its spectral form [Eldeberky & Battjes (1996)] in wave forecasting models such as SWAN and WW3 [Holthuijsen (2010), WW3DG (2016)].
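The implicit BJ78 relation, (1 - Qb) / ln(Qb) = -(Hrms / Hmax)^2, has no closed-form solution for Qb, so in practice it is solved numerically. The following is a minimal sketch using bisection; the function name `bj78_qb`, the tolerance, and the bracketing strategy are illustrative assumptions and are not taken from the paper.

```python
import math


def bj78_qb(h_rms, h_max, tol=1e-12, max_iter=200):
    """Solve (1 - Qb) / ln(Qb) = -(h_rms / h_max)**2 for Qb by bisection.

    Hypothetical helper for illustration only.
    """
    r = (h_rms / h_max) ** 2
    if r >= 1.0:
        return 1.0  # saturated surf zone: all waves are broken
    if r <= 0.0:
        return 0.0

    def residual(q):
        # Equivalent form of the implicit relation: 1 - q + r * ln(q) = 0
        return 1.0 - q + r * math.log(q)

    # residual(q) is negative as q -> 0 and positive at q = r (for 0 < r < 1),
    # so the non-trivial root lies in (0, r); q = 1 is a trivial root.
    lo, hi = 1e-300, r
    if residual(lo) >= 0.0:
        return 0.0  # root is below floating-point resolution
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0.0:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)
```

For example, `bj78_qb(0.5, 1.0)` returns a small fraction of broken waves, and the value grows towards 1 as `h_rms` approaches `h_max`.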
Thornton and Guza (1983), hereafter TG83, used a similar approach to model $Q_b$ but, instead of using a truncated Rayleigh PDF, they observed that a full Rayleigh PDF better described their field data:
$$p(H) = \frac{2H}{H_{rms}^2} \exp\left[-\left(\frac{H}{H_{rms}}\right)^2\right] \tag{6}$$
$Q_b$ was then obtained by integrating empirical equations designed to describe the probability of broken waves ($p_b(H)$, dashed area in Figure 1-b)
$$Q_b = \int_0^\infty p_b(H) \, \mathrm{d}H \tag{7}$$
where
$$p_b(H) = W(H) \, p(H) \tag{8}$$
and,
[Equation 9]
in which $\gamma$ is the wave height to water depth ratio, $h$ is the averaged local water depth, $n$ is a free parameter, and $W(H)$ is a scaling factor that was used to give more weight to higher waves. Note that Equation 8 is not a true PDF because, if not bounded, it can result in $Q_b > 1$. In their study, TG83 used, , and
[Equation 10]
Baldock et al. (1998), hereafter B98, followed TG83 and also used a full Rayleigh PDF to describe the wave height distribution in the surf zone. In their case, $H$ was normalised by $H_{rms}$ such that:
$$p\left(\frac{H}{H_{rms}}\right) = 2 \frac{H}{H_{rms}} \exp\left[-\left(\frac{H}{H_{rms}}\right)^2\right] \tag{11}$$
Given the observation in B98 that TG83's empirical formulations for $p_b(H)$ may not be universal for all surf zone conditions and beach slopes, B98 formulated $Q_b$ so that it is obtained by integrating Equation 11 for all waves in which $H \ge H_b$ (hatched area in Figure 1-c):
$$Q_b = \int_{H^*}^{\infty} p\left(\frac{H}{H_{rms}}\right) \mathrm{d}\left(\frac{H}{H_{rms}}\right) = \exp\left[-\left(\frac{H_b}{H_{rms}}\right)^2\right] \tag{12}$$
where $H^*$ is the limiting parameter $H_b / H_{rms}$, and $H_b$ can be obtained from any formulation for the maximum breaker height, for example as per Battjes and Stive (1985):
$$\gamma = \frac{H_b}{h} = 0.5 + 0.4 \tanh\left(33 \, s_0\right) \tag{13}$$
where $s_0$ is the offshore wave steepness. This definition of $\gamma$ is used hereafter as a maximum breaker height, or a wave height to water depth ratio ($H/h$). Note, however, that such definitions of $H_b/h$ and $H/h$ are not necessarily equivalent [Power et al. (2010), Raubenheimer et al. (1996)]. This approach was only undertaken so that any errors this formulation (Equation 13) may contain are shared between the models and do not influence the comparisons between them. There are, nonetheless, other formulations for $\gamma$ that can produce significantly different results for the energy dissipation when adapted into the formulations for $Q_b$ (see Section 5.3) [Apotsos et al. (2008), Ruessink et al. (2003)]; however, investigating these is beyond the scope of this study.
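As an illustration of how the B98 formulation is evaluated in practice, the sketch below combines the Battjes and Stive (1985) breaker index, gamma = 0.5 + 0.4 tanh(33 s0), with the closed-form Rayleigh exceedance Qb = exp[-(Hb / Hrms)^2]. The function names and the shallow-water assumption Hb = gamma * h are illustrative choices and are not taken from the paper.

```python
import math


def breaker_index_bs85(s0):
    """Breaker index from the offshore wave steepness s0 (Battjes & Stive, 1985)."""
    return 0.5 + 0.4 * math.tanh(33.0 * s0)


def b98_qb(h_rms, depth, s0):
    """Fraction of broken waves from a full Rayleigh PDF (B98-style).

    Assumption: the maximum breaker height is H_b = gamma * depth.
    """
    h_b = breaker_index_bs85(s0) * depth
    # Probability that a Rayleigh-distributed wave height exceeds H_b
    return math.exp(-((h_b / h_rms) ** 2))
```

With a fixed `h_rms`, the returned fraction increases as `depth` decreases, which mirrors the expected shoreward increase in the fraction of broken waves.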
2.1 $Q_b$ Based on a Weibull PDF
A feature that all the formulations for $Q_b$ share is the Rayleigh PDF from which $Q_b$ is obtained. Several authors have proposed that either a full or a modified Weibull PDF is a better descriptor of $p(H)$, especially in shallower water depths [Mase (1989), Hameed & Baba (1985), Battjes & Groenendijk (2000), Mendez et al. (2004), Power et al. (2016)]. For non-negative values of $H$, $p(H)$ is described by a Weibull PDF as:
$$p(H) = \frac{\kappa}{\lambda} \left(\frac{H}{\lambda}\right)^{\kappa - 1} \exp\left[-\left(\frac{H}{\lambda}\right)^{\kappa}\right] \tag{14}$$
in which $\kappa$ defines the shape of the distribution and can be any positive number, and $\lambda$ is the scale parameter of the distribution. For shape parameter $\kappa = 2$, the Weibull PDF reduces to the Rayleigh PDF. Following TG83, Equation 14 can be used to obtain $Q_b$ in the same manner as in Equation 8:
[Equation 15]
with the probability of broken waves being
[Equation 16]
where
[Equation 17]
Similarly, it is possible to modify B98 to use the Weibull PDF:
[Equation 18]
such that $Q_b$ can be obtained by integrating Equation 18 for all waves greater than $H_b$:
[Equation 19]
Figure 2-a shows a graphical representation of the Weibull PDF for various values of $\kappa$, and how $Q_b$ can be obtained from Equation 15 (Figure 2-b) and Equation 19 (Figure 2-c). The Weibull PDF with $\kappa > 2$ should automatically increase $Q_b$ in TG83's model if the exponent in the scaling factor is kept unchanged from the original values (note the orange dashed line in Figure 2-b). The major issue with the use of this alternative PDF is the inclusion of a new free parameter ($\kappa$) that needs to be obtained from the available data. In this paper, the optimal value of $\kappa$ found by Power et al. (2016) using an extensive natural surf zone dataset is used, and $n$ is set to 1 when TG83's model is adapted to use the Weibull PDF.
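The effect of swapping the Rayleigh PDF for a Weibull PDF can be sketched numerically. The example below evaluates a standard two-parameter Weibull PDF, integrates its tail above a breaker height, and compares the result with the closed-form exceedance probability. The function names, the use of a unit scale parameter, and the shape value of 2.4 are illustrative assumptions rather than the paper's implementation.

```python
import math


def weibull_pdf(h, shape, scale):
    """Two-parameter Weibull PDF for a non-negative wave height h."""
    return (shape / scale) * (h / scale) ** (shape - 1) * math.exp(-((h / scale) ** shape))


def weibull_exceedance(h_b, shape, scale):
    """Closed-form probability that a Weibull-distributed height exceeds h_b."""
    return math.exp(-((h_b / scale) ** shape))


def integrate_tail(h_b, shape, scale, n=20000):
    """Trapezoidal integral of the PDF from h_b to 10 * scale (tail is negligible beyond)."""
    upper = 10.0 * scale
    dh = (upper - h_b) / n
    total = 0.5 * (weibull_pdf(h_b, shape, scale) + weibull_pdf(upper, shape, scale))
    for i in range(1, n):
        total += weibull_pdf(h_b + i * dh, shape, scale)
    return total * dh
```

Setting `shape=2.0` recovers the Rayleigh result exp[-(Hb / scale)^2], so the same code path can be used to compare the two distributions.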
3 Methods
3.1 Field Data Collection
PT data and video imagery of the nearshore were collected at seven different sandy, micro-tidal, wave-dominated Australian beaches during 19 individual deployments (Table 1 and Figure 3). Each experiment consisted of deploying PTs attached to a chain that sat on the seabed and extended in a cross-shore orientation. This was combined with collecting video imagery of the surf zone from an elevated location (headland or house balcony) over an individual tidal cycle. The PTs were always deployed at the seabed level and covered cross-shore extents ranging from 30 to 120 m (from the beach face) on surf zones of about the same width. Therefore, the vast majority of the collected data were from the outer and inner surf zones. The PTs used in this study were either INW PT2X (8 or 10 Hz) or RBR Solo (16 Hz) programmed to record at a minimum sampling rate of 8 Hz. The video cameras were consumer-grade Sony cameras attached to a surveying tripod (Sony HDR-XR200 for the 2014 experiments and Sony HDR-CX240 for the remainder of the experiments). Video cameras were calibrated to compensate for lens distortions as per Holland et al. (1997).
In addition to the PT and video data collection, all the beaches were surveyed at low tide, and a minimum of four ground control points (GCPs) and one beach profile were acquired. Beach profiles covered the foredune to the maximum depth allowed by environmental conditions. The beach slopes were calculated from the surf-swash boundary to the seaward-most surveyed PT and should, therefore, be representative of the surf zone (see Figure 3). The spectral surf zone parameters in Table 1 were calculated following Holthuijsen (2010), and the offshore data were obtained from the New South Wales wave transformation toolbox [NSW Office of Environment and Heritage (2018)], except for Moreton Island, for which offshore data were obtained from the Brisbane wave-rider buoy [Science and Environment - Queensland Government (2018)].
The beaches in this study covered the full range of morphodynamic states [Wright & Short (1984)] except for reflective beaches, because the absence of a surf zone in this beach type precludes obtaining meaningful data. The northern end of Seven Mile Beach (Gerroa, New South Wales) (Figure 3-f) represented the dissipative state, Birubi Beach (Figure 3-a) represented the alongshore bar and trough state (LBT), Boomerang Beach (Figure 3-b) and Moreton Island (Figure 3-d) represented the rhythmic bar and beach state (RBB), One Mile Beach (Figure 3-e), Frazer Beach (Figure 3-c) and Boomerang Beach represented the transverse bar and rip state (TBR), and Werri Beach (Figure 3-g) and One Mile Beach (Figure 3-e) represented the low tide terrace state (LTT). Some beaches presented different morphodynamic states at the same time, e.g., sections of Boomerang Beach were characterised as RBB (19/09/2016) whereas others presented TBR (20/09/2016 onward) morphology. However, note that the profiles shown in Figure 3 may not fully illustrate the described classification because they are a two-dimensional representation of a three-dimensional system. The presented classification was based on Timex images [Holman & Stanley (2007)] of the beaches and additional information from Short (1999, 2007) (see Appendix 1).
3.2 Data Pre-processing
3.2.1 Wave-by-wave Analysis
PT data were processed following Power et al. (2010). For each PT, the data record was divided into 15-minute intervals to ensure stationarity with respect to the tide, and checked visually to ensure that there were no dry periods, i.e., to ensure that the PT was not in the swash zone. Each of these time-series was re-sampled to 8 Hz (if needed) and individual waves were extracted using a wave-by-wave algorithm that searches for local minima and maxima in the series. Although the method is similar in concept to Power et al. (2010), the implementations did not share code. The present algorithm has two free parameters: a wave height threshold and a searching window. For this study, the searching window was set to two times the sampling frequency and the wave height threshold was 15% of the spectral significant wave height [Holthuijsen (2010)]. Choosing the wave height threshold was somewhat subjective, and the present threshold was chosen after visual inspection of several 15-minute time-series. No frequency filters were applied to the data, in order to preserve the infragravity wave signal that is usually lost when using the more common combination of filtering and the zero-crossings method (e.g., as done in Haller & Catalán (2009) and Postacchini & Brocchini (2014)). Figure 4 shows an example of the results of the wave-by-wave analysis applied to Birubi Beach. Even under the noisy conditions of this deployment, the algorithm performed well (Figure 4-a).
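A wave-by-wave search of this kind can be sketched as follows. This is not the authors' implementation: the function name `extract_waves`, the interpretation of the searching window as 2 x fs samples centred on each point, and the approximation of the spectral significant wave height as four times the standard deviation of the record are all assumptions made for illustration.

```python
import numpy as np


def extract_waves(eta, fs, window=None, threshold_fraction=0.15):
    """Return (trough_index, crest_index, height) tuples for each detected wave."""
    eta = np.asarray(eta, dtype=float)
    # Assumption: the search window is 2 * fs samples, centred on each point.
    window = int(2 * fs) if window is None else int(window)
    half = max(1, window // 2)
    # Assumption: Hm0 is approximated as four times the standard deviation.
    min_height = threshold_fraction * 4.0 * eta.std()

    # A crest is a sample that is the maximum of its surrounding window.
    crests = []
    for i in range(half, len(eta) - half):
        segment = eta[i - half:i + half + 1]
        if eta[i] == segment.max() and eta[i] > segment.min():
            if not crests or i - crests[-1] > half:
                crests.append(i)

    # The leading trough is the minimum between the previous accepted crest
    # and the current one; small oscillations are rejected by the threshold.
    waves, start = [], 0
    for crest in crests:
        trough = start + int(np.argmin(eta[start:crest + 1]))
        height = eta[crest] - eta[trough]
        if height >= min_height:
            waves.append((trough, crest, height))
            start = crest
    return waves
```

Because no frequency filter is applied, low-frequency (infragravity) motions remain in `eta` and shift the crest and trough elevations, which is the behaviour the text aims to preserve.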
3.2.2 Video Imagery processing
Video imagery data were processed following Holland et al. (1997). For each deployment, video data were down-sampled, and individual frames were extracted. These frames were then rectified using the algorithm provided by Hoonhout et al. (2015). From each frame, a cross-shore array of pixels was extracted at the same location as the PT array and stacked in time, resulting in a timestack image [Aagaard & Holm (1989)]. The image coordinate system was translated and rotated so that it aligned with the beach profile and PT array (with the cross-shore coordinate oriented offshore). All timestacks were linearly interpolated with sub-pixel accuracy to a common cross-shore resolution. For each timestack, visible waves were tracked following Stringari et al. (2019), resulting in a collection of tracked wave paths for each location. Figure 5 shows representative timestacks and examples of tracked waves for Boomerang Beach, Frazer Beach, and Moreton Island. Examples for the remaining locations can be found in Stringari et al. (2019).
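The timestack construction described above can be illustrated with a short sketch. It assumes rectified grayscale frames stored as a (time, row, column) array and a pixel transect given by hypothetical `rows` and `cols` index arrays; the function names and the resampling helper are illustrative and do not reproduce the authors' toolchain.

```python
import numpy as np


def build_timestack(frames, rows, cols):
    """Stack the pixel intensities along one cross-shore transect through time.

    frames: array of shape (n_frames, height, width), rectified grayscale images.
    rows, cols: pixel indices of the transect (hypothetical inputs).
    Returns an array of shape (n_frames, n_pixels).
    """
    frames = np.asarray(frames)
    return frames[:, np.asarray(rows), np.asarray(cols)]


def resample_cross_shore(timestack, x, dx):
    """Linearly interpolate each time step onto a regular cross-shore grid.

    x must be increasing and hold the cross-shore coordinate of each pixel.
    """
    x = np.asarray(x, dtype=float)
    x_new = np.arange(x.min(), x.max() + 0.5 * dx, dx)
    stack = np.vstack([np.interp(x_new, x, row) for row in timestack])
    return x_new, stack
```

Each row of the returned array is one instant in time, so a propagating wave crest appears as a sloping streak whose slope relates to its cross-shore speed.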
3.3 Quantifying the Fraction of Broken Waves
The collocated PT and video data allowed for the direct quantification of $Q_b$. This was done by aligning the tracked waves with the corresponding PT data in both space and time (Figure 6). The cross-shore spatial alignment was done directly from surveyed data, and the alignment in time was done by shifting the PT time-series by an optimal time delay. This delay was obtained via cross-spectral correlation between the pixel intensity time-series and the water surface elevation ($\eta$) at a given location (see Stringari et al. (2019) for more details). Following the time-series alignment, an attempt was made to match each wave crest in the PT record to a tracked wave path. If there was a match within the wave path confidence interval, that wave was considered to be broken (vertical lines in Figure 6). All other waves were considered to be unbroken. $Q_b$ was then calculated following Carini et al. (2015) as the ratio between the number of waves classified as broken ($N_b$) and the total number of waves ($N$) in the same time interval:
$$Q_b = \frac{N_b}{N} \tag{20}$$
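The matching and counting step can be sketched as below. The sketch assumes that crest times from the PT record and arrival times of tracked wave paths at the PT location are already on a common time axis, and it uses a single hypothetical `tolerance` (in seconds) as a stand-in for the wave path confidence interval described in the text.

```python
import bisect


def classify_crests(crest_times, path_times, tolerance):
    """Label each PT wave crest as broken (1) or unbroken (0).

    A crest is labelled broken when a tracked wave path passes the PT
    within `tolerance` seconds of the crest (illustrative criterion).
    """
    paths = sorted(path_times)
    labels = []
    for t in crest_times:
        i = bisect.bisect_left(paths, t)
        # Only the nearest path on either side of the crest can match.
        neighbours = [paths[j] for j in (i - 1, i) if 0 <= j < len(paths)]
        matched = any(abs(t - p) <= tolerance for p in neighbours)
        labels.append(1 if matched else 0)
    return labels


def fraction_broken(labels):
    """Qb = Nb / N for one time interval."""
    return sum(labels) / len(labels) if labels else float("nan")
```

For instance, four crests of which two coincide with tracked wave paths give a fraction of broken waves of 0.5 for that interval.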
In the present study, one hour of collocated video and PT data was processed using this approach. As per Stringari et al. (2019), data were processed in 5-minute batches. When possible, different tidal conditions were sampled to ensure a diverse dataset. This approach resulted in a total of 13,253 waves being individually classified as broken or unbroken. Some issues were observed when aligning the wave paths to waves in the PT record. Not all instances were perfectly aligned (e.g., waves 06, 23, and 26 in Figure 6) because the alignment algorithm is intrinsically non-linear, using the dynamic time warping (DTW) method to find the best peak matches [Karabiber (2013), Vu & Laukens (2013), Hoffmann et al. (2012), Serra & Arcos (2014)]. For some deployments, not all PTs were surveyed, nor were they deployed in a perfect cross-shore orientation, which also compromised the data alignment. The method to obtain $Q_b$ described above can become time-consuming because the definition of the parameters used to track the waves in the video record is not straightforward (see the discussion in Stringari et al. (2019)) and, in some cases, the optimal time delay for the time-series synchronisation needed to be manually defined. The classification error using this approach compared to the manually quality-controlled dataset was of the order of 10%, depending on the timestack quality.
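The time synchronisation between the pixel intensity series and the PT record relies on finding an optimal delay. The paper uses cross-spectral correlation and DTW; the sketch below swaps in a simpler time-domain cross-correlation to show the idea, and the function name, the lag search range, and the sign convention are assumptions.

```python
import numpy as np


def optimal_lag(reference, signal, fs, max_lag_s):
    """Return the delay (s) by which `reference` lags `signal`.

    Simplified stand-in: brute-force time-domain cross-correlation over
    lags in [-max_lag_s, +max_lag_s]. Both series must share the sampling
    frequency fs and have the same length.
    """
    a = np.asarray(reference, dtype=float)
    b = np.asarray(signal, dtype=float)
    a = a - a.mean()
    b = b - b.mean()
    max_lag = int(round(max_lag_s * fs))
    lags = np.arange(-max_lag, max_lag + 1)
    scores = []
    for lag in lags:
        if lag >= 0:
            x, y = a[lag:], b[:len(b) - lag]
        else:
            x, y = a[:lag], b[-lag:]
        n = min(len(x), len(y))
        # Mean product of the overlapping samples at this lag
        scores.append(np.dot(x[:n], y[:n]) / n)
    best = lags[int(np.argmax(scores))]
    return best / fs
```

A positive result means that `signal` should be shifted later in time by that amount to line up with `reference`.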
3.4 The Machine Learning Approach
In the previous section, a method for obtaining $Q_b$ from the collocated pressure transducer and video imagery was outlined. On one hand, this deterministic approach has the advantage of giving an exact value for $Q_b$. On the other hand, it is labour-intensive and requires manual quality control in some cases. In this section, a novel approach is developed to use PT data alone to obtain values of $Q_b$. The goal is to obtain a mapping function $f$ that translates an input feature vector $\mathbf{x}$ into a predicted label $\hat{y}$ that best approximates the previously known label $y$ ($y = 0$ for unbroken waves or $y = 1$ for broken waves). The input feature vector is represented by a series of seventeen wave-by-wave parameters that are described in Table 2 and Figure 4-b. These parameters were chosen because they have been previously used in the literature to distinguish between broken and unbroken waves (e.g., Cowell (1982), Svendsen (2006)), or because they have historically been used to describe wave characteristics (e.g., Ursell (1953)).
The multi-layer perceptron (MLP) was chosen to act as the transference function $f$. This class of supervised learning models has been widely used to perform image classification and natural language processing for the past three decades [Haykin (1994)] but has only recently been applied to coastal engineering problems [Zanuttigh et al. (2013), James et al. (2018)]. The MLP model is organised as a series of fully-connected layers of neurons (Figure 7) in which each neuron in the hidden layers represents an activation:
$$a_j^{l} = \sigma\left(\sum_{k} w_{jk}^{l} \, a_k^{l-1} + b^{l}\right) \tag{21}$$
where $a_j^{l}$ is the activation of a neuron $j$ in hidden layer $l$, $\sigma$ is the ReLU (Rectified Linear Unit [Nair & Hinton (2010)]) activation function (Equation 22), $w_{jk}^{l}$ is the weight connecting neuron $k$ in hidden layer $l-1$ to neuron $j$ in hidden layer $l$, $a_k^{l-1}$ is the activation of neuron $k$ in hidden layer $l-1$, and $b^{l}$ is the bias added to layer $l$.
$$\sigma(z) = \max(0, z) \tag{22}$$
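The layer-by-layer activation described above can be written as a short forward pass. This is a minimal NumPy sketch, not the Keras implementation used in the study; the function names are illustrative, and the sigmoid output layer is an assumption (the text only states that the output activation is a probabilistic prediction).

```python
import numpy as np


def relu(z):
    """Rectified Linear Unit: max(0, z), applied element-wise."""
    return np.maximum(0.0, z)


def sigmoid(z):
    """Logistic function used here (by assumption) for the output neuron."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(x, weights, biases):
    """Propagate a batch of feature vectors through a fully-connected MLP.

    x: array of shape (n_waves, n_features).
    weights, biases: one array per layer; hidden layers use ReLU and the
    final layer returns the probability of a wave being broken.
    """
    a = np.asarray(x, dtype=float)
    for w, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ w + b)  # weighted sum of the previous layer plus bias
    return sigmoid(a @ weights[-1] + biases[-1])
```

With seventeen input features, the first weight matrix has seventeen rows, and the last one has a single column so that each wave receives one probability.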
When a set of training samples $(\mathbf{x}_1, y_1), (\mathbf{x}_2, y_2), \ldots, (\mathbf{x}_N, y_N)$ is shown to the model, the MLP learns the optimal combination of weights and biases that minimises the binary cross-entropy (log-loss) cost function:
$$L = -\frac{1}{N} \sum_{i=1}^{N} \left[ y_i \ln \hat{y}_i + \left(1 - y_i\right) \ln\left(1 - \hat{y}_i\right) \right] \tag{23}$$
where $L$ is the negative log-likelihood of the true labels ($y = 1$, broken wave) given the classifier's probabilistic predictions ($\hat{y}$, the activation value in the output layer). This optimisation step results in the transference function $f$ that best approximates $y$ averaged across all training samples. The learning step is accomplished via back-propagation [Rumelhart et al. (1986)] using the Adam formulation of the stochastic gradient descent (SGD) method [Kingma & Ba (2014)].
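The binary cross-entropy cost can be evaluated directly from the true labels and the predicted probabilities. The sketch below is illustrative; the function name `log_loss` and the clipping constant `eps`, which avoids taking the logarithm of zero, are assumptions.

```python
import numpy as np


def log_loss(y_true, p_pred, eps=1e-12):
    """Mean binary cross-entropy between labels (0 or 1) and probabilities."""
    y = np.asarray(y_true, dtype=float)
    # Clip probabilities away from 0 and 1 so the logarithms stay finite.
    p = np.clip(np.asarray(p_pred, dtype=float), eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))
```

Confident and correct predictions give a loss close to zero, whereas an uninformative prediction of 0.5 for every wave gives a loss of ln(2).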
3.4.1 Implementation for surf zone data
In the present study, the training dataset consisted of the 13,253 individual waves from the unique locations described in Table 1 and was built using the results from the analysis described in Section 3.3. Prior to the training step, each wave was visually verified to ensure that errors were not propagated into the learning algorithms (e.g., waves 6 and 23 in Figure 6). The input vector was, therefore, a 13,253 x 17 matrix and the output vector was a 13,253 x 1 array. Optimal parameters for the MLP were found using a three-fold cross-validation considering 80% of the full dataset as training samples and the remaining 20% as testing samples. The cross-validation resulted in an optimal number of three hidden layers with 512 neurons in each hidden layer. Increasing the number of hidden layers or the number of neurons per layer resulted either in no significant performance improvement or in over-fitting. Other parameters required by the Adam algorithm were either kept unchanged from the defaults or also learnt via cross-validation. The numerical implementation was done in Keras [Chollet et al. (2015)] and has GPU (graphical processing unit) support. Because the MLP must be initialised from random weights and biases, a bootstrap procedure with 500 model runs was performed to account for the variability in the initialisation. The model run with performance closest to the median was chosen to perform further analyses (see next section for more details). This chosen pre-trained network and auxiliary programs to perform the wave-by-wave decomposition and prepare the $\mathbf{X}$ and $y$ input arrays are freely available to the community at https://github.com/caiostringari/pywavelearn/deep-learning/.
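The bootstrap over random initialisations and the choice of the median-performing run can be sketched independently of the neural network library. In the sketch below, `train_and_score` is a hypothetical user-supplied callable that trains one network for a given random seed and returns its test score; it is a placeholder and not part of the published code.

```python
import statistics


def select_median_run(scores):
    """Return the index of the run whose score is closest to the median."""
    median = statistics.median(scores)
    return min(range(len(scores)), key=lambda i: abs(scores[i] - median))


def bootstrap(train_and_score, n_runs=500):
    """Train n_runs models with different seeds and pick the median one.

    train_and_score: hypothetical callable, seed -> test score.
    Returns the index of the selected run and the list of all scores.
    """
    scores = [train_and_score(seed) for seed in range(n_runs)]
    return select_median_run(scores), scores
```

Selecting the run closest to the median, rather than the best run, avoids reporting a score that is only reachable with a lucky initialisation.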
3.4.2 MLP Verification and Validation
Different approaches were used to verify and validate the MLP results. In all subsequent analyses, only the test dataset is considered. The first approach used the $F_1$ score [Rijsbergen (1979)] to evaluate each bootstrap run. This metric should be more robust than the standard classification score because it takes into account both the precision (the number of correct positive results divided by the number of all positive results returned by the classifier) and the recall (the number of correct positive results divided by the number of all relevant samples). The median score was 86.79% for broken waves and 86.97% for unbroken waves (higher scores indicate better performing models).
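Precision, recall, and their harmonic mean can be computed per class from the true and predicted labels. The following sketch uses the label convention assumed here (1 for broken, 0 for unbroken); the function name and the handling of empty denominators are illustrative choices.

```python
def f1_score(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the class given by `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    return precision, recall, 2.0 * precision * recall / (precision + recall)
```

Calling the function once with `positive=1` and once with `positive=0` gives the separate scores for broken and unbroken waves.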
To provide further evidence that the MLP is a valid surrogate for the direct method, a comparison between $Q_b$ calculated from both methods was conducted (Figure 8-a). The results of this analysis showed a solid correspondence between both methods. To ensure that there was no significant over-fitting, the training history was also recorded (Figure 6-b). These results strongly indicated that there was no over-fitting, and that the model generalised well for the test data. Note that even after 200 training epochs, the MLP was still slowly learning. Given these promising results, the median-performing MLP was used to classify the 333,732 individual waves measured in all experiments described in Table 1, totalling 3,639 unique 15-minute time-series (or data runs) across all PTs. The results from this classification will be used in the analyses starting from Section 4.1.
To test the effect of inter-beach differences in the learning step, a second, independent, bootstrap procedure was conducted. For each run, one of the locations was left out of the training dataset, the MLP was trained, and then used to classify the data from the left-out location. The mean percentage error (MPE) between Qb calculated using the true and the predicted wave labels was used as the evaluation metric (Figure 8-c). The highest errors were seen when Birubi Beach data were left out of the training step and data from the same location were classified. This was an indication that the model did not generalise as well for this location. For all other beaches, the averaged MPE was below 10%, which is a good indicator that the MLP generalised well. Some counter-intuitive results were also observed, e.g., when Moreton Island was left out and data from the same location were classified, the classification was better. Similar trends occurred for Boomerang Beach and Werri Beach. See Section 8 for a discussion of the possible causes for these results. Nonetheless, these results strongly indicate that the MLP can be used as a valid substitute for the collocated PT-video method presented in Section 3.3, with the clear advantage of being able to classify hundreds of thousands of waves in virtually no computational time.
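The leave-one-location-out evaluation can be sketched as below. This is an illustration under stated assumptions: Qb for a data run is taken as the number of broken waves divided by the number of all waves, the MPE is taken in its usual signed form (true minus predicted, relative to true), and all function names are hypothetical.

```python
def fraction_broken(labels):
    """Qb for one data run: broken waves (label 1) divided by all waves."""
    if not labels:
        raise ValueError("labels must not be empty")
    return sum(1 for lab in labels if lab == 1) / len(labels)


def mean_percentage_error(qb_true, qb_pred):
    """Signed mean percentage error (in percent) over paired Qb values.

    Runs with a true Qb of zero must be excluded beforehand, because the
    relative error is undefined for them.
    """
    if len(qb_true) != len(qb_pred) or not qb_true:
        raise ValueError("inputs must be non-empty and of equal length")
    errors = [(t - p) / t for t, p in zip(qb_true, qb_pred)]
    return 100.0 * sum(errors) / len(errors)


def leave_one_location_out(locations):
    """Yield (held_out, training_locations) pairs, one per location."""
    for held_out in locations:
        yield held_out, [loc for loc in locations if loc != held_out]
```

For each pair produced by `leave_one_location_out`, the network would be trained on the training locations, used to label the held-out location, and scored with `mean_percentage_error`.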
3.4.3 Analysis of the Neural Network Structure
Neural networks are usually seen as black boxes in which only the node(s) of the output layer bear meaning to the classification problem in question. However, there has been a recent increase in research on how to detect which features are most important to the network and how these features relate to the final classification results (e.g., Google's deep dream experiment [Simonyan et al. (2014)]). Unfortunately, there is still no consensus on which techniques should be used to perform such a task. One way to tackle this problem is to use the concept of feature importance from tree-based models (e.g., decision trees) adapted to neural networks. In this study, the feature importance of the variables in the input layer was obtained by zeroing the contribution of each variable in the input vector, re-training the network, and classifying data in the test dataset. The feature importance measure was then obtained as the difference between the score from the median run (see Section 3.4.2) and the score obtained when each variable was zeroed. Due to the random initialisation of the MLP, this procedure was repeated 500 times for each variable to obtain statistically significant results.
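The zero-and-retrain procedure can be sketched generically as follows. The callable `train_and_score` is a hypothetical stand-in for "re-train the MLP and return its test score", and `toy_train_and_score` is only a toy rule used to make the sketch runnable; neither is taken from the authors' code.

```python
def zeroed_feature_importance(X, y, train_and_score, baseline_score):
    """Importance of feature j = baseline score minus score with column j zeroed."""
    n_features = len(X[0])
    importances = []
    for j in range(n_features):
        # Copy the data with column j replaced by zeros.
        X_zeroed = [[0.0 if k == j else v for k, v in enumerate(row)] for row in X]
        importances.append(baseline_score - train_and_score(X_zeroed, y))
    return importances


def toy_train_and_score(X, y):
    """Toy stand-in (hypothetical): classify using the first feature only."""
    predictions = [1 if row[0] > 0.5 else 0 for row in X]
    return sum(1 for p, t in zip(predictions, y) if p == t) / len(y)


X_demo = [[1.0, 5.0], [0.0, 5.0], [1.0, 5.0], [0.0, 5.0]]
y_demo = [1, 0, 1, 0]
baseline = toy_train_and_score(X_demo, y_demo)
importance = zeroed_feature_importance(X_demo, y_demo, toy_train_and_score, baseline)
```

With a randomly initialised network, the loop would be repeated many times per feature and the importances averaged, as described above.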
The most important features were the shape parameters [Cowell (1982)] and the skewness of the surface elevation. These parameters combined accounted for 45% of the total feature importance (Figure 9-a), thus demonstrating that the MLP was identifying that the differences in wave shape were the most important feature when classifying waves into broken or unbroken. This is a sensible result as broken waves in the surf zone usually present a characteristically skewed, triangular shape (saw-tooth shape), whereas unbroken waves are more symmetrical and less skewed [Svendsen et al. (1978), Cowell (1982)].
Analysis of the PDF of the combined shape parameter [Cowell (1982)] showed that the majority of broken waves had shapes similar to "bore-like" waves (Figure 9-b). Conversely, the peak of the distribution of unbroken wave shapes was close to the expected value for a sinusoidal wave (Figure 9-c), thus being mainly composed of less skewed waves. Furthermore, the two-sample Kolmogorov-Smirnov (K-S) test indicated that the PDFs shown in Figure 9-b and 9-c are statistically significantly different with . This is another good indication that the MLP is indeed capable of separating broken and unbroken waves based on a plausible combination of shape parameters and that it is not learning based on an arbitrary combination of weights and biases found by the SGD algorithm.
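The statistic behind the two-sample K-S test is the largest vertical distance between the two empirical cumulative distribution functions. A minimal pure-Python sketch follows; the function name is hypothetical and the sketch returns only the statistic, not the p-value (in practice a library routine such as `scipy.stats.ks_2samp` would provide both).

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample K-S statistic: largest gap between the two empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    d_max = 0.0
    # Empirical CDFs only change at observed values, so checking those suffices.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d_max = max(d_max, abs(cdf_a - cdf_b))
    return d_max
```

A statistic near zero means the shape-parameter distributions of the two classes overlap; a statistic near one means they are almost fully separated.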
4 Results
4.1 Results of field observations
Figure 10 shows the variation of Qb relative to the averaged water depth normalised by offshore wave height, i.e., relative water depth, which will be used as a proxy for the cross-shore distance hereafter. It should be noted, however, that this approach may not be wholly appropriate for barred beach profiles because depths on either side of the bar would appear in the same location on the x-axis. The most notable feature observed was that, for similar relative water depths, there was great variability in Qb at all locations analysed (grey markers in Figure 10). This observation of the behaviour of Qb is novel in the literature but is consistent with observations of other natural surf zone parameters [Power et al. (2010), Postacchini and Brocchini (2014), Martins, Blenkinsopp, Power, et al. (2017)] and with the chaotic behaviour of breaking waves [Wei et al. (2018)]. Qb values were binned at 0.1 relative depth intervals (coloured markers in Figure 10) and a four-parameter logistic curve [Richards (1959)] was fitted to the data (thick red lines in the panels in Figure 10) to aid with visualisation of the general cross-shore structure of Qb. For the beaches where the whole surf zone was sampled, the logistic fit agreed well with the observations (i.e., at Boomerang Beach, Moreton Island, Werri Beach, and One Mile Beach) but this fitting method could be inappropriate to model Qb when the surf zone was only partially sampled (i.e., at Frazer Beach, Birubi Beach, and Seven Mile Beach).
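The binning step and the shape of the fitted curve can be sketched as below. The parameterisation of the four-parameter logistic (lower asymptote, upper asymptote, rate, midpoint) is an assumed, commonly used form and may differ from the one used in the paper; the function names are hypothetical and the fitting itself (e.g., by non-linear least squares) is not shown.

```python
import math
from collections import defaultdict


def bin_means(rel_depth, qb, width=0.1):
    """Average Qb inside relative-depth bins; keys are integer bin indices."""
    groups = defaultdict(list)
    for x, q in zip(rel_depth, qb):
        # The small offset guards against round-off at bin edges (e.g. 0.3 / 0.1).
        groups[int(math.floor(x / width + 1e-9))].append(q)
    return {k: sum(v) / len(v) for k, v in sorted(groups.items())}


def logistic4(x, lower, upper, rate, midpoint):
    """Four-parameter logistic curve (assumed form)."""
    return lower + (upper - lower) / (1.0 + math.exp(-rate * (x - midpoint)))


# Hypothetical observations: relative depth and Qb for three data runs.
binned = bin_means([0.21, 0.27, 0.35], [0.2, 0.4, 0.9])
```

A negative `rate` gives a curve that decreases from the upper asymptote in shallow water to the lower asymptote offshore, which is the general cross-shore structure described above.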
An inverse linear trend was observed in the inner surf zone in some cases (e.g., Figure 10-p and Figure 10-q). These trends are, however, not statistically significantly different from an averaged constant value of . Interestingly, terminal values of Qb were often observed to be significantly less than one, which is consistent with a scenario in which small unbroken waves reach the surf-swash boundary or could be caused by shoreline reflection [Martins, Blenkinsopp, Almar, and Zang (2017)]. Such terminal values would not be predicted by the models presented in Section 2 due to the constraint . Finally, the outer limit of the surf zone was observed to be in the range instead of the value of suggested by several other publications [Thornton and Guza (1982), Thornton and Guza (1983), Power et al. (2010), Ruessink et al. (1998)].
4.2 Investigation of the inter- and intra-beach Variability of Qb
The variability seen in Figure 10 is examined below with respect to three surf zone parameters: tidal cycle, infragravity wave energy, and beach morphology. These three parameters were chosen because, given an appropriate predictive model (in this case, a decision-tree model, not shown), they explained 90% of the variability seen in the cross-shore variation of Qb. Similar trends were observed for all locations in the dataset but, for brevity, only the most representative cases for each parameter are analysed in detail below.
4.2.1 Tidal cycle
The tidal influence on Qb was analysed using data from Moreton Island (20/12/2016) and Werri Beach (16/08/2014). These two deployments were chosen because: 1) they had a large number of PTs in the surf zone during the tidal cycle, 2) they had contrasting beach profile characteristics (see Figure 3), and 3) they had offshore wave conditions that remained approximately constant for the duration of the deployment. For each 15-minute run, data were extracted from the MLP classified dataset, binned into 0.1 water depth intervals, grouped by hour, and coloured by tidal water level obtained using the Pacific Ocean solution of Oregon State University's Tidal Prediction Software (OTPS) [Egbert and Erofeeva (2002)]. The vertical datum used for tides was the same as in OTPS. The results of this analysis are shown in Figure 11.
For the gently sloping beach (Moreton Island), no influence of tidal control on Qb was observed (Figure 11-a). For this profile, tides may only be responsible for modifying Qb's behaviour in deeper water depths, closer to the seaward end of the surf zone. As such, lower values of Qb were systematically observed during low tide. In contrast, there was a clear tidal signature in the cross-shore evolution of Qb for the steep profile (Werri Beach, Figure 11-b). During low tide, the main break shifted towards the terrace portion of the beach profile (see panel g in Figure 3), which caused the curves to be shifted toward deeper relative depths. At higher tidal water levels, the main break point shifted shoreward to a very steep portion of the profile, causing the curves to be shifted toward shallower water depths, and therefore causing the surf zone to become narrower. In the steep profile case, the tidal cycle drove variations in Qb of up to 70% at similar relative water depths (e.g., at ). A possible explanatory mechanism for these dynamics is discussed in Section 5.1.
4.2.2 Infragravity energy
Three deployments at Boomerang Beach were chosen to highlight the influence of infragravity waves on Qb. These deployments were chosen because they presented comparable offshore wave conditions, beach profiles, and tidal water levels but three different trends in the cross-shore variation of Qb. Similar but less pronounced trends were observed in all other locations (not shown). For the analysed data (Figure 12), the ratio between the spectral energy in the infragravity () and sea-swell () frequency bands was calculated as
[TABLE]
in which is the total energy in each frequency band for individual 15-minute timeseries. The results of this analysis are shown in Figure 12.
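A band-energy ratio of this kind can be sketched as below, assuming the ratio is the infragravity band energy divided by the sea-swell band energy. The band limits (`f_min`, `f_split`, `f_max`) are illustrative assumptions and are not taken from the paper; the function names are hypothetical.

```python
def band_energy(freqs, psd, f_lo, f_hi):
    """Trapezoidal integral of the spectral density over [f_lo, f_hi]."""
    energy = 0.0
    for i in range(len(freqs) - 1):
        f0, f1 = freqs[i], freqs[i + 1]
        # Keep only the spectral segments that lie fully inside the band.
        if f0 >= f_lo and f1 <= f_hi:
            energy += 0.5 * (psd[i] + psd[i + 1]) * (f1 - f0)
    return energy


def infragravity_ratio(freqs, psd, f_split=0.04, f_min=0.0, f_max=0.5):
    """Infragravity energy divided by sea-swell energy (assumed band limits)."""
    e_ig = band_energy(freqs, psd, f_min, f_split)
    e_ss = band_energy(freqs, psd, f_split, f_max)
    if e_ss == 0.0:
        raise ValueError("no energy in the sea-swell band")
    return e_ig / e_ss


# Hypothetical five-point spectrum (frequency in Hz, density in m^2/Hz).
demo_freqs = [0.0, 0.02, 0.04, 0.06, 0.08]
demo_psd = [2.0, 2.0, 2.0, 1.0, 1.0]
demo_ratio = infragravity_ratio(demo_freqs, demo_psd)
```

A ratio above one indicates that the 15-minute timeseries is dominated by infragravity energy; a ratio below one indicates sea-swell dominance.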
In general, the infragravity to sea-swell energy ratio increased towards shallower water depths for all data runs, which is consistent with several previous observations (see Bertin et al. (2018) for a recent review). During the first deployment (19/09/2016), the surf zone was dominated by waves in the sea-swell frequency band and the terminal value of the fitted curve was close to 1 (Figure 12-a, ). Subsequently, infragravity wave energy started to dominate in shallower water depths, causing the terminal value of the curve to lower (Figure 12-b, ; 21/09/2016). Finally, the inner surf zone was strongly dominated by infragravity waves during the last deployment, which caused the terminal values of the curves to reach their lowest (Figure 12-c, ; 22/09/2016). These results suggest that an increasing energy ratio may lower the terminal values of Qb. Possible causes for the infragravity control on Qb are discussed in Section 5.1.
4.2.3 Beach morphodynamics
The link between Qb and the Wright and Short (1984) beach morphodynamic model was investigated using the dimensionless fall velocity parameter Ω, defined as:
$$\Omega = \frac{H_s}{w_s\,T_s}$$
in which Hs is the surf zone significant wave height, Ts is the surf zone significant wave period, and ws is the sediment fall velocity [Dean (1973), Gourlay (1968)]. Data from one day at the locations where the surf zone was fully sampled were used in this analysis. The fitted curves grouped into two main clusters: one for dissipative beaches (, Seven Mile Beach; Figure 13-a), and another for intermediate beaches (, Boomerang Beach, One Mile Beach, Werri Beach, and Moreton Island, Figure 13-b to c), which is in very good agreement with the Wright and Short (1984) model. These results also correlated well with the observed breaker types and surf zone widths from the video data. At Seven Mile Beach, the surf zone was observed to be wide and the waves had time to develop into fully formed bores which dissipate most of the incoming energy. For the other locations, the surf zone was observed to be narrower and dominated by plunging breakers, which were not observed to develop into fully developed bores.
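The dimensionless fall velocity and the resulting beach-state classification can be sketched as below. The thresholds of 1 and 6 are the commonly quoted limits of the Wright and Short (1984) model and are an assumption here, since the values used in this section are not recoverable; the function names and example inputs are hypothetical.

```python
def dimensionless_fall_velocity(hs, ts, ws):
    """Omega = Hs / (ws * Ts), with Hs in m, Ts in s and ws in m/s."""
    if ts <= 0.0 or ws <= 0.0:
        raise ValueError("wave period and fall velocity must be positive")
    return hs / (ws * ts)


def beach_state(omega, reflective_max=1.0, dissipative_min=6.0):
    """Classify a beach from Omega using assumed threshold values."""
    if omega < reflective_max:
        return "reflective"
    if omega > dissipative_min:
        return "dissipative"
    return "intermediate"


# Hypothetical example: Hs = 1.5 m, Ts = 10 s, ws = 0.04 m/s.
omega_demo = dimensionless_fall_velocity(1.5, 10.0, 0.04)
state_demo = beach_state(omega_demo)
```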
4.3 Broken Wave Height PDF
Here, the data obtained from the previous analysis are used to develop a novel definition for the PDF of broken waves. For each location and each data run, the wave heights of all the broken waves (Hbr) were extracted, and p(Hbr) was approximated via non-parametric kernel density estimations (KDE) [Kim and Scott (2012)] (Figure 14-a to g). Variability between PDFs at a location was observed (coloured lines in Figure 14), consistent with the results presented in Section 4.1. However, when the mean PDF for each location was calculated, the results were remarkably consistent across all locations (Figure 14-h). On average, p(Hbr) could be approximated by a Weibull PDF with and scale . These optimal values were calculated as the average of the shape and scale parameters found after fitting Equation 18 to p(Hbr) approximated via the KDEs. From these results, it is possible to obtain an analytic expression for p(Hbr):
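For reference, a two-parameter Weibull PDF can be evaluated as sketched below. This assumes the standard shape and scale parameterisation; whether it matches Equation 18 exactly cannot be checked from this section, and the parameter values in the usage line are placeholders, not the fitted values.

```python
import math


def weibull_pdf(h, shape, scale):
    """Two-parameter Weibull PDF evaluated at wave height h (standard form)."""
    if shape <= 0.0 or scale <= 0.0:
        raise ValueError("shape and scale must be positive")
    if h < 0.0:
        return 0.0  # wave heights cannot be negative
    if h == 0.0 and shape < 1.0:
        return math.inf  # the density diverges at zero for shape < 1
    z = h / scale
    return (shape / scale) * z ** (shape - 1.0) * math.exp(-(z ** shape))


# Placeholder parameters for illustration only.
density_at_one_metre = weibull_pdf(1.0, shape=2.0, scale=1.5)
```

Unlike a Gaussian, this density is zero for negative wave heights, which is one of the practical reasons for preferring it when modelling wave heights.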
[TABLE]
Note that is not equivalent to introduced in Section 2 because is not a true PDF whereas is.
When Equation 26 was compared to the averaged PDFs (black lines in Figure 14) using the K-S test, there was no significant statistical difference between them at the 95% confidence interval. Moreover, when the t-test for the mean was applied to compare individual PDFs (coloured lines in Figure 14) to Equation 26, it was found that in 96.4% of the cases there was no statistical difference between them at the 95% confidence interval. To further test the robustness of Equation 26, it was also compared to the mean PDF at each relative depth decile considering data from all locations (Figure 14-i). Except for the deepest decile, the approximations via Equation 26 were not statistically different from the averaged PDF at the 95% confidence interval. This is an important finding because, from Equation 26, the PDF of all waves can be analytically transformed into the PDF of broken waves.
Let and be two functions that satisfy the following properties: and must be single valued for all and , for all and for all , and and . If these properties hold, the following is valid:
[TABLE]
in which is a transference function. Substituting = and = results in:
[TABLE]
This result is key because it allows for a mathematically rigorous transformation of the PDF of all waves into the PDF of broken waves. Solving this differential equation for the exact transformation function and applying it directly into Equation 1 is, however, beyond the scope of this paper and will be attempted in a follow-up publication.
4.4 Assessing existing Models
The models presented in Section 2 were assessed using Qb calculated from the MLP classified data (see Section 3.4.2). For each data run, the residuals between the calculated and the theoretically predicted Qb were obtained and the results are shown in Figure 15-a. As anticipated by the results seen in Section 4.1, all models performed poorly because they cannot account for intra-depth variability given similar offshore conditions nor for terminal values of Qb below one. The models with the lowest averaged residuals were the original (with and ) and the modified TG83 models (with and ). Both the original and modified B98 models performed poorly when considering only the averaged residuals but, despite this, were the most consistent models, showing no clear bias toward one particular beach. In contrast, BJ78 presented the highest inter- and intra-location variability in the residuals. Because the models differ in the number of input parameters, the averaged residuals alone may not be an appropriate comparative metric. Using the Akaike information criterion (AIC; Akaike (1974), Aho et al. (2014)) to account for the different number of input parameters, it was confirmed that the best performing model was TG83 with =2.4 and . Table 3 shows the full analysis.
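One common way to compute the AIC from model residuals is sketched below. It assumes Gaussian errors, for which the AIC reduces (up to an additive constant) to n ln(RSS/n) + 2k; the exact form used by the authors is not stated in this section, and the function name is hypothetical.

```python
import math


def aic_least_squares(residuals, n_params):
    """AIC for a least-squares fit with Gaussian errors (up to a constant)."""
    n = len(residuals)
    rss = sum(r * r for r in residuals)  # residual sum of squares
    if n == 0 or rss == 0.0:
        raise ValueError("need at least one non-zero residual")
    return n * math.log(rss / n) + 2 * n_params


# Hypothetical residuals for one model with two free parameters.
aic_demo = aic_least_squares([1.0, -1.0, 1.0, -1.0], n_params=2)
```

Between two models with similar residuals, the one with fewer free parameters receives the lower (better) AIC, which is why the criterion is suited to comparing models with different numbers of inputs.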
Residuals were also analysed by relative water depth (Figure 15-b). The data for all locations were grouped, binned into 0.2 relative depth intervals, and the residuals were calculated as before. The results from this analysis showed that BJ78 greatly over-predicted Qb at deeper water depths, whereas both B98 model parametrisations under-predicted Qb at shallower water depths. For all models, the greatest averaged residuals (circular markers in Figure 15-b) were observed in the mid-surf zone (relative water depth range of 0.7-1.9). B98's systematic underestimation of Qb at shallower water depths could be yet another factor that affected the total energy dissipation in this model in addition to simplification of as reported by Alsina and Baldock (2007) and Janssen and Battjes (2007). No correlations were found between the errors seen in the models and other surf zone parameters (not shown).
5 Discussion
In this paper, a comprehensive review of the theoretical formulations for the fraction of broken waves (Qb) was initially presented (Section 2). It was evident from this that little testing of these formulations against natural surf zone data had been performed and further testing was required. Data from seven beaches across 19 deployments were collected and used to address this shortcoming (Section 3.1). Two novel methods were developed to obtain Qb: firstly, by using collocated PT and video data and, secondly, by using robust machine learning techniques (Sections 3.3 and 3.4). Using the second approach, 333,732 waves were classified, which resulted in 3,639 unique 15-minute timeseries (data runs) being analysed. Based on this dataset, the cross-shore structure of Qb was analysed to assess the influence of environmental parameters on Qb and develop novel definitions of the PDF for broken waves. Finally, three widely used theoretical models were assessed. In the next sections, the methods used throughout this paper and the results they produced are discussed.
5.1 Cross-shore variability of Qb and Environmental Forcing
The analysis of the cross-shore structure of Qb showed great intra-depth variability (Figure 10), which seems not to have been previously observed in the literature. To the authors' knowledge, only TG83 measured Qb on natural surf zones using in-situ data, and, in their work, only four cross-shore observations from one beach were shown (their Figure 11). Such a small dataset cannot account for the full natural variability of an apparently chaotic process. Although Carini et al. (2015) provided a much larger field dataset for Qb, these authors did not investigate the cross-shore variation of Qb. Thus, the results presented in Section 4.1 are novel in the literature and show that Qb is highly variable at all locations. Such observations are in agreement with previous studies of the cross-shore structure of other surf zone parameters, such as and [Power et al. (2010), Power et al. (2015), Martins, Blenkinsopp, Almar, and Zang (2017)], and the instantaneous wave speed () [Postacchini and Brocchini (2014), Tissier et al. (2015)]. Nonetheless, a clear connection was observed between the cross-shore structure of Qb and the Wright and Short (1984) Australian beach morphodynamic model (see Section 4.2.3).
In Section 4.2.1, a tidal influence on Qb was observed for the steep beach profile case (Werri Beach). One potential mechanism controlling the changes in Qb's behaviour was an alteration in the dependence of wave breaking on the local [Dally et al. (1985)]. Because the profile steepened very quickly in the high tide case (at in Figure 3-f), the waves may not have had enough time to adapt to the changes in depth, which led to an increase in shore-breakers. The increase in the occurrence of this breaker type was also confirmed when analysing the raw video imagery. This mechanism seems to be consistent with the fairly constant slope of the curves regardless of the tidal level. Thus, the rate of change in the amount of wave breaking can be consistent with a lateral expansion of the surf zone with the tidal cycle, and showed a connection between Qb and beach morphology, which was further explored in Section 4.2.3.
Some other surf zone phenomena were responsible for disturbing the observed patterns in Qb. For instance, infragravity waves were shown to systematically reduce terminal values of Qb in at least one beach (see Section 4.2.2). Recent research has shown that infragravity waves modify the water depth in which short waves are propagating, consequently changing the shoreward evolution of sea-swell waves [Tissier et al. (2015), de Bakker et al. (2016), Padilla and Alsina (2017)]. Therefore, if a short wave sits on the positive part of an infragravity wave propagating shoreward, short-wave breaking may be inhibited because the local water depth (h) increases. Such an increase in h causes the local to decrease, leading the wave to reform or not break [Dally et al. (1985)], hence lowering Qb.
The infragravity wave control on Qb could be consistent with break-point infragravity forcing [Symonds et al. (1982)] or edge waves [A. J. Bowen and Huntley (1984), R. A. Holman and Bowen (1979), J. Bowen and Guza (1978)] because these waves can increase water levels asymmetrically, particularly closer to the shore. However, this dynamic is inconsistent with bound wave forcing [Longuet-Higgins and Stewart (1964), Battjes et al. (2004), Baldock (2006)] because the net result of the water level changes should be null over a given time interval, meaning that, on average, the negative part of the incoming bound waves balances the positive part. Therefore, Qb should remain unchanged in the presence of wave groups. Combining the wave tracking method of Stringari et al. (2019) with the infragravity wave forcing detection method of \citeADeMoura2017 could lead to the discovery of the missing link between the two processes and will be attempted in a future publication.
5.2 Wave Height PDFs
We have developed novel formulations for broken wave height PDFs that are consistent with observed wave height PDFs (Section 4.3). Unfortunately, these results are impossible to directly compare with the existing literature because all previous publications described this distribution as a function of the PDF for all waves [Baldock et al. (1998), Battjes and Janssen (1978), Ruessink et al. (2003), Alsina and Baldock (2007), Battjes and Stive (1985), Janssen and Battjes (2007)]. Despite this, the overall shape of the distributions shown in Figure 14 agreed well with TG83's measured broken wave PDFs (the two lower panels in their Figure 10). On a more fundamental level, what these results showed was a direct manifestation of the Central Limit Theorem (CLT) [Feller (1945)]. Given the systematic sub-sampling of the distribution of all waves to obtain the distribution of broken waves (p(Hbr)), it was statistically expected that p(Hbr) be Gaussian shaped, regardless of the shape of its original distribution.
The Weibull PDF was shown to describe p(Hbr) remarkably well; however, the choice of this distribution was based on previous literature rather than on physical or statistical reasoning [Mase (1989), Hameed and Baba (1985), Battjes and Groenendijk (2000), Mendez et al. (2004), Power et al. (2016)]. Other PDFs, particularly the Gaussian and Gamma distributions, were more accurate than the Weibull PDF at modelling p(Hbr) when the K-S test was used to measure their goodness-of-fit. It was found that 97.4% of the broken wave PDFs were statistically similar to a Gaussian PDF, 96.9% to a Gamma PDF, 95.7% to a Weibull PDF, and only 1.4% were statistically similar to a Rayleigh PDF. The Weibull PDF was nonetheless retained because: 1) the differences seen in the K-S tests were small, and 2) the parameters of the alternative distributions are harder to relate to physical processes (e.g., Gamma PDF), or the distribution could allow for negative wave heights (e.g., Gaussian PDF). Unfortunately, and are not usually known, regardless of the distribution used to model them; therefore, Equation 26 is not of practical use. On the other hand, Equation 28 has the potential to lead to a practical model that could provide a more realistic representation of in energy dissipation models. The approach shown in Section 4.3 (particularly Equations 27 and 28) also has the advantage of working with any alternative PDF (e.g., the better performing Gamma and Gaussian PDFs) to describe the evolution of into .
5.3 Model performance
The theoretical models performed poorly when compared to the data from the MLP (see Figure 15). Significant improvement was observed for the TG83 model when the Rayleigh PDF was replaced by the Weibull PDF and was set to 1. Such improvement also followed directly from the CLT, whereby this particular combination of parameters made the shape of the empirical curve that defines very close to the curves seen in Figure 14. The poor performance of the BJ78 model was due to the truncation of , which directly contradicts the observations presented here (compare Figure 14 and Figure 1) [Baldock et al. (1998), Thornton and Guza (1983), Alsina and Baldock (2007), Janssen and Battjes (2007)]. The reasons why the B98 model performed poorly are unclear, especially when there is strong evidence that its formulation is the best performing in un-calibrated situations [Alsina and Baldock (2007), Apotsos et al. (2008), Ruessink et al. (2003)]. One reason could be that the breaking criterion is not properly describing the start of wave breaking, as previously observed [Ruessink et al. (2003)]. However, updating to use the Ruessink et al. (2003) formulation did not improve the residuals for the B98 model (not shown).
Moreover, considering the dissipation term from Alsina and Baldock (2007) and Janssen and Battjes (2007), linear wave theory, and conservation of momentum and mass (i.e., there is no reflection at the shoreline, or a swash zone), Equation 1 can be re-written as:
[TABLE]
After algebraic simplifications, it follows that the combinations of the remaining variables are of similar order of magnitude. Therefore, even if is significantly under- or over-estimated (or even assumed constant), the influence of on the total energy dissipation can be counter-balanced by optimising either , , or , as done by Battjes and Stive (1985), Ruessink et al. (2003), and Apotsos et al. (2008), which is not necessarily a physical improvement to the model.
There exist alternative approaches to model the surf zone energy dissipation which are more realistic than parametric models, e.g., the Duncan (1981) and Svendsen (2006) wave roller models, the Smit et al. (2013) approach used in SWASH [Zijlema et al. (2011)], and the recent Smoothed Particle Hydrodynamics (SPH) approach [Altomare et al. (2015)]. However, parametric energy dissipation models are still frequently used to drive hydrodynamic [Zhang et al. (2018)], morphological [Larson et al. (1990), Hanson and Kraus (1989), Roelvink et al. (2009), de Vriend et al. (1993)], and spectral wave models [Booij et al. (1999), Ris et al. (1999), WW3DG (2016)]. Therefore, addressing knowledge gaps, such as the ones described in this work, could result in improvements in these other models, which are often used for non-academic coastal management applications.
5.4 The Problem with Machine Learning
Machine learning algorithms, particularly neural networks, have revolutionised data problems in the last three decades but have also received intense criticism (e.g., as early as \citeAVemuri1993). Besides the technical implementation and reproducibility problems, there exists a more fundamental issue with the machine learning approach: it does not give new insights into the physical phenomena governing the analysed problem. In this paper, we analysed the feature importance of the input layer and concluded that the MLP was learning from a combination of parameters related to the wave shape. The neural connections in the hidden layers are, however, significantly harder to understand: firstly, because of the enormous number of connections (of the order of ); and secondly, because of the non-linearity between these connections. Such intrinsic complexity is a likely cause of the unexpected results shown in Figure 8-c, i.e., the MLP classified data for Moreton Island more accurately when this location was left out of the training dataset. This result was counter-intuitive given that the practical rules of machine learning state that more training samples result in better accuracy scores [Hastie et al. (2001)]. Several alternatives to the neural network approach were attempted (e.g., logistic regression, Bayesian inference, and nearest neighbours models) but all these methods resulted in accuracy scores of the order of 60%, which is only slightly better than randomly guessing the wave label.
6 Conclusion
In this paper, data from seven different Australian beaches across nineteen unique deployments were collected and used to investigate the natural variability in the fraction of broken waves (Qb). A machine learning model that classifies waves into broken or unbroken using wave-by-wave parameters was developed from collocated remote sensing and in-situ pressure transducer data. Using over 350,000 classified waves, it was found that Qb is a highly variable parameter with a high degree of inter- and intra-beach variability. Nonetheless, correlations between environmental forcing and Qb were found. On steeper beaches, for a given local water depth, Qb was up to 70% higher at low tide when compared to high tide. In addition, increased infragravity energy levels decreased terminal values of Qb by 20%. This correspondence between Qb and environmental parameters was linked to the Wright and Short (1984) beach morphodynamic model: for a given normalised water depth, Qb is higher for dissipative beaches than for intermediate beaches. Using the machine-learning Qb data (, ), three widely used Qb models were tested and, in general, were shown to perform poorly (average errors of the order of 40%). The Thornton and Guza (1983) model was significantly improved by replacing the original Rayleigh PDF with a Weibull PDF (average errors 10%). Finally, a mathematically sound transformation of the PDF for all wave heights into the PDF of broken wave heights was outlined based on the patterns of p(Hbr) observed here. The novel Qb dataset derived here shows that the current theoretical parameterizations for Qb are poor predictors because they cannot account for the full natural variability of the parameter. This dataset is used to develop a novel, data-driven method to transform the PDF of all wave heights into the PDF of broken wave heights that could be used to further improve coastal management tools.
Acknowledgements.
The field work (2014 experiments) was funded by a University of Newcastle Faculty of Science and I.T. Strategic Initiatives Research Fund Grant 2014 to HEP. The authors are grateful to Alex Atkinson, Andrew Magee, Annette Burke, Emily Kirk, Daniel Harris, David Hanslow, Kaya Wilson, Madeleine Broadfoot, Michael Hughes, Mike Kinsela, Murray Kendall, Rachael Grant, Rebecca Hamilton, Samantha Clarke, and Tom Donaldson-Brown who assisted with the field data collection. Caio E. Stringari is funded by a University of Newcastle Research Degree Scholarship (UNRS) 5050UNRS and a Central & Faculty scholarship. The pressure transducers and video cameras used in the 2014 experiments were kindly lent by Tom Baldock from the University of Queensland. The sediment data used in Section 4.2.3 were kindly provided by Professor Andrew Short from the University of Sydney. The authors are also thankful to the Academic Research Computing Support Team, particularly Aaron Scott, at the University of Newcastle for support with the I.T. infrastructure on which all video data pre-processing and machine-learning development were undertaken, and to Bas Hoonhout who helped provide the original image rectification codes.
Data availability
The code, data, and the pre-trained neural network used in this work will be available at https://github.com/caiostringari/pywavelearn after this manuscript is published.
Appendix 1
References
- 1. Aagaard, T., & Holm, J. (1989). Digitization of Wave Run-up Using Video Records. Journal of Coastal Research, 5(3), 547–551.
- 2. Aho, K., Derryberry, D., & Peterson, T. (2014). Model selection for ecologists: the worldview of AIC and BIC. Ecology, 95(3), 631–636. doi:10.1890/13-1452.1
- 3. Akaike, H. (1974). A New Look at the Statistical Model Identification. IEEE Transactions on Automatic Control, 19(6), 716–723. doi:10.1109/TAC.1974.1100705
- 4. Alsina, J. M., & Baldock, T. E. (2007). Improved representation of breaking wave energy dissipation in parametric wave transformation models. Coastal Engineering, 54, 765–769. doi:10.1016/j.co…
- 5. Altomare, C., Domínguez, J. M., Crespo, A. J. C., Suzuki, T., Caceres, I., & Gómez-Gesteira, M. (2015). Hybridization of the Wave Propagation Model SWASH and the Meshfree Particle Method SPH for Real Coastal Applications.
- 6. Apotsos, A., Raubenheimer, B., Elgar, S., & Guza, R. T. (2008). Testing and calibrating parametric wave transformation models on natural beaches. Coastal Engineering, 55, 224–235. doi:10.1016/j.coastaleng.2007.10.002
- 7. Baldock, T. E. (2006). Long wave generation by the shoaling and breaking of transient wave groups on a beach. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, 462(2070), 1853–1876. doi:10.1098/rspa.2005…
- 8. Baldock, T. E., Holmes, P., Bunker, S., & Weert, P. V. (1998). Cross-shore hydrodynamics within an unsaturated surf zone. Coastal Engineering, 34, 173–196.
