TL;DR
This study compares Wavelet Packet Transform and Ensemble Empirical Mode Decomposition for chatter detection in turning, demonstrating that EEMD offers superior transfer learning performance with high accuracy across different machine configurations.
Contribution
It introduces a comparative analysis of WPT and EEMD for chatter detection and highlights EEMD's better transfer learning capabilities in metal cutting applications.
Findings
EEMD outperforms WPT in transfer learning scenarios.
Both methods achieve over 94% accuracy when trained and tested on the same configuration.
WPT's feature selection based on energy ratios may miss chatter frequencies, reducing accuracy.
| Stickout length (cm (inch)) | Stable | Mild chatter | Chatter | Total |
| 5.08 (2) | 17 | 8 | 11 | 36 |
| 6.35 (2.5) | 7 | 4 | 3 | 14 |
| 8.89 (3.5) | 7 | 2 | 2 | 11 |
| 11.43 (4.5) | 13 | 4 | 5 | 22 |
| Stickout length (cm (inch)) | Chatter frequency range (Hz) | Informative wavelet packets | Informative IMF |
| 5.08 (2) | – | Level 1: 1, Level 2: 1, Level 3: 2, Level 4: 3 | 2 |
| 6.35 (2.5) | – | Level 1: 1, Level 2: 1, Level 3: 3, Level 4: 4 | 2 |
| 8.89 (3.5) | – | Level 1: 1, Level 2: 2, Level 3: 3, Level 4: 6 | 1 |
| 11.43 (4.5) | – | Level 1: 2, Level 2: 3, Level 3: 5, Level 4: 10 | 1 |
| Stickout length (cm (inch)) | Chatter frequency range (Hz) | Informative wavelet packets (Predicted) | Informative wavelet packets (Selected) |
| 5.08 (2) | – | Level 4: 3-4 | Level 4: 3 |
| 6.35 (2.5) | – | Level 4: 4-5 | Level 4: 4 |
| 8.89 (3.5) | – | Level 4: 6 | Level 4: 6 |
| 11.43 (4.5) | – | Level 4: 10 | Level 4: 10 |
| Features | |
| (Mean) | (Clearance Factor) |
| (Standard Deviation) | (Shape Factor) |
| (RMS) | (Impulse Factor) |
| (Peak) | (Mean Square Frequency) |
| (Skewness) | (One Step Auto Correlation Function) |
| (Kurtosis) | (Frequency Center) |
| (Crest Factor) | (Standard Frequency) |
| Feature | Equation |
| Energy ratio | |
| Peak to Peak | |
| Standard Deviation | |
| Root Mean Square (RMS) | |
| Crest Factor | |
| Skewness | |
| Kurtosis | |
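The time-domain features named in the table above can be sketched as follows. This is a minimal illustration that assumes the standard textbook definitions (population standard deviation, moment-based skewness and kurtosis); the function name `time_domain_features` is hypothetical, and the normalization used in the study may differ.

```python
import numpy as np


def time_domain_features(x):
    """Return common time-domain features of a 1-D signal (textbook forms, assumed)."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    std = x.std()                       # population standard deviation (ddof=0)
    rms = np.sqrt(np.mean(x ** 2))      # root mean square
    centered = x - mean
    return {
        "peak_to_peak": x.max() - x.min(),
        "std": std,
        "rms": rms,
        "crest_factor": np.max(np.abs(x)) / rms,        # peak relative to RMS
        "skewness": np.mean(centered ** 3) / std ** 3,  # third standardized moment
        "kurtosis": np.mean(centered ** 4) / std ** 4,  # fourth standardized moment
    }


# Toy square-wave-like signal, used only to exercise the function.
features = time_domain_features([1.0, -1.0, 1.0, -1.0])
```

In a decomposition-based pipeline these features would be computed per informative packet or IMF rather than on the raw signal.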
| Stickout Length | WPT Level | Classification Results (WPT) | Classification Results (EEMD) | Time (seconds, WPT) | Time (seconds, EEMD) |
| 5.08 cm (2 inch) | 1 | | | 115.99 | 14540.06 |
| 6.35 cm (2.5 inch) | 2 | | | 36.65 | 3371.58 |
| 8.89 cm (3.5 inch) | 1 | | | 4.51 | 1583.38 |
| 11.43 cm (4.5 inch) | 2 | | | 6.53 | 3096.07 |
On Transfer Learning For Chatter Detection in Turning Using Wavelet Packet Transform and Ensemble Empirical Mode Decomposition
Melih C. Yesilli
Department of Mechanical Engineering
Michigan State University
Firas A. Khasawneh
Department of Mechanical Engineering
Michigan State University
Andreas Otto
Institute of Physics
Chemnitz University of Technology
Abstract
The increasing availability of sensor data at machine tools makes automatic chatter detection algorithms a trending topic in metal cutting. Two prominent and advanced methods for feature extraction via signal decomposition are Wavelet Packet Transform (WPT) and Ensemble Empirical Mode Decomposition (EEMD). We apply these two methods to time series acquired from an acceleration sensor at the tool holder of a lathe. Different turning experiments with varying dynamic behavior of the machine tool structure were performed. We compare the performance of these two methods with Support Vector Machine (SVM), Logistic Regression, Random Forest Classification and Gradient Boosting combined with Recursive Feature Elimination (RFE). We also show that the common WPT-based approach of choosing wavelet packets with the highest energy ratios as representative features for chatter does not always result in packets that enclose the chatter frequency, thus reducing the classification accuracy. Further, we test the transfer learning capability of each of these methods by training the classifier on one of the cutting configurations and then testing it on the other cases. It is found that when training and testing on data from the same cutting configuration both methods yield high accuracies reaching in one of the cases as high as and , respectively, for WPT and EEMD. However, our experimental results show that EEMD can outperform WPT in transfer learning applications with accuracy of up to .
Keywords: Machine learning, transfer learning, Wavelet analysis, Empirical mode decomposition, chatter detection, turning
1 Introduction
Turning, boring, milling, and drilling operations constitute a major part of manufacturing processes. One challenging problem that all these processes have in common is the occurrence of large amplitude, detrimental oscillations called chatter [1, 2, 3]. Since chatter leads to increased tool wear, poor surface finish and noise, it is extremely important to anticipate and avoid its occurrence. Alternatively, several chatter mitigation techniques including increasing stiffness in machine tools, and active and passive damping techniques also exist [4]. Efficient methods for the identification of the stability lobes that separate stable cutting and chattering motion [5, 6] can help keep the machine away from chatter via selecting parameters in the safe area below the stability lobes. However, these models often do not account for the effect of the changing dynamics or for highly complex cutting operations. This led to the emergence of in-situ methods for chatter detection based on instrumenting the cutting center with sensors and analyzing the resulting signals [7, 8, 9, 10, 11, 12].
The majority of available in-process methods for chatter identification rely on extracting certain features from the acoustic, vibration, or force signals and comparing them against some predefined markers of chatter [13, 14, 15, 16, 9, 17, 18, 19, 20, 21, 22, 23]. They can be broadly categorized into two groups as shown in Fig. 1. The most prevailing methods are Wavelet Packet Transforms (WPT) and Empirical Mode Decomposition (EMD) or the Ensemble Empirical Mode Decomposition (EEMD). Generally, such decomposition-based methods for analyzing the cutting signal follow the same procedure. First, the signal is decomposed into different parts using some transformation. Then, the decomposed portions or packets of the signal which include the relevant information about machine tool chatter are selected to reconstruct a new signal. These packets are chosen by applying the Fast Fourier Transform (FFT) to the different parts or packets and choosing the ones that overlap with the known chatter frequencies of the system. Finally, various time and frequency domain features are computed from these packets. In several papers, these features are ranked and are utilized as the input for the machine learning classifiers. Support Vector Machine (SVM) algorithm is the most common classifier used for chatter classification [11, 24, 25, 26, 27, 28, 29]. Other less common classifiers include quadratic discrimination analysis [30], Hidden Markov Model (HMM) [31], generalized HMM [32], and logistic regression [33] (cf. Fig. 1).
Wavelet packet decomposition and wavelet transform are widely adopted in machining state monitoring. Chen and Zheng [26] generated feature matrices for chatter classification using wavelet packets whose frequency bands contain the chatter frequency. Yao et al. [11] used the standard deviation and the energy of the decomposition obtained using the Discrete Wavelet Transform and the WPT for chatter detection from acceleration signals in a boring experiment. The energy of the wavelet packets was also utilized in turning experiments that compared different levels of WPT [34, 32]. Ding et al. [33] used wavelet packet entropy as a feature for early chatter detection. In addition to WPT, EMD and EEMD are also often utilized to featurize cutting signals. Ji et al. [24] proposed EMD both to eliminate noise from milling vibration signals and to extract features from informative Intrinsic Mode Functions (IMFs). Chen et al. [25] used top-ranked features extracted from the IMFs obtained from EEMD for machining state detection. Li et al. [35] used the energy spectrum of the IMFs as features for chatter detection. The resulting features are ranked using the Fisher Discriminant Ratio (FDR) [25] and, when the number of features is high, recursive feature elimination (RFE) is used to reduce the number of features [26]. Although EMD/EEMD is typically applied to vibration signals, Liu et al. [27] also used EMD to extract features from the servo motor current time series.
In addition to WPT and EMD-based approaches, there are other methods for feature extraction from metal removal processes. For example, Thaler et al. [30] used the Short-Time Fourier Transform to extract the frequency domain features of the feed force, acceleration, and sound pressure signals in a band sawing operation. Moreover, the Q-factor and the power spectrum of the signal were used for chatter classification in milling [28]. Cao et al. [36] applied the Hilbert-Huang transform to signals reconstructed using only the informative wavelet packets. Yesilli et al. used topological features obtained from Topological Data Analysis to predict chatter in turning [37] and milling [38] processes. In addition, Dynamic Time Warping, a similarity measure, was used to detect chatter in turning [39].
Chatter detection strategies based on WPT or EEMD require deciding on which informative parts of the signal to use. However, since searching for the informative parts of the decomposition is a multi-step process, these approaches become impractically laborious. Although the time required to obtain the needed WPT and EEMD decompositions is relatively low, choosing the informative decompositions in WPT and EEMD is often not straightforward. This is because the featurization process involves looking into the power spectra and the energy ratio plots for each signal in order to determine the most informative parts of the decomposition. Consequently, only a few cases are often analyzed and the chosen packets or decompositions are fixed and used for feature extraction for all the subsequent data sets. For example, in the WPT-based approach, the standard procedure is to pick the packets with the highest energy ratio as the most informative part of the decomposition.
Unfortunately, the resulting informative packets or decompositions may not contain chatter information, especially if the system parameters shift during operation, e.g., due to the movement of the machine center, which may involve changing the overhang distance of the tool and thus the flexibility of the cutting tool. In these situations the classifier is required to categorize signals that may carry different characteristics and chatter features than the ones it was trained on. In other words, these situations test the ability of the classifier to achieve transfer learning. However, there have not been any studies on the transfer learning capabilities of WPT and EEMD. Further, the common approach for picking the informative packets in WPT is to choose the packets with the highest energy. These packets do not necessarily contain the chatter frequency bands, and thus they may not be the most suitable markers for chatter detection.
In this paper, we compare the performance and the transfer learning capabilities of two chatter detection algorithms based on WPT and EEMD on a set of turning experiments where the cutting tool is instrumented with accelerometers. A total of four tool stickout lengths are used, which correspond to changing the eigenfrequencies of the machine-tool structure. In addition, a variety of depths of cut and cutting speeds are tested. We establish criteria for tagging the resulting signals as chatter-free, mild chatter, or full chatter. Then, we split the data for each stickout length into training and testing sets, train our two methods with the training set, and use them to classify the test signals as chatter or chatter-free. We investigate the classification performance not only of the SVM algorithm, the most popular tool for machine learning on chatter signals, but also of logistic regression, random forest, and gradient boosting. Upon obtaining a classifier for data taken from a specific cutting configuration, we test the classifier on data from other cutting configurations. We repeat the above process ten times, where each time the data is randomly split into training/testing sets, and we compare the average test accuracy and standard deviation of the featurization methods. We then evaluate the classification results and comment on (1) the ease of feature extraction, i.e., the effort required and the potential for automating feature extraction; (2) the classification accuracy within a fixed cutting configuration (but with varying spindle speeds and depths of cut); and (3) the transfer learning capabilities, i.e., the accuracy associated with using certain feature vectors and classification algorithms to train and test on two different cutting configurations.
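The evaluation protocol described above (ten random splits, mean and standard deviation of the test accuracy, then testing on a different cutting configuration) can be sketched as follows. This is a minimal illustration with scikit-learn, not the authors' code: the function name `repeated_split_accuracy`, the 33% test fraction, the number of retained features, and the synthetic feature matrices are all assumptions made for the example.

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def repeated_split_accuracy(X_a, y_a, X_b, y_b, n_repeats=10, test_size=0.33, n_features=3):
    """Train on configuration A; score on held-out A data and on configuration B."""
    same, transfer = [], []
    for seed in range(n_repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X_a, y_a, test_size=test_size, random_state=seed, stratify=y_a
        )
        # RFE wraps the classifier and keeps only the top-ranked features.
        model = make_pipeline(
            StandardScaler(),
            RFE(LogisticRegression(max_iter=1000), n_features_to_select=n_features),
        )
        model.fit(X_tr, y_tr)
        same.append(model.score(X_te, y_te))    # same cutting configuration
        transfer.append(model.score(X_b, y_b))  # different cutting configuration
    return (np.mean(same), np.std(same)), (np.mean(transfer), np.std(transfer))


# Synthetic stand-ins for the feature matrices of two stickout lengths (assumed data).
rng = np.random.default_rng(0)
X_a = rng.normal(size=(60, 7))
y_a = (X_a[:, 0] > 0).astype(int)
X_b = rng.normal(size=(40, 7))
y_b = (X_b[:, 0] > 0).astype(int)

(same_mean, same_std), (transfer_mean, transfer_std) = repeated_split_accuracy(X_a, y_a, X_b, y_b)
```

Swapping `LogisticRegression` for another estimator that exposes feature weights or importances would follow the same pattern.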
Based on our investigations, we believe that WPT and EEMD are more conducive to automatic feature extraction than traditional featurization methods of chatter signals. Further, the results based on our datasets show that classifiers based on random forest, gradient boosting, and logistic regression have higher accuracy than SVM. Our results also show that when training and testing on signals from the same cutting configuration, the WPT method gives higher classification accuracy rates than EEMD. However, when testing the obtained classifier on data from different cutting configurations, we show that, for our specific cutting data, the EEMD method outperforms WPT, i.e., EEMD has better transfer learning capabilities than WPT. In addition, we discuss in Section 3 how wavelet packets with higher energy ratios do not necessarily contain chatter information. Specifically, we found in multiple cases that the packets ranked second or even third in terms of the energy ratio can include the chatter frequency band in their signal spectrum. Therefore, fixing certain informative packets or parts of the signal may not be a viable option, especially when the cutting process leads to changes in the dynamic behavior of the tool-workpiece system.
The paper is organized as follows. Section 2 describes the experimental setup and the procedure for tagging the data as chatter versus non-chatter. Sections 3 and 4 describe combining the WPT and the EEMD methods, respectively, with machine learning tools for chatter detection. Section 5 briefly reviews the classification algorithms used in this study. Section 6 presents the results of our investigations including comparisons of the accuracy, transfer learning capabilities, and runtime of both methods, while the discussion and our concluding remarks can be found in Section 7. Appendix A contains supplementary classification results that are referenced from the main text of the manuscript.
2 Experimental Setup
Figure 2 shows the turning experiment that was used to collect the measurement data for training and testing of the chatter detection algorithms. It consists of a aluminum cylindrical workpiece mounted into the chuck of the spindle of a Clausing-Gamet cm ( inch) engine lathe. An S10R-SCLCR3A boring bar from the Grizzly T10439 carbide insert boring bar set with an attached cm ( inch) radius Titanium nitride coated cutting insert is secured to the tool holder.
The stiffness of the rod, and therefore the eigenfrequencies of the tool vibration, are varied by changing the overhang or stickout length of the rod. Four stickout lengths are used in the experiment: 5.08 cm (2 inch), 6.35 cm (2.5 inch), 8.89 cm (3.5 inch), and 11.43 cm (4.5 inch). In order to obtain more accurate measurements, the stickout length is measured as the distance between the flat back surface of the tool holder and the heel of the boring rod; this distance is illustrated on the right-hand side of Fig. 2. With this convention, increasing the stickout length leads to a stiffer cutting tool and higher eigenfrequencies for lateral vibrations. Since the lateral direction is the most flexible and chatter frequencies appear in the neighborhood of the dominant eigenfrequencies of the structure, the dominant chatter frequencies increase with increasing stickout length.
The boring rod is instrumented with two PCB 352B10 miniature, lightweight, uni-axial ceramic shear accelerometers that are ninety degrees apart to measure lateral vibrations of the rod. The two accelerometers are superglued onto the rod at about 3.81 cm (1.5 inch) away from the cutting tool to protect them from moving parts and cutting debris. A PCB 356B11 triaxial, miniature ceramic shear accelerometer is also attached to the bottom clamp of the tool holder as shown in Fig. 2. The data from all the accelerometers are collected on the analog channels of an NI USB-6366 data acquisition box using Matlab. No in-line analog filter is used; however, the signals are oversampled at kHz. Digital filtering is applied before subsampling, thus eliminating noise while avoiding the undesirable effects of aliasing. In particular, we use a Butterworth low-pass filter with order and a cutoff frequency of kHz. The data is then downsampled to kHz without risk of causing aliasing effects. The resulting conditioned data is what we consider in Section 2.1. In addition, we provide both the raw and filtered data in a Mendeley repository [40]. The code for this study is available in a GitHub repository.
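The conditioning chain described above (oversample, low-pass filter digitally, then downsample) can be sketched with SciPy. This is an illustrative sketch only: the helper name `lowpass_and_downsample` is hypothetical, zero-phase filtering with `sosfiltfilt` is an assumption, and the sampling rate, cutoff, filter order, and decimation factor in the usage lines are placeholder values, not the settings used in the experiment.

```python
import numpy as np
from scipy import signal


def lowpass_and_downsample(x, fs_in, cutoff_hz, order, factor):
    """Butterworth low-pass filter followed by keeping every `factor`-th sample."""
    # Second-order sections are numerically safer than (b, a) for higher orders.
    sos = signal.butter(order, cutoff_hz, btype="low", fs=fs_in, output="sos")
    filtered = signal.sosfiltfilt(sos, x)  # zero-phase filtering (assumed choice)
    return filtered[::factor], fs_in / factor


# Placeholder values: 100 kHz input, 5 kHz cutoff, order 8, decimate by 5.
fs_in = 100_000
t = np.arange(20_000) / fs_in
x = np.sin(2 * np.pi * 1_000 * t) + np.sin(2 * np.pi * 30_000 * t)
y, fs_out = lowpass_and_downsample(x, fs_in, cutoff_hz=5_000, order=8, factor=5)
```

The cutoff must stay below half of the output sampling rate; otherwise content above the new Nyquist frequency folds back into the downsampled signal.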
2.1 Data Labeling
Before tagging the signals, we analyzed the time series of the two uni-axial accelerometers on the boring rod as well as the signals from the tri-axial accelerometer on the tool post, see Fig. 2. We found that although the data of the accelerometers is mostly redundant, the -vibration at the tool post, which is measured by the -axis signal of the tri-axial accelerometer, had the best signal-to-noise ratio. Therefore, we performed the data tagging exclusively using the data from this channel. Another sanity check was the comparison of the tagged signals with a few photographs of the resulting machined surface taken during the experiment, as shown in Fig. 3.
Each time series from every cutting test was examined and the different parts of the signals were labeled as either no chatter, mild/intermediate chatter, chatter, or unknown. Figure 4a shows an example of how one time series is labeled using these categories. The separation into different parts has been done based on the characteristics of the amplitude in the time domain. In particular, parts with a low amplitude were separated from parts with a large amplitude. In addition, parts with an impact-like structure with an abrupt very strong increase of the accelerations and a relatively fast decay were also separated. Then the frequency domain characteristics were studied for a final classification of the signal. In the frequency domain only the frequency components lower than kHz were considered. Specifically, the criteria that we used for classifying the signals are:
1. No chatter (stable):
   (a) Low amplitude in the time domain
   (b) Low amplitude in the frequency domain (highest peaks at spindle rotation frequencies [41])
2. Mild or intermediate chatter:
   (a) Low amplitude in the time domain
   (b) Large amplitude in the frequency domain (highest peaks at chatter frequencies)
3. Chatter:
   (a) Large amplitude in the time domain
   (b) Large amplitude in the frequency domain (very high peaks at chatter frequencies which are not equal to the spindle rotation frequencies)
4. Unknown:
   (a) All other cases
The unknown data are parts of the time series with a large amplitude in the time domain but no large peaks in the frequency domain at chatter frequencies (lower than kHz). Typically, this corresponds to the parts with impact-like structure, which might occur due to chip breakage or other inhomogeneities during the process. It is also possible that another eigenmode at 10 kHz is vibrating (chattering) during the unknown portion of the time series in Fig. 4. However, typical chatter frequencies in this process lie between 0-150 Hz for structural modes, 200-800 Hz for workpiece vibrations, and 1000-3000 Hz for tool vibrations. Therefore, it is not clear whether this is chatter or something else, and it was excluded from the analysis. The time domain and the frequency domain characteristics for an example time series that includes all four classes are shown in Figure 4a and Figure 4b, respectively. The first part of the time series was not classified because in this case the tool is not yet engaged in the workpiece.
Figure 4 also makes clear that a process with fixed cutting conditions is not necessarily clearly stable or unstable, which is another reason why chatter detection algorithms might be helpful for practical applications. For example, the process in Figure 4 is stable at the beginning between s and s. Then, a strong perturbation drives the system away from the stable state to some chattering motion, which is a reasonable scenario because there can be a bistability between stable cutting and chatter [42, 43, 44]. Table 1 shows the breakdown and the total number of tagged time series for each stickout length.
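The four labeling criteria can be written as a small decision rule. The sketch below is only an illustration of the logic: the tagging in the study was done by inspecting each signal, and the function name `label_segment` as well as the two numeric thresholds are hypothetical.

```python
def label_segment(time_amplitude, chatter_peak, time_threshold, peak_threshold):
    """Map time-domain amplitude and spectral peak height at chatter frequencies to a tag."""
    large_time = time_amplitude > time_threshold   # large amplitude in the time domain
    large_peak = chatter_peak > peak_threshold     # large peak at chatter frequencies
    if not large_time and not large_peak:
        return "stable"
    if not large_time and large_peak:
        return "mild chatter"
    if large_time and large_peak:
        return "chatter"
    # Large time-domain amplitude without a chatter peak, e.g. impact-like events.
    return "unknown"


# Hypothetical thresholds, used only to exercise the rule.
tags = [
    label_segment(0.1, 0.05, time_threshold=1.0, peak_threshold=0.5),
    label_segment(0.1, 0.90, time_threshold=1.0, peak_threshold=0.5),
    label_segment(3.0, 0.90, time_threshold=1.0, peak_threshold=0.5),
    label_segment(3.0, 0.05, time_threshold=1.0, peak_threshold=0.5),
]
```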
3 Wavelet Packet Transform with Recursive Feature Elimination (RFE)
In this section we describe the Wavelet Packet Transform (WPT) with Recursive Feature Elimination (RFE) for chatter detection in metal cutting. The method can be divided into four steps, which are summarized in Fig. 5. The first step is the decomposition of the time series into wavelet packets. This is a technique from signal processing that is especially useful for high-resolution time-frequency analysis. The motivation for an additional decomposition of the signal is to increase the signal-to-noise ratio and the sensitivity to chatter features [26]. The outputs of the WPT are different wavelet packets, and the second step is the selection of the informative packets based on the properties of the wavelet packets and the characteristics of chatter in the considered process. The third step is the feature extraction and the automatic ranking of the features with the RFE method, which is used to distinguish between chatter and chatter-free motion. In the fourth step, the extracted features are used to classify each case as chatter or chatter-free via a Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF) classification and Gradient Boosting (GB).
3.1 Wavelet Packet Transform
We follow Ref. [26] and apply the WPT to the time series before feature extraction and classification. The WPT is an extension of the discrete wavelet transform. One level of the discrete wavelet transform decomposes the signal into a low and a high frequency component by passing it simultaneously through a low-pass and a high-pass filter. The properties of the two filters are related to each other and are determined by the chosen wavelet basis. Following [26], we use the Daubechies orthogonal wavelet db10 as the wavelet basis function. The outputs of the low-pass and the high-pass filter give the approximation coefficients and detail coefficients denoted by and , respectively, where the subscript specifies the level of the decomposition. The resulting signal after the decomposition is called a wavelet packet and can be reconstructed from the approximation or detail coefficients by using the filter properties [26]. In the discrete wavelet transform only the output is passed again through both filters to generate two additional outputs and in the next level.
In contrast, in the WPT approach the output of the low-pass filter as well as the output of the high-pass filter are both again low- and high-pass filtered to generate the wavelet packets , , and in the next level. This means that the WPT generates $2^k$ wavelet packets at the $k$th level, see Fig. 6 for a schematic of level WPT. In Fig. 6, for example, denotes the packet in the third level, where in the first and the second level the low-pass filter and in the third level the high-pass filter have been applied. Before passing through the filters in the next level the signal is downsampled by a factor of two, which increases the frequency resolution. Moreover, since after each decomposition the two resulting wavelet packets contain only one-half of the frequencies of the input data, this downsampling is possible without losing information. As a consequence, each wavelet packet in a given level contains only one frequency band, which is mostly distinct from the bands of the other packets. Even though the frequency bands become narrower in each level, the packets contain rich information about the original signal due to the increase of the frequency resolution. The location of the frequency band is determined by the order of the applied filters that are used to generate the wavelet packet (cf. Fig. 6). In the following, the wavelet packets are labeled according to the order of their frequency band, beginning with 1 for the packet with the lowest frequencies () resulting only from low-pass filtering, up to $2^k$ for the packet containing the highest frequencies () resulting from a successive application of the high-pass filter.
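The frequency-ordered labeling implies a simple nominal mapping between a packet number and its frequency band, which can be used to predict which packets should contain a given chatter band. The sketch below assumes ideal, non-overlapping bands of equal width between 0 Hz and the Nyquist frequency; the function names, the 10 kHz sampling rate, and the 900-1000 Hz band are hypothetical example values.

```python
def packet_band(level, index, fs):
    """Nominal band (Hz) of the frequency-ordered packet `index` (1-based) at `level`."""
    n_packets = 2 ** level
    if not 1 <= index <= n_packets:
        raise ValueError("index must lie between 1 and 2**level")
    width = (fs / 2) / n_packets  # the Nyquist range is split into equal bands
    return (index - 1) * width, index * width


def packets_overlapping(level, f_lo, f_hi, fs):
    """Packet numbers whose nominal band overlaps the interval [f_lo, f_hi]."""
    selected = []
    for index in range(1, 2 ** level + 1):
        band_lo, band_hi = packet_band(level, index, fs)
        if band_lo < f_hi and band_hi > f_lo:
            selected.append(index)
    return selected


# Hypothetical example: 10 kHz sampling rate, chatter band of 900-1000 Hz, level 4.
first_band = packet_band(4, 1, fs=10_000)
predicted = packets_overlapping(4, 900.0, 1000.0, fs=10_000)
```

Real wavelet filters are not ideal, so neighboring packets leak into each other and the mapping is only a first estimate.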
3.2 Selection of informative wavelet packets
The next step is the selection of the informative wavelet packets, which are best suited to distinguish between stable cutting and chattering motion. The criteria for the selection of the informative wavelet packets are a high signal energy in comparison to other packets for a good signal-to-noise ratio, and a significant overlap of the frequency band of the packet with possible chatter frequencies.
The identification of the band of chatter frequencies is done by examining the FFT of the signals tagged as stable, intermediate chatter, and chatter (see Section 2.1 for a description of the data labeling). Figure 7 shows example time series and the corresponding Fourier spectra for three tagged signals for the case whose stickout length, rotational rpm, and depth of cut are cm ( inch), rpm, and mm ( inch), respectively. For stable cutting the dominant frequencies are low and correspond to the spindle rotation frequency. In addition, there is a significant peak at Hz, which can be found in all measurements and probably comes from an external source. For intermediate chatter and chatter a significant part of the energy in the signal is contained at high frequencies near Hz, which is close to the eigenfrequency of the lateral tool vibration. As a consequence, these chatter frequencies become larger for increasing stickout length and for each of the four different stickout lengths a different range of chatter frequencies has been identified.
In order to analyze the properties of the wavelet packets, the level 1, 2, 3, and 4 WPT are computed from the experimental data. Figure 8 shows the resulting energy ratios of the wavelet packets for two example cases. The energy ratios represent the fraction of energy in each packet relative to the total energy in all the packets. The figure shows that for stable cutting most of the energy in this case is concentrated in the first wavelet packet. In contrast, for the intermediate chatter and the chatter regions the energy is concentrated mainly in the first, third and fourth wavelet packets. This is consistent with the behavior of the frequency spectrum of the original data in Figure 7, since a higher wavelet packet number corresponds to a higher frequency band.
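As a minimal sketch of the energy ratio defined above, assuming each packet is available as a plain list of coefficients, the fraction of energy per packet can be computed as follows. The function name `energy_ratios` and the example coefficients are hypothetical.

```python
def energy_ratios(packets):
    """Fraction of the total energy (sum of squares) contained in each packet."""
    energies = [sum(c * c for c in packet) for packet in packets]
    total = sum(energies)
    if total == 0:
        raise ValueError("all packets are zero, energy ratios are undefined")
    return [e / total for e in energies]


# Hypothetical coefficients of three packets.
ratios = energy_ratios([[1.0, 1.0], [2.0, 0.0], [0.0, 0.0]])
print(ratios)
```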
After identifying the wavelet packets whose energy ratios are relatively high with respect to the other packets, the next step is to identify the packets whose spectrum has significant peaks that overlap with the chatter frequencies given in Table 2 [26]. Specifically, we reconstruct a time domain signal for each wavelet packet and obtain the corresponding FFT for each of the reconstructed signals. For the two examples with stickout length 5.08 cm (2 inch), the frequency spectra of the reconstructed signals obtained from the first four wavelet packets for the intermediate chatter and chatter regions are provided in Fig. 9. It can be seen that the peaks in the spectrum of the 3rd and 4th wavelet packet overlap with the band of the previously identified chatter frequency (see Table 2 and Fig. 7). Since on average for the stickout length 5.08 cm (2 inch) the energy ratios and the amplitudes in the corresponding FFT (see Fig. 9) are slightly higher in the 3rd wavelet packet than in the 4th wavelet packet, we choose the 3rd packet as the informative wavelet packet for chatter detection at level 4 WPT. An overview of the selected informative wavelet packet for each level of the WPT can be found in Table 2. For higher stickout length the dominant chatter frequencies increase, and therefore, in general, a wavelet packet with a higher frequency band is selected as the informative wavelet packet.
We note that the informative wavelet packet is not necessarily the one with the highest energy, because it is important that the range of possible chatter frequencies lies in the frequency band of the informative wavelet packet. In fact, the first packet often has the highest energy ratio, but its frequency band does not overlap with the chatter frequencies, which are mainly contained in packets with a higher index (cf. Table 2).
Since the frequency band of the wavelet packets can be predicted from the WPT tree in Fig. 6, it is also possible to predict the informative wavelet packet that contains information about chatter frequencies. For example, from the sampling rate it follows that the first wavelet packet in level 3 corresponds to the lowest eighth of the frequency range between zero and half the sampling rate. The upper frequency limits for the other packets in level 3 are equal to the corresponding wavelet packet number times the upper frequency limit of the first wavelet packet (cf. Fig. 6). Table 3 provides the predicted and the selected informative wavelet packets. For all cases, the selected informative wavelet packets are consistent with the predicted ones.
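The band prediction can be sketched as follows, assuming frequency-ordered packet numbers starting at 1 and equal band widths within one level. The functions `packet_band` and `predict_informative_packet`, the sampling rate of 8000 Hz, and the chatter band of 900 to 1000 Hz are illustrative assumptions and not the values of the experiments.

```python
def packet_band(fs, level, index):
    """Frequency band in Hz of packet `index` (1-based, frequency ordered)."""
    n_packets = 2 ** level
    if not 1 <= index <= n_packets:
        raise ValueError("packet index out of range for this level")
    width = (fs / 2.0) / n_packets  # the Nyquist range is split evenly
    return (index - 1) * width, index * width


def predict_informative_packet(fs, level, chatter_band):
    """Packet whose band has the largest overlap with the chatter band."""
    lo, hi = chatter_band
    best_index, best_overlap = None, 0.0
    for index in range(1, 2 ** level + 1):
        band_lo, band_hi = packet_band(fs, level, index)
        overlap = max(0.0, min(hi, band_hi) - max(lo, band_lo))
        if overlap > best_overlap:
            best_index, best_overlap = index, overlap
    return best_index


fs = 8000.0                      # hypothetical sampling rate in Hz
chatter_band = (900.0, 1000.0)   # hypothetical chatter frequency band in Hz
print(packet_band(fs, 3, 1))
print(predict_informative_packet(fs, 3, chatter_band))
```

The sketch only uses the overlap of the bands; in the procedure described above the energy ratio of the candidate packets is considered as well.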
3.3 Recursive Feature Elimination
The reconstructed signal from the informative wavelet packet allows the extraction of both frequency domain as well as time domain features for chatter identification. A collection of frequency domain and time domain features, which are taken from Ref. [26], are tested in this paper and are provided in Table 4.
We used Python to train an SVM classifier combined with Recursive Feature Elimination (RFE), where the largest number of features occurs at the level 4 WPT. Recursive feature elimination is an iterative process that eliminates one of the features in each iteration until all the features have been removed [26], which means that the number of iterations for RFE equals the number of the considered features. The elimination of features is based on their influence on the classification: the feature with the smallest effect is eliminated in each iteration [45]. At the end, RFE returns a feature ranking list corresponding to one specific training set.
The ranked features are used to generate feature vectors where the first vector contains only the first ranked feature, while each consecutive feature vector adds the subsequent feature in the ranking until all the features are included in the last vector at the fourth level of WPT, see Tables 11–14 for examples. The classification accuracy is calculated for all feature vectors. In other words, in the first step only the top ranked feature is used, and in each further step the next highest ranked feature is added to the feature matrix and the classification accuracy is computed again.
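The construction of the nested feature vectors from a ranking can be sketched as follows. The ranking itself would come from RFE; here it is simply given as a list, and the feature names, the function names and the data rows are hypothetical.

```python
def cumulative_feature_sets(ranking):
    """ranking: feature names ordered from most to least influential."""
    return [ranking[:k] for k in range(1, len(ranking) + 1)]


def select_columns(rows, feature_names, subset):
    """Keep only the columns that belong to `subset`, in ranked order."""
    idx = [feature_names.index(name) for name in subset]
    return [[row[i] for i in idx] for row in rows]


feature_names = ["mean", "std", "rms"]   # hypothetical column order
ranking = ["rms", "mean", "std"]         # hypothetical RFE ranking
rows = [[0.1, 1.0, 1.1], [0.2, 2.0, 2.1]]

feature_sets = cumulative_feature_sets(ranking)
matrices = [select_columns(rows, feature_names, s) for s in feature_sets]
print(feature_sets)
```

Each entry of `matrices` is one feature matrix for which the classification accuracy would be computed.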
4 Ensemble Empirical Mode Decomposition (EEMD) with Recursive Feature Elimination (RFE)
In this section we describe the Ensemble Empirical Mode Decomposition (EEMD) for chatter detection in metal cutting. The structure of the method is similar to the WPT based approach, which is described in Section 3. However, in contrast to the WPT method here the EEMD is used for the decomposition of the original time series and the output of the EEMD are intrinsic mode functions (IMF) instead of wavelet packets. After the decomposition, the informative IMF is selected and various features for chatter detection are extracted. The features are automatically ranked via the RFE method and SVM is used to classify into chatter/chatter-free cases.
4.1 Ensemble Empirical Mode Decomposition
EEMD is based on the Empirical Mode Decomposition (EMD), which is an elementary step in the Hilbert-Huang transform [46]. Similar to WPT, EMD is useful for non-stationary signals since the resulting IMFs contain time and frequency information of the signal. The main difference in contrast to WPT and other linear decomposition methods is that the expansion bases of EMD are not fixed but are rather adaptive and they are determined by the data. On the one hand, this means that EMD is a nonlinear decomposition and, on the other hand, it is suitable for analyzing nonlinear and non-stationary data [46].
The algorithm for the decomposition of a given time series $s(t)$ can be described as follows. The first residue is equivalent to the original data, i.e. $r_1(t) = s(t)$. Then the IMFs $c_k(t)$ with $k = 1, 2, \ldots$ are generated from the residues $r_k(t)$ by repeated application of the so-called sifting process described below. After extracting the $k$th IMF $c_k(t)$, the next residue is calculated by
$$r_{k+1}(t) = r_k(t) - c_k(t). \qquad (1)$$
This procedure is repeated until the result of Eq. (1), that is the $n$th residue $r_n(t)$, becomes a monotonic function and no more IMFs can be extracted. As a result the decomposition of the original data can be given by
$$s(t) = \sum_{k=1}^{n-1} c_k(t) + r_n(t). \qquad (2)$$
The sifting process for the generation of an IMF from the current residue is done via the following iterative scheme. Lower and upper envelopes of the data are generated by using cubic splines for an interpolation between the local minima and maxima of the residue, respectively. The mean of the lower and upper envelope is calculated. The first guess for the IMF is obtained as the difference between the residue and this mean. Then the first guess for the IMF is treated as the new data and the sifting process is repeated until a given stoppage criterion is fulfilled. As a consequence of the iteration, the lower and the upper envelopes of the final IMF are nearly symmetric and their mean is approximately zero. Moreover, the number of extrema and the number of zero crossings are equal or differ at most by one. IMFs with lower indices correspond to high frequency bands while the ones with higher indices correspond to lower frequency bands. These properties of the decomposition make it useful for further data analysis.
However, one major problem with the original EMD is the occurrence of mode mixing, which means that one IMF contains two signals, whose frequency bands are totally different, or a signal of similar scale is observed inside different IMFs whose frequency bands are different [47]. EEMD was developed to solve the mode mixing problem in EMD [48]. Accordingly, Wu and Huang [47] proposed the following steps for EEMD:
1. Create an ensemble from the original data by adding white noise.
2. Decompose each member of the ensemble into IMFs.
3. Compute the ensemble means of the corresponding IMFs.
The added white noise amplitude must not exceed of the standard deviation of the original signal while the ensemble size for the EEMD can be selected as [25]. For our analysis we used the Python package PyEMD with the default stoppage criterion [49, 50]. We set the ensemble number and the noise width parameter to and (), respectively.
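The three EEMD steps can be sketched in plain Python as follows. This is not the PyEMD implementation used for the analysis: the decomposition is passed in as a callable, and `toy_decompose` is a hypothetical stand-in for EMD that only splits a signal into a detail and a trend part. The values `trials=50` and `noise_width=0.2` are illustrative assumptions, not the parameters of the paper. The sketch also assumes that every ensemble member yields the same number of modes, which is not guaranteed for a real EMD.

```python
import math
import random
import statistics


def eemd(signal, decompose, trials, noise_width, seed=0):
    """Average the modes obtained from `trials` noisy copies of `signal`.

    noise_width is the noise standard deviation relative to that of the signal.
    """
    rng = random.Random(seed)
    sigma = noise_width * statistics.pstdev(signal)
    sums = None
    for _ in range(trials):
        noisy = [v + rng.gauss(0.0, sigma) for v in signal]  # step 1: add white noise
        modes = decompose(noisy)                             # step 2: decompose member
        if sums is None:
            sums = [[0.0] * len(signal) for _ in modes]
        for acc, mode in zip(sums, modes):
            for i, v in enumerate(mode):
                acc[i] += v
    return [[v / trials for v in acc] for acc in sums]       # step 3: ensemble mean


def toy_decompose(x):
    """Hypothetical stand-in for EMD: [detail, trend] via a 3-point moving average."""
    n = len(x)
    trend = [(x[max(i - 1, 0)] + x[i] + x[min(i + 1, n - 1)]) / 3.0 for i in range(n)]
    detail = [a - b for a, b in zip(x, trend)]
    return [detail, trend]  # high frequency part first, as for IMFs


signal = [math.sin(0.2 * n) + 0.3 * math.sin(2.0 * n) for n in range(100)]
imfs = eemd(signal, toy_decompose, trials=50, noise_width=0.2)
print(len(imfs), len(imfs[0]))
```

Increasing `trials` reduces the residual noise in the averaged modes at the cost of a proportionally longer run time, which is the trade-off behind the ensemble parameters.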
4.2 Selection of informative intrinsic mode function
In order to obtain features for machine learning from vibration signals using EEMD, we first decompose the vibration signals into IMFs, see Fig. 10 for an example. For long time series, we reduced the computation time for this step by dividing the signal into shorter segments. The informative IMF selection process is very similar to its WPT counterpart, see Sec. 3.2. Specifically, the power spectrum in Fig. 11 shows that the first IMF includes the high frequency vibrations while higher order IMFs include the low frequency ones. For example, for the 5.08 cm (2 inch) stickout case, the FFT of the second IMF matches the chatter frequency region (see Table 2). Therefore, in this case, the second IMF is selected as the informative IMF. The informative IMFs for the other stickout cases are summarized in Table 2.
4.3 Feature extraction using EEMD
Similar to Chen et al. [25], we extract seven time domain features from the informative IMF. These features are listed in Table 5 and they include the energy ratio, peak to peak value, standard deviation, root mean square, crest factor, as well as skewness and kurtosis of the signals. The features are computed and then ranked using the Recursive Feature Elimination (RFE) method which was introduced in [51] and is described in Sec. 3.3. The feature matrix for classification is formed starting with the top-ranked feature by itself and then by concatenating, in descending order, the rest of the features one at a time. This results in seven combinations of features, which are then used for classification into chatter and chatter-free cases via four different classifiers similar to the WPT approach (cf. Sec. 5.1).
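Six of the seven listed features depend only on the informative IMF itself and can be sketched as follows. The exact definitions in Table 5 may differ; this sketch assumes population moments and the non-excess kurtosis, and the function name `time_domain_features` is hypothetical. The energy ratio is omitted because it additionally requires the energies of all other IMFs.

```python
import math


def time_domain_features(x):
    """Time domain features of a signal segment (not defined for a constant signal)."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    std = math.sqrt(sum(c * c for c in centered) / n)
    rms = math.sqrt(sum(v * v for v in x) / n)
    return {
        "peak_to_peak": max(x) - min(x),
        "std": std,
        "rms": rms,
        "crest_factor": max(abs(v) for v in x) / rms,
        "skewness": sum(c ** 3 for c in centered) / (n * std ** 3),
        "kurtosis": sum(c ** 4 for c in centered) / (n * std ** 4),
    }


features = time_domain_features([1.0, -1.0, 1.0, -1.0])
print(features)
```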
5 Classification Algorithms
This section gives background information on the different classifiers used to test the performance of considered feature extraction methods, namely SVM, logistic regression, random forest classification, and gradient boosting.
5.1 Support Vector Machine
A Support Vector Machine (SVM) is used to classify the time series by using the feature vectors. The Support Vector Machine algorithm is a supervised machine learning technique for finding the optimal hyperplane that separates two classes of a training data set. This hyperplane can then be used to classify the test data. The two dimensional case of a linear SVM is illustrated in Fig. 12. The feature vectors corresponding to two different classes, e.g. chatter (crosses) and no-chatter (circles), form two linearly separable data sets. The optimal hyperplane is selected such that the perpendicular distances from the feature vectors, which are closest to the hyperplane and also called support vectors, are equal. This means that the optimal hyperplane has the largest margin [52]. In general, it can be described by the set of points $\mathbf{x}$ satisfying
$$\mathbf{w} \cdot \mathbf{x} + b = 0, \qquad (3)$$
and the dashed lines on which the support vectors lie are defined according to
$$\mathbf{w} \cdot \mathbf{x} + b = \pm 1. \qquad (4)$$
Then, the margin of the optimal hyperplane can be denoted as $2/\|\mathbf{w}\|$. The two hyperplanes of Eq. (4), and therefore the optimal hyperplane from Eq. (3), can be found by maximizing the distance $2/\|\mathbf{w}\|$ or, equivalently, by minimizing $\|\mathbf{w}\|$ with the constraints
$$y_i \left( \mathbf{w} \cdot \mathbf{x}_i + b \right) \geq 1, \qquad (5)$$
where $\mathbf{x}_i$ are the feature vectors of the training set and $y_i \in \{-1, 1\}$ are their class labels. The classification for a feature vector $\mathbf{x}$ of the test set can be made by checking the sign of the expression $\mathbf{w} \cdot \mathbf{x} + b$, which defines the label for the two classes. For the theory behind multi-class classification with SVM, one can refer to [53]. For some cases the training data are not separable by a linear hyperplane. In this case, the SVM can be extended to nonlinear classification with the help of kernel functions [54].
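The decision rule of a trained linear SVM can be sketched as follows. The sketch assumes that the hyperplane is written in the form w·x + b = 0 with a normal vector `w` and an offset `b`, which is one common convention, and it does not cover the training itself. All function names and numerical values are hypothetical.

```python
import math


def decision_value(w, b, x):
    """Signed expression w . x + b whose sign determines the class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    """Return +1 or -1 depending on the side of the hyperplane."""
    return 1 if decision_value(w, b, x) >= 0 else -1


def margin_width(w):
    """Distance between the two hyperplanes w . x + b = +1 and w . x + b = -1."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


w, b = [1.0, 1.0], -3.0  # hypothetical parameters of a trained classifier
print(classify(w, b, [3.0, 2.0]), classify(w, b, [0.0, 1.0]), margin_width(w))
```

Since the margin is inversely proportional to the norm of `w`, minimizing this norm under the separation constraints maximizes the margin.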
5.2 Logistic Regression
Logistic regression is a supervised learning classification algorithm that computes the probability of two class labels for given independent variables [55]. It is quite similar to linear regression, but it differs in that its output is divided into two categories [56]. Figure 13 illustrates linear and logistic regression on a binary dataset. In this figure, the input is the set of elements in the feature vector while the output is the dichotomous outcome variable. For dichotomous output, linear regression can be applied but the model will not fit well, as shown in Figure 13a. There are two main reasons why the linear equation does not explain the relation between the input and the output variable [57]: (1) the relationship between the variables does not have a linear trend and (2) the errors are not constant or they are not normally distributed. However, this problem can be solved by introducing the logit transformation.
Let be the expected value of given the value of . The regression model and the logit transformation , respectively, are defined according to [56]
[TABLE]
[TABLE]
where is defined as the sigmoid function for logistic regression in Eq. 6b. The regression model is expressed as linear function, however it is converted into nonlinear probability function with logit transformation. Although Eq. 6a is defined for only one independent input variable , the model can be further extended to a multivariate version. To assign labels for a given input , the decision boundary must first be formed. In Fig. 13b this boundary is the sigmoid function that splits tags [math] and . The values which satisfy form the decision boundary [55], and the probability at the boundary, per Eq. (6b), is . The parameters and in the regression model can be identified using maximum likelihood estimators [55].
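The role of the sigmoid function and of the decision boundary can be sketched for a single input variable as follows. The parameter names `b0` and `b1` for the intercept and the slope of the linear predictor, the function names, and the numerical values are assumptions made for illustration only.

```python
import math


def sigmoid(z):
    """Map the linear predictor z to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(b0, b1, x):
    """Probability of the positive class for the input x."""
    return sigmoid(b0 + b1 * x)


def decision_boundary(b0, b1):
    """Input value at which the linear predictor is zero."""
    return -b0 / b1


b0, b1 = -2.0, 4.0  # hypothetical regression parameters
x_boundary = decision_boundary(b0, b1)
print(x_boundary, predict_probability(b0, b1, x_boundary))
```

At the boundary the linear predictor vanishes and the sigmoid returns one half, so inputs on either side are assigned to different classes.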
5.3 Random Forest Classification
Ensemble learning combines multiple models to obtain higher prediction rates for a given problem. Random forest is an ensemble learning method composed of decision trees, where the number of these trees is part of the user input to the algorithm [58]. Each of the decision trees is composed of branch nodes with two branches emanating from each branch node; hence, they are called binary trees. The nodes that have no descendants are termed leaf nodes or leaves. Assuming numeric inputs, each branch node corresponds to one variable and its split point, while the leaf nodes correspond to output variables. Fig. 14 illustrates decision tree classification using two classes and two input variables. The first step is to partition the input space of the training set into rectangles (or hyper-rectangles in higher dimensions). Selecting the partitions is based on making each subset of the training set purer, i.e., with fewer mixed labels, than the training set itself [59]. The goodness of each partition is defined by an impurity function, see [59] for a discussion on optimum splits. After defining the partitions for the training data set (left graph in Fig. 14), a tree is formed (right graph in Fig. 14).
The branch nodes of the tree correspond to conditions on one of the input variables, such that each sample can be placed in one of the leaf nodes. Each leaf node is then labeled by following the plurality rule [59]: the most frequent label in any node is assigned as the label for that node. Fig. 15 shows an example in which some leaf nodes are labeled as one class, while the remaining leaves are labeled as the other class. Given a new input, a tag is generated by traversing the tree starting at the root node. The new input’s label is then matched to that of the leaf node it ends up in within the tree.
In random forest classification, there are multiple decision trees, and each tree votes for a label for a given test sample. The algorithm chooses a number of samples to generate the decision trees, and this is iterated until the desired number of trees is obtained. The estimation for the label of the sample is made with respect to the most frequent votes [60] as shown in Fig. 15.
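The voting step can be sketched as follows, assuming that each trained tree is available as a callable that maps a sample to a label. The three trees below are hypothetical single-condition trees and do not result from any training.

```python
from collections import Counter


def forest_predict(trees, sample):
    """Return the most frequent label among the votes of all trees."""
    votes = [tree(sample) for tree in trees]
    label, _ = Counter(votes).most_common(1)[0]
    return label, votes


# Hypothetical trees, each with a single branch node.
trees = [
    lambda s: "chatter" if s[0] > 0.5 else "stable",
    lambda s: "chatter" if s[1] > 1.0 else "stable",
    lambda s: "chatter" if s[0] + s[1] > 2.0 else "stable",
]

print(forest_predict(trees, (0.8, 0.4)))
print(forest_predict(trees, (0.9, 1.5)))
```

With an even number of trees a tie is possible; in this sketch the label that was voted for first then wins.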
5.4 Gradient Boosting
Boosting was introduced by Schapire to answer the question of whether a set of weak learners can be combined into a single strong learner [61]. Gradient boosting was proposed as an algorithm that provides more accurate predictions for regression and classification problems by generating new base models, which can be linear models, smooth models or decision trees [62]. Gradient boosting aims to correct the previous models by adding new base models that minimize the loss function. When decision trees are used as base models, a new decision tree is added after computing the loss function. The new decision tree is parametrized such that it decreases the loss of the existing model. Specifically, gradient descent is used to minimize the value of the loss function, and it is applied in function space since each tree (base learner) can be represented as a function. The gradient boosting algorithm fits the new base models to the negative gradient of the loss function, where the choice of the loss function is user-dependent, to increase the accuracy of the overall model [63].
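The idea of fitting each new base model to the negative gradient of the loss can be sketched with a deliberately simplified setting: a squared loss, a one-dimensional input, and decision trees with a single split. For the squared loss the negative gradient is the residual between target and current prediction. The function names, the learning rate and the data are hypothetical, and the sketch is not the classifier used for the results.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split under squared error (needs two distinct xs)."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or err < best[0]:
            best = (err, t, lv, rv)
    _, t, lv, rv = best
    return lambda x: lv if x <= t else rv


def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Additive model: constant start value plus scaled stumps."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    losses = []
    for _ in range(n_rounds):
        preds = [predict(x) for x in xs]
        losses.append(sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys))
        residuals = [y - p for y, p in zip(ys, preds)]  # negative gradient
        stumps.append(fit_stump(xs, residuals))
    return predict, losses


xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
predict, losses = gradient_boost(xs, ys)
print(losses[0], losses[-1], predict(0.0), predict(5.0))
```

A smaller learning rate makes each correction more cautious and usually requires more rounds, which is one of the parameters that can be tuned to counter overfitting.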
6 Results
This section shows the classification accuracy for the different featurization methods discussed in this paper. Specifically, Sections 6.1 and 6.2 show the WPT-based and the EEMD-based results, respectively. The results are obtained by randomly splitting the data from each stickout case into training and testing sets. As described in Sections 3 and 4, we extract the features from the informative wavelet packet or informative IMF and use four different classification algorithms on the training set to obtain classifiers. Then, we test the accuracy of the classifier using the corresponding test set. We repeat this split-train-test process ten times, and we tabulate the averages and standard deviations of the resulting classification accuracy. In addition, transfer learning results for several cutting configurations are provided for both WPT and EEMD methods in Sec. 6.3.
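The repeated split-train-test procedure can be sketched as follows. The classifier is passed in as a callable `fit` that returns a prediction function; `fit_threshold` is a hypothetical one-feature classifier used only to make the sketch runnable, and the test fraction of 0.33 is an assumption, since any split ratio can be inserted.

```python
import random
import statistics


def repeated_holdout(samples, labels, fit, n_repeats=10, test_fraction=0.33, seed=0):
    """Mean and standard deviation of the test accuracy over random splits."""
    rng = random.Random(seed)
    n_test = max(1, round(test_fraction * len(samples)))
    accuracies = []
    for _ in range(n_repeats):
        order = list(range(len(samples)))
        rng.shuffle(order)  # new random split in every realization
        test_idx, train_idx = order[:n_test], order[n_test:]
        model = fit([samples[i] for i in train_idx], [labels[i] for i in train_idx])
        hits = sum(model(samples[i]) == labels[i] for i in test_idx)
        accuracies.append(hits / n_test)
    return statistics.mean(accuracies), statistics.pstdev(accuracies)


def fit_threshold(xs, ys):
    """Hypothetical classifier: threshold halfway between the two class means."""
    m0 = statistics.mean(x for x, y in zip(xs, ys) if y == 0)
    m1 = statistics.mean(x for x, y in zip(xs, ys) if y == 1)
    cut = (m0 + m1) / 2.0
    return lambda x: 1 if x > cut else 0


samples = [0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.90, 0.95, 1.00, 1.05, 1.10, 1.15]
labels = [0] * 6 + [1] * 6
mean_acc, std_acc = repeated_holdout(samples, labels, fit_threshold)
print(mean_acc, std_acc)
```

With few samples per configuration the test set contains only a handful of time series, so a single misclassification changes the accuracy noticeably and the standard deviation over the realizations grows.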
6.1 Wavelet Packet Transform with RFE
In each realization of training data and test data, we repeat the feature ranking via RFE as described in Sec. 3.3. Since the training and test sets are different in each realization, ten different rankings of the features are obtained. Figure 16 shows the ranking for the ten realizations, where each bar corresponds to a feature whose equation is provided in Table 4. The height of the bar in the figure shows the number of times each feature is ranked at the corresponding rank number. For instance, the standard frequency feature is the feature with the most influence on the classification in all realizations. On the other hand, three different features are ranked second in three, four and three out of ten split-train-test realizations, respectively. In general, the features based on the frequency domain are ranked higher than the time domain features. Feature ranking plots for other stickout cases are provided in Figs. 20–22 of the Appendix.
The mean and the standard deviation of the accuracy of the classification for the realizations of training and test sets based on the level WPT method are presented in Fig. 17 for all stickout cases. In this figure, it is seen that when the number of the features is or , adding lower ranked features into the feature vector does not affect the result. This shows that RFE ranked the features properly and that lower ranked features do not have influence on the results.
One difference between the WPT-based approach that we describe in this paper and the one described in [26] is that we investigate the accuracy of the classifier using informative wavelet packets computed at each level of the WPT. On average the level 1 and level 2 WPT lead to better classification results in the test sets than the level 3 and level 4 WPT. This might be attributed to the fact that the lower level WPT contains information in a broader frequency range than the higher level WPT, and for chatter detection only the presence of chatter frequencies in the spectrum is relevant but not their frequency value or the exact shape of the peaks. We tabulate the full classification results for each level of the WPT up to level 4 in Tables 11–14 of the Appendix. Since the feature ranking is different for each realization of the splitting into training and test data, the features are denoted only by their rank. Below in Table 6 we report the WPT results with the highest average accuracy out of all the different combinations of WPT levels and feature vectors and compare them to the results of the EEMD method. We also test the performance of both methods, WPT and EEMD, with the classifiers explained in Sec. 5. Tables 7 and 8 provide the accuracies obtained from Level 1 and Level 2 WPT and EEMD feature extraction methods with four different classifiers and compare the methods to each other.
6.2 Ensemble Empirical Mode Decomposition with RFE
Similar to Section 6.1, we combine EEMD with RFE and utilize four different classifiers in each realization of the splitting into training and test sets. The classification accuracy is on average better than the results from the level 3 and level 4 WPT and comparable to the accuracy of the lower level WPT. Table 15 of the Appendix lists the resulting mean accuracies and standard deviations for all stickout cases and feature vectors. The combination with the best accuracies in each cutting case is reported when comparing the different methods in Table 6. In this table, the results highlighted in dark blue represent the highest accuracy across a given row, while those highlighted in light blue have an average accuracy that lies within the error bars of the method with the highest average accuracy.
Table 6 shows that features based on the WPT algorithm give the highest accuracy for three out of the four cutting configurations. In particular, feature extraction with WPT and RFE is the most accurate for the 5.08 and 6.35 cm (2 and 2.5 inch) stickout cases. While EEMD gives the highest accuracy for the 8.89 cm (3.5 inch) stickout case, the WPT result for this case still lies within the error bars of the EEMD result. Tables 7 and 8 report the results obtained with classifiers other than SVM. In Table 7, the performance of Level 1 WPT is better than that of EEMD, since WPT has the highest accuracies in three cutting configurations and the EEMD results lie within the error bars of the WPT results. On the other hand, Table 8 indicates that each method has the highest accuracy for two cutting configurations. These two tables also provide evidence that the lower level (Level 1) WPT outperforms EEMD. Further, 100% accuracy is observed in Tables 7 and 8 for two different cutting configurations. These cutting configurations have the lowest number of time series as experimental data. Since time series are not split into smaller pieces for the WPT method, the size of the test set is quite small, which makes such high results possible.
The standard deviation of the WPT results is quite high, as seen from Table 6 and Fig. 18. The reason is that the computation time of this method is low enough that long time series did not have to be split into smaller pieces. Therefore, the total number of samples for identical stickout cases is smaller in comparison to the EEMD method, where long time series were split into shorter ones, thus increasing the number of samples and resulting in tighter error bars. Consequently, the amount of deviation can be reduced, especially for the WPT-based approach, by increasing the size and the number of the training sets. In addition, Table 6 compares the run time in seconds for each of the different featurization methods for chatter detection. These comparisons were performed using a Dell OptiPlex 7050 desktop with an Intel Core i7-7700 CPU and 16.0 GB RAM. It can be seen that feature extraction with WPT and RFE is the fastest across all of the stickout cases. We point out that the built-in WPT package that we used is highly optimized, whereas the EEMD does not enjoy the same level of code optimization. Moreover, for EEMD the EMD is performed for an entire ensemble of time series, which requires a much higher computational effort; this effort can be reduced by varying the ensemble parameters of the EEMD.
6.3 Transfer learning capabilities
Transfer learning was applied to the WPT and the EEMD: a classifier is trained on the data of the 5.08 cm (2 inch) or the 11.43 cm (4.5 inch) case and tested on the data of another stickout length. These two cutting configurations were chosen for the training data set because they are the ones with the largest number of cases (see Table 1). The classifier is trained on a subset of the training data set and tested on 70 % of the test set. The classification is repeated for several realizations of the split-train-test process. The classification has been done for the level 1 WPT, the level 4 WPT and the EEMD methods. The mean and the standard deviation of the classification accuracies for WPT and EEMD methods can be found in Tables 16–18 of the Appendix. The main result can already be seen in the best results for transfer learning between the 5.08 cm (2 inch) and the 11.43 cm (4.5 inch) stickout cases presented in Table 9.
Table 9 shows that even the highest accuracies for the level 4 transform are either equal to or smaller than the ones for the level 1 WPT, which means that lower level WPT achieves better transfer learning classification rates with SVM. This statement also holds for the results obtained with the other classifiers, except in the first application of transfer learning, where we train the classifier on the 5.08 cm (2 inch) case and test it on the 11.43 cm (4.5 inch) case. In general, lower level wavelet packets have a broader frequency range compared to higher ones, so wavelet packets in the lower level transform are more likely to include the chatter frequency information for the cases whose informative wavelet packet number is different. When the informative wavelet packet numbers of the training and test sets are the same, Tables 16–17 show that higher transfer learning classification scores are achieved. For instance, a classifier trained with the 5.08 cm (2 inch) stickout case has the first wavelet packet as the informative one in level 1 WPT. When this classifier is tested on other stickout cases, it gives high accuracies for the 6.35 and 8.89 cm (2.5 and 3.5 inch) cases as expected, since these cases also have the first wavelet packet as the informative one. However, when the classifier is tested on the 11.43 cm (4.5 inch) case, where the second packet is the informative one (see Table 2), the classification accuracy decreases dramatically. For level 4 WPT, all cutting configurations have different informative packet numbers. Therefore, the classification results are not as high as in level 1 WPT for all stickout cases.
Further, EEMD has better performance than WPT in transfer learning, although the best accuracy obtained for the classifier trained with the 5.08 cm (2 inch) case belongs to WPT. Fig. 19 shows that the WPT method provides the highest accuracy for that application of transfer learning, but with a high amount of deviation. In addition, Table 10 shows the same trend when we train a classifier with the features of two cutting configurations and test it on the features of the remaining two. The plot of the best accuracies with error bands is provided in Fig. 23. In Table 10, the first application of transfer learning yields the highest accuracy with the EEMD method, with a deviation of 0.6%. When the training and test cases are interchanged, WPT outperforms EEMD by an accuracy difference of 3.7%, but its highest accuracy has a deviation of 12.1%. Moreover, the right hand side of Table 10 shows drops and sudden increases in accuracy when different classifiers are used. The reason for this is that the classifiers which provide low accuracies suffer from overfitting even though RFE is utilized during classification. It is worth noting that we used the default parameters for all classifiers except Random Forest Classification, for which the number of decision trees and the maximum depth of the trees were set to 100 and 2, respectively. The low accuracy of EEMD with Random Forest and Gradient Boosting can be explained by these untuned classifier parameters. If one increases the maximum depth of the decision trees, which leads to purer leaves, or tunes the parameters for Gradient Boosting, the accuracy can increase and the overfitting can be reduced. However, we keep the parameters for all classifiers fixed in all classifications.
Low order IMFs include the chatter frequency in all the studied cases, and the difference between the informative IMF numbers is not as large as in the WPT-based approach. Recall that the informative IMF is the second IMF for the 5.08 cm (2 inch) and the 6.35 cm (2.5 inch) case and the first IMF for the 8.89 cm (3.5 inch) and the 11.43 cm (4.5 inch) case. It is expected that information on machine tool chatter is contained in both IMFs for all stickout cases, which may explain the high accuracies for some transfer learning cases where the informative IMF is not the same. Specifically, this can be seen for the EEMD based classifier that was trained on the 5.08 cm (2 inch) case and tested on the 8.89 cm (3.5 inch) case, as well as for the one that was trained on the 11.43 cm (4.5 inch) case and tested on the 5.08 cm (2 inch) case (cf. Table 18 of the Appendix).
7 Conclusion
Two advanced chatter detection methods, i.e. the Wavelet Packet Transform (WPT) and the Ensemble Empirical Mode Decomposition (EEMD) with Recursive Feature Elimination (RFE) have been used for the classification of recorded acceleration signals from a turning process into chatter-free cutting or chattering motion. We use the two algorithms not only to classify measured test data with the same cutting conditions as used in the training phase but also for transfer learning, which means that the test data originates from a cutting process with different cutting conditions. In particular, the chatter frequencies between the training data and the test data differ significantly.
Our results in Table 6 show that WPT has the highest accuracies for three cutting configurations when the classifier is trained and tested on the same data set, while EEMD provides the best results for one cutting configuration. In addition, training classifiers other than SVM increases the mean accuracies for both methods, as shown in Tab. 7 and 8. These two tables also show that WPT performance decreases as the level of the transform increases. On the other hand, Tables 9 and 10 show that EEMD performs better when the transfer learning approach is used, although WPT provides the best results for a few transfer learning applications in these tables. This is because the selected informative intrinsic mode functions (IMFs) of the EEMD contain information from a broader frequency band than the informative wavelet packets of the WPT. For the transfer learning applications where WPT has the highest accuracy (see Tab. 9 and 10), WPT attains this accuracy with larger deviations, and the EEMD results lie within the error range of these best results. For similar reasons, low level WPT (level 1 or 2) generally performs better than higher level WPT (see Table 6), because the informative wavelet packets of a lower level cover a broader frequency band, which makes it more likely that the signatures of chattering motion or chatter-free cutting are detected accurately. Nevertheless, for a specific process without many changes in the dynamics, a high level WPT with a narrow frequency band may also be useful. In addition to the accuracy comparisons, we recorded the overall runtime of each method. Table 6 shows that WPT has the fastest runtime, while EEMD has the longest. This slowdown is mostly related to the computation of the ensemble of IMFs and can be reduced by changing the ensemble parameters and optimizing the code.
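The bandwidth argument can be made concrete with a small calculation. The sketch below assumes frequency-ordered packets with 1-based indices, so that each level-k packet covers a band of width f_s / 2^(k+1); the sampling rate and the two chatter frequencies are hypothetical values chosen for illustration and are not taken from the experiments.

```python
def packet_band(level, index, fs):
    """Return (low, high) in Hz for the index-th packet (1-based, frequency order)."""
    width = fs / 2 ** (level + 1)  # the band 0..fs/2 is split into 2**level packets
    return (index - 1) * width, index * width


def packet_containing(freq, level, fs):
    """Return the 1-based index of the frequency-ordered packet that contains freq."""
    width = fs / 2 ** (level + 1)
    return int(freq // width) + 1


FS = 10_000.0      # hypothetical sampling rate in Hz
F_TRAIN = 700.0    # hypothetical chatter frequency of the training case in Hz
F_TEST = 1_100.0   # hypothetical chatter frequency of the test case in Hz

# For each level, check whether both chatter frequencies fall into the same packet.
same_packet = {
    level: packet_containing(F_TRAIN, level, FS) == packet_containing(F_TEST, level, FS)
    for level in range(1, 5)
}
print(same_packet)  # {1: True, 2: True, 3: True, 4: False}
```

With these made-up numbers, the packet selected on the training case still contains the chatter frequency of the test case at levels 1 to 3, but no longer at level 4, where each packet is only 312.5 Hz wide.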
There are two main drawbacks of the methods.
- The WPT featurization process is cumbersome, since it requires taking the WPT of the signal, investigating the packets that contain the chatter frequencies, and then choosing the packet that has a considerably high energy ratio and includes the chatter frequency. Once these packets are found, they are fixed for the investigated process and are used for chatter classification. However, inherent to this process is the a priori identification of the chatter frequency and the assumption that the chosen packets (referred to as the informative packets) will always contain it. This is a limitation since (a) it requires highly skilled users to analyze the signal and extract the informative packets, and (b) the chatter frequency band can move during the cutting process, which renders the informative packets ineffective for chatter classification. Further, Section 3 points out that the informative packets are not necessarily the ones with the highest energy ratio. This makes automating the feature selection process more difficult in the WPT approach. The EEMD also suffers from some of these drawbacks, since the process for choosing the informative IMFs is quite similar to that for choosing the informative packets in WPT.
- It is not always possible to differentiate between intermediate and full chatter. Specifically, although the intermediate chatter time series (Fig. 7c) and the chatter time series (Fig. 7e) are visually very different in the time domain, their energy content shown in the top graph of Fig. 8 can be too close to distinguish between the two cases.
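The first drawback can be illustrated with a toy example in which the packet with the highest energy ratio is not the packet that contains the chatter frequency. The sketch assumes that the energy ratio of a packet is its energy divided by the total energy of all packets at the same level, and that packets are frequency-ordered with 1-based indices; the coefficients, the sampling rate, and the chatter frequency are made up for illustration.

```python
def energy_ratios(packets):
    """Energy of each packet divided by the total energy at that level."""
    energies = [sum(c * c for c in coeffs) for coeffs in packets]
    total = sum(energies)
    return [e / total for e in energies]


def packet_containing(freq, level, fs):
    """Return the 1-based index of the frequency-ordered packet that contains freq."""
    width = fs / 2 ** (level + 1)  # the band 0..fs/2 is split into 2**level packets
    return int(freq // width) + 1


FS = 10_000.0            # hypothetical sampling rate in Hz
CHATTER_FREQ = 1_500.0   # hypothetical chatter frequency in Hz
LEVEL = 2

# Hypothetical level-2 packet coefficients (2**2 = 4 packets, frequency order).
packets = [
    [3.0, -3.0, 3.0],   # strong low-frequency content, e.g. unrelated to chatter
    [1.0, -1.0, 1.0],   # weaker content in the band that holds the chatter frequency
    [0.5, 0.5, -0.5],
    [0.1, -0.1, 0.1],
]

ratios = energy_ratios(packets)
highest_energy_packet = ratios.index(max(ratios)) + 1
chatter_packet = packet_containing(CHATTER_FREQ, LEVEL, FS)

print("highest energy ratio: packet", highest_energy_packet)
print("contains chatter frequency: packet", chatter_packet)
```

In this constructed case, selecting features purely by energy ratio picks packet 1, while the chatter frequency lies in packet 2, so an automated energy-based selection would need an additional check against the chatter frequency band.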
Acknowledgement
This material is based upon work supported by the National Science Foundation under Grant Nos. CMMI-1759823 and DMS-1759824 with PI FAK.
Appendix A Supplemental classification results
