Convolutional herbal prescription building method from multi-scale facial features
Huiqiang Liao, Guihua Wen, Yang Hu, Changjun Wang

TL;DR
This paper introduces a multi-scale convolutional neural network approach to predict Traditional Chinese Medicine prescriptions from facial images, leveraging facial features at different granularities.
Contribution
It proposes a novel multi-scale CNN model that captures facial features at organ, local, and whole-face levels to generate TCM prescriptions from face images.
Findings
Multi-scale CNNs outperform single-scale models.
Facial features contain significant information for TCM prescription prediction.
The method demonstrates promising results in mining face-prescription relationships.
Table 1: Parameters used in data augmentation.

| Parameters | Values | Explanations |
|---|---|---|
| rotation_range | 25 | Range in degrees (0-180) within which images are randomly rotated |
| width_shift_range | 0.05 | Translate horizontally by a certain percentage of width |
| height_shift_range | 0.05 | Translate vertically by a certain percentage of height |
| zoom_range | 0.2 | The zoom range is [1-0.2, 1+0.2] |
| horizontal_flip | True | A flag for randomly flipping half of the images |

Table 2: Description of the face images in the dataset.

| Parameters | Values |
|---|---|
| number of face images | 9653 |
| number of face images after data augmentation | 18463 |
| size of face images | 224x224x3 |
| size of local region block images (left cheek, right cheek, chin) | 112x112x3 |
| size of organ block images (left eye, right eye, nose, mouth) | 56x56x3 |
| size of the test set in each round of 5-fold cross-validation | 500 |

Table 3: Description of the TCM prescriptions in the dataset.

| Parameters | Values |
|---|---|
| number of prescriptions | 9653 |
| number of Chinese herbal medicine species | 559 |
| maximum length of TCM prescriptions | 28 |
| minimum length of TCM prescriptions | 2 |
| average length of TCM prescriptions | 14 |
| average number of prescriptions in which each Chinese herbal medicine appears | 240 |
| proportion of Chinese herbal medicines that appear in more than 100 prescriptions | 36% |

Table 4: Parameters of the conventional CNN and the multi-scale CNN based on three-grained face.

| Parameters | Values |
|---|---|
| size of convolutional kernels | 3x3 |
| numbers of convolutional kernels in conventional CNN | 32,64,128 |
| numbers of convolutional kernels in multi-scale CNN based on three-grained face | 16,32,64 |
| size of max-pooling | 2x2 |
| dropout rate | 0.4 |
| number of neurons in the fully connected layer | 256 |
| activation of convolutional layers and fully connected layer | relu |
| number of neurons in the output layer | 559 |
| activation of output layer | sigmoid |
| learning rate | 0.01 |
| batch size | 64 |
| number of epochs | 300 |

| model | precision (%) | recall (%) | f1-score (%) |
|---|---|---|---|
| random forest (baseline) | 37.25 ± 2.12 | 35.50 ± 2.97 | 33.07 ± 1.96 |
| conventional CNN | 38.60 ± 2.76 | 37.60 ± 1.53 | 36.69 ± 1.63 |
| conventional CNN (with data augmentation) | 39.34 ± 2.56 | 38.21 ± 1.92 | 37.66 ± 1.97 |
| multi-scale CNN based on three-grained face | 41.44 ± 2.65 | 40.67 ± 1.70 | 39.88 ± 0.73 |
| multi-scale CNN based on three-grained face (with data augmentation) | 42.56 ± 2.48 | 41.15 ± 2.01 | 40.69 ± 1.67 |

| image size | 32x32 | 56x56 | 84x84 | 112x112 | 168x168 | 224x224 | average |
|---|---|---|---|---|---|---|---|
| precision (%) of conventional CNN | 38.97 | 39.42 | 39.27 | 38.83 | 38.18 | 36.69 | 38.48 |
| precision (%) of multi-scale CNN based on three-grained face | 37.88 | 38.28 | 38.66 | 39.93 | 40.04 | 39.88 | 39.36 |
| recall (%) of conventional CNN | 38.76 | 39.10 | 39.34 | 38.41 | 37.70 | 37.60 | 38.48 |
| recall (%) of multi-scale CNN based on three-grained face | 37.97 | 38.37 | 38.70 | 40.16 | 40.59 | 40.67 | 39.41 |
| f1-score (%) of conventional CNN | 38.97 | 39.42 | 39.27 | 38.83 | 38.18 | 36.69 | 38.61 |
| f1-score (%) of multi-scale CNN based on three-grained face | 37.88 | 38.28 | 38.66 | 39.93 | 40.04 | 39.88 | 39.11 |

| example | model | prescription |
|---|---|---|
| 1 | real prescription | 甘草 白芍 川芎 当归 茯苓 党参 白术 黄连 熟地黄 肉桂 厚朴 白芷 蔓荆子 (Radix Glycyrrhizae, Radix Paeoniae Alba, Rhizoma Chuanxiong, Angelica, Poria Cocos, Radix Codonopsis, Macrocephalae Rhizoma, Goldthread, Prepared Rehmannia Root, Cinnamon, Magnolia Officinalis, Radix Angelicae Dahuricae, Fructus Viticis) |
| 1 | conventional CNN | **甘草** **茯苓** **白术** 浙贝母 砂仁 蜈蚣子 (Radix Glycyrrhizae, Poria Cocos, Macrocephalae Rhizoma, Thunberg Fritillary Bulb, Fructus Amomi, Scolopendra) |
| 1 | conventional CNN (with data augmentation) | **甘草** 法半夏 **茯苓** **党参** **白术** 山药 (Radix Glycyrrhizae, Rhizoma Pinelliae Praeparata, Poria Cocos, Radix Codonopsis, Macrocephalae Rhizoma, Dioscoreae Rhizoma) |
| 1 | multi-scale CNN based on three-grained face | **甘草** **茯苓** **党参** **白术** 蜈蚣 (Radix Glycyrrhizae, Poria Cocos, Radix Codonopsis, Macrocephalae Rhizoma, Scolopendra) |
| 1 | multi-scale CNN based on three-grained face (with data augmentation) | **甘草** **白芍** **茯苓** **党参** **白术** 蜈蚣 (Radix Glycyrrhizae, Radix Paeoniae Alba, Poria Cocos, Radix Codonopsis, Macrocephalae Rhizoma, Scolopendra) |
| 2 | real prescription | 甘草 法半夏 茯苓 前胡 桔梗 薏苡仁 浙贝母 细辛 天麻 鳖甲 款冬花 莪术 炙麻黄 蜈蚣 白花蛇舌 (Radix Glycyrrhizae, Rhizoma Pinelliae Praeparata, Poria Cocos, Radix Peucedani, Platycodonis Radix, Coicis Semen, Fritillaria Thunbergii Miq, Asarum Sieboldii Miq, Gastrodiae Rhizoma, Trionycis Carapax, Flos Farfarae, Curcuma Zedoary, Fried Herba Ephedrae, Scolopendra, Herba Hedyotidis) |
| 2 | conventional CNN | **甘草** 柴胡 党参 酸枣仁 生地黄 红花 延胡索 **浙贝母** 山药 **天麻** **鳖甲** **蜈蚣** **白花蛇舌** 天山雪莲 半枝莲 (Radix Glycyrrhizae, Radix Bupleuri, Radix Codonopsis, Semen Zizyphi Spinosae, Dried Rehmannia Root, Carthamus Tinctorius, Corydalis Rhizoma, Thunberg Fritillary Bulb, Dioscoreae Rhizoma, Gastrodiae Rhizoma, Trionycis Carapax, Scolopendra, Herba Hedyotidis, Saussureae Involucratae Herba, Scutellariae Barbatae Herba) |
| 2 | conventional CNN (with data augmentation) | **甘草** **茯苓** 党参 白术 山药 **天麻** **鳖甲** **蜈蚣** **白花蛇舌** 天山雪莲 半枝莲 (Radix Glycyrrhizae, Poria Cocos, Radix Codonopsis, Macrocephalae Rhizoma, Dioscoreae Rhizoma, Gastrodiae Rhizoma, Trionycis Carapax, Scolopendra, Herba Hedyotidis, Saussureae Involucratae Herba, Scutellariae Barbatae Herba) |
| 2 | multi-scale CNN based on three-grained face | **甘草** **茯苓** **薏苡仁** 党参 **天麻** **鳖甲** **蜈蚣** **白花蛇舌** 天山雪莲 (Radix Glycyrrhizae, Poria Cocos, Coicis Semen, Radix Codonopsis, Gastrodiae Rhizoma, Trionycis Carapax, Scolopendra, Herba Hedyotidis, Saussureae Involucratae Herba) |
| 2 | multi-scale CNN based on three-grained face (with data augmentation) | **甘草** **茯苓** **薏苡仁** 党参 **浙贝母** **天麻** **鳖甲** **蜈蚣** **白花蛇舌** 天山雪莲 (Radix Glycyrrhizae, Poria Cocos, Coicis Semen, Radix Codonopsis, Fritillaria Thunbergii Miq, Gastrodiae Rhizoma, Trionycis Carapax, Scolopendra, Herba Hedyotidis, Saussureae Involucratae Herba) |
| 3 | real prescription | 法半夏 茯苓 前胡 桔梗 防风 白芷 款冬花 紫菀 白前 百部 炙甘草 辛夷 紫苏梗 广藿香 蜜麻黄 (Rhizoma Pinelliae Praeparata, Poria Cocos, Radix Peucedani, Platycodonis Radix, Radix Saposhnikoviae, Radix Angelicae Dahuricae, Flos Farfarae, Aster Tataricus Linn, Rhizoma Cynanchi Stauntonii, Radix Stemonae, Radix Glycyrrhizae Preparata, Flos Magnoliae, Perilla Stem, Herba Pogostemonis, Honey-fried Herba Ephedrae) |
| 3 | conventional CNN | 白芍 陈皮 **防风** 党参 枸杞子 **白芷** **炙甘草** 首乌藤 (Radix Paeoniae Alba, Tangerine Peel, Radix Saposhnikoviae, Radix Codonopsis, Fructus Lycii, Radix Angelicae Dahuricae, Radix Glycyrrhizae Preparata, Caulis Polygoni Multiflori) |
| 3 | conventional CNN (with data augmentation) | 麻黄 白芍 川芎 **防风** 荆芥穗 **白芷** 豆蔻 **炙甘草** **辛夷** **广藿香** (Herba Ephedrae, Radix Paeoniae Alba, Rhizoma Chuanxiong, Radix Saposhnikoviae, Herba Schizonepetae, Radix Angelicae Dahuricae, Fructus Amomi Rotundus, Radix Glycyrrhizae Preparata, Flos Magnoliae, Herba Pogostemonis) |
| 3 | multi-scale CNN based on three-grained face | 陈皮 **法半夏** **茯苓** **前胡** 太子参 **款冬花** **紫菀** 北沙参 **炙甘草** (Tangerine Peel, Rhizoma Pinelliae Praeparata, Poria Cocos, Radix Peucedani, Radix Pseudostellariae, Flos Farfarae, Aster Tataricus Linn, Radix Glehniae, Radix Glycyrrhizae Preparata) |
| 3 | multi-scale CNN based on three-grained face (with data augmentation) | **法半夏** **茯苓** **前胡** **桔梗** **防风** **款冬花** **紫菀** **白前** **百部** 北沙参 **炙甘草** **广藿香** 炒紫苏子 **蜜麻黄** (Rhizoma Pinelliae Praeparata, Poria Cocos, Radix Peucedani, Radix Platycodi, Radix Saposhnikoviae, Flos Farfarae, Aster Tataricus Linn, Rhizoma Cynanchi Stauntonii, Radix Stemonae, Radix Glehniae, Radix Glycyrrhizae Preparata, Herba Pogostemonis, Fried Perilla Fruit, Honey-fried Herba Ephedrae) |

In the generated prescriptions, Chinese herbal medicines in bold also appear in the real prescription.
Huiqiang Liao, Guihua Wen (✉), Yang Hu: School of Computer Science and Engineering, South China University of Technology, Guangzhou, China
Changjun Wang: Department of Traditional Chinese Medicine, Guangdong General Hospital, China
Abstract
In Traditional Chinese Medicine (TCM), facial features are an important basis for diagnosis and treatment. A TCM doctor can prescribe according to a patient’s physical indicators such as face, tongue, voice, symptoms, and pulse. Previous works analyze and generate prescriptions according to symptoms. However, to our knowledge, no research has yet mined the association between facial features and prescriptions. In this work, we use deep learning methods to mine the relationship between the patient’s face and herbal prescriptions (TCM prescriptions), and propose to construct convolutional neural networks that generate TCM prescriptions from the patient’s face image. It is a novel and challenging task. To mine features from different granularities of the face, we design a multi-scale convolutional neural network based on three-grained face, which mines the patient’s face information from the organs, local regions, and the entire face. Our experiments show that convolutional neural networks can learn relevant information from the face to prescribe, and that the multi-scale convolutional neural network based on three-grained face performs better.
Keywords:
Convolutional neural networks Face Prescription Traditional Chinese Medicine
1 Introduction
TCM (Traditional Chinese Medicine) was developed through thousands of years of empirical testing and refinement, and played an important role in health maintenance for the ancient Chinese people Cheung (2011). It is a theoretical system that gradually formed and developed through long-term medical practice. TCM is convenient, inexpensive, and has few side effects, and it is suitable for use in hospitals, even in community hospitals with poor conditions.
A TCM prescription consists of a variety of herbs and has been the main way to treat diseases for thousands of years. Over the long course of Chinese history, many prescriptions have been invented to treat diseases, and more than 100,000 have been recorded Qiu (2007). An example of a prescription from the Dictionary of Traditional Chinese Medicine Prescriptions is given in Figure 1 H. Peng (1996); Yao et al. (2018).
There are four important diagnostic methods in TCM: observing, listening, inquiry, and pulse feeling. Observing assesses the state of health or disease through objective observation of all visible signs and excretions of the whole body and its parts. Face diagnosis is a common observing method: it infers the pathological state of the internal organs from changes in facial features Yiqin (2012). Facial appearance signals information about an individual Jones (2018). The face is rich in capillaries and acts like a mirror that reflects a person’s physiological and pathological state. From the TCM point of view, the characteristics of the various regions of the face represent the health status of various internal organs. The doctor can therefore judge the patient’s physical condition by observing the patient’s facial features.
Computer aided diagnosis (CAD) based on artificial intelligence (AI) is an extremely important research field in intelligent healthcare Chen et al. (2017). According to a survey, deep learning algorithms, especially convolutional neural networks (CNNs), have been widely used in recent years for medical image processing tasks such as disease classification, lesion detection, and substructure segmentation, owing to their excellent performance in computer vision Litjens et al. (2017). From the 306 papers reviewed in that survey, it is evident that deep learning has pervaded every aspect of medical image analysis Litjens et al. (2017). End-to-end training of convolutional neural networks has become a good choice for medical image processing tasks.
However, to the best of our knowledge, no research has mined the relationship between the patient’s face and TCM prescriptions. In real TCM practice, doctors prescribe based on features of the face, tongue, pulse, voice, and symptoms. Generating TCM prescriptions from face images is of great significance for assisting doctors in diagnosis and treatment. Especially for young doctors, the generated prescriptions can serve as references: the system recommends prescriptions, and after some modification doctors can apply them in practice. Compared with prescribing from scratch, this saves treatment costs and improves the efficiency of prescribing. A large number of data samples can be used to learn the relationship between patients’ faces and TCM prescriptions. Learning how to prescribe from patients’ diagnosis data can provide a reference for TCM doctors when they observe and diagnose patients.
In this paper, we propose to use deep learning (convolutional neural networks) to generate TCM prescriptions from the patient’s face image. The main work is as follows:
- A conventional convolutional neural network is designed to encode the features of the patient’s face image and generate TCM prescriptions.
- Considering that different facial organs (eyes, nose, mouth) and regions (cheeks, chin) represent the status of different internal organs (heart, liver, spleen, lungs, and kidneys), a multi-scale convolutional neural network based on three-grained face is proposed to extract features of the facial organs, the facial regions, and the entire face, and to learn to generate TCM prescriptions.
- We conduct experiments to verify the effectiveness of convolutional neural networks for face feature encoding and prescription generation.
The rest of this paper is organized as follows. In Section 2, we discuss related work on TCM prescriptions and medical image processing. Section 3 describes the task and the methodology. Section 4 presents and analyzes the experimental results. We discuss the work in Section 5 and conclude the paper in Section 6.
2 Related Work
Deep learning in medical image processing. Deep learning and convolutional neural networks have become popular topics in medical image processing, and many research works already apply deep learning to it. In disease classification, there are studies on breast cancer image classification Bayramoglu et al. (2016); Chougrad et al. (2018), lung pattern classification Anthimopoulos et al. (2016), and Alzheimer’s disease classification Hon and Khan (2017). In the detection of lesions and diseases, there are studies on cancerous tissue recognition Stanitsas et al. (2017), cardiovascular disease detection Wang et al. (2017), and melanoma recognition Yu et al. (2017). In the segmentation of organs and substructures, there are studies on skin lesion segmentation Yuan et al. (2017), microvasculature segmentation of arterioles Kassim et al. (2017), and tumor segmentation Zhao et al. (2018). There are many other applications as well, such as studying the visual attention of patients with dementia Chaabouni et al. (2017), diagnosing the stage of cirrhosis Xu et al. (2017), and constructing brain maps Zhao et al. (2017).
TCM prescriptions. Other work has been devoted to the study of TCM prescriptions. Some studies analyzed and explored TCM prescriptions and discovered regularities in them Liu et al. (2012); Zheng et al. (2014); Xie et al. (2017); Zhang et al. (2012). Some studies used topic models to discover prescribing patterns Yao et al. (2018, 2015). Other studies address TCM medicine classification and recognition Dehan et al. (2014); Weng et al. (2017) and knowledge graphs for TCM Yu et al. (2017); Weng et al. (2017).
In practice, TCM doctors can judge the health of various internal organs by observing the patient’s face. Combining this with other characteristics, doctors give a TCM prescription based on their knowledge. Our work tries to simulate and learn this process. Using deep learning techniques, we can learn how to prescribe from a large amount of medical data. At the current stage, the medical data from which we learn are the patients’ face images and the corresponding prescriptions. This study is valuable for assisting doctors in diagnosis and treatment.
3 Methodology
3.1 Data Collection and Task Description
The dataset used in our study was collected from cooperating hospitals. After preprocessing, there are 9,653 data pairs in total. Each data pair contains a patient’s face image and the corresponding TCM prescription.
All Chinese herbal medicines are included in a unified dictionary $H = \{h_1, h_2, \dots, h_N\}$. The $i$-th element $h_i$ of $H$ represents the $i$-th Chinese herbal medicine, and there are $N$ Chinese herbal medicines in total. In our dataset, $N$ is 559. Each TCM prescription can be represented by a vector $y = (y_1, y_2, \dots, y_N)$. Each element $y_i$ of $y$ can only be 1 or 0, indicating whether the Chinese herbal medicine $h_i$ is prescribed. Each patient’s face image is represented by a pixel matrix $x$, and the size of $x$ is 224x224x3. $X$ represents all face images in the dataset and $Y$ represents all prescriptions in the dataset.
The task of this paper is to take a patient’s face image (pixel matrix $x$) as input and output the patient’s corresponding prescription $y$. The prescription $y$ is a multi-label vector, so the task is a multi-label learning problem. Multi-label learning studies the problem where each example is represented by a single instance while associated with a set of labels simultaneously Zhang and Zhou (2014).
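As a minimal sketch of this label encoding, the snippet below maps a prescription to a multi-hot vector over a herb dictionary and back. The five-herb dictionary and the helper names `encode_prescription` and `decode_prescription` are illustrative assumptions; the dictionary described above has 559 entries.

```python
# Toy herb dictionary (assumption: the real dictionary has 559 entries).
HERBS = ["甘草", "茯苓", "党参", "白术", "白芍"]
HERB_INDEX = {herb: i for i, herb in enumerate(HERBS)}


def encode_prescription(prescription, herb_index=HERB_INDEX):
    """Return a 0/1 vector with a 1 at the position of every prescribed herb."""
    label = [0] * len(herb_index)
    for herb in prescription:
        label[herb_index[herb]] = 1
    return label


def decode_prescription(label, herbs=HERBS):
    """Inverse mapping: list the herbs whose label entry is 1."""
    return [herb for herb, flag in zip(herbs, label) if flag == 1]


label = encode_prescription(["甘草", "党参", "白术"])
print(label)                       # [1, 0, 1, 1, 0]
print(decode_prescription(label))  # ['甘草', '党参', '白术']
```

Because every position is an independent yes/no decision, one image can be associated with any subset of herbs, which is what makes the task multi-label rather than multi-class.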
3.2 Construction of conventional Convolutional Neural Network
Deep convolutional neural networks are widely used in the field of image processing. They can extract latent features from the original pixel matrix with RGB color channels for use in various image tasks such as classification, detection, and segmentation. Classical convolutional neural network structures include AlexNet Krizhevsky et al. (2012), VGGNet Simonyan and Zisserman (2015), GoogLeNet Szegedy et al. (2015), ResNets He et al. (2016); Xie et al. (2017); Zagoruyko and Komodakis (2016), DenseNet Huang et al. (2017) and SENet Hu et al. (2018).
The convolutional neural network used for prescription generation is composed of several convolutional modules and fully connected layers. Each convolutional module includes a convolutional layer and a pooling layer. To extract features from the image, the convolutional layer scans the image matrix $x$ with convolutional kernels to produce a feature map $f$. A convolutional kernel is a weight matrix, denoted by $K$. With the relu Glorot et al. (2011) activation function, this operation can be abstracted as the function:

$$f = \mathrm{conv}(x; K) = \mathrm{relu}(K * x)$$

To extract the more important features and reduce the computational complexity, a max-pooling layer downsamples the feature map $f$, which can be represented by the following function (the parameters of the max-pooling layer are omitted):

$$p = \mathrm{maxpool}(f)$$

Three consecutive convolution and pooling operations can be abstracted into the following function:

$$p_3 = \mathrm{maxpool}(\mathrm{conv}(\mathrm{maxpool}(\mathrm{conv}(\mathrm{maxpool}(\mathrm{conv}(x; K_1)); K_2)); K_3))$$

To encode features, several fully connected layers are usually attached after the convolutional modules. The weight parameters of a fully connected layer are denoted by $W_f$. The operation of a fully connected layer (with the relu activation function, applied to the flattened feature map $p_3$) can be abstracted as the following function:

$$h = \mathrm{fc}(p_3; W_f) = \mathrm{relu}(W_f\, p_3)$$

The last layer is the output layer, a fully connected layer with the sigmoid activation function, whose weights are denoted by $W_o$. It outputs the probability that each Chinese herbal medicine is prescribed, which can be abstracted as the following function:

$$\hat{y} = \mathrm{sigmoid}(W_o\, h)$$

$\theta = \{K_1, K_2, K_3, W_f, W_o\}$ is the set of all parameters; the convolutional kernels of the individual convolution operations described above are different.
The loss function of the convolutional neural network is the average of multiple cross-entropy terms. Each cross-entropy term measures the difference between the predicted probability $\hat{y}_{m,i}$ that the $i$-th Chinese herbal medicine is prescribed for the $m$-th sample and the actual output $y_{m,i}$. The network minimizes the loss function by optimizing all parameters with stochastic gradient descent Bottou (2012), which can be abstracted as the following functions ($M$ is the size of the dataset and $N$ is the number of Chinese herbal medicines):

$$L(\theta) = -\frac{1}{M}\sum_{m=1}^{M}\frac{1}{N}\sum_{i=1}^{N}\left[y_{m,i}\log\hat{y}_{m,i} + (1 - y_{m,i})\log(1 - \hat{y}_{m,i})\right]$$

$$\theta^{*} = \arg\min_{\theta} L(\theta)$$

The structure of the convolutional neural network is shown in Figure 2. It contains three convolution modules for extracting features, a fully connected layer for encoding features, and the final output layer. All convolution kernels are 3x3. The input of the network is the face image matrix $x$ of the patient, of size 224x224x3. The number of units in the output layer is $N$, the size of the Chinese herbal medicine dictionary $H$, and each unit represents the probability that a certain Chinese herbal medicine is prescribed. The real output $y$ also has $N$ dimensions, and each value is 0 or 1, indicating whether the herb is prescribed. The loss is the average cross-entropy loss calculated from the network output $\hat{y}$ and the real output $y$. $\hat{y}$ contains the probabilities of being prescribed for all Chinese herbal medicines in dictionary $H$. Finally, according to dictionary $H$, the final prescription is obtained from $\hat{y}$ by selecting the herbs whose probability is above a threshold $t$.
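The two steps at the end of this pipeline, the averaged cross-entropy loss and the thresholding of the output probabilities, can be sketched in a few lines of plain Python. The helper names, the toy four-herb dictionary, the probability values, and the threshold value 0.5 are all assumptions made for illustration; the text does not state the value of the threshold.

```python
import math


def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Average binary cross-entropy over the herbs of one prescription."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


def to_prescription(y_prob, herbs, t=0.5):
    """Keep the herbs whose predicted probability is above the threshold t."""
    return [herb for herb, p in zip(herbs, y_prob) if p > t]


herbs = ["甘草", "茯苓", "党参", "白术"]  # toy dictionary (assumption)
y_true = [1, 1, 0, 1]                     # real prescription as a 0/1 vector
y_prob = [0.9, 0.6, 0.2, 0.4]             # hypothetical sigmoid outputs

print(binary_cross_entropy(y_true, y_prob))
print(to_prescription(y_prob, herbs, t=0.5))  # ['甘草', '茯苓']
```

Lowering the threshold makes the generated prescription longer (higher recall, lower precision), so the threshold is a natural knob to tune on held-out data.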
3.3 Construction of multi-scale Convolutional Neural Network based on three-grained face
Different regions of a face image have different local statistics Taigman et al. (2014). Taigman et al. (2014) deal with this problem using locally connected layers, which are like convolutional layers except that every location in the feature map learns a different set of filters. However, locally connected layers greatly increase the number of model parameters, and only a large amount of data can support this approach. Instead, we extract the features of different facial regions with different small convolutional networks.
According to TCM, the characteristics of the various regions of the face represent the health of various internal organs of the human body. To encode the features of each region of the face more efficiently, this paper proposes a multi-scale convolutional neural network based on three-grained face. “Three-grained” refers to the organ block, the local region block, and the face block. Each block extracts characteristics of the face at a different granularity. The organ block includes the left eye, right eye, nose, and mouth. The local region block includes the left cheek, right cheek, and chin. The face block is the entire face. The network is expected to extract and encode more effective facial features from the different granularities, thereby improving the effectiveness of prescription generation.
In the data preprocessing stage, the patient’s face is segmented to obtain images of the different face regions. An example of the region images obtained by cutting a face Jain and Learned-Miller (2010) is given in Figure 3. The region images are then scaled down. The organ block images include a left-eye image $x_{le}$, a right-eye image $x_{re}$, a nose image $x_{n}$, and a mouth image $x_{m}$, each of size 56x56x3. The local region block images include a left cheek image $x_{lc}$, a right cheek image $x_{rc}$, and a chin image $x_{c}$, each of size 112x112x3. The face block is the entire face $x$, and the size of the face image is 224x224x3.
3.3.1 Extracting features of facial organs
First, features are extracted from the organ block. Each of the four organ block images (left eye $x_{le}$, right eye $x_{re}$, nose $x_{n}$, mouth $x_{m}$) is convolved with its own kernel, and the four resulting feature maps are concatenated. The operation can be abstracted as the following functions:

$$f_j = \mathrm{conv}(x_j; K_j), \quad j \in \{le, re, n, m\}$$

$$c_1 = \mathrm{concat}(f_{le}, f_{re}, f_{n}, f_{m})$$

In computer vision applications there is often not enough data, and models easily overfit. Dropout Srivastava et al. (2014) is commonly used to prevent overfitting. Dropout randomly discards neural units during the training phase. This prevents units from co-adapting too much and forces the network to learn more robust features. Each training step effectively samples a thinned network, and the resulting collection of thinned networks has an effect similar to model ensembling Srivastava et al. (2014). After dropout is applied to the feature map $c_1$, a convolution operation is performed again to obtain a feature map $g_1$, which contains the features of the organ block. These operations can be abstracted as the following function:

$$g_1 = \mathrm{conv}(\mathrm{dropout}(c_1); K_{g_1})$$
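To make the dropout step concrete, here is a minimal sketch of "inverted" dropout on a flat list of activations. The rescaling by 1 / (1 - rate) during training is the convention used by common deep learning frameworks and is an assumption here (the original dropout formulation rescales at test time instead); the rate of 0.4 is the value listed among the model parameters.

```python
import random


def dropout(values, rate=0.4, training=True, rng=random):
    """Inverted dropout: zero each unit with probability `rate` during training
    and rescale the survivors by 1 / (1 - rate); identity at inference time."""
    if not training:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


rng = random.Random(0)
features = [1.0] * 10
print(dropout(features, rate=0.4, rng=rng))  # some entries zeroed, the rest scaled up
print(dropout(features, training=False))     # unchanged at inference time
```

The rescaling keeps the expected activation the same in training and inference, so no extra correction is needed when the trained network is used to generate prescriptions.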
3.3.2 Extracting features of facial local regions
Second, features are extracted from the local region block. Each of the three local region block images (left cheek $x_{lc}$, right cheek $x_{rc}$, chin $x_{c}$) passes through convolution and max-pooling, and the three resulting feature maps are concatenated together with the feature map $g_1$ extracted from the organ block. The operation can be abstracted as the following functions:

$$p_j = \mathrm{maxpool}(\mathrm{conv}(x_j; K_j)), \quad j \in \{lc, rc, c\}$$

$$c_2 = \mathrm{concat}(p_{lc}, p_{rc}, p_{c}, g_1)$$

After dropout is applied to the feature map $c_2$, convolution and max-pooling are performed to obtain a feature map $g_2$, which fuses the features of the organ block and the local region block. The operation can be abstracted as the following function:

$$g_2 = \mathrm{maxpool}(\mathrm{conv}(\mathrm{dropout}(c_2); K_{g_2}))$$
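3.3.3 Extracting features of the entire face
Finally, features are extracted from the face block. After several convolution and max-pooling operations on the entire face $x$, the face block feature map $p_x$ is concatenated with the feature map $g_2$, which fuses the organ block and local region block features. The operation can be abstracted as the following functions:

$$p_x = \mathrm{maxpool}(\mathrm{conv}(\cdots\,\mathrm{maxpool}(\mathrm{conv}(x; K_{x,1}))\,\cdots; K_{x,n}))$$

$$c_3 = \mathrm{concat}(p_x, g_2)$$

After dropout is applied to the feature map $c_3$, two fully connected layers encode it into the final feature $g_3$, which fuses the features of the organ block, the region block, and the face block. The operation can be abstracted as the following function, where $\mathrm{fc}(\cdot\,; W)$ denotes a fully connected layer with relu activation and $W_{f_1}$ and $W_{f_2}$ are the weights of the two fully connected layers.

$$g_3 = \mathrm{fc}(\mathrm{fc}(\mathrm{dropout}(c_3); W_{f_1}); W_{f_2})$$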
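3.3.4 Training based on three-grained face features (organ, local region, entire face)
The convolutional neural network has three output layers. The first output predicts from the feature map $g_1$, which contains the features of the organ block. The second output predicts from the feature map $g_2$, which contains the features of the organ block and the region block. The third output predicts from the final feature $g_3$, which contains the features of the organ block, the region block, and the face block. The operation can be abstracted as the following functions, where $W_{o_1}$, $W_{o_2}$, and $W_{o_3}$ are the weights of the output layers (feature maps are flattened before the output layer).

$$\hat{y}^{(1)} = \mathrm{sigmoid}(W_{o_1}\, g_1)$$

$$\hat{y}^{(2)} = \mathrm{sigmoid}(W_{o_2}\, g_2)$$

$$\hat{y}^{(3)} = \mathrm{sigmoid}(W_{o_3}\, g_3)$$

$\hat{y}^{(1)}$, $\hat{y}^{(2)}$, and $\hat{y}^{(3)}$ denote the probabilities of being prescribed for all Chinese herbal medicines in the dictionary $H$. Among them, $\hat{y}^{(3)}$ is the main output of the neural network and is used for the final generation decision. $\hat{y}^{(1)}$ and $\hat{y}^{(2)}$ are auxiliary outputs that assist the training of the entire network. The final loss is the sum of three losses, calculated from $\hat{y}^{(1)}$, $\hat{y}^{(2)}$, and $\hat{y}^{(3)}$ together with the real output $y$. We use stochastic gradient descent to optimize the parameters so that the final loss is minimized. The loss functions are as follows, where $\theta$ denotes the set of all parameters of the neural network and $N$ is the dimension of each real prescription $y$.

$$L_1(\theta) = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log\hat{y}^{(1)}_i + (1 - y_i)\log(1 - \hat{y}^{(1)}_i)\right]$$

$$L_2(\theta) = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log\hat{y}^{(2)}_i + (1 - y_i)\log(1 - \hat{y}^{(2)}_i)\right]$$

$$L_3(\theta) = -\frac{1}{N}\sum_{i=1}^{N}\left[y_i\log\hat{y}^{(3)}_i + (1 - y_i)\log(1 - \hat{y}^{(3)}_i)\right]$$

$$L(\theta) = L_1(\theta) + L_2(\theta) + L_3(\theta)$$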
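The summed loss over the main output and the two auxiliary outputs can be sketched as below. The function names and the probability values are hypothetical; the equal weighting of the three terms follows the text, which describes the final loss as the addition of the three losses.

```python
import math


def bce(y_true, y_prob, eps=1e-7):
    """Average binary cross-entropy between a 0/1 vector and probabilities."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


def three_grained_loss(y_true, organ_prob, region_prob, face_prob):
    """Sum of the organ, organ+region and organ+region+face output losses."""
    return (bce(y_true, organ_prob)
            + bce(y_true, region_prob)
            + bce(y_true, face_prob))


y_true = [1, 0, 1]
organ_prob = [0.6, 0.5, 0.5]    # auxiliary output from organ features
region_prob = [0.7, 0.4, 0.6]   # auxiliary output from organ + region features
face_prob = [0.9, 0.1, 0.8]     # main output from all three granularities
print(three_grained_loss(y_true, organ_prob, region_prob, face_prob))
```

Because the auxiliary losses are attached to intermediate features, gradients reach the organ and region branches directly instead of only through the final fully connected layers.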
The structure of the multi-scale convolutional neural network based on three-grained face is shown in Figure 4. The input organ block images are 56x56x3, the input region block images are 112x112x3, and the input face block image is 224x224x3. All convolution kernels are 3x3.
The network is divided into three parts. The first part extracts the features of the organ block to obtain the output $\hat{y}^{(1)}$. The second part extracts the features of the region block, merges them with the features of the organ block, and continues feature extraction to obtain the output $\hat{y}^{(2)}$. The third part extracts the features of the face block, merges them with the features of the organ block and region block, and continues feature extraction to obtain the output $\hat{y}^{(3)}$. The three outputs denote the probabilities of being prescribed for all Chinese herbal medicines in the dictionary $H$. The loss used to train the entire network is the sum of three losses, calculated from $\hat{y}^{(1)}$, $\hat{y}^{(2)}$, $\hat{y}^{(3)}$ and the real output $y$. Finally, the generated prescription is obtained from the main output $\hat{y}^{(3)}$ by applying the probability threshold $t$.
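One way to see why input sizes of 56, 112 and 224 fit together is to track feature-map shapes through the three branches. The sketch below does only shape arithmetic. It assumes 'same' padding for the 3x3 convolutions, 2x2 max-pooling with stride 2, channel-wise concatenation, and a particular assignment of the 16, 32 and 64 kernels to layers; none of these details are stated in the text, so the channel counts in particular are illustrative.

```python
def conv_same(shape, filters):
    """3x3 convolution with 'same' padding: spatial size kept, channels replaced."""
    h, w, _ = shape
    return (h, w, filters)


def max_pool(shape, size=2):
    """2x2 max-pooling with stride 2: halves height and width."""
    h, w, c = shape
    return (h // size, w // size, c)


def concat(*shapes):
    """Channel-wise concatenation; spatial sizes must agree."""
    h, w, _ = shapes[0]
    assert all(s[:2] == (h, w) for s in shapes), "spatial sizes differ"
    return (h, w, sum(s[2] for s in shapes))


# Organ block: four 56x56x3 images, one convolution each, then concat + conv.
organ = concat(*[conv_same((56, 56, 3), 16) for _ in range(4)])
g1 = conv_same(organ, 16)

# Local region block: three 112x112x3 images, conv + pool, concat with g1.
regions = [max_pool(conv_same((112, 112, 3), 16)) for _ in range(3)]
g2 = max_pool(conv_same(concat(*regions, g1), 32))

# Face block: 224x224x3 image, three conv + pool modules, concat with g2.
face = (224, 224, 3)
for filters in (16, 32, 64):
    face = max_pool(conv_same(face, filters))
fused = concat(face, g2)

print(g1, g2, fused)  # (56, 56, 16) (28, 28, 32) (28, 28, 96)
```

Under these assumptions each coarser branch is pooled exactly often enough to match the spatial size of the finer branch it is merged with (56 after one pooling of 112, 28 after three poolings of 224), which is what allows the concatenations to line up.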
3.4 Data augmentation
In the real world, patients’ medical data are precious and difficult to collect. Therefore, the data collected on patients’ faces and prescriptions are very limited. With such a limited dataset the model easily overfits, which is one reason for not choosing an overly complex network. Data augmentation is an effective way to cope with insufficient data. It can reduce overfitting and improve the model’s predictive performance.
To make full use of the limited data, data augmentation is performed. Data augmentation randomly selects some of the original face images, applies random transformations (such as rotation and zoom) to them, and saves each result as a new face image. The prescription of the original patient is used as the label of the new face image. Data augmentation increases the size and diversity of the dataset. The original dataset contains 9,653 samples. After data augmentation, the dataset size increases to 18,463. Some parameters used in data augmentation are shown in Table 1.
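The sketch below illustrates how one random transformation could be drawn from the augmentation parameters (rotation 25, shifts 0.05, zoom 0.2, horizontal flip). The parameter names happen to match the arguments of Keras's `ImageDataGenerator`, but whether that tool was used is an assumption; the sampler itself, the helper name `sample_transform`, and the output keys are illustrative, and applying the transform to pixels is left out.

```python
import random

# Augmentation ranges as listed for data augmentation in this paper.
AUG_PARAMS = {
    "rotation_range": 25,       # degrees
    "width_shift_range": 0.05,  # fraction of image width
    "height_shift_range": 0.05, # fraction of image height
    "zoom_range": 0.2,          # zoom factor in [1 - 0.2, 1 + 0.2]
    "horizontal_flip": True,
}


def sample_transform(params, width=224, height=224, rng=random):
    """Draw one random transform within the configured ranges."""
    r = params["rotation_range"]
    z = params["zoom_range"]
    return {
        "rotation_deg": rng.uniform(-r, r),
        "shift_x_px": rng.uniform(-1, 1) * params["width_shift_range"] * width,
        "shift_y_px": rng.uniform(-1, 1) * params["height_shift_range"] * height,
        "zoom": rng.uniform(1 - z, 1 + z),
        "flip": params["horizontal_flip"] and rng.random() < 0.5,
    }


print(sample_transform(AUG_PARAMS, rng=random.Random(42)))
```

For a 224x224 image, a shift range of 0.05 corresponds to at most about 11 pixels in either direction, so the face stays almost fully inside the frame after augmentation.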
4 Experiment
4.1 Dataset
The “face image - TCM prescription” dataset was collected from several cooperating hospitals. Because of the limited collection conditions, the raw data contain some noise. For example, the same medicine may appear under different names. After preprocessing, we obtain the experimental dataset, denoted $D$, which contains 9653 samples. After data augmentation, the size of the dataset increases to 18463, and the augmented dataset is denoted $D_{aug}$. To train the multi-scale convolutional neural network based on three-grained face, the face images are segmented into different face areas: eyes, nose, mouth, cheeks, and chin. The dataset is described in detail in Table 2 and Table 3.
To make the experimental results more reliable and convincing, we train and evaluate the models with a 5-fold cross-validation procedure: training is repeated five times, and each time 500 samples are held out as the test set. (Conventional 5-fold cross-validation divides the data into five equal parts and uses each part in turn as the test set; here only 500 samples are held out each time because the dataset is limited.) The five test sets of 500 samples do not overlap. The average of the five evaluation results is reported as the final result.
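The evaluation protocol described here (five rounds, 500 held-out samples per round, no overlap between test sets) can be sketched as follows. The function name, the shuffling, and the seed are assumptions; the sketch covers only the 9,653 original samples and does not model how augmented images are assigned to the training side.

```python
import random


def disjoint_test_splits(n_samples, n_rounds=5, test_size=500, seed=0):
    """Yield (train_idx, test_idx) pairs whose test sets do not overlap."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    for k in range(n_rounds):
        test = indices[k * test_size:(k + 1) * test_size]
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test


splits = list(disjoint_test_splits(9653))
print([(len(train), len(test)) for train, test in splits])
# [(9153, 500), (9153, 500), (9153, 500), (9153, 500), (9153, 500)]
```

Compared with standard 5-fold cross-validation, each round keeps about 95% of the data for training instead of 80%, at the cost of evaluating on only 2,500 distinct samples in total.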
4.2 Experimental setup
Based on the conventional convolutional neural network, the multi-scale convolutional neural network based on three-grained face, and data augmentation, five models are run for TCM prescription generation. They are briefly described as follows:
Random forest (baseline): A random forest classifier (Breiman 2001) is used to generate TCM prescriptions. The features are the face image matrices and the labels are multi-label vectors representing the TCM prescriptions.
Conventional CNN (original dataset): A CNN is constructed as described in section 3.2 and trained on face images and TCM prescriptions to obtain a model for generating TCM prescriptions. The experimental dataset used is the original dataset, without data augmentation.
Conventional CNN (augmented dataset): The method is the same as the conventional CNN above, but the experimental dataset used is the augmented dataset.
Multi-scale CNN based on three-grained face (original dataset): A CNN is constructed as described in section 3.3 and trained on images of different face regions and TCM prescriptions to obtain a model for generating TCM prescriptions. The experimental dataset used is the original dataset, without data augmentation.
Multi-scale CNN based on three-grained face (augmented dataset): The method is the same as the multi-scale CNN based on three-grained face above, but the experimental dataset used is the augmented dataset.
The structure and some parameters of the conventional CNN and the multi-scale CNN based on three-grained face are described in section 3.2 and section 3.3. More specific parameters are shown in Table 4. The optimization algorithm is SGD (stochastic gradient descent), with a learning rate decay of 1e-6 and a momentum of 0.9.
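To make the optimizer settings concrete, here is a minimal sketch of one SGD update with momentum 0.9 and learning rate decay 1e-6. Two things are assumptions: the base learning rate `base_lr` (the paper's value is given in Table 4, which is not reproduced here) and the time-based decay rule `lr = base_lr / (1 + decay * step)`, which is one common interpretation of a "decay" parameter and may differ from what the authors used.

```python
def sgd_momentum_step(w, grad, velocity, step,
                      base_lr=0.01, momentum=0.9, decay=1e-6):
    """One SGD update with momentum and (assumed) time-based learning rate decay."""
    lr = base_lr / (1.0 + decay * step)        # assumed decay schedule
    velocity = momentum * velocity - lr * grad  # accumulate a running direction
    return w + velocity, velocity


# Toy usage: minimise f(w) = w**2, whose gradient is 2 * w.
w, v = 1.0, 0.0
for step in range(500):
    w, v = sgd_momentum_step(w, 2.0 * w, v, step)
```

With a decay as small as 1e-6, the learning rate changes very little over a few thousand updates, so in this setting the momentum term has the larger effect on training.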
4.3 Evaluation metrics
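To measure the similarity between the generated TCM prescription and the actual TCM prescription, the indicators precision, recall, and f-score are defined by the following formulas. Let $c_i$ denote the number of Chinese herbal medicines appearing in both the i-th generated prescription and the i-th real prescription, $g_i$ the number of Chinese herbal medicines appearing in the i-th generated prescription, and $r_i$ the number of Chinese herbal medicines appearing in the i-th real prescription. $\mathrm{precision}_i$ measures how precise the Chinese herbal medicines in the generated prescription are, and $\mathrm{recall}_i$ measures how complete they are. $\mathrm{f1}_i$ (f1_score) is the harmonic mean of $\mathrm{precision}_i$ and $\mathrm{recall}_i$, balancing these two indicators.

$$\mathrm{precision}_i = \frac{c_i}{g_i}$$

$$\mathrm{recall}_i = \frac{c_i}{r_i}$$

$$\mathrm{f1}_i = \frac{2 \cdot \mathrm{precision}_i \cdot \mathrm{recall}_i}{\mathrm{precision}_i + \mathrm{recall}_i}$$

The indicators are calculated for each sample generated by the model, and then averaged to obtain the indicators used to evaluate the quality of the model:

$$\mathrm{precision} = \frac{1}{m} \sum_{i=1}^{m} \mathrm{precision}_i$$

$$\mathrm{recall} = \frac{1}{m} \sum_{i=1}^{m} \mathrm{recall}_i$$

$$\mathrm{f1} = \frac{1}{m} \sum_{i=1}^{m} \mathrm{f1}_i$$

where m is the size of the dataset. The test set is used to evaluate the model, and its size is 500.
For each sample i, $\mathrm{f1}_i$ is the harmonic mean of $\mathrm{precision}_i$ and $\mathrm{recall}_i$. Note, however, that $\mathrm{precision}$, $\mathrm{recall}$, and $\mathrm{f1}$ are averages, so $\mathrm{f1}$ is not the harmonic mean of $\mathrm{precision}$ and $\mathrm{recall}$.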
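The per-sample indicators and their averages can be sketched as follows, treating each prescription as a set of herb names. The function names and the convention of returning 0 when a denominator is zero are assumptions made for this illustration. The toy data at the end shows why the averaged f-score differs from the harmonic mean of the averaged precision and recall.

```python
def prescription_scores(generated, real):
    """Precision, recall and f1 for one generated/real prescription pair."""
    generated, real = set(generated), set(real)
    common = len(generated & real)  # herbs appearing in both prescriptions
    precision = common / len(generated) if generated else 0.0
    recall = common / len(real) if real else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def average_scores(generated_list, real_list):
    """Average the per-sample indicators over all m samples."""
    scores = [prescription_scores(g, r)
              for g, r in zip(generated_list, real_list)]
    m = len(scores)
    return tuple(sum(s[k] for s in scores) / m for k in range(3))


# Toy data with placeholder herb names.
generated = [["a", "b"], ["a"]]
real = [["a"], ["a", "b", "c", "d"]]
precision, recall, f1 = average_scores(generated, real)
harmonic_of_averages = 2 * precision * recall / (precision + recall)
```

Here the averaged f1 is about 0.533, while the harmonic mean of the averaged precision (0.75) and recall (0.625) is about 0.682.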
4.4 Results and Analysis
4.4.1 Training process
To prevent overfitting, the model uses data augmentation and dropout. In addition, the “EarlyStopping” strategy is used in the experiment. During training, a fraction of the training set (0.1 in the experiment) is split off as a validation set, which does not participate in training and is used only to monitor it. The loss of the model on the validation set is observed during training; once the validation loss stops decreasing, training continues for a fixed number of further iterations (10 in the experiment) and is then stopped. This prevents the model from overfitting the training set and leads to better predictions on the test set.
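The stopping rule can be sketched in a few lines. This is an illustrative reimplementation rather than the authors' code: `early_stopping_epoch` is a hypothetical helper that takes a list of validation losses recorded once per iteration and returns the index at which training would stop, using a patience of 10.

```python
def early_stopping_epoch(val_losses, patience=10):
    """Return the index at which training stops under early stopping."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss   # validation loss improved: reset the counter
            wait = 0
        else:
            wait += 1     # no improvement in this iteration
            if wait >= patience:
                return epoch
    return len(val_losses) - 1  # ran to the end without triggering


# Validation loss improves for three iterations, then plateaus.
losses = [1.0, 0.9, 0.8] + [0.85] * 20
stop_at = early_stopping_epoch(losses)
```

In this example the best loss occurs at index 2, and training stops 10 iterations later.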
Consider one of the training runs in the 5-fold cross-validation. The training and validation losses during training are shown in Figure 5 and Figure 6. Although the maximum number of epochs is set to 300 (to ensure a sufficient number of iterations), training usually stops after about 30-70 iterations, since later iterations would overfit the training set. With data augmentation, the relative gap between the training loss and the validation loss is smaller for the multi-scale CNN based on three-grained face than for the conventional CNN, which indicates that the multi-scale CNN based on three-grained face generalizes relatively well.
4.4.2 Influence of threshold parameter
The final output of the neural network is a series of probability values. The output layer has 559 neurons, representing 559 Chinese herbal medicines, so 559 corresponding probability values are obtained. The final prescription is predicted using a threshold value t: a Chinese herbal medicine is prescribed if its probability is greater than t.
A common choice for the threshold is 0.5. Alternatively, when all the unseen instances in the test set are available, the threshold can be set to minimize the difference in a chosen multi-label indicator between the training set and the test set (Zhang and Zhou 2014). As shown in Figure 6 and Figure 7, different thresholds lead to different final evaluation results (the results in the figures are averages over the 5-fold cross-validation). With a larger threshold, precision is higher, because the model tries to make the generated prescription as precise as possible and prefers to give fewer medicines to avoid errors. With a smaller threshold, recall is higher, because the model tries to make the generated prescription as complete as possible at the expense of some precision. The “f1_score” is the harmonic mean of precision and recall, balancing accuracy and completeness. Note that the f1_score reported in the experimental data is not the harmonic mean of the reported precision and recall, because it is an average over samples. We choose 0.25 as the final threshold, because at this value the f1_score is relatively high and the difference between precision and recall is small, which ensures that precision and recall are both high.
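The thresholding step can be sketched as below. The herb names are taken from the examples discussed in the paper, but the probability values are made up purely for illustration, and `predict_prescription` is a hypothetical helper name.

```python
def predict_prescription(probabilities, herb_names, t=0.25):
    """Keep every herb whose predicted probability is greater than t."""
    return [name for name, p in zip(herb_names, probabilities) if p > t]


herbs = ["Radix Glycyrrhizae", "Poria Cocos", "Perilla Stem"]
probs = [0.60, 0.30, 0.10]  # illustrative values, not model output

at_025 = predict_prescription(probs, herbs, t=0.25)
at_050 = predict_prescription(probs, herbs, t=0.5)
```

Raising the threshold from 0.25 to 0.5 drops Poria Cocos from the prediction, which is the mechanism behind the trade-off described above: fewer herbs are prescribed, so precision tends to rise and recall tends to fall.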
4.4.3 Performance Comparison
The experimental results of the five models are shown in Table 5. To make the results more accurate and convincing, each evaluation result is the average of the 5 results obtained by 5-fold cross-validation. The values after “±” indicate the standard deviation of the 5 results.
Random forest is an ensemble learning technique that generally performs well. However, the experimental results show that the other four models all outperform the random forest baseline, indicating that the convolutional neural network is better suited to this task than the random forest. A neural network can extract and represent useful features from large and complex data. This task involves a large number of raw image features that need to be extracted and represented, so building the model with a convolutional neural network designed for image processing is the better choice.
The conventional CNN trained with data augmentation performs better than the conventional CNN trained without it, and the multi-scale CNN based on three-grained face trained with data augmentation performs better than the same model trained without it. The models perform better with data augmentation because it increases the size and diversity of the data, allowing the convolutional neural network to learn more during training and reducing overfitting.
On both the original and the augmented dataset, the multi-scale CNN based on three-grained face performs better than the conventional CNN. A reasonable explanation is that the multi-scale CNN based on three-grained face extracts features at different granularities (organs, local regions, and the entire face), so it can extract and use local and global features more effectively.
Table 7 shows the actual predicted results for three samples. For each sample, the patient’s face image and the corresponding prescriptions are shown. The predicted prescriptions have a certain similarity to the actual prescriptions, which shows that the model has indeed learned useful information. Among the four models, the results of the multi-scale CNN based on three-grained face (we omit “multi-scale” in Table 7 only for neat alignment) are the most precise and complete. For common Chinese herbal medicines, such as Radix Glycyrrhizae and Poria Cocos, the model’s predictions are more accurate. For some uncommon Chinese herbal medicines, such as Perilla Stem and Curcuma Zedoary, the model cannot predict accurately. A reasonable explanation is that common Chinese herbal medicines appear frequently in the training samples, so the model can learn useful distinguishing features from a large amount of training data. Uncommon Chinese herbal medicines, by contrast, are used only occasionally by a few patients. With so little data, the model has difficulty learning and cannot find distinguishing features.
4.4.4 Effect of different image size
The input size of the conventional CNN is 224x224. The “multi-scale” in multi-scale CNN based on three-grained face refers to the three input scales 56x56, 112x112, and 224x224, but the actual input size is still 224x224, the size of the face image. We simply input a 224x224 face image, which is segmented into 112x112 local region block images and 56x56 organ block images during preprocessing. We therefore say that the input size of the CNNs used above is 224x224. In practice, however, the size of a patient’s face image is not fixed.
To verify that the CNN models can adapt to images of different sizes, we retrain the networks with different input sizes and obtain the experimental results shown in Table 6. The evaluation results (precision, recall, f1-score) are calculated by 5-fold cross-validation. For the multi-scale CNN based on three-grained face (we omit “multi-scale” in Table 6 only for neat alignment), the image size refers to the size of the face; the local region block images are half the size of the face, and the organ block images are half the size of the local region block images. The “average” in Table 6 is the average of the results over the different image sizes (32, 56, 84, 112, 168, 224).
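The relation between the three scales can be written down directly. The sketch below only computes block sizes for the face sizes listed in Table 6; the name `three_grained_sizes` and the use of integer halving are assumptions, and the actual segmentation of face regions is described elsewhere in the paper.

```python
def three_grained_sizes(face_size):
    """Return (face, local region, organ) block sizes for one input size."""
    local_size = face_size // 2   # local region blocks: half the face size
    organ_size = local_size // 2  # organ blocks: half the local region size
    return face_size, local_size, organ_size


for size in (32, 56, 84, 112, 168, 224):
    print(three_grained_sizes(size))
```

For a 224x224 face this gives 112x112 local region blocks and 56x56 organ blocks, matching the sizes stated above. For a 32x32 face the organ blocks are only 8x8.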
Table 6 shows that the models obtain similar results for different input image sizes, which indicates that the models are robust. On average, the multi-scale CNN based on three-grained face performs better than the conventional CNN. For the smaller image sizes (32, 56, 84), the multi-scale CNN based on three-grained face is slightly worse than the conventional CNN, but for the larger image sizes (112, 168, 224) it is better than the conventional CNN on all three evaluation indicators. We conjecture that this is because the multi-scale CNN based on three-grained face needs to capture features at three granularities, and fine-grained features are difficult to mine when the image is small. In practice, however, patients’ images are unlikely to be that small.
5 Discussion
Our results show that convolutional neural networks are capable of mining prescription information from patients’ face images to generate prescriptions, and that the multi-scale convolutional neural network based on three-grained face generates prescriptions that are closer to the real prescriptions, as shown by the actual prediction results in Table 7 and the evaluation results in Table 5 and Table 6. With such a prescription generation system, doctors can obtain a recommended prescription, modify it, and then apply it in actual treatment.
Generating TCM prescriptions from face images using deep learning provides a possible result. Although the predicted result is not a definitive conclusion, it offers a choice, an opinion for reference, which greatly reduces guesswork. In practice, different TCM doctors do not always give the same prescription to a patient, and there may be multiple suitable prescriptions for the same patient. System-generated prescriptions may also inspire doctors to develop new useful prescriptions.
6 Conclusion
In this paper, we propose using a convolutional neural network to generate TCM prescriptions from the patient’s face image. To extract and use the features of the patient’s face more fully and effectively, we propose a multi-scale convolutional neural network based on three-grained face and compare it with the conventional convolutional neural network. In addition, we use data augmentation to increase the size and diversity of the data and improve performance.
To the best of our knowledge, little work has been done on generating TCM prescriptions. Chinese herbal medicine is a medical asset accumulated through the long-term practice of the ancient Chinese people. It is extremely rich and precious. Fully mining and learning information from patients’ prescription data using deep learning techniques is therefore of great significance.
In practice, when treating patients, TCM doctors integrate multiple features (face, tongue, pulse, voice, symptoms) with their own experience to give solutions, which overcomes the limitations of using face images alone. Because of the limited data, in this preliminary work we consider only the patient’s face image for generating TCM prescriptions. In future work, we plan to collect more patient data, and more types of patient data.
Acknowledgements.
This study was supported by the China National Science Foundation (60973083, 61273363), Science and Technology Planning Project of Guangdong Province (2014A010103009, 2015A020217002), and Guangzhou Science and Technology Planning Project (201504291154480, 201604020179, 201803010088).
References
- Cheung F (2011) TCM: Made in China. Nature 480:S82
- Qiu J (2007) Traditional medicine: A culture in the balance. Nature 448:126
- Peng H (1996) Dictionary of Traditional Chinese Medicine Prescriptions. People Health Press: Beijing, China
- Yao L, Zhang Y, Wei B, Zhang W, Jin Z (2018) A Topic Modeling Approach for Traditional Chinese Medicine Prescriptions. IEEE Transactions on Knowledge and Data Engineering 30(6):1007–1021
- Yiqin W (2012) Objective Application of TCM Inspection of Face And Tongue. Chinese Archives of Traditional Chinese Medicine 30(2):349–352
- Jones AL (2018) The influence of shape and colour cue classes on facial health perception. Evolution and Human Behavior 39(1):19–29
- Chen M, Shi X, Zhang Y, Wu D, Guizani M (2017) Deep Features Learning for Medical Image Analysis with Convolutional Autoencoder Neural Network. IEEE Transactions on Big Data p 1
- Litjens G, Kooi T, Bejnordi BE, Setio AAA, Ciompi F, Ghafoorian M, van der Laak JAWM, van Ginneken B, Sánchez CI (2017) A survey on deep learning in medical image analysis. Medical Image Analysis 42:60–88
