LightNet: A Novel Lightweight Convolutional Network for Brain Tumor Segmentation in Healthcare

Dongyuan Wu, Junyi Tao, Zhen Qin, Member, IEEE, Rao Asad Mumtaz, Jing Qin, Linfang Yu, and Jane Courtney

Diagnosis, treatment planning, surveillance, and the monitoring of clinical trials for brain diseases all benefit greatly from neuroimaging-based tumor segmentation. Recently, Convolutional Neural Networks (CNNs) have demonstrated promising results in enhancing the efficiency of image-based brain tumor segmentation. Most current work on CNNs, however, is devoted to creating increasingly complicated convolution modules to improve performance, which in turn raises the computing cost of the model. This work proposes a simple and effective feed-forward CNN, LightNet (Light Network). Built on a multi-path and multi-level structure, it replaces traditional convolutional methods with light operations, which reduces network parameters and redundant feature maps. In the up-sampling stage, a light channel attention module is added to achieve richer multi-scale and spatial semantic feature extraction of brain tumors. The performance of the network is evaluated on the Multimodal Brain Tumor Segmentation Challenge (BraTS 2015) dataset, and results are presented here alongside other high-performing CNNs. Results show comparable accuracy with other methods but with increased efficiency, better segmentation performance, and reduced redundancy and computational complexity. The result is a high-performing network with a balance between efficiency and accuracy, allowing, for example, better energy performance on mobile devices.

I. INTRODUCTION

Magnetic Resonance Images (MRIs) offer versatility in soft tissue investigation. Unlike X-ray or computed tomography, MRI does not expose humans to ionizing radiation and can capture detailed images of the body's internal structure. MRIs can be utilized as inputs to deep learning-based segmentation techniques for segmenting body tissue types, which is highly helpful for the automated segmentation of brain tumors. Some processing experience can be learned from [1] and [2]: through resource allocation and integration, the entire framework can understand what should be matched first. The majority of brain tumors are categorized based on their behavior or cell source [3]. Non-malignant and malignant tumors are referred to as Low Grade Gliomas (LGG) and High Grade Gliomas (HGG), respectively. Each usually contains two classes: LGG is divided into Grade I or II, and HGG into Grade III or IV. HGGs begin in brain cells and typically necessitate both surgical removal and radiation therapy.
In recent years, traditional machine learning methods and deep learning methods have had increasing success in the task of brain tumor segmentation. Traditional machine learning methods, including Fuzzy C-Means (FCM) [4], Support Vector Machine (SVM) [5], and Random Forest (RF) [6], have performed well in image-based tasks. These methods rely on the characteristic attributes of the image itself to divide the image into distinctive regions [7]. When faced with complex, high-resolution, high-contrast images with multiple imaging sequences, such as MRIs, these approaches struggle to achieve high accuracy even with increasing computational capabilities. There have been some approaches to enhancing computing efficiency and saving energy consumption, such as [8] and [9], but they are difficult to apply to machine learning. Deep learning algorithms have overtaken these traditional methods, and deep CNNs significantly improve the performance of image-based tasks. Some methods can even achieve good performance in the absence of labels, such as the weakly-supervised model of [10]. Recent studies have concentrated on three important aspects of improving these neural networks: depth, width, and cardinality. Hussain et al. [11] proposed a multi-scale CNN for segmenting brain tumors; this network combines the feature information of two-dimensional images of different scales in a cascaded manner, focusing on network width. He et al. [12] propose a residual network that extends the network deeper by exploiting skip connections, focusing on depth. Xie et al. [13] propose ResNeXt to expand the size of the set of transformations, focusing on cardinality.
To attain adequate accuracy, however, these methods often require a huge number of parameters and floating point operations (FLOPs); ResNet-50 [12], for example, has roughly 25.6 M parameters and needs 4.1 B FLOPs to analyze an image of size 224 × 224. An important future application of medical image segmentation is implementation on mobile devices. Thus, exploring efficient and portable network designs with sufficient performance will be a key focus of future work in the field.
The features extracted by deep neural networks (DNNs) contain both rich and redundant information, which can guarantee that the neural network completely comprehends the input image. For example, some feature maps generated by the first layer in ResNet-50 are shown in Fig. 1. Although it is difficult to distinguish differences between these similar brain MRI images, they will look very different from an expert's perspective. To capture this, the groups that look the most similar are marked. The frequency of occurrence of these repeating features can be reduced, but as they are not entirely insignificant, they cannot be completely eliminated. Redundant feature maps can be helpful for DNN training because they increase the richness of features, as features cannot be exactly the same. The current trend is to accept redundant feature maps, but to generate them in a cost-effective manner.
A new convolution module is established here to replace traditional CNN convolutions with light operations. Based on GhostNet, unnecessary processes are streamlined. In addition, a U-Net-style symmetric network is utilized, which incorporates a global channel attention mechanism between the down-sampling and up-sampling paths at the same level, allowing the neural network to capture more details. The contributions of this article are as follows:
• An efficient multi-path structure is proposed to reduce redundant features and replace the traditional convolutional layer, which significantly reduces the computational complexity and number of parameters while maintaining competitive results.
• A new feature extraction mechanism, based on generating corresponding weights, is designed to redefine the down-sampling features and enrich the features of each layer during the up-sampling process to generate better results.
• The two proposed structures are compatible with symmetric networks used for segmentation tasks, and the original network's parameters can be decreased while maintaining network performance.
• This method achieves increased computational efficiency, a reduced number of parameters, and competitive segmentation accuracy on the BraTS 2015 dataset.

II. RELATED WORK
Existing semantic segmentation model methods can be viewed from three perspectives: capacity of the network, attention mechanism, and lightweight models.

A. Network Computational Capability
Increasing the depth is a straightforward and intuitive technique for expanding capacity. Szegedy et al. [14] introduce the Inception network and Inception-ResNet [15], which use a multi-branch architecture to increase the capacity of the network.
However, gradient propagation becomes more challenging as the network depth increases. The skip connection proposed by ResNet [12] helps alleviate the optimization problem of deep networks, and many models have been based on ResNet, including WideResNet [16], PyramidNet [17], and ResNeXt [13]. In comparison to traditional residual networks, WideResNet employs a greater number of convolution filters at a shallower depth. PyramidNet is an iteration of WideResNet: it has a pyramid-shaped structure for extracting features, in which the network width grows gradually. With the help of grouped convolution, ResNeXt demonstrates that raising the cardinality improves classification accuracy. DenseNet [18] establishes dense connections from all previous layers to later layers to achieve feature reuse, so that each convolutional block receives the feature information of all previous blocks. With these enhancements, DenseNet can outperform ResNet with fewer parameters and lower computational cost by reusing features.
U-Net [19] performs well on the segmentation of small targets, and its structure is extensible, leading to many improvements and expansions on the U-Net design [20], [21], [22]. Due to the excellent performance of U-Net on medical images, a fully symmetric segmentation network will often adopt a U-Net-like structure to ensure that each corresponding layer in the up-sampling and down-sampling paths holds the same level of semantic features. H-DenseUNet [23] mixes 3D convolution, DenseNet, and the U-Net structure to explore the mixed features of liver and tumor segmentation. MRAnet [24] proposes a complex network structure to extract rich spatial features and semantic information at different scales, which performs well in brain tumor segmentation tasks. However, this continuous increase in the capacity and volume of the network results in lower operating efficiency and more network parameters. Given the growing importance of mobile health care, it will be crucial to strike a balance between processing efficiency and network size via lightweight networks.

B. Attention Mechanism
Attention plays a significant role in human vision: everyone's focus is different. The same holds in CNNs. A series of mechanisms can be used to selectively allow the network to focus on salient regions to better capture feature information. Several works [25], [26] incorporate attention processing in classification tasks to achieve this. Residual Attention Network [25] stacks attention structures in an end-to-end fashion to change the attention of features, which can adapt to the deepening of the network. SENet [26] improves the characterization ability of neural networks by modeling the relationships between channels to recalibrate channel features. To compute channel-wise attention, globally average-pooled features are used in its Squeeze-and-Excitation module. However, it fails to take into account spatial attention, which plays a significant part in determining "where" to focus, as seen in [27]. Therefore, the design proposed for LightNet makes use of both spatial attention and channel attention in order to construct an effective architecture.

C. Lightweight Model
A series of lightweight models have been proposed in response to the need to deploy neural networks on portable mobile devices. MobileNets [22] is a series of lightweight DNNs based on depthwise separable convolution: a unit is constructed from depth-wise and point-wise convolutions to approximate an original convolution with bigger filters, and MobileNet with this structure can maintain comparable performance. To improve performance and consume fewer FLOPs, an inverted residual block is proposed in MobileNetV2 [28], while automated machine learning (AutoML) is further utilized in MobileNetV3 [21]. ShuffleNet [29] implements a channel shuffle operation to enhance communication between different channels. ShuffleNetV2 [30] specifically takes into account the real speed of the target hardware when developing lightweight models. Xception [31] uses depthwise convolution operations to use model parameters more efficiently. But the correlation between normal and redundant features has not been fully exploited, even though these models achieve good performance with a small number of FLOPs.
Instead of removing all redundant features, GhostNet [32] proposes cutting down redundant features to reduce network complexity and computational volume. It uses a series of simple convolution kernels to replace traditional convolution and significantly reduces the number of parameters needed for natural image segmentation algorithms. Unfortunately, this method of over-compressing the network structure has yet to achieve good results on medical images: its excessive pursuit of simplification leads to an insufficient number and variety of features.
To investigate lightweight models, a series of structures have been proposed, including knowledge distillation [33], [34], low-bit quantization [35], [36], network pruning [37], [38], etc. Han et al. [37] propose pruning the unimportant weights in neural networks. Knowledge distillation was first proposed by Hinton et al. [33] to create small, light models. To ensure efficient CNNs, Li et al. [39] employ L1-norm regularization to obtain fewer filters. In order to achieve high compression and speed-up ratios, Rastegari et al. [35] quantize the weights to a 1-bit format. However, the performance of these methods is usually limited by their original pre-trained DNN. There is significant potential for developing low-parameter, highly efficient, low-computation DNNs through the proper design of efficient architectures.

III. METHOD
LightNet adopts a fully convolutional semantic segmentation network with a symmetric encoder-decoder architecture. In the down-sampling process, feature information is kept from being lost by using convolution instead of pooling operations. The network consists of a feature extraction stage, an up-sampling structure, and skip connections contacting the up-sampling and down-sampling stages; however, instead of simply using the concatenation operation of the basic U-Net, an attention module is used in each skip connection to implement the selection of semantic features, yielding an architecture with better performance. The feature mask output by each attention structure can be used as prior information to form the feature mask of the current level. This novel skip connection with an attention structure merges the down-sampled weighted information and the up-sampled features, providing multi-level features in both the encoder and decoder processes.
A summary of LightNet's structural design is shown in Fig. 3, where s means "stride". LightNet mainly contains three modules: the light module, the light bottleneck, and the attention module, as shown in Figs. 4-6.

A. Light Module
The light module in LightNet is inspired by GhostNet but, unlike in the ghost module, the identity connection is canceled. This way, more feature maps can be generated with fewer parameters. Compared with an ordinary convolutional layer, the convolution in the light module is divided into two processes, as shown in Fig. 4. The first process is an ordinary convolution with a strictly controlled number of features, which produces the intrinsic features. Then many new features are generated by applying a series of simple linear operations (called light operations). The output feature map size remains the same, but the light module has fewer total parameters and lower computational complexity than an ordinary CNN layer. This allows the light module to be more efficient and potentially faster, without sacrificing the quality of the output feature maps. In practice, the transformation method f_i, which is a linear transformation, is not fixed: it can be a 3 × 3 or 5 × 5 linear kernel, and convolution kernels of different sizes can be combined for the linear transformation operations. Considering GPU performance and to reduce the parameters as much as possible, all light-operation kernels used here are 1 × 1.
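As a concrete illustration, the following is a minimal TensorFlow/Keras sketch of the light module as described above. The class name, the BN/LReLU placement (borrowed from the light bottleneck description below), and the default reduction multiple r = 3 are assumptions for illustration, not the authors' exact implementation; r is formalized in the equations that follow.

```python
import tensorflow as tf
from tensorflow.keras import layers

class LightModule(tf.keras.layers.Layer):
    """Primary 3x3 convolution producing a reduced set of intrinsic features,
    followed by a bank of cheap 1x1 'light operations' that generate the
    output maps (no identity branch, unlike the original ghost module)."""

    def __init__(self, out_channels, r=3, stride=1):
        super().__init__()
        self.out_channels, self.r, self.stride = out_channels, r, stride

    def build(self, input_shape):
        intrinsic = max(1, int(input_shape[-1]) // self.r)  # c / r intrinsic maps
        self.primary = layers.Conv2D(intrinsic, 3, strides=self.stride,
                                     padding="same", use_bias=False)
        self.bn1 = layers.BatchNormalization()
        self.light = layers.Conv2D(self.out_channels, 1, use_bias=False)  # light ops
        self.bn2 = layers.BatchNormalization()
        self.act = layers.LeakyReLU()

    def call(self, x, training=False):
        y = self.act(self.bn1(self.primary(x), training=training))
        return self.act(self.bn2(self.light(y), training=training))
```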
In conventional convolution, $X \in \mathbb{R}^{c \times h \times w}$ is taken as the input data and $Y \in \mathbb{R}^{c' \times h' \times w'}$ as the output data, where $c$ and $c'$ represent the number of input channels and output channels, respectively, $w$ and $h$ are the width and height of $X$, and $w'$ and $h'$ are the width and height of $Y$. The convolution operation $\varepsilon$ of any convolutional layer used to generate $c'$ features can be expressed as:

$$Y = X * \varepsilon + b, \qquad (1)$$

where $*$ denotes the convolution operation, $\varepsilon \in \mathbb{R}^{c \times k \times k \times c'}$ are the convolution filters, and $b$ is the bias term. In the light module, intermediate features $Y' \in \mathbb{R}^{(c/r) \times h' \times w'}$ are first generated by the 3 × 3 convolution shown in Fig. 4(b), and then $Y \in \mathbb{R}^{c' \times h' \times w'}$ is generated from the intermediate features by the light operations. This process reduces the number of repeated features and can be formulated as:

$$y_j = f_j(Y'), \quad 1 \le j \le c', \qquad (2)$$

where $f_j$ represents the $j$-th light operation generating the $j$-th light feature map. In order to keep the spatial size of the output features consistent, the hyper-parameters (i.e., padding, stride) are the same as those in the general convolution of Equation (1), and the bias term is omitted for clarity. Here $r$ represents the reduction multiple of the number of features after the first convolution operation in the light module: for example, if the number of features before the first convolution is 200 and after the first convolution is 100, then $r = 2$.
For conventional convolution, the required number of FLOPs is $c' \cdot h' \cdot w' \cdot c \cdot k \cdot k$, where $k$ represents the kernel size of the filters. In general, due to the huge number of filters and the large channel number $c$ (e.g., 512 or 1024), hundreds of millions of FLOPs are often needed. The theoretical efficiency $e$ of upgrading ordinary convolution with the light module is the ratio of the two FLOP counts; with a primary convolution producing $c/r$ intrinsic features and light operations of kernel size $d \approx k$, it can be calculated as:

$$e = \frac{c' \cdot h' \cdot w' \cdot c \cdot k \cdot k}{\frac{c}{r} \cdot h' \cdot w' \cdot c \cdot k \cdot k \; + \; c' \cdot h' \cdot w' \cdot \frac{c}{r} \cdot d \cdot d}. \qquad (3)$$

In the light module, the number of features of the final output is set to be twice that of the first input ($c' = 2c$), so the efficiency $e$ simplifies to:

$$e = \frac{2r}{3}. \qquad (4)$$

For the upgraded structure to be more efficient than the original, $e$ must be greater than 1, so $r$ must be greater than 1.5. In other words, $r$ is a variable that can be controlled in the experiments: as long as $r > 1.5$ is chosen, $e > 1$ is guaranteed. When $e$ is less than 1, the number of parameters increases compared to traditional convolution, which is not desirable, so only the case where $e$ is greater than 1 is considered in the experiments.
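The simplification above can be sanity-checked numerically. This small sketch assumes, as in the theoretical derivation, that the light operations use d × d kernels with d = k = 3 (the 1 × 1 kernels used in practice only make the module cheaper).

```python
def flops_conv(c_in, c_out, h, w, k):
    """FLOPs of a conventional convolution producing c_out maps of size h x w."""
    return c_out * h * w * c_in * k * k

def efficiency(c, r, h=60, w=60, k=3, d=3):
    ordinary = flops_conv(c, 2 * c, h, w, k)   # c' = 2c, as in the text
    intrinsic = c // r                         # features after the first conv
    light = flops_conv(c, intrinsic, h, w, k) + flops_conv(intrinsic, 2 * c, h, w, d)
    return ordinary / light

print(efficiency(c=120, r=3), 2 * 3 / 3)  # both 2.0: e = 2r/3 holds
```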

B. Light Bottleneck
The light bottleneck takes advantage of this light module design. As shown in Fig. 5, the light bottleneck is similar to the basic residual block in ResNet, as it combines multiple convolutional layers with shortcuts. There are two stacked light modules in the light bottleneck: the first serves as an expansion layer to increase the number of feature maps, while the second reduces the number of feature maps to match the shortcut path. Finally, the input and output of the second light module are connected using the shortcut. Skip connections and 1 × 1 convolutions are used to increase feature richness, since using the light module alone would make the features too thin. Unlike in MobileNetV2, ReLU is not utilized after the second light module; instead, Batch Normalization (BN) and a Leaky ReLU (LReLU) nonlinear activation are employed after each layer. In particular, when stride = 2, a down-sampling layer is placed at the last layer of the light bottleneck to reduce the size of the feature maps and progress the features to higher levels.
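Reusing the LightModule sketch from above, a light bottleneck might look as follows; the expansion factor of 2 and the BN on the shortcut path are assumptions consistent with, but not confirmed by, the description.

```python
class LightBottleneck(tf.keras.layers.Layer):
    """Two stacked light modules with a residual shortcut: the first expands
    the number of feature maps, the second reduces it to match the shortcut."""

    def __init__(self, channels, expansion=2, stride=1):
        super().__init__()
        self.expand = LightModule(channels * expansion)
        self.reduce = LightModule(channels, stride=stride)  # stride 2 down-samples
        # A 1x1 convolution on the shortcut restores feature richness and
        # matches the shape when the spatial size or channel count changes.
        self.shortcut = tf.keras.Sequential([
            layers.Conv2D(channels, 1, strides=stride, use_bias=False),
            layers.BatchNormalization(),
        ])

    def call(self, x, training=False):
        y = self.expand(x, training=training)
        y = self.reduce(y, training=training)
        return y + self.shortcut(x, training=training)
```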

C. Attention Module
An attention module is integrated into the skip connection between the encoder and decoder processes, which can effectively reduce the redundancy of feature information. Additionally, it allows the network to focus more intently on the critical information pertaining to tumor regions. The specific structure of the attention module is displayed in Fig. 6. The feature masks obtained through the feature extraction part are fed into the attention module, and one-dimensional spatial information is obtained by compressing the feature maps using maximum and average pooling. The one-dimensional vectors containing the spatial information of the features are then sent to a shared network, where they are compressed, summed, and merged pixel by pixel. After the above process, the weighted value at each pixel is reconstructed. The final attention feature mask is obtained by multiplying these weighted values with the pixels of the same-level feature mask extracted by the up-sampling structure. Compared with the feature mask extracted by down-sampling, the attention feature mask suppresses background regions unrelated to brain tumors, thus enabling the output layer to achieve higher performance. Thanks to the average pooling used during compression, each pixel in the feature mask receives feedback during gradient back-propagation, while the max pooling ensures that the pixel with the largest response in the feature mask receives the strongest gradient feedback. The attention module is described by Equation (5).
The input feature $x$, obtained from the feature extraction stage, undergoes global average pooling and global maximum pooling operations over width and height, respectively; the pooled features are then forward-propagated through a Shared Linear-layer Structure (SLS). Following an element-wise addition of the two outputs, the sigmoid activation function $\sigma$ is applied to generate the final attention feature map:

$$M(x) = \sigma\big(\mathrm{SLS}(\mathrm{AvgPool}(x)) + \mathrm{SLS}(\mathrm{MaxPool}(x))\big). \qquad (5)$$
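A sketch of this channel-attention computation, following Equation (5); the reduction ratio inside the shared linear structure is an assumption, and the module is shown re-weighting the same-level up-sampled features as described above.

```python
import tensorflow as tf
from tensorflow.keras import layers

class LightAttention(tf.keras.layers.Layer):
    """Equation (5): a shared linear structure applied to globally average-
    and max-pooled encoder features, summed element-wise, then a sigmoid."""

    def __init__(self, channels, reduction=8):
        super().__init__()
        self.sls = tf.keras.Sequential([          # Shared Linear-layer Structure
            layers.Dense(channels // reduction, activation="relu"),
            layers.Dense(channels),
        ])

    def call(self, x_down, x_up):
        avg = self.sls(tf.reduce_mean(x_down, axis=[1, 2]))  # global average pool
        mx = self.sls(tf.reduce_max(x_down, axis=[1, 2]))    # global max pool
        weights = tf.sigmoid(avg + mx)[:, tf.newaxis, tf.newaxis, :]
        return x_up * weights  # re-weight the same-level up-sampled feature mask
```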
The overall encoder and decoder architectures are shown in Tables I and II, respectively. SE represents the Squeeze-and-Excitation module [26].

D. Loss Function
To refine the training procedure of the network, a consistency loss function is proposed to maintain appearance and spatial consistency while promoting the integrity of edge segmentation in brain tumor regions. The proposed consistency fusion loss is described in Equation (6); it consists of a softmax cross-entropy loss and an edge consistency loss.
$$\mathrm{Loss}_{\mathrm{total}} = \alpha \cdot \mathrm{Loss}_{\mathrm{softmax}} + \gamma \cdot \mathrm{Loss}_{\mathrm{boundary}}, \qquad (6)$$

where $\alpha$ is set to 0.6 and $\gamma$ to 0.4; the values of the hyper-parameters $\alpha$ and $\gamma$ are analyzed in the experimental part. The softmax cross-entropy loss is a pixel-level loss function that is mainly used to quantify the difference between the ground truth and the prediction mask. Similar to the losses adopted in other deep learning methods, the cross-entropy function is utilized as the basis for computing the edge consistency loss. The softmax cross-entropy loss function is described in Equation (7):

$$\mathrm{Loss}_{\mathrm{softmax}} = -\sum_{\theta} G(\theta)\,\log P(\theta). \qquad (7)$$
Here $\theta$ is a pixel in the input features with a value normalized to the range 0 to 1, $G(\theta)$ is the ground truth of $\theta$, and $P(\theta)$ represents the prediction for $\theta$. If $P(\theta)$ and $G(\theta)$ are close, the loss is small, while a large gap between the two leads to a large loss. The proposed consistency loss, $\mathrm{Loss}_{\mathrm{boundary}}$, is a measure of the distance of the contour (or region boundary) within the brain tumor region. $\partial G$ is defined as the boundary of the real region $G$, and $\partial P$ represents the boundary of the predicted segmentation region as calculated by the network. Edge consistency should alleviate the above-mentioned problem of unbalanced segmentation, since it uses integration over boundaries (interfaces) between regions rather than unbalanced integration within regions. In addition, the boundary loss provides different information from the regional loss and can be used to complement it. Considering the difference between the real region boundary $\partial G$ and the predicted segmentation region boundary $\partial P$, the proposed edge consistency loss incorporates an asymmetric L2 distance function, as shown in Equation (8):

$$\mathrm{Distance}(\partial G, \partial P) = \sum_{\theta \in \partial P} \min_{\hat{\theta} \in \partial G} \lVert \theta - \hat{\theta} \rVert_2^2. \qquad (8)$$
Here $\partial G(\theta)$ denotes the target boundary value at $\theta$, and $\partial P(\theta)$ is the predicted value at $\theta$. This L2 distance function calculates the minimum square error between the target and predicted boundaries. The loss only evaluates the pixels on the predicted boundary, and it is 0 where the prediction matches the boundary of the ground truth; for a pixel that does not match, its loss is evaluated according to its distance from the ground-truth boundary.
The edge consistency loss takes the pre-computed minimum square error, $\mathrm{Distance}(\partial G, \partial P)$, between the target and predicted boundary pixels, then uses the boundary information of the real region $G$ to encode the distance between each pixel $\theta$ and $\partial G$, as shown in Equation (9).
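For illustration, a NumPy/SciPy sketch of the fusion loss of Equations (6)-(8) on binary masks; the boundary extraction via morphological erosion is an assumption, and a training implementation would precompute the distance map of ∂G and operate on differentiable tensors instead.

```python
import numpy as np
from scipy import ndimage

def region_boundary(mask):
    """Boundary pixels: inside the region but not in its morphological erosion."""
    return mask & ~ndimage.binary_erosion(mask)

def boundary_loss(pred_mask, gt_mask):
    dG, dP = region_boundary(gt_mask), region_boundary(pred_mask)
    # Squared distance from each pixel to the nearest point of dG; it is zero
    # on dG itself, so predicted boundary pixels that match contribute nothing.
    dist_sq = ndimage.distance_transform_edt(~dG) ** 2
    return dist_sq[dP].mean() if dP.any() else 0.0

def total_loss(softmax_ce, pred_mask, gt_mask, alpha=0.6, gamma=0.4):
    """Consistency fusion loss of Equation (6) with the stated weights."""
    return alpha * softmax_ce + gamma * boundary_loss(pred_mask, gt_mask)
```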

E. LightNet Overview
The number of parameters is significantly reduced by transforming the traditional convolution into the lightweight module, and the lightweight bottleneck is proposed to complement the lightweight module and make the network structure more unified. In addition, the features of each layer, after being redefined by the attention module in the down-sampling process, are added to the same layer in the up-sampling process, which greatly enriches the recognizability of the features. Finally, a novel loss function is presented to adjust the training process to achieve the best experimental results. An overview of LightNet's details is shown in Fig. 2.

IV. EXPERIMENTS

A. Dataset
This article mainly employs the MRI data released in BraTS 2015 [42] to conduct experiments, demonstrating that the multi-level feature extraction strategy is an effective method for brain tumor segmentation. Each MRI image in the dataset consists of four modalities: T2-weighted (T2), T1-weighted contrast-enhanced (T1c), T1-weighted (T1), and Fluid-attenuated inversion recovery (FLAIR). For clarity, a sample from the BraTS 2015 dataset is visualized in Fig. 7. The images provided in the dataset have been processed by image correction operations such as registration, interpolation, and skull stripping. The dimension of all modal data is 155 × 240 × 240 voxels. Different modalities can emphasize certain sub-regions, which offers further data for analyzing the brain tumor.
Every MRI sequence contains up to five tumor types: Necrotic (NCR), Edema (ED), Non-Enhancing Tumor (NET), Enhancing Tumor (ET), and others. To align with the standard evaluation, segmentation performance is evaluated according to the three officially-defined tumor sub-regions: Complete Tumor (CT), the region containing all tumor categories (NCR, ED, NET, and ET); Tumor Core (TC), the area containing all tumors with the edema removed (NCR, NET, and ET); and Enhancing Tumor (ET), the area containing only enhancing tumors. It can be observed in Fig. 7 that the CT region can be clearly identified in FLAIR, while T1c shows further details of the TC. It stands to reason that the complete set of modalities enables the optimum segmentation outcome; however, due to motion artifacts, scan corruption, and constrained scan times, one or more missing modalities are typical in clinical practice.
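These sub-region definitions translate directly into mask construction. In this small helper, the integer label codes (1 = NCR, 2 = ED, 3 = NET, 4 = ET) follow the common BraTS 2015 convention and are stated here as an assumption.

```python
import numpy as np

NCR, ED, NET, ET = 1, 2, 3, 4  # assumed BraTS 2015 label codes

def subregion_masks(labels: np.ndarray) -> dict:
    """Binary masks for the three officially evaluated tumor sub-regions."""
    return {
        "CT": np.isin(labels, [NCR, ED, NET, ET]),  # complete tumor
        "TC": np.isin(labels, [NCR, NET, ET]),      # tumor core (edema removed)
        "ET": labels == ET,                         # enhancing tumor only
    }
```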
The BraTS 2015 dataset includes training and test sets. There are 220 HGG samples and 54 LGG samples in the training set. Since there is no official validation set, a subset of 40 samples is randomly selected from these 274 samples as the validation set, while the remaining samples form the training set. The official online test dataset contains 110 MRI images of HGG and LGG samples, whose tumor grades and accurate segmentation labels are unknown.

B. Experimental Mechanism
The loss functions used in the experimental network are the weighted softmax loss and the edge loss function. All network models are implemented in TensorFlow and run in parallel on GTX 1080Ti GPUs. The size of the input samples is 155 × 240 × 240, and the batch size is set to 10. The model is trained for 33 epochs and optimized using the Adam optimizer, where the momentum term β is set to 0.9 and the initial learning rate to 0.0001. So that each modality has zero mean and unit variance, each MRI volume is normalized by subtracting the mean intensity of the brain region of that modality from each voxel intensity and dividing by the standard deviation.
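A sketch of this per-modality z-score normalization; taking the brain region to be the non-zero voxels of the skull-stripped volume is an assumption.

```python
import numpy as np

def normalize_modality(volume: np.ndarray) -> np.ndarray:
    """Zero mean, unit variance over brain-region voxels of one modality."""
    brain = volume > 0                      # skull-stripped background is zero
    mean, std = volume[brain].mean(), volume[brain].std()
    out = volume.astype(np.float32)
    out[brain] = (out[brain] - mean) / (std + 1e-8)
    return out
```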

C. Evaluation Indicators
The official evaluation metric, the Dice coefficient, is used to evaluate the segmentation performance of the model by assessing the voxel coincidence between the ground truth and the predicted result; it is calculated for each of the three tumor sub-regions (CT, TC, and ET) and defined as follows:

$$\mathrm{Dice\ Score} = \frac{2\,TP}{FN + 2\,TP + FP}, \qquad (10)$$

where FN, FP, TN, and TP represent the numbers of False Negative, False Positive, True Negative, and True Positive voxels, respectively.

TABLE III
PERFORMANCE OF NETWORKS EVALUATED ON THE BRATS 2015 ONLINE TESTING DATASET
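A direct implementation of Equation (10) on binary masks, using the standard voxel-wise TP/FP/FN counts:

```python
import numpy as np

def dice_score(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice score of Equation (10) for one tumor sub-region (boolean masks)."""
    tp = np.logical_and(pred, gt).sum()
    fp = np.logical_and(pred, ~gt).sum()
    fn = np.logical_and(~pred, gt).sum()
    return 2 * tp / (fn + 2 * tp + fp + 1e-8)
```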

D. Result Analysis
The LightNet approach is evaluated on the BraTS 2015 dataset, first by comparing it with other benchmark semantic segmentation methods using the Dice score as a metric. The segmentation time and number of parameters of these networks are then investigated, and the accuracy and efficiency of LightNet are compared to published methods. Finally, the proposed modules are evaluated individually to investigate their respective contributions within LightNet. A more detailed analysis is provided in the following parts.
The performance of benchmark segmentation networks is compared with LightNet in Table III; these results are from testing on the online test dataset. For example, although LightNet requires only 0.83 M parameters, more than ten times fewer than the 10.8 M of ResNet-101, Table III shows LightNet's performance is significantly better than that of ResNet-101. In addition, the time required to run LightNet is much lower than that of the other methods. LightNet can therefore maintain high accuracy while ensuring operational efficiency. Although the Dice score improvement of LightNet is small, the reduction in segmentation time and number of parameters is significant, showing LightNet's efficiency gains without sacrificing performance.
Compared with the other methods in Table III, LightNet achieves the highest Dice score on CT, TC, and ET. GhostNet is also included in this table to highlight LightNet's improvements on the base concept. From the results, networks with more parameters usually obtain better results, which shows that, in general, parameters and accuracy are positively correlated. However, there are exceptions: ResNeXt has about four times more parameters than DenseUnet, but its results are not as good as DenseUnet's. This indicates that the optimization of the internal structure of the network also has an important effect on the results. Compared to its rivals, LightNet consistently performs better across a wide range of computational complexity levels due to its more effective use of compute resources for producing feature maps. Overall, LightNet maintains high accuracy while ensuring operational efficiency.
The performance of several segmentation methods on the BraTS 2015 validation dataset is shown at every epoch in Tables IV-VI. As seen in Table IV, for example, LightNet achieves the best result for CT, followed by DenseNet. The performance of LightNet is also relatively stable, remaining better than average compared with the other networks even before the 15th epoch; it then remains in the lead from the 15th epoch onward, and by epoch 33 it is almost 5.9% higher than the second-ranked network (DenseNet) for CT. The prediction results of several methods compared to LightNet are also visualized in Fig. 10. In the results of LightNet, the sub-regions are more continuous, and their contours are clearer and more precise than those of the other segmentation methods. LightNet outperforms the other methods in detail on high-complexity tumors (such as the results in the third and fourth rows), produces better results along glioma boundaries, and contains less segmentation noise than other methods (such as the results in the first and last rows). Meanwhile, the accuracy of the segmentation of tumors with a single structure (such as the results in the second row) is also higher.

TABLE VII COMPARED WITH OTHER STATE-OF-THE-ART METHODS
In a comprehensive comparison, LightNet's structure has better operating efficiency and a stronger advantage over other, larger networks: it significantly reduces the parameters and computing time while ensuring a better result.

E. Ablation Study
To investigate the effectiveness of each component, the proposed modules are evaluated individually in Table VIII. "LB" denotes using the light bottleneck, "AM" means using the attention module, and "BL" means using the boundary loss function. A "✓" in the first three columns indicates that the corresponding module is used, and "✗" that it is not. The last three columns show the experimental results corresponding to the different module combinations. ResNet-50 and ResNet-101 are used as the network backbones, with the residual bottleneck simply replaced by the light bottleneck. It can be seen that the result on ET improves most obviously when the boundary loss is used. Additionally, the attention mechanism also contributes to the segmentation results. After using the light bottleneck, the experimental results are almost unchanged compared to the original network, with a significantly reduced number of parameters.
Table IX shows that the network achieves the best result when e = 2 and r = 3 within the tested range. It can be seen that when e = 3, the result is poor, probably because the number of features is reduced too much after the first convolution of the light module, resulting in severe information decay. Fig. 11 shows some features produced by the first layer in LightNet; compared with the features produced by ResNet-50, there are relatively few repeated features.

V. CONCLUSION
A novel network, LightNet, is proposed for brain tumor segmentation in an end-to-end training system. This network improves on the structure of traditional convolution, reducing redundant features to optimize the parameters, and adds an attention mechanism to make feature expression more effective and targeted. LightNet achieves a good balance between accuracy and efficiency: while maintaining a high Dice score, it dramatically improves the speed of network operation and significantly reduces network parameters.
In future work, more efficient networks will be explored. With the gradual popularization of mobile devices and the increasing demand for mobile medical care, efficiency is crucial in future network design: efficient networks also reduce energy consumption, prolong battery life, and protect the environment.

Fig. 1. Visualization of some feature maps generated by the first residual group in ResNet-50. It can be seen that there are several repeated features; similar features are annotated with boxes of the same color.

Fig. 2. Overview of LightNet's details, consisting of two phases: the training phase and the testing phase. LB represents "Light Bottleneck". So that each modality has zero mean and unit variance, each MRI volume is normalized by subtracting the mean intensity of the brain region of that modality from each voxel intensity and dividing by the standard deviation.

Fig. 3. Summary of LightNet's structural design. The boxes in the bottom right explain the operations represented by the different shapes.

Fig. 4. Visualization of the general convolution and the light module. f_i represents the light operation.

Fig. 5. Visualization of the light bottleneck. Left: stride 1; right: stride 2, which is only used in the down-sampling process.

Fig. 8 shows the comparison between the different experimental methods in terms of segmentation time and parameters; a position closer to the bottom left shows a better trade-off between running time and parameters, that is, fewer parameters and less computation time. The comparison of parameters and Dice score on Complete Tumor between the different methods is shown in Fig. 9, where a position closer to the upper left corner indicates that the method achieves better Dice scores with fewer parameters.

Fig. 8. Comparison between different experimental methods in terms of segmentation time and parameters.

Fig. 9. Comparison between different experimental methods in terms of Dice score and parameters.

Fig. 10. Prediction results of MobileNetV3 (c), GhostNet (d), and LightNet (e). Each image in the first column (a) contains four small images corresponding to the four MRI modalities: top left, FLAIR; bottom left, T1c; top right, T1; bottom right, T2. The second column (b) shows the ground truth for the input in (a).

Fig. 11. Visualization of some features produced by the first layer in LightNet.

Tables V and VI show that LightNet also outperformed the other methods in terms of Tumor Core and Enhancing Tumor: its performance becomes the best after the 21st epoch, and the improvement of LightNet is fairly noticeable. In Table VII, LightNet's Dice score is compared with other published brain tumor segmentation methods, showing that LightNet maintains high Dice scores across all tumor sub-regions. Regarding segmentation time, the proposed network overwhelmingly outperforms the counterpart methods, which proves that the method is strongly competitive.

TABLE IV
DICE SCORE OF COMPLETE TUMOR USING SEVERAL METHODS AT EVERY EPOCH WITH THE VALIDATION SET

TABLE V
DICE SCORE OF ENHANCING TUMOR USING SEVERAL METHODS AT EVERY EPOCH WITH THE VALIDATION SET

TABLE VI
DICE SCORE OF TUMOR CORE USING SEVERAL METHODS AT EVERY EPOCH WITH THE VALIDATION SET

TABLE VIII
COMPARISON AMONG DIFFERENT SETTINGS OF THE PROPOSED METHOD (DICE SCORE: %)

TABLE IX
PERFORMANCE OF THE PROPOSED LIGHTNET WITH DIFFERENT e (DICE SCORE: %)