Data sharing is one of the major challenges in machine learning. The advent of techniques such as federated learning, differential privacy, and split learning has gone a long way toward addressing data silos, privacy concerns, and regulatory constraints.
In this article, we’ll look at split learning, a technique developed at the MIT Media Lab that allows machine learning models to be trained collaboratively without sharing raw data.
Most importantly, split neural networks (SplitNN) share neither raw data nor model details with collaborating institutions. The paper Split learning for health: Distributed deep learning without sharing raw patient data showed that SplitNN configurations meet the practical requirements of entities holding different modalities of patient data, of centralized and local health entities collaborating on multiple tasks, and of learning without sharing labels.
The researchers also compared the performance and resource-efficiency tradeoffs of split learning against methods like federated learning and large-batch synchronous stochastic gradient descent. The results showed split learning achieving higher accuracies while requiring drastically less client-side computation as the number of clients grows.
How Split Learning Works
SplitNN is a distributed, privacy-preserving deep learning technique for training deep neural networks on multiple data sources without the need to directly share labeled raw data. SplitNN solves the problem of training a single model across multiple data-holding entities.
The model is divided into several sections, each trained on a different party. For example, one section may reside on a supercomputing resource while the others sit with the clients participating in the collaborative training. Crucially, none of the parties forming the model can “see” each other’s data.
Passing the data through the client-side layers encodes it into a different representation space before it reaches the rest of the model, so what is transmitted is a transformed representation rather than the data itself. Since the model is split into sections trained on different parties, training proceeds by transferring the outputs (activations) of the last layer of each section to the adjacent (or next) section. Thus, only the activations of this last layer, also called the cut layer, are sent to the next party, and no raw data is shared between clients.
As shown in the figure above, the green line marks the cut layer of the SplitNN. The upper part of the model is trained on the server, and the lower part of the model is trained on the multiple clients.
These steps continue until the distributed split learning network is trained, without any party looking at another’s raw data.
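The mechanics of the cut layer can be sketched in a few lines of numpy. The layer sizes, the tanh activation, and the single-layer sections below are illustrative assumptions, not the paper’s architecture; the point is that only the cut-layer activations cross the client/server boundary:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 4 input features, a 3-unit cut layer, 1 output.
W_client = rng.normal(size=(4, 3))   # client-side section (up to the cut layer)
W_server = rng.normal(size=(3, 1))   # server-side section (after the cut layer)

x = rng.normal(size=(2, 4))          # a private mini-batch held by the client

# Client: forward pass up to the cut layer. Only these activations leave
# the client -- never the raw batch x.
cut_activations = np.tanh(x @ W_client)

# Server: finishes the forward pass using the cut-layer activations alone.
prediction = cut_activations @ W_server

print(cut_activations.shape)  # (2, 3) -- what crosses the wire
print(prediction.shape)       # (2, 1)
```

Note that `cut_activations` is a nonlinear encoding of `x`, so the server works with a transformed representation, not the original features.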
For example, a split learning setup allows local hospitals with smaller individual data sets to collaborate and build machine learning models that deliver superior healthcare diagnostics without sharing raw data.
Simple Vanilla Split Learning
This is the simplest SplitNN configuration, as shown in Figure (a). In this framework, each client (for example, a radiology center) trains a partial model up to a specific layer called the cut layer. The outputs at the cut layer are then sent to a server, which completes the rest of the training without seeing the clients’ raw data (for example, radiology images).
This completes a forward propagation cycle without sharing any raw data. Gradients are then propagated back from the server’s last layer to the cut layer. Finally, the gradients at the cut layer are sent back to the radiology centers.
The rest of the backpropagation is completed locally at the radiology centers. This process continues until the SplitNN is trained without the parties looking at each other’s raw data.
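The full forward-and-backward cycle can be sketched as a toy training loop. The layer sizes, tanh activation, mean-squared-error loss, and learning rate are all illustrative assumptions; in this vanilla configuration the labels sit with the server, while the raw batch never leaves the client:

```python
import numpy as np

rng = np.random.default_rng(1)
lr = 0.05

# Toy split model (illustrative sizes): client layer 4 -> 3, server layer 3 -> 1.
W_c = rng.normal(size=(4, 3))        # stays on the client
W_s = rng.normal(size=(3, 1))        # stays on the server

x = rng.normal(size=(8, 4))          # raw data: never leaves the client
y = rng.normal(size=(8, 1))          # labels: held by the server in this setup

loss0 = float(np.mean((np.tanh(x @ W_c) @ W_s - y) ** 2))  # loss before training

for _ in range(100):
    # Client: forward pass up to the cut layer; send activations a.
    a = np.tanh(x @ W_c)

    # Server: finish the forward pass and compute the MSE error.
    pred = a @ W_s
    err = pred - y

    # Server: backprop down to the cut layer, update its own weights,
    # and send grad_a (the gradient at the cut layer) back to the client.
    grad_a = err @ W_s.T
    W_s -= lr * a.T @ err / len(x)

    # Client: finish backpropagation locally using grad_a.
    grad_z = grad_a * (1 - a ** 2)   # derivative of tanh
    W_c -= lr * x.T @ grad_z / len(x)

loss = float(np.mean((np.tanh(x @ W_c) @ W_s - y) ** 2))
```

After the loop, `loss` is lower than `loss0`: the two parties jointly fit the model even though only cut-layer activations and gradients were exchanged.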
Label-Free Split Learning
As shown in the image above (Figure (b)), the network is wrapped around at the end layers of the server’s network, and the server’s outputs are sent back to the client entities. The server retains most of the layers, while the clients compute the loss and generate the gradients from the final layers; these are then used for backpropagation without the clients ever sharing the corresponding labels.
This configuration is ideal for distributed deep learning when labels carry highly sensitive information, such as a patient’s disease status.
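Assuming the same toy layer sizes as before, the U-shaped flow might look like the following single training step. The dimensions and single-layer sections are hypothetical; the essential property is that the client keeps both the first and last layers, so neither the raw inputs nor the labels ever reach the server:

```python
import numpy as np

rng = np.random.default_rng(2)
lr = 0.05

# Hypothetical U-shaped split: the client keeps the FIRST and LAST layers,
# so neither the raw data x nor the labels y ever leave the client.
W_in = rng.normal(size=(4, 3))    # client: input -> first cut layer
W_mid = rng.normal(size=(3, 3))   # server: middle of the network
W_out = rng.normal(size=(3, 1))   # client: second cut layer -> output

x = rng.normal(size=(8, 4))       # private inputs
y = rng.normal(size=(8, 1))       # private labels

# Forward pass: client -> server -> back to the client.
a1 = np.tanh(x @ W_in)            # sent to the server
a2 = np.tanh(a1 @ W_mid)          # sent back to the client
pred = a2 @ W_out                 # client computes predictions and the loss
err = (pred - y) / len(x)         # MSE gradient w.r.t. pred (up to a constant)

# Backward pass: client -> server -> back to the client.
grad_a2 = err @ W_out.T           # sent to the server (no labels attached)
W_out -= lr * a2.T @ err

grad_z2 = grad_a2 * (1 - a2 ** 2)
grad_a1 = grad_z2 @ W_mid.T       # sent back to the client
W_mid -= lr * a1.T @ grad_z2

grad_z1 = grad_a1 * (1 - a1 ** 2)
W_in -= lr * x.T @ grad_z1        # client finishes backpropagation
```

Only the intermediate activations (`a1`, `a2`) and cut-layer gradients (`grad_a1`, `grad_a2`) cross the boundary in either direction.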
Split Learning for Vertically Partitioned Data
This type of setup allows multiple institutions holding different modalities of patient data to learn a distributed model without revealing or sharing that data. As shown in the image above (Figure (c)), this SplitNN configuration suits multi-modal, multi-institutional collaboration.
For example, radiology centers may want to collaborate with pathology testing centers and a server for disease diagnosis. The radiology centers, holding the imaging data modality, train a partial model up to their cut layer. Similarly, the pathology testing centers, holding patient test results, train a partial model up to their own cut layer.
Once done, the cut-layer outputs from these two centers are concatenated and sent to the disease diagnosis server, which forms the rest of the model. These steps are repeated to train the distributed deep learning model without sharing each other’s raw data.
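The concatenation step can be sketched as follows. The feature sizes and single-layer partial models are hypothetical; the essential point is that the server sees only the two cut-layer outputs, joined along the feature axis:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical modalities for the same 8 patients: radiology features
# (5-dim) at one institution, pathology test results (2-dim) at another.
x_radiology = rng.normal(size=(8, 5))
x_pathology = rng.normal(size=(8, 2))

W_rad = rng.normal(size=(5, 3))      # radiology center's partial model
W_path = rng.normal(size=(2, 3))     # pathology center's partial model
W_server = rng.normal(size=(6, 1))   # server model over the concatenation

# Each institution forwards its own modality to its cut layer.
cut_rad = np.tanh(x_radiology @ W_rad)
cut_path = np.tanh(x_pathology @ W_path)

# Server: concatenate the two cut-layer outputs and finish the model.
merged = np.concatenate([cut_rad, cut_path], axis=1)  # shape (8, 6)
pred = merged @ W_server                              # shape (8, 1)
```

During training, the server would split the gradient at `merged` back into its two halves and return each half to the corresponding institution, mirroring the vanilla backward pass.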
Simple distributed deep learning setups cannot handle the various practical configurations of collaboration between healthcare entities; this is where SplitNN comes in. SplitNN is versatile, allowing many plug-and-play configurations depending on the application, and it scales to large settings. Furthermore, the limits of resource efficiency in distributed deep learning can be pushed further by combining SplitNN with neural network compression methods for seamless distributed learning at the edge.
This article is written by a member of the AIM Leaders Council, an invitation-only forum of senior executives from the data science and analytics industry.