The insurance industry has a long history. As social activities of all kinds change rapidly, more and more new types of insurance have emerged, and traffic violation insurance is one of them. Risk control runs through every stage of insurance, from pricing and purchase to payout, which makes modeling user risk a top priority. An effective and accurate risk control model helps insurance companies control costs, increase revenue, and expand their business.
However, even for one of the top car rental service providers in the country, managing user risk remains a major challenge because of data limitations. Many new users have no historical data for modeling, and the data available for existing users is also limited, because only a few features describing user behavior are collected.
To address these challenges, a brand-new intelligent right platform for traffic violation insurance was created. Developed jointly by Webank and a car rental service provider, it is the first insurance pricing platform built on Federated Learning.
Historical data already exists, so the business party can provide labels, namely whether an actual violation occurred. What is missing is richer user tags, which can be filled in with users’ Internet behavior characteristics. This is a classic heterogeneous (vertical) federated learning setting: one party holds the labels and a few features and needs more user behavior tags from the other.
FATE (Federated AI Technology Enabler), an open-source project initiated by Webank’s AI Department, provides a secure computing framework for exactly this kind of federated learning need. It implements secure computation protocols based on homomorphic encryption and multi-party computation (MPC), and supports federated learning architectures and secure computation for a variety of machine learning algorithms, including logistic regression, tree-based algorithms, deep learning, and transfer learning.
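To illustrate the additive homomorphism that such protocols rely on, here is a toy implementation of textbook Paillier encryption. Paillier is an assumption here, chosen because it is a standard additively homomorphic scheme for federated logistic regression; this is not FATE’s implementation, the key is deliberately tiny and insecure, and all function names are hypothetical. It shows that a party holding only ciphertexts can add encrypted values, or scale them by a plaintext constant, without ever seeing the values themselves.

```python
import math
import secrets

# Toy primes for illustration only; real keys use primes of 1024 bits or more.
TOY_P, TOY_Q = 293, 433


def keygen(p=TOY_P, q=TOY_Q):
    """Return (public modulus n, private key (lam, mu)) using generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # valid shortcut because g = n + 1
    return n, (lam, mu)


def encrypt(n, m):
    """Enc(m) = (n + 1)^m * r^n mod n^2 for a random r coprime to n."""
    if not 0 <= m < n:
        raise ValueError("plaintext must lie in [0, n)")
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # r in [1, n - 1]
        if math.gcd(r, n) == 1:
            break
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2


def decrypt(n, private_key, c):
    """Dec(c) = L(c^lam mod n^2) * mu mod n, where L(x) = (x - 1) / n."""
    lam, mu = private_key
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n


def add_encrypted(n, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return c1 * c2 % (n * n)


def mul_plain(n, c, k):
    """Raising a ciphertext to the power k multiplies the plaintext by k (mod n)."""
    return pow(c, k, n * n)


if __name__ == "__main__":
    n, priv = keygen()
    total = add_encrypted(n, encrypt(n, 15), encrypt(n, 27))
    print(decrypt(n, priv, total))
```

In a heterogeneous logistic regression exchange, this property is roughly what lets one party combine encrypted intermediate values with its own plaintext features and return an encrypted aggregate that only the key holder can decrypt.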
After a simple deployment of FATE on both sides, a heterogeneous logistic regression model can be trained with only a few configuration settings, using the business party’s raw features and labels together with the additional Internet user characteristics, without exposing either party’s raw data. Each party owns part of the fitted model. When a new sample arrives, each party performs a partial calculation with its local model, and the partial results are then combined into a prediction without disclosing either party’s data or local model.
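The split prediction step can be sketched as follows, assuming a plain logistic regression whose weights are partitioned by feature ownership. The guest/host role names follow FATE’s usual convention (the guest holds the labels), but `PartyModel` and `joint_predict` are hypothetical helpers, the numbers are illustrative, and this sketch exchanges the scalar partial scores in plaintext, whereas a production protocol may protect them further.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class PartyModel:
    """Hypothetical slice of a vertically partitioned logistic regression model.

    The weights cover only the features this party holds locally.
    """
    weights: List[float]
    intercept: float = 0.0

    def partial_score(self, features: Sequence[float]) -> float:
        """Compute this party's share of the linear term w . x (+ b)."""
        if len(features) != len(self.weights):
            raise ValueError("feature count does not match weight count")
        return self.intercept + sum(w * x for w, x in zip(self.weights, features))


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def joint_predict(guest: PartyModel, guest_x: Sequence[float],
                  host: PartyModel, host_x: Sequence[float]) -> float:
    """Sum the two partial scores and map the total to a violation probability.

    In a real deployment each partial_score call runs on its own party's side
    and only the scalar result is sent; here both run in one process.
    """
    z = guest.partial_score(guest_x) + host.partial_score(host_x)
    return sigmoid(z)


if __name__ == "__main__":
    guest = PartyModel(weights=[0.8, -0.3], intercept=-1.0)  # label holder
    host = PartyModel(weights=[1.2, 0.5, -0.7])              # Internet tags
    p = joint_predict(guest, [1.0, 2.0], host, [0.5, 1.0, 0.0])
    print(f"predicted violation probability: {p:.4f}")
```

Because the linear term is a sum over features, it splits cleanly by feature ownership; neither party needs the other’s weights or raw features to produce its share.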
A closer look at the scenario also reveals a classic reject inference problem: if the transactions rejected by the previous model are simply discarded, the new joint model will be heavily biased, and its probabilities will be hard to use in the pricing model. Yet we cannot know whether a rejected transaction would actually have involved a traffic violation. For the training data, we therefore applied positive and unlabeled (PU) learning techniques, using a small number of known violations from the historical data to find transactions in the online data with a high probability of violation. These transactions are added to the training set, so the model is no longer biased and the probabilities of the newly trained model can be applied directly to pricing. And because more user characteristics are included, more actual traffic violations can be identified.
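The text does not name the specific PU technique used, so the sketch below swaps in one common choice, the Elkan and Noto (2008) correction, which assumes labeled positives are selected completely at random from all positives. It further assumes a classifier g has already been trained to separate known violations (labeled) from unlabeled transactions such as previously rejected ones; the function names, the scores, and the 0.5 threshold are hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def estimate_label_frequency(holdout_positive_scores: Sequence[float]) -> float:
    """Elkan-Noto estimate of c = P(labeled | truly positive).

    The inputs are g(x) scores on known violations that were held out
    from g's training data.
    """
    if not holdout_positive_scores:
        raise ValueError("need at least one held-out labeled positive")
    c = mean(holdout_positive_scores)
    if not 0.0 < c <= 1.0:
        raise ValueError("estimated label frequency must lie in (0, 1]")
    return c


def adjusted_probabilities(scores: Sequence[float], c: float) -> List[float]:
    """Convert labeled-vs-unlabeled scores to P(violation | x) ~ g(x) / c, capped at 1."""
    return [min(1.0, s / c) for s in scores]


def select_likely_violations(ids: Sequence[str], scores: Sequence[float],
                             c: float, threshold: float = 0.5) -> List[str]:
    """Return ids of unlabeled transactions whose adjusted probability reaches the threshold."""
    probs = adjusted_probabilities(scores, c)
    return [tid for tid, p in zip(ids, probs) if p >= threshold]


if __name__ == "__main__":
    c = estimate_label_frequency([0.4, 0.5, 0.6])
    picked = select_likely_violations(["t1", "t2", "t3"], [0.1, 0.3, 0.45], c)
    print(c, picked)
```

The selected transactions would then be added to the training set as likely violations, which is the role PU learning plays in the reject inference step described above.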
For example, when comparing violations over a given period, the original model, lacking user characteristics, could not distinguish high-risk transactions from ordinary ones, whereas the federated model, drawing on the additional user data, could identify and reject them. This ultimately improved profits among both new and existing customers.
As the figure shows, both revenue and profit grow more rapidly with the federated model than with the traditional approach, ending up one and a half times as high. What is more, thanks to the newly introduced data, the personalized pricing model now covers over 98% of users, compared with only 10% at the beginning, meaning that almost every user’s risk can be quantified.
All in all, the new intelligent right platform provides personalized pricing, built on federated risk models, for almost every user. It solves the ‘blank new user’ and ‘old user lacking features’ problems for insurance providers while complying with personal data privacy and protection rules. Thanks to federated learning and FATE, the new intelligent right platform has taken the insurance business to a new stage.