With the number of people driving cars increasing every day, there has been a proliferation in the number of car insurance claims being registered. The life cycle of registering, processing and deciding each claim involves a manual examination by a service engineer, who creates the damage report, followed by a physical inspection by a surveyor from the insurance company, which makes it a long drawn-out process. We propose an end-to-end system to automate this process, which would benefit both the company and the customer. This system takes images of the damaged car as input, gives relevant information such as the damaged parts, and provides an estimate of the extent of damage (no damage, mild or severe) to each part. This serves as a cue to estimate the cost of repair, which would be used in deciding the insurance claim amount. We have experimented with popular instance segmentation models such as Mask R-CNN, PANet and an ensemble of these two, along with a transfer learning [1] based VGG16 network, to perform the different tasks of localizing and detecting the various classes of parts and damages found in the car. The proposed system achieves good mAP scores for parts localization and damage localization (0.38 and 0.40 respectively).
Index Terms— Deep Learning, Automated Car Insur-
ance Claim System, Mask R-CNN, PANet, Ensemble.
1. INTRODUCTION
In a country like India, which has approximately 230 million vehicles [2], auto insurance has become a burgeoning market that is still dependent on traditional, manual methods of processing repair claims. It requires a survey inspector to physically look over each vehicle reported as damaged and make an assessment of the damages and the claim amount that has to be paid. With such a considerable number of vehicles in use, it is reasonable to say that each insurance company receives numerous claims (ranging from hundreds to thousands, based on the size of the company) for smaller repairs on a day-to-day basis.
*Authors contributed equally
A customer who wishes to make a claim must first surrender their vehicle to an authorized service centre, where the surveyor assesses the damage condition and provides an estimate for the repairs. This is followed by a visit from the insurance personnel, who have to examine the damages and the repair estimates provided and make a decision regarding the approval of the claim. The manpower and logistics needed for scheduling inspections, processing claims and getting approvals can make this a very cumbersome process for the company as well as the customer. The whole process of claim approval also often becomes a stressful situation for the customer, who is left stranded without their primary means of transport.
Minor claims often have repetitive tasks that consume a
lot of the time of a skilled inspection officer whose exper-
tise would otherwise be essential for complex cases where
the damages are severe or involve extensive interior damages.
It often happens that the cost incurred during the inspection phase exceeds the claim amount being made, for both the company and the customer. Hence, it has become essential to implement a more prudent method that can prune the increasing costs of maintaining multiple personnel and reduce the time taken to reach a claim decision. In a fast-paced environment like today's, improving the damage claim processing time would also work in favour of the company by increasing customer satisfaction.
With the advancements in visual perception systems that
employ deep learning models, the process of automating this
whole system has become viable. It improves the claims life
cycle and reduces the time of the whole process. Therefore,
we aim to design a system that automates the processing of repair claims by employing different deep learning techniques,
so as to alleviate the dependence on manual inspection and
the bias invariably introduced by a human surveyor.
The main objectives that a survey inspector follows upon receiving the images of the car are to determine all the damaged parts and the extent of damage on each of them. By coupling the damaged part details with the car's make and model and the insurance policy details, they decide whether or not to approve the claim. Hence, the system we have designed has the following task sub-division so as to closely resemble the workflow followed by the inspector:
- Detect the make and model of a car.
- Detect and localize different parts on the exterior of a car.
- Localize and classify the types of damages present on any part of a car.
- Provide a decision about the damage extent, i.e. whether the part can be repaired or must be replaced.
The system would collect the data in the form of images, ana-
lyze it and provide an estimate about the extent of damage to
the car. As the preliminary step, we have created a multi-part
system which takes as input the images taken of the damaged
car for which the claim is made, and outputs relevant information such as the parts that have been detected as damaged, the type of damage on each, and whether that part needs to be repaired or replaced. The first task has been achieved by training a transfer learning model to identify the model of a car. Currently, our dataset comprises only two popular hatchback models, and we have used a VGG16 [3] based model to classify each car into these two categories. For the second
task, we have to segment each part of a car, for which we have used Mask R-CNN by He et al. [4], a popular instance segmentation model that achieved a high score on the COCO instance segmentation challenge. We trained two instances of this model, Parts MRCNN and Damage MRCNN, for the tasks of localizing parts and damages respectively. Parts MRCNN has been trained to detect the different parts of a car, such as the front bumper, hood, fender, door and mirror; sample results are shown in Figure 6. Damage MRCNN is responsible for localizing the different types of damages found on a car, such as scratches, major dents, minor dents and cracks, similar to Figure 8.
Along with Mask R-CNN, we also trained the top-performing model of the 2018 COCO instance segmentation challenge, the Path Aggregation Network for Instance Segmentation (PANet) by Liu et al. [5], on our dataset for the detection of parts; we call this model Parts PANet. By ensembling the results of the Parts MRCNN and Parts PANet models, the performance of the system improves significantly, as discussed in detail in Section 5.
In addition to these two models, our framework also employs a transfer learning based CNN model, Damage Net (D-Net), which classifies each detected part as damaged or not. The parts detected by the ensemble of Parts MRCNN and Parts PANet are given as input to D-Net, which classifies each part as damaged or not damaged. The parts classified as damaged are then combined with the damages localized by Damage MRCNN to provide a decision on the damaged part, the type of damage present on it and whether it needs to be repaired or replaced.
The rest of the paper is organized as follows. In Section 2 we discuss relevant previous research, and Section 3 gives a detailed overview of the framework and its various components. Section 4 explains the various aspects of the experiment, such as dataset details and the details of system development and deployment. Section 5 reports the results and discusses the performance of the system. Finally, Section 6 concludes and puts forth future work.

2. RELATED WORK
A comprehensive vehicle damage detection system from images has been put forth by Jayawardena [6], but it employs 3D CAD models and relies on traditional image processing methods. With the recent advent of deep learning techniques, such traditional methods are easily replaced. Most of the recent work uses CNN-based models to classify a limited number of damage types on a car. In the work by Patil et al. [7], the authors have used basic transfer learning and ensembling of CNNs to achieve damage classification from images of cars. In an alternative approach, Li et al. [8] have used the object detection model YOLO [9] and, by fusing different backbones, have detected a limited set of damage classes in car images. The authors of [10] have applied CNNs to structural health monitoring (SHM), characterizing damage based on the cracks formed in a composite material. Mohan and Poobal [11] have reviewed the task of crack detection in detail, but only with image processing techniques, while the authors of [12] have used CNNs to detect crack-based damages. None of these published works provide a comprehensive, end-to-end pipeline for automating the insurance claim process, which is what we propose and describe in detail in this paper.
3. FRAMEWORK OVERVIEW
To automate the insurance claim process from images of a car, we experimented with training the various deep learning models described in this section. While designing the Car Damage Predictor, we separate each task into a different module. The first module detects and localizes the parts in a car image; part detection is needed to identify which part is damaged. The second module classifies whether the parts detected by the first module are damaged or not. This filters out the undamaged parts; the remaining parts are then overlapped with the output of the damage localization module to localize and classify the damage extent.
All the modules are combined, integrated and deployed
as explained in Section 4.3 and Section 4.4. Figure 1 summa-
rizes the basic system design of Car Damage Predictor.
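To make the interaction between the modules concrete, the following is a minimal sketch of how a case-file could flow through the three modules. The helper callables (detect_parts, classify_part, detect_damages) are placeholders for the trained parts ensemble, D-Net and Damage MRCNN respectively, and the dictionary format is an assumption for illustration; the deployed system additionally aggregates results across images and applies company-specific estimation policies (Section 4.3).

```python
def box_iou(a, b):
    """IoU of two boxes given as (y1, x1, y2, x2)."""
    y1, x1 = max(a[0], b[0]), max(a[1], b[1])
    y2, x2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, y2 - y1) * max(0, x2 - x1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def predict_case_file(images, detect_parts, classify_part, detect_damages):
    """Chain the three modules over every image (HxWx3 array) of a claim case-file.

    detect_parts(image)   -> [{"label": str, "box": (y1, x1, y2, x2)}, ...]  (parts ensemble)
    classify_part(crop)   -> "damaged" or "not damaged"                      (D-Net)
    detect_damages(image) -> [{"label": str, "box": (y1, x1, y2, x2)}, ...]  (Damage MRCNN)
    """
    report = []
    for image in images:
        parts = detect_parts(image)          # module 1: localize every visible part
        damages = detect_damages(image)      # module 3: localize and classify damages
        for part in parts:
            y1, x1, y2, x2 = part["box"]
            if classify_part(image[y1:y2, x1:x2]) != "damaged":
                continue                     # module 2: filter out undamaged parts
            # Overlap damage detections with the damaged part region.
            hits = [d for d in damages if box_iou(d["box"], part["box"]) > 0]
            report.append({"part": part["label"],
                           "damages": [d["label"] for d in hits]})
    return report
```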
3.1. Parts Detection and Localization
For detecting and localizing the different parts, we trained two different models, Mask R-CNN and PANet, and used an ensemble of both.
There are many image segmentation models by Ron-
neberger et al. [13], Long et al. [14], Ganin and Lempit-
sky [15], Gupta et al. [16] and Hariharan et al. [17]. However,
Mask R-CNN is one of the top-performing models in image
segmentation and object detection. Mask R-CNN has shown
top results on all three tracks of COCO suite of challenges
[18]. Pre-trained Mask R-CNN model on COCO dataset can
be easily fine-tuned [19] on limited training data. Hence, we
used Mask R-CNN model as our first model and fine-tuned
that on our car damage dataset described in 4.1.
Path Aggregation Network for Instance Segmentation
(PANet) [5] is an improvement over Mask R-CNN. It was
the top performing model in the COCO instance segmentation
challenge of 2018 and achieved a better performance than the
previous Mask R-CNN for that particular track.
Details of these models are as follows:
- Parts MRCNN: Parts MRCNN is Mask R-CNN
trained on our car damage dataset with the number of
samples mentioned in Table 1. Mask R-CNN is an ex-
tension of Faster R-CNN and is one of the popular mod-
els for instance segmentation. The model performs the
tasks in mainly two stages. A Region Proposal Net-
work (RPN) is the first stage that creates sub-regions
of a given image that may contain the parts. This net-
work then uses a Feature Pyramid Network (FPN) [20]
with a top-down feature pathway to propose the candidate
regions. In the second stage, the network head takes
proposed regions from the previous stage and generates
part classes, bounding boxes and masks.
- Parts PANet: Parts PANet is PANet [5] trained on the same parts dataset. As noted above, PANet takes the basic Mask R-CNN network and
adds a separate connection from the lower level fea-
tures of the feature pyramid to the topmost feature. In
doing so it boosts the information flow in the network.
The author also introduces adaptive feature pooling to
link the feature grids generated to all the feature levels.
This makes useful information from each feature level
propagate directly to the proposal subnetworks.
- Ensemble (Parts MRCNN and PANet): We used a general ensembling method to combine the Parts MRCNN and Parts PANet outputs. The outputs of the ensemble are bounding boxes around all the parts present in the car image; a minimal sketch of this merging step is given after this list.
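As a concrete illustration, the following is a minimal sketch of such a box-level ensemble, assuming the equal-weight merge with an IoU threshold of 0.5 described in Section 4.2; the detection format (dicts with label, score and box keys) is an assumption for illustration, not the exact implementation.

```python
def ensemble_detections(dets_a, dets_b, iou_thr=0.5):
    """Equal-weight merge of Parts MRCNN and Parts PANet detections.

    Each detection is a dict {"label": str, "score": float, "box": (y1, x1, y2, x2)}.
    Boxes of the same class whose IoU exceeds iou_thr are averaged with equal
    weights; unmatched boxes from either model are kept as they are.
    """
    def iou(a, b):
        y1, x1 = max(a[0], b[0]), max(a[1], b[1])
        y2, x2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, y2 - y1) * max(0, x2 - x1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union > 0 else 0.0

    merged, used = [], set()
    for a in dets_a:
        j = next((k for k, b in enumerate(dets_b)
                  if k not in used and b["label"] == a["label"]
                  and iou(a["box"], b["box"]) > iou_thr), None)
        if j is None:
            merged.append(a)                                   # no counterpart: keep as is
            continue
        used.add(j)
        b = dets_b[j]
        merged.append({"label": a["label"],
                       "score": (a["score"] + b["score"]) / 2,
                       "box": tuple((x + y) / 2 for x, y in zip(a["box"], b["box"]))})
    merged += [b for k, b in enumerate(dets_b) if k not in used]  # PANet-only boxes
    return merged
```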
3.2. Damage detection
Once we detect the parts using the ensemble of models explained in Section 3.1, we filter out the parts that are not damaged using an image classifier. Here we used a fine-tuned VGG16 [3] model to classify the parts into damaged and not-damaged classes. We call this network Damage Net (D-Net). Only those extracted parts that are classified as damaged are used to localize and predict the damage extent with the damage detection and localization model described in Section 3.3. Figure 2 shows the architecture of D-Net.
Fig. 2. Architecture of Damage Net (D-Net)
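The following is a minimal Keras sketch of a D-Net-style classifier, assuming a VGG16 backbone pre-trained on ImageNet, 256 × 256 part crops and the SGD settings reported later in Section 4.2. The paper does not specify the head layers or which layers are frozen, so the pooling/dense head and the frozen convolutional base shown here are assumptions.

```python
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications import VGG16


def build_dnet(input_shape=(256, 256, 3)):
    """VGG16 backbone with a two-way (damaged / not damaged) classification head."""
    base = VGG16(weights="imagenet", include_top=False, input_shape=input_shape)
    base.trainable = False                         # assumption: freeze the convolutional base
    x = layers.GlobalAveragePooling2D()(base.output)
    x = layers.Dense(256, activation="relu")(x)    # assumed head size
    out = layers.Dense(2, activation="softmax")(x)
    model = models.Model(base.input, out)
    # Section 4.2 reports SGD with lr 1e-4 and a decay of 1e-6; the decay argument
    # name differs across Keras versions, so only the learning rate is set here.
    model.compile(optimizer=optimizers.SGD(learning_rate=1e-4),
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model


# Usage with part crops x_train (N, 256, 256, 3) and integer labels y_train (N,):
# model = build_dnet()
# model.fit(x_train, y_train, batch_size=64, epochs=100, validation_split=0.1)
```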
3.3. Damage classification and localization
This task closely resembles the one discussed in Section 3.1.
So, we used a similar network and called it Damage MRCNN. We trained Damage MRCNN on car images to correctly localize and classify the damage extent categories mentioned in Table 2.

Fig. 3. Dataset annotation example. In the image, each number corresponds to one annotation. 1 – Major dent outline, 2 – Fender, not damaged, 3 – Hood, damaged, 4 – Lhs front wheel arc, not damaged, 5 – Renault KWID RXT Edition.
4. EXPERIMENTAL SETUP
This section comprises the details of the dataset in Section 4.1, followed by model training details in Section 4.2.
The details about System Development and System Deploy-
ment are given in Sections 4.3 and 4.4 respectively.
4.1. Dataset
Our dataset comprises images taken from the database of
previously approved insurance claims. The dataset was an-
notated using a version of the VIA [21] tool which we have
modified to include all the types of parts and damages classes
used by an insurance inspector. We excluded any image of
the interior of a car as our current focus is to detect only the
external damages. Each of the images was annotated for the following details:
- Car Model Details: A rectangle selection tool was used to annotate the car under consideration with the details of its make, model and view, as shown in the region tagged 5 in Figure 3. In our current dataset we have considered only two popular Indian hatchback car models, the Hyundai i10 and the Renault KWID.
- Parts Details: A polygon selection tool was used to annotate the exact outline of each part of the car visible in the image, along with the information of whether it was
damaged or not damaged, as shown in regions 2, 3, 4
in Figure 3. A total of 87 different car parts used by insurance companies have been annotated. Due to class imbalance among the parts available in the dataset, we excluded parts which had fewer than 50 annotations. If excluded parts differed only by being the front, rear, right or left variant, we combined them into a single part class. For example, right head lamp and left head lamp annotations were combined into the head lamp class; similarly, front and rear, left and right doors were combined into door. Table 1 gives the list of the final 32 classes that we have considered for our experiment.
- Damage extent details: A polygon selection tool was
used to annotate the outline of each instance of the dam-
age classes listed in Table 2. One sample annotation
is the region 1 shown in the Figure 3, which outlines a
major dent present on the side of the hood of the car.
Annotation Details
Classes Count Classes Count
Car right door 266 Fender 683
Car left door 282 Roof top assy 50
Upper grill 216 Door assy 497
Lower grill 226 Rhs rear wheel arc 109
Hood 545 Rhs front wheel arc 270
Rear W/S glass 274 Lhs rear wheel arc 113
Front W/S glass 352 Lhs front wheel arc 255
Rearview mirror 518 Rhs fender QTR panel 208
Head lamp 519 Lhs fender QTR panel 251
Rear door glass 423 Front bumper 660
Front door glass 272 Rear bumper 278
Rear door handle 304 Fog lamp 359
Front door handle 222 Fog lamp cover 316
Rear door moulding 206 Tail light 416
Front door moulding 199 Rear spoiler 334
Tail gate 396 Rear reflector 136
Total Count 10,155
Table 1. Distribution of parts classes in the dataset. W/S – windscreen, Rhs – right-hand side, Lhs – left-hand side.
Damage classes Count
Scratch 757
Major Dent 1235
Minor Dent 274
Cracked 404
Missing 152
Total 2822
Table 2. Distribution of damage classes in dataset
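To make the annotation format concrete, the following is a minimal sketch of reading polygon regions from a VIA-style JSON export. The region attribute keys (part, damaged) stand in for the schema of our modified VIA tool, which is not reproduced here, so they should be treated as assumptions.

```python
import json


def load_via_annotations(json_path):
    """Parse a VIA-style JSON export into per-image polygon annotations.

    Returns a dict mapping image filename to a list of
    {"class": str, "damaged": value, "points": [(x, y), ...]} entries.
    """
    with open(json_path) as f:
        via = json.load(f)

    annotations = {}
    for entry in via.values():                  # one entry per annotated image
        regions = entry.get("regions", [])
        if isinstance(regions, dict):           # older VIA versions store regions as a dict
            regions = list(regions.values())
        polys = []
        for region in regions:
            shape = region["shape_attributes"]
            if shape.get("name") != "polygon":  # rectangle regions (car model tag) skipped here
                continue
            attrs = region.get("region_attributes", {})
            polys.append({
                "class": attrs.get("part", "unknown"),   # assumed attribute key
                "damaged": attrs.get("damaged"),          # assumed attribute key
                "points": list(zip(shape["all_points_x"], shape["all_points_y"])),
            })
        annotations[entry["filename"]] = polys
    return annotations
```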
4.2. Model Training
We trained the models described in Section 3 for part localization, damage detection, and damage extent classification and localization. Here we discuss the implementation details and training setup of the models:
- Parts MRCNN and Parts PANet: Parts MRCNN is a modified version of the Matterport implementation of Mask R-CNN (https://github.com/matterport/Mask_RCNN). Parts MRCNN was trained on our car
dataset described in 4.1. The images were first re-sized
to 1024 × 1024 followed by the application of random
image augmentations, which is a powerful technique
as analyzed by Mikoajczyk and Grochowski [22], to
improve the performance of the model and its ability
to generalize. The augmentations applied were: ran-
dom cropping, Gaussian blur, and affine transforma-
tions (horizontal, vertical shifts). We then fine-tuned
the network heads of the pre-trained model (loaded
weights from the pre-trained model on COCO dataset
[18]) using SGD optimizer with learning rate 0.001,
learning momentum 0.9 and gradient norm clipping of
0.5 for 100 epochs, with a batch size 1 for 72 hours
on NVIDIA RTX 2080 Ti. SGD, as suggested by
Zhang [23] and Shalev-Shwartz et al. [24] works ef-
fectively on large scale problems. Figure 6 shows im-
age samples of car part predictions using Mask R-CNN.
As discussed in Section 3.1, since Parts PANet is quite
similar in architecture to that of Parts M-RCNN, we
used similar settings to train the model. We fine-tuned
the model for 65 epochs. It took around 48 hours to
train the model on NVIDIA RTX 2080 Ti. Few sam-
ple results are shown in Figure 7. The two models were ensembled using a simple, generic ensembling technique: we merged the outputs of the two models with
equal weights if the IoU (Intersection over Union) of
the bounding boxes for each part prediction generated
by the models is greater than 0.5. In the future, we plan to create an ensemble over the weights of the models rather than just their outputs. A sketch of the head fine-tuning setup using the Matterport API is given after this list.
- Damage Net (D-Net): Damage Net (D-Net) comprises
a VGG16 network that takes as input each localized part detection, re-sized to 256 × 256. This is
then followed by the application of standard image aug-
mentations like vertical flip, affine transformations and
Gaussian blur. The VGG16 network is loaded with weights pre-trained on the ImageNet dataset, and the last layer is replaced with a dense layer of size 2. The network is then fine-tuned using a batch size of 64 and an SGD optimizer with a learning rate of 0.0001 and a decay of 1e-6 for 100 epochs. It classifies each part into two classes: damaged and not damaged.
- Damage M-RCNN: Damage M-RCNN resembles the
architecture of Parts MRCNN, so we use the same implementation of Mask R-CNN as above. We trained the network heads for 150 epochs on the damage dataset described in Section 4.1 using an SGD optimizer with learn-
ing rate 0.001, learning momentum 0.9 and gradient
norm clipping of 0.5 and it took around 72 hours to
train on NVIDIA RTX 2080 Ti.
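The following is a minimal sketch of the head fine-tuning setup on top of the Matterport Mask R-CNN API with the hyperparameters listed above. CarPartsDataset and its load_parts method are hypothetical stand-ins for our dataset loader, and the augmentation pipeline is an illustrative imgaug configuration rather than the exact one used.

```python
import imgaug.augmenters as iaa
from mrcnn.config import Config
from mrcnn import model as modellib


class PartsConfig(Config):
    """Training configuration mirroring the settings reported above."""
    NAME = "car_parts"
    NUM_CLASSES = 1 + 32          # background + 32 part classes (Table 1)
    IMAGES_PER_GPU = 1            # batch size 1
    IMAGE_MIN_DIM = 1024
    IMAGE_MAX_DIM = 1024
    LEARNING_RATE = 0.001
    LEARNING_MOMENTUM = 0.9
    GRADIENT_CLIP_NORM = 0.5


# Random crops, Gaussian blur and affine shifts, applied to a subset of images.
augmentation = iaa.Sometimes(0.5, iaa.SomeOf((1, 3), [
    iaa.Crop(percent=(0, 0.1)),
    iaa.GaussianBlur(sigma=(0.0, 2.0)),
    iaa.Affine(translate_percent={"x": (-0.1, 0.1), "y": (-0.1, 0.1)}),
]))

# CarPartsDataset is a hypothetical mrcnn.utils.Dataset subclass for our annotations.
dataset_train = CarPartsDataset()
dataset_train.load_parts("data/train")
dataset_train.prepare()
dataset_val = CarPartsDataset()
dataset_val.load_parts("data/val")
dataset_val.prepare()

config = PartsConfig()
model = modellib.MaskRCNN(mode="training", config=config, model_dir="logs/")
model.load_weights("mask_rcnn_coco.h5", by_name=True,
                   exclude=["mrcnn_class_logits", "mrcnn_bbox_fc",
                            "mrcnn_bbox", "mrcnn_mask"])       # COCO pre-trained weights

model.train(dataset_train, dataset_val,
            learning_rate=config.LEARNING_RATE,
            epochs=100,
            layers="heads",            # fine-tune only the network heads
            augmentation=augmentation)
```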
4.3. System Development
To make the car claim predictor system easy to use for both the user and the claim manager, we have developed a portal that can
be used as a mobile application or a web portal, where a
user/claim manager can upload images of a damaged car as
shown in Figure 4. This portal would then provide the details
of the damaged parts, the type of damage to each of the parts
and an estimate for damage extent of the car.
The industry-standard practice in the claim process is to create a case-file directory to collect the images of the damaged car. The number of images in a single case-file typically ranges from 8 to 15. These images are taken from every profile of the car: front, rear, left side and right side. In addition to these profiles, zoomed-in images of the damaged parts and damages are also often present. Another common practice is for the claim manager or user to record a 360° video of the car, zooming in on the parts where there is damage. This reduces the chance of fraud, since the user cannot upload images of a different car; the images can be corroborated with the video. In our first stage we focus on using the case-file images. For every image in a case-file, we predict and localize every part, predict whether that part is damaged or not, and then predict the type of damage (scratch, crack, etc.) and localize it by generating a mask. These results are then aggregated over all the images, and the final report is generated according to the estimation policies each company follows.
We created a micro-service architecture for this system,
i.e. we developed services for every component. The archi-
tecture is depicted in Figure 5. Two applications have been
developed on the front-end: iOS Application and Web Ap-
plication. The iOS app can be used to upload images of the
damaged car and submit the case file. The web portal can be
used to upload the images, visualize the predictions made by
the system and also view the estimated cost of damage repair.
To submit a claim, the user has to first login to the application
and fill the details of the car and their insurance policy. Every
user has to do this the first time they use the portal. When-
ever there is a car accident or any damage to the car, the user
can upload pictures of the damages of the car and then submit
for the claim. Currently, we have implemented a feedback
system in place for the claim manager after they view the pre-
dictions made. The data so provided is saved to the database
and will be used to train the models for further improvement
in the future.
The services interact with each other as described in Fig-
ure 5. All the services are REST APIs. We have used An-
gularJS for the web app and served it using ExpressJS. Mon-
goDB works as our primary database to store image attributes
and meta-data. Following is a description of all the components of the system:

Fig. 5. Development Architecture of Motor Insurance Claim Estimator

- Application: The source of the input images can be the web app or the mobile app. Users can upload multiple im-
ages at once. When a request comes to predict the damages, the images are saved to an AWS S3 bucket using a file stream, and the metadata related to the image, such as the image size and car details, is saved to MongoDB. After
the images are saved, a request goes to Model Serving
server with the credentials to access the image from S3.
The server predicts the damages and saves the predicted
images to S3 and provides a relevant response.
- Model Server: We integrated the components de-
scribed in Section 3 to get the prediction for every case-
file. We deployed the model services using Flask. Flask
REST APIs were exposed with fixed token authentica-
tion. Since the models were created in Keras [25], we
used Tensorflow [26] Keras API to export the model
graph and weights to serve the models with Tensor-
flow Serving. Tensorflow ModelServer supports gRPC
APIs and RESTful APIs. We used RESTful APIs to
get the output from the model served by Tensorflow
ModelServer. We chose Tensorflow Serving for deep
learning models as it is fast and provides asynchronous
modes of operation. All the models described in Section 3 have been deployed using Tensorflow Serving; a minimal sketch of such a prediction request is given below.
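The following is a minimal sketch of how a Flask service could forward an image to a model hosted by Tensorflow ModelServer over its RESTful predict API. The model name parts_mrcnn, the serving host, the token value and the direct file upload (instead of fetching from S3) are illustrative assumptions, not the deployed configuration.

```python
import numpy as np
import requests
from flask import Flask, request, jsonify
from PIL import Image

app = Flask(__name__)

# Assumed model name and host; each model from Section 3 is served the same way.
TF_SERVING_URL = "http://model-server:8501/v1/models/parts_mrcnn:predict"
API_TOKEN = "fixed-token"            # stands in for the fixed-token authentication


@app.route("/predict", methods=["POST"])
def predict():
    if request.headers.get("Authorization") != f"Bearer {API_TOKEN}":
        return jsonify({"error": "unauthorized"}), 401

    # The production service pulls the image from S3; a direct upload is used here.
    image = Image.open(request.files["image"].stream).convert("RGB").resize((1024, 1024))
    payload = {"instances": [np.asarray(image).tolist()]}

    resp = requests.post(TF_SERVING_URL, json=payload, timeout=60)
    resp.raise_for_status()
    return jsonify(resp.json()["predictions"])


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```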
4.4. System Deployment
We deployed the application server with MongoDB on an AWS EC2 Ubuntu 16.04 instance with 4 cores and 8 GB of RAM, and used an AWS EC2 p3.2xlarge instance with an NVIDIA K80 GPU for model deployment.
Model mAP score
Parts MRCNN 0.35
Parts PANet 0.32
Ensemble (Parts MRCNN + PANet) 0.38
Damage MRCNN 0.40
Table 3. Results for the parts and damage detection models. mAP – mean Average Precision.
Fig. 6. Predictions by Parts MRCNN

5. RESULTS AND DISCUSSION
The metric used for the segmentation models is the mean Average Precision (mAP) score. mAP is the metric used to report the performance of recent segmentation and object detection models on the COCO dataset (http://cocodataset.org/detection-eval), which uses the same definition for the Average Precision (AP) and mAP scores. The AP, or mean AP (mAP), is calculated as the average of the precision values corresponding to each recall value, taken from a Precision-Recall curve.
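As a small illustration of this definition, the sketch below averages interpolated precision over an evenly spaced grid of recall values, in the spirit of the COCO evaluation (which uses 101 recall points and additionally averages over IoU thresholds); it is a simplified single-threshold version, not the official evaluation code.

```python
import numpy as np


def average_precision(precisions, recalls, num_points=101):
    """Average interpolated precision over evenly spaced recall values.

    precisions, recalls: points of a precision-recall curve ordered by
    increasing recall. This is a simplified, single-IoU-threshold version
    of the COCO AP definition.
    """
    precisions = np.asarray(precisions, dtype=float)
    recalls = np.asarray(recalls, dtype=float)
    ap = 0.0
    for r in np.linspace(0.0, 1.0, num_points):
        mask = recalls >= r
        # Interpolated precision: best precision achievable at recall >= r.
        ap += precisions[mask].max() if mask.any() else 0.0
    return ap / num_points


# Toy example: average_precision([1.0, 0.8, 0.6, 0.5], [0.1, 0.4, 0.7, 1.0]) ≈ 0.67
```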
A primary challenge we faced during the training of the models is that, in typical object detection tasks, most objects have a fixed shape and outline; they may be seen from different views but still have a discernible, fixed boundary shape. Our dataset, in contrast, comprises both damaged and undamaged annotations for each part, and a damaged part shows much variation in its shape from case to case.
Table 3 reports the mAP score results of the parts and
damages detection and localization models as discussed in
Section 3.1. Figure 6 shows a few sample results generated by
Parts MRCNN. Figure 7 shows that the binary masks predicted by Parts PANet are also accurate, which is a result of the complementary branch added to the architecture, which captures different views for each generated proposal. PANet also implements a bottom-up augmentation path to enrich the top features with the lower-level features. This path propagates the lower-level features of each part to the top layers of the model, which causes it to generalize less well on those parts that have fewer damaged annotations. Hence, as seen from the results, Parts MRCNN performs better than Parts PANet on our testing data and is more stable to variations in shape.

Fig. 7. Predictions by Parts PANet
We tested D-Net on the parts detected by the ensemble of Parts MRCNN and Parts PANet and obtained an accuracy of 85.6%. The model's performance is good, and it could be improved further by experimenting with other CNN-based backbones such as ResNet [27] and Inception V3 [28].
Damage MRCNN gives us a good mAP score and, as shown in Figure 8, identifies the various classes of damages effectively. Due to the inherent issue of damages not having a well-defined shape, the predicted masks are not always complete, though the bounding boxes are classified correctly. The results of Parts MRCNN and Damage MRCNN serve as proof that the Mask R-CNN model can be effectively applied to this task, and the results can be improved by training the complete network on our dataset. Figure 9 shows a false positive result by Damage MRCNN; we have observed that high brightness and reflections sometimes cause Damage MRCNN to produce false positives.
Fig. 8. Predictions by Damage MRCNN

6. CONCLUSION AND FUTURE WORK
Automating the car insurance claim process is very relevant
and has real-life applications that can benefit both the cus-
tomer and the company. In this paper, we have proposed
an automated system that has different components to tackle
each of the tasks performed during the claim process. We
have demonstrated an end to end pipeline for the user to up-
load images, visualize the predictions and also get the esti-
mate of the cost of repair. In our ongoing work, we are improving the performance of the models by examining various failure cases, and we are experimenting with adding an ensemble for damage localization, as was done for parts localization. We also plan to scale the system to include all different types (sedans, SUVs, etc.) and models of cars.

7. ACKNOWLEDGMENTS
We would like to thank Soumay Seth for bringing such a great
opportunity and business case to work on. We would like to
thank the Humonics team for their continuous support. This is a Humonics product, and credit goes to all the members who
helped in making this successful. Dr. Rajiv Ratn Shah is
partly supported by the Infosys Center for AI, IIIT Delhi and
ECRA Grant by SERB, Government of India.