Challenge Data

NLP applied to judicial decisions parsing
by Predilex



Competitive challenge
10MB to 1GB
Advanced level


Started on Jan. 6, 2020

Challenge context

When a trial is over, a summary of the trial is published with all the important information about the case that has just been judged.
In French, this document is called a “jurisprudence”.

In a trial between a victim and an insurer, this document describes the full circumstances of the case, together with the medical and financial data, from the first injuries to the final compensation amounts.

Challenge goals

At Predilex, we hold “jurisprudence” data as text files, and we want to build an algorithm that automates the extraction of the relevant information.
In this challenge, the goal is to extract three fields from each “jurisprudence” document: the sex of the victim, the date of the accident and the date of consolidation of the injuries.

Data description


The inputs are “.txt” files, each containing a whole “jurisprudence” publication.
The documents come from different courts and different judges. The samples span a period long enough to observe changes in wording, but the words and phrases are often similar.


The outputs are the data we want to extract: sex of the victim, date of accident, date of consolidation.
We provide them as columns in a “.csv” file. Each line of this file is linked to a “jurisprudence” text file by its trial number and court.

Sex of the victim

This information is always contained in the document and can take only two values: "homme" (male) and "femme" (female).

Date of the accident

Except in very rare cases, this information is somewhere in the document (usually at the beginning). It is the date on which the accident happened. We expect a date in the format dd/mm/yyyy.
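
As an illustration of the expected dd/mm/yyyy format, here is a minimal sketch that finds dates in French text and normalizes them. It assumes dates appear either numerically ("12/01/2015") or with a French month name ("1er mars 2010"); the function name `find_dates` and both patterns are hypothetical, and real documents may use other forms that this sketch does not cover.

```python
import re
from datetime import date

# French month names mapped to month numbers (unaccented variants included).
MONTHS = {
    "janvier": 1, "février": 2, "fevrier": 2, "mars": 3, "avril": 4,
    "mai": 5, "juin": 6, "juillet": 7, "août": 8, "aout": 8,
    "septembre": 9, "octobre": 10, "novembre": 11,
    "décembre": 12, "decembre": 12,
}

# Matches "12/01/2015", "12.01.2015" or "12-01-2015".
NUMERIC_DATE = re.compile(r"\b(\d{1,2})[/.-](\d{1,2})[/.-](\d{4})\b")
# Matches "12 janvier 2015" and "1er mars 2010".
TEXT_DATE = re.compile(
    r"\b(\d{1,2})(?:er)?\s+(" + "|".join(MONTHS) + r")\s+(\d{4})\b",
    re.IGNORECASE,
)


def find_dates(text):
    """Return every valid date in text as dd/mm/yyyy, in reading order."""
    found = []
    for match in NUMERIC_DATE.finditer(text):
        day, month, year = (int(group) for group in match.groups())
        found.append((match.start(), day, month, year))
    for match in TEXT_DATE.finditer(text):
        day, year = int(match.group(1)), int(match.group(3))
        month = MONTHS[match.group(2).lower()]
        found.append((match.start(), day, month, year))

    results = []
    for _, day, month, year in sorted(found):
        try:
            date(year, month, day)  # rejects impossible dates such as 31/02/2015
        except ValueError:
            continue
        results.append(f"{day:02d}/{month:02d}/{year:04d}")
    return results
```

For example, `find_dates("survenu le 1er mars 2010")` yields `["01/03/2010"]`.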

Date of consolidation

This is the date on which the victim's injuries became stable and were declared final by a physician. The information should be present in most cases, but it is sometimes missing (we then put "n.c." in the csv file) or not applicable (we then put "n.a." in the csv file), the latter when the injury did not stabilize before the death of the victim.
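
Because this field mixes real dates with the two special values, it can help to separate the three cases when reading the labels. The sketch below is one possible way to do that; the function name `parse_consolidation_label` and the status strings are hypothetical, and it assumes date labels follow the same dd/mm/yyyy format as the accident date.

```python
from datetime import datetime

MISSING = "n.c."         # date not given in the document
NOT_APPLICABLE = "n.a."  # injury did not stabilize before the victim's death


def parse_consolidation_label(value):
    """Return (status, date) where status is 'date', 'missing' or 'not_applicable'."""
    cleaned = value.strip().lower()
    if cleaned == MISSING:
        return ("missing", None)
    if cleaned == NOT_APPLICABLE:
        return ("not_applicable", None)
    # Anything else is assumed to be dd/mm/yyyy; strptime raises ValueError otherwise.
    return ("date", datetime.strptime(cleaned, "%d/%m/%Y").date())
```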

Benchmark description

Sex of the victim

The sex of the victim was classified by counting occurrences of certain keywords in the text, such as 'il' vs 'elle', 'monsieur' vs 'madame', "né" vs "née"...
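
The keyword-counting idea can be sketched in a few lines. This is not the benchmark code: the keyword lists contain only the three pairs named above, the function name `classify_sex` is hypothetical, and breaking ties in favour of "homme" is an arbitrary assumption.

```python
import re
from collections import Counter

# Keyword pairs named in the benchmark description; the real lists may be longer.
MALE_WORDS = {"il", "monsieur", "né"}
FEMALE_WORDS = {"elle", "madame", "née"}


def classify_sex(text):
    """Return 'femme' if female keywords outnumber male ones, else 'homme'."""
    # Lowercase, then split into word tokens so that "qu'il" yields "il".
    tokens = Counter(re.findall(r"\w+", text.lower()))
    male = sum(tokens[word] for word in MALE_WORDS)
    female = sum(tokens[word] for word in FEMALE_WORDS)
    # Ties (including no keyword at all) fall back to "homme" by assumption.
    return "femme" if female > male else "homme"
```

A known weakness of this approach is that the text also mentions judges, lawyers and witnesses, so the counts do not refer to the victim alone.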


To extract the dates, we took all sentences in the text that contain a date and classified those sentences using a bag-of-words representation. The classification was done by an SVM classifier.
In every file, and for each field ("date accident" and "date consolidation"), we ranked the sentences by their SVM score. Our prediction for each field is the date from the best-scoring sentence if that score was above a threshold; otherwise we predicted "n.c.".
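
The ranking-and-threshold step can be sketched as follows. To stay self-contained, this example replaces the trained SVM with a hand-set linear bag-of-words scorer: the weights, the bias, the threshold of 0.0 and all function names are hypothetical, and in practice the weights would come from a classifier fitted on labelled sentences. The sentence splitter and the date pattern (numeric dd/mm/yyyy only) are also deliberate simplifications.

```python
import re

DATE_PATTERN = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")

# Hypothetical per-word weights standing in for a trained linear SVM.
ACCIDENT_WEIGHTS = {"accident": 1.5, "victime": 0.5, "consolidation": -1.0, "arrêt": -0.5}
ACCIDENT_BIAS = -1.0


def date_sentences(text):
    """Yield (sentence, first date in it) for each sentence containing a date."""
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        match = DATE_PATTERN.search(sentence)
        if match:
            yield sentence, match.group(0)


def score(sentence, weights, bias):
    """Linear bag-of-words score: bias plus the weight of every word occurrence."""
    words = re.findall(r"\w+", sentence.lower())
    return bias + sum(weights.get(word, 0.0) for word in words)


def predict_date(text, weights, bias, threshold=0.0):
    """Return the date of the best-scoring sentence, or 'n.c.' if not above threshold."""
    best_score, best_date = None, None
    for sentence, found in date_sentences(text):
        current = score(sentence, weights, bias)
        if best_score is None or current > best_score:
            best_score, best_date = current, found
    if best_score is None or best_score <= threshold:
        return "n.c."
    return best_date
```

One set of weights is needed per field, so predicting the consolidation date would call `predict_date` with a second, separately chosen weight dictionary.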


The data files are accessible once you are logged in and registered for the challenge.

The challenge provider


Machine learning estimations for insurance companies

Congratulations to the winners of the challenge

1 Omar Kadim, Sébastien Mcrae, Mohamed Ali Chaieb
2 Tristan Dot, Mathieu Rita
3 Ariane Alix
