Context – Fraud Detection Contest : Find it !

Forensics research is quite a sensitive topic. Data are either private or unlabeled and most of related works are evaluated on private datasets with a restricted access. This restriction has two major consequences: results can’t be reproduced and no benchmarking can be done between every approach.

This contest was conceived in order to address these drawbacks:
First, a database containing no private information was designed, so that it can be publicly spread and used with no constraints (blurring, masking or anything else).
Second, the contest should finally presents a benchmark on major works developed since many years.

Because they can be easily modified with common and usual tools, frauds on documents have seriously increased. For instance, a dishonest person can be easily attempted to modify amount of purchases on types of document admitted as evidence, such as invoices or receipts, in order to earn more money from insurance in case of theft or fire. Receipts can also be provided as expense report by employees. We can imagine there are cases of falsifications of name of purchased products, to respect constraints of reimbursement, or of the address of restaurant to prove the presence at the good place. For all these reasons, we choose for this contest to focus on this type of documents.

Recent research in document forensics are mostly focused on the analysis of images of documents. However, we believe that Natural Language Processing (NLP) and Knowledge Engineering (KE) could be used to improve the performance of fraudulent document detection. Document is not only an image: it contains textual information that can be processed, analyzed and verified. The aim of this contest is to provide an Image-Text parallel corpus and an unique benchmark to test and evaluate image-based and text-based methods.