MLM: A Benchmark Dataset for Multitask Learning with Multiple Languages and Modalities

Jason Armitage
University of Bonn
Golsa Tahmasebzadeh
University of Bonn
Jožef Stefan Institute


We introduce the MLM (Multiple Languages and Modalities) dataset, a new resource for training and evaluating multitask systems on samples in multiple modalities and three languages. The generation process and the inclusion of semantic data yield a resource that further tests the ability of multitask systems to learn relationships between entities.

For more information and analysis, check out our paper at  

MLM Dataset

The resource includes three versions of the dataset, all of which contain multimodal and multilingual samples with semantic data.

Supported Languages: English (EN), French (FR), German (DE)

Supported Modalities: Text, Image, Geocoordinate

Semantic Data: Triple classes (i.e. the component entities of the triples stored in Wikidata for the sample entity)
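As a rough illustration of how one sample combining the languages, modalities and semantic data listed above might be held in memory, here is a minimal Python sketch. The class name `MLMSample`, every field name, the image path and the triple class labels are hypothetical and chosen for illustration only; they are not the dataset's actual schema, which is given in the field description.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Language codes as listed in this README.
SUPPORTED_LANGUAGES = {"EN", "FR", "DE"}


@dataclass
class MLMSample:
    """Hypothetical container for one entity; field names are assumptions."""

    entity_id: str                    # e.g. a Wikidata identifier
    texts: Dict[str, str]             # language code -> text for the entity
    image_paths: List[str]            # paths to images of the entity
    coordinates: Tuple[float, float]  # (latitude, longitude) in degrees
    triple_classes: List[str]         # component entities of Wikidata triples

    def problems(self) -> List[str]:
        """Return a list of basic consistency problems (empty if none)."""
        found = []
        unknown = set(self.texts) - SUPPORTED_LANGUAGES
        if unknown:
            found.append(f"unsupported languages: {sorted(unknown)}")
        lat, lon = self.coordinates
        if not -90.0 <= lat <= 90.0:
            found.append(f"latitude out of range: {lat}")
        if not -180.0 <= lon <= 180.0:
            found.append(f"longitude out of range: {lon}")
        return found


# Illustrative values only, not taken from the dataset.
sample = MLMSample(
    entity_id="Q64",
    texts={
        "EN": "Berlin is the capital of Germany.",
        "DE": "Berlin ist die Hauptstadt Deutschlands.",
    },
    image_paths=["images/Q64_0.jpg"],
    coordinates=(52.52, 13.405),
    triple_classes=["city", "capital"],
)
print(sample.problems())
```

A check such as `problems()` can be useful when loading the data, since all three parts of a sample (text, image, geocoordinate) need to be present and well-formed before they are fed to a multitask model.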

MLM can be downloaded from this link  

Field Description:

Number of Entities by Continent (000):

From MLM, MLM-irle is generated for the benchmark evaluation tasks in the paper. The evaluation comprises two tasks: cross-modal retrieval and location estimation. In addition, to balance the uneven distribution of human settlements in the dataset (visible in the above chart) and to serve organisations that focus on the European Union, we have created a further version of the dataset, MLM-irle-gr (i.e. geo-representative), which provides geographically balanced coverage of human settlements in Europe.

Dataset Statistics:

Dataset Splits for Benchmark Evaluation Tasks:

Language Statistics:


MLM Benchmark Evaluation for Multitask Systems

In the paper, a set of benchmark evaluation tasks is outlined to train and evaluate systems designed to perform multiple tasks over diverse data.

Benchmark Evaluation Objective: Surpass results presented in the following tables for the baseline IR+LE framework in the paper (also see figure below) on both tasks.

Cross-modal Retrieval Results for MLM:

Location Estimation Results for MLM:

Note: In the above, medR is median rank at 500, F1 is weighted F1, P is precision, and R is recall (see report  ).
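For readers who want to compute comparable numbers, the metrics named in the note can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the evaluation code from the paper: it assumes medR is taken over a pool of paired items (500 in the tables above) in which query i matches candidate i, and that P and R are weighted by class support in the same way as F1. The function names `median_rank` and `weighted_prf` are hypothetical.

```python
from collections import Counter
from statistics import median


def median_rank(similarity):
    """Median 1-based rank of the true match.

    similarity[i][j] is the score of query i against candidate j;
    the true match for query i is assumed to be candidate i.
    """
    ranks = []
    for i, row in enumerate(similarity):
        true_score = row[i]
        # Rank = 1 + number of candidates scored strictly higher.
        ranks.append(1 + sum(1 for score in row if score > true_score))
    return median(ranks)


def weighted_prf(y_true, y_pred):
    """Support-weighted precision, recall and F1 over the true classes."""
    support = Counter(y_true)
    total = len(y_true)
    precision_sum = recall_sum = f1_sum = 0.0
    for label, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        predicted = sum(1 for p in y_pred if p == label)
        precision = tp / predicted if predicted else 0.0
        recall = tp / count
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        weight = count / total
        precision_sum += weight * precision
        recall_sum += weight * recall
        f1_sum += weight * f1
    return precision_sum, recall_sum, f1_sum


# Toy inputs for illustration only.
toy_similarity = [
    [0.9, 0.1, 0.2],
    [0.8, 0.5, 0.3],
    [0.1, 0.2, 0.7],
]
print(median_rank(toy_similarity))
print(weighted_prf(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```

Lower medR is better (1 means the true match is typically ranked first), while higher F1, P and R are better. For a pool of 500, `similarity` would be a 500 x 500 matrix of scores between the query modality and the candidate modality.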

A Few Examples of Cross-Modal Retrieval:

A Few Examples of Location Estimation:


Multitask IR+LE Framework

Usage Information


License: Creative Commons Attribution 4.0 International



GitHub repository for the code: repo link

Link to Zenodo for downloading the data: MLM link

Cite As

	author       = {Armitage, Jason and Kacupaj, Endri and Tahmasebzadeh, Golsa and Swati},
	title        = {MLM: A Benchmark Dataset for Multitask Learning with Multiple Languages and Modalities},
	month        = jun,
	year         = 2020,
	publisher    = {Zenodo},
	version      = {version 1.0.0},
	doi          = {10.5281/zenodo.3885753},
	url          = {}


The project leading to this publication has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Skłodowska-Curie grant agreement No 812997.