
PLISM: Dataset Description

Large-scale histological image dataset toward stain and device-agnostic models

  • Color and texture in digital pathology images are affected by H&E staining conditions (e.g. Harris or Carazzi) and digitization devices (e.g. slide scanners or smartphones), which cause inter-institutional domain shifts.
  • PLISM is the first group-wise pathological image dataset that encompasses diverse tissue types stained under 13 H&E conditions and captured with 13 imaging devices (7 slide scanners and 6 smartphones).
  • The PLISM dataset was created to evaluate the robustness of AI models to domain shifts and to support the development of AI models that are robust to them.
  • The PLISM dataset consists of two subsets:
    • 1. PLISM-sm, where smartphone images were used as queries to create image groups for each staining condition corresponding to each tile image.
    • 2. PLISM-wsi, consisting of image groups for all staining conditions between WSIs for each tile image.
  • Paper Link: Ochi, M., Komura, D., Onoyama, T. et al. Registered multi-device/staining histology image dataset for domain-agnostic machine learning models. Sci Data 11, 330 (2024). https://doi.org/10.1038/s41597-024-03122-5
    • If you use the dataset for scientific work, please cite the above paper.
Left) Abbreviations for whole slide scanner vendors and the devices that correspond to each vendor. Right) Abbreviations for smartphone OS types and corresponding smartphones.
From the 64 staining conditions, we selected 12 types for each hematoxylin solvent. Mayer represents the reference staining condition. Detailed procedures are described in the Methods section.
Sample tile images (by device): AT2, GT450, P, S210, S360, S60, SQ, galaxy, iphone6, iphone13, itel, moto, redmi
- PLISM-sm(AT2): DL link: Link URL
- PLISM-sm(GT450): DL link: Link URL
- PLISM-sm(P): DL link: Link URL
- PLISM-sm(S210): DL link: Link URL
- PLISM-sm(S360): DL link: Link URL
- PLISM-sm(S60): DL link: Link URL
- PLISM-sm(SQ): DL link: Link URL
- PLISM-sm(galaxy): DL link: Link URL
- PLISM-sm(iphone6): DL link: Link URL
- PLISM-sm(iphone13): DL link: Link URL
- PLISM-sm(itel): DL link: Link URL
- PLISM-sm(motorola): DL link: Link URL
- PLISM-sm(redmi): DL link: Link URL
- PLISM-wsi(GIVH_AT2): DL link: Link URL
- PLISM-wsi(GIVH_GT450): DL link: Link URL
- PLISM-wsi(GIVH_P): DL link: Link URL
- PLISM-wsi(GIVH_S210): DL link: Link URL
- PLISM-wsi(GIVH_S360): DL link: Link URL
- PLISM-wsi(GIVH_S60): DL link: Link URL
- PLISM-wsi(GIVH_SQ): DL link: Link URL
Staining conditions: GIV, GIVH, GM, GMH, GV, GVH, HR, HRH, KR, KRH, LM, LMH, MY
- PLISM-wsi(GIV_AT2): DL link: Link URL
- PLISM-wsi(GIVH_AT2): DL link: Link URL
- PLISM-wsi(GM_AT2): DL link: Link URL
- PLISM-wsi(GMH_AT2): DL link: Link URL
- PLISM-wsi(GV_AT2): DL link: Link URL
- PLISM-wsi(GVH_AT2): DL link(Preparing): Link URL
- PLISM-wsi(HR_AT2): DL link: Link URL
- PLISM-wsi(HRH_AT2): DL link: Link URL
- PLISM-wsi(KR_AT2): DL link: Link URL
- PLISM-wsi(KRH_AT2): DL link: Link URL
- PLISM-wsi(LM_AT2): DL link: Link URL
- PLISM-wsi(LMH_AT2): DL link: Link URL
- PLISM-wsi(MY_AT2): DL link: Link URL

Workflow & Tissue types

Workflow from slide digitization to image registration.
a. Adenocarcinoma. b. Neuroendocrine carcinoma. c. Squamous cell carcinoma. d. Mucinous carcinoma. e. Gastrointestinal stromal tumor. f. Liver cancer. g. Epstein-Barr virus-positive gastric cancer. h. Salivary gland. i. Clear cell carcinoma. j. Hepatocellular carcinoma. k. Dedifferentiated liposarcoma.

Breakdown of images by tissue

Number of image tiles from PLISM-sm per tissue type.
The PLISM-sm subsets contain approximately 60 thousand image tiles in total.
The PLISM-wsi subsets contain approximately 0.3 million image tiles in total.

Dataset organization

Each tar.gz file contains the following files:
- H&E image file from PLISM-sm subset: (stain_name)/(device_name)/(top_left_x)_(top_left_y)_(right_lower_x)_(right_lower_y).png
- H&E image file from PLISM-wsi: (stain_name)_(device_name)/(stain_name)_(device_name)_(top_left_x)_(top_left_y).png
Each image file is 512x512 px.
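
The two path layouts above can be parsed back into their fields. The following is a minimal sketch, not part of the official dataset tooling: the names `TileInfo` and `parse_tile_path` are hypothetical, and it assumes that stain and device names contain no underscores or slashes and that coordinates are non-negative integers.

```python
import re
from typing import NamedTuple, Optional, Tuple


class TileInfo(NamedTuple):
    subset: str              # "sm" or "wsi"
    stain: str               # e.g. "GIVH"
    device: str              # e.g. "AT2"
    coords: Tuple[int, ...]  # (x0, y0, x1, y1) for sm, (x0, y0) for wsi


# Assumption: stain and device names contain no "/" or "_",
# and coordinates are non-negative integers.
_SM = re.compile(
    r"^(?P<stain>[^/_]+)/(?P<device>[^/_]+)/"
    r"(?P<x0>\d+)_(?P<y0>\d+)_(?P<x1>\d+)_(?P<y1>\d+)\.png$"
)
# The file name must repeat the stain and device of its directory.
_WSI = re.compile(
    r"^(?P<stain>[^/_]+)_(?P<device>[^/_]+)/"
    r"(?P=stain)_(?P=device)_(?P<x0>\d+)_(?P<y0>\d+)\.png$"
)


def parse_tile_path(path: str) -> Optional[TileInfo]:
    """Return the fields encoded in a relative tile path, or None if no layout matches."""
    m = _SM.match(path)
    if m:
        coords = tuple(int(m[k]) for k in ("x0", "y0", "x1", "y1"))
        return TileInfo("sm", m["stain"], m["device"], coords)
    m = _WSI.match(path)
    if m:
        return TileInfo("wsi", m["stain"], m["device"], (int(m["x0"]), int(m["y0"])))
    return None
```

For an illustrative path such as `MY_AT2/MY_AT2_1000_500.png`, the function returns a `TileInfo` with subset `"wsi"`, stain `"MY"`, device `"AT2"`, and coordinates `(1000, 500)`. Paths that match neither layout return `None` rather than raising, so unexpected archive members can be skipped.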

A CSV file contains the following information:
- Tissue Type: The specific type of human tissue represented in the image, chosen from among 46 possible tissue types.
- Stain Type: The specific staining condition applied to the image, chosen from among 13 possible conditions.
- Device Type: The specific type of imaging device used to capture the image, chosen from among 13 possible device types.
- Coordinate (PLISM-sm): The xy coordinates of the top left and bottom right corners of each image (e.g., 1000_500_0_0).
- Coordinate (PLISM-wsi): The xy coordinates of the top left of each image (e.g., 1000_500).
- Image Path: The relative path to each image.
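
The fields above can be used to collect the images that belong to the same tile across staining conditions and devices. The sketch below is illustrative only: the function name `group_by_tile` is hypothetical, the column headers (`Tissue Type`, `Coordinate`, `Image Path`) are assumed to match the field names listed here and may differ in the actual file, and it assumes that images of one registered group share the same coordinate value, which this page does not state explicitly.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def group_by_tile(
    csv_text: str,
    key_cols: Sequence[str] = ("Tissue Type", "Coordinate"),  # assumed header names
    path_col: str = "Image Path",                             # assumed header name
) -> Dict[Tuple[str, ...], List[str]]:
    """Map each (tissue, coordinate) key to the image paths that share it."""
    groups: Dict[Tuple[str, ...], List[str]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = tuple(row[c] for c in key_cols)
        groups[key].append(row[path_col])
    return dict(groups)
```

The CSV contents can be read with `pathlib.Path(...).read_text()` and passed in as `csv_text`. If the real headers differ, pass them through `key_cols` and `path_col` instead of editing the function.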

Licenses

This work is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).
Please use the dataset in accordance with the terms of CC BY 4.0.
If you would like to use the dataset for commercial purposes, please contact us (ishum-prm@m.u-tokyo.ac.jp).
