Population-Level Medical AI Using 3D Self-Supervised Learning
This is deeply personal for me. In 2020, my father, Alan, developed a stomachache. A CT scan revealed a subtle pattern—what’s called a double-duct sign. It often means something serious, but there was no visible tumor. CT, CT with contrast, MRI, even MRI with contrast—all showed the same sign, but no clear cause. After three months of scan after scan, an endoscopic ultrasound finally revealed a hidden pancreatic cancer. By the time my dad received treatment, it had spread. He fought for two years. He didn’t survive.
So we built this AI so that tumors like his can be diagnosed the moment the first CT scan is taken. See how our AI instantly finds double-duct signs among 300,000 medical images.
TL;DR
- 300k+ organs from CT, MRI (T1, T2, T1GD, FLAIR, DTI, MRA, PD), and mammogram learned without labels
- Embeddings form searchable graphs with search accuracy at the level of the individual: you can literally find a person by any one of their organs
- Used to detect rare conditions and estimate patient properties
- Motivated by real-world clinical gaps and personal loss
- Commissioned by the University of Calgary, with funding from Alberta Innovates
91% k‑Nearest Neighbor Accuracy — No Labels Required


See details & per‑organ accuracy
Our self‑supervised models learned from 300,000 CT, MRI, and mammogram organ images without labels. The ResNet18 backbone achieves 91% k‑Nearest Neighbor accuracy across 130 categories; the ViT-10 backbone achieves 88% (see the table below for per‑organ results).
Organ | ResNet18 Accuracy | ViT-10 Accuracy | Test Samples
---|---|---|---
Overall | 91.44% | 87.71% | -- |
Adrenal Gland Left (CT) | 99.54% | 96.94% | 434 |
Adrenal Gland Right (CT) | 99.78% | 99.12% | 454 |
Aorta (CT) | 98.14% | 98.63% | 483 |
Autochthon Left (CT) | 95.20% | 93.55% | 500 |
Autochthon Right (CT) | 96.27% | 94.46% | 510 |
Brain (CT) | 92.86% | 87.88% | 28 |
Clavicula Left (CT) | 97.92% | 94.25% | 96 |
Clavicula Right (CT) | 100.00% | 96.74% | 85 |
Colon (CT) | 96.23% | 93.43% | 504 |
Duodenum (CT) | 95.60% | 94.09% | 432 |
Esophagus (CT) | 97.89% | 97.38% | 379 |
Face (CT) | 90.62% | 91.84% | 32 |
Femur Left (CT) | 95.81% | 90.89% | 454 |
Femur Right (CT) | 97.09% | 91.46% | 412 |
Full Torso (CT) | 99.06% | 98.22% | 640 |
Gallbladder (CT) | 98.45% | 97.93% | 386 |
Gluteus Maximus Left (CT) | 92.34% | 88.81% | 444 |
Gluteus Maximus Right (CT) | 95.73% | 86.89% | 422 |
Gluteus Medius Left (CT) | 96.65% | 93.28% | 418 |
Gluteus Medius Right (CT) | 97.28% | 86.90% | 404 |
Gluteus Minimus Left (CT) | 98.17% | 94.36% | 436 |
Gluteus Minimus Right (CT) | 99.76% | 97.03% | 414 |
Heart Atrium Left (CT) | 96.35% | 90.16% | 137 |
Heart Atrium Right (CT) | 95.73% | 93.33% | 164 |
Heart Myocardium (CT) | 92.55% | 92.13% | 255 |
Heart Ventricle Left (CT) | 87.56% | 91.74% | 209 |
Heart Ventricle Right (CT) | 94.91% | 94.06% | 275 |
Hip Left (CT) | 92.60% | 87.95% | 446 |
Hip Right (CT) | 89.12% | 86.21% | 432 |
Humerus Left (CT) | 89.71% | 75.76% | 68 |
Humerus Right (CT) | 94.44% | 83.33% | 90 |
Iliac Artery Left (CT) | 86.51% | 81.25% | 415 |
Iliac Artery Right (CT) | 90.76% | 75.06% | 433 |
Iliac Vena Left (CT) | 92.89% | 93.93% | 408 |
Iliac Vena Right (CT) | 94.44% | 81.31% | 414 |
Iliopsoas Left (CT) | 97.23% | 94.26% | 433 |
Iliopsoas Right (CT) | 94.88% | 93.07% | 430 |
Inferior Vena Cava (CT) | 99.18% | 97.03% | 487 |
Kidney Left (CT) | 97.82% | 95.02% | 412 |
Kidney Right (CT) | 98.84% | 95.68% | 430 |
Liver (CT) | 98.46% | 99.11% | 456 |
Lung Lower Lobe Left (CT) | 97.77% | 95.95% | 449 |
Lung Lower Lobe Right (CT) | 97.88% | 98.28% | 472 |
Lung Middle Lobe Right (CT) | 97.43% | 96.67% | 350 |
Lung Upper Lobe Left (CT) | 97.24% | 93.52% | 398 |
Lung Upper Lobe Right (CT) | 91.07% | 94.69% | 112 |
Pancreas (CT) | 94.57% | 94.20% | 516 |
Portal Vein and Splenic Vein (CT) | 97.47% | 97.91% | 435 |
Pulmonary Artery (CT) | 98.00% | 93.68% | 100 |
Sacrum (CT) | 99.02% | 99.75% | 409 |
Scapula Left (CT) | 95.19% | 87.37% | 104 |
Scapula Right (CT) | 95.70% | 84.78% | 93 |
Small Bowel (CT) | 90.19% | 91.22% | 428 |
Spine Segment (CT) | 95.67% | 86.93% | 531 |
Spleen (CT) | 99.17% | 97.68% | 480 |
Stomach (CT) | 98.29% | 92.54% | 469 |
Trachea (CT) | 98.99% | 94.85% | 99 |
Tumour Colon (CT) | 86.67% | 15.38% | 15 |
Tumour Lung (CT) | 75.00% | 30.00% | 8 |
Tumour Pancreas (CT) | 94.12% | 75.86% | 34 |
Urinary Bladder (CT) | 99.74% | 98.87% | 390 |
Vertebrae C1 (CT) | 93.10% | 89.74% | 29 |
Vertebrae C2 (CT) | 82.50% | 82.86% | 40 |
Vertebrae C3 (CT) | 66.67% | 61.11% | 45 |
Vertebrae C4 (CT) | 51.52% | 46.34% | 33 |
Vertebrae C5 (CT) | 52.73% | 47.50% | 55 |
Vertebrae C6 (CT) | 79.25% | 57.14% | 53 |
Vertebrae C7 (CT) | 78.90% | 80.00% | 109 |
Vertebrae L1 (CT) | 94.49% | 82.81% | 490 |
Vertebrae L2 (CT) | 92.39% | 81.41% | 486 |
Vertebrae L3 (CT) | 89.10% | 78.08% | 477 |
Vertebrae L4 (CT) | 91.43% | 84.33% | 490 |
Vertebrae L5 (CT) | 96.58% | 96.21% | 439 |
Vertebrae L6 (CT) | 0.00% | 0.00% | 4 |
Vertebrae T1 (CT) | 84.75% | 83.59% | 118 |
Vertebrae T10 (CT) | 78.42% | 62.22% | 329 |
Vertebrae T11 (CT) | 88.11% | 66.60% | 454 |
Vertebrae T12 (CT) | 93.14% | 77.62% | 510 |
Vertebrae T13 (CT) | 0.00% | 0.00% | 1 |
Vertebrae T2 (CT) | 80.00% | 83.15% | 115 |
Vertebrae T3 (CT) | 77.88% | 71.43% | 113 |
Vertebrae T4 (CT) | 66.33% | 63.89% | 98 |
Vertebrae T5 (CT) | 53.39% | 37.84% | 118 |
Vertebrae T6 (CT) | 47.06% | 45.79% | 119 |
Vertebrae T7 (CT) | 42.50% | 32.43% | 120 |
Vertebrae T8 (CT) | 41.22% | 26.76% | 148 |
Vertebrae T9 (CT) | 66.67% | 40.26% | 204 |
Full Brain (DTI) | 100.00% | 100.00% | 818 |
Edema Brain (FLAIR) | 32.69% | 67.65% | 52 |
Enhancing Tumour Brain (FLAIR) | 25.42% | 23.73% | 59 |
Full Brain (FLAIR) | 69.70% | 64.84% | 99 |
Non-Enhancing Tumor Brain (FLAIR) | 8.97% | 12.73% | 78 |
Full Brain (MRA) | 100.00% | 100.00% | 73 |
Full Brain (PD) | 95.24% | 100.00% | 63 |
Edema Brain (T1GD) | 37.93% | 42.19% | 58 |
Enhancing Tumour Brain (T1GD) | 9.09% | 22.22% | 44 |
Full Brain (T1GD) | 51.28% | 82.08% | 78 |
Non-Enhancing Tumor Brain (T1GD) | 9.09% | 7.50% | 66 |
Edema Brain (T1) | 22.22% | 59.70% | 54 |
Enhancing Tumour Brain (T1) | 6.90% | 27.27% | 58 |
Full Brain (T1) | 55.56% | 68.09% | 90 |
Full Head (T1) | 100.00% | 100.00% | 77 |
Non-Enhancing Tumor Brain (T1) | 6.67% | 20.37% | 60 |
Edema Brain (T2) | 16.33% | 58.67% | 49 |
Enhancing Tumour Brain (T2) | 11.86% | 20.00% | 59 |
Full Brain (T2) | 88.59% | 91.43% | 149 |
Non-Enhancing Tumor Brain (T2) | 4.55% | 15.25% | 66 |
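The evaluation behind this table is standard k-Nearest Neighbor classification over learned embeddings. Below is a minimal, self-contained sketch of that protocol using synthetic vectors in place of the real organ embeddings; the embedding dimension, number of classes, and k are illustrative assumptions, not the values used in our experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for learned organ embeddings: 3 "organ" classes in 64-D.
n_per_class, dim = 50, 64
centers = rng.normal(0, 5, size=(3, dim))
X = np.vstack([c + rng.normal(0, 1, size=(n_per_class, dim)) for c in centers])
y = np.repeat([0, 1, 2], n_per_class)

def knn_accuracy(X, y, k=5):
    """Leave-one-out k-NN classification accuracy on embedding vectors."""
    # Pairwise Euclidean distances between all embeddings.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)          # exclude the query itself
    nn = np.argsort(d, axis=1)[:, :k]    # k nearest neighbors per sample
    pred = np.array([np.bincount(votes).argmax() for votes in y[nn]])
    return (pred == y).mean()

print(f"k-NN accuracy: {knn_accuracy(X, y):.2%}")
```

A point is classified correctly when the majority of its nearest neighbors share its label, so this metric directly measures how cleanly the embedding space separates categories without ever training a classifier.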
Anatomy Behaves Like a Network

Above: Each dot in this graph represents a 3‑D MRI scan of a human brain as learned by our self-learning model. Blue dots represent biological males, red dots represent biological females. Six degrees of separation is in effect here.
Learn more about these networks
Our self‑supervised model learned how these ~1 000 brains relate to one another — without any labels — and organized them into a small‑world, scale‑free network: About 5 % of brains act as “hub‑brains” (prototypes) at the center. The rest branch outward like relations in a social network, only a few “steps” away from any other brain (think six degrees of separation).
Notice the split: the large cluster on the left is entirely male, while the large cluster on the right mixes male and female brains. We don’t yet know why — but this is the power of discovery: the model finds patterns we didn’t expect, giving us new questions to explore.
Self‑supervised learning organizes anatomy into scale‑free, small‑world graphs — the same kind of networks we see in biology and even social systems. In simple terms, the model is finding the most efficient way to connect similar organs, like neighborhoods linked by a few well‑connected “hubs.”
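One way to see this hub structure is to build a directed k-nearest-neighbor graph over the embeddings and look at how often each point is chosen as someone else's neighbor. The sketch below uses random vectors as stand-ins for the ~1,000 brain embeddings (the dimensionality and k are assumptions); the mean in-degree equals k by construction, while the skewed maximum reveals hub-like nodes.

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in for ~1,000 brain embeddings from the self-supervised model.
n, dim, k = 1000, 64, 5
X = rng.normal(size=(n, dim))

# Squared pairwise distances via the expansion |x-y|^2 = |x|^2 + |y|^2 - 2x.y
sq = (X**2).sum(axis=1)
d2 = sq[:, None] + sq[None, :] - 2 * (X @ X.T)
np.fill_diagonal(d2, np.inf)               # a brain is not its own neighbor

# Each node points to its k most similar nodes; count incoming edges.
nn = np.argsort(d2, axis=1)[:, :k]
in_degree = np.bincount(nn.ravel(), minlength=n)

print("mean in-degree:", in_degree.mean())  # equals k by construction
print("max in-degree:", in_degree.max())
print(f"fraction of high-degree 'hub' nodes: {(in_degree > 2 * k).mean():.1%}")
```

Even on random high-dimensional data, the in-degree distribution is uneven; on learned anatomical embeddings the effect is much stronger, with a few prototype organs acting as the well-connected hubs described above.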
This isn’t something we programmed in — it happens on its own, like a phase change when water freezes. And that’s where the discoveries begin. For example, some organs, like the bladder, cluster by biological sex, while many others, like brains (see above), do not. This is tremendously interesting!
Another surprise: these learned representations don’t follow the familiar “bell curve” (Gaussian distribution) that so much of medical science assumes. Instead, they have a long tail — meaning rare, unusual cases carry far more weight than we typically expect.
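A quick numerical way to test for a long tail is excess kurtosis, which is zero for a Gaussian and strongly positive for heavy-tailed distributions. The sketch below contrasts a Gaussian feature with a log-normal one as stand-ins; the actual statistics one would run on the learned representations are an assumption here, not our published analysis.

```python
import numpy as np

rng = np.random.default_rng(2)

def excess_kurtosis(x):
    """Excess kurtosis: ~0 for a Gaussian, strongly positive for heavy tails."""
    z = (x - x.mean()) / x.std()
    return (z**4).mean() - 3.0

# Stand-ins: a Gaussian feature vs. a long-tailed (log-normal) feature.
gaussian = rng.normal(size=100_000)
long_tail = rng.lognormal(mean=0.0, sigma=1.0, size=100_000)

print(f"Gaussian excess kurtosis:  {excess_kurtosis(gaussian):.2f}")
print(f"Long-tail excess kurtosis: {excess_kurtosis(long_tail):.2f}")
```

When rare cases carry this much of the distribution's weight, summary statistics built on Gaussian assumptions (means, standard deviations, z-scores) systematically understate how informative the outliers are.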

Medical Search: Finding a Needle in a Haystack. Searching for Patient Priors and Disease Across 300,000 Scans
Our model learns organs like fingerprints — unique to each person — opening the door to a new kind of personalized medicine. It doesn’t just group organs by type; it recognizes them at the level of the individual. This makes for a search engine so accurate that our model can literally find individuals in a population using any one of their organs as a search query.

Above: An organ (outlined in bold) is used as a search query, and our model finds the most similar results (indicated by the arrows). The left query is an L1 vertebra with a bone island (purple outline), and other bone-island L1 vertebrae are returned. The right query shows a pancreas with a double-duct sign (yellow outline), and other double-duct-sign instances are returned. Searching through all 300,000 organs takes only about 0.0005 seconds, which enables population-scale searching.
See additional cases & results
To expand on how remarkable this is: one patient in our dataset has four torso CT scans. From one scan we select 45 different organs — L-vertebrae, pancreas, gallbladder, duodenum, stomach, left and right adrenal glands, heart ventricles, kidneys, psoas, glutes, and more — a good variety of organs. Since the query scan is held out, three instances of each of his organs remain in our 300k-organ dataset. For 65% of these organs, the search returns all three other instances as the top 3 results, and this is with a tiny research-sized model. To reiterate just how sensitive the search is, our model finds this patient's L1–L5 vertebrae as the top 3 search results 100% of the time. So, inputting his L3 vertebra as a search query, the top 3 results are his three other L3 vertebrae — out of roughly 6,000 L3 vertebrae and 52,000 vertebrae of all types in the dataset. The model doesn't just organize by organ type; it clusters anatomy at the level of the individual.
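Sub-millisecond search over 300,000 organs is achievable because the query reduces to a single matrix–vector product against a pre-normalized embedding index. The sketch below shows this brute-force cosine search with random stand-in embeddings; the embedding dimension and the cosine metric are assumptions for illustration, not a description of our production index.

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in index: 300,000 organ embeddings, unit-normalized once up front.
n, dim = 300_000, 64
index = rng.normal(size=(n, dim)).astype(np.float32)
index /= np.linalg.norm(index, axis=1, keepdims=True)

def search(query, k=3):
    """Return indices of the k most cosine-similar organs to the query."""
    q = query / np.linalg.norm(query)
    scores = index @ q                      # one matrix-vector product
    top = np.argpartition(-scores, k)[:k]   # k best, unordered
    return top[np.argsort(-scores[top])]    # ordered by similarity

# Query with a slightly perturbed copy of organ #42 — it should rank first.
query = index[42] + rng.normal(scale=0.01, size=dim).astype(np.float32)
print(search(query))
```

`argpartition` avoids fully sorting 300,000 scores, so the per-query cost is dominated by the single matrix–vector multiply — which is why this scales to population-level datasets.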
How the Model Sees: Patch‑Level Visualizations
Learn more about these visualizations
In each video, multiple panes are shown from left to right. The first pane displays the original scan image, while each successive pane shows three principal components: the first component mapped to the red channel, the second to green, and the third to blue. Moving across the panes from left to right, the principal components increase in groups of three (e.g., components 1–3, then 4–6, then 7–9, and so on).
The videos include a BIRADS 5 tomosynthesis (3D) mammogram, a T1‑weighted contrast‑enhanced (T1Gd) MRI of the brain, a CT of a left kidney, and a CT of a liver.
It's especially obvious in the liver video that the model learns to focus on the correct anatomy despite having no labels or annotations during training — this focus emerges automatically.
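The RGB panes described above come from projecting patch features onto their principal components and mapping the first three to color channels. Here is a minimal sketch of that mapping with random stand-in patch embeddings; the patch-grid size and feature dimension are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)

# Stand-in for ViT patch embeddings of one scan slice:
# a 14x14 grid of patches, each with a 384-D feature vector.
h, w, dim = 14, 14, 384
patches = rng.normal(size=(h * w, dim))

# PCA via SVD on mean-centered features.
centered = patches - patches.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
components = centered @ vt[:3].T            # first 3 principal components

# Rescale each component to [0, 1] and map to the R, G, B channels.
lo, hi = components.min(axis=0), components.max(axis=0)
rgb = ((components - lo) / (hi - lo)).reshape(h, w, 3)
print(rgb.shape)
```

Successive panes repeat this with components 4–6, 7–9, and so on (i.e., `vt[3:6]`, `vt[6:9]`), which is why later panes highlight progressively finer-grained structure.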
Segmentation Without Skip Connections — 94% DICE

Learn more about the segmentation model
The U-Net architecture was a game‑changer for medical image segmentation. It uses a clever design with an encoder–decoder setup connected by “skip connections,” which pass information directly from the early layers of the network to the later ones. This gives the model strong “hints” about what the correct segmentation should look like. But these shortcuts come with downsides: they can cause the model to rely too heavily on those direct connections, making it less flexible when dealing with unfamiliar data. They can also introduce small visual errors—often called “floaters”—in the final results.
To overcome these issues, we built a new segmentation model that removes skip connections entirely. This forces the network to build a deeper understanding of the images it sees, improving its ability to handle new, unseen data and reducing visual artifacts in its outputs.
Our model also uses a Vision Transformer—a type of neural network that can “look” at an image as a whole, not just piece by piece. This gives it a global perspective and opens the door for future features like prompt‑based segmentation. Even without prompting, our model achieves a 94% DICE score on the TotalSegmentator dataset, performing on par with leading models like nnU-Net but with greater consistency.
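For reference, the DICE score is the standard overlap metric for segmentation: twice the intersection of the predicted and ground-truth masks, divided by their total size. A minimal implementation:

```python
import numpy as np

def dice(pred, target, eps=1e-7):
    """Dice coefficient between two binary masks: 2|A∩B| / (|A| + |B|)."""
    pred, target = pred.astype(bool), target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

# Toy example: a 4-voxel target mask and a 6-voxel prediction that
# fully covers it -> Dice = 2*4 / (4 + 6) = 0.8.
target = np.zeros((4, 4), dtype=bool); target[1:3, 1:3] = True
pred = np.zeros((4, 4), dtype=bool);   pred[1:3, 1:4] = True
print(f"Dice: {dice(pred, target):.2f}")
```

The `eps` term keeps the score defined when both masks are empty, a common convention when averaging Dice over many organs, some of which may be absent from a given scan.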
The system is built for scale, supporting distributed training and inference using PyTorch. We’ll be open‑sourcing the model soon so others can explore and build on it.