Automated quantitative analysis of peri-articular bone microarchitecture in HR-pQCT knee images
===============================================================================================

* Nathan J. Neeteson
* Sasha M. Hasick
* Roberto Souza
* Steven K. Boyd

## ABSTRACT

There is growing interest in applying HR-pQCT to image the knee, particularly in the study of osteoarthritis, which necessitates the development and validation of novel image analysis workflows. In this work, we present and validate the first fully automated workflow for *in vivo* quantitative assessment of peri-articular bone density and microarchitecture in the human knee. Bone segmentation models were trained by transfer learning with a large dataset of radius and tibia images (N=2,598) and fine-tuned on a knee image dataset (N=131), atlas-based registration was used to identify medial and lateral contact surfaces, and morphological operations combined these intermediate outputs to generate peri-articular regions of interest (ROIs) for morphological analysis. Accuracy was assessed with an external validation dataset (N=131), where predicted and reference morphological parameters showed excellent correspondence (0.86≤R2 ≤0.99), with moderate bias present in predictions of subchondral bone plate density (-80 mg HA/cm3) and thickness (+0.15 mm). Precision was assessed with a triple-repeat measures dataset (N=29), where the short-term precision RMS%CV estimates ranged from 0.7% to 3.5% when rigid registration was used to synchronize ROI generation across images.

## Introduction

High-resolution peripheral quantitative computed tomography (HR-pQCT) is an *in vivo* medical imaging modality that produces density-calibrated images with a nominal isotropic resolution of 60.7 µm, allowing for direct and standardized measurement of bone density and microarchitecture1. HR-pQCT was originally intended for measuring bone at peripheral skeletal sites, primarily at the ultra-distal radius and tibia2. However, there is growing interest in utilizing this technology to measure bone quality in more proximal sites, such as the knee3,4 and the elbow5. Data processing and analysis procedures for HR-pQCT radius and tibia images are well-established6, but extending this technology for use at other sites requires the development and validation of new tools and protocols.

Our lab developed and validated a protocol for analyzing the peri-articular bone in the knee using HR-pQCT7 and has so far employed it in two cross-sectional studies8,9 and one longitudinal study10. Briefly, we summarize the analysis steps7: First, the image is segmented, using the standard semi-automated HR-pQCT segmentation procedure11,12 to separate the background, trabecular bone, and cortical bone (which we define to also include the subchondral bone plate, for simplicity). Next, the articular contact surfaces on the tibial plateaus and femoral condyles are manually identified. Finally, the articular contact surfaces are axially extruded into the bone to define four peri-articular regions of interest (ROIs) per compartment: subchondral bone plate and shallow (0 – 2.5 mm), middle (2.5 – 5 mm), and deep (5 – 7.5 mm) trabecular bone. These ROIs are shown in a volume rendering of the femur and tibia from a sample HR-pQCT image in Figure 1. In each subchondral bone plate ROI, the subchondral bone plate bone mineral density (Sc.BMD) and thickness (Sc.Th) are measured. In each trabecular ROI, the trabecular bone mineral density (Tb.BMD), thickness (Tb.Th), separation (Tb.Sp), and number (Tb.N) are measured, using standard methods6.

![Figure 1.](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/05/21/2024.05.20.24307643/F1.medium.gif)

[Figure 1.](http://medrxiv.org/content/early/2024/05/21/2024.05.20.24307643/F1)

Figure 1. 
The peri-articular ROIs. The four compartments each have three trabecular ROIs (shallow - yellow, middle - orange, deep - red) and one subchondral bone plate ROI (green). The ROIs are labelled for the lateral femoral condyle, these labels also correspond to the same-coloured ROIs in each other compartment.

A major limitation of this protocol is that the semi-automated bone segmentation and the manual definition of articular contact surfaces incur a significant labour burden, not least due to the enormous size of a knee HR-pQCT image relative to the size of a more typical HR-pQCT image of the ultra-distal radius or tibia, hand, or foot. Furthermore, the subjectivity of these manual processes is a potential source of inter- and intra-operator errors and biases in study data13. A high labour burden limits the feasibility of scaling knee HR-pQCT studies to larger sample sizes, and inter-operator biases limit the ability to compare data between studies, scale studies to longer time periods, or conduct multi-center studies; all cases where data would be processed by multiple individuals and be subject to inter-operator biases.

Two well-established techniques for automating biomedical image segmentation are deep learning14 and atlas-based segmentation15. Fully convolutional neural networks, such as the UNet and its variants16, have been employed for segmentation to great success across many biomedical imaging modalities, including HR-pQCT17,18. In recent years, generative-adversarial19 and transformer-based20 architectures have also shown excellent performance for segmentation in both general and biomedical domains. Similarly, atlas-based segmentation is a well-established technique for biomedical image segmentation and has been applied successfully across several modalities and sub-domains15. While atlas-based segmentation is perhaps most well-known in the subdomain of neuroimaging, it also has a rich history of application in the subdomain of musculoskeletal imaging21.

The objective of this study is to develop and validate the first robust, fully automated protocol for peri-articular analysis of HR-pQCT knee images. We use deep learning for bone segmentation, atlas-based segmentation for articular contact surface labelling, and traditional morphological image processing operations to combine these segmentations and obtain the final peri-articular ROIs. We leverage several external validation datasets to evaluate the precision and accuracy of the developed protocol. This automated analysis protocol, which we intend to freely distribute, will accelerate HR-pQCT knee research by enabling studies to scale more efficiently and eliminating inter-operator biases in reported study data.

## Methods

### Datasets

Images used in this study were obtained using second-generation HR-pQCT (XtremeCT II, Scanco Medical AG, Brütisellen, Switzerland). All participants provided written informed consent before data collection. The Conjoint Health Research Ethics Board approved data collection at the University of Calgary (REB15-2108). There were three distinct sets of data used in this study: *training* data, external *validation* data, and *precision* data. Training data was used to train bone segmentation models and develop knee atlases. External validation data was used to assess the accuracy of the automated workflow, and precision data was used to quantify the precision of the automated workflow. All images in the external validation and precision datasets were scored for motion22, and images with a motion score of 4 or 5 (on a 1-to-5 scale) were excluded from the analysis.

### Training Data

There were two subsets of training data used in this study: the *distal leg/wrist* training data, and the *knee + background* training data. The *distal leg/wrist* training data contains pooled image data from two prior studies: a normative study (n=1,236)23 and a hip fracture study (n=108)24. These images were collected at standard scan sites for the distal radius and tibia, with 168 axial slices per image and a nominal isotropic resolution of 60.7 *µ*m. These images have reference segmentations labelling background, cortical bone, and trabecular bone, which were generated using the standard semi-automated protocol6. The full 3D images were preprocessed to create a training dataset consisting of 64x64x64 patch samples. Five patches were sampled from each full image with 50% of patches centered on cortical bone and 50% of patches randomly sampled from the full volume.

The *knee + background* training data contains image data from a previous longitudinal knee HR-pQCT study (n=19)8 investigating the effect of ACL injury on bone microarchitecture. In this study, each participant had bilateral HR-pQCT imaging of their knees at baseline (following ACL injury) and up to three follow-up visits. These images were collected following the protocol developed in our lab: the knee is placed at full extension in the scanner gantry and 1008 axial slices are collected, covering approximately 40 mm of the femur and 20 mm of the tibia, with a nominal isotropic voxel resolution of 60.7 *µ*m. Reference segmentations had been previously generated following the standard semi-automated protocol11,12 as well as articular contact surface masks and peri-articular ROIs generated7. The 3D images were preprocessed to create a training dataset consisting of 64x64x64 patch samples. From each articular contact surface in each image (medial femur, lateral femur, medial tibia, lateral tibia), 100 patches were extracted with 50% of the patches centred on the subchondral bone plate ROI and 50% of the patches randomly sampled from the vicinity of the trabecular peri-articular ROIs.

All knee images from prior studies had their background masked out as part of the data curation protocol that was followed at the time of those studies. To ensure the bone segmentation model could properly segment the background in new images, background patch samples were randomly sampled from 115 radius and 115 tibia images selected from the normative study dataset. From each of these images, five patches were sampled, centred on the background only. These background patches were pooled with the preprocessed knee image patches to create the *knee + background* training dataset.

### External Validation Data

The external test data contains pooled image data from three prior cross-sectional knee HR-pQCT studies: a cadaveric validation study (n=12)25 and two *in vivo* studies of bone microarchitecture in individuals who had undergone ACL reconstruction (n=589, n=708). These images were collected following the same protocol as the knee images in the training data, and have peri-articular ROIs generated following the same semi-automated protocols.

### Precision Data

A new repeat-measures dataset was collected to allow quantification of short-term precision of measurement workflows in knee HR-pQCT data. Adult participants were recruited using convenience sampling with a target sample size of 3026. Participants were screened for pregnancy at the time of recruitment by participant self-disclosure. HR-pQCT knee scans were performed on their non-dominant knee three times, with repositioning between scans, within a two week period. Knee HR-pQCT imaging followed the established protocol; however, to limit the total radiation dose, each scan contained only three stacks of 168 axial slices covering only the proximal tibia (excluding the distal femur).

### Bone Segmentation

#### Model Zoo and Training Procedure

Five distinct model architectures were considered for bone segmentation, all of which were 3D UNet variants or derivatives. Four models were imported from MONAI27: UNet28, UNet++29, UNETR20, and SegResNetVAE30. An additional fifth model, SeGAN19, was custom-implemented with PyTorch v1.12.1(+cu116)31. These specific models were selected due to their availability in an open-source package, their documented state-of-the-art segmentation performance, and because they each represent a distinct variation on the classic fully convolutional segmentation model architecture. All of the segmentation models take a 3D image with a single channel (rescaled densities) as input and produce a 3D image with three class channels (subchondral bone plate, trabecular bone, background) as output.

Training and validation were performed using five-fold cross-validation (CV)32. Models were trained using the PyTorch Lightning framework v1.9.433 with a cross-entropy loss function, the AdamW optimizer34, and a batch size as large as would fit in memory for a given model using two NVIDIA A100 (2 x 80GB VRAM) GPUs, which varied based on model architecture and hyperparameters. In each fold, training proceeded for a minimum of 40 epochs with the following stopping conditions: four hours of wall clock time elapsed since the start of training, 1000 epochs, or no improvement in validation set subchondral bone plate Dice similarity coefficient (DSC)35 over the previous 40 epochs. The ranges of hyperparameters considered for each model architecture are documented in the supplemental information (Table S2).

#### From-scratch Learning

A hyperparameter grid search was performed with each of the five model architectures using five-fold CV with the *knee + background* training dataset. The best-performing model of each architecture was selected based on which had the highest mean validation set subchondral bone plate DSC. This resulted in five sets of five trained models: one model from each fold for each architecture. These were labelled the “from-scratch” trained models.

#### Transfer Learning

A hyperparameter grid search was performed with each of the five model architectures using five-fold CV with the *distal leg/wrist* training dataset. The best-performing model of each architecture was selected based on which had the highest mean validation set subchondral bone plate DSC (classifying radial and tibial cortical bone with the subchondral bone class). The models were reinitialized with the optimal hyperparameters and retrained using the full *distal leg/wrist* training dataset to convergence to be used as the pre-trained models. The pre-trained models were fine-tuned using five-fold CV with the *knee + background* training dataset. This resulted in five sets of five trained models, labelled the “transfer” trained models.

#### Ensemble Inference

Ensemble inference uses a set of models to produce a single prediction36. Each model performs patch-based inference on a full image using MONAI’s sliding window inferer27 with a patch width of 128 voxels, an overlap of 25%, and Gaussian blending. In our case, the five models are ensembled by summing their raw outputs element-wise before converting the outputs to voxel-wise class labels by finding the channel index with the largest predicted value at each voxel in the output. Ensemble inference is performed using a single NVIDIA V100 (16GB VRAM) rather than two A100s, as less memory is required for inference than for training.

### Segmentation Post-processing

Figure 2 shows a schematic of the segmentation post-processing workflow, including specific details of the morphological filters applied in each step. In step 1, the outputs from a set of trained models are ensembled to create the raw segmentation, labelling voxels as either subchondral bone plate, trabecular bone, or background. In step 2, the two bone labels (subchondral and trabecular) are combined to create a single bone mask. In step 3, morphological filters fill any gaps in the bone mask. In step 4, morphological filters keep only the largest connected component of the bone mask, which eliminates both small, “disconnected islands” resulting from segmentation errors and removes any secondary bones (e.g. in a proximal tibia image, the proximal fibula and distal femur are considered secondary). In step 5, the minimum subchondral bone plate is extracted from the filtered bone mask by eroding by 4 voxels and subtracting the eroded mask from the original mask. The result is combined with the raw subchondral bone plate mask. In step 6, a morphological filter removes all but the largest connected component from the subchondral bone mask and in step 7, the trabecular mask is created by subtracting this subchondral mask from the filtered bone mask. In steps 8 and 9, the trabecular bone mask is morphologically filtered to fill in any interior gaps and subtracted from the subchondral bone plate mask to ensure no overlap. Finally, in step 10, the trabecular and subchondral bone plate masks are combined into the final post-processed segmentation. All morphological and Boolean operations are performed using scikit-image v0.19.337 and NumPy v1.21.638.

![Figure 2.](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/05/21/2024.05.20.24307643/F2.medium.gif)

[Figure 2.](http://medrxiv.org/content/early/2024/05/21/2024.05.20.24307643/F2)

Figure 2. 
Segmentation post-processing workflow. Steps are shown as black arrows while intermediate outputs are visualized as 2D slices of the full 3D volume between steps. The workflow is demonstrated on a tibia with low bone density to demonstrate how the post-processing steps correct possible errors in the initial inferred segmentation.

### Atlas-Based Segmentation of Contact Surfaces

Articular contact surfaces on the tibial plateaus and femoral condyles are labelled on new images using atlas-based segmentation, with a separate atlas for each of the tibia and femur. The average atlases were created using the baseline knee images from the *knee + background* training dataset, which have manually generated masks defining the medial and lateral articular contact surfaces on the tibia and femur. First, images of left knees were reflected across the sagittal plane and combined with images of right knees so that all the images could be used to generate a single atlas (one for the tibia and one for the femur). An initial image is selected randomly and all other images are affinely registered to the initial image and averaged together to create the average atlas. Each of the original images is deformably registered to the average atlas and the contact surface masks from each image are transformed into the atlas space and combined to create the average atlas mask using the STAPLE algorithm39 with a binarization threshold of 0.5. The contact surfaces can be segmented in a new image by deformably registering the tibia and femur to their corresponding atlas and transforming the atlas mask to the new image. If the new image is from a left knee, the image is reflected sagitally before registration and the transformed masks are reflected sagitally back to the original orientation as the final step.

The optimal parameters for atlas generation and deformable registration were determined using a grid search (Table S1) and leave-one-out cross-validation, where there are as many folds as samples in the dataset. In each fold, one image was held out for validation while the rest were used to generate an atlas. The atlas was used to predict the contact surface mask on the validation image and the DSC of the reference and predicted mask were computed, which were averaged across folds. The set of parameters that produced the highest cross-validation DSC was considered optimal and used to generate a final atlas for the tibia and femur. The ranges of parameters considered in the atlas generation grid search are documented in the supplemental information, and the optimal procedure is reported in the Results section.

### Combined Workflow

The full combined workflow is depicted in Figure 3 for both cross-sectional and longitudinal analysis modes. The steps are described in full detail in the proceeding sections.

![Figure 3.](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/05/21/2024.05.20.24307643/F3.medium.gif)

[Figure 3.](http://medrxiv.org/content/early/2024/05/21/2024.05.20.24307643/F3)

Figure 3. 
The combined workflow to generate peri-articular ROIs from a knee HR-pQCT image. With cross-sectional data, ROIs are generated for individual images. With longitudinal data, peri-articular ROIs are generated for each image in a time series using rigid registration. The bone segmentations and contact surface masks are transformed to the baseline frame to synchronize both the spatial location of the contact surfaces and the direction of the extrusions that generate the trabecular ROIs.

#### Segmentations

First, a full knee image is manually separated into two sub-volumes, using a custom script integrated into the manufacturer-provided GUI program for image analysis (*µ*CT_Evaluation, v6.6, Scanco Medical AG, Brütisellen, Switzerland). The two sub-volumes contain the distal femur and proximal tibia, which are processed independently. Ensemble inference and post-processing are applied to obtain a bone compartment segmentation. Next, the medial and lateral contact surface masks are obtained by atlas-based segmentation using the corresponding atlas.

#### Longitudinal Registration

If the image data is longitudinal (i.e., repeated-measures), each follow-up image is rigidly registered to the baseline image before generating the peri-articular ROIs. For rigid registration, fixed and moving images are first downsampled by a factor of 4 and Gaussian smoothed with a standard deviation of 1 voxel. The rigid registration uses a Powell optimizer with a maximum of 100 line iterations, a step length of 1, a step tolerance of 10*−*6, a value tolerance of 10*−*6, a geometry-based initialization, linear interpolation, a correlation similarity metric, a regular sampling strategy, and a sampling rate of 30%. Multi-scale registration is performed with downscaling factors of 16, 8, 4, 2, 1 and corresponding Gaussian smoothing standard deviations of 8, 4, 2, 1, 0.1 voxels. Longitudinal registration is performed using SimpleITK v2.2.040. After registration, the follow-up bone segmentation and atlas-segmented contact surface mask are transformed to the baseline image space. The baseline and transformed follow-up contact surface masks are intersected to create the longitudinal contact surface masks in the baseline frame. This is combined with the bone segmentations, in the baseline space, to generate peri-articular ROIs for baseline and follow-up images. Follow-up peri-articular ROIs are transformed back to their respective follow-up space(s). If the image data is cross-sectional and only one image is available from each knee, the longitudinal registration step is skipped, and peri-articular ROIs are created using the atlas-segmented contact surface mask.

#### Peri-articular ROI Generation

A Gaussian filter (standard deviation: 1 voxel) is applied to smooth the medial and lateral contact surface masks. The smoothed contact surface masks are dilated axially out from the bone by 20 voxels (proximally for the tibia and distally for the femur) and intersected with the subchondral bone plate mask from the bone segmentation, and a connected components filter is applied to keep only the largest component. The resulting mask is dilated axially out from the bone by another 5 voxels and intersected once more with the subchondral bone plate mask from the bone segmentation. This creates the first two peri-articular ROIs: the medial and lateral subchondral bone plate. As in prior studies, the articular contact surfaces are extruded axially 7.5 mm into the bone (proximally for the femur and distally for the tibia) to create three trabecular peri-articular ROIs at three ranges of depth from the articular contact surface: shallow (0.0 – 2.5 mm), middle (2.5 mm – 5.0 mm), and deep (5.0 – 7.5 mm). This results in a total of eight ROIs in each of the femur and tibia.

#### Micro-architectural Analysis

Subchondral bone mineral density (Sc.BMD) and thickness (Sc.Th) are computed for the subchondral bone plate ROIs, while trabecular bone mineral density (Tb.BMD), thickness (Tb.Th), separation (Tb.Sp), and number (Tb.N) are computed for the shallow, middle, and deep peri-articular ROIs. All micro-architectural parameters are computed using standard manufacturer-implemented algorithms (IPL)6.

### Evaluation and Metrics

First, the combined workflow was applied to all images in the held-out test dataset in cross-sectional mode. The predicted and reference ROIs were used to perform micro-architectural analysis. The paired reference and predicted outputs were compared using both direct linear correlation analysis and the difference and mean values in Bland-Altman plots.

Next, the combined workflow was applied to all images in the precision dataset in both cross-sectional and longitudinal mode (using the first scan as the baseline and repeat scans as follow-ups). Generated ROIs were used to perform micro-architectural analysis and the cross-sectional and longitudinal results were used to compute the root-mean-square percentage coefficient of variation (RMS%CV) and least significant change (LSC)26 for all parameters at all ROIs in the tibia. The normality of the distribution of individual-level standard deviations was assessed with D’Agostino and Pearson’s method41 and the significance of differences in precision between the cross-sectional and longitudinal modes was tested using independent Wilcoxon signed-rank tests42, with the significance threshold adjusted for multiple testing using the Benjamini, Krieger, and Yekutieli two-step false discovery rate correction procedure (34 tests)43. The base significance threshold for all statistical tests was 0.05. Visualizations were created using PyVista v0.38.644, Matplotlib v3.5.345, pandas v1.3.546, seaborn v0.12.247, and Inkscape.

## Code Availability

The code used in this project is distributed across three GitHub repositories: utility code for training segmentation models in the PyTorch Lightning framework with HR-pQCT images ([https://github.com/Bonelab/bonelab-pytorch-lightning](https://github.com/Bonelab/bonelab-pytorch-lightning)), code for training, inference, and atlas-based segmentation ([https://github.com/Bonelab/hrpqct-knee-segmentation](https://github.com/Bonelab/hrpqct-knee-segmentation)), and code for organizing and analyzing the precision dataset ([https://github.com/Bonelab/triknee](https://github.com/Bonelab/triknee)).

## Results

### Datasets

The *distal leg/wrist* training dataset contained 2,598 total images (1,257 radii, 1,341 tibiae) from 1,276 participants (457 men, 819 women). The *knee + background* dataset contained 131 total images (each containing the femur and tibia) from 31 participants (10 men, 21 women). The external validation dataset contained 128 images from 64 participants (21 men and 43 women) and the precision dataset contained 87 images from 29 participants (10 men, 19 women).

### Segmentation Models

Figure 4 shows the mean cross-validation DSC for each class on the *knee + background* dataset for the best-performing version of each of the five architectures considered and with each training approach. All the best models achieve DSC≥0.94 at segmenting background and trabecular bone, except for the from-scratch trained SegResNetVAE. As expected, segmentation of subchondral bone plate is the most challenging aspect of the bone segmentation task. Most models achieved only 0.84≤ DSC≤0.87, the from-scratch trained SegResNetVAE achieved DSC = 0.66 (worse result), and the transfer trained SegResNetVAE and UNet++ achieved DSC = 0.89 (best results). Finally, the SegResNetVAE, UNet, UNet++, and UNETR show better performance across all classes when transfer trained than when trained from-scratch, while the reverse is true for the SeGAN. The transfer trained UNet++ was selected over the SegResNetVAE as the final model for bone segmentation as it had a lower standard deviation in subchondral bone plate DSC during cross-validation (UNet++: 0.889 ± 0.006, SegResNetVAE: 0.889 ± 0.014).

![Figure 4.](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/05/21/2024.05.20.24307643/F4.medium.gif)

[Figure 4.](http://medrxiv.org/content/early/2024/05/21/2024.05.20.24307643/F4)

Figure 4. 
Cross-validation results for the best-performing model of each architecture trained with each method. The bars show the mean DSC averaged across folds, while the error bars show the standard deviation of the DSC. The dashed line indicates DSC = 1.0.

### Atlas-Based Segmentation

The best-performing atlas-based segmentation method was the diffeomorphic demons algorithm with a displacement smoothing standard deviation of 2 and an update smoothing standard deviation of 2. Images are downsampled by a factor of 8 and smoothed with a Gaussian filter with a standard deviation of 0.5 prior to registration, and multiscale registration is performed with down-sampling factors of 16, 8, 4, 2 and corresponding Gaussian smoothing standard deviations of 8, 4, 2, 1. The average DSC values across the leave-one-out cross-validation for this optimal configuration were 0.81 and 0.83 for the medial and lateral tibia plateaus, respectively, and 0.83 and 0.84 for the medial and lateral femoral condyles, respectively.

### Peri-articular microarchitectural parameters

Running times for the various steps in the workflow are reported in Table S3. Ensemble inference is performed on an NVIDIA V100 (16GB), and all other steps are performed on the CPU.

#### Accuracy

Figure 5 shows Bland-Altman and correlation plots comparing the results of morphological analysis on the external validation dataset with the reference and predicted peri-articular ROIs. For all morphometric parameters in all trabecular ROIs, the limits of agreement contain zero mean bias error, and the predicted and reference values are highly correlated (R2≥0.97). In the tibial subchondral bone plate ROIs, the limits of agreement overlap with zero, indicating no bias for both Sc.BMD and Sc.Th. In the femoral subchondral bone plate ROIs, the limits of agreement do not overlap with zero and indicate over-prediction of Sc.Th (+0.15 mm, compared to an approximate range of 0.35 to 0.85 mm) and under-prediction of Sc.BMD (-80 mg HA/cm3, compared to an approximate range of 500 to 800 mg HA/cm3). Predicted and reference values are highly correlated for Sc.BMD in the femur (R2 = 0.95) and Sc.Th in the tibia (R2 = 0.94), while the correlations are weakest, though still quite strong, for Sc.BMD in the tibia (R2 = 0.86) and Sc.Th in the femur (R2 = 0.88).

![Figure 5.](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/05/21/2024.05.20.24307643/F5.medium.gif)

[Figure 5.](http://medrxiv.org/content/early/2024/05/21/2024.05.20.24307643/F5)

Figure 5. 
Linear correlation and Bland-Altman plots showing results on the external test datasets. On the linear correlation plots, the black line is y=x, the solid red line is a regression line, and the dashed red lines indicate the confidence interval for the regression. On the Bland-Altman plots, the black line is zero bias, the solid red line is the mean bias, and the dashed red lines are the limits of agreement.

Figure 6 shows sample visualizations of the compartments with the largest positive errors, the largest negative errors, and the median absolute disagreement in Sc.Th in the femur and tibia, with one row per image. There are two potential sources of disagreement: articular contact surface labelling, and subchondral bone plate segmentation. In each image, both types of error are present, with substantial disagreements at the medial and lateral edges of the image as well as at the endosteal boundary, which separates cortical bone, or subchondral bone plate, from trabecular bone. Notably, disagreements at the periosteal boundary are minimal, and correspondingly the differences at inter-ROI boundaries are also minimal. Finally, both the axial projections and sagittal slice overlays show good qualitative agreement between the predicted and reference ROIs, even for these four images with the largest errors in Sc.Th.

![Figure 6.](http://medrxiv.org/http://medrxiv.stage.highwire.org/content/medrxiv/early/2024/05/21/2024.05.20.24307643/F6.medium.gif)

[Figure 6.](http://medrxiv.org/content/early/2024/05/21/2024.05.20.24307643/F6)

Figure 6. 
Sample visualizations of the compartments with the largest over-estimation of Sc.Th, median absolute disagreement in Sc.Th, and under-estimations of Sc.Th in the femur (top three rows, respectively) and tibia (bottom three rows). In the left-most column we show axial projections of the predicted and reference ROIs, combined into a single compartmental ROI, where green represents true positives, red represents false positives, and yellow represents false negatives. In the two center columns, the predicted (left) and reference (right) ROIs are overlaid on the gray-scale image of a central sagittal slice (the location of which is indicated by a black line in the left-most column). Here, green is the subchondral bone plate and yellow, orange, and red are the shallow, middle, and deep trabecular bone ROIs, respectively. ROI errors are shown in red in the right-most column and are defined as any voxel where the predicted and reference ROI labels disagree.

#### Precision

Table 1 contains RMS%CV and LSC values for all morphological parameters in all peri-articular ROIs in the tibia, computed in both cross-sectional and longitudinal analysis modes with the repeat-measure precision dataset. RMS%CV values are ≤3.5% for all parameter-ROI combinations, indicating excellent reproducibility of morphometric measurements across repeat imaging with repositioning. Precision is best for Tb.BMD (0.8% to 1.5%) and Tb.Th (0.7% to 1.3%), while Sc.Th shows the relative worst precision (2.2% to 3.5%). For all but two parameter-ROI combinations, the differences in precision between cross-sectional and longitudinal analysis modes are not statistically significant. The exceptions are Sc.Th in the medial tibial plateau, where the precision is significantly improved with the longitudinal mode (RMS%CV improves from 3.5% to 2.2%) and Tb.N across all trabecular ROIs, where precision is statistically significantly worse with the longitudinal mode, though the effect is quite small (RMS%CV worsens from 2.8% to 2.9%). For nearly all ROIs across all parameters except Tb.N, the trend is for precision to be improved when the analysis is performed in the longitudinal mode compared to the cross-sectional mode, while this trend is reversed for Tb.N.

View this table:
[Table 1.](http://medrxiv.org/content/early/2024/05/21/2024.05.20.24307643/T1)

Table 1. RMS%CV and LSC values for all microarchitectural parameters in all compartments of the tibia, compared between the cross-sectional and longitudinal modes. Differences in RMS%CV between modes were tested using Wilcoxon signed-rank tests.

## Discussion

We have developed and validated a robust, automated protocol for peri-articular analysis of bone microarchitecture with HR-pQCT knee images. We found that of the segmentation models considered, the UNet++ trained by transfer learning achieved the most accurate segmentation of the subchondral bone plate in five-fold cross-validation, though many of the models performed nearly as well, while atlas-based registration by the diffeomorphic demons algorithm produced preliminary compartmental segmentations with DSCs of 0.81 to 0.84 in leave-one-out cross-validation. Using a held-out external validation test set, we found excellent accuracy and precision in standard bone morphometric analysis outputs when compared to the previous labour-intensive analysis workflow.

This is the first study to present an automated protocol for the analysis of peri-articular bone morphometry with *in vivo* HR-pQCT knee images and lays the foundation to leverage this technology more widely. While the application of HR-pQCT for studying bone quality in the human knee is relatively recent, there have been several studies demonstrating semi-automated protocols for both *in vivo* and *ex vivo* HR-pQCT knee images. These include the *in vivo* study of knee microarchitecture in OA patients and controls by dividing the tibial plateaus into four compartments (anterior, central, posterior, and medial or lateral)3,4,48, the *ex vivo* study of resected human femurs by manually creating volumes of interest, based on visual inspection49, and the *in vivo* protocol developed in our lab and described in the introduction7.

Comparing predicted to reference morphometric parameters in the test dataset (Figure 5), we observe strong correlation and high accuracy for all trabecular parameters (R2≥0.96) while for the subchondral bone plate parameters, the correlations are still strong but relatively weaker (0.86≤R2≤0.95). Sample visualizations (Figure 6) for the femur and tibia images show one of the biggest challenges is measuring the Sc.Th. This is not unexpected because the subchondral bone plate is extremely thin, which is evident both visually in Figure 6 and in the ranges of Sc.Th in Figure 5 (femur: ≤ 0.25 Sc.Th [mm] ≤1.2; tibia: 0.25 ≤Sc.Th [mm] ≤2.5). An extremely thin structure combined with endemic measurement artifacts (e.g. motion, partial volume effects, etc.) presents a difficult segmentation problem, and this is worsened by the possible presence of ambiguities in the definition of the boundary of the structure such as those caused by adjacent thickened trabeculae and porosities and/or breaks in the subchondral bone plate. The smaller the structure, the greater the marginal effect will be of incorrectly labelled voxels on density and thickness measurements. Compared to the subchondral bone plate, the bone as a whole and the articular contact surfaces are much easier to segment accurately (see Figure 4) All these effects combine to explain the greater accuracy in the trabecular parameters when compared to the subchondral bone plate parameters.

It is important to keep in mind that the correctness of the reference segmentations used to train and evaluate the model is subjective. These segmentations were generated using a semi-automated procedure with a human operator intervening to visually inspect and correct the periosteal and endosteal boundaries where required. Corrections were made using a manufacturer-provided tool (*µ*CT_Evaluation, Scanco Medical AG, Brütisellen, Switzerland) in which surface contours are inspected and adjusted laborously. A particular challenge is that in the knee the articular contact surfaces are predominantly oriented parallel to the axial plane and the proprietary software does not allow editing contours in other more convenient orientations. As such, it is difficult to ascertain which of the reference or predicted segmentations is objectively more correct when disagreements arise, or whether the idea of an objectively correct subchondral bone plate segmentation is even meaningful. In practical usage, it is most important that a measurement has a sufficient degree of self-consistency, or precision, such that differences can be reliably detected across groups or across time. When we apply our novel technique to a triple-repeat-measures dataset with N=29, we find excellent measures of reproducibility across all parameters and all compartments: 0.7 *<* RMS%CV [%] *<* 3.5, compared to 0.3 *<* RMS%CV [%] *<* 3.0 for similar parameters measured by a similar automated workflow at the distal radius and tibia18. Importantly, these measures of precision include not just the effects of the segmentation workflow but also effects caused by repositioning and rescanning. By comparison, the semi-automated procedure used to generate the reference segmentations was evaluated for inter- and intra-operator precision by comparing analysis results on a single set of images either performed by one operator twice (intra) or by two operators separately (inter), and this resulted in intra-operator precision values of 0.5 *<* RMS%CV [%] *<* 3.5 and inter-operator precision values of 0.5 *<* RMS%CV [%] *<* 7.98. The fully automated workflow we present in this work has, by construction, zero inter- or intra-operator precision error.

Finally, while we did not find an across-the-board improvement of the precision of using rigid registration for repeat measures, we did observe a trend towards lower precision error, particularly in Sc.Th. One of the main benefits of rigid registration is to ensure that the projection of the ROIs from the articular contact surface into the bone occurs in the same direction in each image. In our precision data, repositioning was well aligned between scans, but in cases where there is a greater difference in the knee angle between measurements, it is presumed that registration would be an important tool to maximize precision50. In our precision dataset, all participants were healthy and had no difficulty in keeping their knee close to 0° of flexion consistently in each measurement. This is often not the case for injured participants with limited knee mobility.

There are limitations to this study to be acknowledged. All study participants were recruited locally in Calgary, AB, Canada and therefore the geographic and ethnographic heterogeneity of the study population is limited. It is not expected that this will limit its application, but that remains to be confirmed. All images were obtained using the same XtremeCTII scanner and all of the reference segmentations were generated by only two operators (one for radius and tibia images and one for knee images). Due to limitations of radiation dose, the knee image data for precision quantification contained only images of proximal tibia and excluded the distal femur. There was also a relative paucity of knee images with reference segmentations to be used for training and testing, since the application of HR-pQCT to the study of the knee is quite a new development compared to the well-established application to the distal radius and tibia.

There are elements of the workflow that could be improved by the incorporation of additional tools and/or techniques, or alternative measurement approaches. While the segmentation and ROI generation is fully automated, currently a user first spends approximately two minutes placing bounding boxes to crop out the distal femur and proximal tibia from a full knee image. This step saves memory and computation time in subsequent steps and is an opportunity to perform basic visual data checking and metadata entry before processing (e.g., to ensure the image has been reconstructed properly, to perform motion scoring, to make sure the image is from the correct side, etc.). However, it could potentially be automated to further streamline data processing. So-called ‘stack shift’ artifacts are a problem in any HR-pQCT measurement produced by performing multiple sequential measurements and “stacking” them together, and knee images are no exception. These artifacts can be particularly troublesome if the stack shift occurs at or near the articular contact surface, but could be ameliorated by incorporating a stack shift-correcting registration approach51. The thickness of the subchondral bone plate is approaching, or at, the resolution limits of HR-pQCT. An alternative to the standard measurement technique of segmentation and estimation of Sc.Th by Hildebrand’s method for calculating structure thickness52 could be to utilize the dual step function model-fitting approach originally proposed for measuring cortical thickness of the proximal femur in clinical CT53. Additionally, multi-task learning approaches have been shown in recent years to dramatically improve the performance of deep learning models54, particularly as a solution to data scarcity55. Multi-task learning could be applied to further improve subchondral bone plate segmentation by simultaneously training a model to predict additional holistic or voxel-wise characteristics of the input image, such as the motion score, the subchondral bone plate and trabecular thickness map, the marrow space thickness map, morphometric parameters, or even strength-based parameters derived from finite element modelling.

## Conclusions

We have presented and characterized a novel workflow for automating the peri-articular analysis of bone at the knee with *in vivo* HR-pQCT images. The algorithm runs fully automatically, requiring no human input after an initial cropping step, and can be operated in either a cross-sectional or longitudinal configuration as needed for a given study. We have demonstrated the accuracy of the workflow when compared to the best available reference data, and have evaluated the short-term precision and shown that it compares favorably to the precision of a similar workflow for obtaining similar measurements on radius and tibia HR-pQCT images. In its current form, it could be integrated directly and seamlessly into a standard HR-pQCT data analysis workflow, and the code, atlases, and trained models have been made available for anyone to use. Future work will focus on the practical application of the workflow on clinical data to investigate osteoarthritis etiology.

## Data Availability

The raw image data used in this study are not publicly available.

## Author contributions statement

N.J.N. conceptualized and implemented the method, conducted the experiments, performed the statistical analysis, and wrote the manuscript. S.M.H. recruited participants and organized collection of the new *in vivo* data, and processed the new data.

R.S. provided input into method conceptualization and investigation. S.K.B. provided input into method conceptualization and investigation, and provided supervision and project administration. All authors reviewed the manuscript.

## Supplemental Information

View this table:
[Table S1.](http://medrxiv.org/content/early/2024/05/21/2024.05.20.24307643/T2)

Table S1. 
Parameter space for the deformable registration grid sweep. Only the diffeomorphic demons type was considered since it was important that the transformations be invertible. Shrink and smooth factors are reported as pairs of downsampling factors and Gaussian smoothing sigmas.

View this table:
[Table S2.](http://medrxiv.org/content/early/2024/05/21/2024.05.20.24307643/T3)

Table S2. 
Parameter space for segmentation model selection when training from scratch on either training dataset. Parameter values grouped with round brackets indicate a single array of values.

View this table:
[Table S3.](http://medrxiv.org/content/early/2024/05/21/2024.05.20.24307643/T4)

Table S3. 
The running times for each step in the workflow were measured on the repeat-measures precision dataset and are reported as the average and standard deviation (SD) of running times for indnvidual images.

## Acknowledgments

The authors acknowledge Drs. Danielle Whittier and Andres Kroker for collecting and curating the training datasets and the support and contributions of the staff and students of the Bone Imaging Laboratory at the University of Calgary. This work was supported financially by a grant from the Canadian Institutes of Health Research (CIHR) [PJT 162189] and a Training Graduate PhD Salary Award from Arthritis Society Canada [TGP-21-0000000093].

*   Received May 20, 2024.
*   Revision received May 20, 2024.
*   Accepted May 21, 2024.


*   © 2024, Posted by Cold Spring Harbor Laboratory

This pre-print is available under a Creative Commons License (Attribution-NonCommercial-NoDerivs 4.0 International), CC BY-NC-ND 4.0, as described at [http://creativecommons.org/licenses/by-nc-nd/4.0/](http://creativecommons.org/licenses/by-nc-nd/4.0/)

## References

1.  1.Boutroy, S., Bouxsein, M. L., Munoz, F. & Delmas, P. D. In vivo assessment of trabecular bone microarchitecture by high-resolution peripheral quantitative computed tomography. The J. Clin. Endocrinol. Metab. 90, 6508–6515, DOI: 10.1210/jc.2005-1258 (2005).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1210/jc.2005-1258&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=16189253&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F05%2F21%2F2024.05.20.24307643.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000233754000025&link_type=ISI) 

2.  2.van den Bergh, J. et al. The clinical application of high-resolution peripheral computed tomography (HR-pQCT) in adults: state of the art and future directions. Osteoporos. Int. 32, 1465–1485, DOI: 10.1007/s00198-021-05999-z (2021).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/s00198-021-05999-z&link_type=DOI) 

3.  3.Chiba, K., Okazaki, N., Motoi, M. & Osaki, M. Analysis of Subchondral Bone Microstructure by HR-PQCT: A New Imaging Technique for Knee OA. Osteoarthr. Cartil. 25, S261–S262, DOI: 10.1016/j.joca.2017.02.440 (2017).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.joca.2017.02.440&link_type=DOI) 

4.  4.Shiraishi, K. et al. In vivo analysis of subchondral trabecular bone in patients with osteoarthritis of the knee using second-generation high-resolution peripheral quantitative computed tomography (HR-pQCT). Bone 132, 115155, DOI: 10.1016/j.bone.2019.115155 (2020).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.bone.2019.115155&link_type=DOI) 

5.  5.Sada, K. et al. Bone Mineral Density and Microstructure of the Elbow in Baseball Pitchers: An Analysis by Second-Generation HR-pQCT. J. Clin. Densitom. 23, 322–328, DOI: 10.1016/j.jocd.2019.03.001 (2020).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.jocd.2019.03.001&link_type=DOI) 

6.  6.Whittier, D. et al. Guidelines for the assessment of bone density and microarchitecture in vivo using high-resolution peripheral quantitative computed tomography. Osteoporos. Int. 31, 1607–1627, DOI: 10.1007/s00198-020-05438-5 (2020).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/s00198-020-05438-5&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F05%2F21%2F2024.05.20.24307643.atom) 

7.  7.Kroker, A. et al. Quantitative in vivo assessment of bone microarchitecture in the human knee using HR-pQCT. Bone 97, 43–48, DOI: 10.1016/j.bone.2016.12.015 (2017).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.bone.2016.12.015&link_type=DOI) 

8.  8.Kroker, A., Manske, S. L., Mohtadi, N. & Boyd, S. K. A study of the relationship between meniscal injury and bone microarchitecture in ACL reconstructed knees. The Knee 25, 746–756, DOI: 10.1016/j.knee.2018.07.004 (2018).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.knee.2018.07.004&link_type=DOI) 

9.  9.Kroker, A., Bhatla, J. L., Emery, C. A., Manske, S. L. & Boyd, S. K. Subchondral bone microarchitecture in ACL reconstructed knees of young women: A comparison with contralateral and uninjured control knees. Bone 111, 1–8, DOI: 10.1016/j.bone.2018.03.006 (2018).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.bone.2018.03.006&link_type=DOI) 

10. 10.Kroker, A. et al. Longitudinal Effects of Acute Anterior Cruciate Ligament Tears on Peri-Articular Bone in Human Knees Within the First Year of Injury. J. Orthop. Res. 37, 2325–2336, DOI: 10.1002/jor.24410 (2019).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/jor.24410&link_type=DOI) 

11. 11.Buie, H. R., Campbell, G. M., Klinck, R. J., MacNeil, J. A. & Boyd, S. K. Automatic segmentation of cortical and trabecular compartments based on a dual threshold technique for in vivo micro-CT bone analysis. Bone 41, 505–515, DOI: 10.1016/j.bone.2007.07.007 (2007).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.bone.2007.07.007&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=17693147&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F05%2F21%2F2024.05.20.24307643.atom) 

12. 12.Burghardt, A. J., Buie, H. R., Laib, A., Majumdar, S. & Boyd, S. K. Reproducibility of direct quantitative measures of cortical bone microarchitecture of the distal radius and tibia by HR-pQCT. Bone 47, 519–528, DOI: 10.1016/j.bone.2010.05.034 (2010).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.bone.2010.05.034&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=20561906&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F05%2F21%2F2024.05.20.24307643.atom) 

13. 13.Whittier, D., Mudryk, A., Vandergaag, I., Burt, L. & Boyd, S. Optimizing HR-pQCT workflow: a comparison of bias and precision error for quantitative bone analysis. Osteoporos. Int. 31, 567–576, DOI: 10.1007/s00198-019-05214-0 (2020).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/s00198-019-05214-0&link_type=DOI) 

14. 14.Conze, P.-H., Andrade-Miranda, G., Singh, V. K., Jaouen, V. & Visvikis, D. Current and Emerging Trends in Medical Image Segmentation With Deep Learning. IEEE Transactions on Radiat. Plasma Med. Sci. 7, 545–569, DOI: 10.1109/TRPMS.2023.3265863 (2023).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1109/TRPMS.2023.3265863&link_type=DOI) 

15. 15.1.  Paragios, N., 
    2.  Duncan, J. & 
    3.  Ayache, N.
    
    Bach Cuadra, M., Duay, V. & Thiran, J.-P. Atlas-based Segmentation. In Paragios, N., Duncan, J. & Ayache, N. (eds.) Handbook of Biomedical Imaging, 221–244, DOI: 10.1007/978-0-387-09749-7_12 (Springer US, Boston, MA, 2015).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/978-0-387-09749-7_12&link_type=DOI) 

16. 16.Siddique, N., Paheding, S., Elkin, C. P. & Devabhaktuni, V. U-Net and Its Variants for Medical Image Segmentation: A Review of Theory and Applications. IEEE Access 9, 82031–82057, DOI: 10.1109/ACCESS.2021.3086020 (2021).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1109/ACCESS.2021.3086020&link_type=DOI) 

17. 17.Folle, L. et al. Deep learning methods allow fully automated segmentation of metacarpal bones to quantify volumetric bone mineral density. Sci. Reports 11, 9697, DOI: 10.1038/s41598-021-89111-9 (2021).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41598-021-89111-9&link_type=DOI) 

18. 18.Neeteson, N. J., Besler, B. A., Whittier, D. E. & Boyd, S. K. Automatic segmentation of trabecular and cortical compartments in HR-pQCT images using an embedding-predicting U-Net and morphological post-processing. Sci. Reports 13, 252, DOI: 10.1038/s41598-022-27350-0 (2023).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41598-022-27350-0&link_type=DOI) 

19. 19.Xue, Y., Xu, T., Zhang, H., Long, R. & Huang, X. SegAN: Adversarial Network with Multi-scale $L_1$ Loss for Medical Image Segmentation. Neuroinformatics 16, 383–392, DOI: 10.1007/s12021-018-9377-x (2018). ArXiv:1706.01805 [cs].
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/s12021-018-9377-x&link_type=DOI) 

20. 20.Hatamizadeh, A. et al. UNETR: Transformers for 3D Medical Image Segmentation (2021). ArXiv:2103.10504 [cs, eess].
    
    
21. 21.Pedoia, V., Majumdar, S. & Link, T. M. Segmentation of joint and musculoskeletal tissue in the study of arthritis. Magn. Reson. Mater. Physics, Biol. Medicine 29, 207–221, DOI: 10.1007/s10334-016-0532-9 (2016).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/s10334-016-0532-9&link_type=DOI) 

22. 22.Pauchard, Y., Liphardt, A.-M., Macdonald, H. M., Hanley, D. A. & Boyd, S. K. Quality control for bone quality parameters affected by subject motion in high-resolution peripheral quantitative computed tomography. Bone 50, 1304–1310, DOI: 10.1016/j.bone.2012.03.003 (2012).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.bone.2012.03.003&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22445540&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F05%2F21%2F2024.05.20.24307643.atom) 

23. 23.Whittier, D. E., Burt, L. A., Hanley, D. A. & Boyd, S. K. Sex- and Site-Specific Reference Data for Bone Microarchitecture in Adults Measured Using Second-Generation HR-pQCT. J. Bone Miner. Res. 35, 2151–2158, DOI: 10.1002/jbmr.4114 (2020).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/jbmr.4114&link_type=DOI) 

24. 24.Whittier, D. E. et al. Hip fractures in older adults are associated with the low density bone phenotype and heterogeneous deterioration of bone microarchitecture. J. Bone Miner. Res. jbmr.4663, DOI: 10.1002/jbmr.4663 (2022).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1002/jbmr.4663&link_type=DOI) 

25. 25.Keen, C. E., Whittier, D. E., Firminger, C. R., Edwards, W. B. & Boyd, S. K. Validation of Bone Density and Microarchi-tecture Measurements of the Load-Bearing Femur in the Human Knee Obtained Using In Vivo HR-pQCT Protocol. J. Clin. Densitom. S1094695021000056, DOI: 10.1016/j.jocd.2021.01.004 (2021).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.jocd.2021.01.004&link_type=DOI) 

26. 26.Glüer, C. C. et al. Accurate assessment of precision errors: How to measure the reproducibility of bone densitometry techniques. Osteoporos. Int. 5, 262–270, DOI: 10.1007/BF01774016 (1995).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/BF01774016&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=7492865&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F05%2F21%2F2024.05.20.24307643.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1995RQ04400008&link_type=ISI) 

27. 27.MONAI Consortium. MONAI: Medical Open Network for AI, DOI: 10.5281/ZENODO.4323058 (2023).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.5281/ZENODO.4323058&link_type=DOI) 

28. 28.Ronneberger, O., Fischer, P. & Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arxiv:1505.04597 [cs] (2015). ArXiv: 1505.04597.
    
    
29. 29.Zhou, Z., Siddiquee, M. M. R., Tajbakhsh, N. & Liang, J. UNet++: A Nested U-Net Architecture for Medical Image Segmentation (2018). ArXiv:1807.10165 [cs, eess, stat].
    
    
30. 30.Myronenko, A. 3D MRI brain tumor segmentation using autoencoder regularization (2018). ArXiv:1810.11654 [cs, q-bio].
    
    
31. 31.Paszke, A. et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32, 8024–8035 (Curran Associates, Inc., 2019).
    
    
32. 32.1.  Lindzey, G. & 
    2.  Aronson, E.
    
    Mosteller, F. & Tukey, J. W. Data analysis, including statistics. In Lindzey, G. & Aronson, E. (eds.) Handbook of Social Psychology, Vol. 2 (Addison-Wesley, 1968).
    
    
33. 33.Falcon, W. et al. PyTorchLightning/pytorch-lightning: 0.7.6 release, DOI: 10.5281/ZENODO.3828935 (2020).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.5281/ZENODO.3828935&link_type=DOI) 

34. 34.Loshchilov, I. & Hutter, F. Decoupled Weight Decay Regularization. arxiv:1711.05101 [cs, math] (2019). ArXiv: 1711.05101.
    
    
35. 35.Dice, L. R. Measures of the Amount of Ecologic Association Between Species. Ecology 26, 297–302, DOI: 10.2307/1932409 (1945).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.2307/1932409&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1945XX03600006&link_type=ISI) 

36. 36.Ganaie, M., Hu, M., Malik, A., Tanveer, M. & Suganthan, P. Ensemble deep learning: A review. Eng. Appl. Artif. Intell. 115, 105151, DOI: 10.1016/j.engappai.2022.105151 (2022).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.engappai.2022.105151&link_type=DOI) 

37. 37.Van der Walt, S. et al. scikit-image: image processing in Python. PeerJ 2, e453 (2014). Publisher: PeerJ Inc.
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.7717/peerj.453&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=25024921&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F05%2F21%2F2024.05.20.24307643.atom) 

38. 38.Harris, C. R. et al. Array programming with NumPy. Nature 585, 357–362, DOI: 10.1038/s41586-020-2649-2 (2020).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1038/s41586-020-2649-2&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=32939066&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F05%2F21%2F2024.05.20.24307643.atom) 

39. 39.Warfield, S. K., Zou, K. H. & Wells, W. M. Simultaneous truth and performance level estimation (STAPLE): an algorithm for the validation of image segmentation. IEEE transactions on medical imaging 23, 903–921, DOI: 10.1109/TMI.2004.828354 (2004).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1109/TMI.2004.828354&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=15250643&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F05%2F21%2F2024.05.20.24307643.atom) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000222428100013&link_type=ISI) 

40. 40.Lowekamp, B. C., Chen, D. T., Ibáñez, L. & Blezek, D. The Design of SimpleITK. Front. Neuroinformatics 7, DOI: 10.3389/fninf.2013.00045 (2013).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.3389/fninf.2013.00045&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=24416015&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F05%2F21%2F2024.05.20.24307643.atom) 

41. 41.D’Agostino, R. & Pearson, E. S. Tests for Departure from Normality. Empirical Results for the Distributions of b 2 and √b 1. Biometrika 60, 613, DOI: 10.2307/2335012 (1973).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/biomet/60.3.613&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=A1973R829900021&link_type=ISI) 

42. 42.Conover, W. J. Practical nonparametric statistics. Wiley series in probability and statistics. Applied probability and statistics section (Wiley, New York, 1999), 3rd ed edn.
    
    
43. 43.Benjamini, Y., Krieger, A. M. & Yekutieli, D. Adaptive linear step-up procedures that control the false discovery rate. Biometrika 93, 491–507, DOI: 10.1093/biomet/93.3.491 (2006).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/biomet/93.3.491&link_type=DOI) 
    
    [Web of Science](http://medrxiv.org/lookup/external-ref?access_num=000241271700001&link_type=ISI) 

44. 44.Sullivan, C. & Kaszynski, A. PyVista: 3D plotting and mesh analysis through a streamlined interface for the Visualization Toolkit (VTK). J. Open Source Softw. 4, 1450, DOI: 10.21105/joss.01450 (2019).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.21105/joss.01450&link_type=DOI) 

45. 45.Hunter, J. D. Matplotlib: A 2D Graphics Environment. Comput. Sci. & Eng. 9, 90–95, DOI: 10.1109/MCSE.2007.55 (2007).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1109/MCSE.2007.55&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=WOS:00024566&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F05%2F21%2F2024.05.20.24307643.atom) 

46. 46.pandas development team, T. pandas-dev/pandas: Pandas, DOI: 10.5281/ZENODO.3509134 (2024).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.5281/ZENODO.3509134&link_type=DOI) 

47. 47.Waskom, M. seaborn: statistical data visualization. J. Open Source Softw. 6, 3021, DOI: 10.21105/joss.03021 (2021).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.21105/joss.03021&link_type=DOI) 

48. 48.Okazaki, N., Chiba, K., Motoi, M. & Osaki, M. Analysis of Subchondral Bone Microstructure by HR-PQCT: Relationship With The Severity of Knee Osteoarthritis and Alignments of Lower Extremities. Osteoarthr. Cartil. 25, S263, DOI: 10.1016/j.joca.2017.02.442 (2017).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.joca.2017.02.442&link_type=DOI) 

49. 49.Rapagna, S. et al. Quantification of human bone microarchitecture damage in press-fit femoral knee implantation using HR-pQCT and digital volume correlation. J. Mech. Behav. Biomed. Mater. 97, 278–287, DOI: 10.1016/j.jmbbm.2019.04.054 (2019).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.jmbbm.2019.04.054&link_type=DOI) 

50. 50.Kemp, T. et al. Longitudinal bone microarchitectural changes are best detected using image registration. Osteoporos. Int. 31, 1995–2005, DOI: 10.1007/s00198-020-05449-2 (2020).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/s00198-020-05449-2&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=http://www.n&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F05%2F21%2F2024.05.20.24307643.atom) 

51. 51.Whittier, D. E. et al. A multi-stack registration technique to improve measurement accuracy and precision across longitudinal HR-pQCT scans. Bone 176, 116893, DOI: 10.1016/j.bone.2023.116893 (2023).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.bone.2023.116893&link_type=DOI) 

52. 52.Hildebrand, T. & Rüegsegger, P. A new method for the model-independent assessment of thickness in three-dimensional images. J. Microsc. 185, 67–75, DOI: 10.1046/j.1365-2818.1997.1340694.x (1997).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1046/j.1365-2818.1997.1340694.x&link_type=DOI) 

53. 53.Treece, G., Poole, K. & Gee, A. Imaging the femoral cortex: Thickness, density and mass from clinical CT. Med. Image Analysis 16, 952–965, DOI: 10.1016/j.media.2012.02.008 (2012).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1016/j.media.2012.02.008&link_type=DOI) 
    
    [PubMed](http://medrxiv.org/lookup/external-ref?access_num=22465079&link_type=MED&atom=%2Fmedrxiv%2Fearly%2F2024%2F05%2F21%2F2024.05.20.24307643.atom) 

54. 54.Zhang, Y. & Yang, Q. An overview of multi-task learning. Natl. Sci. Rev. 5, 30–43, DOI: 10.1093/nsr/nwx105 (2018).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1093/nsr/nwx105&link_type=DOI) 

55. 55.Thung, K.-H. & Wee, C.-Y. A brief review on multi-task learning. Multimed. Tools Appl. 77, 29705–29725, DOI: 10.1007/s11042-018-6463-x (2018).
    
    [CrossRef](http://medrxiv.org/lookup/external-ref?access_num=10.1007/s11042-018-6463-x&link_type=DOI)