Vocal3D

Vocal3D is a library for the real-time reconstruction of human vocal folds using a single-shot structured light system. It is joint work between the Chair of Visual Computing of Friedrich-Alexander University Erlangen-Nuremberg and the Phoniatric Division of the University Hospital Erlangen. This code accompanies the paper Real-Time 3D Reconstruction of Human Vocal Folds via High-Speed Laser-Endoscopy.

Example

Dataset

The HLE Dataset described in the paper is hosted here on GitHub!
We will add it to CERN's Zenodo platform at a later stage.

Prerequisites

Make sure that you have a Python version >= 3.5 installed. A CUDA-capable GPU is recommended, but not necessary. However, getting PyTorch3D to work inside the NURBS_Diff module without CUDA may require some tinkering.

Installation

First, make sure that conda is installed and clone this repository, including its submodules:

git clone https://github.com/Henningson/Vocal3D.git
cd Vocal3D
git submodule update --init --recursive

Generate a new conda environment and activate it:

conda create --name Vocal3D python=3.8
conda activate Vocal3D

Then, install the necessary packages with

pip install opencv-python-headless matplotlib scikit-learn tqdm geomdl PyQt5 pyqtgraph ninja
pip install -U fvcore
conda install -c bottler nvidiacub
conda install -c conda-forge igl

Install PyTorch and PyTorch3D

conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
pip install "git+https://github.com/facebookresearch/pytorch3d.git@stable"

Download and install NURBS_Diff

git clone https://github.com/anjanadev96/NURBS_Diff.git
cd NURBS_Diff
python setup.py install

Download and install our fork of Victor Cornillère's PyIGL Viewer.
Our fork adds some shader code that we use for a more domain-specific visualization.

pip install git+https://github.com/Henningson/PyIGL_viewer.git

And finally install our lightweight C++ ARAP implementation.

cd PybindARAP
python setup.py install
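
To quickly check that the main dependencies ended up in the activated environment, a short sanity check along the following lines can help. This is only a sketch: the PybindARAP import name is our assumption based on the folder name, and CUDA availability will simply report False on CPU-only setups.

# Sanity check -- run inside the activated Vocal3D environment.
# The PybindARAP import name below is an assumption based on the folder name;
# adjust it if the module is exposed under a different name.
import torch
import pytorch3d
import geomdl
import igl
import PybindARAP  # assumption, see note above

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("pytorch3d:", pytorch3d.__version__)
print("geomdl:", geomdl.__version__)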

Usage

An example video and calibration files are provided in the assets folder. Unzip the sample data with unzip assets/sample_data.zip -d assets/ and run the example using

python source/main.py

Things to note

If you are using the supplied viewer, please note that the pipeline will generally not be as performant, since every step of the pipeline is computed in succession (think of it more like a debug view). However, you will still be able to generate results in a matter of seconds, provided you do not use a PC that is barely able to run MS-DOS.

We supply three segmentation algorithms in this repository: one that is specifically designed for the silicone videos included in the sample_data.zip file, the algorithm by Koc et al., and a neural segmentator based on a U-Net architecture. For first tests, we recommend the U-Net variant, as it is generally the most robust (albeit the slowest) one. A pre-trained model is included in the assets folder.

Implementing your own segmentation algorithm

If you want to integrate your own segmentation algorithm into the viewer, we supply a BaseSegmentator class, from which your segmentation class may inherit. The necessary functions to override are marked by #TODO: Implement me. Please have a look at the supplied segmentation algorithms for some inspiration.
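
As a rough illustration, a custom segmenter could look something like the sketch below. Note that the method name, its signature, and the way the class is registered with the viewer are assumptions on our part; please check the BaseSegmentator source and the bundled segmentation algorithms for the actual interface.

# Hypothetical sketch only -- method name and signature are assumptions,
# not the actual Vocal3D interface. See BaseSegmentator and the bundled algorithms.
import numpy as np

class MyThresholdSegmentator:  # in practice, inherit from the supplied BaseSegmentator
    """Toy example: segment the (dark) glottal gap by simple intensity thresholding."""

    def __init__(self, threshold: int = 30):
        self.threshold = threshold

    # The kind of method that is marked with "#TODO: Implement me" in BaseSegmentator.
    def segment(self, frame: np.ndarray) -> np.ndarray:
        """Return a binary mask (1 = glottal area) for a single grayscale frame."""
        return (frame < self.threshold).astype(np.uint8)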

Limitations

Due to the moisture on top of human vocal folds, the mucous tissue in in-vivo data often generates specular highlights that influence the performance of segmentation algorithms. Furthermore, the segmentation algorithm by Koc et al. that we supply in this repository requires well-captured data in which the glottis can be accurately differentiated from the vocal folds. As of right now, we are working on a system-specific segmentation algorithm that can deal with these harsh cases.

Citation

Please cite this paper if this work helps you with your research:

@InProceedings{10.1007/978-3-031-16449-1_1,
  author="Henningson, Jann-Ole and Stamminger, Marc and D{\"o}llinger, Michael and Semmler, Marion",
  title="Real-Time 3D Reconstruction of Human Vocal Folds via High-Speed Laser-Endoscopy",
  booktitle="Medical Image Computing and Computer Assisted Intervention -- MICCAI 2022",
  year="2022",
  pages="3--12",
  isbn="978-3-031-16449-1"
}

A PDF of the paper is included in the assets/ folder of this repository. However, you can also find it here: Springer Link.