Proceedings of International Conference on Applied Innovation in IT
2025/04/26, Volume 13, Issue 1, pp. 155-160
Calibration of the Open-Vocabulary Model YOLO-World by Using Temperature Scaling
Max Andreas Ingrisch, Subashkumar Rajanayagam, Ingo Chmielewski and Stefan Twieg

Abstract: In many areas of the real world, such as robotics and autonomous driving, deep learning models are an indispensable tool for detecting objects in the environment. In recent years, supervised models such as YOLO or Faster R-CNN have been increasingly used for this purpose. One disadvantage of these models is that they can only detect objects within a closed vocabulary. To overcome this limitation, research is currently being conducted into models that can also detect objects outside the known classes of the training data set. Such a model is trained on base classes and can recognize novel, unseen classes; this is referred to as open-vocabulary detection (OVD). Recent models such as YOLO-World offer a solution to this problem, but they tend to over- or underestimate their confidence values and are therefore often poorly calibrated. Reliable confidence values, however, are a crucial prerequisite for using these models in the real world, where safety and trustworthiness must be ensured. To address this problem, this paper investigates the influence of the calibration method temperature scaling on the OVD model YOLO-World. The optimal temperature value T is determined on two calibration data sets (Pascal VOC and Open Images V7), and the calibrated model is then evaluated on the LVIS minival dataset. The results show that temperature scaling reduces the Expected Calibration Error (ECE) from 6.78% to 2.31%, although the model still tends to overestimate the confidence values in some bins.
Keywords: Calibration, YOLO-World, Temperature Scaling, Expected Calibration Error, Open-Vocabulary Detection.
DOI: Under Indexing
References:
- S. Twieg and R. Menghani, "Analysis and implementation of an efficient traffic sign recognition based on YOLO and SIFT for TurtleBot3 robot," 2023, [Online]. Available: http://dx.doi.org/10.25673/112993.
- J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2016, pp. 779-788.
- S. Ren, K. He, R. Girshick, and J. Sun, "Faster R-CNN: Towards real-time object detection with region proposal networks," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 6, pp. 1137-1149, 2017.
- C. Zhu and L. Chen, "A survey on open-vocabulary detection and segmentation: Past, present, and future," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 12, pp. 8954-8975, 2024.
- A. Radford et al., "Learning transferable visual models from natural language supervision," 2021, [Online]. Available: https://arxiv.org/abs/2103.00020.
- T. Cheng et al., "YOLO-World: Real-time open-vocabulary object detection," in 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024, pp. 16901-16911.
- L. Yao et al., "DetCLIP: Dictionary-enriched visual-concept paralleled pretraining for open-world detection," in Advances in Neural Information Processing Systems, vol. 35, 2022, pp. 9125-9138, [Online]. Available: https://proceedings.neurips.cc/paper_files/paper/2022/file/3ba960559212691be13fa81d9e5e0047-Paper-Conference.pdf.
- S. Liu et al., "Grounding DINO: Marrying DINO with grounded pre-training for open-set object detection," 2024, [Online]. Available: https://arxiv.org/abs/2303.05499.
- F. Mumuni and A. Mumuni, "Segment anything model for automated image data annotation: Empirical studies using text prompts from Grounding DINO," 2024, [Online]. Available: https://arxiv.org/abs/2406.19057.
- J. Gawlikowski et al., "A survey of uncertainty in deep neural networks," 2022, [Online]. Available: https://arxiv.org/abs/2107.03342.
- C. Guo, G. Pleiss, Y. Sun, and K. Q. Weinberger, "On calibration of modern neural networks," 2017, [Online]. Available: https://arxiv.org/abs/1706.04599.
- W. LeVine et al., "Enabling calibration in the zero-shot inference of large vision-language models," 2023, [Online]. Available: https://arxiv.org/abs/2303.12748.
- M. Everingham et al., "The PASCAL Visual Object Classes challenge: A retrospective," International Journal of Computer Vision, vol. 111, no. 1, pp. 98-136, 2015.
- A. Kuznetsova et al., "The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale," 2018, [Online]. Available: https://arxiv.org/abs/1811.00982.
- I. Krasin et al., "OpenImages: A public dataset for large-scale multi-label and multi-class image classification," 2017, [Online]. Available: https://storage.googleapis.com/openimages/web/index.html. [Accessed: 2-Jan-2025].
- A. Kamath et al., "MDETR - Modulated detection for end-to-end multi-modal understanding," in 2021 IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 1760-1770.
- Tencent AI Lab, "YOLO-World GitHub Repository," [Online]. Available: https://github.com/AILab-CVC/YOLO-World. [Accessed: 2-Jan-2025].
- T. Hirsch and B. Hofer, "The MAP metric in information retrieval fault localization," in 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2023, pp. 1480-1491.
- P. Zhang and W. Su, "Statistical inference on recall, precision and average precision under random selection," in 2012 9th International Conference on Fuzzy Systems and Knowledge Discovery, 2012, pp. 1348-1352.
- S. Shao et al., "Objects365: A large-scale, high-quality dataset for object detection," in 2019 IEEE/CVF International Conference on Computer Vision (ICCV), 2019, pp. 8429-8438.
- A. Gupta, P. Dollar, and R. Girshick, "LVIS: A dataset for large vocabulary instance segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019.