GTA-NarrativeTraj: Language-Aware Trajectory Prediction from GPS and Dialogue in an Open-World Simulator

Sapeha, Anastasiia; Sariiev, Eduard; Sapeha, Mykyta; Kovan, Ibrahim; Rajanayagam, Subashkumar; Karpov, Kirill; Gering, Maksim; Kachan, Dmitry; Siemens, Eduard

doi:10.25673/122853

Proceedings of International Conference on Applied Innovation in IT
2025/12/22, Volume 13, Issue 5, pp.193-199


Anastasiia Sapeha, Eduard Sariiev, Mykyta Sapeha, Ibrahim Kovan, Subashkumar Rajanayagam, Kirill Karpov, Maksim Gering, Dmitry Kachan and Eduard Siemens

Abstract: GTA-NarrativeTraj is presented as a simulation framework and dataset for Grand Theft Auto V (GTA V) that couples spatiotemporal trajectories with in-game narrative signals (speech audio, subtitles, speaker identity). A ScriptHookVDotNet-based logger records world coordinates and vehicle state at ≥ 1 Hz and captures dialogue events (subtitle text, speaker tags, soundbank IDs) during story-mode play. The released dataset provides tightly time-aligned GPS-like traces and the complete dialogue stream for full playthroughs, yielding a resource in which coordinates, audio, and text jointly form a narrative that constrains and explains agent motion. The task of narrative-grounded mobility prediction is introduced: given recent GPS and ongoing utterances, infer the agent’s near-term path and next waypoint while recovering salient context such as interlocutors (who is speaking to whom), scene-level locations, and dialogue-implicated points of interest. The dataset serves as ground truth for these tasks by pairing GPS histories with contemporaneous narrative cues and future motion outcomes, enabling models that reason simultaneously over movement, interlocutors, and places. Reproducibility, offset stability, and licensing are discussed; the release includes code, logs, transcripts, and time-aligned audio features, while excluding raw copyrighted assets.
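The pairing of GPS histories with contemporaneous narrative cues and future motion outcomes can be illustrated with a minimal Python sketch. It is not the released code or schema: the record types `GpsSample` and `Utterance`, their field names, the function `build_example`, the window lengths, and the synthetic trace and utterances are all assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass
class GpsSample:
    t: float  # seconds since session start (hypothetical field)
    x: float  # world coordinates (hypothetical fields)
    y: float


@dataclass
class Utterance:
    t: float  # time the subtitle appeared (hypothetical field)
    speaker: str
    text: str


def build_example(gps, utterances, t_now, history_s=10.0, horizon_s=5.0):
    """Pair a GPS history window with the utterances in that window and
    with the future positions that serve as the prediction target.

    Both input lists must be sorted by timestamp.
    """
    gps_t = [s.t for s in gps]
    utt_t = [u.t for u in utterances]

    # History covers [t_now - history_s, t_now].
    h_lo = bisect_left(gps_t, t_now - history_s)
    h_hi = bisect_right(gps_t, t_now)
    # Future covers (t_now, t_now + horizon_s].
    f_hi = bisect_right(gps_t, t_now + horizon_s)
    # Utterances are taken from the same window as the history.
    u_lo = bisect_left(utt_t, t_now - history_s)
    u_hi = bisect_right(utt_t, t_now)

    return {
        "history": gps[h_lo:h_hi],
        "utterances": utterances[u_lo:u_hi],
        "future": gps[h_hi:f_hi],
    }


# Synthetic 1 Hz trace and two invented utterances, for illustration only.
gps = [GpsSample(float(t), 10.0 * t, 0.0) for t in range(20)]
utterances = [
    Utterance(3.5, "A", "Head to the pier."),
    Utterance(12.0, "B", "Turn left here."),
]
example = build_example(gps, utterances, t_now=12.0)
```

In this sketch the model input would be `example["history"]` together with `example["utterances"]`, and `example["future"]` would be the motion outcome to predict. Binary search over sorted timestamps is one simple way to cut the windows; any interval join over a shared clock would serve the same purpose.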

Keywords: Narrative Trajectory Prediction, Language-Aware Forecasting, Next Location Prediction, Multimodal, GPS, Audio-to-Text, Speech-to-Text, Subtitles Alignment, Dialogue Grounding, Spatio-Temporal Knowledge Graph (ST-NKG), Map Matching, Road Graph, Synthetic Dataset, Urban Environment Simulation, Intent Extraction, Named-Entity Recognition (NER), Ontology, Grand Theft Auto V (GTA V).

DOI: 10.25673/122853


Proceedings of the International Conference on Applied Innovations in IT by Anhalt University of Applied Sciences is licensed under CC BY-SA 4.0


ISSN 2199-8876
Publisher: Edition Hochschule Anhalt