Audio travel podcasts are a valuable source of information for travelers. Yet, travel is, in many ways, a visual experience and the lack of visuals in travel podcasts can make it difficult for listeners to fully understand the places being discussed. We present Crosscast: a system for automatically adding visuals to audio travel podcasts. Given an audio travel podcast as input, Crosscast uses natural language processing and text mining to identify geographic locations and descriptive keywords within the podcast transcript. Crosscast then uses these locations and keywords to automatically select relevant photos from online repositories and synchronizes their display to align with the audio narration. In a user evaluation, we find that 85.7% of the participants preferred Crosscast generated audio-visual travel podcasts compared to audio-only travel podcasts.
The following videos are generated by Crosscast:
Haijun Xia, Jennifer Jacobs, and Maneesh Agrawala. 2020. Crosscast: Adding Visuals to Audio Travel Podcasts. In Proceedings of the 33rd Annual ACM Symposium on User Interface Software and Technology (UIST '20). Association for Computing Machinery, New York, NY, USA, 735–746. DOI:https://doi.org/10.1145/3379337.3415882