Scale up: analysing social media images using computer vision and machine learning

Tuomo Hiippala, University of Jyväskylä

Popular social networking services for sharing photographs, such as Instagram, provide researchers with an unprecedented view into everyday multimodal communication between individuals and their close and extended social networks. The volume of data in these services, however, is overwhelming: over 20 billion photographs are estimated to have been uploaded to Instagram since 2010 (Zappavigna 2016, 271). This volume limits the ability of human analysts to search for meaningful patterns and to generalise about the data. Fortunately, recent advances in two areas of computer science – computer vision and machine learning – may be of assistance in this task.

In this extended abstract, I describe my forthcoming research project, which aims to develop corpus-driven methods for studying multimodal data retrieved from social media by leveraging computer vision and machine learning techniques. Within multimodal research and digital humanities, such techniques have been previously used to analyse street fashion photography (Podlasov & O’Halloran 2013), social media use in cities (Hochman & Manovich 2013, O’Halloran et al. 2014), photographic practices among locals and tourists (Cao & O’Halloran 2015), layout in printed documents (Hiippala 2015), and to support the annotation of multimodal corpora (Hiippala 2016).

I intend to develop the aforementioned methods by performing a number of case studies that examine a data set consisting of over 36 000 photographs retrieved from Instagram. The photographs were gathered from five different locations in Helsinki, which are described in Table 1. These locations are frequented by both locals and visitors, which enables the comparison of photographic and multimodal practices among these groups. The photographs were collected via the Instagram API in May and June 2016, just before new API restrictions severely limited access to the data. In addition to the photographs and their captions, hashtags and metadata, up to 50 geographical coordinates and the corresponding upload dates were retrieved from each user’s recent photographs. Together these form a geographical vector for each user, which supplies the data needed to assess whether the user is a local or a tourist and thus enables comparative analyses between the two groups.
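As an illustration of how such a geographical vector might be used, the following minimal sketch labels a user as a local or a tourist according to how long their geotagged photographs span within the Helsinki area. The function names, the 30 km radius and the 30-day threshold are hypothetical choices made for this example only; they are not the criteria used in the project.

```python
from datetime import date
from math import asin, cos, radians, sin, sqrt

# Approximate coordinates of central Helsinki (latitude, longitude).
HELSINKI = (60.1699, 24.9384)


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


def classify_user(posts, centre=HELSINKI, radius_km=30.0, min_days=30):
    """Label a user from a list of (latitude, longitude, upload_date) tuples.

    Assumption for this sketch: a user whose photographs near the centre
    span at least `min_days` days is treated as a local, otherwise as a tourist.
    """
    nearby = sorted(d for lat, lon, d in posts
                    if haversine_km((lat, lon), centre) <= radius_km)
    if not nearby:
        return "unknown"
    span_days = (nearby[-1] - nearby[0]).days
    return "local" if span_days >= min_days else "tourist"


# Toy data: one user posting in Helsinki over several months,
# another posting there for three days between photographs taken in Berlin.
resident = [(60.17, 24.94, date(2016, 1, 5)), (60.19, 24.96, date(2016, 5, 20))]
visitor = [(52.52, 13.40, date(2016, 4, 1)), (60.17, 24.95, date(2016, 5, 10)),
           (60.16, 24.93, date(2016, 5, 12)), (52.52, 13.40, date(2016, 5, 20))]

print(classify_user(resident), classify_user(visitor))
```

A rule of this kind is only a starting point: with at most 50 coordinates per user, any threshold would need to be validated against manually labelled accounts.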


To showcase the ongoing work, I demonstrate techniques such as content-based image retrieval (CBIR), which looks for patterns, shapes and textures in the images instead of relying on linguistic metadata such as descriptions or tags. While these techniques may enable studying photographic practices in a manner similar to queries in a traditional corpus, several challenges arise from the noisy nature of social media data, which is contaminated by memes, screenshots and other non-photographs. To address this, I will also show how machine learning techniques can help to filter the data.
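The basic idea behind CBIR can be sketched with a deliberately simple descriptor. The example below ranks images by the similarity of their colour histograms, using histogram intersection as the measure. This is an assumption made for illustration: colour histograms are a classic CBIR feature, but they do not capture shape or texture, and the abstract does not specify which descriptors the project uses. Images are represented here as plain lists of RGB tuples, and all names are hypothetical.

```python
from collections import Counter


def colour_histogram(pixels, bins=4):
    """Normalised histogram over quantised (r, g, b) values in 0-255."""
    step = 256 // bins
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {colour: n / total for colour, n in counts.items()}


def intersection(h1, h2):
    """Histogram intersection: 1.0 for identical histograms, 0.0 for disjoint ones."""
    return sum(min(v, h2.get(colour, 0.0)) for colour, v in h1.items())


def retrieve(query_pixels, collection, top_k=3):
    """Return the names of the `top_k` images most similar to the query."""
    q = colour_histogram(query_pixels)
    scored = [(intersection(q, colour_histogram(px)), name)
              for name, px in collection.items()]
    return [name for _, name in sorted(scored, reverse=True)[:top_k]]


# Toy "images": a mostly blue query and three candidates.
query = [(10, 20, 200)] * 90 + [(250, 250, 250)] * 10
collection = {
    "sea": [(5, 30, 220)] * 80 + [(240, 240, 240)] * 20,
    "forest": [(20, 180, 30)] * 100,
    "sunset": [(230, 90, 40)] * 50 + [(10, 20, 200)] * 50,
}

print(retrieve(query, collection))
```

The query needs no captions or hashtags, which is what makes the approach comparable to querying a corpus directly by form rather than by annotation.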

Cao, Y. & O’Halloran, K. L. (2015), ‘Learning human photo shooting patterns from large-scale community photo collections’, Multimedia Tools and Applications 74(24), 11499-11516.

Hiippala, T. (2015), Combining computer vision and multimodal analysis: a case study of layout symmetry in bilingual in-flight magazines, in J. Wildfeuer, ed., ‘Building Bridges for Multimodal Research: International Perspectives on Theories and Practices of Multimodal Analysis’, Peter Lang, Bern and New York, pp. 289-307.

Hiippala, T. (2016), Semi-automated annotation of page-based documents within the Genre and Multimodality framework, in ‘Proceedings of the 10th SIGHUM Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities’, Association for Computational Linguistics, Berlin, Germany, pp. 84-89.

Hochman, N. & Manovich, L. (2013), ‘Zooming into an Instagram City: Reading the local through social media’, First Monday 18(7).

O’Halloran, K. L., Chua, A. & Podlasov, A. (2014), The role of images in social media analytics: A multimodal digital humanities approach, in D. Machin, ed., ‘Visual Communication’, De Gruyter Mouton, Berlin, pp. 565-588.

Podlasov, A. & O’Halloran, K. L. (2013), Japanese street fashion for young people: A multimodal digital humanities approach for identifying sociocultural patterns and trends, in E. Djonov & S. Zhao, eds, ‘Critical Multimodal Studies of Popular Discourse’, Routledge, New York and London, pp. 71-90.

Zappavigna, M. (2016), ‘Social media photography: Construing subjectivity in Instagram images’, Visual Communication 15(3), 271-292.