CMC

Computer-mediated communication (CMC) is human communication that occurs through the use of two or more electronic devices.

CMC Corpora

A corpus is a large and structured set of texts created for linguistic research. Ideally, it also includes metadata and annotations.

This page aims to list all CMC corpora that are freely available for (linguistic) research and that either contain texts from Switzerland or were compiled by Swiss researchers. It was created and is maintained as a cooperation between Elisabeth Stark, Simone Ueberwasser and the Zurich Center for Linguistics. Would you like to tell us about your CMC corpus? Send us an email.

For CMC corpora without a link to Switzerland, please check the CLARIN page on CMC corpora.

sms4science

Data Text messages (SMS)
Size 0.5 million tokens
Languages Swiss German, German, French, Italian, Romansh
Collected 2009
Availability Freely available for linguistic research; not available for commercial use.

In 2009, around 25,000 SMS were collected from the Swiss population under the lead of Elisabeth Stark. From 2011 to 2015, the data were investigated within the SNSF Sinergia project sms4science. For the French data, sister projects exist in Belgium, France and Canada (cf. www.sms4science.org).

What's up, Switzerland?

Data WhatsApp messages
Size 5 million tokens
Languages Swiss German, German, French, Italian, Romansh
Collected 2014
Availability Available for scientific (not commercial) research.

As a follow-up project to sms4science, Elisabeth Stark and her team collected 216 WhatsApp chats in 2014. Between 2016 and 2020, the chats were investigated in the SNSF Sinergia project "What's up, Switzerland?".

SB-CH: A Swiss German Corpus with Sentiment Annotations

Data Facebook, chats
Size 203,242 Swiss German phrases with 981,247 tokens
Languages Swiss German
Collected 2010-2017
Availability Creative Commons License CC BY 4.0 by spinningbytes.

For more information, see Ralf Grubenmann, Don Tuggener, Pius von Däniken, Jan Deriu, Mark Cieliebak (2018): SB-CH: A Swiss German Corpus with Sentiment Annotations.