CMC
Computer-mediated communication (CMC) is human communication that occurs through the use of two or more electronic devices.
Corpora
A corpus is a large and structured set of texts created for linguistic research. Ideally, it also includes metadata and annotations.
This page aims to list all CMC corpora that are freely available for (linguistic) research and that either contain texts from Switzerland or were compiled by Swiss researchers. It was created and is maintained in cooperation between Elisabeth Stark, Simone Ueberwasser and the Zurich Center for Linguistics. Do you want to inform us about your CMC corpus? Send us an email.
For CMC corpora without a link to Switzerland, please check the CLARIN page on CMC corpora.
sms4science
Data | Text messages (SMS) |
---|---|
Size | 0.5 million tokens |
Languages | Swiss German, German, French, Italian, Romansh |
Collected | 2009 |
Availability | Freely available for linguistic research, no access for commercial use. |
In 2009, under the lead of Elisabeth Stark, around 25,000 SMS were collected from the Swiss population. From 2011 to 2015, the data were investigated within the framework of the SNSF Sinergia project sms4science. For the French data, sister projects exist in Belgium, France and Canada (cf. www.sms4science.org).
What's up, Switzerland?
Data | WhatsApp messages |
---|---|
Size | 5 million tokens |
Languages | Swiss German, German, French, Italian, Romansh |
Collected | 2014 |
Availability | Available for scientific (not commercial) research. |
As a follow-up project to sms4science, Elisabeth Stark and her team collected 216 WhatsApp chats in 2014. Between 2016 and 2020, the chats were investigated in the SNSF Sinergia project "What's up, Switzerland?".
SB-CH: A Swiss German Corpus with Sentiment Annotations
Data | Facebook, chats |
---|---|
Size | 203,242 Swiss German phrases with 981,247 tokens |
Languages | Swiss German |
Collected | 2010-2017 |
Availability | Creative Commons License CC BY 4.0 by spinningbytes. |
For more information, see Ralf Grubenmann, Don Tuggener, Pius von Däniken, Jan Deriu, Mark Cieliebak (2018): SB-CH: A Swiss German Corpus with Sentiment Annotations.