Why Learn Japanese: The Role of Japanese Media
Introduction
(Research done by our member, Mahdi Austian) Our goal is to survey how prevalent anime culture is as a motivation for Japanese learners. We mined 50 Reddit threads from the r/LearnJapanese subreddit pertaining to the reasons people study Japanese. We then conducted an exploratory analysis to quantify the presence of anime-related keywords in the top 10 comments of each thread. Such words were mentioned in 40% of the comments, indicating that understanding Japanese media and entertainment is a key motivation for students learning Japanese.
Methods & Data
Data Collection
Using the googlesearch-python package, we scraped Google search results for the query “reason learn japanese site:reddit.com”. We then manually filtered out unrelated threads, narrowing the results from 228 to 50. For each remaining thread, we used praw (a Reddit API wrapper) to collect the top 10 original comments. If a thread had fewer than 10 comments, we took all the comments that were available.
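The collection step could be sketched roughly as follows. This is an illustration under assumptions, not the code used for this study: the helper names are hypothetical, the URL pattern is a guess at what counts as a thread, "top 10" is assumed to mean Reddit's "top" sort order, and the manual filtering step is not reproduced. The third-party package is imported lazily so the helpers can be read and tested without it.

```python
import re

QUERY = "reason learn japanese site:reddit.com"  # query quoted in the text
TOP_N = 10  # comments kept per thread

# Assumption: a Reddit thread URL contains "/r/<subreddit>/comments/<id>".
THREAD_URL = re.compile(r"reddit\.com/r/[^/]+/comments/[a-z0-9]+", re.IGNORECASE)


def is_thread_url(url):
    """Return True if the URL looks like a Reddit comment thread."""
    return bool(THREAD_URL.search(url))


def find_thread_urls(query, num_results):
    """Collect candidate thread URLs from Google search results."""
    from googlesearch import search  # pip install googlesearch-python

    return [url for url in search(query, num_results=num_results) if is_thread_url(url)]


def top_comments(reddit, url, n=TOP_N):
    """Return the text of up to n top-level comments of one thread."""
    submission = reddit.submission(url=url)
    submission.comment_sort = "top"  # assumption: rank comments by Reddit's "top" sort
    submission.comments.replace_more(limit=0)  # drop "load more comments" stubs
    # Iterating the comment forest yields top-level comments only.
    return [comment.body for comment in list(submission.comments)[:n]]


# Usage (needs network access and Reddit API credentials; values are placeholders):
#   import praw
#   reddit = praw.Reddit(client_id="...", client_secret="...", user_agent="...")
#   urls = find_thread_urls(QUERY, num_results=100)
#   comments = {url: top_comments(reddit, url) for url in urls}
```

Slicing with `[:n]` also covers threads with fewer than 10 comments, since it simply returns whatever is available.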
Data Preprocessing
Before building the corpus for analysis, we preprocessed the text by applying the following filters:
- Removing URLs
- Removing Emails
- Removing Non-Alphanumeric Characters
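The three filters above can be sketched with regular expressions. The patterns below are assumptions chosen for illustration, not the exact ones used in this study; in particular, "alphanumeric" is read here as ASCII letters and digits, so Japanese characters in a comment would also be removed.

```python
import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\S+@\S+")
NON_ALNUM_RE = re.compile(r"[^A-Za-z0-9\s]")


def clean_text(text):
    """Strip URLs, emails, and non-alphanumeric characters from a comment."""
    text = URL_RE.sub(" ", text)        # URLs first, while "://" and "." are intact
    text = EMAIL_RE.sub(" ", text)      # then emails, while "@" is intact
    text = NON_ALNUM_RE.sub(" ", text)  # punctuation, emoji, other symbols
    return " ".join(text.split())       # collapse the leftover whitespace


print(clean_text("I love anime! See https://example.com or mail me@example.com :)"))
# I love anime See or mail
```

The order matters: removing punctuation first would break URLs and email addresses into fragments that the other two patterns could no longer recognize.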
Remove Stopwords
To keep frequent but uninformative English words from diluting the results, we combined the nltk stopwords list with the additional set of stopwords below. We removed all of these stopwords from our dataset before the analysis.
["wanted", "really", "learn", "language", "japanese", "want", "reason", "why", "interest", "learning", "year", "started", "english", "English", "love", "now", "reddit", "www", "http", "https", "learnjapanese", "get", "got", "also", "like", "since", "though", "comment", "com", "time", "know", "motivation", "hope", "study", "people", "would", "think", "thing", "never", "could", "studying", "one", "day", "ago", "new", "motivate", "something", "interested", "awesome", "good"]
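A minimal sketch of the stopword step is shown below. The function names are hypothetical, `EXTRA_STOPWORDS` holds only an excerpt of the custom list, and the case-insensitive comparison is an assumption. The nltk import is kept inside `load_stopwords` because the corpus must be downloaded once with `nltk.download("stopwords")` before it can be read.

```python
# Excerpt of the custom list above, for illustration only.
EXTRA_STOPWORDS = {"wanted", "really", "learn", "language", "japanese"}


def load_stopwords(extra=EXTRA_STOPWORDS):
    """Combine NLTK's English stopwords with the custom additions."""
    from nltk.corpus import stopwords  # requires nltk.download("stopwords") once

    return set(stopwords.words("english")) | set(extra)


def remove_stopwords(tokens, stopword_set):
    """Keep only the tokens that are not stopwords (case-insensitive)."""
    return [token for token in tokens if token.lower() not in stopword_set]


# A tiny stand-in set so the example runs without the NLTK corpus.
demo_stopwords = {"i", "to", "for"} | EXTRA_STOPWORDS
print(remove_stopwords(["I", "wanted", "to", "learn", "Japanese", "for", "anime"], demo_stopwords))
# ['anime']
```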
Lemmatization
Lemmatization considers the context of a word and converts it to its standardized base form (Stanford NLP Group, n.d.). This keeps the focus on the meaning of the words and prevents a word's frequency from being split across its multiple variations.
Example:
- am, are, is => be
- car, cars, car's, cars' => car
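The effect on word frequencies can be illustrated with a toy lookup table built from the examples above. This is a stand-in for illustration only, not a real lemmatizer: `TOY_LEMMAS` and `lemmatize` are hypothetical names, and an actual pipeline would rely on a dictionary-backed tool (for example, a WordNet-based lemmatizer) instead of a hand-written mapping.

```python
from collections import Counter

# Hypothetical lookup table covering only the examples in the text.
TOY_LEMMAS = {
    "am": "be", "are": "be", "is": "be",
    "cars": "car", "car's": "car", "cars'": "car",
}


def lemmatize(token):
    """Map a token to its base form, falling back to the lowercased token."""
    token = token.lower()
    return TOY_LEMMAS.get(token, token)


tokens = ["car", "cars", "Cars", "is", "are"]
print(Counter(lemmatize(token) for token in tokens))
# Counter({'car': 3, 'be': 2})
```

Without this step the same five tokens would be counted as five different words, each with a frequency of one.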
Analysis
List of Keywords
We used the following keyword lists to determine whether a comment mentions Japanese media and entertainment, or one of the other categories, as a learning motivation:
- To enjoy Japanese media: [anime, manga, game, novel, otaku, weaboo, weeb, music, media]
- For travel in Japan: [travel, culture, food]
- To live in Japan: [work, live, dating, marry]
- To communicate with close ones: [relationship, friends, wife, girlfriend, husband, boyfriend, mother, father, parents, grandparents, grandfather, grandmother, relatives]
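One way to apply these keyword lists is sketched below. The short category names, the function names, and the exact-token matching rule are assumptions for illustration; each comment is counted at most once per category, which matches a "comments containing at least one keyword" tally.

```python
import re
from collections import Counter

# Keyword lists copied from the text; the short category names are hypothetical.
CATEGORY_KEYWORDS = {
    "media": {"anime", "manga", "game", "novel", "otaku", "weaboo", "weeb", "music", "media"},
    "travel": {"travel", "culture", "food"},
    "live": {"work", "live", "dating", "marry"},
    "close ones": {
        "relationship", "friends", "wife", "girlfriend", "husband", "boyfriend",
        "mother", "father", "parents", "grandparents", "grandfather",
        "grandmother", "relatives",
    },
}


def categories_in(comment):
    """Return the set of categories whose keywords appear in the comment."""
    tokens = set(re.findall(r"[a-z0-9]+", comment.lower()))
    return {name for name, keywords in CATEGORY_KEYWORDS.items() if tokens & keywords}


def count_categories(comments):
    """Count, per category, how many comments mention at least one keyword."""
    counts = Counter()
    for comment in comments:
        counts.update(categories_in(comment))
    return counts


sample = ["I watch anime and read manga", "My wife is Japanese", "Just for fun"]
print(count_categories(sample))
```

A comment can fall into several categories at once, so the per-category counts do not need to sum to the number of comments.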
N-Grams
An N-gram is a contiguous sequence of N words. We investigated the most frequent word combinations to gain further insight into the motivations for learning Japanese.
Example of N-Grams:
- San Francisco (is a 2-gram/bigram)
- The Three Musketeers (is a 3-gram/trigram)
- She stood up slowly (is a 4-gram)
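Extracting and counting N-grams takes only a few lines with the standard library. The sketch below assumes each comment has already been cleaned and split into tokens; the function names and the sample data are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams(token_lists, n, k=10):
    """Return the k most frequent n-grams across all comments."""
    counts = Counter()
    for tokens in token_lists:  # one token list per comment
        counts.update(ngrams(tokens, n))
    return counts.most_common(k)


print(ngrams(["she", "stood", "up", "slowly"], 2))
# [('she', 'stood'), ('stood', 'up'), ('up', 'slowly')]
```

Counting per comment keeps an N-gram from spanning the boundary between two unrelated comments, and a comment shorter than N words simply contributes nothing.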
Results
Conclusion
Of the 432 comments in the Reddit threads, 204 contained at least one of the keywords related to Japanese media and entertainment. This is more than double the number for the other common reasons, such as studying Japanese to travel in Japan, to live in Japan, or to communicate with close ones in Japanese. The most common word combinations in the bigram and trigram bar charts also support the high frequency we observed. In conclusion, Japanese media such as anime can be identified as a significant reason why people learn Japanese. Further work could explore whether similar motivations exist among learners of other languages.
References
Andybywire. (2020). NLP Text Analysis. GitHub. https://github.com/andybywire/nlp-text-analysis/blob/master/text-analytics.ipynb
Stanford NLP Group. (n.d.). Stemming and lemmatization. Stanford NLP Group. Retrieved September 3, 2022, from https://nlp.stanford.edu/IR-book/html/htmledition/stemming-and-lemmatization-1.html
50 Threads from Reddit: https://drive.google.com/file/d/1CLjYtEHWgcgJRq889YukrL-S1fDrtqa0/view?usp=sharing