tl;dr: Watch anime/Youtube all day with Migaku + study a little bit of grammar = profit. Don’t get a textbook lol. Read the Refold roadmap for a solid outline of the methodology.
1. The Landscape
Language-learning methodologies centering around Stephen Krashen’s input hypothesis have been slowly on the rise over the past several years; terms like “comprehensible input”, “mass immersion”, etc. come to mind. ~5-10 years ago, this method didn’t have so many people talking about it on the internet. At the time, I would point people mainly to 2 people/memeplexes: LingQ and Refold/MIA/AJATT. These are still good places to start if you’re unfamiliar with the topic. Before you commit to a language learning method (especially one where your effort is not transparently commensurate with reward), you’re going to need to be confident and motivated. So I recommend you go ahead and check out some of the videos I’ll link in these first 2 sections and go from there, seeing what vibes with you.
Refold, MIA, and AJATT are methods for learning languages, mainly Japanese. Being one of the most difficult languages for English speakers, yet also the most enticing for nerds, Japanese represents the vanguard of the language learning community and of the technologies made to aid you in your quest. A lot of the resources you’ll find and that I’ll link will be about Japanese, but it all basically applies to any other language. AJATT/MIA/Refold basically refer to the same thing, which changed names, evolved somewhat, and changed ownership over the course of a few years, to put it simply. It’s had a cult following on the internet for quite some time, and the core idea is just that if you consume as much Japanese media as possible, your brain will automatically acquire the language. Everything else, like brushing up on grammar, is effectively a supplement to facilitate this process.
The Youtube channel Matt vs Japan is one of the OGs at actually disseminating the core ideas of AJATT (All Japanese All The Time). He then went on to make MIA (Mass Immersion Approach), which later got nuked for personal reasons and basically became Refold (which is somehow monetized? I haven’t kept up with it, but it’s not necessary). What matters are the free resources they’ve laid out in their methodology, the online community surrounding it, and the resources therein. A mere methodology for consuming content fundamentally can’t really be monetized.
Over the years, many other people have started Youtube channels giving progress reports with these methodologies, and sometimes educating the youth on the method of watching anime all day and making Anki cards. I highly suggest you stick to this memeplex and not bother with any other content because there’s a high probability of it being misinformed and/or slop.
All that being said, any of these videos/Youtube channels may or may not lay out every single bit of nuance behind comprehensible input, especially as it pertains to you and your needs, preferences, strengths and weaknesses, etc. So I recommend you watch many of them critically, and plan your own method for learning your target language.
Sentence Mining & Building Vocabulary
For your immersion, sentence mining or some other way of tracking words and building your vocabulary, such as a pre-made Anki deck, is the second most important thing after comprehensible input. (Allocating a disproportionate amount of your time to this component could even take you relatively far, but there are optimal ratios.) Many tools exist for this. When you consume a piece of media in your target language (reading, watching, or listening), you’ll have some kind of tool that records which words you know/don’t know and, ideally, also which words you’re in the process of learning. If you’re unfamiliar with SRS (spaced repetition systems), I wrote a little bit about it in the post on Studying & Spaced Repetition, although that was specifically for non-language learning purposes (meanwhile language learning is probably the most common use case of SRS).
When you sentence mine to create targeted sentence cards, you look for an i+1 sentence while consuming media, meaning a sentence that contains a single unknown word. Note that the best pre-made decks also follow this format, starting from bare-bones vocabulary, only using words introduced earlier, and gradually increasing the repertoire. After a certain point though, the word frequency lists these decks are based on stop catering to your own domain and probably become boring. (But, rest assured… there are decks with 10,000+ cards, there are 3 volumes of Remembering the Kanji, etc. but don’t bother, lol).
Like consuming scientific content? You’ll learn scientific words. I doubt “hydrogen” or “neuron” is particularly high on the word frequency list, yet it might be something you’d benefit from learning ASAP due to the content you like to consume. You get the idea. There do exist non-general frequency lists and decks made out of them, such as one based on anime. (You can also compute frequency lists yourself out of a data set if you were so inclined, and you could also systematically find i+1 sentences out of them, provided you’re using something which tracks your vocabulary, which is what I discuss below. I’m getting ahead of myself.)
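To make the parenthetical above concrete, here is a rough sketch of computing a frequency list from your own corpus and pulling out i+1 sentences. Everything in it is made up for illustration: the `tokenize`, `frequency_list` and `i_plus_one` names, the toy English corpus, and the `known_words` set (in practice that would come from whatever tool tracks your vocabulary). The tokenizer is a naive letters-only split, which works for space-delimited languages; Japanese or Chinese would need a proper word segmenter instead.

```python
import re
from collections import Counter


def tokenize(sentence):
    # Naive tokenizer: lowercase runs of letters. This is an assumption that
    # only holds for space-delimited languages; Japanese/Chinese need a real
    # segmenter in its place.
    return re.findall(r"[^\W\d_]+", sentence.lower())


def frequency_list(sentences):
    # Count every token in the corpus, most frequent first.
    counts = Counter()
    for sentence in sentences:
        counts.update(tokenize(sentence))
    return counts.most_common()


def i_plus_one(sentences, known):
    # Keep sentences with exactly one distinct unknown word, paired with that word.
    hits = []
    for sentence in sentences:
        unknown = {word for word in tokenize(sentence) if word not in known}
        if len(unknown) == 1:
            hits.append((sentence, next(iter(unknown))))
    return hits


# Hypothetical data: a tiny corpus and a hand-written set of known words.
corpus = [
    "The neuron fires.",
    "The cat sleeps.",
    "Hydrogen fuels the star.",
    "The neuron sleeps.",
]
known_words = {"the", "cat", "sleeps", "fires"}

print(frequency_list(corpus)[:3])
print(i_plus_one(corpus, known_words))
```

With this toy data, the two "neuron" sentences come out as mining candidates, while the "hydrogen" sentence is skipped because it has three unknown words at once.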
Once you reach the mark of 500~2000 known words (especially if we’re talking Kanji/Hanzi), you understand a significant amount of material and can start easily mining 5-10+ sentences daily.
(From: Forever a student: Chinese character frequency list - News articles)
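If you want to check the "a few hundred to a couple thousand words covers a lot" idea against your own material, a rough way is to measure what share of all word occurrences the top N entries of a frequency list account for. The sketch below is only an illustration: `coverage` is a made-up helper and the counts in `toy_counts` are invented, not taken from the list cited above.

```python
from collections import Counter


def coverage(counts, top_n):
    # Fraction of all occurrences accounted for by the top_n most frequent items.
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(count for _, count in counts.most_common(top_n))
    return covered / total


# Invented counts, for illustration only.
toy_counts = Counter({"的": 50, "是": 30, "人": 15, "氢": 4, "神经": 1})

for n in (1, 2, 3):
    print(n, coverage(toy_counts, n))
```

Run on a real corpus, the same function tells you roughly how much of the text your current vocabulary size should let you recognize.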
Matt vs Japan - Sentence Cards vs Vocab Cards: In-Depth Comparison discusses what your flashcards will be looking like; it’s hotly debated which exact format is optimal. To some extent, it comes down to preference.
Anyways - for actually implementing sentence mining, you have 2 options:
- Migaku. There are other software solutions I used to recommend but now think lack features: LingQ (a good explanatory video), or, as a free alternative, Learning with Texts. All great pieces of software; I got quite proficient in German with LingQ, but I would use Migaku if I had to start over, since you can use it on basically any website and in real time on videos (Youtube, Netflix, or mpv/asbplayer + texthooker for any local file) without needing to import anything.
- A pop-up dictionary such as Yomitan (with a dictionary like JMdict) + mpv + scripts for sentence mining and importing to Anki. You might want to add Language Reactor to the mix too. This is much less versatile in almost every way than tools like Migaku, which aim to give you a setup like this out of the box, but it is free and open-source. If that is appealing to you, I recommend https://anacreondjt.gitlab.io/, https://animecards.site/, and https://tatsumoto-ren.github.io/ as good places to start when creating your setup. Having your cards exist in the Anki ecosystem (i.e., as local files, where you can batch edit/analyze things, etc.) is quite nice though. Interoperability between Anki and ‘Migaku Legacy’ is possible, however, although it’s not something I’ve used personally.
Anyways: no, you don’t actually need to use Anki or a structured SRS system if you immerse enough, especially if you have a word tracking tool to reduce cognitive strain, which is why I said that word tracking is essential rather than SRS per se. When reading with a tool like a pop-up dictionary, especially if it has tracking like Migaku/LingQ, you’re still taking advantage of the testing effect by inevitably scratching your head trying to comprehend the text. I still highly recommend supplemental SRS if you’re learning a difficult language, especially Japanese/Chinese. It’s very high reward:effort, especially if you have to memorize certain things, rather than just see what sticks from what you immerse with. That being said, Steve Kaufmann himself doesn’t use the SRS functionality in LingQ, and neither did I when using it for German, instead relying purely on input on LingQ for vocabulary retention. Would this slide for something way more difficult like Chinese? That is much less likely. 5-10 new words a day is quite fair for Japanese/Chinese.
In summary: as stated in the tl;dr, my method is basically just something like:
- ~80% of the time you can allocate to the language should be spent doing active immersion with Migaku. Start with videos with subtitles in your target language, then move on to reading and audio-only after they become comprehensible (especially if they work better with your daily routine or the type of material you’d simply like to consume). Ideally get versatility in all 3, but technically if you only care about one of those 3 domains, nothing is stopping you.
- ~15% of your time reviewing SRS. Before you rely solely on sentence mining i+1 sentences for acquiring new words (if ever), you must get a pre-made flashcard deck of ~500-2k cards. Ankiweb has a ton of these, and I’ve linked some below.
- ~5% of your time learning grammar points or otherwise “studying” the language.
- Passive immersion/passive listening throughout the day (during times where you can partially put your attention to the language, such as while at the gym, while chopping wood, while carrying water, etc.) if possible is also quite valuable - even at the beginning stages.
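As quick arithmetic, the split above can be turned into a daily schedule. This is just a sketch: `split_minutes` and the two-hour budget are made up for illustration, and the percentages are the rough ones from the list, not precise targets. Passive listening is left out since it happens on top of your dedicated study time.

```python
def split_minutes(total_minutes, shares=None):
    # Rough percentages from the list above; tweak them to taste.
    if shares is None:
        shares = {"active immersion": 80, "SRS reviews": 15, "grammar study": 5}
    # Convert each percentage into whole minutes of the daily budget.
    return {name: round(total_minutes * pct / 100) for name, pct in shares.items()}


# Hypothetical budget: two hours of dedicated study per day.
for activity, minutes in split_minutes(120).items():
    print(f"{activity}: {minutes} min")
```

So with two hours a day, that works out to about 96 minutes of immersion, 18 minutes of reviews, and 6 minutes of grammar.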
Grammar + Other Resources
You still need a good book/online resource that you can consult every now and then for learning grammar, etc. as a supplement to your immersion. I think that, especially if you’re on the higher end of free time you can dedicate to learning a language, not allocating at least a little time to grammar is slightly ridiculous. It would be a waste of time to expect to learn everything related to grammar only via immersion, especially if the language is a difficult one. Even just 5-30 minutes a day will go a long way and fill in the gaps for you.
- Start with the Refold Detailed Roadmap. I haven’t read this in forever but this should equip you with just about everything you need to know (terminology, priors, a more nuanced explanation behind the methodology) to approach other resources.
- Refold Community Dashboard - Notion has links to resources for a ton of languages and their respective sub-pages (though for some reason it’s kinda hard to navigate. For instance, I have no idea how to get to the one for Mandarin, yet that page links back to this one in the directory structure at the top??)
- https://github.com/kelciour/mpv-scripts
- https://github.com/Ajatt-Tools
Japanese
- The best Anki Deck is probably ‘JP1K decks’ like this.
Get the version where the front of the card has the isolated target word though.
- Tae Kim is probably the best grammar guide out there. It’s short and to the point; very well-known in the community. Avoid books like Genki.
Traverse these links to find useful stuff, for instance sites like https://kitsunekko.net should you need to download subtitles. There’s all kinds of stuff you might want along your journey.
- Refold Japanese - Google Docs
- https://animecards.site/
- https://anacreondjt.gitlab.io/
- https://tatsumoto-ren.github.io/ is an excellent resource; I recommend looking at its many FAQ articles
(Mandarin) Chinese
- Best introductory Anki Deck is probably either what comes with Migaku, or Refold Mandarin 1k.
- Mnemonic-based methods, i.e. the “Marilyn Method” make a very good case for themselves. HanziHero is probably the best implementation of this. Mandarin Blueprint’s method is also quite good, but their course is really expensive; I wouldn’t really recommend it unless you’re literally just rich.
- I may implement this in the future. HanziHero is really cool in theory, but it lacks features: personally I found myself basically never using their mnemonics, especially since you can’t even customize them. So if you don’t vibe with them, too bad. You can’t choose ‘actors’ or ‘locations’ that are personally meaningful to you (which Mandarin Blueprint does allow for). r- has to be Robinhood instead of Rick Owens, etc. Additionally, they sometimes give components meanings that don’t really correspond with their literal etymology, or even with the contexts in which they’re being used. E.g., calling 艮 ‘silver’ (when it actually means “tough” and depicts a person looking back) or 行 ‘sandal’ (which does relate to locomotion, but it depicts a street intersection, not a sandal). To me, this is inexcusable; it feels like you’re bastardizing your understanding of the words’ real origins. The pictographic/ideographic origin of Chinese characters and the way the components interact to create characters is undoubtedly one of the most fascinating aspects of the language, probably the most fascinating. (This is why I think it’s probably a shame that simplified Chinese was invented instead of sticking with traditional, but that’s a whole other take). Why lie to yourself about what characters are depicting? Can your memory not rest easy and place its trust in thousands of years of linguistic development? Hubris!
- Refold Mandarin - Refold Mandarin Resources. This here is beautiful.
- https://www.dong-chinese.com/. Has a good dictionary, and Youtube videos sorted by difficulty.
- https://www.vidioma.com/ Youtube videos sorted by difficulty
- https://hanzicraft.com for dictionary + character decomposition.
- https://rtega.be/chmn/index.php?subpage=41 dictionary + etymology + (sometimes) mnemonics.
Ancient Languages
You can’t immerse yourself in a language that’s dead, now can you? Well, for Latin, there’s probably actually a decent amount of audio out there considering it’s somewhat popular, and it has the excellent book Lingua Latina per se Illustrata (and sites that provide notes), which follows a ground-up approach to learning. (This method of pure input slightly above your current level has recently been popularized by Dreaming Spanish, although that one is not text-only).
But if the language you want to learn is really dead, then you’re probably not averse to doing it the old-fashioned way anyways, and that’s just the “grammar-translation” method. There’s not exactly much of a purpose in training yourself to speak a language with no native speakers, or to be “fluent” in it, per se. Work through textbooks, learning grammar and understanding more and more sentences, until you can read what you want. Ours is an era of AI though; I imagine it will be able to generate a plethora of resources, translate things, and even create audio/video not too long from now. Stuff like this video is something you might find interesting.
Ancient Greek
The 3 most distinguished dialects would probably be: Homeric, Attic (Plato, Aristotle) and Koine (Septuagint), but there’s also Ionic, Doric, and a few others. But if there’s one dialect to own the title “Ancient Greek”, it’d be Attic - most textbooks are about it, and it has enough similarities with the other dialects and with modern Greek, such that jumping from one to another wouldn’t be too difficult or anything.
- Lingua Graeca per se Illustrata is a work-in-progress project that replicates LLPSI (Lingua Latina per se Illustrata) for Greek.
- Athenaze
- Hansen & Quinn, Greek: An Intensive Course + Answer key, notes
If you know Latin, maybe you could also get by with reading some material right off the bat, say in a biglottic text, since something like 10% of Latin words are of Greek origin, and the two languages have similar morphologies and similar grammars. See Sihler, Andrew L. New Comparative Grammar of Greek and Latin or something like that.
Pāli
It’s probably a good idea to learn Sanskrit first, which is a treasure anyways, and is very standardized: the Aṣṭādhyāyī is a spectacle of scholarly rigour, to a degree that contemporary linguistics took a long time to even approach in comprehensiveness.
- https://palistudies.blogspot.com/p/resources.html
- A Course in the Pali Language
- https://www.buddha-vacana.org/ or https://www.tititudorancea.com/z/tipitaka_english.htm for Pali-English biglottic texts.
- Johansson’s Pali Texts Explained to the Beginner has an appendix comparing Sanskrit and Pali.