---
license: mit
tags:
- natural-language-processing
- code-generation
- torch
- lstm
---

This generative text model was trained with [Andrej Karpathy's char-rnn code](https://github.com/karpathy/char-rnn) on homework assignments written by [linguistics students](https://ling.hse.ru/en/) of HSE University for an introductory Python course in 2017.


The model is an LSTM with an RNN size of 512, 3 layers, and a dropout of 0.5.


## Usage


The procedure for installing the required software is described [by Karpathy](https://github.com/karpathy/char-rnn): Torch is required, and the code is written in Lua. Be careful: the code depends on library versions that are many years old.


```bash
th sample.lua lm_lstm_epoch27.89_0.7387.t7 -length 10000 -temperature 0.5 -primetext 'some text'
```


## Train data


The training corpus, included in this repository as `input.txt`, consists of all the student programs joined into one file.




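A model with the hyperparameters above could be retrained on this corpus with a command along these lines (a sketch based on char-rnn's documented flags; the data directory path is hypothetical and must contain `input.txt`):

```shell
# data/hse_homework/ is a hypothetical directory holding input.txt
th train.lua -data_dir data/hse_homework -rnn_size 512 -num_layers 3 -dropout 0.5
```

Checkpoints such as the `.t7` file used in the sampling command above are written out during training.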
## What for?


In the era of all-conquering Transformers, RNN models seem archaic. But in my experience they still handle, better than modern architectures, categories that matter from a humanities point of view, such as individual style.


This model was created just for fun, for the students at the end of the course in 2017.


## Samples


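The samples below were generated at several temperatures. In char-rnn, the temperature divides the output logits before the softmax, so lower values concentrate probability on the likeliest characters (more conservative text) while higher values flatten the distribution. A minimal NumPy sketch with illustrative logit values (not taken from the model):

```python
import numpy as np

def softmax_with_temperature(logits, temperature):
    """Divide logits by temperature, then softmax: lower T -> peakier distribution."""
    scaled = np.asarray(logits, dtype=float) / temperature
    scaled -= scaled.max()  # subtract max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

logits = [2.0, 1.0, 0.1]  # hypothetical character logits
p_low = softmax_with_temperature(logits, 0.5)   # sharper: top character dominates
p_high = softmax_with_temperature(logits, 0.7)  # flatter: more variety
```

With these values, the most likely character receives a larger share of the probability mass at temperature 0.5 than at 0.7, which is why the lower-temperature samples look more repetitive and "safe".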
### temperature 0.5


```python
some text] and re.search('<meta content=\"(.*)\" name=\"author\"></meta>", oneline):
for line in a:
if re.search('<w><ana lex=\"(.+)\" gr=\".+"></ana>(.+?)</w>', line):
s = re.search(reg_adj, line)
if r:
k = re.search('<meta content="(.+?)" name="author">', txt))
sentences = re.sub('</w>', '', s)
with open('file.txt', 'a', encoding = 'utf-8') as f:
f.write(i+' '+count_words(f)
f.write('\n')
f.write('Выполняется файлов в папке в нет
можно сделеть слово слово в папка с цифрами в названии в папка с программой и папенается в тексте нет разной инит.')
print('Творительный падеж, единственное число')
elif word.endswith('ах') or word.endswith ('ям'):
print('Poss
```


### temperature 0.6


```python

def noun_midles(words):
print(result)
def main():
print('В тексте нет попыгамителись попытка слов в препинания в ланное не равно киличество файлов (' + str(arr))
def main():
maxi = max_pmi_any(s, 'answ')
print(count_form(textik, dictionary)
def main():
forms = open_file()
words = open_text(way_to_file)
words = []
for i in range(len(forms)):
if '.'
words += word.strip('.,!?//()":;/|\)\'»\n\t ')
reg_author = '<meta content="(.+?)" name="author"'
bigrams.append(f +'\t'+str(pos[forms[i])+1
else:
dic[file] = 1
else:
d[key] = 1
else:
dic[key] = 1
else:
dic[lemmes[i]] += 1
return d
def write_out_count_forms(text):
arr = re.findall('<w>(.+?)</w>', text)
return text
def find_max(string, 'words_anes)

```


### temperature 0.7


```python

import re
def main():
maxi = max(pmi)
number = int(input('Введите слово: ')
if os.path.isfile(f):
for key in d:
f.write(key + '\n')
f.close()
return
def main():
text = text_process('text.txt')
words = []
words = []
for word in words:
word = word.strip('.,;:?!'))
f.close()
return forms
def names_file(fname):
with open (fname, 'r', encoding = 'utf-8') as f:
text = f.read()
return text
def count_text(text):
text2 = re.sub(u'<.*?></w>', text)
return text
def count_text(word, text):
t = open_text(fname)
return file
def author('text.txt'):
for i in range(len(reg)):
forms[i] = words[i].strip('.,?!()*&^%$
file[i] = file[i].strip('.,?!()*&^%$
for k in range(len(list_)):
if len(strings)>1:
print('Олонаким препинания.html', 'a раздания')
word=re.sub('<.*?>', '', word, text)


```