Broadly speaking, the exercises fall into three parts:

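Early chapters: basic programming problems
Middle chapters: problems that analyze text with external libraries
Final chapters: machine learning problems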

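That is the overall layout, and the difficulty rises gradually as you go. I got through chapter 8 smoothly, but from chapter 9 on the processing time shot up and I had a hard time. The exercises give you practice with most of the techniques used in programming in general, not only in language processing, so if you are learning Python it is worth doing at least the first half. That was a long introduction, but I would like the students in my programming class to try chapter 1, so I will go through it with hints that even elementary school students can follow.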

Chapter 1: Warm-up

00. Reversing a string

Obtain the string formed by arranging the characters of "stressed" in reverse order (from the last character to the first).

Hint

Try making good use of the third slice parameter...
[su_accordion]
[su_spoiler title="Answer"]
[su_note note_color="#e3e3e3"]
# Example answer 1
s = 'stressed'
print(s[::-1])  # Putting -1 in the third slot of the slice reverses the string.
[/su_note]
desserts
[su_note note_color="#e3e3e3"]
# Example answer 2: the same thing written with a for loop
s = 'stressed'
ans = ''
for i in range(len(s)):
    ans += s[-(i + 1)]  # take s[-1], s[-2], ... (counting from the end)
print(ans)
[/su_note]
desserts
[/su_spoiler]
[/su_accordion]
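One more variation, which is my own addition and not part of the answers above: the built-in reversed() returns an iterator over the characters from last to first, and ''.join() glues them back into a string. The function name reverse_string is just a name I made up for this sketch.

```python
def reverse_string(s: str) -> str:
    # reversed() yields the characters from last to first;
    # ''.join() concatenates them into a new string.
    return ''.join(reversed(s))


print(reverse_string('stressed'))  # desserts
```

Slicing with s[::-1] is shorter, but reversed() also works on lists and other sequences without creating a copy up front.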

01. "パタトクカシーー"

Take the 1st, 3rd, 5th, and 7th characters of the string "パタトクカシーー" and concatenate them.

Hint

Use a slice here too, this time taking every other character.
[su_accordion]
[su_spoiler title="Answer"]
[su_note note_color="#e3e3e3"]
# Example answer 1
s = 'パタトクカシーー'
print(s[::2])  # a step of 2 skips every other character
[/su_note]
パトカー
[su_note note_color="#e3e3e3"]
# Example answer 2
s = 'パタトクカシーー'
print(s[0] + s[2] + s[4] + s[6])  # Indexes start at 0. This works, but it is a bit clunky.
[/su_note]
パトカー
[/su_spoiler]
[/su_accordion]
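A slice has the form s[start:stop:step]. As a small extra check of my own, changing only the start to 1 picks up the characters the answer skipped, which recovers the other word hidden in the string.

```python
s = 'パタトクカシーー'

# s[start:stop:step]: an omitted start/stop means "from the beginning" / "to the end".
odd_chars = s[::2]    # indexes 0, 2, 4, 6 -> the 1st, 3rd, 5th, 7th characters
even_chars = s[1::2]  # indexes 1, 3, 5, 7 -> the 2nd, 4th, 6th, 8th characters

print(odd_chars)   # パトカー
print(even_chars)  # タクシー
```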

02. "パトカー" + "タクシー" = "パタトクカシーー"

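Take the characters of "パトカー" (patrol car) and "タクシー" (taxi) alternately, starting from the beginning, and concatenate them to obtain the string "パタトクカシーー".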

Hint

Use the zip() function.
[su_accordion]
[su_spoiler title="Answer"]
[su_note note_color="#e3e3e3"]
s1 = 'パトカー'
s2 = 'タクシー'
ans = ''
for c1, c2 in zip(s1, s2):  # zip() pairs the 1st characters, then the 2nd, and so on
    ans += c1 + c2
print(ans)
[/su_note]
パタトクカシーー
[/su_spoiler]
[/su_accordion]
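Here is a slightly shorter sketch, which is my own variation: ''.join() with a generator builds the same string without repeated +=. It also shows one thing to watch out for. zip() stops at the shorter input, so if the two strings had different lengths the extra characters would be lost; itertools.zip_longest() keeps them by padding with fillvalue. The 'abc' and 'x' inputs are made-up examples.

```python
from itertools import zip_longest

s1 = 'パトカー'
s2 = 'タクシー'

# zip() pairs up characters; join() builds the result in one go.
ans = ''.join(c1 + c2 for c1, c2 in zip(s1, s2))
print(ans)  # パタトクカシーー

# zip() would drop 'b' and 'c' here; zip_longest() pads the short side with ''.
mixed = ''.join(a + b for a, b in zip_longest('abc', 'x', fillvalue=''))
print(mixed)  # axbc
```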

03. Pi

Split the sentence "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics." into words, and create a list of the number of (alphabetic) characters in each word, in the order the words appear.

Hint

First, remove "," and "." from the sentence. Next, split it into words with the string method split(). Finally, count the characters of each word with the len() function.
[su_accordion]
[su_spoiler title="Answer"]
[su_note note_color="#e3e3e3"]
# Example answer 1
s = "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics."
s = s.replace('.', '')
s = s.replace(',', '')
words = s.split()
l = []
for a in words:
    l.append(len(a))
print(l)
[/su_note]
[3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]
[su_note note_color="#e3e3e3"]
# Example answer 2
import re
s = "Now I need a drink, alcoholic of course, after the heavy lectures involving quantum mechanics."
s = re.sub('[,.]', '', s)  # delete every ',' and '.'
ans = [len(w) for w in s.split()]
print(ans)
[/su_note]
[3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]
[/su_spoiler]
[/su_accordion]
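Since the problem asks for the number of alphabetic characters, another approach (my own suggestion, not one of the answers above) is to count only the letters in each word with str.isalpha(). Then "," and "." never need to be deleted in the first place.

```python
s = ("Now I need a drink, alcoholic of course, "
     "after the heavy lectures involving quantum mechanics.")

# True counts as 1 in sum(), so this counts the letters in each word
# and ignores ',' and '.'.
ans = [sum(ch.isalpha() for ch in word) for word in s.split()]
print(ans)  # [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9]
```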

04. Element symbols

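Split the sentence "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can." into words. From the 1st, 5th, 6th, 7th, 8th, 9th, 15th, 16th, and 19th words take the first character, and from every other word take the first two characters. Then build an associative array (a dictionary or map) that maps each extracted string to the position of its word (which word it is, counting from the beginning).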

Hint

As in problem 03, split the sentence into words, then decide for each word whether to take one character or two.
[su_accordion]
[su_spoiler title="Answer"]
[su_note note_color="#e3e3e3"]
s = "Hi He Lied Because Boron Could Not Oxidize Fluorine. New Nations Might Also Sign Peace Security Clause. Arthur King Can."
sin = [1, 5, 6, 7, 8, 9, 15, 16, 19]
# Key: the extracted 1 or 2 characters. Value: the word's position, counted from 1.
ans = {(w[0] if i in sin else w[:2]): i for i, w in enumerate(s.split(' '), 1)}
print(ans)
[/su_note]
{'H': 1, 'He': 2, 'Li': 3, 'Be': 4, 'B': 5, 'C': 6, 'N': 7, 'O': 8, 'F': 9, 'Ne': 10, 'Na': 11, 'Mi': 12, 'Al': 13, 'Si': 14, 'P': 15, 'S': 16, 'Cl': 17, 'Ar': 18, 'K': 19, 'Ca': 20}
[/su_spoiler]
[/su_accordion]
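If the one-line dictionary comprehension is hard to read, the same idea can be written as an ordinary loop. This is a sketch of my own; the function name element_map and its parameters are made up for illustration. enumerate(..., start=1) numbers the words 1, 2, 3, ... the same way the problem does, and the positions are passed as a set so the `in` check is fast.

```python
def element_map(sentence, one_char_positions):
    """Map each extracted prefix to the 1-based position of its word."""
    result = {}
    # enumerate(..., start=1) numbers the words from 1, like the problem statement.
    for pos, word in enumerate(sentence.split(), start=1):
        length = 1 if pos in one_char_positions else 2
        result[word[:length]] = pos
    return result


s = ("Hi He Lied Because Boron Could Not Oxidize Fluorine. "
     "New Nations Might Also Sign Peace Security Clause. Arthur King Can.")
table = element_map(s, {1, 5, 6, 7, 8, 9, 15, 16, 19})
print(table['H'], table['Ca'])  # 1 20
```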

05. n-gram

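Write a function that builds n-grams from a given sequence (a string, a list, and so on). Using this function, obtain the word bi-grams and the character bi-grams of the sentence "I am an NLPer".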

Hint

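The problem statement is probably hard to understand. Simply put, an n-gram is a group of n consecutive items, and you collect all of them by sliding along the sequence one item at a time. The "bi" in bi-gram means two. For example, the character bi-grams of "Python" are "Py", "yt", "th", "ho", and "on".
[su_accordion]
[su_spoiler title="Answer"]
[su_note note_color="#e3e3e3"]
def ngram(seq, n):
    r = []
    # slide a window of width n along seq, one position at a time
    for i in range(len(seq) - n + 1):
        r.append(seq[i:i + n])
    return r

s = "I am an NLPer"
print(ngram(s.split(' '), 2))  # word bi-grams
print(ngram(s, 2))             # character bi-grams
[/su_note]
[['I', 'am'], ['am', 'an'], ['an', 'NLPer']]
['I ', ' a', 'am', 'm ', ' a', 'an', 'n ', ' N', 'NL', 'LP', 'Pe', 'er']
[/su_spoiler]
[/su_accordion]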
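For students who already know zip(), here is another way to write it. This is my own variation, and the name ngram_zip is made up. seq[i:] for i = 0 ... n-1 gives n copies of the sequence, each shifted by one more item, and zip() reads them side by side and stops at the shortest, so every tuple it produces is one window of n items. Note that it returns tuples, not lists or strings.

```python
def ngram_zip(seq, n):
    # n shifted copies of seq, read in parallel by zip();
    # zip() stops at the shortest copy, so no window runs past the end.
    return list(zip(*[seq[i:] for i in range(n)]))


s = "I am an NLPer"
print(ngram_zip(s.split(), 2))  # [('I', 'am'), ('am', 'an'), ('an', 'NLPer')]
print([''.join(t) for t in ngram_zip('Python', 2)])  # ['Py', 'yt', 'th', 'ho', 'on']
```

Because tuples can be put into a set as they are, this version would also work directly with set() in problem 06, even for word bi-grams (lists cannot go into a set).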

06. Sets

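Compute the sets of character bi-grams in "paraparaparadise" and "paragraph" as X and Y respectively, and find the union, intersection, and difference of X and Y. Also check whether the bi-gram 'se' is contained in X and in Y.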

Hint

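Ugh... Sets seem to be taught in high school, so this one may be hard to explain to elementary school students. It is fine to skip it. Solve it with the set type.
[su_accordion]
[su_spoiler title="Answer"]
[su_note note_color="#e3e3e3"]
# the ngram() function from problem 05
def ngram(seq, n):
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]

s1 = "paraparaparadise"
s2 = "paragraph"
X = set(ngram(s1, 2))
Y = set(ngram(s2, 2))
# The order in which a set is printed can change from run to run.
print(X.union(Y))         # union
print(X.intersection(Y))  # intersection
print(X.difference(Y))    # difference
print("se" in X)
print("se" in Y)
[/su_note]
{'ad', 'ph', 'pa', 'ap', 'di', 'se', 'ar', 'ag', 'gr', 'is', 'ra'}
{'ar', 'ra', 'ap', 'pa'}
{'ad', 'se', 'is', 'di'}
True
False
[/su_spoiler]
[/su_accordion]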
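Python sets also have operators that do the same thing as these methods: | is union, & is intersection, and - is difference. The sketch below (my own addition) repeats the calculation with the operators and uses sorted() to get a fixed order, since printing a set directly can give a different order each time.

```python
def ngram(seq, n):
    # Same sliding-window n-gram helper as in problem 05.
    return [seq[i:i + n] for i in range(len(seq) - n + 1)]


X = set(ngram("paraparaparadise", 2))
Y = set(ngram("paragraph", 2))

print(sorted(X | Y))  # union        (same as X.union(Y))
print(sorted(X & Y))  # intersection (same as X.intersection(Y)): ['ap', 'ar', 'pa', 'ra']
print(sorted(X - Y))  # difference   (same as X.difference(Y)): ['ad', 'di', 'is', 'se']
print("se" in X, "se" in Y)  # True False
```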

07. Generating a sentence from a template

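Implement a function that takes the arguments x, y, and z and returns the string "x時のyはz" ("y at x o'clock is z"). Then check the result with x=12, y="気温" (temperature), and z=22.4.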

Hint

Use string formatting, or simply join strings together with +.
[su_accordion]
[su_spoiler title="Answer"]
[su_note note_color="#e3e3e3"]
# Example answer 1
def hoge(x, y, z):
    return "{}時の{}は{}".format(x, y, z)  # each {} is filled in order

print(hoge(12, "気温", 22.4))
[/su_note]
12時の気温は22.4
[su_note note_color="#e3e3e3"]
# Example answer 2
def hoge(x, y, z):
    # numbers must be turned into strings with str() before joining with +
    return str(x) + '時の' + str(y) + 'は' + str(z)

print(hoge(12, "気温", 22.4))
[/su_note]
12時の気温は22.4
[/su_spoiler]
[/su_accordion]
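In Python 3.6 and later there is also the f-string, which I would add as a third option. You write the variable names directly inside {} and Python turns the values into text for you, so there is no need for format() or str(). The function name template is just an example name.

```python
def template(x, y, z):
    # An f-string puts the values of x, y and z straight into the text.
    return f"{x}時の{y}は{z}"


print(template(12, "気温", 22.4))  # 12時の気温は22.4
```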

08. Cipher text

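Implement a function cipher that converts each character of a given string according to these rules: a lowercase English letter is replaced with the character whose code is (219 - character code), and any other character is output unchanged. Use this function to encrypt and decrypt an English message.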

Hint

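The ord() function converts a character into its character code (a number), and chr() converts a character code back into a character.
[su_accordion]
[su_spoiler title="Answer"]
[su_note note_color="#e3e3e3"]
def cipher(s):
    r = ''
    for c in s:
        # only ASCII a-z (str.islower() is also True for letters like 'é')
        if 'a' <= c <= 'z':
            r += chr(219 - ord(c))
        else:
            r += c
    return r

secret = cipher('i have an apple.')
print(secret)
print(cipher(secret))  # applying cipher twice gives back the original
[/su_note]
r szev zm zkkov.
i have an apple.
[/su_spoiler]
[/su_accordion]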
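Why 219? ord('a') is 97 and ord('z') is 122, and 97 + 122 = 219, so 219 - ord(c) swaps 'a' with 'z', 'b' with 'y', and so on, always staying inside a-z. That is also why running cipher twice restores the message. The sketch below (my own addition; the name cipher_table is made up) checks this and does the same substitution with str.maketrans() and str.translate().

```python
import string

# 'a' and 'z' add up to 219, so 219 - ord(c) mirrors the alphabet.
print(ord('a') + ord('z'))                        # 219
print(chr(219 - ord('a')), chr(219 - ord('z')))   # z a

# Map 'abc...z' onto 'zyx...a': the same substitution as 219 - ord(c).
lower = string.ascii_lowercase
table = str.maketrans(lower, lower[::-1])


def cipher_table(s):
    # Characters not in the table (uppercase, spaces, '.') pass through unchanged.
    return s.translate(table)


secret = cipher_table('i have an apple.')
print(secret)                # r szev zm zkkov.
print(cipher_table(secret))  # i have an apple.
```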

09. Typoglycemia

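Write a program that takes a sequence of words separated by spaces, keeps the first and last character of each word, and shuffles the order of the remaining characters at random. Words of length 4 or less, however, are left as they are. Give it a suitable English sentence (for example, "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind .") and check the result.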

Hint

For what Typoglycemia means, see http://dic.nicovideo.jp/a/typoglycemia. Use the random module to shuffle the characters. This is the last problem in chapter 1.
[su_accordion]
[su_spoiler title="Answer"]
[su_note note_color="#e3e3e3"]
import random

s = "I couldn't believe that I could actually understand what I was reading : the phenomenal power of the human mind ."
foo = []
for w in s.split():
    if len(w) > 4:
        l = list(w[1:-1])  # the characters between the first and the last
        random.shuffle(l)  # shuffles the list in place
        foo.append(w[0] + ''.join(l) + w[-1])
    else:
        foo.append(w)
ans = ' '.join(foo)
print(ans)
[/su_note]
Example output (it is random, so yours will differ):
I cl'noudt biveele that I cuold altlcuay uendsnatrd what I was radieng : the pmoennehal pweor of the huamn mind .
[/su_spoiler]
[/su_accordion]
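Because the output changes every time, it can be hard to check. One idea, which is my own addition, is to wrap the program in a function (the name typoglycemia is made up) and pass in a random.Random object created with a fixed seed, so the same shuffle happens on every run. random.sample(seq, k) returns a new shuffled list, which here replaces the list() plus shuffle() pair.

```python
import random


def typoglycemia(sentence, rng=random):
    """Shuffle the inner characters of every word longer than 4 characters."""
    words = []
    for word in sentence.split():
        if len(word) > 4:
            # sample() with k = number of inner characters returns them all, shuffled.
            middle = rng.sample(word[1:-1], len(word) - 2)
            word = word[0] + ''.join(middle) + word[-1]
        words.append(word)
    return ' '.join(words)


# A Random object with a fixed seed makes the result repeatable between runs.
rng = random.Random(0)
print(typoglycemia("I couldn't believe that I could actually understand", rng))
```

Whatever the seed, the first and last characters of each word stay in place and words of 4 characters or fewer are unchanged.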

I tried the 100 Language Processing Knock (Chapter 1 hints and solutions)