Let's try adding a family size feature.
I used the following kernel as a reference for the source:
https://www.kaggle.com/lperez/titanic-a-deeper-look-on-family-size
Only the main changes are shown here.
First, create FamilySize:
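# get_title and Title_Dictionary are not shown here (only the main changes are listed)
import pandas as pd

d_train = pd.read_csv('train.csv')
d_test = pd.read_csv('test.csv')

d_train['title'] = d_train['Name'].apply(get_title).map(Title_Dictionary)
d_test['title'] = d_test['Name'].apply(get_title).map(Title_Dictionary)

# Family size = siblings/spouses + parents/children + the passenger
d_train['FamilySize'] = d_train['SibSp'] + d_train['Parch'] + 1
d_test['FamilySize'] = d_test['SibSp'] + d_test['Parch'] + 1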
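To make the FamilySize calculation concrete, here is a minimal sketch on a tiny made-up table. The `toy` DataFrame and its values are assumptions for illustration only and are not taken from train.csv.

```python
import pandas as pd

# Toy data, made up for illustration (not from train.csv)
toy = pd.DataFrame({
    "SibSp": [1, 0, 3],   # siblings / spouses aboard
    "Parch": [0, 0, 2],   # parents / children aboard
})

# Family size = SibSp + Parch + 1 (the +1 counts the passenger)
toy["FamilySize"] = toy["SibSp"] + toy["Parch"] + 1

print(toy)
```

A passenger travelling alone ends up with FamilySize 1, so the feature never takes the value 0.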
Drop Parch and SibSp:
d_train = d_train.drop(['PassengerId','Name','Ticket','Cabin', 'Parch','SibSp'], axis=1)
Add FamilySize to the features used for the analysis, and leave out Parch and SibSp.
# y_train is prepared elsewhere and not shown here
x_train = d_train[["Pclass", "Sex", "Age", "Fare", "Embarked", "FamilySize", "title"]].values
x_test = d_test[["Pclass", "Sex", "Age", "Fare", "Embarked", "FamilySize", "title"]].values

from sklearn.tree import DecisionTreeClassifier

dtree = DecisionTreeClassifier(max_depth=8)
dtree.fit(x_train, y_train)
predictions = dtree.predict(x_test)
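The fit and predict steps can be tried in isolation with a small self-contained sketch. Everything here is a made-up assumption for illustration: the toy rows, the labels, the reduced set of three numeric columns, and `random_state=0`. It is not the actual Titanic data or the full feature set used above.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Toy training data, made up for illustration (all columns already numeric)
toy_train = pd.DataFrame({
    "Pclass":     [3, 1, 3, 1, 2, 3],
    "Fare":       [7.25, 71.3, 7.9, 53.1, 13.0, 8.05],
    "FamilySize": [2, 2, 1, 2, 1, 6],
})
toy_labels = [0, 1, 0, 1, 1, 0]  # hypothetical target values

toy_test = pd.DataFrame({
    "Pclass":     [3, 1],
    "Fare":       [7.8, 60.0],
    "FamilySize": [1, 2],
})

features = ["Pclass", "Fare", "FamilySize"]

# Same estimator and depth limit as in the post; random_state is an added assumption
toy_tree = DecisionTreeClassifier(max_depth=8, random_state=0)
toy_tree.fit(toy_train[features].values, toy_labels)

toy_predictions = toy_tree.predict(toy_test[features].values)
importances = toy_tree.feature_importances_

print(toy_predictions)
print(dict(zip(features, importances)))
```

Printing `feature_importances_` is one way to check whether a newly added column such as FamilySize is actually used by the tree.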
With this, I finally made it into the 6000s.