Does my prediction model predict the right thing?

  dataframe, prediction, python, xgboost

Situation: students take tests that have different characteristics (length of text, number of difficult words, whether or not they contain pictures, etc.). The tests can be either "Hard" or "Easy". To predict whether they are hard or easy, I used all these features and created several accurate models (with XGBoost, for example).

Now: students can either fail (0) or pass (1) a test, and I want to predict whether a given student would fail or pass a given test. This is similar to a recommendation system, where users like or dislike items. That’s why I could apply a collaborative-filtering system, which works well, but obviously we need lots of data for that.
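To make the recommendation-system analogy concrete, here is a minimal sketch of the student-by-test matrix that a collaborative-filtering approach would start from. It uses only the sample rows given further down in this post; the variable names `interactions` and `density` are mine and purely illustrative.

```python
import pandas as pd

# Sample rows from this post (illustrative data only).
df = pd.DataFrame({
    "studentId": ["s1", "s1", "s1", "s2", "s2", "s3"],
    "testId": ["t1", "t2", "t3", "t2", "t4", "t7"],
    "result": [0, 0, 1, 0, 1, 1],
})

# Rows are students, columns are tests, cells are fail (0) / pass (1).
# NaN marks a student-test pair that has not been observed.
interactions = df.pivot(index="studentId", columns="testId", values="result")
print(interactions)

# Fraction of student-test pairs that actually have a result.
density = interactions.notna().to_numpy().mean()
print(f"observed fraction: {density:.2f}")
```

Most cells are empty even in this tiny sample, which is the "lots of data" problem in a nutshell.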

Problem: I want to use the characteristics of the tests to predict whether a student would fail or pass. But if I have a dataframe like this:

studentId          testId        Length        Words      picture      result
       s1              t1            10         8.50           0            0
       s1              t2            11         9.80           1            0
       s1              t3            11        10.40           1            1
       s2              t2            11         9.80           1            0
       s2              t4            60         9.99           0            1
       s3              t7            40         6.45           0            1

And I just do (obviously with the necessary imports):

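from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Drop the identifiers so only the test characteristics remain as features
cols_to_drop = ['testId', 'studentId']
df.drop(cols_to_drop, axis=1, inplace=True)
X = df.drop('result', axis=1)
y = df['result']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=5)
model = XGBClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)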

Does it really answer the question? ("Would student s3 fail or pass the test t7?"). I feel like this is really just what I did for predicting whether a test is hard or easy, but this time it’s "hard" or "easy" from the student’s perspective.
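A quick check that illustrates the worry, as a minimal sketch on the sample rows above (the name `same_features` is just illustrative): once studentId and testId are dropped, rows for different students taking the same test become identical feature vectors, so a model trained on X alone has to return the same prediction for both students.

```python
import pandas as pd

# Sample rows from this post (illustrative data only).
df = pd.DataFrame({
    "studentId": ["s1", "s1", "s1", "s2", "s2", "s3"],
    "testId": ["t1", "t2", "t3", "t2", "t4", "t7"],
    "Length": [10, 11, 11, 11, 60, 40],
    "Words": [8.50, 9.80, 10.40, 9.80, 9.99, 6.45],
    "picture": [0, 1, 1, 1, 0, 0],
    "result": [0, 0, 1, 0, 1, 1],
})

# Same feature matrix as in the snippet above: identifiers and target removed.
X = df.drop(columns=["studentId", "testId", "result"])

# Rows whose features are identical to those of at least one other row.
same_features = df[X.duplicated(keep=False)]
print(same_features[["studentId", "testId", "result"]])
```

Here s1 and s2 on test t2 end up with exactly the same inputs, so nothing in X tells the two students apart.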

I looked into content-based filtering and it seems so different from what I am doing here and that’s why I’m worried.

Source: Python Questions