American Express Interview Question: When would you use Random For... | Glassdoor.co.in

Interview Question

Business Analyst Interview (Student Candidate), Kolkata

When would you use Random Forest over SVM?

When there are more than two classes in a problem.

Anonymous on 16-Nov-2018

I would say the choice depends very much on what data you have and what your purpose is. A few rules of thumb:

Random Forest is intrinsically suited to multiclass problems, while SVM is intrinsically a two-class method. To use SVM on a multiclass problem, you need to reduce the problem to multiple binary classification problems.
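As a rough illustration of that reduction, the sketch below relabels a multiclass target into one binary problem per class (one-vs-rest) and lists the class pairs that a one-vs-one scheme would train on. The function names and the toy labels are hypothetical, chosen only for this example; no classifier is actually trained.

```python
from itertools import combinations


def one_vs_rest_targets(labels):
    """Return {class: binary target list}, one binary problem per class."""
    classes = sorted(set(labels))
    return {c: [1 if y == c else 0 for y in labels] for c in classes}


def one_vs_one_pairs(labels):
    """Return every unordered pair of classes; each pair is one binary problem."""
    return list(combinations(sorted(set(labels)), 2))


# Hypothetical toy labels with three classes.
labels = ["cat", "dog", "bird", "dog", "cat"]

ovr = one_vs_rest_targets(labels)
ovo = one_vs_one_pairs(labels)

print(ovr["dog"])  # binary target for the "dog vs. everything else" problem
print(len(ovr))    # k binary problems for k classes
print(len(ovo))    # k * (k - 1) / 2 binary problems for k classes
```

With k classes, one-vs-rest needs k binary classifiers and one-vs-one needs k(k-1)/2, so the bookkeeping grows with the number of classes, whereas a Random Forest handles all classes in a single model.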

Random Forest works well with a mixture of numerical and categorical features, and it is also fine when features are on different scales. Roughly speaking, with Random Forest you can use the data as they are. SVM maximizes the "margin" and thus relies on the concept of "distance" between points, so it is up to you to decide whether "distance" is meaningful for your data. As a consequence, one-hot encoding of categorical features is a must, and min-max or other scaling is highly recommended as a preprocessing step.
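To make the preprocessing concrete, here is a minimal sketch of one-hot encoding and min-max scaling written in plain Python. The helper names and the toy columns are hypothetical; in practice a library would normally do this, but the arithmetic is the same.

```python
def one_hot(values):
    """Encode a categorical column as one 0/1 column per category."""
    categories = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in categories] for v in values]


def min_max_scale(values):
    """Rescale a numeric column to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to scale
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical toy data: one categorical and one numeric feature.
colour = ["red", "green", "red", "blue"]
income = [20000.0, 50000.0, 35000.0, 80000.0]

encoded = one_hot(colour)       # columns ordered: blue, green, red
scaled = min_max_scale(income)

# Each row is now a purely numeric vector on comparable scales.
rows = [e + [s] for e, s in zip(encoded, scaled)]
for row in rows:
    print(row)
```

Without the scaling step, the raw income values would dominate any distance computed between rows, and the 0/1 category columns would contribute almost nothing.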

If you have data with n points and m features, an intermediate step in SVM is constructing an n×n matrix (think about the memory needed to store it) by calculating n^2 dot products (think about the computational cost). Therefore, as a rule of thumb, SVM hardly scales beyond 10^5 points. A large number of features (homogeneous features with a meaningful distance; the pixels of an image are a perfect example) is generally not a problem.
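A back-of-the-envelope calculation shows why the n×n matrix becomes a problem. The sketch below assumes the full matrix is stored densely with 8 bytes per entry (double precision); that is an assumption made for illustration, since practical solvers often avoid holding the whole matrix in memory at once.

```python
def kernel_matrix_bytes(n_points, bytes_per_entry=8):
    """Bytes needed to store a dense n x n matrix (8 bytes = one float64)."""
    return n_points * n_points * bytes_per_entry


for n in (10**3, 10**4, 10**5, 10**6):
    gigabytes = kernel_matrix_bytes(n) / 10**9
    print(f"n = {n:>9,d}: {gigabytes:,.3f} GB")
```

Under these assumptions, 10^5 points already require 8 × 10^10 bytes, or 80 GB, and every tenfold increase in n multiplies the requirement by 100.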

For a classification problem, Random Forest gives you the probability of belonging to each class. SVM gives you the distance to the boundary; if you need a probability, you still have to convert that distance into one.
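One common way to do that conversion is to pass the signed distance through a sigmoid whose two parameters are fitted on held-out data (often called Platt scaling). The sketch below shows only the mapping itself; the values of a and b are placeholders assumed for illustration, not fitted parameters.

```python
import math


def sigmoid_probability(distance, a=-1.0, b=0.0):
    """Map a signed distance to (0, 1) with p = 1 / (1 + exp(a * d + b))."""
    return 1.0 / (1.0 + math.exp(a * distance + b))


# Hypothetical signed distances to the boundary (positive = class 1 side).
for d in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"distance {d:+.1f} -> probability {sigmoid_probability(d):.3f}")
```

With a negative a, points far on the positive side of the boundary map to probabilities near 1, and a point exactly on the boundary maps to 0.5 when b is 0.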

For problems where SVM applies, it generally performs better than Random Forest.

SVM gives you "support vectors", that is, the points in each class closest to the boundary between classes. They may be of interest in themselves for interpretation.

Anonymous on 17-Apr-2019
