Tuesday, October 27, 2015

Collection of DataScience - volume 2




TOC

1. Why is Cross Validation important?
Solution
Code
2. Why is Grid Search important?
Solution
Code
3. What are the new Spark DataFrame and the Spark Pipeline? And how can we use the new ML library for Grid Search?
Solution
Code
4. How to deal with categorical features? And what is one-hot encoding?
Solution
Code
5. What are generalized linear models and what is an R Formula?
Solution
Code
6. What is the Word2Vec distributed representation?
Solution
Code
7. What are Decision Trees?
Solution
Code
8. What are Ensembles?
Solution
9. What is a Gradient Boosted Tree?
Solution
10. What is a Gradient Boosted Trees Regressor?
Solution
Code
11. Gradient Boosted Trees Classification
Solution
Code
12. What is a Random Forest?
Solution
Code
13. What is the AdaBoost classification algorithm?
Solution
14. What is a recommender system?
Solution
15. What is the collaborative filtering ALS algorithm?
Solution
Code
16. What is the DBSCAN clustering algorithm?
Solution
Code
17. What is Streaming K-Means?
Solution
Code
18. What is the PCA dimensionality reduction technique?
Solution
Code
19. What is the SVD dimensionality reduction technique?
Solution
Code
20. What is Parquet?
Solution
Code
21. What is Isotonic Regression?
Solution
Code
22. What is SVM with soft margins?
Solution
23. What is the Expectation Maximization clustering algorithm?
Solution
24. What is a Gaussian Mixture?
Solution
Code
25. What is the Latent Dirichlet Allocation topic model?
Solution
Code
26. What is Association Rule Learning?
Solution
27. What is FP-growth?
Solution
Code
28. How to use the GraphX library?
Solution
29. What is PageRank? And how to compute it with GraphX?
Solution
Code
Code
30. What is Power Iteration Clustering?
Solution
Code
31. What is a Perceptron?
Solution
32. What is an ANN (Artificial Neural Network)?
Solution
33. What are activation functions?
Solution
34. How many types of Neural Networks are known?
35. How can you train a Neural Network?
Solution
36. What applications do ANNs have?
Solution
37. Can you code a simple ANN in Python?
Solution
Code
38. What support does Spark have for Neural Networks?
Solution
Code
39. What is Deep Learning?
Solution
40. What are autoencoders and stacked autoencoders?
Solution
41. What are convolutional neural networks?
Solution
42. What are Restricted Boltzmann Machines, Deep Belief Networks and Recurrent networks?
Solution
43. Neural Network – Deep Learning – Theano
Solution
Code
Complexity
44. Neural Network – Deep Learning – Theano
Solution
Code
Complexity
45. Neural Network – Deep Learning – Lasagne
Solution
Code
Complexity
46. Splines
Solution
Code
Complexity
47. Search – Hill Climbing, Simulated Annealing, Greedy
Solution
Code
Complexity
48. Monte Carlo
Solution
Code
Complexity
49. Sampling (Gibbs)
Solution
Code
Complexity
50. Hypothesis Testing
Solution
Code
Complexity
51. Text Mining
Solution
Code
Complexity
52. NLP tagging
Solution
Code
Complexity
53. Bloom Filters
Solution
Code
Complexity
54. MinHash
Solution
Code
Complexity
55. LSH
Solution
Code
Complexity
56. Count-Min Sketches
Solution
Code
