A bunch of interesting slides from SlideShare.
- useful tips about performance and optimisation
- partitioning: a few thousand partitions are OK, tens of thousands are too many
- the files in each partition should be hundreds of MB or more
- Parquet: Snappy vs. gzip compression (faster vs. smaller files)
- before Impala 1.2.1 it was necessary to place the biggest table first in the query, but since that version Impala reorders the joined tables automatically
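The two partitioning rules of thumb above can be turned into a quick sanity check before choosing a partition key. This is only a minimal sketch: the function name `check_partitioning` and both thresholds are assumptions read off the notes (10,000 partitions as the point where it gets to be too many, 100 MiB as the lower bound for "hundreds of MB"), not limits enforced by Impala.

```python
# Thresholds are assumptions taken from the rules of thumb in the notes,
# not hard limits of Impala itself.
MAX_PARTITIONS = 10_000                 # "tens of thousands are too many"
MIN_PARTITION_BYTES = 100 * 1024 ** 2   # "hundreds of MB or more"


def check_partitioning(total_bytes, partition_count):
    """Return a list of warnings for a proposed scheme (empty list = looks OK)."""
    if partition_count <= 0:
        raise ValueError("partition_count must be positive")

    warnings = []
    if partition_count >= MAX_PARTITIONS:
        warnings.append("too many partitions: %d" % partition_count)

    # Average data volume per partition, assuming an even distribution.
    avg_bytes = total_bytes / partition_count
    if avg_bytes < MIN_PARTITION_BYTES:
        warnings.append("partitions too small: ~%d MiB on average"
                        % (avg_bytes // 1024 ** 2))
    return warnings


if __name__ == "__main__":
    table_bytes = 2 * 1024 ** 4  # a hypothetical 2 TiB table, 3 years of data
    print("by day: ", check_partitioning(table_bytes, 3 * 365))
    print("by hour:", check_partitioning(table_bytes, 3 * 365 * 24))
```

For the hypothetical 2 TiB table, partitioning by day gives about a thousand partitions of nearly 2 GiB each and passes both checks, while partitioning by hour fails both: too many partitions, each too small.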
Link to the article
Created on 26 March 2016 at 23:03:44 by mira. Edited 23 times, most recently on 29 March 2016 at 19:54:26 by mira.