# Data Platforms

## Databricks

1. [Databricks supports ACID transactions](https://databricks.com/glossary/acid-transactions)

<figure><img src="https://3144294592-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsyNRcze2iiLsRdtame8e%2Fuploads%2FjecU58evMseyYH7upV3E%2Fimage.png?alt=media&#x26;token=6330abae-b48d-4268-9c95-f673a12774ea" alt=""><figcaption></figcaption></figure>

2. Databricks learning library
   1. [Free courses](https://www.databricks.com/training/catalog?costs=free)
   2. [Docs: Optimization recommendations](https://docs.databricks.com/en/optimizations/index.html)
   3. [Comprehensive Guide to Optimize Databricks, Spark and Delta Lake Workloads](https://www.databricks.com/discover/pages/optimize-data-workloads-guide)
   4. [Vector Search](https://www.databricks.com/training/catalog/new-capability-overview-vector-search-2535)
   5. [Databricks for ML](https://www.databricks.com/training/catalog/get-started-with-databricks-for-machine-learning-2460)
   6. [Databricks for Data Engineering](https://www.databricks.com/training/catalog/get-started-with-databricks-for-data-engineering-1511)
3. (good) [Introduction & Tutorial](https://medium.com/@chuck.connell.3/databricks-a-history-and-introduction-438ce827227): covers clusters, notebooks, tables, SQL, DataFrames, and connections
4. [7 must-know concepts](https://www.datacamp.com/tutorial/introduction-to-databricks)

<figure><img src="https://3144294592-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsyNRcze2iiLsRdtame8e%2Fuploads%2FjL9Kl4FpSW6wvL5Egufb%2Fimage.png?alt=media&#x26;token=4139bb1b-1e10-4133-99b2-46dee7bf11ed" alt=""><figcaption></figcaption></figure>

<figure><img src="https://3144294592-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FsyNRcze2iiLsRdtame8e%2Fuploads%2FVh1IdCr7v3FzO3nkYiqm%2Fimage.png?alt=media&#x26;token=1e830297-cfb4-45c0-abeb-d4bf9dfa50b7" alt=""><figcaption></figcaption></figure>

5. RDD vs DataFrame vs Dataset
   1. [2016 official blog post](https://www.databricks.com/blog/2016/07/14/a-tale-of-three-apache-spark-apis-rdds-dataframes-and-datasets.html)
   2. [LinkedIn blog post](https://www.linkedin.com/pulse/rdd-vs-dataframe-dataset-sanyam-jain-iwsfe/)
   3. [Comparison on YouTube](https://www.youtube.com/watch?v=aBUqIAGxeg8)
   4. [RDDs vs. Dataframes vs. Datasets – What is the Difference and Why Should Data Engineers Care?](https://www.analyticsvidhya.com/blog/2020/11/what-is-the-difference-between-rdds-dataframes-and-datasets/)
6. Optimizations
   1. [Optimization recommendations on Databricks](https://docs.databricks.com/en/optimizations/index.html)
   2. [Comprehensive Guide to Optimize Databricks, Spark and Delta Lake Workloads](https://www.databricks.com/discover/pages/optimize-data-workloads-guide)
   3. [How I Use Caching in Databricks to Increase Performance and Save Costs](https://blog.det.life/caching-in-databricks-explained-68c07bf1f76b)
   4. [Why and How: Partitioning in Databricks](https://medium.com/@eduard2popa/why-and-how-partitioning-in-databricks-e9e6f960db43)
7. Best Practices
   1. [Official Delta Lake best practices](https://docs.databricks.com/en/delta/best-practices.html)
