In Apache Spark's Spark SQL, you can create temporary views, which register a query or DataFrame under a name so that it can be queried like a table. There are two types of temporary view, distinguished by scope.
Types of Views in Spark SQL
Temporary View:
- Associated only with the SparkSession that created it.
- The view's namespace is limited to the SparkSession that created it; other sessions cannot see it.
- When the SparkSession terminates, the temporary view is automatically destroyed.
- Created with CREATE TEMPORARY VIEW, or with DataFrame.createOrReplaceTempView. A temporary view is of this kind unless the GLOBAL keyword is used.
Global Temporary View:
- Associated not with a single SparkSession, but with the Spark application as a whole.
- The view's namespace is shared across all SparkSession instances within the Spark application. The view is registered in the system database global_temp and is queried by its qualified name, for example global_temp.my_view.
- It is not destroyed when a SparkSession terminates, and persists until the Spark application ends.
- Created using the CREATE GLOBAL TEMPORARY VIEW syntax.
Namespace Conflicts During Parallel Execution
When executing Spark SQL queries in parallel, whether namespace conflicts become a problem depends on the type of view used.
Temporary Views: Each temporary view belongs to a single SparkSession. If each thread that submits queries uses its own SparkSession, the temporary views created in one session live in a namespace independent of the views in every other session. Therefore, same-named temporary views do not conflict when those sessions run in parallel. Threads that share one SparkSession also share its temporary views, so this guarantee applies only when the sessions are separate.
Global Temporary Views: Since global temporary views are shared across the entire Spark application, every SparkSession that uses a given view name refers to the same view. If a second session runs CREATE GLOBAL TEMPORARY VIEW with a name that is already taken, the statement fails because the view already exists, and CREATE OR REPLACE GLOBAL TEMPORARY VIEW overwrites the definition that other sessions may be reading.
Conclusion
In Spark SQL, a temporary view is scoped to the SparkSession that created it unless you explicitly use the GLOBAL keyword. Because of that scoping, namespace conflicts between temporary views generally do not occur during parallel execution, provided each parallel unit of work uses its own SparkSession.
If you need to share views across multiple SparkSession instances, you would use global temporary views, but in that case, you need to consider naming conventions and management strategies to avoid name conflicts.
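One possible naming convention is to append a random suffix to a readable prefix. The sketch below is only an illustration of that idea, not an established Spark facility; the helper unique_view_name is hypothetical.

```python
import re
import uuid


def unique_view_name(prefix):
    """Return a view name that is unlikely to collide with other callers' names."""
    # Keep only word characters so the result is a plain SQL identifier.
    safe = re.sub(r"\W+", "_", prefix).strip("_") or "view"
    # uuid4().hex is 32 hexadecimal characters, random on every call.
    return f"{safe}_{uuid.uuid4().hex}"


first = unique_view_name("daily report")
second = unique_view_name("daily report")
```

The generated name could then be passed to DataFrame.createOrReplaceGlobalTempView and queried as global_temp.<name>. Since such views persist until the application ends, it may also be worth dropping them with spark.catalog.dropGlobalTempView once they are no longer needed.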