

DAGs are defined in Python files that are placed in Airflow's DAG_FOLDER. Although one file can define multiple DAGs, it's a good idea to keep one logical workflow in one file.

An operator defines an individual task that needs to be performed. There are different types of operators available (as given on the Airflow website):

PythonOperator - calls an arbitrary Python function.
SimpleHttpOperator - sends an HTTP request.
MySqlOperator, SqliteOperator, PostgresOperator, MsSqlOperator, OracleOperator, JdbcOperator, etc. - execute SQL commands against the corresponding database.
Sensor - waits for a certain time, file, database row, S3 key, etc.

You can also write a custom operator as per your need. Operators refer to the tasks that they execute; in fact, a task is an instance of an operator, like:

    energy_operator = PythonOperator(
        task_id='print_date',
        python_callable=myfunc,
        dag=dag
    )

Here energy_operator is an instance of PythonOperator that has been assigned a task_id, a python_callable function (note: the function object itself, not myfunc()), and a DAG to be part of. Don't confuse operators with tasks: tasks define "what to run" and operators define "how to run". For example, a Python function that reads from S3 and pushes to a database is a task. A graph is a very convenient way to view the whole process.
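To make the task/operator distinction concrete, here is a minimal, self-contained sketch. The PythonOperator class below is a simplified stand-in for Airflow's class of the same name (the real one needs a running Airflow installation); myfunc and the task_id are illustrative, not from any real pipeline.

```python
# Toy stand-in for Airflow's PythonOperator, to illustrate that a task
# is simply an *instance* of an operator. Not the real Airflow API.

class PythonOperator:
    """The operator knows *how* to run: call a Python function."""
    def __init__(self, task_id, python_callable, dag=None):
        self.task_id = task_id
        # Store the function itself (not myfunc()) so it runs later, on schedule.
        self.python_callable = python_callable
        self.dag = dag

    def execute(self):
        return self.python_callable()

def myfunc():
    # The task's *what to run*: e.g. read from S3 and push to a database.
    return "ran myfunc"

# The task: an operator instance, identified by its task_id.
energy_operator = PythonOperator(task_id="print_date", python_callable=myfunc)
print(energy_operator.execute())  # → ran myfunc
```

Passing myfunc rather than myfunc() is the key detail: the operator decides when to invoke the callable, which is what lets a scheduler defer and retry the task.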

Airflow concepts

Dag

An Airflow workflow is designed as a directed acyclic graph (DAG). Directed means the tasks are executed in some order. Acyclic means you cannot create loops (i.e. cycles). Airflow is generally best suited for regular operations which can be scheduled to run at specific times.
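The "directed" and "acyclic" properties can be demonstrated with plain Python: running a DAG amounts to a topological sort of the dependency graph, and a cycle makes that sort impossible. The task names below are made up for illustration, and this uses the standard-library graphlib rather than Airflow itself.

```python
# Why a workflow must be a *directed acyclic* graph: tasks run in an order
# that respects dependencies, and a cycle leaves no valid order at all.
from graphlib import TopologicalSorter  # stdlib, Python 3.9+

# Each task maps to the set of tasks it depends on ("directed" edges).
deps = {
    "load_to_db": {"transform"},
    "transform": {"extract_from_s3"},
    "extract_from_s3": set(),
}

order = list(TopologicalSorter(deps).static_order())
print(order)  # extract_from_s3 first, load_to_db last

# Introducing a loop ("cyclic") makes scheduling impossible:
deps["extract_from_s3"] = {"load_to_db"}
try:
    list(TopologicalSorter(deps).static_order())
except Exception as e:  # graphlib raises CycleError here
    print("cycle detected:", e)
```

This is essentially what a scheduler does before dispatching tasks: verify the graph is acyclic, then release each task once everything upstream of it has finished.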

In this blog, I cover the main concepts behind Apache Airflow and walk through a step-by-step tutorial, with examples, on how to make Airflow work for you.
