Airflow Lessons from the Data Engineering Front in Chicago

Alison
Stanton Ventures Insights
Nov 13, 2017


Recently (11/1/2017), I hosted a meetup of Chicagoland Airflow users. Although only four of us attended, it was probably the most educational and productive meetup I’ve ever been to.

Here are the lessons the four of us agreed on, based on our hands-on experience:

  • Avoid using backfill. You can use Variables to track start_date instead.
  • Avoid using sub-DAGs.
  • Restart the scheduler often. The cron intervals for restarting it that people had used or heard of ranged from every minute to every 6 hours.
  • If you change how frequently a DAG runs, also change the DAG name (e.g. add _v2 to the end of it) to avoid errors. For example, if a DAG was running once a day and will now run once a week, rename it from x_to_y_dag to x_to_y_dag_v2.
  • Instead of using the ExternalTaskSensor for external dependencies, use XCom. Some folks had issues with the ExternalTaskSensor.
  • As for executors, the LocalExecutor works up to a point. The gist of our conversation: if you only run a few workflows and it works for you, stay with the LocalExecutor as long as you can. There were concerns about moving to the CeleryExecutor (mostly related to memory). We’re curious how good the Mesos support is and, more generally, would like to learn more about the community-contributed MesosExecutor.
  • For storing secrets, people in our group are using Kubernetes Vault, AWS keys, or environment variables.
  • The best practice the group has found is to split the work into three tasks (a launch task, a sensor task, and a run task) instead of trying to launch and run in a single task.
  • Not all of us had reached the production deployment stage, but Kubernetes works. One person in our group is running Airflow successfully in Docker containers. Another approach that works is having a cron job pull DAGs from a git repo.
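We didn’t spell out at the meetup exactly how to track start_date with Variables, so here is one possible reading as a minimal sketch, not the group’s actual code: a Variable holds the next day that still needs processing (a "watermark"), each run processes the days from that watermark up to today, and the watermark only advances after the work succeeds. To keep it runnable without an Airflow install, a plain dict stands in for the Variable store; in a real DAG you would call `Variable.get(key, default_var=...)` and `Variable.set(key, value)` from `airflow.models`. The helper functions and the key name `x_to_y_next_day` are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List

# Hypothetical stand-in for Airflow's Variable store so the sketch runs
# without Airflow. In a DAG, use airflow.models.Variable.get / Variable.set.
_variables: Dict[str, str] = {}


def variable_get(key: str, default: str) -> str:
    return _variables.get(key, default)


def variable_set(key: str, value: str) -> None:
    _variables[key] = value


def pending_days(key: str, initial_start: date, today: date) -> List[date]:
    """Days from the stored watermark up to (but not including) today."""
    start = date.fromisoformat(variable_get(key, initial_start.isoformat()))
    # range() of a negative count is empty, so a future watermark yields [].
    return [start + timedelta(days=i) for i in range((today - start).days)]


def mark_done(key: str, last_day: date) -> None:
    """Advance the watermark only after last_day was processed successfully."""
    variable_set(key, (last_day + timedelta(days=1)).isoformat())
```

For example, with no Variable set yet, `pending_days("x_to_y_next_day", date(2017, 11, 1), date(2017, 11, 4))` returns November 1 through 3; after `mark_done("x_to_y_next_day", date(2017, 11, 3))` the same call returns an empty list. Since Variables can be edited in the Airflow UI, resetting the watermark is one way to reprocess a range without reaching for backfill.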
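For the environment-variable option for secrets, one convenient detail is that Airflow can pick up connections from environment variables named `AIRFLOW_CONN_<CONN_ID>` whose value is a connection URI. The sketch below only mimics that URI convention with the standard library so you can see which part of the URI ends up where; it is not Airflow’s own parser, and the connection id `x_to_y_db` and its credentials are made up.

```python
import os
from urllib.parse import unquote, urlsplit


def connection_from_env(conn_id: str) -> dict:
    """Parse a connection URI stored in AIRFLOW_CONN_<CONN_ID> (sketch only)."""
    uri = os.environ.get(f"AIRFLOW_CONN_{conn_id.upper()}")
    if not uri:
        # Fail loudly instead of silently running with missing credentials.
        raise KeyError(f"no environment variable for connection {conn_id!r}")
    parts = urlsplit(uri)
    return {
        "conn_type": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        # Credentials may be percent-encoded in the URI; decode them here.
        "login": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
        "schema": parts.path.lstrip("/"),
    }
```

For instance, `AIRFLOW_CONN_X_TO_Y_DB=postgres://etl:s3cret@db.example.com:5432/warehouse` parses to host `db.example.com`, port 5432, login `etl`, and schema `warehouse`. Special characters in a password need to be percent-encoded when you write the URI.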
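The launch task, sensor task, run task split can be sketched in plain Python. The hypothetical `FakeCluster` stands in for whatever external resource gets launched (a cluster, a container, a remote job), and `wait_for` plays the role of a sensor’s poke loop, loosely mirroring the `poke_interval` and `timeout` parameters that Airflow sensors accept. This illustrates the pattern under those assumptions; it is not Airflow code.

```python
import time
from typing import Callable


def wait_for(condition: Callable[[], bool], poke_interval: float,
             timeout: float) -> bool:
    """Poll condition() until it is True or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poke_interval)


class FakeCluster:
    """Hypothetical external resource that is ready after a few status checks."""

    def __init__(self, checks_until_ready: int) -> None:
        self.checks_until_ready = checks_until_ready
        self.checks = 0
        self.launched = False
        self.ready = False

    def launch(self) -> None:
        self.launched = True

    def is_ready(self) -> bool:
        # Only count status checks once the resource has been launched.
        if self.launched:
            self.checks += 1
            self.ready = self.checks >= self.checks_until_ready
        return self.ready

    def submit(self, work: str) -> str:
        if not self.ready:
            raise RuntimeError("cluster is not ready")
        return f"submitted {work}"


def run_pipeline(cluster: FakeCluster, poke_interval: float = 0.0,
                 timeout: float = 5.0) -> str:
    cluster.launch()                                            # 1. launch task
    if not wait_for(cluster.is_ready, poke_interval, timeout):  # 2. sensor task
        raise TimeoutError("cluster did not become ready in time")
    return cluster.submit("x_to_y")                             # 3. run task
```

In a DAG, each of the three steps would be its own task. One benefit of the split, as we see it, is that a failed run step can be retried without relaunching anything, and a slow launch shows up as a waiting sensor rather than as one long task that is hard to diagnose.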

Finally, we had a question the four of us couldn’t answer: what’s the difference between DAG concurrency and DAG parallelism? If you have an answer, please let us know!
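For readers with the same question, here is our reading of the Airflow 1.x configuration reference, offered tentatively: `parallelism` (in the `[core]` section of airflow.cfg) caps how many task instances can run at once across the whole Airflow installation, while `dag_concurrency` (or the `concurrency` argument to a DAG) caps how many task instances of a single DAG can run at once. The sketch below is a deliberately simplified, hypothetical model of how the two limits interact when queued tasks are admitted. It is not the scheduler’s actual logic and ignores pools, `max_active_runs_per_dag`, and executor slots.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Task = Tuple[str, str]  # (dag_id, task_id)


def admit(queued: Iterable[Task], running: List[Task],
          parallelism: int, dag_concurrency: int) -> List[Task]:
    """Return the queued tasks that may start now under both limits."""
    per_dag = Counter(dag_id for dag_id, _ in running)
    total = len(running)
    admitted: List[Task] = []
    for dag_id, task_id in queued:
        if total >= parallelism:
            break  # installation-wide cap reached: nothing else can start
        if per_dag[dag_id] >= dag_concurrency:
            continue  # this DAG is at its own cap; other DAGs may still start
        admitted.append((dag_id, task_id))
        per_dag[dag_id] += 1
        total += 1
    return admitted
```

With three queued tasks from DAG `a` and one from DAG `b`, `parallelism=32` and `dag_concurrency=2` admit two `a` tasks plus the `b` task, while `parallelism=2` admits only the first two `a` tasks.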

Stanton Ventures is a data consultancy.
