Best practices for documenting Dataiku projects

Documenting your work in Dataiku ensures projects remain understandable, reusable, and easier to improve over time. This article walks through practical ways to document your Dataiku projects so you can save time, reduce rework, and help your future self (and teammates) pick up where you left off.

Have you ever been so engrossed in a data analysis project that you thought you’d never forget a single important detail? Soon enough, it’s a year and 10 projects later, and you can’t remember all the components that made that project a success. Who decided to filter out that data? Why did we choose these specific business rules?

Documentation takes time, but in the long run it saves far more time than it costs. Think of the hours spent debugging a code notebook to track down an issue; that process takes even longer if someone else wrote the code. Documentation helps you collaborate and share with others, and it is the proverbial “breadcrumb trail” for your future self when you need to troubleshoot or enhance something you built three projects ago.

Dataiku, a collaborative data science platform, makes documentation easy. Its visual Flow lets you quickly see where each major transformation takes place.


Here are three areas for additional documentation that are well worth the time:

  • Project description
  • Recipe short description
  • Prepare recipe comment

The data used for this example is from a Kaggle project using financial transaction data to detect fraudulent transactions and money laundering.

Project description

When you create a project, give it a clear description on the project page. A good description not only explains why the project exists but also makes it easier for other projects to reuse its work. For example, if the project identifies target markets, another project could build on that data for further analysis instead of repeating the same data cleansing.

The project description should include:

  • The project purpose or goal.
  • A project sponsor or department owner.
  • A short description of the data sources.
  • A description of any data that was filtered before entering this project.
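If your team wants every project description to cover the same ground, the four elements above can be captured as a simple template. The sketch below is a hypothetical helper, not a Dataiku feature: the `ProjectDescription` class, its field names, and the `LABELS` mapping are our own assumptions. It reports which elements are still blank and renders the rest as plain text you could paste into the project description.

```python
from __future__ import annotations

from dataclasses import dataclass, fields

# Hypothetical display labels for each element of the checklist.
LABELS = {
    "purpose": "Purpose",
    "owner": "Owner",
    "data_sources": "Data sources",
    "upstream_filters": "Upstream filters",
}


@dataclass
class ProjectDescription:
    """Hypothetical template for the four elements of a project description."""

    purpose: str = ""
    owner: str = ""  # project sponsor or department owner
    data_sources: str = ""  # short description of the data sources
    upstream_filters: str = ""  # data filtered before entering the project

    def missing_fields(self) -> list[str]:
        # An element left blank (or only whitespace) still needs to be written.
        return [f.name for f in fields(self) if not getattr(self, f.name).strip()]

    def render(self) -> str:
        # One labeled line per element, in checklist order; blanks become TODO.
        return "\n".join(
            f"{LABELS[f.name]}: {getattr(self, f.name).strip() or 'TODO'}"
            for f in fields(self)
        )


if __name__ == "__main__":
    # Illustrative values only.
    desc = ProjectDescription(
        purpose="Detect fraudulent transactions and money laundering.",
        owner="Risk department (placeholder)",
        data_sources="Kaggle financial transaction data",
    )
    print(desc.missing_fields())  # ['upstream_filters']
    print(desc.render())
```

Rendering blanks as TODO rather than dropping them keeps the gap visible to anyone who opens the project page.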

To view and edit your project description, go to your project homepage by clicking the project title near the top-left corner of the screen. Add a short project description under the title for quick reference and searching.


Recipe short description

Imagine opening a new project and understanding how information flows and changes through it without clicking on anything. Visual recipes make it easy to see what each recipe does, but adding a recipe short description lets others follow the builder’s train of thought even better. The short description is displayed when you hover over a recipe, giving a simple overview of each step in the flow without opening the recipes themselves.

The recipe short description should include:

  • The purpose of the recipe.
  • Business logic, or the reasons rows were removed.
  • Key facts that would allow the user to understand the flow.

To edit the short description, click the recipe and open the information panel on the right side of the screen. Under the Details header, navigate to About and click Edit. Enter the short description and click Save. You can also add a long description, which appears in the Details panel on the right; the long description supports Markdown for formatting.
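Because the long description accepts Markdown, a consistent layout makes recipes easier to scan. The following is a hypothetical helper (the function name `recipe_long_description` and its section headings are our own choices, not part of Dataiku) that turns the three checklist items into a short Markdown block you could paste into the long description.

```python
from __future__ import annotations


def recipe_long_description(
    purpose: str, business_logic: list[str], key_facts: list[str]
) -> str:
    """Build a Markdown long description from the recipe checklist (hypothetical helper)."""
    lines = [f"**Purpose:** {purpose}", ""]
    if business_logic:
        lines.append("**Business logic / rows removed:**")
        lines.extend(f"- {rule}" for rule in business_logic)
        lines.append("")
    if key_facts:
        lines.append("**Key facts:**")
        lines.extend(f"- {fact}" for fact in key_facts)
        lines.append("")
    # Drop the trailing blank line so the text ends cleanly.
    return "\n".join(lines).rstrip("\n")


if __name__ == "__main__":
    # Illustrative values only.
    print(
        recipe_long_description(
            purpose="Remove test transactions before analysis.",
            business_logic=["Rows whose account ID starts with TEST are removed."],
            key_facts=["Output is the cleaned transactions dataset."],
        )
    )
```

Sections with nothing to say are skipped, so a simple recipe ends up with a one-line description rather than empty headings.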


Prepare recipe comment

Within a prepare recipe, Dataiku records each transformation as a step in the preparation script. These steps already make the actions taken on the data transparent, but it is still important to comment on any step that isn’t straightforward or whose reasoning you want to remember.

There should be a comment on the step if:

  • There was business logic incorporated.
  • The step removes rows.
  • A column has been duplicated or “find and replace” was used.
  • It is a formula step.
  • A column is renamed.
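The rules above are concrete enough to check mechanically, for example during a project review. The sketch below is hypothetical: it does not read Dataiku’s own recipe format, and the `Step` fields and step type names (`filter_rows`, `find_replace`, and so on) are assumptions made for illustration. Given a list of steps, it returns the names of those that call for a comment but do not have one.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical step types that, per the checklist, always need a comment.
TYPES_NEEDING_COMMENT = {
    "filter_rows",       # the step removes rows
    "duplicate_column",  # a column is duplicated
    "find_replace",      # "find and replace" was used
    "formula",           # formula step
    "rename_column",     # a column is renamed
}


@dataclass
class Step:
    name: str
    step_type: str
    has_business_logic: bool = False
    comment: str = ""


def steps_missing_comments(steps: list[Step]) -> list[str]:
    """Return the names of steps that need a comment but have none."""
    missing = []
    for step in steps:
        needs_comment = (
            step.has_business_logic or step.step_type in TYPES_NEEDING_COMMENT
        )
        if needs_comment and not step.comment.strip():
            missing.append(step.name)
    return missing


if __name__ == "__main__":
    # Illustrative script; step names and types are made up.
    script = [
        Step("Parse date", "parse_date"),
        Step("Drop test accounts", "filter_rows", has_business_logic=True),
        Step("Compute fee", "formula", comment="Fee rule agreed with finance."),
        Step("Rename amt", "rename_column"),
    ]
    print(steps_missing_comments(script))  # ['Drop test accounts', 'Rename amt']
```

Business logic is modeled as a flag rather than a step type because any kind of step can encode a business rule.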

To add a comment to a prepare recipe step, open the recipe and find the step in the script panel on the left. Click the “i” symbol on the step to add or edit the comment. Select Always show comment to keep it visible.


A few minutes of documentation will save you time, help you collaborate with others, and make troubleshooting easier in the future. Dataiku’s visual recipes already provide valuable documentation out of the box, and features such as wikis, dashboards, and the timeline on the project page add even more.

Combined with the additional documentation described above, these features will keep your project clean, tidy, and ready to reuse.

A little documentation today saves a lot of head-scratching tomorrow. By taking the time to capture the why behind your workflows in Dataiku, you set yourself and your team up for better collaboration, faster troubleshooting, and more reusable analytics.

If you’d like support designing documentation standards or levelling up your Dataiku environment, our consultants are here to help. Request a free 30-minute consultation and let’s make your future projects even smoother.