aws_glue_crawler resource

Use the aws_glue_crawler InSpec audit resource to test properties of a single AWS Glue crawler.

The AWS::Glue::Crawler resource specifies an AWS Glue crawler.

For additional information, including details on parameters and properties, see the AWS documentation on Glue Crawler.

Syntax

Ensure that a crawler exists.

describe aws_glue_crawler(name: 'CRAWLER_NAME') do
  it { should exist }
end

Parameters

name (required): The name of the crawler.

Properties

name: The name of the crawler.
role: The ARN of an IAM role that’s used to access customer resources, such as Amazon Simple Storage Service (Amazon S3) data.
targets: A collection of targets to crawl.
database_name: The name of the database in which the crawler’s output is stored.
description: A description of the crawler.
classifiers: A list of UTF-8 strings that specify the custom classifiers that are associated with the crawler.
recrawl_policy: A policy that specifies whether to crawl the entire dataset again, or to crawl only folders that were added since the last crawler run.
schema_change_policy: The policy that specifies update and delete behaviors for the crawler.
lineage_configuration: A configuration that specifies whether data lineage is enabled for the crawler.
state: Whether the crawler is running, or whether a run is pending.
table_prefix: The prefix added to the names of tables that are created.
schedule: For scheduled crawlers, the schedule when the crawler runs.
crawl_elapsed_time: If the crawler is running, contains the total time elapsed since the last crawl began.
creation_time: The time that the crawler was created.
last_updated: The time that the crawler was last updated.
last_crawl: The status of the last crawl, and potentially error information if an error occurred.
version: The version of the crawler.
configuration: Crawler configuration information. This versioned JSON string allows users to specify aspects of a crawler’s behavior.
crawler_security_configuration: The name of the SecurityConfiguration structure to be used by this crawler.
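
Because the configuration property is a versioned JSON string rather than a structured value, it usually has to be parsed before individual settings can be inspected. The following is a minimal sketch in plain Ruby, not part of the resource itself. The sample string and the helper names parse_crawler_configuration and partition_update_behavior are hypothetical and only illustrate the idea.

```ruby
require 'json'

# Hypothetical sample of a string the `configuration` property might hold.
# A real crawler may return different keys, or none at all.
SAMPLE_CONFIGURATION =
  '{"Version":1.0,"CrawlerOutput":{"Partitions":{"AddOrUpdateBehavior":"InheritFromTable"}}}'

# Parse the versioned JSON string into a Ruby Hash.
def parse_crawler_configuration(raw)
  JSON.parse(raw)
end

# Look up the partition update behavior.
# Hash#dig returns nil when any key along the path is missing.
def partition_update_behavior(raw)
  parse_crawler_configuration(raw).dig('CrawlerOutput', 'Partitions', 'AddOrUpdateBehavior')
end

puts partition_update_behavior(SAMPLE_CONFIGURATION)
```

The same parsing could be applied to the value returned by the configuration property inside a control, assuming the crawler has a configuration set.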

Examples

Verify the crawler name:

describe aws_glue_crawler(name: 'CRAWLER_NAME') do
  its('name') { should eq 'CRAWLER_NAME' }
end

Verify the database name in the crawler:

describe aws_glue_crawler(name: 'CRAWLER_NAME') do
  its('database_name') { should eq 'CRAWLER_DATABASE_NAME' }
end

Matchers

For a full list of available matchers, see our Universal Matchers page.

This resource has the following special matchers.

exist

Use should to test that the entity exists.

describe aws_glue_crawler(name: 'CRAWLER_NAME') do
  it { should exist }
end

Use should_not to test that the entity does not exist.

describe aws_glue_crawler(name: 'dummy') do
  it { should_not exist }
end

be_available

Use should to check if the crawler is available.

describe aws_glue_crawler(name: 'CRAWLER_NAME') do
  it { should be_available }
end

AWS Permissions

Your AWS principal will need the glue:GetCrawler action with Effect set to Allow.
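
As a rough illustration, an IAM policy granting this access might look like the output of the Ruby sketch below. It assumes the resource reads a crawler through the Glue GetCrawler API call, whose IAM action is glue:GetCrawler. The helper name glue_crawler_read_policy is hypothetical, and the wildcard Resource is a placeholder that should be narrowed to specific crawler ARNs where possible.

```ruby
require 'json'

# Build a minimal IAM policy document that allows reading a Glue crawler.
# `resource` defaults to '*'; pass a crawler ARN to narrow the scope.
def glue_crawler_read_policy(resource = '*')
  {
    'Version' => '2012-10-17',
    'Statement' => [
      {
        'Effect' => 'Allow',
        'Action' => ['glue:GetCrawler'],
        'Resource' => resource
      }
    ]
  }
end

# Print the policy as formatted JSON, ready to review or attach.
puts JSON.pretty_generate(glue_crawler_read_policy)
```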