In today’s data-driven world, gathering contact information efficiently can be a game-changer for businesses and marketers. Imagine a workflow that scrapes websites, intelligently extracts the most relevant business email addresses, and stores everything neatly in a Google Sheet—all automatically. This blog will walk you through exactly such a workflow, built using the powerful automation tool n8n, OpenAI’s AI capabilities, and Google Sheets.
What Does This Workflow Do?
The workflow automates the process of collecting business leads and contact emails from a list of websites you provide. It works like this:
- Takes website details including URLs and other metadata.
- Visits each website to scrape its content.
- Uses AI to identify the highest-authority business email (e.g., CEO or business owner’s email).
- Saves the results to a Google Sheets spreadsheet.
This automation makes data collection faster, more consistent, and easy to manage.
Initial Setup
Before diving into the automation, you need to set up some credentials in your n8n environment:
1. Google Sheets OAuth2 Credential
- This credential lets n8n access and write data directly to your Google Sheets.
- It’s required so the workflow can save extracted data to your spreadsheet.
- Be sure you have a Google Cloud project with Sheets API enabled and have created OAuth credentials.
2. OpenAI API Credential
- OpenAI’s API powers the AI Agent that extracts emails intelligently.
- You’ll need an API key from OpenAI with billing enabled.
- The AI evaluates parsed website content to find the highest-authority email.
Step-By-Step Workflow Explanation
1. Manual Trigger — Start the Process
- You can trigger the workflow manually from the n8n interface to run tests or process your data set.
- This gives you control when to start scraping.
2. Edit Fields — Prepare and Extract Key Information
- Incoming dataset fields such as:
- Title
- Address
- Website URL
- Phone Number
- Domain
- Place ID
- Email (if any)
- The workflow extracts the domain from the website URL to use for further processing.
3. Loop Over Items — Process Websites Sequentially
- The workflow loops over your list of websites, handling each one individually.
- Sequential processing ensures each website scrape completes before the next starts.
4. Scrape Website — Fetch Website Content
- Using n8n’s HTTP request capabilities or a scraping node, the workflow fetches all relevant content from the target website.
- This raw content is necessary for the AI Agent to analyze.
5. AI Agent (OpenAI) — Extract Business Email
- The AI is tasked specifically to:
- Find the business contact email, prioritizing top authority contacts like CEOs or business owners.
- Return a clear result or “Null” if no proper email is found.
- This step removes guesswork and uses natural language understanding to prioritize the right contacts.
6. Google Sheets — Save Results
- Extracted data (e.g.,
place_id
,email
, and other info) is appended or updated in your Google Sheets document. - The sheet name (“Results”) and columns align with the data structure for easy review and follow-up.
7. Conditional Operations — Handle Missing Emails Gracefully
- If the AI doesn’t return an email, the workflow proceeds without inserting empty or invalid data.
- Ensures your data remains clean and meaningful.
8. Wait and Completion — Smooth Processing
- A brief wait period after each iteration avoids API rate limits and ensures a smooth flow.
- The loop then continues with the next website or ends.
Benefits of This Workflow
- Automation of Email Extraction: No more manual digging through websites.
- Centralized Data: Collected data is immediately accessible in Google Sheets.
- Improved Accuracy: AI prioritizes the most authoritative and relevant business emails.
- Scalability: Easily scales with more websites added to the input.
Customizing the Workflow for Your Needs
- Add More Fields: Want to extract phone numbers, social media links, or other info? Update the ‘Edit Fields’ node to capture these.
- Refine AI Extraction: Customize the AI prompt to focus on specific roles or email types.
- Expand Dataset: Add more websites by uploading a larger dataset into the trigger node.