Google: Fetching Images with the Custom Search API - ビジネスパーソン・ガジェット置場　empty lot for business

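One way to collect images for training a deep learning model is to use Google's Custom Search API. This time I implemented the approach described in a few articles that cover it and collected some images.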

Sites I referred to this time

For how to obtain the API key in Google Cloud, I referred to these two sites.

【2022年UI変更】Googleの検索結果をAPIで取得する方法～簡単４ステップ・まとめ付き～ | Plusプロジェクトマネージャーオフィシャルページ

Google Custom Search APIを利用した検索結果の取得方法【無料枠】 | プラスブログ

Google's services change often, but as of 2022/12/15 the steps explained in the articles above still work.

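For how to fetch images with the API key once you have it, I referred to the site below. (The author's goal, and the kind of work that makes them happy, were the same as mine.)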

機械学習で乃木坂46を顏分類してみた | Aidemy | 10秒で始めるAIプログラミング学習サービスAidemy［アイデミー］

Code

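On the Nogizaka46 face classification site above, the indentation of the code was no longer visible, and I am classifying the faces of IVE rather than Nogizaka46, so I changed those parts slightly.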

import json
import os
import urllib.request
from urllib.parse import quote

import httplib2

# API key created in Google Cloud Platform
API_KEY = "put your API key here"

# Search engine ID of the Custom Search Engine
CUSTOM_SEARCH_ENGINE = "put your search engine ID here"

# Keywords to search for
keywords = ['ive イソ', 'ive ウォニョン', 'ive ガウル', 'ive ユジン', 'ive リズ', 'ive レイ']


# Function that collects image URLs for one keyword
def get_image_url(search_item, total_num):
    # Start from a fresh list so URLs from earlier keywords are not mixed in
    img_list = []
    i = 0
    while i < total_num:
        # One request returns at most 10 results; "start" is 1-based
        num = min(10, total_num - i)
        query_img = ("https://www.googleapis.com/customsearch/v1?key=" + API_KEY
                     + "&cx=" + CUSTOM_SEARCH_ENGINE
                     + "&num=" + str(num)
                     + "&start=" + str(i + 1)
                     + "&q=" + quote(search_item)
                     + "&searchType=image")
        res = urllib.request.urlopen(query_img)
        data = json.loads(res.read().decode('utf-8'))
        # "items" may be missing when a page has no results
        for item in data.get("items", []):
            img_list.append(item['link'])
        i += 10
    return img_list


# Function that downloads and saves the images from the collected URLs
def get_image(search_item, img_list, j):
    # Create the output directory if it does not exist yet
    os.makedirs("origin_image", exist_ok=True)
    http = httplib2.Http(".cache")
    for i in range(len(img_list)):
        try:
            print(img_list[i])
            response, content = http.request(img_list[i])
            # File name is "<keyword index>.<image index>.jpg", e.g. 00.12.jpg
            filename = os.path.join("origin_image", '{0:02d}.{1}.jpg'.format(j, i))
            with open(filename, 'wb') as f:
                f.write(content)
        except Exception:
            print("failed to download the image.")
            continue


# Run the above for each keyword
for j in range(len(keywords)):
    print(keywords[j])
    img_list = get_image_url(keywords[j], 100)
    get_image(keywords[j], img_list, j)
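
The paging part of get_image_url is the easiest place to get wrong, so here is a minimal sketch of just that logic, separated out so it can be checked without calling the API. The names page_windows and build_search_url are hypothetical helpers I made up for illustration, not part of the referenced articles, and the key and engine ID in the usage lines are dummy values. It assumes the same request parameters as the script above (key, cx, num, start, q, searchType).

```python
from urllib.parse import urlencode

API_URL = "https://www.googleapis.com/customsearch/v1"


def page_windows(total_num, page_size=10):
    """Return (start, num) pairs covering total_num results; start is 1-based."""
    windows = []
    i = 0
    while i < total_num:
        # The last page may ask for fewer than page_size results
        windows.append((i + 1, min(page_size, total_num - i)))
        i += page_size
    return windows


def build_search_url(api_key, engine_id, search_item, start, num):
    """Build one image-search request URL; urlencode escapes the keyword."""
    params = {
        "key": api_key,
        "cx": engine_id,
        "num": num,
        "start": start,
        "q": search_item,
        "searchType": "image",
    }
    return API_URL + "?" + urlencode(params)


if __name__ == "__main__":
    # Dummy credentials: this only prints the URLs, it does not send requests
    for start, num in page_windows(25):
        print(build_search_url("DUMMY_KEY", "DUMMY_CX", "ive レイ", start, num))
```

Asking for 25 images this way yields three requests with (start, num) of (1, 10), (11, 10) and (21, 5). Note that urlencode writes the space in the keyword as "+" where quote writes "%20"; both are ordinary encodings of a space in a query string.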