I am attempting to copy data into Snowflake from an AWS Lambda function. I have a dataframe that has no duplicates in it, which I verify by checking it like so:
df.duplicated().any()
and confirming that it returns False.
I then double-check by filtering on what should be a unique value in the dataframe:
df[df["myColumn"] == "uniqueValue"]
and I get 1 result.
I then run the following:
write_pandas(
    conn=con,
    df=df,
    table_name=table_name,
    database=database,
    schema=schema,
    chunk_size=chunk_size,
    quote_identifiers=False,
)
and then when the data lands in the Snowflake table and I query it, there are 5 copies of each row.
I verified that this function only runs one time as well.
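For reference, one way to make this kind of check is a small call counter around the loading function. This is only a sketch: `count_calls` and `load_to_snowflake` are hypothetical names, and the stand-in body does not call `write_pandas`. It also only counts calls within a single Python process, so it cannot see separate Lambda executions.

```python
import functools


def count_calls(func):
    """Wrap func so the number of calls is recorded on wrapper.calls."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)

    wrapper.calls = 0
    return wrapper


@count_calls
def load_to_snowflake(rows):
    """Hypothetical stand-in for the function that calls write_pandas."""
    # The real function would call write_pandas(...) here.
    return len(rows)


if __name__ == "__main__":
    load_to_snowflake(["row-1", "row-2"])
    print(f"load_to_snowflake ran {load_to_snowflake.calls} time(s)")
```

A count of 1 here only shows that the function ran once inside one execution; it says nothing about how many times the Lambda itself was invoked.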
Why am I getting 5 duplicates?
EDIT
OK, so I realized it's not related to this package. The issue is that the Lambda is triggered again after 1 minute, and then again 1 minute later, and so on, until it has been triggered 5 times.
I have no idea why it's being triggered multiple times, though. All of the executions eventually succeed, but 5 of them are running before the first one actually completes.
UPDATE
Verified that it's not a memory issue and not a timeout issue.
What I have noticed is that the next Lambda execution seems to be triggered at the point where an API call is made to retrieve some external data. I'm not sure why that would play a role, but it seems to be related.
Also, the number of executions is not fixed at 5: the Lambda is simply re-triggered every minute until the first execution finishes. I can see that the logs stop when the API call starts, and it's at that same point in the logs that I see the next Lambda execution start.
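To tell the overlapping executions apart in the logs, each log line can carry the invocation's request ID and the remaining execution time. The following is a minimal sketch, not my actual handler: `fetch_external_data` is a hypothetical placeholder for the external API call, while `aws_request_id` and `get_remaining_time_in_millis()` are attributes of the standard Lambda context object for Python.

```python
import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


def fetch_external_data():
    """Hypothetical placeholder for the external API call."""
    return {"rows": []}


def lambda_handler(event, context):
    # aws_request_id identifies this invocation; putting it on every log
    # line makes overlapping executions distinguishable in CloudWatch.
    request_id = context.aws_request_id
    logger.info(
        "start request_id=%s remaining_ms=%s",
        request_id,
        context.get_remaining_time_in_millis(),
    )

    data = fetch_external_data()

    # If this line never shows up before the next execution starts, the
    # new execution began while the API call was still in flight.
    logger.info(
        "api call done request_id=%s remaining_ms=%s",
        request_id,
        context.get_remaining_time_in_millis(),
    )
    return {"request_id": request_id, "rows": len(data["rows"])}
```

Comparing the request IDs across the executions should show whether the same invocation is being retried or whether the caller is sending separate new invocations, which would point at whatever is triggering the Lambda rather than at the function itself.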