Política Brasileira entry#2

Today I spent most of my time trying to figure out the API, but it gives conflicting information: the output is either XML or HTML, and I can't seem to get the output I want. I've read many tutorials on web scraping, BeautifulSoup and requests, and none of them gets me the result I need.

So I downloaded the entire database for 2019 and started cleaning it up in pandas. The original data frame had 31 columns; now we're down to 11, all with relevant information. Next, I'll transform one of the columns into a timestamp so that I can drop another two. One of the columns, which lists the reason for each payment, has about 12 different values. I'm going to group them into 4 broader groups and create dummy variables for them, so that I can run machine learning models.

Right now I'm stuck on how to split the timestamp (which is still of dtype object, since I haven't been able to convert it yet), but I'm sure I'll figure it out tomorrow. I'm adapting my strategy to clean the data from the *.csv files I've downloaded rather than trying to figure out the API. I'm adding another 4 pomodori to the tally.

CORRECTION: I've used a lambda expression to split the string and transform it into a timestamp. Code:

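import pandas as pd
df_clean['Data de Emissao'] = pd.to_datetime(df_clean['Data de Emissao'].apply(lambda x: x.split('T')[0]))  # keep the date before 'T', parse it, and assign it back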
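
Since splitting on 'T' suggests the values are ISO 8601 strings (a date, a 'T', then a time), pd.to_datetime can probably parse the column directly, with no lambda needed. Below is a minimal sketch on a made-up sample. The sample values and the derived "year" and "month" column names are hypothetical, and pulling out the year and month is only one guess at how a single timestamp column could replace two others.

```python
import pandas as pd

# Hypothetical sample: ISO 8601 strings with a 'T' between date and time,
# which is what the split('T') above suggests the real column contains.
df_clean = pd.DataFrame({
    "Data de Emissao": ["2019-01-15T00:00:00", "2019-03-02T00:00:00", "2019-12-31T00:00:00"],
})

# pd.to_datetime understands ISO 8601 directly, so no manual split is needed.
df_clean["Data de Emissao"] = pd.to_datetime(df_clean["Data de Emissao"])

# Once the column is datetime64, its parts are available through the .dt accessor.
# "year" and "month" are hypothetical column names.
df_clean["year"] = df_clean["Data de Emissao"].dt.year
df_clean["month"] = df_clean["Data de Emissao"].dt.month

print(df_clean.dtypes)
print(df_clean)
```

If some rows don't match the format, passing errors="coerce" to pd.to_datetime turns them into NaT instead of raising, which makes them easy to find with isna().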

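Grouping the roughly 12 payment reasons into 4 groups and turning those groups into dummy variables could look something like this in pandas. It is a sketch under assumptions: the "reason" column name, the reason labels, and the group mapping are all hypothetical placeholders, not the real values in the dataset.

```python
import pandas as pd

# Hypothetical reason labels; the real column has about 12 distinct values.
df = pd.DataFrame({
    "reason": ["AIRFARE", "FUEL", "PHONE", "LODGING", "ADVERTISING", "FUEL"],
})

# Hypothetical mapping from each detailed reason to one of 4 broader groups.
groups = {
    "AIRFARE": "transport",
    "FUEL": "transport",
    "LODGING": "lodging",
    "PHONE": "office",
    "ADVERTISING": "advertising",
}

df["group"] = df["reason"].map(groups)

# map() leaves NaN for any reason missing from the dictionary,
# so fail loudly instead of silently carrying unmapped rows forward.
unmapped = df.loc[df["group"].isna(), "reason"].unique()
if len(unmapped) > 0:
    raise ValueError(f"Reasons without a group: {list(unmapped)}")

# One 0/1 column per group; dtype=int gives integers instead of booleans.
dummies = pd.get_dummies(df["group"], prefix="group", dtype=int)
df = pd.concat([df, dummies], axis=1)
print(df)
```

For linear models, passing drop_first=True to pd.get_dummies drops one of the four columns, which avoids perfect collinearity between the dummies.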
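
For the API route, one way to get an XML response into pandas is to parse it with the standard library and build one row per record. This is a minimal sketch that assumes the response is well-formed XML with one repeated element per record; the payload, the "record", "date" and "value" tag names, and the xml_to_dataframe helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

import pandas as pd

# Hypothetical XML payload; the real element names depend on the API.
xml_text = """
<records>
  <record><date>2019-01-15T00:00:00</date><value>120.50</value></record>
  <record><date>2019-02-03T00:00:00</date><value>89.90</value></record>
</records>
"""


def xml_to_dataframe(text: str, record_tag: str) -> pd.DataFrame:
    """Hypothetical helper: each <record_tag> element becomes one row, its children become columns."""
    root = ET.fromstring(text.strip())
    rows = [{child.tag: child.text for child in record} for record in root.iter(record_tag)]
    return pd.DataFrame(rows)


df_api = xml_to_dataframe(xml_text, "record")
print(df_api)
```

All values come back as strings, so the date and numeric columns still need pd.to_datetime and pd.to_numeric afterwards. If an endpoint returns HTML instead, ET.fromstring will usually raise a ParseError, so it is worth checking the response's Content-Type header before parsing.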