
Parsing nationwide Korean addresses with Python

by HDobby 2022. 11. 23.

https://github.com/Hangeulkim/NationWideMap

 


Download URL: https://www.epost.go.kr/search/zipcode/areacdAddressDown.jsp



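Usage:
  1. Edit the DB address and related settings in the .py file to match your own database.
  2. Unzip the downloaded file into the same folder as the .py file, then run the .py file. The script reads the .txt files under ./zipcode_DB.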
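Before running the script, it can help to confirm that the unzipped files are where the script expects them and that the column names look right. The sketch below assumes pipe-delimited `.txt` files in a `./zipcode_DB` folder, as the script itself does; `preview_headers` is a hypothetical helper name and is not part of the repository.

```python
import csv
import os


def preview_headers(folder_path="./zipcode_DB"):
    """Return {file name: header row} for every .txt file in folder_path."""
    headers = {}
    for name in sorted(os.listdir(folder_path)):
        if not name.endswith(".txt"):
            continue
        path = os.path.join(folder_path, name)
        # utf-8-sig drops a leading BOM if the file has one
        with open(path, encoding="utf-8-sig", newline="") as f:
            reader = csv.reader(f, delimiter="|")
            headers[name] = next(reader, [])
    return headers


if __name__ == "__main__":
    # Only print when the folder is actually present
    if os.path.isdir("./zipcode_DB"):
        for name, header in preview_headers().items():
            print(name, header)
```

If a file's header is missing one of the column names the script looks up (for example 우편번호 or 도로명), the script would fail with a KeyError on that file.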


  
  By default, the data is stored in a MongoDB instance on localhost.
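  One way to avoid editing the script for each environment is to read the connection settings from environment variables. This is only a sketch: the variable names (`MONGO_HOST`, `MONGO_PORT`, `MONGO_DB`, `MONGO_COLLECTION`) and the `mongo_settings` helper are assumptions made for illustration, and the fallback values are the ones hard-coded in the script.

```python
import os


def mongo_settings(env=None):
    """Build connection settings, falling back to the script's defaults."""
    env = os.environ if env is None else env
    return {
        "host": env.get("MONGO_HOST", "localhost"),
        "port": int(env.get("MONGO_PORT", "27017")),
        "database": env.get("MONGO_DB", "jungook"),
        "collection": env.get("MONGO_COLLECTION", "addresses"),
    }


if __name__ == "__main__":
    # With no overrides, the defaults from the script are returned
    print(mongo_settings({}))
```

  The `host` and `port` values can then be passed to `MongoClient(host=..., port=...)`, and the database and collection names used as subscript keys in the same way the script does.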
  
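  zipcode       = postal code
  sido          = province or metropolitan city (si/do)
  sigungu       = city, county, or district (si/gun/gu)
  eupmyeon      = town or township (eup/myeon)
  eupmyeondong  = eup/myeon/dong (when there is no eup/myeon, the dong is stored instead)
  roadname      = base road name (for "가로수로 45라길", only the "가로수로" part)
  fullRoadName  = full road name ("가로수로 45라길")
  lee           = village (ri)
  building      = building name
  newAddress    = new (road-name) address
  pastAddress   = old (lot-number) address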
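
The two derived fields in this list, roadname and eupmyeondong, can be sketched as a small function. This is a minimal illustration rather than the script's exact logic: `derive_fields` is a hypothetical name, the row is assumed to be a dict keyed by the source file's column names, and cutting at the last "로" is an assumption chosen so that "가로수로 45라길" yields "가로수로".

```python
def derive_fields(row):
    """Derive roadname and eupmyeondong from one parsed row (a dict)."""
    full_road_name = row.get("도로명", "")
    # Keep everything up to and including the last "로";
    # the result is "" when the name contains no "로".
    roadname = full_road_name[: full_road_name.rfind("로") + 1]
    # Fall back to the legal dong when the eup/myeon column is empty.
    eupmyeondong = row.get("읍면") or row.get("법정동명", "")
    return {
        "roadname": roadname,
        "fullRoadName": full_road_name,
        "eupmyeondong": eupmyeondong,
    }


if __name__ == "__main__":
    print(derive_fields({"도로명": "가로수로 45라길", "읍면": "", "법정동명": ""}))
```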

 

 

Code

import csv
import os
from pymongo import MongoClient

header = []
inputLists = []


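def addDB():
    # insert_many raises an error on an empty list, so skip empty batches.
    if not inputLists:
        return

    client = MongoClient(host='localhost', port=27017)
    try:
        collection = client['jungook']['addresses']
        collection.insert_many(inputLists)
    finally:
        # A new client is opened on every call, so release it when done.
        client.close()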

    
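def makeAddress(zipcode, sido, sigungu, eupmyeon, gil, mainNewNum, subNewNum, dong, mainPastNum, subPastNum, building):
    # Build the new (road-name) and old (lot-number) address strings.
    # zipcode is accepted so existing calls keep working, but it is not
    # part of either string.
    prefix = [sido]
    if sigungu and sigungu != sido:
        prefix.append(sigungu)
    if eupmyeon:
        prefix.append(eupmyeon)

    # Append "-sub" only when the sub number is present and not "0".
    newNum = mainNewNum + ("-" + subNewNum if subNewNum not in ("", "0") else "")
    pastNum = mainPastNum + ("-" + subPastNum if subPastNum not in ("", "0") else "")

    # Skip empty parts so the result never contains doubled spaces.
    newAddress = " ".join(part for part in prefix + [gil, newNum] if part)
    pastAddress = " ".join(part for part in prefix + [dong, pastNum] if part)

    # The new address ends with "(dong, building)"; empty items are left out
    # so the result never contains "()" or "(, building)".
    extras = [part for part in (dong, building) if part]
    if extras:
        newAddress += " (" + ", ".join(extras) + ")"
    if building:
        pastAddress += " (" + building + ")"

    return newAddress, pastAddress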
    

def parsing(line):
    # Map the column names from the header row onto this row's values.
    addressDict = dict(zip(header, line))
    zipcode = addressDict["우편번호"]
    sido = addressDict["시도"]
    sigungu = addressDict["시군구"]
    eupmyeon = addressDict["읍면"]
    # eupmyeondong falls back to the legal dong when there is no eup/myeon.
    eupmyeondong = eupmyeon if eupmyeon != "" else addressDict["법정동명"]

    fullRoadName = addressDict["도로명"]
    # Base road name: everything up to the last "로". rfind is used because
    # find stops at the first "로" and would turn "가로수로 45라길" into "가로".
    # Names without "로" still yield "".
    roadname = fullRoadName[:fullRoadName.rfind("로") + 1]
    mainNewNum = addressDict["건물번호본번"]
    subNewNum = addressDict["건물번호부번"]

    # Use the ri name when the legal dong is empty.
    dong = addressDict["법정동명"]
    if dong == "":
        dong = addressDict["리명"]

    lee = addressDict["리명"]
    mainPastNum = addressDict["지번본번"]
    subPastNum = addressDict["지번부번"]
    building = addressDict["시군구용건물명"]
    newAddress, pastAddress = makeAddress(zipcode, sido, sigungu, eupmyeon, fullRoadName, mainNewNum, subNewNum, dong, mainPastNum, subPastNum, building)

    # zipcode and eupmyeon are stored as well, matching the field list above.
    inputLists.append({
        "zipcode": zipcode,
        "sido": sido,
        "sigungu": sigungu,
        "eupmyeon": eupmyeon,
        "eupmyeondong": eupmyeondong,
        "lee": lee,
        "roadname": roadname,
        "fullRoadName": fullRoadName,
        "building": building,
        "newAddress": newAddress,
        "pastAddress": pastAddress
        })

    
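def readFolder():
    global header
    global inputLists

    folderPath = "./zipcode_DB"
    # Only the pipe-delimited .txt files from the unzipped download are read.
    fileList = [file for file in os.listdir(folderPath) if file.endswith(".txt")]
    for file in fileList:
        inputLists = []  # start a fresh batch for each file
        with open(os.path.join(folderPath, file), encoding="utf-8-sig", newline="") as fileData:
            csvReader = csv.reader(fileData, delimiter='|')
            header = next(csvReader, [])  # the first row holds the column names
            for line in csvReader:
                parsing(line)
        print(file)
        addDB()  # one insert per source file

    # Notes:
    #   The administrative dong matches the eup/myeon.
    #   When there is no eup/myeon:
    #   sigungu - eupmyeon - road name - building main number - building sub number?


if __name__ == '__main__':
    readFolder()

    print("end")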
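
readFolder collects every row of a file in inputLists before a single insert_many call. If memory becomes a concern, the rows could instead be grouped into fixed-size batches as they are produced. The sketch below shows only the batching step; `batched` is a hypothetical helper, not part of the repository, and the batch size is left to the caller.

```python
from itertools import islice


def batched(rows, size):
    """Yield lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(rows)
    while True:
        # islice pulls up to `size` items without reading the rest
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    for batch in batched(range(5), 2):
        print(batch)
```

Each batch could then be passed to `collection.insert_many(batch)`, so that only one batch of documents is held in memory at a time.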